From Tables to Knowledge: Recent Advances in Table Understanding

KDD'21 Tutorial
August 14, 2021
A wealth of human knowledge is expressed in structured tables across web pages, scientific articles, spreadsheets, and databases. This wealth is matched by the diversity of layout structures, content types, formats, and surface forms used to express tables. Recent advances in representation learning and knowledge representation have made progress in exploiting structural regularities in tabular data to unlock this knowledge. In this tutorial, we survey these advances for a host of table understanding tasks, including table segmentation, semantic typing of cells, transforming tables to knowledge graphs, entity linking, and table retrieval for question answering.

Tutorial Program

All times are tentative and given in Singapore Time (UTC+8).

| Time | Subject | Presenter | Description |
|---|---|---|---|
| 00:00 - 00:35 | Introduction / Understanding Table Structures (slides) | Jay Pujara | Tables found on the Web and other structured sources take diverse forms. Understanding cell-level features, functional blocks, and relational structure is an important task for modeling tabular data. |
| 00:40 - 01:15 | Semantic Understanding of Tables (slides) | Pedro Szekely | Semantic models are the key to representing the knowledge in tables. Important subproblems of semantic modeling include identifying entities, relationships, and meta-information in tables. |
| 01:15 - 01:30 | Break | | |
| 01:30 - 02:05 | Representation Learning for Tables (slides) | Huan Sun | Representation learning approaches have demonstrated great success in solving multiple table understanding tasks. |
| 02:10 - 02:45 | Bridging Tables and Language (slides) | Muhao Chen | Downstream tasks such as question answering, table retrieval, and summarization demonstrate the power of recent work in table understanding. |
| 02:45 - 03:00 | Questions and Discussion | All | Time for asking questions, discussing your own research approaches, or learning more about practical aspects. |

Background & Requirements

The tutorial is organized into three major modules. The first gives attendees an introduction to the seminal work on the organization of data in tables and covers the major goals and approaches of computational systems that undertake table understanding. The second module covers specific models used for table understanding tasks, such as table discovery, table segmentation and layout detection, cell classification and semantic typing, mapping tables to knowledge graphs and linking to known entities, and table retrieval in search and question answering. The final module is a primer for researchers who want to get involved with the table understanding community: a guide to the most commonly used benchmark datasets and models, downstream applications and evaluations, and a sketch of the open problems in table understanding.

The target audience of this tutorial is researchers in computer science and data science who want a high-level overview of recent research in table understanding. We expect the audience to be broadly familiar with core data science concepts, such as training, testing, and multi-class prediction in machine learning, as well as basic ideas from representation learning. After the tutorial, attendees will be abreast of the fundamental ideas in the field and of current research advances, enabling them to dive deeper and contribute to table understanding research.

Bibliography:

Presenters