WWW 2018 Tutorial

Abstract

In today's computerized and information-based society, people are inundated with vast amounts of text data, ranging from news articles, social media posts, and scientific publications to a wide range of textual information from various vertical domains (e.g., corporate reports, advertisements, legal acts, medical reports). How to turn such massive and unstructured text data into structured, actionable knowledge, and how to enable effective and user-friendly access to such knowledge, is a grand challenge to the research community.

In the first half of the tutorial, we introduce data-driven methods for mining structured facts (i.e., entities and their relations for types of interest) from massive text corpora to construct knowledge bases, with a focus on methods that are minimally supervised, domain-independent, and language-independent, for timely knowledge base construction across various application domains (e.g., news, social media, biomedical, business). In the second half of the tutorial, we discuss the challenges of querying large-scale knowledge bases and systematically review several emerging schema-agnostic querying paradigms for knowledge bases, including keyword query, graph query, natural language query (i.e., question answering), and query by example, which allow users to easily query knowledge bases without writing complex structured queries in languages like SPARQL. We will also dedicate a session to a hands-on exercise that will take attendees through the process of creating and searching their own knowledge graphs using the Domain-specific Insight Graph (DIG) knowledge graph construction architecture.

Outline

Overview of Knowledge Base Construction and Querying [slides]
Domain-specific Knowledge Graph Construction [slides]
Schema-agnostic Knowledge Base Querying [slides]

Projects

Multi-tasking sequence labeling [project]

Learning with Heterogeneous Supervision [project]

Learning with Indirect Supervision [project]

Code & Data

Sequence Tagging: [LM-LSTM-CRF]

Phrase Mining: [AutoPhrase]

Entity Typing: [PLE] [AFET]

Relation Extraction: [ReHession] [ReQuest] [GloRE]

Co-extraction of Entities and Relations: [CoType]

Knowledge-based Question Answering: [GraphQuestions]

Schema-agnostic Graph Query on Knowledge Bases: [GRF]

Publications

Global Relation Embedding for Relation Extraction
Yu Su*, Honglei Liu*, Semih Yavuz, Izzeddin Gur, Huan Sun, Xifeng Yan.
Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), 2018.
[Github]
Indirect Supervision for Relation Extraction using Question-Answer Pairs
Ellen Wu, Xiang Ren, Frank Xu, Ji Li, Jiawei Han.
ACM International Conference on Web Search and Data Mining (WSDM), 2018.
[Project] [Github]
Empower Sequence Labeling with Task-Aware Neural Language Model
Liyuan Liu, Jingbo Shang, Xiang Ren, Frank Xu, Huan Gui, Jian Peng, Jiawei Han.
The AAAI Conference on Artificial Intelligence (AAAI), 2018.
[Github] [Project] [Documents]
Cross-domain Semantic Parsing via Paraphrasing
Yu Su, Xifeng Yan.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
Recovering Question Answering Errors via Query Revision
Semih Yavuz, Izzeddin Gur, Yu Su, Xifeng Yan.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach
Liyuan Liu*, Xiang Ren*, Qi Zhu, Shi Zhi, Huan Gui, Heng Ji, Jiawei Han.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
[Project] [Github]
CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases
Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han.
International World-Wide Web Conference (WWW), 2017.
[Github] [slides] [arxiv]
Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding
Xiang Ren*, Wenqi He*, Meng Qu, Heng Ji, Clare R. Voss, Jiawei Han.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2016.
[Github] [video]
On Generating Characteristic-rich Question Sets for QA Evaluation
Yu Su, Huan Sun, Brian Sadler, Mudhakar Srivatsa, Izzeddin Gur, Zenghui Yan, Xifeng Yan.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
[Github]
Improving Semantic Parsing via Answer Type Inference
Semih Yavuz, Izzeddin Gur, Yu Su, Mudhakar Srivatsa, Xifeng Yan.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
Exploiting Relevance Feedback in Knowledge Graph Search
Yu Su, Shengqi Yang, Huan Sun, Mudhakar Srivatsa, Sue Kase, Michelle Vanni, Xifeng Yan.
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2015.
[Data]
Querying Knowledge Graphs by Example Entity Tuples
Nandish Jayaram, Arijit Khan, Chengkai Li, Xifeng Yan, Ramez Elmasri.
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2015.
Schemaless and Structureless Graph Querying
Shengqi Yang, Yinghui Wu, Huan Sun, Xifeng Yan.
International Conference on Very Large Data Bases (VLDB), 2014.

WWW 2018

Scalable Construction and Querying of Massive Knowledge Bases
