WWW 2018

Scalable Construction and Querying of Massive Knowledge Bases


In today's computerized and information-based society, people are inundated with vast amounts of text data, ranging from news articles, social media posts, and scientific publications to textual information from various vertical domains (e.g., corporate reports, advertisements, legal acts, medical reports). Turning such massive, unstructured text data into structured, actionable knowledge, and enabling effective and user-friendly access to that knowledge, is a grand challenge for the research community.

In the first half of the tutorial, we introduce data-driven methods for mining structured facts (i.e., entities and their relations for types of interest) from massive text corpora to construct knowledge bases, with a focus on methods that are minimally supervised, domain-independent, and language-independent, so that knowledge bases can be constructed in a timely manner across various application domains (e.g., news, social media, biomedical, business). In the second half of the tutorial, we discuss the challenges of querying large-scale knowledge bases and give a systematic discussion of several emerging schema-agnostic querying paradigms for knowledge bases, including keyword query, graph query, natural language query (i.e., question answering), and query by example, which allow users to query knowledge bases easily without writing complex structured queries in languages such as SPARQL. We will also dedicate a session to a hands-on exercise that takes attendees through the process of creating and searching their own knowledge graphs using the Domain-specific Insight Graph (DIG) knowledge graph construction architecture.


  1. Overview of Knowledge Base Construction and Querying [slides]
  2. Domain-specific Knowledge Graph Construction [slides]
  3. Schema-agnostic Knowledge Base Querying [slides]


  • Multi-tasking Sequence Labeling [project]
  • Learning with Heterogeneous Supervision [project]
  • Learning with Indirect Supervision [project]

Code & Data

  • Sequence Tagging: [LM-LSTM-CRF]
  • Phrase Mining: [AutoPhrase]
  • Entity Typing: [PLE] [AFET]
  • Relation Extraction: [ReHession] [ReQuest] [GloRE]
  • Co-extraction of Entities and Relations: [CoType]
  • Knowledge-based Question Answering: [GraphQuestions]
  • Schema-agnostic Graph Query on Knowledge Bases: [GRF]

Publications