Center on Knowledge Graphs

ABOUT

ISI's Center on Knowledge Graphs research group combines artificial intelligence, the Semantic Web, and database integration techniques to solve complex information integration problems. We apply these techniques across information-intensive disciplines, including medical informatics, geospatial data integration, and the social Web.

The CKG group is currently pursuing automatic discovery and semantic modeling of online information sources, interactive data integration, information mediators for bioinformatics, and learning from online social networks, among other areas. Our work bridges research and real-world applications, such as integrating and linking cultural heritage data, analyzing social media data, and integrating genetic, pathology, and other data with clinical assessments to support clinical trials and medical research.

If you are interested in joining our group, please contact us.


PROJECTS

Current

Automating Data Science

Developing technology to automate the creation of machine learning pipelines to solve a wide variety of data-driven modeling problems

Causal Reasoning

A novel knowledge organization system that integrates concepts of causality, factual knowledge and meta-reasoning

Commonsense Reasoning

A Multi-modal Open World Grounded Learning and Inference project

Datamart

Creating the largest publicly available knowledge graph to power data-driven models in a wide variety of domains

Integrating Scientific Models

Model Integration through Knowledge-Rich Data and Process Composition

Karma

A data integration tool

Knowledge Graphs for Business

Creating a public resource containing knowledge about businesses, their products, and their patents as well as the relationships between them, such as customer, competitor or supplier.

Knowledge Graph Toolkit

Building a comprehensive library of tools

Linked Maps

Exploiting Context in Cartographic Evolutionary Documents to Extract and Build Linked Spatial-Temporal Datasets

Scoring Scientific Research

Developing automated techniques for evaluating scientific claims and assessing the confidence of their reproducibility and replicability

Semantic Modeling

Automatically building semantic descriptions of sources

Table Understanding

Extracting and Interpreting Time Series for Causal Discovery

Completed

Domain-Specific Insight Graphs

Technologies for building domain-specific knowledge graphs

Learning to Repair Failed Sensors

Automatically adapting to changes and failures in sensors

Learning to Predict Cyber Attacks

Predicting cyber attacks by mining online sources

AI for Crisis Response

Text-enabled Humanitarian Operations in Real-time

Identifying Threats in Space

Multi-Source Data Fusion for Space Situational Awareness

PEOPLE

DOWNLOADS

Software: Karma

Karma is an information integration tool that enables users to quickly and easily integrate data from a variety of sources, including databases, spreadsheets, delimited text files, XML, JSON, KML, and Web APIs.
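
As a simplified illustration of the integration task Karma automates (this is not Karma's actual API), the sketch below pulls records from two hypothetical sources, a CSV string and a JSON string, into one shared schema:

```python
import csv
import io
import json

# Two hypothetical sources with the same logical content in different
# formats; the field names "name" and "city" are illustrative only.
csv_src = "name,city\nAda,London\nGrace,Arlington"
json_src = '[{"name": "Alan", "city": "Wilmslow"}]'

# Parse each source and collect the rows into one list of dicts,
# so downstream code sees a single uniform schema.
records = list(csv.DictReader(io.StringIO(csv_src)))
records += json.loads(json_src)

print([r["name"] for r in records])  # → ['Ada', 'Grace', 'Alan']
```

Karma goes far beyond this sketch by letting users map each source to a shared ontology interactively, but the end result is the same kind of uniform access to heterogeneous data.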

MapFinder: Harvesting maps on the Web

Maps are one of the most valuable documents for gathering geospatial information about a region. We use a Content-Based Image Retrieval (CBIR) technique to build an accurate and scalable system, MapFinder, that can discover standalone images, as well as images embedded within documents on the Web, that are maps. The implementation provided here can extract WaterFilling features from images and classify a given image as a map or non-map. We also provide the data we collected for our experiments.
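
The classification step can be sketched as a nearest-neighbour decision over extracted feature vectors. This is only an illustration of the CBIR idea: the `extract_features` stand-in below is hypothetical and does not reproduce MapFinder's actual WaterFilling extractor.

```python
import math

def extract_features(image):
    # Hypothetical stand-in for WaterFilling feature extraction;
    # here the "image" is already a small feature vector.
    return image

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(query, labeled_examples):
    """1-nearest-neighbour, CBIR-style classification: label the query
    image after its closest labeled example in feature space."""
    feats = extract_features(query)
    label, _ = min(
        ((lbl, euclidean(feats, extract_features(img)))
         for img, lbl in labeled_examples),
        key=lambda pair: pair[1],
    )
    return label

# Toy feature vectors standing in for map-like and photo-like images.
training = [([0.9, 0.1], "map"), ([0.8, 0.2], "map"),
            ([0.1, 0.9], "non-map")]
print(classify([0.85, 0.15], training))  # → map
```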

ARX and Phoebus: Information Extraction from Unstructured and Ungrammatical Text on Web

This project provides two implementations for performing information extraction from unstructured, ungrammatical text on the Web, such as classified ads, auction listings, and forum posting titles. The ARX system takes an automatic approach to exploiting reference sets for this extraction, while the Phoebus system takes a machine learning approach.
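
A minimal sketch of the reference-set idea, using simple token overlap rather than the actual ARX or Phoebus matching models: the noisy post is scored against each clean reference entry, and the best match anchors the extraction.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def best_reference_match(post, reference_set):
    """Score a noisy, ungrammatical post against each entry of a clean
    reference set by token overlap and return the best match."""
    tokens = post.lower().split()
    return max(reference_set,
               key=lambda ref: jaccard(tokens, ref.lower().split()))

# Hypothetical reference set of car makes/models for classified ads.
cars = ["Honda Civic", "Honda Accord", "Toyota Camry"]
print(best_reference_match("02 civic honda low miles runs great", cars))
# → Honda Civic
```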

BSL: A system for learning blocking schemes

Record linkage is the problem of determining the matches between two data sources. As data sources grow larger, this task becomes difficult and expensive. Blocking helps by efficiently generating candidate matches, which can then be examined in detail to determine whether or not they are true matches; it is thus a preprocessing step that makes record linkage more scalable. The BSL system performs blocking in the supervised setting of record linkage: given some training matches, it discovers rules (a blocking scheme) that efficiently generate candidate matches between the sets.
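
A minimal sketch of blocking itself, with a hand-written blocking key rather than a scheme learned by BSL: records are indexed by the key, and candidate pairs are emitted only within matching blocks, avoiding the full cross-product.

```python
from collections import defaultdict

def block(records_a, records_b, key):
    """Group records by a blocking key and emit candidate pairs only
    within matching blocks. BSL's contribution is learning which keys
    (the blocking scheme) to use from training matches; here the key
    function is supplied by hand."""
    index = defaultdict(list)
    for rec in records_b:
        index[key(rec)].append(rec)
    for a in records_a:
        for b in index.get(key(a), []):
            yield (a, b)

# Hypothetical records; blocking on zip code keeps comparisons local.
people_a = [{"name": "Ann Smith", "zip": "90210"},
            {"name": "Bob Jones", "zip": "10001"}]
people_b = [{"name": "A. Smith", "zip": "90210"},
            {"name": "Carol Wu", "zip": "60601"}]

pairs = list(block(people_a, people_b, key=lambda r: r["zip"]))
print(len(pairs))  # → 1 candidate pair instead of 4 full comparisons
```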

EIDOS: Efficiently Inducing Definitions for Online Sources

The Internet is full of information sources providing various types of data, from weather forecasts to travel deals. These sources can be accessed via web forms, Web Services, or RSS feeds. To make automated use of these sources, one must first model them semantically, but writing semantic descriptions for web sources by hand is both tedious and error-prone. EIDOS addresses this by automatically inducing definitions for online sources.

Wrapper maintenance

Wrappers facilitate access to Web-based information sources by providing a uniform querying and data extraction capability. When a wrapper stops working due to changes in the layout of web pages, our task is to automatically re-induce the wrapper. The data sets used for the experiments in our JAIR 2003 paper contain web pages downloaded from two dozen sources over a period of a year.

Tutorials

ISWC 2021 Tutorial: KGTK: Tools for Creating and Exploiting Large Knowledge Graphs

Knowledge Graphs (KGs) have become the de facto method for representing, sharing, and using knowledge, but exploiting KGs in AI applications is challenging for most researchers and developers, as it requires knowledge of a variety of approaches, tools, and formats. Our tutorial will showcase the Knowledge Graph Toolkit (KGTK), a comprehensive framework for creating and exploiting large KGs such as Wikidata. KGTK is designed for ease of use, scalability, and speed, and can process Wikidata-size KGs on a laptop. In the first half of the tutorial, we will introduce and experiment with a wide range of import, curation, transformation, analysis, and export commands, which can be flexibly chained into streaming pipelines through the command line. In the second half, we will show its applicability to three common and diverse KG use cases. This tutorial will introduce AI researchers and practitioners to effective tools for addressing a wide range of KG creation and exploitation use cases, and inform us on how to bring KGTK closer to its users.

AAAI 2021 Tutorial: Commonsense Knowledge Acquisition and Representation

The tutorial consists of four main components, each covered by one of the presenters, followed by a discussion session. We start by introducing theories on an axiomatization of commonsense knowledge. Next, we cover efforts to harmonize nodes and relations across heterogeneous commonsense sources, as well as the impact of such consolidation on downstream reasoning tasks. Thirdly, we discuss how commonsense knowledge can be automatically extracted from text, and quantitatively and qualitatively contextualized. Then, we discuss how large-scale models, such as BERT, GPT-2, and T5, learn to implicitly represent an abundance of commonsense knowledge from reading the web, and how this knowledge can be extracted through carefully designed language prompting or through fine-tuning on knowledge graph tuples. We conclude the tutorial with a discussion of the way forward, and propose combining language models, knowledge graphs, and axiomatization in next-generation commonsense reasoning techniques. Prior knowledge expected from participants is minimal. Some knowledge of machine learning and language modeling is helpful, but not required: we introduce relevant machine learning concepts so that everyone can follow along.

ASONAM 2020 Tutorial: Knowledge Graphs: A Practical Introduction across Disciplines

Knowledge Graphs (KGs) like Wikidata, NELL and DBPedia have recently played instrumental roles in several machine learning applications, including search and information retrieval, natural language processing, and data mining. The simplest definition of a KG is as a directed, labeled multi-network. Yet, despite being ubiquitous in the communities mentioned above, KGs have not witnessed much research attention in the network science and social network communities. With the rapid rise in Web data, there are interesting opportunities to construct domain-specific knowledge graphs, including over social media data. This tutorial provides a detailed and rigorous introduction to KGs, and a synthesis of KG research and applications in multiple areas of computer science and AI, including e-commerce, social media analytics and biology.

ISWC 2020 Tutorial: Common Sense Knowledge Graphs (CSKGs)

Commonsense reasoning is an important aspect of building robust AI systems and is receiving significant attention in the natural language understanding, computer vision, and knowledge graphs communities. At present, a number of valuable commonsense knowledge sources exist, with different foci, strengths, and weaknesses. Our tutorial will survey the most important commonsense knowledge resources, and introduce a new commonsense knowledge graph (CSKG) to integrate several existing resources. The tutorial will also introduce several tools to work with CSKG, including query mechanisms, knowledge graph embeddings, and a framework for creating commonsense question answering systems. In a hands-on session, participants will use the framework and tools to build a question answering application using CSKG and language models.

Mining Knowledge Graphs from Text

Knowledge graphs have become an increasingly crucial component in machine intelligence systems, powering ubiquitous digital assistants and inspiring several large scale academic projects across the globe. Our tutorial explains why knowledge graphs are important, how knowledge graphs are constructed, and where new research opportunities exist for improving the state-of-the-art. In this tutorial, we cover the many sophisticated approaches that complete and correct knowledge graphs.

WWW 2018 Tutorial: Scalable Construction and Querying of Massive Knowledge Bases

In today's computerized and information-based society, people are inundated with vast amounts of text data, ranging from news articles, social media posts, scientific publications, to a wide range of textual information from various vertical domains (e.g., corporate reports, advertisements, legal acts, medical reports). How to turn such massive and unstructured text data into structured, actionable knowledge, and how to enable effective and user-friendly access to such knowledge is a grand challenge to the research community.

AAAI 2018 Tutorial: Knowledge Graph Construction from Web Corpora

Knowledge Graphs (KGs) like Wikidata, NELL and DBPedia have recently played instrumental roles in several machine learning applications, including search and information retrieval, information extraction, and data mining. Constructing knowledge graphs is a difficult problem typically studied for natural language documents. With the rapid rise in Web data, there are interesting opportunities to construct domain-specific knowledge graphs over corpora that have been crawled or acquired through techniques like focused crawling. In this tutorial, we survey the techniques for knowledge graph construction from domain-specific Web corpora.

KDD 2017 Tutorial: Data mining in unusual domains with information-rich knowledge graph construction, inference and search

The growth of the Web is a success story that has spurred much research in knowledge discovery and data mining. Data mining over Web domains that are unusual is an even harder problem. There are several factors that make a domain unusual. In particular, such domains have significant long tails and exhibit concept drift, and are characterized by high levels of heterogeneity. Notable examples of unusual Web domains include both illicit domains, such as human trafficking advertising, illegal weapons sales, counterfeit goods transactions, patent trolling and cyberattacks, and also non-illicit domains such as humanitarian and disaster relief. Data mining in such domains has the potential for widespread social impact, and is also very challenging technically. In this tutorial, we provide an overview, using demos, examples and case studies, of the research landscape for data mining in unusual domains, including recent work that has achieved state-of-the-art results in constructing knowledge graphs in a variety of unusual domains, followed by inference and search using both command line and graphical interfaces.

ISWC 2017 Tutorial: Constructing Domain-specific Knowledge Graphs (KGC)

The vast amounts of ontologically unstructured information on the Web, including semi-structured HTML, XML and JSON documents, natural language documents, tweets, blogs, markups, and even structured documents like CSV tables, all contain useful knowledge that can present a tremendous advantage to Semantic Web researchers if extracted robustly, efficiently and semi-automatically as an RDF knowledge graph. Domain-specific Knowledge Graph Construction (KGC) is an active research area that has recently witnessed impressive advances due to machine learning techniques like deep neural networks and word embeddings. This tutorial will synthesize and present KGC techniques, especially information extraction (IE) in a manner that is accessible to Semantic Web researchers. The presenters of the tutorial will use their experience as instructors and Semantic Web researchers, as well as lessons from actual IE implementations, to accomplish this purpose through visually intuitive and example-driven slides, accessible, high-level overviews of related work, instructor demos, and at least five IE participatory activities that attendees will be able to set up on their laptops.

Workshops

AAAI 2021

Commonsense Knowledge Graphs (CSKGs)

KDD 2020

Knowledge Graphs and E-commerce

ACM WWW 2018

BigNet 2018 (merged with Latent Semantics for the Web workshop)

ESWC 2018

Workshop on Deep Learning for Knowledge Graphs and Semantic Technologies (DL4KGS)

ISWC 2017

Hybrid Statistical Semantic Understanding and Emerging Semantics