Pedro Szekely

Project Leader, Research Associate Professor

Ph.D. in Computer Science, CMU, 1987


Detailed information about my work:

I like working with students. More info here.

Research Interests

Nowadays we have access to lots of data for making decisions, but it is difficult to combine these data and act on them. The problem is that the data are scattered across different sources, in different formats and schemas, with no metadata to describe their meaning and provenance. Data can be in databases, Excel spreadsheets, or CSV, XML or JSON files, or may be accessible only via a Web service or REST API. My research objective is to help the consumers of these data easily clean, transform and combine data for analysis, and to help providers publish their data with the appropriate metadata so it is more useful to consumers.

Our approach is based on two ideas: semantics and examples. When tools understand the meaning of data, they can more effectively help users combine it in a meaningful way. To this end, we are developing techniques to semi-automatically infer the semantics of the data from examples. Users then show the system, using sample data, how they want the data combined and processed, and the system infers a workflow that can be run in batch on large datasets (big data).

I am interested in both technology and applications. Our information integration toolkit, Karma, is open source software that you can download to solve your information integration problems. I also collaborate with multiple organizations to apply Karma to build interesting applications in domains such as intelligence analysis, bioinformatics, cultural heritage and business intelligence.

Here is a video that illustrates how we use Karma to publish the data from the Smithsonian American Art Museum as Linked Open Data:


At ISI I work in Craig Knoblock's Information Integration Group, and I collaborate very closely with him on most projects. I collaborate with Jose Luis Ambite on information integration topics, with Gully Burns on bioinformatics data integration, with Yolanda Gil on provenance and workflows, with Yao-Yi Chiang on data mining and geospatial data integration, and with Rajiv Maheswaran and Yu-Han Chang on the analysis of spatio-temporal data.

I am working with Rudi Studer, Andreas Harth and Steffen Stadtmüller from KIT on combining their Data-Fu engine with Karma to support integration of dynamic data; with Freddy Priyatna in Oscar Corcho's group to use Karma in his work with Google Fusion Tables; with Alex Viggio and other folks from the VIVO community to use Karma as a data ingestion tool for VIVO; with Rachel Allen from the Smithsonian American Art Museum and Eleanor Fink on our work to publish museum data to the Linked Open Data cloud; with Joan Cobb from the Getty on publication of the Getty vocabularies to the Linked Open Data cloud; and with Miel Vander Sande and an enthusiastic group of USC undergraduates to adapt his wonderful everythingisconnected work to produce stories using the Smithsonian American Art Museum data.

I am always looking for new opportunities to collaborate, so please send me a note if you see any topics of mutual interest. Nowadays, I attend the Semantic Web conferences (ISWC and ESWC) and the Intelligent User Interfaces Conference (IUI), so look for me there.


In the past, I was conference chair for UIST and IUI, and I was IUI program co-chair in 2013. I regularly review for HCI, semantic web and AI conferences. I figure I should review at least as many papers as I submit. I often have at least 2 coauthors, so things should balance out.

Lately, I have become interested in promoting the Semantic Web in Latin America. In 2012 and 2013 I taught summer courses on the Semantic Web at the Universidad de los Andes, my undergraduate college, and at the Pontificia Universidad Javeriana, both in Bogota, Colombia. Both times I had enthusiastic students and it was a pleasure to teach the course. I intend to go back every summer to teach this class (I would like to do it in Medellin in 2014, and I need an invitation, hint?). I am also working with a team from the Universidad de los Andes in Bogota and the Universidad Simon Bolivar in Caracas on a bid to host the 2015 International Semantic Web Conference, yes ISWC, in Latin America.

There is a group of Latin American Semantic Web researchers, scattered all around the world but eager to promote Semantic Web technologies in Latin America. Boris Villazon-Terrazas is doing the heavy lifting of organizing the group, kudos to him, and if you can help, please email me or contact Boris.



Directed Research: there are many interesting projects that you can work on, including projects on semantic web publishing, linked-data browsers, automatic data cleaning, record linkage, virtual museums, and data integration and visualization of Twitter data.

PhD Students: I enjoy working with students. Let me know if you are interested in working with me.

Meetings: I will be on campus every Monday afternoon and most of the day on Wednesdays. If you want to schedule an appointment with me, do the following: 1) Take a look at my calendar to find free times. Note that I blocked off Monday afternoon and all day Wednesday to be on campus, so the calendar shows me as busy, but check the little blocks to see when I am really busy; 2) Send me an email with suggested times. I like to meet with students at the Tutor Cafe, right between Tutor Hall and EEB, right here.




Technologies for building domain-specific knowledge graphs


A data integration tool


Learning about art by building multimedia stories

Semantic Modeling

Automatically building semantic descriptions of sources


Extract information to find relationships among 500,000 private and public firms


Automatically adapting to changes and failures in sensors


Integrate data from multiple sources, provide a common operating picture, and issue alerts for large air operations


Predicting cyber attacks by mining online sources

American Art Collaborative

Creating linked data for cultural heritage data from American art museums


Text-enabled Humanitarian Operations in Real-time


Data Preparation for Semantic Workflows (2011)

GAMBIT (2011)

Geospatial Analysis of Motion-Based Intelligence and Tracking

Enriching Sensor Data with Context Information (2011)

COMPASS (2011)

Plan Analysis using Stochastic Simulation

VizScript (2007-2010)

Visualization Tools for Understanding Complex Systems

VizPattern (2008-2010)

Visual Analytics Tools

Living Classroom (2010)

CSC (2005-2009)

Criticality-Sensitive Coordination

LANdroids (2009)

Distributed Control Algorithms

Commander's Coordinator (2007)

Human in the Loop Planning and Scheduling

SNAP (2000-2005)

Schedules Negotiated by Agent-based Planners

DEALMAKER (1998-2000)

End-User Programming of Rules to Select Purchase Contracts


Selected Publications

All Publications


