Unsupervised Data Integration

Automatic integration of the data for the City of Los Angeles


Organizations are awash in data. In many cases, they do not know what data exists within the organization, so information is often unavailable when it is needed or, worse, is recreated from other sources. In this project, we are developing an automatic approach to spatio-temporal indexing of an organization's datasets. The indexing process automatically identifies the spatial and temporal fields in each dataset, normalizes and cleans those fields, and then loads them into a big data store where the information can be efficiently searched, queried, and analyzed. We evaluated our approach on 600 datasets published by the City of Los Angeles and show that it can process the data automatically and that the indexed data can be accessed and analyzed efficiently.
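To make the indexing steps concrete, the following is a minimal sketch of how spatial and temporal fields might be identified and normalized in a tabular dataset. It is not the project's actual method, which the description above does not spell out. The date formats, the latitude/longitude pattern, the 0.8 match threshold, and all function names (`parse_date`, `parse_point`, `classify_columns`, `normalize`) are assumptions chosen for illustration, and loading into a data store is left out.

```python
import re
from datetime import datetime

# Assumed input formats; real datasets would need a much broader set.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%dT%H:%M:%S")
# Assumed point syntax: "lat, lon" with optional surrounding parentheses.
LATLON_RE = re.compile(
    r"^\(?\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)?$"
)


def parse_date(value):
    """Return an ISO 8601 date string, or None if no assumed format matches."""
    if not isinstance(value, str):
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_point(value):
    """Return (lat, lon) as floats, or None if the value is not a valid point."""
    if not isinstance(value, str):
        return None
    match = LATLON_RE.match(value.strip())
    if not match:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return (lat, lon)
    return None


def classify_columns(rows, threshold=0.8):
    """Label each column 'temporal', 'spatial', or 'other'.

    A column gets a label when at least `threshold` of its non-empty
    values parse successfully (a hypothetical heuristic).
    """
    labels = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [r[col] for r in rows if r.get(col)]
        if not values:
            labels[col] = "other"
            continue
        date_share = sum(parse_date(v) is not None for v in values) / len(values)
        point_share = sum(parse_point(v) is not None for v in values) / len(values)
        if date_share >= threshold:
            labels[col] = "temporal"
        elif point_share >= threshold:
            labels[col] = "spatial"
        else:
            labels[col] = "other"
    return labels


def normalize(rows, labels):
    """Rewrite temporal and spatial values into one canonical form each."""
    cleaned = []
    for row in rows:
        out = {}
        for col, value in row.items():
            if labels.get(col) == "temporal":
                out[col] = parse_date(value)   # unparseable values become None
            elif labels.get(col) == "spatial":
                out[col] = parse_point(value)  # unparseable values become None
            else:
                out[col] = value
        cleaned.append(out)
    return cleaned


# Made-up sample records, for illustration only.
sample = [
    {"issued": "01/15/2016", "location": "(34.0522, -118.2437)", "type": "permit"},
    {"issued": "2016-02-03", "location": "34.1, -118.3", "type": "permit"},
]
labels = classify_columns(sample)
records = normalize(sample, labels)
```

In this sketch, detection is driven purely by the values in each column rather than by column names, so it does not depend on how a dataset happens to label its fields. The normalized records could then be handed to whatever store is used for search and analysis.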