You can now download DIG and run it on your laptop: dig-etl-engine.
DIG is a domain-specific indexing, search, and analysis system. The DIG system harnesses state-of-the-art open-source software, combined with an open architecture and a flexible set of APIs, to facilitate the integration of a variety of extraction and analysis tools.
DIG builds on rich models of a domain that support fine-grained data collection, organization, and analysis. DIG builds a graph of the entities and relationships within a domain using scalable extraction and linking technologies. DIG also includes a faceted content search interface that lets users query DIG and visualize information on maps, timelines, and tables.
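To make the idea of an entity graph with faceted search more concrete, here is a minimal Python sketch. It is not DIG's actual API or data model: the records, the field names (`city`, `phone`), and the helper functions `build_graph` and `facet_counts` are all hypothetical and exist only for illustration. It assumes extraction has already produced simple key-value records.

```python
from collections import Counter, defaultdict

# Hypothetical extracted records; field names are illustrative only.
RECORDS = [
    {"id": "doc1", "city": "Los Angeles", "phone": "555-0100"},
    {"id": "doc2", "city": "Los Angeles", "phone": "555-0199"},
    {"id": "doc3", "city": "Seattle", "phone": "555-0100"},
]


def build_graph(records):
    """Link each document to its extracted entities, in both directions."""
    graph = defaultdict(set)
    for rec in records:
        for field, value in rec.items():
            if field == "id":
                continue
            entity = (field, value)
            graph[rec["id"]].add(entity)   # document -> entity
            graph[entity].add(rec["id"])   # entity -> document
    return graph


def facet_counts(records, field, **filters):
    """Count the values of `field` among records matching every filter."""
    matches = [
        r for r in records
        if all(r.get(k) == v for k, v in filters.items())
    ]
    return Counter(r[field] for r in matches if field in r)


graph = build_graph(RECORDS)

# Documents linked through a shared entity (here, the same phone number).
shared_phone_docs = sorted(graph[("phone", "555-0100")])

# Facet on city, restricted to records with that phone number.
city_facet = facet_counts(RECORDS, "city", phone="555-0100")
```

In this toy version, linking amounts to exact matching on extracted values, and a facet is just a count over the filtered records. A real deployment would delegate both to scalable infrastructure rather than in-memory Python structures.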
DIG is designed to scale by building on open-source, cloud-based infrastructure (e.g., HDFS, Hadoop, and Elasticsearch), supports a diversity of source types, and can be rapidly retargeted to new domains of interest.
Popular Science published an article, "THE MAN WHO LIT THE DARK WEB: Data-mining tools are helping cops bust open online human trafficking," that describes the history of the DARPA MEMEX program, which funds our DIG project, and provides details on how law enforcement agencies are using DIG to combat human trafficking.
For more information on MEMEX, see http://www.ee.columbia.edu/ln/dvmm/memex/index.html#About, a website provided by Columbia University's Digital Video and Multimedia (DVMM) Lab.
This research is supported by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under contract number FA8750-14-C-0240.