Automating Data Science

Data Scientist in a Box



The Data Scientist in a Box (DSBox) project, part of the DARPA D3M program, is developing technology to automate the creation of machine learning pipelines for a wide variety of data-driven modeling problems. Unlike conventional AutoML projects, which focus on tabular classification and regression problems, DSBox addresses graph and time-series problems and accepts non-tabular inputs, including text, images, video, and speech. The project has developed state-of-the-art primitives to featurize time-series and graph data. It uses a template-driven approach to AutoML that lets users guide and influence DSBox's high-performance search algorithm as it explores all aspects of a machine learning pipeline. DSBox automates the creation of pipelines end to end, covering data preparation, featurization, feature and model selection, and hyperparameter tuning. In recent DARPA evaluations, DSBox was the top-ranked D3M system by number of problem types solved, solving 78 of 85 challenge datasets.
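To make the idea of template-driven search concrete, here is a minimal, generic sketch. It is not DSBox's code or API, and it does not reproduce DSBox's search algorithm. It assumes a template is an ordered list of pipeline stages, each listing the candidate primitives and hyperparameter grids allowed at that stage. The search enumerates every pipeline the template permits, randomly samples up to a budget, and keeps the best-scoring one. The template format, the primitive names (`impute_mean`, `pca`, `random_forest`, and so on), `search`, and `toy_score` are all hypothetical. In a real system the score would come from cross-validation on the user's dataset rather than a hand-written function.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical template format: (stage name, {primitive: {hyperparam: [values]}}).
# Primitive names are illustrative only, not actual DSBox primitives.
Template = List[Tuple[str, Dict[str, Dict[str, list]]]]
Pipeline = List[Tuple[str, str, dict]]

EXAMPLE_TEMPLATE: Template = [
    ("cleaning", {"impute_mean": {}, "impute_median": {}}),
    ("featurization", {"identity": {}, "pca": {"n_components": [2, 5]}}),
    ("model", {"random_forest": {"n_estimators": [10, 100]},
               "logistic_regression": {"C": [0.1, 1.0]}}),
]


def expand_step(candidates: Dict[str, Dict[str, list]]):
    """Yield every (primitive, hyperparameters) choice for one stage."""
    for primitive, grid in candidates.items():
        keys = sorted(grid)
        for values in itertools.product(*(grid[k] for k in keys)):
            yield primitive, dict(zip(keys, values))


def enumerate_pipelines(template: Template):
    """Yield every pipeline the template allows (one choice per stage)."""
    per_stage = [list(expand_step(cands)) for _, cands in template]
    for combo in itertools.product(*per_stage):
        yield [(stage, prim, hp) for (stage, _), (prim, hp) in zip(template, combo)]


def search(template: Template, score: Callable[[Pipeline], float],
           budget: int, seed: int = 0) -> Tuple[Pipeline, float]:
    """Randomly sample up to `budget` pipelines and return the best one and its score."""
    pipelines = list(enumerate_pipelines(template))
    rng = random.Random(seed)
    sample = rng.sample(pipelines, min(budget, len(pipelines)))
    best = max(sample, key=score)
    return best, score(best)


def toy_score(pipeline: Pipeline) -> float:
    """Stand-in for a cross-validated metric (assumption for illustration only)."""
    s = 0.0
    for _stage, prim, hp in pipeline:
        if prim == "pca":
            s += hp["n_components"] / 10
        if prim == "random_forest":
            s += hp["n_estimators"] / 100
    return s


best, best_score = search(EXAMPLE_TEMPLATE, toy_score, budget=24)
print(best_score, best)
```

The template is what lets a user steer the search: narrowing a stage to fewer primitives, or pinning a hyperparameter to one value, shrinks the space the search has to cover. The example template allows 2 × 3 × 4 = 24 pipelines, so a budget of 24 evaluates all of them. A smaller budget trades coverage for speed.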




DSBox is a project in the DARPA D3M program, supported by the United States Air Force under Contract No. FA8650-17-C-7715.