Web Text-based Network Industry Classifications


Capturing Organizational Form, Competition, and Industry Change through Text-mining of Private and Public Firm Webpages

This project will analyze how organizations and industries change over time by building a large-scale database compiled from the web pages of more than 1,000,000 private and public firms, together with a web-based analytical tool that provides efficient access to this database. The database will contain information on the products and services that public and private firms have offered directly to their customers over the last 20 years, as well as links to the US Patent and Trademark Office's patent data for the same period.

To create this database, the project will develop a highly scalable approach for mining historical Web data to create comprehensive product-based databases and tools to query and analyze this data. This integrated database will be built using firm web pages from the Internet Archive Wayback Machine project. Using the text from these web pages, the resulting database will classify firms as competitors, which will be used to build new industry definitions. These new industry definitions will be based directly on the product and service descriptions firms use to interface with customers.
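As a rough illustration of how web page text could be turned into competitor links, the sketch below compares the product wording of firm pages with cosine similarity over word counts and flags pairs above a cutoff as competitors. The firm names, page text, function names, and the 0.5 threshold are all hypothetical, and cosine similarity is an assumed choice of measure for this example, not a statement of the project's actual method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is similar to nothing
    return dot / (norm_a * norm_b)


def competitor_pairs(pages, threshold=0.5):
    """Return firm pairs whose page text similarity meets the threshold.

    `pages` maps a firm name to the text of its web page. The threshold
    is an arbitrary value chosen for illustration only.
    """
    names = sorted(pages)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if cosine_similarity(pages[first], pages[second]) >= threshold:
                pairs.append((first, second))
    return pairs


# Hypothetical firms and page text, invented for this example.
pages = {
    "AcmeCloud": "cloud storage and backup software for small business",
    "BoxyData": "secure cloud backup and storage software",
    "FreshBread": "artisan bread and pastry bakery",
}

print(competitor_pairs(pages))
```

Grouping firms by such pairwise links, rather than by fixed codes, is what would let industry boundaries differ from firm to firm and shift as page text changes over time.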

The resulting database and analysis tool will enable researchers to answer many questions that are difficult to answer today, including: How do industries, their products, and the competition within them change over time? What are the dynamics of product introduction rates for private and public firms, and how did private firms and their array of product offerings change during the recent financial crisis? What products do public and private firms introduce following increases in patenting activity within an industry? Do the patenting firms introduce the new products, or do non-patenting firms? Which government policy changes were most effective in stimulating the growth of entrepreneurial firms, and in what kinds of markets did these policies work best? What local product market conditions are most conducive to successful entry by entrepreneurial firms, and how do waves of innovation affect product market competition? What economic forces trigger firms to cross the boundary between public and private status?

In addition to helping academics who study innovation and how industries change over time, the new industry designations will benefit business decision makers, consumers, and regulators. Businesses can use the database and web-based tool to assess the existing market structure around new products. This will facilitate more informed decisions about where and when to commit scarce resources to enter new markets. By examining the nature of existing competition and market structure among public and private firms, large and small, entrepreneurs can also better assess the likely success of their new ventures. Regulators, including the Securities and Exchange Commission (SEC) and the Department of Justice (DOJ), can also benefit from refined knowledge of industry structure and product market boundaries.

The economic foundation for this project is Hoberg and Phillips (2016 JPE). Industry data from this paper, which cover only about 5,000 publicly traded firms per year, are available at www.marshall.usc.edu. More information about the Text-based Network Industry Classifications (TNIC) can be found at http://hobergphillips.usc.edu/. The new WTNIC project described here will dynamically extend this database to include up to 1,000,000 public and private firms that have had a web presence over the past 20 years.



This material is based on research sponsored in part by the National Science Foundation under Grant No. 1561057.