Map Discovery  
  There are a huge number of high quality maps on the Internet that can be used to extract useful geospatial information about the region they describe. For example, by aligning satellite images with these maps, as shown in Figure 1, we can label the streets automatically. This procedure is even more useful for areas such as the Middle East, where we do not have detailed road vector data but there do exist maps for large parts of it.  
       
  conflation
Figure 1: This figure shows a map of Washington D.C. conflated with its satellite imagery
 
       
  But identifying these high quality maps from among the huge number of images that exist on the Internet is a difficult task. First, there is no meta-data to directly identify these maps. This makes it necessary to look at the content of the images to distinguish between the maps and other images. Second, many such high quality maps are embedded in other documents such as PDF reports, PowerPoint presentations, etc. Therefore, we have to analyze these documents as well to obtain those high quality maps.  
       
  Simply using Web-based image search engines does not solve the problem, as is evident from the snapshot of results page on Yahoo Image Search shown in Figure 2. For the query term 'Tehran maps', the first page returns only two maps among the top 21 matching images. This happens because image search engines only use the textual information surrounding the image, such as its name, title of the page it is displayed on, etc. to identify them.  
       
  Yahoo
Figure 2: A snapshot of search results on Yahoo Image Search
 
       
  Approach  
       
  We solve both the above mentioned problems through our approach described here. Figure 3 depicts the architecture of our system. We start by collecting images and documents of an area from the Internet. Then, we parse the documents to extract the images embedded in them. Having collected the candidate maps, we use a Content Based Image Retreival (CBIR) technique to classify each image as a map or nonmap based on a new set of features extracted from it.  
       
  Architecture
Figure 3: The architecture of our MapFinder system
 
       
  Content Based Image Retreival  
       
  Maps come in wide varieties and the graphical characteristics of each type of map varies significantly from the other types. In order to identify all kinds of maps accurately, we use a Content Based Image Retrieval technique to match each query image with a pre-labeled repository of images, and then classify it based on the most similar images found. Figure 4 shows how we find the most similar images using the CBIR technique. Given a new query image, the CBIR system matches it with all the images in the map and nonmap repository and then finds the most similar images.  
       
  CBI
Figure 4: The architecture of our MapFinder system
 
       
  Features  
       
  In order to compare images using the CBIR technique, we have introduced a new set of features that we believe are much more effective in capturing the characteristics of maps. These features operate on the individual edge sections in the Canny edge map of the original image. Figure 5 shows a sample map in color, its Canny edge map and an enlarged sub part to show the individual edge sections. We extract properties of each edge section such as its length, the number of branches in it, etc. Then, we find aggregate features for the entire image from these properties.  
       
  Example
Figure 5: A sample map, its Canny edge map and a sub part enlarged to show disjoint edge sections