Data Sources

Geocode Earth strives to combine the best open datasets to build the most useful geocoding tools.

Here we provide the details behind each of the datasets we currently support.

Attribution is required by many of these data providers. License information is listed here, but you are responsible for researching each project and following its license terms.

OpenAddresses

OpenAddresses is a collection of over 600 million addresses around the world. Data in OpenAddresses only comes from national, state, and local governments, so this data is highly authoritative.

Because it consists entirely of bulk imports, OpenAddresses is a large, global, and rapidly growing dataset. Many countries, particularly in Europe, now have every address represented in OpenAddresses.

The license for each individual source within OpenAddresses differs. Many of the sources require attribution, and many others have a share-alike clause.

Note: Geocode Earth does not currently return license information directly, but the license and attribution requirements for each source within OpenAddresses can be determined from the machine-readable state.txt file published on the OpenAddresses website.
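
As a rough illustration of reading those requirements, the sketch below parses a tab-separated file and collects the attribution and share-alike flags for each source. The column names (`source`, `attribution required`, `attribution name`, `share-alike`) and the `true`/`false` values are assumptions about the layout of state.txt and should be checked against the header row of the real file. The sample rows are made up for the example.

```python
import csv
import io

# Assumed column names; verify them against the header row of the real file.
SOURCE_COL = "source"
ATTRIBUTION_REQUIRED_COL = "attribution required"
ATTRIBUTION_NAME_COL = "attribution name"
SHARE_ALIKE_COL = "share-alike"


def _flag(row, column):
    """Interpret a cell as a boolean, treating missing or empty cells as False."""
    return (row.get(column) or "").strip().lower() == "true"


def license_requirements(state_tsv):
    """Map each source name to its attribution and share-alike requirements."""
    reader = csv.DictReader(io.StringIO(state_tsv), delimiter="\t")
    requirements = {}
    for row in reader:
        requirements[row[SOURCE_COL]] = {
            "attribution_required": _flag(row, ATTRIBUTION_REQUIRED_COL),
            "attribution_name": (row.get(ATTRIBUTION_NAME_COL) or "").strip(),
            "share_alike": _flag(row, SHARE_ALIKE_COL),
        }
    return requirements


# Made-up sample rows for illustration only; not real OpenAddresses data.
SAMPLE = (
    "source\tattribution required\tattribution name\tshare-alike\n"
    "xx/example-a.json\ttrue\tExample Mapping Agency\tfalse\n"
    "xx/example-b.json\tfalse\t\ttrue\n"
)

if __name__ == "__main__":
    for source, info in license_requirements(SAMPLE).items():
        print(source, info)
```

In practice the text passed to `license_requirements` would be the downloaded contents of state.txt rather than the inline sample.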

Who’s on First

Who’s on First is an open-data directory of worldwide administrative places. Originally started at Mapzen, it is the primary provider of:

  • Countries
  • Macroregions (for example, England is a Macroregion within the United Kingdom)
  • Regions (for example, states, provinces)
  • Macro-counties (for example, Departments of France)
  • Counties
  • Localities (cities, towns, hamlets)
  • Neighbourhoods
  • Postal codes (such as ZIP codes in the United States)

Additionally, Geocode Earth uses Who’s on First to provide standardized fields for the country, region, locality, and neighbourhood across all responses.

Global IDs returned as part of Who’s on First records are stable, and can be relied on not to change over time.
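
Because these IDs are stable, they can be stored and used as keys in your own systems. The sketch below splits a global ID into its parts, assuming the three-part `source:layer:id` form used by the open-source Pelias geocoder. That format is an assumption here, not something this page specifies, and the ID values shown are hypothetical placeholders.

```python
from typing import NamedTuple


class Gid(NamedTuple):
    """Parts of a global ID, assuming a 'source:layer:id' layout."""
    source: str
    layer: str
    id: str


def parse_gid(gid):
    """Split a global ID on its first two colons.

    The split is limited to two so that any colons inside the final
    id portion are preserved rather than treated as separators.
    """
    parts = gid.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a source:layer:id global ID: {gid!r}")
    return Gid(*parts)


if __name__ == "__main__":
    # Hypothetical placeholder ID, not a real Who's on First record.
    print(parse_gid("whosonfirst:locality:123"))
```

Keeping the whole string as the stored key, and parsing it only when the source or layer is needed, avoids depending on the internal layout more than necessary.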


OpenStreetMap

OpenStreetMap is a community-driven, editable map of the world. It prioritizes local knowledge and individual contributions over bulk imports, which often means it has excellent coverage even in areas where no large-scale commercial mapping efforts have been attempted. OpenStreetMap contains high quality information on landmarks, buildings, roads, and natural features.

With its coverage of roads as well as rich metadata, OpenStreetMap is arguably the most valuable dataset used by Geocode Earth for general use.

All OpenStreetMap data is licensed under the ODbL, a share-alike license which also requires attribution.

Geonames

Geonames is an aggregation of many authoritative and non-authoritative datasets. It contains information on everything from country borders to airport names to geographical features.

While most data from Geonames has now been imported into Who’s on First and expanded upon, Geonames data is still a useful additional data source that fills in a few missing pieces. Since it’s been around for so long, Geonames IDs are practically a gold standard for building concordances between multiple geo datasets.

Geonames data is licensed CC-BY-4.0.

US Census TIGER/Line Geodatabases

The TIGER dataset from the US Census is one of the best-known open datasets. It contains street geometries and address ranges for the entire United States.

We use TIGER data both for address interpolation and to provide additional ZIP code data under the uscensus source.
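
To give a feel for what address interpolation means, the sketch below estimates a position for a house number by placing it proportionally along a straight street segment with a known address range. It is a simplified illustration, not a description of how Geocode Earth implements interpolation: the function name and parameters are hypothetical, the segment is treated as a single straight line, and the separate ranges for each side of the street are ignored.

```python
def interpolate_address(house_number, range_start, range_end, start_point, end_point):
    """Estimate an (x, y) position for a house number on a straight segment.

    range_start and range_end are the house numbers at start_point and
    end_point. Ranges may run in either direction (ascending or descending).
    """
    if range_start == range_end:
        # A single-number range carries no direction; use the midpoint.
        fraction = 0.5
    else:
        fraction = (house_number - range_start) / (range_end - range_start)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number is outside the address range")
    x = start_point[0] + fraction * (end_point[0] - start_point[0])
    y = start_point[1] + fraction * (end_point[1] - start_point[1])
    return (x, y)


if __name__ == "__main__":
    # Made-up coordinates: number 125 lies a quarter of the way along 100-200.
    print(interpolate_address(125, 100, 200, (0.0, 0.0), (10.0, 20.0)))
```

A real implementation would also need to follow the street's full geometry and account for which side of the street a number falls on.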