20 August 2019

An (almost) one line coarse geocoder with Docker

One of the advantages of the Pelias geocoder’s modular design is that we can use each component by itself to accomplish specific tasks quickly.

Today we’ll take a look at setting up a coarse geocoder in just a few lines with the Pelias Placeholder service and Docker.

A coarse geocoder only supports cities, countries and other administrative areas, so it includes a lot less data than a full geocoder with addresses, streets, points of interest, and so on.

That makes it perfect for a quick demo.

The script

Here’s a small shell script that will download the required data and run a Docker image to give you a geocoder:

mkdir -p data/placeholder # you'll need about 5GB free

# this is a 1.8GB download, so make sure your connection is up to the task
curl https://data.geocode.earth/placeholder/store.sqlite3.gz | gunzip > data/placeholder/store.sqlite3

docker run -d -p 3000:3000 -v $(pwd)/data:/data pelias/placeholder

Now you can send queries to Placeholder and get back JSON. Assuming you have jq installed, the results will look like this:

curl -s 'localhost:3000/parser/search?text=london' | jq .
[
  {
    "id": 101750367,
    "name": "London",
    "placetype": "locality",
    "population": 7556900,
    "lineage": [
      {
        "continent": {
          "id": 102191581,
          "name": "Europe",
          "languageDefaulted": true
        },
        "country": {
          "id": 85633159,
          "name": "United Kingdom",
          "abbr": "GBR",
          "languageDefaulted": true
        },
        "locality": {
          "id": 101750367,
          "name": "London",
          "languageDefaulted": true
        },
        "macroregion": {
          "id": 404227469,
          "name": "England",
          "languageDefaulted": true
        },
        "region": {
          "id": 1360698645,
          "name": "Greater London",
          "languageDefaulted": true
        }
      }
    ],
    "geom": {
      "area": 0.206467,
      "bbox": "-0.510375069372,51.2867601625,0.334015563634,51.6918741161",
      "lat": 51.509648,
      "lon": -0.099076
    },
    "languageDefaulted": true
  },
  {
     # and then a lot more Londons. Do you have any idea how many Londons there are?
  }
]

There’s also an autocomplete endpoint, accessible by passing mode=live:

curl -s 'localhost:3000/parser/search?mode=live&text=barcelon' | jq '.[].name'
"Barcelona"
"Barcelona"
"Barcellona Pozzo di Gotto"
"Municipio de Barceloneta"
"Barceloneta"
# there are many more Barcelonas too

Finally, it’s worth pointing out that you can ask Placeholder to return results in many languages:

curl -s 'localhost:3000/parser/search?text=beijing&lang=zho' | jq '.[].name' | head -2
"北京"
"北京市"

curl -s 'localhost:3000/parser/search?text=beijing&lang=fra' | jq '.[].name' | head -2
"Municipalité de Pékin"
"Pékin"

curl -s 'localhost:3000/parser/search?text=beijing&lang=kor' | jq '.[].name' | head -2
"베이징"
"베이징 시"

For lots more examples of the queries you can send, see the Placeholder usage documentation.
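The queries above are simple enough to type into a URL by hand, but text with spaces or non-ASCII characters needs to be URL-encoded. Here is a minimal sketch of a helper that lets curl do the encoding. The function name placeholder_search and the PLACEHOLDER_URL variable are our own inventions for illustration, not part of Placeholder; the text and lang parameters are the ones used in the examples above.

```shell
# Base URL of the Placeholder service (hypothetical variable; defaults to the
# port used earlier in this post).
PLACEHOLDER_URL=${PLACEHOLDER_URL:-http://localhost:3000}

# placeholder_search TEXT [LANG]
# Sends a search query, letting curl URL-encode each parameter.
placeholder_search() {
  text=$1
  lang=${2:-}
  if [ -n "$lang" ]; then
    # --get turns the --data-urlencode values into a query string
    curl --silent --get \
      --data-urlencode "text=$text" \
      --data-urlencode "lang=$lang" \
      "$PLACEHOLDER_URL/parser/search"
  else
    curl --silent --get \
      --data-urlencode "text=$text" \
      "$PLACEHOLDER_URL/parser/search"
  fi
}
```

With that defined, a query such as placeholder_search 'new york' fra | jq '.[].name' should work without any manual escaping.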

The details

We started with the script first to make for a good blog post, but it’s usually worth understanding how shell commands work before you run them! So here’s a piece-by-piece breakdown.

mkdir -p data/placeholder

Recursively make a directory with the structure the Placeholder server expects.

curl https://data.geocode.earth/placeholder/store.sqlite3.gz | gunzip > data/placeholder/store.sqlite3

This command downloads a compressed archive of data for the Placeholder service and pipes it to gunzip, which decompresses it on the fly. The output of gunzip is redirected (using >) to the location Placeholder expects.
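Streaming straight into gunzip keeps the script short, but an interrupted download can leave a truncated database behind. If that worries you, one alternative is to save the archive to disk, verify it, and only then decompress it. The sketch below assumes you have room for both the compressed and decompressed files; the function names and the intermediate archive path are illustrative choices of ours.

```shell
STORE_URL="https://data.geocode.earth/placeholder/store.sqlite3.gz"

# download_store URL ARCHIVE
# --fail makes curl exit non-zero on HTTP errors instead of saving an error
# page; --retry retries transient failures a few times.
download_store() {
  url=$1
  archive=$2
  curl --fail --location --retry 3 --output "$archive" "$url"
}

# unpack_store ARCHIVE DEST
# Tests the gzip stream for corruption before writing anything to DEST.
unpack_store() {
  archive=$1
  dest=$2
  if ! gzip -t "$archive"; then
    echo "archive is corrupt or truncated: $archive" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  gunzip -c "$archive" > "$dest"
}
```

You would then run download_store "$STORE_URL" store.sqlite3.gz followed by unpack_store store.sqlite3.gz data/placeholder/store.sqlite3, and delete the archive once the second step succeeds.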

docker run -d -p 3000:3000 -v $(pwd)/data:/data pelias/placeholder

Finally, this command downloads the Pelias Placeholder Docker image and runs it in a container. The -p flag is used to expose the standard HTTP interface, and -v $(pwd)/data:/data mounts the directory we just filled with data for use by the Placeholder server. Without further configuration, Placeholder expects to find a SQLite database in /data/placeholder, so this command ensures that expectation is met no matter where the data actually lives on your computer.

The -d flag tells Docker to run the container “detached” so you can reuse the same terminal window to send HTTP requests. You can learn more about that in the Docker documentation.
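Because the container is detached, docker run returns as soon as the container has started, which may be a moment before the server inside is ready to answer. If you are scripting this setup, a small polling loop avoids a race. This is a sketch under our own assumptions: wait_for_http is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until it returns a successful HTTP response or attempts run out.
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors as well as on
    # connection failures
    if curl --silent --fail --output /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_http 'http://localhost:3000/parser/search?text=london' && echo ready blocks until the search endpoint from earlier in this post responds.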

The data

The Placeholder service needs data in order to run. In particular, it expects data from the Who’s on First gazetteer. The Who’s on First project is an active, growing open-data project that was started along with Pelias at Mapzen. If you’re familiar with datasets like Quattroshapes or Geonames, you already have a good idea of what to expect from Who’s on First.

The Who’s on First project has a lot of great properties: stable identifiers, very high quality geometries, excellent structures for handling disputed territories, translations in multiple languages including official, preferred, and colloquial variants, and much more. Who’s on First includes data for administrative areas (cities, regions, countries, etc.), postal codes, and POIs.

The history of Yugoslavia in GIF form
An illustration of the detailed, complex data Who's on First can store: the entire history of the former country of Yugoslavia.

For Placeholder in particular, we care about hierarchies (for example, that there is a city called Paris in a US state called Texas), and about names (for example, that Beijing, China has the translations – and many more – listed in the example above).

Placeholder includes tools for processing data from Who’s on First into an efficient, compact archive. At geocode.earth we regularly generate these archives, and are making them available for general use.

In the future, we’re looking at making much more data available, including both validated copies of existing open data, and datasets processed specifically for use by Pelias. We’ll likely do this with both free and paid offerings. If that sounds like something that would really help you, feel free to drop us a line at hello@geocode.earth to be an early beta tester.

The caveats

As a reminder, this is a coarse geocoder. That means it only supports administrative areas such as neighbourhoods, cities, counties, regions (such as US states), and countries. If you’re familiar with the Foursquare TwoFishes geocoder, you can expect similar functionality from Placeholder. There aren’t any streets, addresses or POIs. Placeholder is just one component of a complete geocoder, and coarse geocoding is its specialty.

Similarly, because Placeholder is meant to be called by other services, the results are optimized for the data needed by those services, not human friendliness. If you’ve used Pelias before, you’re probably more familiar with the output from the Pelias API. The Placeholder responses contain much of the same information, but not quite in the same format.
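If you want something easier on the eyes for a quick demo, jq can flatten each result into a single line. The sketch below relies only on the field names visible in the London response earlier in this post (name, placetype, lineage, geom); we are assuming other results follow the same shape, and summarize_places is a name we made up.

```shell
# summarize_places
# Reads a Placeholder JSON response on stdin and prints one line per result.
# Falls back to "no country" when a result has no country in its first lineage.
summarize_places() {
  jq -r '.[] | "\(.name) (\(.placetype)), \(.lineage[0].country.name // "no country"): \(.geom.lat), \(.geom.lon)"'
}
```

Pipe any of the earlier queries into it, for example curl -s 'localhost:3000/parser/search?text=london' | summarize_places.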

The next steps

There’s a whole lot more to geocoding than a coarse geocoder alone can provide.

Fortunately, it’s possible to set up a complete Pelias installation with Docker as well. Take a look at pelias/docker, our helpful set of scripts for setting up an entire geocoder.

Pelias supports forward and reverse geocoding, addresses and POIs (including address interpolation), and multiple languages, and it lets you bring in your own data to geocode against.

The Pelias Docker scripts can be used to get started in just a few minutes for a single city, and can scale up all the way to a full planet if you have the time and hardware.

Finally, we’d be remiss if we didn’t mention that we offer a hosted instance of Pelias with high capacity, global coverage, frequent data updates, and even less setup than this example.

Photo credit: Great Exuma Island, Bahamas as seen from the International Space Station on July 19th, 2015.