20 August 2019
An (almost) one line coarse geocoder with Docker
One of the advantages of the Pelias geocoder’s modular design is that we can use each component by itself to accomplish specific tasks quickly.
Today we’ll take a look at setting up a coarse geocoder in just a few lines with the Pelias Placeholder service and Docker.
A coarse geocoder only supports cities, countries and other administrative areas, so it includes a lot less data than a full geocoder with addresses, streets, points of interest, and so on.
That makes it perfect for a quick demo.
Here’s a small shell script that will download the required data and run a Docker image to give you a geocoder:
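What follows is a minimal sketch of such a script, not a definitive version. The archive URL, the store.sqlite3 filename, the pelias/placeholder image name, and port 3000 are assumptions based on the Placeholder project's defaults, so check them against the current documentation before relying on them. The three steps are wrapped in a function, setup_placeholder (a name chosen for this example), so nothing runs until you call it.

```shell
# Assumed values (not confirmed by this post).
DATA_URL="https://data.geocode.earth/placeholder/store.sqlite3.gz"
DB_FILE="data/placeholder/store.sqlite3"
IMAGE="pelias/placeholder"
PORT=3000

# The body runs in a subshell, so "set -eu" does not leak into your session.
setup_placeholder() (
  set -eu

  # Create the directory layout the server expects.
  mkdir -p "$(dirname "$DB_FILE")"

  # Stream the compressed archive through gunzip into the database file.
  curl -fsSL "$DATA_URL" | gunzip > "$DB_FILE"

  # Run the image detached, publish the HTTP port, and mount ./data at /data.
  docker run -d -p "$PORT:$PORT" -v "$(pwd)/data:/data" "$IMAGE"
)
```

After pasting this into a shell, run setup_placeholder from the directory where the data should live.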
Now you can send queries to Placeholder and get back JSON. Assuming you have jq installed, the results will look like this:
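As a sketch of such a query, a small helper can send the request and hand back whatever JSON the server returns. The /parser/search path, the text parameter, and port 3000 are assumptions based on the Placeholder README, not something this post confirms, and the helper name is invented for this example.

```shell
# Assumed default: the container started earlier, listening on localhost:3000.
PLACEHOLDER_URL="${PLACEHOLDER_URL:-http://localhost:3000}"

# Hypothetical helper: send a free-text query to the (assumed) search endpoint.
placeholder_search() {
  # -G makes curl append the --data-urlencode pair to the URL as a
  # URL-encoded query string instead of sending it as a POST body.
  curl -sG "$PLACEHOLDER_URL/parser/search" --data-urlencode "text=$1"
}
```

Piping the output through jq, as in placeholder_search "paris, tx" | jq, pretty-prints the response.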
There’s also an autocomplete endpoint, accessible by passing
Finally, it’s worth pointing out that you can ask Placeholder to return results in many languages.
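Both options are ordinary query-string parameters. The sketch below assumes, based on the Placeholder README and not on anything shown in this post, that autocomplete is enabled with mode=live and that the language is selected with lang and a three-letter ISO 639 code such as rus. The helper names are made up for this example.

```shell
# Assumed default: a Placeholder server listening on localhost:3000.
PLACEHOLDER_URL="${PLACEHOLDER_URL:-http://localhost:3000}"

# Hypothetical helper: autocomplete-style query (assumed parameter: mode=live).
placeholder_autocomplete() {
  curl -sG "$PLACEHOLDER_URL/parser/search" \
    --data-urlencode "text=$1" \
    --data-urlencode "mode=live"
}

# Hypothetical helper: query with results in a given language (assumed
# parameter: lang, taking a three-letter language code).
placeholder_search_lang() {
  curl -sG "$PLACEHOLDER_URL/parser/search" \
    --data-urlencode "text=$1" \
    --data-urlencode "lang=$2"
}
```

For example, placeholder_autocomplete "pari" would ask for completions of a partial name, and placeholder_search_lang "germany" rus would ask for Russian names where available.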
For lots more examples of the queries you can send, see the Placeholder usage documentation.
We started with the script first to make for a good blog post, but it’s usually worth understanding how shell commands work before you run them! So here’s a piece-by-piece breakdown.
Recursively make a directory with the structure the Placeholder server expects.
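In command form this is mkdir with the -p flag. The data/placeholder path is an assumption derived from the /data/placeholder location and the $(pwd)/data mount described in this breakdown.

```shell
# -p creates data/ and data/placeholder/ in one go,
# and succeeds silently if they already exist.
mkdir -p data/placeholder
```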
This command downloads a compressed archive of data for the Placeholder service and, using gunzip, extracts it into the file for use by Placeholder. The output of gunzip is redirected (using >) to the location Placeholder expects.
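Here is a sketch of that step, with a sanity check added that the original command does not include. Every SQLite database file begins with the ASCII text "SQLite format 3", so reading the first 15 bytes is a cheap way to confirm that the download decompressed correctly. The helper name is invented for this example.

```shell
# Hypothetical helper: fetch_store URL DEST
# Streams a gzip-compressed SQLite archive from URL into DEST, then checks it.
fetch_store() {
  local url="$1"
  local dest="$2"
  mkdir -p "$(dirname "$dest")"

  # curl writes the compressed bytes to stdout, gunzip decompresses them,
  # and > redirects the result into the destination file.
  curl -fsSL "$url" | gunzip > "$dest" || return 1

  # A valid SQLite file starts with the magic string "SQLite format 3".
  if [ "$(head -c 15 "$dest")" != "SQLite format 3" ]; then
    echo "error: $dest does not look like a SQLite database" >&2
    return 1
  fi
}
```

A call might look like fetch_store "https://data.geocode.earth/placeholder/store.sqlite3.gz" data/placeholder/store.sqlite3, where the URL is an assumed location for the archive.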
Finally, this command downloads the Pelias Placeholder Docker image and runs it in a container. The -p flag is used to expose the standard HTTP interface, and -v $(pwd)/data:/data mounts the directory we just filled with data for use by the Placeholder server. Without further configuration, Placeholder expects to find a SQLite database in /data/placeholder, so this command ensures that expectation is met no matter where the data actually lives on your computer. The -d flag tells Docker to run the container “detached” so you can reuse the same terminal window to send HTTP requests. You can learn more about that in the Docker documentation.
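Put together, the command might look like the sketch below. The image name pelias/placeholder and port 3000 are assumptions, and --name is an addition made here so the container is easy to refer to afterwards. docker logs, docker stop, and docker rm are standard Docker commands for inspecting and cleaning up a detached container.

```shell
# Assumed values: the image name and port are not confirmed by this post.
start_placeholder() {
  # -d       run detached, returning control of the terminal
  # --name   give the container a fixed name (added for this example)
  # -p       publish container port 3000 on host port 3000
  # -v       mount ./data from the current directory at /data in the container
  docker run -d \
    --name placeholder \
    -p 3000:3000 \
    -v "$(pwd)/data:/data" \
    pelias/placeholder
}

# A detached container prints nothing to the terminal; read its output here.
placeholder_logs() {
  docker logs placeholder
}

# Stop and remove the container; the mounted data directory is left untouched.
stop_placeholder() {
  docker stop placeholder && docker rm placeholder
}
```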
The Placeholder service needs data in order to run. In particular, it expects data from the Who’s on First gazetteer. The Who’s on First project is an active, growing open-data project that was started along with Pelias at Mapzen. If you’re familiar with datasets like Quattroshapes or Geonames, you already have a good idea of what to expect from Who’s on First.
The Who’s on First project has a lot of great properties: stable identifiers, very high quality geometries, excellent structures for handling disputed territories, translations in multiple languages including official, preferred, and colloquial variants, and much more. Who’s on First includes data for administrative areas (cities, regions, countries, etc), postal codes, and POIs.
For Placeholder in particular, we care about hierarchies (for example, that there is a city called Paris in a US state called Texas), and about names (for example, that Beijing, China has the translations – and many more – listed in the example above).
Placeholder includes tools for processing data from Who’s on First into an efficient, compact archive. At geocode.earth we regularly generate these archives, and are making them available for general use.
In the future, we’re looking at making much more data available, including both validated copies of existing open data, and datasets processed specifically for use by Pelias. We’ll likely do this with both free and paid offerings. If that sounds like something that would really help you, feel free to shoot us a line at firstname.lastname@example.org to be an early beta tester.
As a reminder, this is a coarse geocoder. That means it only supports administrative areas such as neighbourhoods, cities, counties, regions (such as US states), and countries. If you’re familiar with the Foursquare TwoFishes geocoder, you can expect similar functionality from Placeholder. There aren’t any streets, addresses or POIs. Placeholder is just one component of a complete geocoder, and coarse geocoding is its specialty.
Similarly, because Placeholder is meant to be called by other services, the results are optimized for the data needed by those services, not human friendliness. If you’ve used Pelias before, you’re probably more familiar with the output from the Pelias API. The Placeholder responses contain much of the same information, but not quite in the same format.
The next steps
There’s a whole lot more to geocoding than a coarse geocoder alone can provide.
Fortunately, it’s possible to set up a complete Pelias installation with Docker as well. Take a look at pelias/docker, our helpful set of scripts for setting up an entire geocoder.
Pelias supports forward and reverse geocoding, addresses and POIs (including address interpolation), and multiple languages, as well as the ability to bring in your own data to geocode against.
The Pelias Docker scripts can be used to get started in just a few minutes for a single city, and can scale up all the way to a full planet if you have the time and hardware.
Finally, we’d be remiss if we didn’t mention that we offer a hosted instance of Pelias with high capacity, global coverage, frequent data updates, and even less setup than this example.
Photo credit: Great Exuma Island, Bahamas as seen from the International Space Station on July 19th, 2015.