Batch Geocoding CSV Files

This guide covers how to use our command-line tool to geocode any CSV file.

The goal is to take values from an existing CSV file and append new geocoded columns which contain detailed geographical information about each row.

Preparing your CSV file #

Your input CSV must contain a header row on the first line with column names.

Please ensure the CSV file is valid before continuing.
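
One quick way to sanity-check a file before geocoding is sketched below. It is only a rough check, not a full CSV validator: the sample file contents and the check_columns helper are hypothetical, and the comma counting does not understand quoted fields that themselves contain commas or line breaks.

```shell
# Hypothetical sample file, used only for illustration.
cat > example.csv <<'EOF'
NUMBER,STREET,CITY
1,Main Street,London
30,West 26th Street,New York
EOF

# Print the header row so you can confirm the column names.
head -n 1 example.csv

# Naive check: report any row whose field count differs from the header.
# Assumption: no quoted fields containing commas or newlines.
check_columns() {
  awk -F',' '
    NR == 1 { expected = NF; next }
    NF != expected {
      printf "line %d: %d fields, expected %d\n", NR, NF, expected
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

check_columns example.csv && echo "column counts look consistent"
```

If your data contains quoted commas, a dedicated CSV linter is a better fit than this sketch.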

Installation #

With a modern version of Node.js installed, run:

# note: some systems (such as Ubuntu) may require you to use 'sudo'.
npm install -g @geocodeearth/ge

Authentication #

In order to authenticate, you’ll need a valid API key from Geocode Earth.

Use the environment variable GE_API_KEY to make the key available in your shell:

export GE_API_KEY=<YOUR API KEY>

You can check that it’s been set correctly with the env command.
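
Because env prints every variable along with its value, you may prefer a check that does not echo the key itself. The following is a minimal sketch; the check_api_key function name is hypothetical and not part of the ge tool.

```shell
# Hypothetical helper: report whether GE_API_KEY is set,
# without printing the key.
check_api_key() {
  if [ -n "${GE_API_KEY:-}" ]; then
    echo "GE_API_KEY is set"
  else
    echo "GE_API_KEY is not set" >&2
    return 1
  fi
}
```

Run check_api_key in the same shell session in which you exported the variable.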

Command Usage #

ge batch csv --help

ge batch csv <file>

append geocoded columns to a CSV file

Positionals:
  file  location of the input CSV file.                      [string] [required]

Options:
      --version      Show version number                               [boolean]
  -v, --verbose      enable verbose logging           [boolean] [default: false]
      --help         Show help                                         [boolean]
  -p, --param        Define a parameter.                                [string]
  -t, --template     Define a template.                                 [string]
      --endpoint     API endpoint to query.     [string] [default: "/v1/search"]
      --concurrency  Maximum queries per-second.           [number] [default: 5]
      --discovery    Maximum concurrency will be applied based on your plan
                     limits.                           [boolean] [default: true]

The <file> can be either a regular file or a stream; use /dev/stdin to accept data from a pipe.
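
As an illustration of reading from a pipe, the sketch below geocodes only the first few rows of a file, which can be useful as a trial run. The geocode_sample function and the ADDRESS column name are assumptions made for this example; it uses only the flags listed in the help output above.

```shell
# Hypothetical helper: geocode the header row plus the first N data rows.
# Assumes the CSV has a column named ADDRESS and no multi-line quoted fields.
geocode_sample() {
  rows="$1"
  file="$2"
  # "+ 1" keeps the header row in the piped data.
  head -n "$(( rows + 1 ))" "$file" |
    ge batch csv -p 'text' -t '${row.ADDRESS}' /dev/stdin
}
```

For example, geocode_sample 100 example.csv would send the header and the first 100 data rows to ge.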

Parameter templating #

Without further configuration, ge batch csv <file> alone will not yield results.

You’ll first need to define a mapping from the field names in your CSV file to HTTP request parameters which will be sent to Geocode Earth.

This can be achieved using a pair of flags:

  • -p to name the parameter
  • -t to define a template for the parameter value

For example, assuming your CSV file contains columns named number, street and city, the following sets the query string parameter text to a value such as 1 Main Street, London for each row:

ge batch csv \
  -p 'text' \
  -t '${row.number} ${row.street}, ${row.city}' \
  example.csv

Templating is based on the lodash template engine. Data from each row of your CSV file is available in the row variable.

You can add multiple pairs of parameters, but take care to match each -p with a -t.

Be careful to use single quotes ' instead of double quotes " on the command line, so that your shell does not try to interpolate the ${...} placeholders before ge receives them.
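
The difference can be seen without running ge at all. The sketch below only demonstrates shell quoting behaviour; the double-quoted case is run in a subshell because a name such as row.city is not a valid shell variable, so the expansion is an error.

```shell
# Single quotes: the shell passes the template through untouched.
template='${row.number} ${row.street}, ${row.city}'
printf '%s\n' "$template"

# Double quotes: the shell tries to expand ${row.city} itself.
# "row.city" is not a valid shell variable name, so this fails
# (wrapped in a subshell so the error cannot end the current session).
if ! ( eval 'printf "%s\n" "${row.city}"' ) 2>/dev/null; then
  echo "double quotes: the shell tried to expand the template"
fi
```

With a valid variable name, such as "${HOME}", double quotes would silently substitute the shell's own value instead, which is harder to notice.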

Structured Search Example #

The Structured Geocoding endpoint is perfect for tasks where you have all request information already split into individual columns.

See the Structured Geocoding documentation for a list of which parameters are available.

ge batch csv \
  --endpoint '/v1/search/structured' \
  -p 'address' -t '${row.NUMBER} ${row.STREET}' \
  -p 'locality' -t '${row.CITY}' \
  -p 'country' -t 'US' \
  example.csv

Reverse Example #

The Reverse Geocoding endpoint is suitable for tasks where you only have lat/lon co-ordinates and you’d like to discover what places are at/near that location.

See the Reverse Geocoding documentation for a list of which parameters are available.

ge batch csv \
  --endpoint '/v1/reverse' \
  -p 'point.lat' -t '${row.LAT}' \
  -p 'point.lon' -t '${row.LON}' \
  example.csv

Search Example #

The Search endpoint is for tasks where you’re not able to use the Structured Geocoding endpoint due to the data not being completely normalized.

You can set the text parameter to a single string which concatenates multiple fields; it will be parsed and interpreted before searching.

See the Search documentation for a list of which parameters are available.

ge batch csv \
  --endpoint '/v1/search' \
  -p 'text' -t '${row.NUMBER} ${row.STREET}, ${row.CITY}' \
  -p 'boundary.country' -t 'NZ' \
  example.csv

Estimating Running Time #

If your CSV file contains many rows it can take some time to complete. You can estimate the running time by dividing the row count by your plan's queries-per-second (QPS) limit.

For example, a 10,000-row CSV file on the Basic plan (10 QPS) will take ~17 minutes.

10,000 rows / 10 QPS / 60 seconds per minute ≈ 16.67 minutes
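
The same arithmetic can be scripted. This is a rough sketch: the estimate_minutes function is hypothetical, and it assumes every row results in exactly one query and ignores network latency and retries.

```shell
# Hypothetical helper: estimate running time in minutes.
#   $1 = number of data rows, $2 = queries per second (must be > 0)
estimate_minutes() {
  awk -v rows="$1" -v qps="$2" 'BEGIN { printf "%.1f\n", rows / qps / 60 }'
}

estimate_minutes 10000 10   # prints 16.7

# To count the data rows in a file (header excluded), assuming the file
# ends with a newline and has no multi-line quoted fields:
#   estimate_minutes "$(( $(wc -l < example.csv) - 1 ))" 10
```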

We can help with larger jobs #

If you have a job with 2 million+ rows then we can handle the process for you. Contact us for a quote.

Feedback & Feature Requests #

If you find a bug, or would like to request a feature, please open an issue on GitHub.