Our previous parser, on the other hand, used signals such as a comma to separate parts of an address, or street types (such as road, avenue, or way) to signal the end of a street address.
Assumptions such as “the word ‘avenue’ signifies the end of the street name” are not only often incorrect (as in France, where street names like Avenue Aristide Briand are common), but also mean a street can’t be classified correctly until it has been typed completely.
For the new Pelias Parser, we worked hard to eliminate reliance on complete input wherever possible. As a result, the Pelias Parser has better support for international address formats and, after many years, Pelias finally no longer requires commas before city or country names!
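To make the problem concrete, here is a minimal Python sketch of that kind of rule. It is not the code of our previous parser: the function name `naive_street` and the word list are assumptions chosen purely for illustration.

```python
# Hypothetical street-type list, for illustration only.
STREET_TYPES = {"road", "avenue", "way", "street"}


def naive_street(text):
    """Treat the first street-type word as the end of the street name.

    Returns the text up to and including that word, or None if no
    street-type word has been typed yet.
    """
    words = text.split()
    for i, word in enumerate(words):
        if word.lower().strip(",") in STREET_TYPES:
            return " ".join(words[: i + 1])
    return None


print(naive_street("30 West 26th Street New York"))  # 30 West 26th Street
print(naive_street("12 Avenue Aristide Briand"))     # 12 Avenue
print(naive_street("30 West 26th Str"))              # None
```

The first call works, the French-style address is cut off after "Avenue", and the partially typed input is not recognized as a street at all.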
Quick, what place do you think of when you see the name Ontario?
If you said the Canadian province Ontario, you are correct!
But, if you said the city of Ontario in California, you are also correct.
One job of an autocomplete interface is to present multiple possible options, and have the user select the one they want.
All previous parsers used by Pelias returned only a single interpretation of a given input. No matter how good of a job these parsers did, they were never telling the whole story.
The Pelias Parser was designed from the start to provide multiple interpretations of a given input, ordered with a confidence score.
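As a rough sketch of the idea, and not the Pelias Parser's actual API, the snippet below returns every known reading of an input, sorted by confidence. The `Interpretation` class, the `interpret` function, the lookup table, and the scores are all invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical lookup table: labels and scores are made up for this example.
CANDIDATES = {
    "ontario": [("locality", 0.4), ("region", 0.6)],
}


@dataclass
class Interpretation:
    label: str         # what kind of place the text is read as
    text: str          # the input text being classified
    confidence: float  # higher means more likely


def interpret(text):
    """Return all known interpretations of the input, most confident first."""
    cleaned = text.strip()
    matches = CANDIDATES.get(cleaned.lower(), [])
    results = [Interpretation(label, cleaned, score) for label, score in matches]
    return sorted(results, key=lambda r: r.confidence, reverse=True)


for result in interpret("Ontario"):
    print(result.label, result.confidence)
# region 0.6
# locality 0.4
```

A caller such as an autocomplete interface can then show both readings instead of silently committing to one.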
Multiple parsing interpretations are a massive new superpower, one we’ve only just begun to explore. We have a lot of exciting work left to do to start using these interpretations elsewhere in our geocoder to return better results, and we will likely keep building improvements on this new capability for years.
In addition to running a hosted geocoding API, we also help teams and organizations run their own geocoders that meet their custom needs.
One of our very first clients, TriMet, needed street intersection support for their new trip planner, so we began investigating what it would take.
Street intersection support has been one of the goals of the Pelias geocoder almost since the beginning, but we quickly confirmed our fears that our existing parser simply could not understand intersections. The same code that erroneously assumed the word ‘street’ signaled the end of an address was even more confused when it saw the word ‘street’ twice.
As a result, a modular, reusable set of classification rules was one of the key considerations when we designed the Pelias Parser.
Essentially, we are able to define parsing logic for intersections as anything that matches a street, followed by a separator (such as ‘and’, ‘at’, etc.), followed by another street. Then any improvements to our street parsing logic are automatically improvements to our street intersection parsing logic too!
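The composition can be sketched in a few lines of Python. This is a simplified illustration rather than the Pelias Parser's implementation: `is_street` is a toy stand-in for a real street classifier, and the word lists and function names are assumptions.

```python
# Hypothetical word lists, for illustration only.
STREET_TYPES = {"street", "st", "avenue", "ave", "road", "rd", "way"}
SEPARATORS = {"and", "at", "&"}


def is_street(words):
    """Toy street classifier: a non-empty phrase containing a street-type word."""
    return bool(words) and any(w.lower().strip(".,") in STREET_TYPES for w in words)


def parse_intersection(text):
    """Match street + separator + street, reusing the street classifier.

    Returns a (street, street) tuple, or None if the input is not an intersection.
    """
    words = text.split()
    for i, word in enumerate(words):
        if word.lower() in SEPARATORS:
            left, right = words[:i], words[i + 1:]
            if is_street(left) and is_street(right):
                return " ".join(left), " ".join(right)
    return None


print(parse_intersection("SW 5th Ave and SW Main St"))  # ('SW 5th Ave', 'SW Main St')
print(parse_intersection("SW Main St"))                 # None
```

Because `parse_intersection` only calls `is_street`, swapping in a better street classifier improves intersection parsing without touching the intersection rule itself.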
A street intersection query is no use without a street intersection database. TriMet, thanks to their focus on the Portland metro area, was able to generate a dataset of intersections. One of our projects in the future will be to generate a suitable global street intersection dataset so we can bring this functionality to Geocode Earth users.
The new Pelias Parser has been in use on the Geocode Earth service since mid-October, and is already showing significant improvements.
With its extensibility and configurability, we’re anticipating even more improvement to come as we continue to work with our users, partners, and community members.
Like its inputs, the Pelias Parser will always be incomplete. As with all of Pelias, we release our work as open source because we believe good software is built by the contributions and feedback of a wide range of people.
If you’re interested in being involved, reach out to us or take a look at the Pelias Parser on GitHub.