Part 1: Fixing Spatial Inconsistencies through Open Geocoders
Foursquare OS Places is a comprehensive dataset of real-world locations encompassing commercial establishments, architectural landmarks, transportation hubs, healthcare centers, and other points of interest across the global landscape. Our unique Places Engine leverages a combination of human and agentic workforces in a collaborative crowd-sourced system to build this dataset.
Each contributor, whether a human or an agent, employs a distinct process for collecting the various attributes associated with a place, resulting in varying levels of accuracy. While our consensus-based approach accounts for contributor reliability and helps eliminate many errors across place attributes, we have found that location-specific attributes—such as address, zipcode, and neighborhood—require an additional layer of validation. These spatial attributes cannot rely solely on consensus; they must ultimately be grounded in the physical realities of the world they represent. As we started to scale and refine this dataset, we encountered three persistent spatial challenges that required us to anchor our robust crowd-sourced systems in transparent, validated spatial foundations.
The first of these challenges is a fundamental inconsistency in spatial attributes. It is not uncommon for the latitude and longitude of a venue to place it squarely within one zipcode, while the associated zipcode attribute suggests another. These inconsistencies arise due to the disparate methods used by different sources for populating these attributes. Within Foursquare’s Places Engine, these issues are amplified by the varied approaches of our human contributors. While their local knowledge is invaluable, they may rely on different map sources, outdated data, or personal interpretations of spatial boundaries. One contributor might assign a zipcode from signage or websites, while another uses mapping tools with different baselines. This variability creates a tangle of inconsistencies that consensus alone cannot fix. Resolving them requires transparent processes, clear data lineage, and open validation rooted in how space is truly organized.
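One way to surface this class of inconsistency is a simple point-in-polygon check: test whether a venue's coordinates actually fall inside the boundary of its recorded zipcode. The sketch below is a minimal, pure-Python illustration; the zipcode polygon and venue record are hypothetical stand-ins for real open boundary data.

```python
# Sketch: flag venues whose coordinates fall outside the polygon of their
# recorded zipcode. The polygon and venue below are hypothetical examples,
# not real boundary data.

def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: does (lon, lat) fall inside `polygon`,
    a list of (lon, lat) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray from (lon, lat) cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def zipcode_is_consistent(venue, zip_polygons):
    """True/False if the venue's coordinates agree with its claimed
    zipcode; None when no boundary data is available."""
    polygon = zip_polygons.get(venue["zipcode"])
    if polygon is None:
        return None  # cannot validate without boundary data
    return point_in_polygon(venue["lon"], venue["lat"], polygon)

# Hypothetical rectangular boundary for zipcode "10001"
zip_polygons = {"10001": [(-74.01, 40.74), (-73.99, 40.74),
                          (-73.99, 40.76), (-74.01, 40.76)]}
venue = {"name": "Example Cafe", "lat": 40.75, "lon": -74.00, "zipcode": "10001"}
print(zipcode_is_consistent(venue, zip_polygons))  # True: the attributes agree
```

In practice the boundary polygons would come from an authoritative open dataset, which is exactly why a transparent geospatial foundation matters: the check is only as good as the boundaries it validates against.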
The second challenge stems from the variability in how addresses are structured across locales. Unlike geographic coordinates, which adhere to global standards, address formats are inherently local. A street address in Tokyo is fundamentally different in structure from one in Paris or São Paulo. There is no single, proprietary schema that can adequately standardize these formats across the globe. In our system, human and agentic contributors often handle address data according to regional norms, which enhances coverage but complicates normalization. Addressing this issue calls for an adaptive model—one that can incorporate crowd-sourced intelligence to shape and refine address representations in line with local conventions.
The third, and perhaps most nuanced, challenge lies in micro-location—pinpointing the exact spatial context of a place within a larger structure. A rooftop centroid is no longer sufficient for a shopping mall or other multi-tenant complex. Increasingly, we need to identify the correct entrance, the specific floor, or a particular suite within the building. Traditional geocoding solutions, which typically resolve to broad building-level coordinates, fall short here. Our collaborative system captures some of this detail through manual input, but scaling micro-location accuracy requires a fusion of high-resolution spatial data, intelligent inference algorithms, and continuous ground-level validation by contributors who interact with these spaces in real time.
Rebuilding the Geospatial Foundation: Tackling Spatial Inconsistency
In the first part of this blog series, we turn our attention to the first of these persistent issues: spatial inconsistency. When essential spatial attributes like geographic coordinates and postal codes do not align, the issue goes beyond mere disagreement among contributors—it underscores the need for a transparent and verifiable geospatial foundation.
About a decade ago, Foursquare built an open-source geocoder called TwoFishes to address these challenges. Geocoders are software services that translate human-readable addresses into geographic coordinates and vice versa, using foundational geospatial datasets such as roads, buildings, addresses, and administrative boundaries. TwoFishes provided the spatial backbone for our dataset for many years, but its static datasets and legacy algorithms struggled to keep pace with a rapidly evolving geospatial landscape. Recognizing this, we initiated a thorough evaluation of modern geocoding technologies, considering both open-source and commercial offerings, to find a solution aligned with our foundational principles of transparency, adaptability, and community-driven improvement.
Evaluating Modern Geocoders: Closing the Gap
Candidate Selection
Since we are an open-source project backed by a community of Placemakers, we sought to adopt a geocoder that aligns with our values—namely, transparency, flexibility, and the ability to incorporate feedback from our contributors. Open-source solutions enable us to not only improve the accuracy of our data but also to close the loop by updating underlying spatial assets based on real-world feedback. However, alignment with our philosophy wasn’t enough—we needed evidence that open-source geocoders could match or exceed the performance of commercial alternatives. Our goal was to ensure that embracing openness wouldn’t come at the cost of quality or reliability.
We selected four geocoding providers spanning the full spectrum of market solutions along two key dimensions: data sources (open vs. proprietary) and code accessibility (open-source vs. closed-source). The candidates were two commercial providers with proprietary datasets and two providers built on top of Pelias: Geocode.earth (Pelias with open data) and Stadia Maps (a fork of Pelias with open data). We centered the study on Pelias because its commitment to transparency, community stewardship, and long-term performance aligns directly with Foursquare’s own principles and spatial-data needs.
Establishing the Ground Truth
In the context of real-world locations, we recognize that perfect spatial truth is often elusive. So we leveraged Foursquare’s global Placemaker network (human annotators who physically verify location details) to establish this reference dataset. For each test case, multiple trustworthy independent contributors (Placemakers level 7 and higher) reviewed and corroborated the place attributes, creating a consensus-driven ground truth. Our globally diverse ground truth dataset comprised close to 1,000 location records per country across 27 countries, forming a rigorous benchmark that spans structured North American grid systems, organic European street patterns, and complex Asian addressing conventions. Each record featured complete address information and independently verified geographic coordinates from multiple trusted sources, ensuring high fidelity and real-world relevance to our Places dataset. Through structured human consensus, we arrived at the most reliable approximation of place-based reality available at scale. Here is the distribution of the ground truth datasets by region:
Region | Countries |
---|---|
Anglosphere | US, AU, CA, GB |
Latin America | BR, MX |
Europe | AT, BE, CZ, DE, ES, FI, FR, GR, HU, IT, NL, PT, SE, SK |
South East Asia | ID, MY, PH, TH |
East Asia | JP, KR, TW |
Evaluation Methodology
The evaluation encompassed two key functions: first, forward geocoding, which involves taking a physical address and determining its corresponding latitude and longitude; and second, reverse geocoding, which starts with a latitude and longitude and aims to identify the associated address components. For reverse geocoding, we focused on region, locality and postcode attributes. To gauge the effectiveness of these services, the evaluation centered on two primary dimensions: accuracy and coverage.
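To make the two operations concrete, here is a minimal sketch of pulling results out of a Pelias-style response. Pelias returns GeoJSON, where point coordinates are ordered [longitude, latitude]; the payload below is a trimmed, hypothetical example rather than an actual provider response.

```python
# Sketch: extracting forward and reverse geocoding results from a
# Pelias-style GeoJSON response. The sample payload is hypothetical.

sample_response = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-73.9857, 40.7484]},
        "properties": {
            "name": "350 5th Ave",
            "locality": "New York",
            "region": "New York",
            "postalcode": "10118",
            "country_code": "US",
        },
    }],
}

def top_coordinates(response):
    """Forward geocoding: address in, (lat, lon) out."""
    features = response.get("features", [])
    if not features:
        return None  # no match: counts against coverage
    lon, lat = features[0]["geometry"]["coordinates"]  # GeoJSON is lon-first
    return lat, lon

def top_attributes(response, keys=("region", "locality", "postalcode")):
    """Reverse geocoding: (lat, lon) in, address components out."""
    features = response.get("features", [])
    if not features:
        return {}
    props = features[0]["properties"]
    return {k: props.get(k) for k in keys}

print(top_coordinates(sample_response))  # (40.7484, -73.9857)
print(top_attributes(sample_response))
```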
For forward geocoding, coverage is simply the percentage of calls to the geocoding API that returned latitude and longitude values. For reverse geocoding, coverage is measured per attribute: the proportion of cases where the geocoder returned any value for that attribute.
For forward geocoding, a result (latitude & longitude) was deemed accurate if it fell within 100 meters of the ground truth; otherwise, it was classified as “inaccurate”. Reverse geocoding introduced additional complexity due to differences in localization (e.g., ‘Vienna’ vs. ‘Wien’), abbreviations (‘New York’ vs. ‘NY’), and inconsistent administrative schemas (e.g., ‘region’, ‘province’, ‘admin1’). To standardize comparisons, we implemented a multi-pronged approach: we allowed matches across abbreviated and full forms; leveraged the lang parameter to specify language where applicable; and mapped diverse boundary taxonomies into a unified schema (see table below).
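These metric definitions can be sketched directly in code. The snippet below computes forward-geocoding coverage and 100-meter accuracy using a haversine great-circle distance; computing accuracy only among covered results is an illustrative assumption on our part, and the sample coordinates are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def score_forward(results, truths, threshold_m=100.0):
    """results: list of (lat, lon) or None for no match; truths: (lat, lon).
    Returns (coverage, accuracy among covered results)."""
    covered = [(r, t) for r, t in zip(results, truths) if r is not None]
    coverage = len(covered) / len(results)
    if not covered:
        return coverage, 0.0
    accurate = sum(
        1 for r, t in covered if haversine_m(r[0], r[1], t[0], t[1]) <= threshold_m
    )
    return coverage, accurate / len(covered)

# Made-up ground truth and geocoder output: one hit within ~35 m,
# one missing result, one result several kilometers off.
truths  = [(40.7484, -73.9857), (48.8584, 2.2945), (35.6586, 139.7454)]
results = [(40.7486, -73.9860), None,              (35.70,   139.75)]
coverage, accuracy = score_forward(results, truths)
print(round(coverage, 2), round(accuracy, 2))  # 0.67 0.5
```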
FSQ Places | Commercial Provider 1 | Commercial Provider 2 | Geocode Earth (Pelias) | Stadia Maps (Pelias) |
---|---|---|---|---|
address | address | StreetNumber, StreetName | address | address |
locality | locality | City | locality | locality |
region | region, region_code | Province | region, region_a (macroregion in GB) | region, region_a (macroregion in GB) |
postcode | postcode | PostalCode | postalcode | postalcode |
neighborhood | neighborhoods | N/A | neighborhood | neighborhood |
country | country | CountryCode | country_code | country_code |
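To make the mapping concrete, here is a sketch of renaming one provider's fields into the unified schema and then comparing values leniently across localization and abbreviation differences. The field names follow the table above; the alias list and the sample response are illustrative placeholders, not the actual matching rules we used.

```python
# Sketch: normalize provider-specific field names to a unified schema,
# then compare values tolerantly. Alias list and sample data are
# illustrative placeholders.

SCHEMA_MAP = {
    "commercial_2": {"City": "locality", "Province": "region",
                     "PostalCode": "postcode", "CountryCode": "country"},
    "pelias": {"locality": "locality", "region": "region",
               "postalcode": "postcode", "country_code": "country"},
}

# A few illustrative equivalences (localized names, abbreviations).
ALIASES = {
    "wien": "vienna",
    "ny": "new york",
}

def to_unified(provider, raw):
    """Rename a provider's fields to the unified schema."""
    mapping = SCHEMA_MAP[provider]
    return {unified: raw[src] for src, unified in mapping.items() if src in raw}

def values_match(a, b):
    """Case-insensitive match, tolerating known aliases."""
    na, nb = a.strip().lower(), b.strip().lower()
    return ALIASES.get(na, na) == ALIASES.get(nb, nb)

raw = {"City": "Wien", "Province": "Wien",
       "PostalCode": "1010", "CountryCode": "AT"}
unified = to_unified("commercial_2", raw)
print(unified["locality"])                          # Wien
print(values_match(unified["locality"], "Vienna"))  # True
```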
Evaluation Results & Key Insights
Our comprehensive evaluation revealed a fascinating paradox: the gap between open-source and commercial geocoding isn’t where conventional wisdom suggests it should be. While proprietary data was expected to dominate across all metrics, our findings tell a more nuanced story about where community-driven approaches actually excel—and where they present strategic enhancement opportunities. You can access the detailed results in the Google Sheet linked here.
- Open Source is Closing the Gap on Geocode Coverage Globally
Across all regions, open-source geocoders are now on par with commercial ones in terms of basic geocode coverage. This reflects the growing maturity of open datasets such as OpenStreetMap, which now offer broad and reliable global representation. This level of completeness enables open-source tools to support large-scale applications where fill rate is critical. Once considered hobbyist-grade, open geocoding has reached a level of readiness suitable for many production scenarios, especially where coverage outweighs the need for high-precision results.
- Accuracy Gap Remains – and it’s Region Dependent
Despite coverage parity, open-source geocoders still trail on accuracy, but the extent of the gap varies by region and provider. In the US and Europe, the gap is about 17%, but in non-US Anglosphere countries it’s closer to 9%, showing decent open-source performance in English-speaking markets. However, in regions like East Asia and Latin America, the difference is more pronounced, sometimes exceeding 20%. This variation is closely tied to the quality and structure of local addressing systems and the depth of investment by commercial providers. These gaps should inform deployment decisions, especially for use cases where location accuracy is critical.
- Reverse Geocoding Shows Nuanced Tradeoffs
Attribute-level performance in reverse geocoding reveals a complex pattern of tradeoffs. In the Anglosphere, for example, commercial providers show 5% higher coverage on locality, yet open-source systems outperform them on accuracy by 11%. In East Asia, open-source providers offer 12% higher accuracy on average, despite 15% lower coverage. For the region attribute, open-source geocoders often match or exceed commercial performance—by 10% in East Asia and 6% in Europe—at similar coverage levels. However, performance is inconsistent for more granular attributes like postcodes, where open-source tools tend to lag.
- Commercial Geocoder Performance Varies Widely by Region and Provider
Commercial providers do not offer uniform performance across geographies, and in several regions, the differences between them are as stark as the gap with open source. In Latin America, for example, one commercial provider substantially outperforms the other—indicating that vendor-specific regional investments directly influence quality. A similar pattern emerges in Southeast Asia: one commercial provider leads open-source solutions by 13% in locality accuracy and 22% in coverage, while another trails by 37% in coverage and 15% in accuracy. These disparities show that commercial superiority cannot be assumed universally, and that localized performance needs to be evaluated per vendor, especially in markets where data quality is heterogeneous.
- Postcodes Are the Achilles’ Heel of Open-Source Systems
Among all attributes, postal codes consistently underperform in open-source geocoding systems. In reverse geocoding, commercial providers lead open-source counterparts by 62% in postcode coverage in the Anglosphere, and by 50% in Europe, with accuracy advantages often accompanying that lead. These gaps stem from systemic challenges: postal datasets are frequently proprietary, fragmented, or encumbered by restrictive licenses, preventing their integration into open platforms. Notably, open-source provider postcodes are more accurate than commercial ones in Latin America and Southeast Asia when data is available, suggesting that community-driven updates generally produce higher accuracy.
Improving Open Geocoding: Foursquare’s Opportunity & Roadmap
This evaluation started with a clear internal need: our legacy geocoder, TwoFishes, had become increasingly inadequate in resolving the spatial inconsistencies we face at scale—particularly around postcode precision and address normalization. While TwoFishes once served as a reliable backbone, its static data model and outdated architecture could not keep up with the complexity of today’s spatial requirements. This prompted a rigorous assessment of modern geocoders across commercial and open-source options. Open-source solutions like Pelias stood out for their transparency, adaptability, and alignment with our community-driven values. However, our findings showed that while open-source geocoders have largely closed the gap on coverage, they continue to trail commercial providers on accuracy—by up to 17% in geocoding and more than 60% in postcode coverage, depending on the region. These limitations, rather than deterring us, present an opportunity for Foursquare to contribute meaningfully to the open ecosystem by strengthening these tools with high-quality Places data and real-time contributor feedback.
Our immediate goal is to partner with one of the open geocoding providers to replace the aging TwoFishes system and begin resolving the spatial inconsistencies within our dataset. But our long-term vision is more ambitious: to leverage our global Placemaker network not just to identify inaccuracies, but to feed those corrections directly back into the underlying open datasets, thereby improving them at the source rather than building proprietary overlays. This approach aligns deeply with our belief in transparent, community-driven data infrastructure. As we advance this strategy, we’re excited about the role Foursquare is uniquely positioned to play—not only in building a more accurate and responsive spatial foundation for our OS Places platform, but also in elevating the fidelity of the open geocoding ecosystem at large.
Looking ahead, Part 2 of this series will explore the structural variability of global address formats and the normalization challenges they present across diverse regions. Part 3 will dive into the micro-location problem, examining how we plan to go beyond building-level resolution to capture entrances, suites, and floors. Together, these efforts form the core of our spatial engineering strategy: to build a system that doesn’t just interpret the world—but actively learns and improves from it in real time.