In April 2016, a Kansas family living on a remote farm in the town of Potwin (population 449) made headlines for a very peculiar reason. They had suffered years of harassment, abuse and bizarre claims, all stemming from mistaken identity, while the root of their unusual ordeal went undetected. Branded “identity thieves, spammers, scammers and fraudsters” and confronted by FBI agents, federal marshals and IRS collectors, they saw these strange occurrences affect their lives in a very real way.
That is, until Fusion’s Kashmir Hill finally uncovered the truth behind their torment, which came down to a simple internet mapping glitch. In this data error of epic proportions, a staggering 600 million IP addresses had been misallocated to the front door of the unassuming Arnold family’s farmhouse, with catastrophic results.
This is just one example of the real-world ramifications of misinformation. What lies at the heart of this story is not the danger of technical error but a broader truth about the flaws of web analytics, one that urges us to question what we think we know about the information we receive.
The Flaws of Web Analytics
The data industry, in many ways, has failed the modern marketer. This may sound dramatic, but marketers around the world now place their trust in data that is, to a large extent, unreliable, and this worrying reality desperately needs our attention. What this comes down to is an over-reliance on analytics based on probabilistic identification. Geo-targeting based on IP address is the foundation of web analytics.
Every organization worldwide is reliant on IP geo-targeting to report the location of internet traffic and target ads. But its validity is highly questionable, as this story clearly shows. To understand why, it’s important to know how IP geo-targeting works.
First and most crucially, an IP address alone carries no geographical information. An IP address, linked to a cookie or deterministic ID, is universally used as a proxy for a real-world person and the country they live in. To give it a location, it must be matched against a manually created database.
These databases are available from a number of players (MaxMind and Digital Element are the market leaders), but they are all generally built in the same way. They guess a location by looking at latency data (the time it takes for data to travel from an IP address with a known location to one without) and by making assumptions based on where the IP address was distributed in the first place.
This is a human process, open not only to error but also to commercial bias. It is also a task of mammoth proportions, with billions of IP addresses in existence. Knowing this fundamentally undermines the idea that geographic location in analytics is fact; it is far from it.
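To make the mechanics concrete, here is a minimal sketch of a database-backed lookup. It is not how any particular vendor implements its product: the range table, the coordinates and the `COUNTRY_DEFAULT` fallback are all hypothetical, and the IP ranges are reserved documentation addresses. It illustrates one plausible way a pile-up can arise: every address the table cannot place more precisely lands on the same default point.

```python
import ipaddress
from collections import Counter

# Hypothetical range table: (network, (latitude, longitude)).
# Networks are reserved documentation ranges; coordinates are arbitrary.
RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), (51.5, -0.1)),
    (ipaddress.ip_network("198.51.100.0/24"), (40.7, -74.0)),
]

# Hypothetical fallback point used when no range matches.
COUNTRY_DEFAULT = (39.5, -98.5)


def locate(ip: str) -> tuple[float, float]:
    """Return the coordinates of the first range containing ip, else the fallback."""
    addr = ipaddress.ip_address(ip)
    for network, coords in RANGES:
        if addr in network:
            return coords
    return COUNTRY_DEFAULT


# Invented sample traffic: three of the five addresses match no range.
visitors = ["192.0.2.10", "203.0.113.5", "203.0.113.77",
            "198.51.100.3", "203.0.113.200"]

# Count how many visitors end up at each coordinate.
counts = Counter(locate(ip) for ip in visitors)
for coords, n in counts.most_common():
    print(coords, n)
```

In this toy data, the fallback point collects more visitors than either real range, even though nobody is known to be there.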
The ‘Digital Centre of the World’
When breaking this story, Kashmir Hill labelled the remote Potwin farm the ‘Digital Centre of the United States’. Based on our research, an even more apt description would be the ‘Digital Centre of the World’ as, without doubt, a very large proportion of these IP addresses relate to devices that exist outside of the U.S.
This is due to a number of factors which have made it impossible to accurately measure or understand online behaviors and location through passive measurement techniques alone. As previously outlined in our Missing Billion report, huge numbers of internet users have become invisible, absent as a result of:
#1. The amount of web traffic that automatically flows through servers in the US, as well as America’s disproportionately high number of IP addresses.
#2. The use of Private Browsers, Virtual Private Networks and Proxy Servers.
#3. Multi-device and mobile-only internet usage.
#4. High levels of device sharing.
By our estimates, over 500 million users are absent from, or misallocated in, passively collected data because they use Virtual Private Networks (VPNs), with hundreds of millions more overlooked because they use only shared devices or access the internet only from mobile.
Today, 30 percent of global internet users use a Virtual Private Network (VPN) monthly, whether to browse the internet anonymously or to gain access to content that is unavailable in their location. This equates to almost half a billion users, 110 million of whom use a VPN every single day.
Before I founded this company, I worked at a U.S. public company. Although I sat in London, all my internet traffic was routed via the headquarters in New York, so I was always identified as a U.S. user. There are also a very large number of internet users getting online via proxy networks that route their traffic through a server in another geographic location without their knowledge.
The impact of this is difficult to size, but as a guideline, we surveyed Facebook users in China in 2013, asking them how they managed to gain access. Interestingly, 25 percent said they could access it ‘via a work connection’, all of which would be proxy-based.
The Cost of Misinformation
The failure of IP geo-targeting is having multi-billion dollar implications, and yet no one seems to have noticed.
With leading brands worldwide relying on inaccurate information to guide their marketing strategies, many have wound up chasing phantom audiences and investing heavily in the wrong markets. One reputable UK publisher, for example, decided to set up a New York office and allocate a large portion of its budget to marketing in the U.S. based on inaccurate insights.
This represents tens of millions in sunk costs, signed off on the strength of an incorrect analytics report. Similarly, the majority of online advertising is geo-targeted, which means many messages are failing to reach their target audiences.
There is also a more profound impact: the digital revolution is not being shared equally among the world’s internet users. Today the vast majority of internet users reside outside the mature markets of the U.S. and Western Europe, yet the majority of ad spend, venture capital investment and the creation of content and services still happens there.
The U.S. today makes up less than 8% of the global internet population, but still represents 38% of digital ad spend. The internet users of fast-growth markets like Brazil, Indonesia and India are losing out, primarily due to the fact that IP geo-targeting makes them invisible.
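A quick back-of-the-envelope calculation shows how lopsided these two figures are. The sketch below uses only the 8% and 38% shares quoted above; the variable names are ours, and because the U.S. user share is stated as "less than 8%", the resulting ratios should be read as lower bounds.

```python
# Shares quoted in the article.
us_user_share = 0.08   # "less than 8%", so treated as an upper bound
us_spend_share = 0.38

# Spend share divided by user share: 1.0 would mean spend tracks population.
us_index = us_spend_share / us_user_share
rest_index = (1 - us_spend_share) / (1 - us_user_share)

# How many times more ad spend a U.S. user attracts than a user elsewhere.
relative = us_index / rest_index

print(f"U.S. index: {us_index:.2f}")
print(f"Rest-of-world index: {rest_index:.2f}")
print(f"Per-user spend ratio: {relative:.1f}x")
```

On these figures, a U.S. internet user attracts roughly seven times the digital ad spend of a user anywhere else.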
The Future of Web Analytics
Fixing this problem is at the core of what we do at GlobalWebIndex, and we’re doing this by combining active data with passively derived analytics.
We maintain deterministic IDs on all of our panelists: 18 million real internet users who are geographically located with multiple opt-ins and language choices. This way, we can be 99.9% confident that their location is accurate.
This kind of detail is unprecedented and is already bringing major gains to clients that have integrated it, both by enabling the monetization of the global audience and by making it possible to link any audience to actual web visits.
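As an illustration of the general idea, and not a description of our actual pipeline, the sketch below compares a country inferred passively from an IP address with the country a panelist has declared. The `Visit` record, the field names and the sample data are all invented for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    panelist_id: str       # hypothetical deterministic ID
    declared_country: str  # from the panelist's own opt-in answers
    ip_country: str        # from an IP geolocation lookup


def reconcile(visits):
    """Return passive counts, declared counts and the visits where they disagree."""
    passive = Counter(v.ip_country for v in visits)
    declared = Counter(v.declared_country for v in visits)
    mismatches = [v for v in visits if v.declared_country != v.ip_country]
    return passive, declared, mismatches


# Invented sample: three panelists appear as U.S. traffic but live elsewhere.
sample = [
    Visit("p1", "GB", "US"),  # e.g. traffic routed via a U.S. head office
    Visit("p2", "US", "US"),
    Visit("p3", "CN", "US"),  # e.g. a proxy-based work connection
    Visit("p4", "BR", "BR"),
    Visit("p5", "ID", "US"),  # e.g. a VPN user
]

passive, declared, mismatches = reconcile(sample)
print("Passive view: ", dict(passive))
print("Declared view:", dict(declared))
print("Mismatch rate:", len(mismatches) / len(sample))
```

In this toy sample the passive view reports four U.S. visitors out of five, while the declared view shows only one.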
Our aim is to enable every company in the world to measure their analytics through this kind of data, so we are now scaling this panel using pioneering research techniques, incentives and survey distribution. We forecast that we will grow our panel to 20 million people by the end of 2017 with a target to reach 50 million people and beyond.
The future of analytics is real people. So to those in the industry who claim to have over 500 million cookies under management in the U.S., or that 60% of their audience is located in the US, perhaps it’s time to ask: How many of those come from a remote farm in Kansas?