20 May 2012

How Unique Visitors Are Calculated in Google Analytics

The visitor metrics in Google Analytics don't match up across various parts of the UI and the API. So here's how they are calculated (as explained by Nick Mihailovski from the GA team).

The Unique Visitors metric is extremely powerful because it represents the "reach" of a site and gives you a true view of total visitors for most combinations of dimensions across the date range.

Currently there are two calculations of Unique Visitors in Google Analytics, and which one is used depends on the other dimensions present in the query:

If you query for Visitors with only time/date dimensions:

Each session carries the timestamp of the first hit of the previous session (__utma cookie format = Domain-Hash.Visitor-Token.First-Visit-Start.Previous-Visit-Start.Current-Visit-Start.Visit-Count). As Google Analytics goes through all the sessions in the date range, it increments Visitors whenever that previous-session timestamp is before the start of the date range. This works well because it requires no memory, so it's fast, and it is how the overview reports are calculated. The only issue is that if the browser's clock is off, the timestamps will be incorrect, leading to some bad data.

In Custom Reports this metric is called Visitors.

In the GA API both calculations are mapped to ga:visitors, and one is picked depending on the dimensions selected.
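
The timestamp method above can be sketched roughly as follows. This is an illustration only, not Google's actual code: the names parse_utma and count_visitors_by_timestamp are hypothetical, the input is assumed to be the __utma values of sessions already known to fall inside the date range, and the sample values are made up. The post does not say how a visitor's very first session is handled, so the sketch assumes a Visit-Count of 1 is also counted.

```python
from typing import Iterable, NamedTuple


class Utma(NamedTuple):
    """The six dot-separated fields of a __utma cookie value."""
    domain_hash: str
    visitor_token: str
    first_visit_start: int      # Unix timestamp, seconds
    previous_visit_start: int   # Unix timestamp, seconds
    current_visit_start: int    # Unix timestamp, seconds
    visit_count: int


def parse_utma(value: str) -> Utma:
    """Split a __utma value into its six fields."""
    domain_hash, token, first, previous, current, count = value.split(".")
    return Utma(domain_hash, token, int(first), int(previous), int(current), int(count))


def count_visitors_by_timestamp(sessions: Iterable[Utma], range_start: int) -> int:
    """Count visitors in one pass without remembering any visitor IDs.

    `sessions` is assumed to contain only sessions inside the date range.
    """
    visitors = 0
    for session in sessions:
        # Rule from the post: the previous session started before the range,
        # so this is the visitor's first session inside the range.
        seen_before_range = session.previous_visit_start < range_start
        # ASSUMPTION (not stated in the post): a first-ever session has no
        # earlier session, so it is counted as well.
        first_ever_session = session.visit_count == 1
        if seen_before_range or first_ever_session:
            visitors += 1
    return visitors


# Made-up sample data: one returning visitor with two sessions in the range,
# and one brand-new visitor.
RANGE_START = 1335830400
SESSIONS = [
    parse_utma("173272373.1111.1330000000.1334000000.1336000000.3"),
    parse_utma("173272373.1111.1330000000.1336000000.1336500000.4"),
    parse_utma("173272373.2222.1336100000.1336100000.1336100000.1"),
]
VISITORS = count_visitors_by_timestamp(SESSIONS, RANGE_START)
```

The loop keeps a single integer as state, which is why this approach needs no memory per visitor. It also shows the weakness mentioned above: the comparison trusts timestamps written by the browser, so a wrong client clock can make a session count twice or not at all.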

If you query for Visitors with any other dimension, or include a filter of a non-time dimension:

Each session also has a Visitor ID, which is the same across all of a visitor's sessions. As Google Analytics processes each session, it stores each ID in memory once, then returns the total count of distinct IDs. So while this method is a bit more reliable, it requires memory and is a bit slower.

In Custom Reports this metric is called Unique Visitors.

In the GA API both calculations are mapped to ga:visitors, and one is picked depending on the dimensions selected.
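
The Visitor ID method amounts to counting distinct IDs. A minimal sketch, assuming each session is a plain dict with hypothetical keys "visitor_id" and "country" (the record layout and the function name are illustrative, not GA's internal format):

```python
from typing import Callable, Dict, Iterable, Optional

Session = Dict[str, str]


def count_unique_visitors(
    sessions: Iterable[Session],
    keep: Optional[Callable[[Session], bool]] = None,
) -> int:
    """Count distinct visitor IDs among the sessions that pass the filter."""
    seen_ids = set()  # grows with the number of distinct visitors
    for session in sessions:
        if keep is not None and not keep(session):
            continue
        # A set stores each ID once, however many sessions the visitor has.
        seen_ids.add(session["visitor_id"])
    return len(seen_ids)


# Made-up sample data: visitor "a" has two sessions, "b" and "c" have one each.
SAMPLE = [
    {"visitor_id": "a", "country": "US"},
    {"visitor_id": "a", "country": "US"},
    {"visitor_id": "b", "country": "DE"},
    {"visitor_id": "c", "country": "US"},
]
ALL_VISITORS = count_unique_visitors(SAMPLE)
US_VISITORS = count_unique_visitors(SAMPLE, keep=lambda s: s["country"] == "US")
```

Because the set holds every distinct ID seen so far, memory use grows with the number of visitors. That is the cost this method pays for not depending on browser timestamps.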

The reason there are two calculations is that Google Analytics wants to provide a fast user experience. The main overview report gets viewed many times, so to keep it fast, the timestamp method is used. In other custom reports, GA wants the data to be as accurate as possible, so the Visitor ID approach is used.