20 May 2012

How Unique Visitors Are Calculated in Google Analytics

The Visitor metrics in Google Analytics don't match up across various parts of the UI and API. So here's how they are calculated (as explained by Nick Mihailovski from the GA team).

The Unique Visitors metric is extremely powerful because it represents the "reach" of a site, and gives you a true view of total visitors for most combinations of dimensions, across the date range.

Currently there are 2 calculations of Unique Visitors in Google Analytics and they depend on other dimensions present in the query:

If you query for Visitors with only time/date dimensions:

Each session carries the timestamp of the first hit in the previous session. (utma cookie format = Domain-Hash.Visitor-Token.First-Visit-Start.Previous-Visit-Start.Current-Visit-Start.Visit-Count). As Google Analytics goes through all the sessions in the date range, it increments Visitors if the previous timestamp is before the start of the date range. This works well because it requires no memory, so it's fast, and it is how the overview reports are calculated. The only issue is that if the browser's clock is off, the timestamps will be incorrect, leading to some bad data.
In Custom Reports this metric is called Visitors.

In the GA API both calculations are mapped to ga:visitors and one is picked depending on the dimensions selected.
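The timestamp rule above can be sketched in a few lines of Python. This is only an illustration, not GA's actual implementation: the `Session` class and `count_visitors_by_timestamp` function are hypothetical names. The post only states the "previous timestamp before the range start" rule; also counting a visitor's first-ever session (Visit-Count of 1) is an assumption added here so that brand-new visitors are not dropped.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Session:
    previous_visit_start: int  # Unix timestamp of the visitor's previous session
    current_visit_start: int   # Unix timestamp of this session
    visit_count: int           # 1 for the visitor's first-ever session


def count_visitors_by_timestamp(
    sessions: Iterable[Session], range_start: int, range_end: int
) -> int:
    """Count visitors in [range_start, range_end) without storing any IDs."""
    visitors = 0
    for s in sessions:
        if not (range_start <= s.current_visit_start < range_end):
            continue  # session falls outside the reporting period
        # Rule from the post: the previous session began before the range.
        seen_before_range = s.previous_visit_start < range_start
        # Assumption (not in the post): a first-ever session also counts.
        first_ever = s.visit_count == 1
        if seen_before_range or first_ever:
            visitors += 1
    return visitors
```

Each session is inspected once and nothing is remembered between sessions, which is why this approach needs no memory. It also shows why a wrong browser clock hurts: the decision depends entirely on the timestamps the browser supplied.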

If you query for Visitors with any other dimension, or include a filter of a non-time dimension:

Each session also has a Visitor ID. This ID is the same value for a Visitor for all their sessions. As Google Analytics processes each session, it stores each ID in memory once, then returns the total count. So while this method is a bit more reliable in calculating data, it requires memory and is a bit slower.

In Custom Reports this metric is called Unique Visitors.

In the GA API both calculations are mapped to ga:visitors and one is picked depending on the dimensions selected.
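A minimal sketch of the Visitor ID approach, assuming each session is a plain dictionary with a `visitor_id` key. The function name, the dictionary layout and the optional `keep` filter are illustrative assumptions, not GA's data model.

```python
from typing import Callable, Iterable, Optional


def count_unique_visitors_by_id(
    sessions: Iterable[dict],
    keep: Optional[Callable[[dict], bool]] = None,
) -> int:
    """Count distinct visitor IDs, optionally restricted by a dimension filter."""
    seen_ids = set()  # every distinct ID is held in memory once
    for session in sessions:
        if keep is not None and not keep(session):
            continue  # session does not match the non-time dimension filter
        seen_ids.add(session["visitor_id"])
    return len(seen_ids)
```

For example, `count_unique_visitors_by_id(sessions, keep=lambda s: s["browser"] == "Firefox")` would count the distinct visitors who used Firefox. The set grows with the number of visitors, which is the memory cost the post mentions.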

The reason there are two calculations is that Google Analytics wants to provide a fast user experience. The main overview report gets viewed many times, so to keep the experience fast, the timestamp method is used. In other custom reports, GA wants the data to be as accurate as possible, so the Visitor ID approach is used.

6 comments:

  1. "Each session has a timestamp of the first hit in the previous session"

    -- This one is not very clear, sir. Kindly shed some light here with an example :)

    Replies
    1. Ravi,

      Let me elaborate on the section "If you query for Visitors with only time/date dimensions" a little more.

      Unique Visitors are counted only once no matter how many times they visit the site during the selected reporting period.

      A Unique Visitor metric is only valid for its given set of dimensions e.g. time, browsers. For example a website may have 100 Unique Visitors on each day (day being the dimension) of a particular week. With only this data, one cannot extrapolate the number of weekly Unique Visitors (only that the Unique Visitor count for the week is between 100 and 700). A Unique Visitor is counted only once within the timescale.
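The 100-to-700 range in the example above follows from two extremes: either the same visitors return every day (lower bound, the largest daily count) or nobody ever returns (upper bound, the sum of the daily counts). A small sketch, with a hypothetical function name:

```python
from typing import Sequence, Tuple


def unique_visitor_bounds(daily_counts: Sequence[int]) -> Tuple[int, int]:
    """Return (lowest, highest) possible unique visitors for the whole period."""
    lowest = max(daily_counts)   # the same people came back every day
    highest = sum(daily_counts)  # every day brought entirely new people
    return lowest, highest


week = [100] * 7
bounds = unique_visitor_bounds(week)
```

Daily counts alone cannot narrow the figure any further, which is why the weekly number has to be computed separately.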

      Calculating the true (Absolute) Unique Visitor number across an arbitrary time period, or across multiple weeks or months, is very computationally intensive.

      Increased computational intensity for the web analytics vendor means more processing time and higher costs. So doing daily, weekly and monthly counts (and then summing them up) is cheaper for them. Among web analytics vendors, Google Analytics is one of the few that provides the truly de-duplicated Absolute Unique Visitor metric (in aggregate, but not segmented). Only time will tell whether Google will buckle under the computation and cost burden and stop providing true Absolute Unique Visitors.

      To explain the calculation with a simplified example, let me take the analogy of a Hotel Room Occupancy.

      The way to picture the situation is by imagining a hotel. The hotel has two rooms (Room A and Room B).

      Room --- Day 1 --- Day 2 --- Day 3 --- Total
      Room A --- John --- John --- Mark --- 2 Unique Visitors
      Room B --- Mark --- Jane --- Jane --- 2 Unique Visitors
      Total --- 2 --- 2 --- 2 --- ?

      As the table shows, the hotel has two Unique Visitors each day over three days. The sum of the totals with respect to the days is therefore six.

      During the period each room has had two Unique Visitors. The sum of the totals with respect to the rooms is therefore four.

      Actually only three Visitors have been in the hotel over this period. The problem is that a Visitor who stays in a room for two nights will get counted twice if you count them once on each day, but is only counted once if you are looking at the total for the period. Any software for web analytics will sum these correctly for whatever time period, thus leading to the problem when a user tries to compare the totals.
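The three different totals from the hotel analogy can be reproduced with Python sets. The `occupancy` dictionary below simply restates the table; the variable names are illustrative.

```python
# Guest in each room on Day 1, Day 2 and Day 3, as in the table above.
occupancy = {
    "Room A": ["John", "John", "Mark"],
    "Room B": ["Mark", "Jane", "Jane"],
}
days = 3

# Unique guests per day, then summed across days.
daily_uniques = [
    len({guests[day] for guests in occupancy.values()}) for day in range(days)
]
sum_over_days = sum(daily_uniques)

# Unique guests per room, then summed across rooms.
room_uniques = {room: len(set(guests)) for room, guests in occupancy.items()}
sum_over_rooms = sum(room_uniques.values())

# Unique guests over the whole period, de-duplicated once.
true_uniques = len({g for guests in occupancy.values() for g in guests})

print(sum_over_days, sum_over_rooms, true_uniques)
```

Summing per-day or per-room uniques counts some guests more than once; only the single de-duplication over the whole period gives the real number of guests.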

      Thus, the uniqueness of a Visitor needs to be computed on the fly, depending on the time period selected and on when the Visitor's first Visit happened.

      This problem is solved by GA's cookie, which stores the timestamp of a Visitor's first Visit. The format of the cookie is:


      utma-cookie=Domain-Hash.Visitor-Token.First-Visit-Start.Previous-Visit-Start.Current-Visit-Start.Visit-Count

      Domain-Hash --> The first number is the domain hash. This is set by all cookies from this domain.

      Visitor-Token --> The second number is a random "Unique Visitor ID".

      First-Visit-Start --> The third number is the unix time stamp for the initial / first Visit and is set as soon as Visitor enters the site.

      Previous-Visit-Start --> The fourth number is the unix time stamp for the previous session (Visit).

      Current-Visit-Start --> The fifth number is the unix time stamp for the current session. If you are on the first pageview of the site then all three time stamps will be the same.

      Visit-Count --> This is the number of the session (Visit). So if this data has been written once before (since the last time you cleared your cookies) then this number will be 2.

      So, the "First-Visit-Start" is the most crucial field for the on-the-fly calculation of the uniqueness of a Visitor.
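As a rough illustration of the cookie layout described above, the six dot-separated fields can be split into named values. `UtmaCookie`, `parse_utma` and `first_visit_in_range` are hypothetical helpers, and the cookie value is made up for the example; real cookies may need extra handling that this sketch ignores.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class UtmaCookie(NamedTuple):
    domain_hash: int
    visitor_token: int
    first_visit_start: int     # Unix timestamp
    previous_visit_start: int  # Unix timestamp
    current_visit_start: int   # Unix timestamp
    visit_count: int


def parse_utma(value: str) -> UtmaCookie:
    """Split a utma value into its six numeric fields."""
    parts = value.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 dot-separated fields, got {len(parts)}")
    return UtmaCookie(*(int(p) for p in parts))


def first_visit_in_range(cookie: UtmaCookie, start: int, end: int) -> bool:
    """True if the visitor's first-ever visit began inside [start, end)."""
    return start <= cookie.first_visit_start < end


# Made-up value, for illustration only.
example = parse_utma("12345678.987654321.1337472000.1337558400.1337644800.3")
first_seen = datetime.fromtimestamp(example.first_visit_start, tz=timezone.utc)
```

Here `example.visit_count` is 3, and `first_seen` is the first-visit time as a UTC datetime.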

      Hope this clarification helps !

  2. What's the difference between Absolute Unique Visitors and Unique Visitors? GA shows both in its reports.

  3. Unique Visitor v/s Absolute Unique Visitor

    The metrics themselves actually differ slightly in the way they measure Visitors (how they tabulate the individual cookies). So the numbers won’t match up exactly.

    In fact, Unique Visitors is actually more accurate than Absolute Unique Visitors. Unique Visitors will usually show a slightly greater number. Since Unique Visitors is the more accurate and flexible metric, Absolute Unique Visitors is being phased out (in GA) over time.

    The technical difference works like this:
    Google Analytics stores an identifier in each visitor’s __utma cookie that identifies them as unique. It’s created from a hash value, which means that different visitors can occasionally get the same value — this is known as a “hash collision”, see http://en.wikipedia.org/wiki/Collision_(computer_science).
    Absolute Unique Visitors just uses the hash values. Unique Visitors uses the hash value plus the user agent (browser identifier) to help correct for hash collisions. As a result, UV should be greater than or equal to AUV. I don’t usually see much difference, on the order of a couple percent, but UV is more accurate because of the hash collision correction.
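A toy sketch of the difference described above, assuming each session is reduced to a `(visitor_hash, user_agent)` pair. The function names and the sample data are made up, and this is not GA's actual code.

```python
from typing import Iterable, Tuple

SessionKey = Tuple[int, str]  # (visitor hash, user agent)


def count_absolute_unique_visitors(sessions: Iterable[SessionKey]) -> int:
    """Distinct hash values only; colliding visitors merge into one."""
    return len({visitor_hash for visitor_hash, _ in sessions})


def count_unique_visitors(sessions: Iterable[SessionKey]) -> int:
    """Distinct (hash, user agent) pairs; separates some collisions."""
    return len({(visitor_hash, agent) for visitor_hash, agent in sessions})


sample = [
    (1001, "Firefox"),
    (1001, "Firefox"),  # same visitor returning
    (1001, "Chrome"),   # a different visitor whose hash happens to collide
    (2002, "Chrome"),
]
auv = count_absolute_unique_visitors(sample)
uv = count_unique_visitors(sample)
```

Because every distinct hash appears in at least one pair, the pair count can never be smaller than the hash count, which matches the statement that UV should be greater than or equal to AUV.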

  4. We can say that 'Absolute Unique Visitors' are first-time visitors to the site, whilst 'Unique Visitors' are first-time visitors to the site within the selected time frame. Analytics tracks the activity of a visitor by recording their unique browser ID, which is provided via the __utma cookie.

    For example, if one user visited a website using two different web browsers, such as Firefox and Internet Explorer, Google Analytics would class this as 2 unique visitors. However, if two people using the same computer and browser visit the site, this is registered as one unique visitor.

    Replies
    1. Manish,

      No, the metrics "Absolute Unique Visitors" and "Unique Visitors" do not mean "first time visitors to the site".

      Suppose you have visited a site daily for the last few months. If I look at the AUV or UV for, say, this week, you will be counted (although you are not a first-time visitor within this period).

      The difference between AUV & UV was the way the Visitor-ID was calculated using UTMA Cookie.

      But, you can safely forget the difference as GA has already phased out AUV.
