18 Nov 2012

Google Analytics Limits and the "(Other)" Bucket

Each Standard Report in Google Analytics (GA) is pre-calculated on a daily basis into what GA calls "dimension value aggregates". Each pre-calculated report stores only 50,000 rows per day. The top 49,999 rows get their actual values, and the 50,000th row is labelled "(other)" and holds the sum of all the remaining row values.
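To make the roll-up concrete, here is a minimal sketch of the behaviour described above (an illustration only, not GA's actual code; the row-limit constant and the input structure are my assumptions):

    # Illustration: roll up one day's per-landing-page visit counts into a
    # capped aggregate table, bucketing the long tail into "(other)".
    ROW_LIMIT = 50_000  # per-day row cap for pre-calculated reports

    def aggregate_day(visits_by_page, row_limit=ROW_LIMIT):
        """visits_by_page: dict mapping landing page -> visit count for one day."""
        ranked = sorted(visits_by_page.items(), key=lambda kv: kv[1], reverse=True)
        top = dict(ranked[:row_limit - 1])            # top 49,999 rows keep their values
        tail = ranked[row_limit - 1:]                 # the long tail...
        if tail:
            top["(other)"] = sum(v for _, v in tail)  # ...is summed into a single row
        return top

Totals stay correct because nothing is dropped - the tail is merely collapsed into one row.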


It's a “good problem to have” - more than 50,000 Landing Pages per day - wow!

In the above illustration (with a one-day date range), we notice the "(other)" bucket in the Landing Pages report because we are sending more than 50,000 landings per day to this standard report. Generally this works fine: the totals are always correct, and most people only view the top 100 results and never scroll anywhere near the 49,999th row. But when I try to do a long-tail analysis of Landings vs. Bounces with Estimated True Value, so as to arrive at a list of pages to improve, I get bottle-necked. The problem gets worse when we select a date range spanning several weeks.

For multi-day reports, a page that is grouped into the "(other)" category one day may not be grouped into it another day. So when running a report for a multi-day date range, you may run into inconsistencies, as a page (or other dimension value) in the long tail may sit in the "(other)" bucket on some days and in its own row on others.

Further, for multi-day standard reports, the maximum number of aggregated rows per day is 1,000,000/D, where D is the number of days in the query. For example:
A report for the past 30 days would process at most 33,333 rows per day (1,000,000/30).
A report for the past 60 days would process at most 16,666 rows per day (1,000,000/60).

Is there a way to get around the "(Other)" bucket issue?

Yes, we can partially circumvent the "(Other)" bucket issue. I say partially because we will only see unsampled data up to 250K visits, after which GA's sampling algorithm kicks in.

We can create an advanced segment that matches all sessions and apply that segment to a standard report. For example, we can create an advanced segment on the dimension Visitor Type that matches the regular expression .* (this is NOT the same as applying the "All Visits" segment).
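The same trick works through the Core Reporting API. Here is a rough sketch using google-api-python-client; the profile ID, the pre-authorized http object and the exact segment expression are placeholders of mine, so treat it as a starting point rather than a recipe:

    # Rough sketch (Core Reporting API v3). `http` must already be authorized
    # (OAuth); `profile_id` is your view/profile ID - both are placeholders.
    from googleapiclient.discovery import build  # google-api-python-client

    def landing_pages_unbucketed(http, profile_id, day):
        service = build('analytics', 'v3', http=http)
        return service.data().ga().get(
            ids='ga:%s' % profile_id,
            start_date=day, end_date=day,
            metrics='ga:entrances,ga:bounces',
            dimensions='ga:landingPagePath',
            # "match everything" dynamic segment: forces an on-the-fly
            # calculation, bypassing the 50k-row pre-aggregated table
            segment='dynamic::ga:visitorType=~.*',
            max_results=10000).execute()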


Let us see the original report with this Advanced Segment applied.

Wow, it works !

In cases where the report query cannot be satisfied by existing aggregates (i.e. pre-aggregated tables), GA goes back to the raw session data to compute the requested information. This applies to reports with Advanced Segments too: reports with advanced segments use the raw session and hit data to re-calculate the report on-the-fly.

Typically, advanced segments are used to include or exclude sessions from being processed. But when we create a segment that matches all sessions, we end up simply bypassing the pre-calculated tables and forcing the entire report to be re-calculated.

A few points to note: the numbers between pre-calculated and on-the-fly calculated reports may differ, as each type of report has different limits. Pre-calculated reports only store 50k rows of data per day but process all sessions (visits).

Reports calculated on-the-fly can return up to 1 million rows of data, but they only process 250k sessions (visits); beyond 250k visits, sampling kicks in. The 250k sample size is the default and can be raised with the sampling slider to a maximum of 500k.

So this solution works best when we have fewer than 500k visits in our date range. (We can find the number of sessions in the date range by looking at the Visits metric in the traffic overview report.)

References:
How Sampling Works in Google Analytics
http://blog.intrapromote.com/google-analytics-50000-row-limit/
https://plus.google.com/112976464453422312311/posts/FtFtkCCXkr3
https://plus.google.com/112976464453422312311/posts/BCwbdDsXwet

20 May 2012

How Unique Visitors Are Calculated in Google Analytics

The Visitor Metrics data in Google Analytics don't match up in various parts of the UI and API. So here's how they are calculated (as explained by Nick Mihailovski from the GA team).

This metric is extremely powerful because it represents "reach" of a site, and gives you a true view of total visitors for most combinations of dimensions, across the date range.

Currently there are two calculations of Unique Visitors in Google Analytics; which one is used depends on the other dimensions present in the query:

If you query for Visitors with only time/date dimensions:

Each session carries a timestamp of the first hit of the previous session (__utma cookie format = Domain-Hash.Visitor-Token.First-Visit-Start.Previous-Visit-Start.Current-Visit-Start.Visit-Count). As Google Analytics goes through all the sessions in the date range, it increments Visitors if that previous timestamp is before the start of the date range. This works well because it requires no memory, so it's fast, and it is how the overview reports are calculated. The only issue is that if the browser's clock is off, the timestamps will be incorrect, leading to some bad data.
In Custom Reports this metric is called Visitors.
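A minimal sketch of that counting rule, exactly as described above (the session structure and field name are my own illustration; GA obviously does not work on Python dicts):

    # Timestamp method: count a visitor when the previous-visit timestamp
    # (from the __utma cookie) falls before the start of the date range.
    def count_visitors_by_timestamp(sessions, range_start):
        """sessions: iterable of dicts with a 'previous_visit_start' epoch value."""
        visitors = 0
        for s in sessions:
            if s['previous_visit_start'] < range_start:
                visitors += 1
        return visitors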


If you query for Visitors with any other dimension, or include a filter of a non-time dimension:

Each session also has a Visitor ID. This ID is the same value for a Visitor for all their sessions. As Google Analytics processes each session, it stores each ID in memory once, then returns the total count. So while this method is a bit more reliable in calculating data, it requires memory and is a bit slower.

In Custom Reports this metric is called Unique Visitors.
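And a sketch of the second method, which trades memory for accuracy (again, the field name is my own illustration):

    # Visitor ID method: remember each ID once, then count the distinct IDs.
    def count_visitors_by_id(sessions):
        """sessions: iterable of dicts with a 'visitor_id' value."""
        seen = set()
        for s in sessions:
            seen.add(s['visitor_id'])
        return len(seen)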

In the GA API both calculations are mapped to ga:visitor and one is picked depending on the dimensions selected.

The reason there are two calculations is that Google Analytics wants to provide a fast user experience. The main overview report gets viewed many times, so to keep the experience fast, the timestamp method is used. In custom reports, GA wants the data to be as accurate as possible, so the Visitor ID approach is used.

22 Apr 2012

Should I Check eMail ?

I stumbled upon an interesting post, Managing Distraction: How and Why to Ignore Your Inbox. The whole article is worth reading, but the graphic says it all in short.


15 Apr 2012

Most Effective SEO Tactics - Content is the King

In order of effectiveness, the most important SEO tactics adopted by search engine marketers are as follows:
  1. Content Creation - "Content is King"
  2. Keyword and Key-Phrase Research
  3. Title Tags
  4. SEO Landing Pages
  5. External Link Building
  6. URL Structure
  7. Blogging
  8. Meta Description Tags
  9. Digital Asset Optimization (images, videos, podcast, webinars, PDFs etc)
  10. Social Media Integration
  11. XML Sitemap
  12. Internal Linking
  13. Competitor Benchmarking

According to the 2012 MarketingSherpa Search Engine Marketing Report, of all SEO tactics available, “content creation works the best, but takes the most work,” says Kaci Bower, Research Analyst, MECLABS.
I always say that if you just consistently focus on the content, even if you forget everything else, your site is sure to be a winner.

Here is a graphic from MarketingSherpa which gives an idea of effort vs. effectiveness of SEO tactics.

Effectiveness of SEO Tactics

11 Mar 2012

Understanding Google Crawling & Indexing

Pierre Far (Webmaster Trends Analyst at Google) spoke on "Understanding Google Crawling & Indexing" at Think Visibility SEO conference at Alea Casino (Leeds) on 3rd March 2012.

I have tried to sum up the points he touched in his presentation (collected from various Blogs and tweets). Plus, I have added my own interpretation.

Google gets URLs through crawling, links, sitemaps and the Add URL feature.

There are always more URLs than Google can fetch, so they try to get as many as possible without destroying your website. To do this they use a relaxed crawl rate.

Google increases the URL crawl rate slowly and watches whether response time goes up. If your site can't handle the crawler, they will not crawl much of your site.

Google apparently checks robots.txt only about once per day, to help keep the load off your server. And having a +1 button on your site can override robots.txt? Both of these points are interesting to me.


Google sets a conservative crawl rate per server, so too many domains or URLs on one server will reduce the crawl rate per URL. If you use shared hosting, this could easily be a problem for you. If you do not know how many other websites share your IP address, you may be surprised: you can check by putting your domain or IP address into Majestic’s neighborhood checker to see how many other websites are hosted on the same IP address. If another site on the same IP has a large number of URLs and it is not yours, you could be losing crawl opportunities simply because a big site that isn't connected to you in any way sits on the same IP. You can't really go complaining to Google about this - you bought the cheap hosting, and this is one of the sacrifices you made.

Google crawls more pages than those in your sitemap, but the sitemap does help them decide which pages are more popular.

If a CMS produces huge duplication, Google knows, and this is how it notifies you of duplicates in Google Webmaster Tools. This is interesting because it is more efficient to realize a site has duplicate URLs at this stage than after Google has had to analyze all the data and de-duplicate on your behalf. Google then picks URLs to crawl in a chosen order; one important factor in choosing one page over another is the change rate of the page content.

Googlebot can be blocked from accessing your server, so you need to make sure your hosts have no issues, or Google will think your site is down. Both the biggest and the smallest ISPs can block Googlebot at the ISP level. Because ISPs need to protect their bandwidth, the fact that you want Google to visit your site does not necessarily mean it will happen. Firewalls at the ISP may block bots before they even see your home page, or (more likely) start throttling them. So if your pages are taking a long time to get indexed, this may be a factor.

Strong recommendation – set up email notifications and email forwarding in Webmaster Tools as a priority. This is very important so you don't miss any error messages.

Make sure your 404 page delivers a 404 status – or it will get indexed, which happens a lot. Soft error pages create an issue, so Google tries hard to detect them; if it can't, the soft error consumes a crawl slot (at the expense of another URL, maybe). If you don't know what a soft error is: it is when an error page returns a 200 response instead of a 404 response. You can use the Firefox add-on Live HTTP Headers to check this.
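You can also check for soft errors with a couple of lines of Python (a minimal sketch using only the standard library; the probe path is a deliberately made-up URL):

    # Request a URL that should not exist and see whether the server answers
    # 200 (a "soft 404") instead of a real 404.
    import urllib.request, urllib.error

    def is_soft_404(base_url, probe_path='/this-page-should-not-exist-12345'):
        try:
            resp = urllib.request.urlopen(base_url.rstrip('/') + probe_path)
            return resp.getcode() == 200   # got a page where a 404 was expected
        except urllib.error.HTTPError:
            return False                   # a proper 404/410 error status came back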

Google has to pick the best URL and title for your content. They can change it to better match the query. They then generate a snippet and site links. Changing them improves the CTR. It’s as if you are writing a different title for each query.

If the server spikes with 500 errors, Googlebot backs off. Firewalls etc. can also block the bot; after a few days this can create a state in Google that says the site is dead. If Googlebot gets a 503 error on robots.txt, it stops crawling altogether. So be careful: if only some part of your site is offline, do not serve a 503 on robots.txt.

Googlebot is getting better and better at seeing JavaScript / ajax driven sites and pages.

For displaying a result, Google needs to:
  1. Pick a URL.
  2. Pick a title: usually the title tag, though sometimes it is changed based on the user query. This is a win-win for everyone.
  3. Generate a snippet: Google will create it from what is on the page, but strongly recommends using Rich Snippets.
  4. Generate site-links: whether these appear depends on the query and the result. If you see a bad site-link (a wrong link), check for a canonicalisation issue.

Pierre pointed out that all this is in the Google Webmaster Documentation - http://support.google.com/webmasters/?hl=en

4 Mar 2012

The Great Engineering Elephant



This gigantic animal, 12 metres high by 8 metres wide, lives in the largest warehouse.

Massive, fully-functioning robots made from reclaimed materials are a green tech lover's dream come true.

Inspired by the Sultan's Elephant, an interactive show featuring a mechanized elephant, the massive robot looks surprisingly lifelike aside from a few nuts and bolts and some joints at the trunk and legs showing.

The 12-metre-high by 8-metre-wide elephant was pieced together using 45 tons of reclaimed wood and steel.

When the majestic animal goes out for its walk, it is like architecture in motion departing a steel cathedral. The 49 passengers on board embark on an amazing journey on the Ile de Nantes. Each time the pachyderm goes out, it is a unique spectacle for everyone to enjoy.

From the inside, the passengers will be able to see the moving gears that power the legs. They can make the elephant trumpet and control some its movements, thus becoming truly a part of the Machine. On the back of the Elephant, it’s like being on the 4th floor of a moving house, with a breathtaking view of the banks of the Loire River. In this time-travelling carriage, the passengers can voyage to the imaginary world of Jules Verne in the city where he was born.

12 m high and 8 m wide
50 tonnes
Wood: American Tulip
Metallic carcass irrigated by 3000 liters of hydraulic oil
450 HP engine
An indoor lounge with French doors and balconies
A terrace accessible via stairways
Route: Approximately 45 minutes
Speed – 1/3 km per hour

I don't think that you will be able to find a more impressive reclaimed robot...I dare you.

19 Feb 2012

Some Tips on Learning SEO for Newbies

While surfing the net I found an interesting post by Bill Slawski (Founder and President of SEO by the Sea) and thought I would share it. (I have also added additional views from other experts on the subject.)

1. Start a blog on a subject that you enjoy enough to write about regularly and that isn't critical to the success or failure of your business. Use it as a testing ground, and experiment with plugins, with the layout, with the graphics, with themes, with different styles of blog posts, and more. Optimize your blog, install Google Analytics and verify the blog in Google Webmaster Tools.

2. Read everything you can from the Search Engines on their help pages, their corporate pages, their blogs, their patents and whitepapers and more. Don't limit yourself to Google, but look at other search engines as well, including Yahoo, Bing, Blekko, Duck Duck Go, WolframAlpha, etc. Do the same with social networks such as Facebook, Stumbleupon, Twitter, Google Plus. Watch the video tutorials from Google on Google Analytics, on Adwords, etc. Look at these as primary sources, but try to understand the motivations behind what they share and why.

3. Visit the Google Webmaster Central Help forums (http://www.google.com/support/forum/p/Webmasters?hl=en) everyday and learn about the problems and issues that site owners have, and see if you can figure out solutions for those problems. You don't necessarily have to post responses or solutions or suggestions to the people who bring their problems to the forums, but you can if you want to.

4. Study successful sites, or sites that you think are successful. Try to understand why they are. Why does Wikipedia tend to rank so well in search results for instance? What positive or negative things are sites like Huffington Post or Sears or Dell doing? Think critically about their designs, their usability, their methods of communication, their use of social networks, how they optimize their pages. Do they use robots.txt files? Do they seem to focus upon specific keyword phrases for different pages? Do they have unique URLs for each page on their site?

5. Try out and get involved with other online services such as Google Mapmaker and Google Earth, Wikipedia, Hacker News, Facebook, Google Plus, and others. Learn as much as you can about how they work, what their rules and policies are, and how and why people use them.

6. Find tutorials on HTML, CSS, PHP, Perl, and other web-based technologies. Search for [tutorial css] for example, to find some. You can often find some good ones on .edu sites, so try a search for something like [site:edu tutorial css]. Visit lots of sites and look at the source code for those pages to see how they do what they do.

7. Visit web design, SEO, and technology forums, and read, but with a critical eye. Lurk for a fair while before you participate in any of them, and learn about the people who are participating there. Take just about everything you read with some rational skepticism and with an intent to test and try things out on your own. Maintain a civil and polite presence, follow the forum rules, and avoid arguments. It's OK to "agree to disagree" with someone. Don't believe everything you read or hear. Look at where you are finding the information: is it reputable, is it trustworthy, is it well written, etc.? Further, look for dates and times - see if you can identify just how old the information may be. Sometimes even reputable, informed and knowledgeable people get it wrong.

We all have to start somewhere. With SEO, doing some hands-on things like starting a blog or editing Wikipedia pages or setting up reports in Google Analytics or attempting to diagnose problems at the Webmaster Central Help forums can help more than reading a book or some articles about SEO. Learning to think like an SEO, so that you can analyze sites, create great user experiences, and solve potential problems is a large part of what SEOs do.

Think things through. Ask yourself solid questions such as:

Why would that count?
How would that benefit searchers/readers/users?
How can G verify/confirm/utilise that data?
How easily can that be misused/abused/faked?

If you cannot figure those sorts of questions out fairly quickly - then you may want to take it with a pinch of salt.

Don't jump to conclusions, make assumptions or think that X=Y just because you saw it once. If at all possible, come up with your theory, then figure out ways to test it and prove it wrong. Only after several failed attempts to disprove it should you believe you may be onto something.

Take whatever advantages you can get.

Some forums (such as GWC) get the occasional Googler, or people in contact with them. It's worth reading and asking and listening. Then there are Hangouts - where you may get direct access to a Googler (very helpful!). There are also numerous videos by Matt Cutts (and possibly others?) ... watch and listen.

These are not really for beginners but for more sophisticated users:

Dive into different cultural/geographic areas and find out what they do differently or better, e.g. Baidu, Yandex, etc.

Understand strategies these companies have and how they might implement them (against each other). React accordingly.

In this manner also watch acquisitions of companies/patents closely and ask yourself why they were done.

Build your own (simple) crawler and understand the problems of indexing, e.g. DOM tree manipulation, XHR, third party artifacts (ActionScript, Browsercache etc.) and low latency networks.
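If you want to try the crawler exercise, here is a deliberately naive single-host sketch (standard library only, my own illustration, not a production tool). It ignores JavaScript/XHR content entirely, which is exactly one of the hard problems worth discovering for yourself:

    # Minimal single-host crawler: fetch pages, extract <a href> links, respect
    # robots.txt, stay on one host, and stop after max_pages pages.
    import urllib.request, urllib.robotparser
    from urllib.parse import urljoin, urlparse
    from html.parser import HTMLParser
    from collections import deque

    class LinkParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == 'a':
                href = dict(attrs).get('href')
                if href:
                    self.links.append(href)

    def crawl(start_url, max_pages=50):
        parsed = urlparse(start_url)
        robots = urllib.robotparser.RobotFileParser(
            '%s://%s/robots.txt' % (parsed.scheme, parsed.netloc))
        robots.read()
        queue, seen = deque([start_url]), set()
        while queue and len(seen) < max_pages:
            url = queue.popleft()
            if url in seen or not robots.can_fetch('*', url):
                continue
            seen.add(url)
            try:
                html = urllib.request.urlopen(url, timeout=10).read().decode('utf-8', 'replace')
            except Exception:
                continue  # dead link, timeout, non-HTML response, ...
            parser = LinkParser()
            parser.feed(html)
            for href in parser.links:
                link = urljoin(url, href)
                if urlparse(link).netloc == parsed.netloc:
                    queue.append(link)
        return seen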

Understand how information is processed within data centers to leverage onsite keyword selection, i.e. stemming, ETL process etc.


Any other suggestions?

13 Feb 2012

Google Analytics has updated the default Search Engines List

Webmasters using Google Analytics know that the organic traffic data (Traffic Sources -> Sources -> Search -> Organic) is automatically populated on the basis of a default search-engine list maintained by Google.

Daniel Waisberg at Search Engine Land says he has confirmed with Google that a few more search engines have been added to the list: rakuten.co.jp, biglobe.ne.jp, goo.ne.jp, startisden.no/sok, search.conduit.com, search.babylon.com, search-results.com, isearch.avg.com, search.comcast.net, and search.incredimail.com.

Google has also fixed a long-pending issue in how they recognized search engines. Before this change, if a URL contained the word “search” and a query parameter “q”, Google would attribute it to the search engine search.com, which led to inaccurate reports, especially as a consequence of big customized search engines such as Conduit, Babylon and others.
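To see why that heuristic over-counted search.com, here is a toy sketch of the old and new attribution as I understand them from the description above (the rule and the function names are my simplification, not GA's code):

    # Toy illustration of the attribution change (my simplification, not GA code).
    from urllib.parse import urlparse, parse_qs

    KNOWN_ENGINES = {                  # a tiny excerpt of the explicit list
        'search.conduit.com': 'conduit',
        'search.babylon.com': 'babylon',
        'isearch.avg.com': 'avg',
    }

    def old_attribution(referrer):
        """Old heuristic: 'search' in the host + a 'q' parameter => search.com."""
        parsed = urlparse(referrer)
        if 'search' in parsed.netloc and 'q' in parse_qs(parsed.query):
            return 'search.com'
        return parsed.netloc

    def new_attribution(referrer):
        """After the change: explicitly listed engines keep their own name."""
        host = urlparse(referrer).netloc
        return KNOWN_ENGINES.get(host, old_attribution(referrer))

    print(old_attribution('http://search.conduit.com/Results.aspx?q=shoes'))  # search.com
    print(new_attribution('http://search.conduit.com/Results.aspx?q=shoes'))  # conduit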

Earlier, whenever I looked at the organic search traffic sources data, the data attributed to "search" (search.com) always mystified me - the search.com source always looked heavily over-counted.

It seems that from 1st February 2012, GA has changed the logic, so that customized search engines (such as the ones shown in the list below) will no longer be shown as search.com.




GA has also explicitly added known large customized search engines with “search” in their referring domain to the list of known search engines:

    http://search.conduit.com
    http://search.babylon.com
    http://search-results.com
    http://isearch.avg.com
    http://search.comcast.net
    http://search.incredimail.com

Basically, if you receive a large amount of organic traffic, you will probably see your search.com organic traffic going down, and other search engines will start to appear as a source (such as the customized search engines shown in the list above). But your Google or Bing organic will not change.


9 Feb 2012

Google Screenwise Pays You To Give Up Privacy & Surf The Web With Chrome

Google is quietly taking requests from web users who want to get paid to surf the web using the Chrome browser while sharing data with Google. The program is called Screenwise and, though we’re not aware of any official announcement, Google has a signup page at http://www.google.com/landing/screenwisepanel.

The page explains that Google wants to create a panel of people to help it “learn more about how everyday people use the Internet.” It explains that panel members have to be at least 13 years old, have (or sign up for) a Google account and use the Chrome web browser. They also have to be willing to let Google track their web surfing activity:
As a panelist, you’ll add a browser extension that will share with Google the sites you visit and how you use them. What we learn from you, and others like you, will help us improve Google products and services and make a better online experience for everyone.
In exchange for that, panel members get a $5 Amazon gift card code for installing the browser extension, and then can earn another $5 Amazon code for every three months that they continue in the Screenwise program. The sign-up page advertises a $25 max total payment, but the fine print says Google will decide later what payment, if any, will be given for panelists who continue longer than a year.

Amazon isn’t involved in the promotion; Google says it’s using the online research firm Knowledge Networks as its “panel management partner” for Screenwise.


27 Jan 2012

Interesting Advertising Quotes

The Ad Contrarian blog is a great mix of astonishingly incisive commentary, irreverence and hilarity. If you are connected to advertising and marketing, you might love it.

Here are some delicious quotes from the right navigation of the blog .... interesting !

"Creative people make the ads. Everyone else makes the arrangements."

"Brand studies last for months, cost hundreds of thousands of dollars, and generally have less impact on business than cleaning the drapes."

"Nobody really knows what "creativity" is. Every year thousands of people take a pilgrimage to find out. This involves flying to Cannes, snorting cocaine, and having sex with smokers."

"Marketers always overestimate the attraction of new things and underestimate the power of traditional consumer behavior."

"If you're looking for perfection, you came to the wrong planet."

"We don’t get them to try our product by convincing them to love our brand. We get them to love our brand by convincing them to try our product."

"As an advertising medium, the web is like communism. It's never very good right now, but it's always going to be great some day."

"In American business, there is nothing stupider than the previous generation of management."

"If the message is right, who cares what screen people see it on? If the message is wrong, what difference does it make?"

"The only form of product information on the planet less trustworthy than advertising is the shrill ravings of web maniacs."

"In the entire history of civilization, nothing good ever happened to a teenager after midnight."

"There's no bigger sucker than a gullible marketer convinced he's missing a trend."

"All ad campaigns are branding campaigns. Whether you intend it to be a branding campaign is irrelevant. It will create an impression of your brand regardless of your intent."

"Nobody ever got famous predicting that things would stay pretty much the same."

24 Jan 2012

Web Page Loading Time Affects the Bottom Line

Web Page loading time is obviously an important part of any website’s user experience. And many times we’ll let it slide to accommodate better aesthetic design, new nifty functionality or to add more content to web pages. Unfortunately, website visitors tend to care more about speed than all the bells and whistles we want to add to our websites. Additionally, page loading time is becoming a more important factor when it comes to search engine rankings.

I read the two related articles from KISSmetrics:
How Loading Time Affects Your Bottom Line
Speed Is A Killer – Why Decreasing Page Load Time Can Drastically Increase Conversions

I am reproducing the infographic from KISSmetrics here below.


Click here to download a pdf version of this infographic from KISSmetrics.

As per Avinash Kaushik, there are three more factoids (not in the infographic, but still connected to impact of speed):

Google: If search results are slowed by even a fraction of a second, people search less. A 400ms delay leads to a 0.44 percent drop in search volume!

Edmunds: Reduced load times from 9 secs to 1.4 secs, ad revenue increased three percent, and page views-per-session went up 17 percent!

Shopzilla: Dropped latency from 7 seconds to 2, revenue went up 12 percent and page views jumped 25 percent. They also reduced their hardware costs by 50 percent!

22 Jan 2012

Chrome blocks HTTP auth for subresource loads

We had a website in the production environment at http://www.example.com. Its page-construction-related static subresources (CSS, JS, JPG, GIF & PNG) are served from http://eimg.com. Why two separate domains? To serve the static content from a cookieless domain.

For development we had an environment at http://dev.example.com, with subresources at http://dev.eimg.com. Both these development domains are password-protected with HTTP Basic Authentication.

Issue:

On http://dev.example.com the static subresources are being served from http://dev.eimg.com. Both are password protected by HTTP Basic Authentication.

Everything works fine in Firefox and MS Internet Explorer: on accessing the development website, the browser asks for a password twice - once for http://dev.example.com and again for http://dev.eimg.com.

But Chrome does not ask for the second password (for http://dev.eimg.com) and simply receives HTTP status 401. It therefore blocks all content from dev.eimg.com and renders the page without stylesheets and page-construction images.
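You can reproduce what Chrome is seeing with plain HTTP requests (a minimal sketch; the hostnames are the placeholders used in this post and the asset path is invented):

    # Fetch a Basic-Auth-protected cross-origin subresource with and without
    # credentials to see the 401 that Chrome silently swallows.
    import base64, urllib.request, urllib.error

    def fetch_status(url, user=None, password=None):
        req = urllib.request.Request(url)
        if user is not None:
            token = base64.b64encode(('%s:%s' % (user, password)).encode()).decode()
            req.add_header('Authorization', 'Basic ' + token)
        try:
            return urllib.request.urlopen(req).getcode()
        except urllib.error.HTTPError as e:
            return e.code

    # On the setup described above (placeholder hostnames/paths):
    #   fetch_status('http://dev.eimg.com/css/site.css')                   -> 401
    #   fetch_status('http://dev.eimg.com/css/site.css', 'user', 'secret') -> 200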

Google-Chrome Developers say:
This behavior is an intentional change as a phishing defense. Sites shouldn't be doing this kind of authorization cross-origin. If you need to allow this behavior, launch chrome with the --allow-cross-origin-auth-prompt flag.
Read: http://code.google.com/p/chromium/issues/detail?id=91814

Further Read: http://blog.chromium.org/2011/06/new-chromium-security-features-june.html. It says:
Chromium 13: blocking HTTP auth for subresource loads
There’s an unfortunate conflict between a browser’s HTTP basic auth dialog, the location bar, and the loading of subresources (such as attacker-provided <img> tag references). It’s possible for a basic auth dialog to pop up for a different origin from the origin shown in the URL bar. Although the basic auth dialog identifies its origin, the user might reasonably look to the URL bar for trust guidance. To resolve this, we’ve blocked HTTP basic auth for subresource loads where the resource origin is different to the top-level URL bar origin. We also added the command line flag switch --allow-cross-origin-auth-prompt in case anyone has legacy applications which require the old behavior.
Do you wish to see the full list of Chromium command-line switches?
http://peter.sh/experiments/chromium-command-line-switches/

Possible Resolution:

Let’s remove the auth from dev.eimg.com.
Pros: the issue would be resolved.
Cons: Google Images may fetch the images from dev.eimg.com and index them in Google Images. This would create duplicate image content for search engines (development - dev.eimg.com & production - eimg.com).

But I do not see this as a significant duplication threat, because only page-construction-related assets (JS / CSS / images / sprites) are served from this sub-domain.

I hope this post helps anyone encountering a similar situation. BTW, I totally agree with Chrome's implementation of this defense against phishing.

15 Jan 2012

What is Bounce Rate and Why should we worry about Bounces?

As per the "Content Characterization" section of Web Analytics Definitions by Web Analytics Association (www.webanalyticsassociation.org):

TERM: Bounce Rate
Type: Ratio
Universe: Aggregate, Segmented
Definition/Calculation: Single page view visits divided by entry pages.
Comments: If bounce rate is being calculated for a specific page, then it is the number of times that page was a single page view visit divided by the number of times that page was an entry. If bounce rate is calculated for a group of pages, then it is the number of times pages in that group was a single page view visit divided by the number of times pages in that group were entry pages. A site-wide bounce rate represents the percentage of total visits that were single page view visits.

That's the standard definition. But it would have been better if we could measure "the percentage of website visitors who stay on the site for a small amount of time (usually five seconds or less)". This under-five-second stay is difficult to measure, because web-analytics software calculates Time-on-Site by recording the time spent on a page only when the visitor clicks through to the next page. If the visitor never clicks to a next page (a bounce), how do we know the time spent on that single-page visit? Alas, if only we could measure it somehow!
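A tiny sketch of why the last page of any visit has no measurable duration (the timestamps are invented):

    # Time-on-page is the difference between consecutive pageview timestamps,
    # so the final page of a visit - and therefore every bounced visit - has
    # no measurable time.
    def time_on_pages(pageview_timestamps):
        durations = [nxt - cur for cur, nxt in
                     zip(pageview_timestamps, pageview_timestamps[1:])]
        durations.append(None)  # unknown: no further pageview to subtract from
        return durations

    print(time_on_pages([0, 42, 130]))  # [42, 88, None]
    print(time_on_pages([0]))           # [None] - a bounce: time on site unknown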

Anyway, bounce rate is an interesting way to measure the quality of traffic coming to a website. In short, it measures the percentage of people who came to your website and left "instantly". It measures the quality of the traffic you are acquiring and, if it is the right traffic, helps you understand where and how your website is failing your visitors.

What is a Bounce?
Now, let's try to understand how bounces happen:
  1. Any click on the page that directs a user to an external website or your sub-domain (yes, sub-domains are counted into your bounce rates).
  2. Pushing the back button and going back to the source.
  3. Closing the browser tab or the entire browser.
  4. Typing a new URL from that page and leaving.
  5. A session timeout, i.e. more than 30 minutes on a single page (in my opinion this one is a slightly misleading factor: it can mean either that the user is inactive or that the page is so interesting that it engaged the user beyond 30 minutes - for example, a page with a long, interesting video - so this must be kept in mind while analyzing the bounce rate of the site in question).
All of these count as bounces when your bounce rate is calculated against the total number of visits that entered on a single page.
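Putting the WAA definition above into code (a minimal sketch; the example numbers are invented):

    # Bounce rate per the definition above: single-page-view visits / entries.
    def bounce_rate(single_page_visits, entries):
        return single_page_visits / entries if entries else 0.0

    print(round(bounce_rate(single_page_visits=380, entries=1000), 2))  # 0.38 -> 38%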

By the way I would like to emphasize here that we should not get confused between Bounce-Rate and Exit-Rate. You may like to read Web Analytics Bounce-Rate and Exit-Rate.

What is the industry standard for bounce rate?

Short answer is that there is no industry standard. There are a lot of factors that influence the bounce rate, so you really can’t compare bounce rates of one site (or page) to another. The best way to know if you are doing better or worse is to set your own baseline and compare your performance over time.

Here are some of the numbers that were listed by Steve Jackson (Co-Chair Nordic Branch, Web Analytics Association), based on his experience with various sites.
Source: http://tech.groups.yahoo.com/group/webanalytics/message/6116


What are the factors that affect the bounce rate?

Below are some of the factors that determine the bounce rates. You can use this as a checklist to diagnose a high bounce rate issue. (Source: http://webanalysis.blogspot.com/2007/07/bounce-rate-demystified.html)

1. Source of your traffic – Each source results in a different bounce rate. When setting your baseline, create an overall baseline and a baseline for each traffic source, e.g. display advertising, organic traffic. Segment the bounce rate by traffic source and then analyze.
 

Do you need to revisit relationships with sites that are not sending you high quality traffic? What is the call to action that is causing people to come to your site and bounce? Are your email, affiliate, other marketing campaigns yielding low bounce rates?

2. Search engine ranking of the page – A page that ranks highly for an irrelevant keyword will get a higher bounce rate. Measure the bounce rate of your search keywords.

3. Type of Audience – If you are advertising and reaching the wrong audience you will see higher bounce rate. Bounce rate will tell you if you need to better target your ads.

4. Landing Page Design – Landing page design affects the bounce rate. I suggest A/B testing to improve after you have set your baseline. No matter how low you go there is always an opportunity for improvement unless you somehow achieved 0% bounce rate.

5. Ad and Landing Page Messages – If the messages on your banner or search ads are not aligned with the messages on the landing page, the chances are you will have one of those 50%+ bounce rates. Make sure messages are aligned and give visitors a clear call to action. Many times marketers send users to a generic page instead of an appropriate landing page; this can (and will) result in higher bounce rates. Again, A/B or multivariate testing should be used to reduce the bounce rate.

6. Emails and Newsletters – Subject lines, to and from, links, banners, the layout of email and the landing pages all work in tandem. They can either result in a great user experience and hence lower bounce rate or can result in a disaster. Do testing to reduce bounce rate.

7. Load time of your page(s) – A long load time can result in visitors bailing out of the site, causing higher bounce rates. Alternatively, users may hit the refresh button, thinking there was a problem with the page load; this will incorrectly reduce the bounce rate.

8. Links to external sites – A page that has links to external sites (or sub domains or pages that are not tracked in the same data warehouse) will show higher bounce rates.

9. Purpose of the page – Some pages are meant to drive users deeper into the site, while others provide the information the user is looking for. A page that provides the end result can show a higher bounce rate. One example is the Branch Offices page on my company’s web site, which I have bookmarked: whenever I need a particular branch's phone number, I go to my favorites, pull up this page, get the number and leave.

10. Other factors - Pop-up ads, pop-up survey requests, music, streaming video, all can have an adverse effect on bounce rates if users become annoyed.

Worry about Bounce-Back rather than Bounce-Rate

First I want to make a couple of things clear. It’s highly unlikely that search engines use bounce rate directly when scoring or ranking webpages. Nor is a high bounce rate a definite signal of low quality or a failure to meet visitor expectations or needs.

Over the past few years, there has been a lot of debate (and confusion) about whether bounce rate is a signal that the search engines use to determine quality content. Matt Cutts has probably been asked this question ten thousand times, and has referenced bounce rate when answering some questions about ranking factors. Matt Cutts has said that bounce rate is spammable and a noisy signal. I also think that bounce rate is a poor signal for Google's use.

Just think about all the online calculators, time-zone converters or video articles out there. They can have extremely high bounce rates because they can satisfy the user with only one page view. If your pages are satisfying the user, I would not worry about bounce rate.

I have seen many instances of pages with high bounce rates (in Google Analytics) that still ranked well in Search. The content I’m referring to would clearly be identified as high quality, unique, and valuable, but had high bounce rates. Many of the pages I am referring to ranked in the top three to five listings on page one of Google, Bing, and Yahoo.

Duane Forrester, Sr. Product Manager at Bing as part of a post (How To Build Quality Content), explains that the engines can monitor “dwell time”, or the time a person remains on your page before clicking back to the search results. If visitors are clicking through the search listings to your site, and then clicking back to Bing quickly (in just a few seconds), that can be a negative signal to Bing.

Google internally refers to a Visitor that doesn't bounce back as a "long click". A long click is a click that leaves Google and doesn't come back for a long time: until that person wants to use Google for an unrelated search. The user doesn't refine their Keyword, nor does the user use the back button and click another result in the SERPs instead.

If Google is trying to use bounce-back as a ranking signal, they will have to deal with some complications - a lot of people rapidly open a series of tabbed results by repeatedly jumping back to the SERPs page. They won't even look at any of the pages until they finish opening a series of tabs. This makes it appear that rapid bounce-backs are occurring when they aren't.

In my opinion Google is clearly able to handle this multi-tab result opening. If I do a search, click on a result and then use the back button, I get a little notice in the SERPs to "block all <site> results"; if I open many sites in tabs, I don't get this notice. So Google is able to penalize bounce-backs. I believe Google started using this as a site-wide signal as a big part of Panda: "if the bounce-back rate for so many search terms sent to your site is high, the site (or a section thereof) must not have high-quality content at all."

My recommendation for bounce-backs: if traffic is coming to pages with a high bounce rate via irrelevant keywords, try to adjust the content so that it does not rank for those irrelevant keywords. If bouncing traffic is coming to a page via keywords with a different intent, give links to relevant content even if it is outside your site. This will reduce bounce-backs, and the pages / site will rank higher.

Currently, there is no methodology available for webmasters to track bounce-backs. Bounce rate is easy to measure and can be a great proxy for bounce-back or long-click on a site with deep content.

Hopefully, analytics packages will evolve to let us see Dwell Time or Bounce-Back or Long-Click. In the meantime any measure that can be employed to improve engagement and increase the time visitors spend interacting with our content is essential.

12 Jan 2012

How Does Our Brain Know What Is a Face and What's Not?


Objects that resemble faces are everywhere. Whether it’s New Hampshire’s erstwhile granite “Old Man of the Mountain,” or Jesus’ face on a tortilla, our brains are adept at locating images that look like faces. However, the normal human brain is almost never fooled into thinking such objects actually are human faces. (Credit: MIT)

Image found through an article published at:
http://www.sciencedaily.com/releases/2012/01/120109132705.htm

9 Jan 2012

What if Google had to SEO Optimize its own Home Page

This is really very interesting.


http://meangene.com/google/design_for_google.html


Also, see the SEO Report Card for Google's Own Product "Google Products" at:


http://www.google.com/webmasters/docs/google-seo-report-card.pdf
A little old (Tuesday, March 02, 2010) but still interesting.
Geeks vs Non-geeks – Script Automation



I stumbled upon this image while surfing the Internet and liked it so much that I thought of sharing it.
Quite interesting, isn't it ?

2 Jan 2012

My Portrait Sketch

This Christmas I was at the Banaras Hindu University (BHU) campus in Varanasi. Prabhakar, a Fine Arts student, tried to draw a portrait sketch of me with colour crayons.


DS getting his Portrait Sketched at BHU, Varanasi

In the end, the sketch did not resemble me :) But I did not let him know that I was disappointed, and paid him the Rs 200 he asked for.


AngularJS

In this BayJax talk from September 16, 2011, Miško Hevery of Google speaks about AngularJS, an open source MVC framework for JavaScript.



Angular is an open-source MVC JavaScript framework, which simplifies web development by offering automatic view/model synchronization.

In addition to two-way binding, Angular is lightweight, supports all major browsers, and is built for creating testable JavaScript code.

Angular was created by Miško Hevery (http://misko.hevery.com/).

From the Angular website:

<angular/> is what HTML would have been if it had been designed for building web applications. It provides your application’s plumbing so you can focus on what your app does, rather than how to get your web browser to do
what you need.

For more information visit: http://angularjs.org

You can find the source code here: 
https://github.com/mhevery/angular.js

Great talk! Looking forward to learning AngularJS.

Usefulness

I liked it - courtesy: http://www.bonkersworld.net/usefulness/

1 Jan 2012

Pages prohibited by robots.txt get indexed in Search Engines

My friends from the SEO world keep asking me the following question from time to time:

Why is my URL showing up in Google when I blocked it in robots.txt? It seems that Google crawls the disallowed URLs.

Let's take a case from a popular B2B portal, http://dir.indiamart.com.



Now, let's see what happens when we search for a specific URL from the disallowed /cgi directory.




Google has 360 pages from that "disallowed" directory.

How could this happen? The first thing to note is that Google abides by your robots.txt instructions - it does not index the text of those pages. However, the URL is still displayed because Google found a link to it somewhere else, such as:

<a href="http://dir.indiamart.com/cgi/compcatsearch.mp?ss=Painting">Painting Manufacturers & Suppliers - Business Directory, B2B...</a>
Google hasn't crawled these URLs, so each appears as a bare URL rather than as a traditional listing.

Also, because it found the link with the anchor text "Painting Manufacturers & Suppliers - Business Directory", it associated the listing with that text.

In addition, Google can show a page description below the URL. Again, this is not a violation of robots.txt rules — it appears because Google found an entry for your robots.txt disallowed page / site in a recognized resource such as the Open Directory Project. The description comes from that site rather than your page content.

The robots.txt file tells the engines not to crawl the given URL, but they may still keep the page in the index and display it in results (see the snapshot above – you will notice that there is no snippet).

This becomes a problem when these pages accumulate links. They can accumulate link juice (ranking power) and other query-independent ranking metrics (like popularity and trust), but they can't pass these benefits on to any other pages, since the links on them never get crawled.

This is further elaborated in the SEOmoz cartoon below (courtesy: Robots.txt and Meta Robots):



This means that, in order to exclude individual pages from search engine indices, the noindex meta tag <meta name="robots" content="noindex, follow"> is actually superior to robots.txt.

Blocking with meta noindex tells engines they may visit the page, but they are not allowed to display the URL in results.
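If you want to check which mechanism applies to a given URL, here is a rough diagnostic sketch (standard library only; it only inspects robots.txt and the robots meta tag, and the regex is a simplification):

    # Is a URL blocked from crawling (robots.txt) and/or from indexing
    # (meta noindex)? Illustration only, not a complete tool.
    import re, urllib.request, urllib.robotparser
    from urllib.parse import urlparse

    def indexability(url):
        parsed = urlparse(url)
        rp = urllib.robotparser.RobotFileParser(
            '%s://%s/robots.txt' % (parsed.scheme, parsed.netloc))
        rp.read()
        crawl_allowed = rp.can_fetch('*', url)
        meta_noindex = False
        if crawl_allowed:  # the meta tag is only visible if crawling is allowed
            html = urllib.request.urlopen(url, timeout=10).read().decode('utf-8', 'replace')
            meta_noindex = bool(re.search(
                r'<meta[^>]+name=["\']robots["\'][^>]+noindex', html, re.I))
        return {'crawl_allowed': crawl_allowed, 'meta_noindex': meta_noindex}

    # e.g. indexability('http://dir.indiamart.com/cgi/compcatsearch.mp?ss=Painting')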

Matt Cutts explains, in a Webmaster Help video titled "Uncrawled URLs in search results", why a page that is disallowed in robots.txt may still appear in Google's search results.


A SitePoint Article Why Pages Disallowed in robots.txt Still Appear in Google may also be worth reading in this regard.

I did all this research for my own purposes, but thought of sharing it in case it helps others.