  • Google’s canonical tag experience report

    Posted on April 2nd, 2009 by Cahit Crcioglu

    Together with other companies such as Microsoft and Yahoo, Google has introduced a new standard to help determine content quality and similarity: a new, unofficial “canonical” value for the rel attribute of the HTML <link> element. This is not fresh news! In this post, I want to share my experience with “canonical”: how the duplicates decreased and how the significance of the pages increased quickly!

    I was developing a new customer-to-customer e-commerce project. Before that, I had worked on a very large blog service (huge traffic, with approx. 18 million+ pages in the search engine index), which was in the top 10 in my country according to Alexa.

    While I was working on the e-commerce project, Google announced the “canonical” tag. By clicking here, you can see the question I asked Google about this new release that same day. (The answer is in Maile Ohye’s response, which you can see here.)

    People were talking about duplicate content and the penalty it supposedly brings. I read lots of documents about it from Google, as well as other people’s experiences. THERE IS NO SUCH THING.

    OK... there is something almost like that, but not exactly. I am not a Google engineer, but I could guess it (and I could even test it, and I did). Even people in my company didn’t understand that; they just looked straight ahead and said, “OH! NO! Penalty! That can’t be used! If you do it, I’ll cry and I’ll want milk with some baby biscuits.” They aren’t bad people. But on this topic, almost everyone produces their own theory, and a theory ends when you prove it... So: Acta non verba! (Deeds, not words!)

    Well, Google is continuously changing and trying things, and some folks at Google (like Matt Cutts) try to keep us informed about the changes (a little ;) ). Even while I’m writing this post, Google may be switching many things on and off.

    Let’s talk about what I experienced (yeah, finally).

    I didn’t start the project with “canonical” or anything related to it, but I was able to adapt it to the project easily. Because of the deadline, I had to open the site before adding canonical and my other optimizations (not forgotten ones... the ONES on my to-do list). I was keeping an eye on the diagnostics in Google Webmaster Tools, and I saw a huge increase in duplicate title tags: it climbed to 9,000 duplicate titles! (I was trying to do my best in that particular period, and I had to be fast; the project should already have been released to the public!) OK, there is no excuse for that, but I was expecting it. When I dug into those duplicate titles, I found (almost) no critical situation that I couldn’t resolve.
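Google does not publish how Webmaster Tools builds its duplicate-title report, so treat the following only as a rough sketch of the same kind of check: group crawled URLs by their title text and keep the titles that more than one URL shares. The (url, title) input format, the whitespace/case normalization, and the sample URLs and titles (including the `show_type` values) are assumptions for illustration.

```python
from collections import defaultdict


def duplicate_titles(pages):
    """Group URLs by <title> text; keep only titles used by more than one URL.

    `pages` is an iterable of (url, title) pairs, e.g. collected by a crawler
    (an assumed input format, not a Webmaster Tools export).
    """
    by_title = defaultdict(list)
    for url, title in pages:
        # Collapse whitespace and ignore case so trivially different titles match.
        key = " ".join(title.split()).lower()
        by_title[key].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


if __name__ == "__main__":
    # Hypothetical sample: one listing served in two visual templates.
    crawled = [
        ("/ara?ara_tur=urun&ara_sor=cam&show_type=gallery", "Glass - Search results"),
        ("/ara?ara_tur=urun&ara_sor=cam&show_type=list", "Glass - Search results"),
        ("/ara?ara_tur=urun&ara_sor=lamba", "Lamp - Search results"),
    ]
    for title, urls in duplicate_titles(crawled).items():
        print(f"{title!r}: {len(urls)} URLs")  # prints: 'glass - search results': 2 URLs
```

On a site like the one described, most of the groups such a check reports would be the same listing reached through different view or sort links.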

    Many of them came from item searches and listings. As I had asked Google on their blog (which you can see here), I first started applying the “canonical” tag to item listing pages, where I had many links that show the same results in different visual templates (like “gallery view / list view”).

    So there was more than one link for the same content: the links differed, but the content was the same. OK, this is almost a duplicate content situation (although duplicate titles don’t guarantee that you have a duplicate content problem). Even though Google may be smart enough to classify a duplicate content situation correctly, I couldn’t blame her if she said, “Look look loook! What do we have here! I think someone didn’t turn off the photocopy machine!” Because it really is a DUPLICATE! Oh no! Now what? OK, calm down (: When duplication occurs, the significance of that document decreases. Why? Because there are two different links showing the same thing, so neither of them is that special. So what happens? Does Google ban the site? Does Google delete those pages from its index? Hell no! It does show those pages in the search results, but probably with one page listed under the other and indented a little to the right. Google is telling you: you’ve found these almost identical pages, so we chose this one for you; but if you don’t believe us, or if you want to try your luck, here is the other link.


    But what else could happen? Another site with the same importance and trust as our project, with a page like ours (I mean with the same weight for the searched keyword), may show up higher in the results. Why? Because it has no duplicates, so it’s probably the one YOU are SEARCHING for! So that’s just the natural result, not Google officers drawing their guns.

    I added the canonical tag to see what would happen and started tracking the results in Webmaster Tools. BINGO! The duplicates were decreasing!

    (I forgot to mention that I had added the canonical tag to the main pages before we had many pages indexed in Google. That was the day the “canonical” tag was announced, and on that day we had 10 duplicate titles showing up in Webmaster Tools. I gave it a try to see what would happen. Yes, you guessed right: ZERO duplicates (: )

    What did I do?

    I simply added the canonical tag to search results and item listings, pointing to the ORIGINAL links.

    For example:

    <link rel="canonical" href="http://www.*****.com/ara?ara_tur=urun&ara_sor=cam&ara_sayfa=6" />
    (which translates into http://www.*****.com/search?search_type=item&search_keyword=glass&result_page=6)
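The post doesn’t show how the tag was generated. One way to do it, sketched here under a whitelist assumption: keep only the parameters that define what is listed (the `ara_tur`, `ara_sor` and `ara_sayfa` names come from the example above) and drop everything else. `www.example.com` stands in for the elided domain, and `show_type=gallery` is a hypothetical presentation parameter value.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters that define WHAT is listed (names taken from the example URL).
# Everything else (view template, sorting, ...) is treated as presentation.
CONTENT_PARAMS = ("ara_tur", "ara_sor", "ara_sayfa")


def canonical_url(request_url):
    """Return the request URL reduced to its content-defining parameters."""
    parts = urlsplit(request_url)
    params = dict(parse_qsl(parts.query))
    # A fixed parameter order makes every variant produce the same string.
    kept = [(name, params[name]) for name in CONTENT_PARAMS if name in params]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(request_url):
    """Build the <link rel="canonical"> element for the page's <head>."""
    # escape() turns '&' into '&amp;', the safer form inside an HTML attribute.
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(request_url), quote=True)


if __name__ == "__main__":
    print(canonical_link_tag(
        "http://www.example.com/ara?ara_sayfa=6&show_type=gallery&ara_tur=urun&ara_sor=cam"
    ))
    # prints: <link rel="canonical" href="http://www.example.com/ara?ara_tur=urun&amp;ara_sor=cam&amp;ara_sayfa=6" />
```

A whitelist fails safe: a new presentation parameter added later is dropped from the canonical URL automatically, at the cost of having to update the list whenever a genuinely content-defining parameter is introduced.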

    This worked even for pages that Google had already indexed earlier, although Google couldn’t fix them all quickly: it took nearly 3-4 weeks to fix 20,000 indexed pages. (The site was growing fast, even though we had opened it only recently.)

    Google simply understood that:

    there is a page at the address http://www.*****.com/ara?ara_tur=urun&ara_sor=cam&ara_sayfa=6, which is the original and trusted version of the page carrying the tag.

    So all the possible combinations created by the parameters in the link started to vanish!

    But that wasn’t all I did to optimize it. I also removed the show_type parameter, together with the other parameters (sort direction, sort type), so the content is unique! It doesn’t make sense to have 40,000 indexed pages; I can live with 10,000 healthy ones!
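Each independent presentation option multiplies the number of crawlable URLs for the same content: with two views, two sort fields and two sort directions, one listing page already has 2 × 2 × 2 = 8 addresses. The sketch below assumes a blacklist approach: `show_type` comes from the post, while `sort_by`, `sort_dir` and all parameter values are hypothetical stand-ins for the unnamed sort parameters.

```python
from itertools import product
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# show_type comes from the post; the two sort parameter names are hypothetical.
PRESENTATION_PARAMS = {"show_type", "sort_by", "sort_dir"}


def strip_presentation(url):
    """Drop view/sort parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in PRESENTATION_PARAMS)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    base = "http://www.example.com/ara?ara_tur=urun&ara_sor=cam&ara_sayfa=6"
    variants = [
        f"{base}&show_type={view}&sort_by={field}&sort_dir={direction}"
        for view, field, direction in product(("gallery", "list"), ("price", "date"), ("asc", "desc"))
    ]
    unique = {strip_presentation(u) for u in variants}
    print(len(variants), "crawlable URLs ->", len(unique), "unique page")
    # prints: 8 crawlable URLs -> 1 unique page
```

Sorting the remaining parameters means that two links listing the same parameters in a different order also collapse to a single URL.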

    I quickly added some code to make the view changeable with an AJAX call that works with cookies (the cookie only lives in that path, so it doesn’t add garbage to the headers of other page requests sent to the server). I didn’t go against what Google says: “do NOT make TWO versions of your site, one for users and one for search engines”. I didn’t violate anything. Users can dig into MORE detail if they want. In fact, users are customizing the site for themselves, and personal customizations SHOULD NOT be indexed by a search engine. Search engines may give us customization options for their result sets (for example, Google is already doing it), but we are responsible for giving them the most objective version of the document. There is a fine line between the two!
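A minimal server-side sketch of that cookie, using Python’s standard `http.cookies` module. The cookie name `view`, the listing path `/ara`, the allowed values and the `list` default are all assumptions for illustration. Because the view preference lives in a path-scoped cookie instead of the URL, every variant shares one address, and a request without the cookie (as typical crawler requests are) gets the default template.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name, listing path and allowed values; the post only says
# the cookie "lives" in the listing path.
VIEW_COOKIE = "view"
LISTING_PATH = "/ara"
ALLOWED_VIEWS = {"gallery", "list"}


def set_view_header(view):
    """Set-Cookie header the AJAX endpoint could return when a user switches views."""
    if view not in ALLOWED_VIEWS:
        raise ValueError(f"unknown view: {view}")
    cookie = SimpleCookie()
    cookie[VIEW_COOKIE] = view
    # Path scopes the cookie to /ara..., so requests for other pages don't carry it.
    cookie[VIEW_COOKIE]["path"] = LISTING_PATH
    return cookie.output()


def view_from_request(cookie_header, default="list"):
    """Read the preferred view from the Cookie header, falling back to the default."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get(VIEW_COOKIE)
    if morsel is not None and morsel.value in ALLOWED_VIEWS:
        return morsel.value
    return default


if __name__ == "__main__":
    print(set_view_header("gallery"))  # prints: Set-Cookie: view=gallery; Path=/ara
    print(view_from_request(None))     # prints: list
```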

    So after all these optimizations, we had one and only one unique copy for that category! When you searched Google for the site’s name (not www.AAAA.com, just AAAA), the site was not even listed in the top 600. After some time, it showed up in the top 3. And now it’s the first result Google returns! (Don’t tell me I had no role in that.)

    By the way, don’t be so afraid of permalinks. I read about it, I TRIED it, and I saw it! (: Google is smart enough to evaluate them. Just don’t forget that Google is still developed by human beings, and if they see that someone is trying to fool them, they’ll probably work on it!

    In summary:

    • Optimize your site as if there were no canonical tag.
    • Use the canonical tag, for SURE!
    • Try to keep personally customized versions of your site from being indexed.
    • Don’t throw away your good friend, the permanent redirect (301) header. It’s still the preferred way to say “I am not him”. You can use both canonical tags and redirect headers to solve different problems; determine which fits where.
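The division of labor in the last bullet can be sketched as a single request handler, assuming a hypothetical `MOVED` map of retired addresses. An address that has moved for good answers with a 301 and a `Location` header, while a page that legitimately exists under several URLs is served normally with a canonical link in its `<head>`. This only illustrates the split; it is not how the site in this post was built.

```python
from html import escape

# Hypothetical map of retired addresses to their permanent replacements.
MOVED = {"/old-listing": "/ara?ara_tur=urun"}


def respond(url, canonical_url):
    """Return (status, headers, head_html) for a requested URL.

    canonical_url is the page's preferred address, computed elsewhere.
    """
    if url in MOVED:
        # The old address is gone for good ("I am not him"): permanent redirect.
        return "301 Moved Permanently", [("Location", MOVED[url])], ""
    # The page is served normally; its <head> points search engines at the
    # preferred URL, while the user still sees the variant they asked for.
    head = '<link rel="canonical" href="%s" />' % escape(canonical_url, quote=True)
    return "200 OK", [("Content-Type", "text/html; charset=utf-8")], head


if __name__ == "__main__":
    print(respond("/old-listing", "")[0])  # prints: 301 Moved Permanently
    print(respond("/ara?ara_tur=urun&show_type=list", "/ara?ara_tur=urun")[2])
    # prints: <link rel="canonical" href="/ara?ara_tur=urun" />
```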

    I hope someone finds it useful.
