Don’t end your urls with .exe

by Matt Cutts

Sometimes at a conference people will ask me “Does it matter what extension I use for my pages? Does Google prefer .php over .asp, or .html over .htm?” And my answer is “We’re happy to crawl all of these file extensions. It doesn’t matter what you choose between any of those.”

Usually I also try to insert a reminder at the end of my reply such as “But there are some file extensions that are mostly binary data, such as .exe, where the vast majority of the time the data would be meaningless blobs, so there are a few extensions to avoid. If your files are named example.dll or example.bin and you don’t see Google crawling pages with that file extension, I’d recommend changing your file extension to something else.”

There’s a simple way to check whether Google will crawl things with a certain filetype extension. If you do a query such as [filetype:exe] and you don’t see any urls that end directly in “.exe” then that means either 1) there are no such files on the web, which we know isn’t true for .exe, or 2) Google chooses not to crawl such pages at this time — usually because pages with that file extension have been unusually useless in the past. So for example, if you query for [filetype:tgz] or [filetype:tar], you’ll see urls such as “papers.ssrn.com/pape.tar?abstract_id” that contain “.tar” but no files that end directly in .tar. That means that you probably shouldn’t make your html pages end in .tar.
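The difference between a url that merely contains an extension and one that ends directly in it can be checked with a plain string comparison. The snippet below is only a sketch: the function names and the second sample url are made up for illustration, and it says nothing about how Google itself makes the decision. It assumes you have pasted the urls from a [filetype:tar] results page into a list by hand.

```python
def ends_directly_in(url, extension):
    """True when the URL string itself stops on ".<extension>"."""
    suffix = "." + extension.lower().lstrip(".")
    return url.lower().endswith(suffix)


def split_results(urls, extension):
    """Separate URLs that end in the extension from ones that merely contain it."""
    suffix = "." + extension.lower().lstrip(".")
    ending, containing = [], []
    for url in urls:
        if ends_directly_in(url, extension):
            ending.append(url)
        elif suffix in url.lower():
            containing.append(url)
    return ending, containing


# Hypothetical result list; only the ssrn.com url comes from the post.
results = [
    "papers.ssrn.com/pape.tar?abstract_id",
    "example.com/downloads/archive.tar.html",
]
ending, containing = split_results(results, "tar")
print("end directly in .tar:", ending)
print("only contain .tar:", containing)
```

If the first list stays empty for a results page full of urls, that matches the second case described above: the extension is probably one to avoid for html pages.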

The SEOmoz folks stumbled across this when they had a url that ended with “/web2.0”. It looks like they previously had a url that looked like “/web2.0/” (note the trailing slash), which we were happy to crawl/index/rank. But when their linkage shifted enough that “/web2.0” became their preferred url, Google wouldn’t crawl urls ending in “.0”, so the page became uncrawled.
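To see why the trailing slash made the difference, here is a small sketch that audits urls against a list of endings. The list is assembled only from the extensions mentioned as examples in this post, with “.0” included as things stood when the problem was found; it is not Google’s actual list, and the example.com urls are placeholders rather than the real SEOmoz addresses.

```python
# Endings taken only from the examples in this post. This is NOT Google's
# actual list, and ".0" reflects the situation when the problem was found.
RISKY_ENDINGS = (".exe", ".dll", ".bin", ".tar", ".tgz", ".0")


def risky_ending(url):
    """Return the listed ending that the URL string stops on, or None."""
    lowered = url.lower()
    for ending in RISKY_ENDINGS:
        if lowered.endswith(ending):
            return ending
    return None


# Placeholder urls: same page, without and with the trailing slash.
for url in ("http://example.com/web2.0", "http://example.com/web2.0/"):
    print(url, "->", risky_ending(url))
```

The first url is flagged for ending in “.0” and the second is not, because with the trailing slash the url string no longer stops on the extension.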

Even though urls ending in “.0” are often binary and therefore end up getting dropped later in our indexing pipeline, it’s always good to revisit old decisions and respond to feedback by running new tests. So just in the last day or so, we switched it so that Google is willing to crawl pages that end in “.0”. This will help the small number of pages out on the web that want to serve up HTML pages with a “.0” extension.

You can see the results trickling into Google with a bunch of “X hours ago” fresh results:

[Image: 0 file extension]

So my quick takeaways would be:
- Why Google doesn’t crawl some filetype extensions (when we’ve seen good evidence that the extensions are mostly binary or otherwise not-very-indexable files).
- An easy way to use the filetype: operator, so that you can decide whether to avoid a particular filename extension yourself.
- Google is willing to revisit old decisions and test them again, which is what we’re doing with the “.0” filetype extension.

I hope that helps a few people who are considering unusual filetype extensions of their own. :)

Original publication: http://feeds.mattcutts.com/~r/mattcutts/uJBW/~3/311262017/
June 13, 2008, 11:40 am