Creating A WordPress Robots.txt To Improve SEO


I entered uncharted waters today: I decided to create a robots.txt file, something I haven’t done before. Daniel explains best why this is a sensible move:

The robots.txt file is used to instruct search engine robots about which pages on your website should be crawled and consequently indexed. Most websites have files and folders that are not relevant for search engines (like images or admin files), so creating a robots.txt file can actually improve your website’s indexation.

The problem is that I can’t find a definitive source that gives clear instructions on what a robots.txt should include, so I thought I’d throw the question out to the Connected Internet ‘team’ to see what should and shouldn’t be in it. Here’s what I have so far, based mainly on this guide:


User-agent: Googlebot
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
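
As a rough way to see what a file like this blocks, here is a minimal Python sketch that evaluates the rules above for a given crawler and path. It assumes Google-style pattern matching (`*` matches any characters, a trailing `$` anchors the end of the URL) and that a crawler with its own `User-agent` group follows only that group and ignores the `*` group. The function names (`parse_groups`, `pattern_to_regex`, `is_blocked`) are made up for this illustration, user-agent names are matched exactly, and `Allow` rules are ignored, so treat it as a sketch rather than a faithful crawler implementation.

```python
import re

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
"""


def parse_groups(text):
    """Return {lowercased user-agent: [Disallow patterns]} (hypothetical helper)."""
    groups = {}
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a User-agent after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:                         # an empty Disallow blocks nothing
                for agent in agents:
                    groups[agent].append(value)
    return groups


def pattern_to_regex(pattern):
    """Translate '*' and a trailing '$' into a regular expression (assumed semantics)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(groups, user_agent, path):
    """True if `path` matches a Disallow rule in the group chosen for `user_agent`."""
    # Simplification: exact name match, falling back to the '*' group.
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    return any(pattern_to_regex(rule).match(path) for rule in rules)


groups = parse_groups(ROBOTS_TXT)
for agent, path in [("Googlebot", "/wp-admin/"), ("SomeOtherBot", "/wp-admin/"),
                    ("Googlebot", "/my-post/feed/"), ("Googlebot", "/feed/")]:
    print(agent, path, is_blocked(groups, agent, path))
```

Under these assumptions, /wp-admin/ is blocked for a generic bot but not for Googlebot, because Googlebot only reads its own three rules. Likewise /my-post/feed/ is blocked for Googlebot while the top-level /feed/ is not, since `/*/feed/$` needs a path segment before `/feed/`.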

What else do you think I should include or exclude?


Filed Under: Blogging, Tools & Tips

About the Author: Everton is based in London and has worked in the internet and mobile space for over ten years; before that he worked in corporate strategy and consulting. He has a degree in Economics from Cambridge University. He also writes for Windows 7 News, Windows 8 News and One Tip A Day.

  • I don't like this version of the robots.txt.

    Why would you want to exclude the monthly archives, for instance?

    Plus, there are some unclear directives that might end up messing up your indexation.

    I would rather stick to a simpler, clearer robots.txt; you can see the one I am using at http://www.dailyblogtips.com/robots.txt
  • Just found your link on the Z list and thought I'd come check out your website. Lots of great information on it. Keep up the good work.
  • I agree with Daniel; why would you want to block that much from Google? Having Google spider your monthly archives should, if anything, have a positive impact, as it will help Google find your posts.

    @Daniel: I don't see why you'd want to block your /feed/ page from Google, though. Google is doing more and more with blog posts, and RSS feed parsing is one of those things.
  • Guys, as I said, I'm entering uncharted waters, hence the open question. The example I've listed is the biggest one I found.

    Rather than saying the list is 'too big', it'd be more useful if you said what should or shouldn't be included.
  • I'd leave the robots.txt file as uncluttered as possible. If you make the slightest mistake, the good bots will stop visiting parts of your site. The bots you don't want on the page won't pay attention to the robots.txt file anyway.
    The example at http://www.dailyblogtips.com/robots.txt is good; you don't need more than that.
  • Well Everton, I'm not so good at this, but I don't think that you need all of this.
    For a blog directory, disallowing search engine bots from the three main folders (wp-admin, wp-content, wp-includes) and the wp- files in the main directory will be sufficient, as all the .php, .css, .js, etc. files are in these directories.
    So a robots.txt file like Daniel's or mine will be good.

    Anyway, waiting for more comments or asking an expert is the best way to minimize your robots file!
  • Everton, I'd recommend using a mixture of robots.txt and meta tags. For things like your archives and tag directories, you'd want a meta tag that allows following but not indexing. This keeps your archive pages out of Google's index but still allows the posts to be crawled.

    Putting <meta name="robots" content="noindex,follow"> in the head of your archive and tag templates will allow the links to be followed but should prevent the archive page itself from being indexed.

    In your robots.txt you should probably only disallow anything that you want a well behaved robot to completely ignore. If files or directories aren't specifically linked to (and your wp-includes, wp-admin, and wp-content directories are among these) then you can leave them out altogether.

    A well behaved spider will only follow links. Any spider, well behaved or not, won't go anywhere that isn't specifically referred to. If there's no link to a directory, it doesn't know that directory exists.
  • SoKoOLz
    omg, this is very useful.
    thx u thx u
  • I agree with Daniel, but Everton is right. He doesn't want Googlebots to waste their time spidering unwanted images or feeds. Instead, he wants them to be redirected towards essential pages.
  • I don't understand, Thilak:
    he wants them to be redirected towards essential pages

    What does this have to do with the robots file?!
    And why should it be a waste of time to crawl the feeds?
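
For anyone who tries the noindex,follow meta tag suggested in the comments above, a quick way to confirm a template really emits it is to parse the rendered page. The following is a minimal Python sketch using only the standard library's `html.parser`. The class name `RobotsMetaParser`, the helper `robots_directives`, and the sample archive markup are all invented for this illustration; in practice you would feed it the HTML of one of your own archive or tag pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Split "noindex,follow" into individual lowercase directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives found in `html` (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical archive page markup, for illustration only.
ARCHIVE_PAGE = """
<html><head>
<title>Archive for January</title>
<meta name="robots" content="noindex,follow">
</head><body><a href="/my-post/">My post</a></body></html>
"""

directives = robots_directives(ARCHIVE_PAGE)
print("noindex" in directives, "follow" in directives)
```

A page without the tag returns an empty set, which is an easy signal that the template change did not take effect.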