Help with robots.txt please

  • Darrell
    Confirmed User
    • Feb 2003
    • 803

    #1


    I don't want search engines to index any pages, in the root folder or in any other folder, that have the word "page" in the URL. For example, I don't want domain.com/page1.php indexed.

    I believe Disallow: /page will stop any pages starting with page in the root folder, but can a wild card be added to include all files in folders as well?

    Thanks for any help.
  • CYF
    Coupon Guru
    • Mar 2009
    • 10973

    #2
    User-agent: *
    Disallow: /

    or:

    User-agent: *
    Disallow: /folder

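    Disallow: / blocks your whole site, and Disallow: /folder blocks everything in that folder. (Strictly, rules match by prefix, so /folder also blocks /folder2/ or /folder.html; use /folder/ to limit it to the directory.)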
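To see the prefix behavior concretely, here is a small sketch using Python's standard-library robots.txt parser, urllib.robotparser. The host domain.com and the agent name ExampleBot are placeholders. This parser only does plain prefix matching, which is all these two rules need.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Ask Python's stdlib robots.txt parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # domain.com and ExampleBot are placeholders; only the path matters here.
    return parser.can_fetch(agent, "https://domain.com" + path)

whole_site = "User-agent: *\nDisallow: /\n"
one_folder = "User-agent: *\nDisallow: /folder\n"

print(can_crawl(whole_site, "/index.php"))          # False
print(can_crawl(one_folder, "/folder/page1.php"))   # False
print(can_crawl(one_folder, "/folder2/page1.php"))  # False: plain prefix match
print(can_crawl(one_folder, "/index.php"))          # True
```

The third line is the one to watch: /folder2/ is caught too, because the rule is compared as a string prefix, not as a directory name.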
    Webmaster Coupons Coupons and discounts for hosting, domains, SSL Certs, and more!
    AmeriNOC Coupons | Certified Hosting Coupons | Hosting Coupons | Domain Name Coupons


    • Darrell
      Confirmed User
      • Feb 2003
      • 803

      #3
      Thanks for the quick reply, but you didn't understand my question. I want every folder and page to be indexed except for any file that has the word "page" in its name.

      So for example I don't want the following indexed:
      domain.com/page1.php
      domain.com/page2.php
      domain.com/folder/page1.php
      domain.com/folder/page2.php

      I read somewhere that Disallow: */page- should do that, but I'm not sure.
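One way to sanity-check a wildcard rule before uploading it is to translate it into a regular expression the way wildcard-aware crawlers such as Google describe it: "*" matches any run of characters, and matching starts at the beginning of the path. The sketch below is my own rough approximation in Python, not any search engine's code. It suggests that */page- only catches names with a hyphen right after "page" (such as a hypothetical /page-1.php) and misses page1.php.

```python
import re

def wildcard_match(rule: str, path: str) -> bool:
    """Rough Google-style check: '*' matches any run of characters (including '/'),
    and the rule has to match starting at the beginning of the path."""
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None

rule = "*/page-"  # the rule quoted in the post
for path in ["/page1.php", "/page2.php", "/folder/page1.php",
             "/folder/page2.php", "/page-1.php"]:
    print(path, wildcard_match(rule, path))  # only /page-1.php matches
```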


      • CYF
        Coupon Guru
        • Mar 2009
        • 10973

        #4
        Not all robots support wildcards. Google and Yahoo do; I'm not sure about the others.

        This should work for what you want to do:

        User-agent: *
        Disallow: /folder/page*
        Disallow: /page*
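Here is a rough comparison of how those two rules behave for a wildcard-aware crawler versus one that ignores wildcards. The wildcard matcher is my own approximation of Google's documented rules. For the no-wildcard case I'm using Python's urllib.robotparser, which does plain prefix matching and treats "*" as an ordinary character. /other/page3.php is a hypothetical page in a folder the rules don't name.

```python
import re
from urllib.robotparser import RobotFileParser

# The two Disallow values from the post above.
RULES = ["/folder/page*", "/page*"]

def google_style_blocked(path: str, rules: list[str]) -> bool:
    """Approximate wildcard matching: '*' = any run of characters, '$' = end of URL."""
    for rule in rules:
        anchored = rule.endswith("$")
        core = rule[:-1] if anchored else rule
        pattern = ".*".join(re.escape(part) for part in core.split("*"))
        # re.match only matches at the start of the path, like robots.txt rules.
        if re.match(pattern + ("$" if anchored else ""), path):
            return True
    return False

def stdlib_blocked(path: str, rules: list[str]) -> bool:
    """Python's urllib.robotparser: plain prefix matching, '*' is not special."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *"] + [f"Disallow: {rule}" for rule in rules])
    # domain.com and ExampleBot are placeholders.
    return not parser.can_fetch("ExampleBot", "https://domain.com" + path)

for path in ["/page1.php", "/folder/page2.php", "/other/page3.php"]:
    print(path, "wildcard:", google_style_blocked(path, RULES),
          "no-wildcard:", stdlib_blocked(path, RULES))
```

In this sketch the wildcard-aware matcher blocks the first two paths but not the third, since only /folder/ is named. The prefix-only parser blocks none of them, because it is looking for a literal "/page*".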

        • CYF
          Coupon Guru
          • Mar 2009
          • 10973

          #5
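          You could also use the following, but note that /*php matches any URL containing "php", so it blocks every PHP file on the site, not only the page files (which also makes the /folder/ line redundant):

          User-agent: *
          Disallow: /*php
          Disallow: /folder/*php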
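To see how broad /*php is, here is the same kind of approximate wildcard check in Python. It is my own sketch of Google-style matching and ignores the "$" anchor. The paths are hypothetical, chosen to show that the first rule catches any URL with "php" somewhere after the first slash, including non-PHP files, and that /folder/*php adds nothing.

```python
import re

def wildcard_match(rule: str, path: str) -> bool:
    """Rough Google-style check ('$' anchor ignored): '*' matches any run of
    characters, and the rule must match from the start of the path."""
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None

rules = ["/*php", "/folder/*php"]
# Hypothetical paths, chosen to show how broad the first rule is.
for path in ["/page1.php", "/index.php", "/folder/contact.php",
             "/docs/php-tips.html", "/about.html"]:
    print(path, [rule for rule in rules if wildcard_match(rule, path)])
```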

          • acrylix
            Confirmed User
            • Oct 2006
            • 362

            #6
            Originally posted by Darrell
            I believe Disallow: /page will stop any pages starting with page in the root folder, but can a wild card be added to include all files in folders as well?
            Yes, you could do this:

            User-agent: *
            Disallow: /page
            Disallow: /*/page

            But that will also block every folder whose name begins with "page"; for example, "mysite.com/pages-with-the-hottest-babes/" will be blocked.


            To be more specific, if these pages are all PHP files, use this instead:

            User-agent: *
            Disallow: /page*.php
            Disallow: /*/page*.php
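The difference between the two rule sets can be checked with a small Python sketch of wildcard matching. This is my own approximation of the documented Google behavior, not an official tool. One thing it turns up: because "*" also matches "/", the .php rules still block a PHP file inside a folder whose name starts with "page", such as the hypothetical /pages-with-the-hottest-babes/index.php.

```python
import re

def blocked(path: str, rules: list[str]) -> bool:
    """Approximate wildcard matching ('*' = any run of characters, including '/').
    The '$' end anchor is not handled because these rules don't use it."""
    for rule in rules:
        pattern = ".*".join(re.escape(part) for part in rule.split("*"))
        if re.match(pattern, path):  # match from the start of the path
            return True
    return False

broad = ["/page", "/*/page"]               # first suggestion
php_only = ["/page*.php", "/*/page*.php"]  # more specific suggestion

# Hypothetical paths; /about.php is a control that neither set should block.
for path in ["/page1.php", "/folder/page2.php",
             "/pages-with-the-hottest-babes/",
             "/pages-with-the-hottest-babes/index.php", "/about.php"]:
    print(path, "broad:", blocked(path, broad), "php_only:", blocked(path, php_only))
```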


            You can test with a tool like this: http://tools.seobook.com/robots-txt/analyzer/


            However, it seems you're asking how to prevent these pages from being indexed. Note the difference between "Disallow" and "noindex". Disallow only tells search engines not to crawl those files/folders; it does not keep the URLs out of the index. If the pages are linked from somewhere on your site (or from someone else's), they can still appear in the index whether you disallow them or not. Google, for example, may still index the URL and use the link's anchor text to describe it.

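            To prevent these pages from being indexed, use the <meta name="robots" content="noindex"> tag on them. Don't also disallow them in robots.txt, though: a crawler that is blocked from fetching a page never sees the tag.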

            If you can't do that, look into sending a noindex X-Robots-Tag header via your .htaccess file. Scroll down to the bottom of this page and look for "Practical implementation of X-Robots-Tag with Apache" for examples of how to do that:

            https://developers.google.com/webmas...obots_meta_tag
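The practical upshot of the last few paragraphs can be sketched as a toy model of a crawler's decision in Python. It is a simplification, not how any search engine actually works: the header check is a plain substring test, and the names would_noindex and RobotsMetaParser are made up for illustration. It shows why the noindex meta tag or X-Robots-Tag header only takes effect on pages the crawler is allowed to fetch.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())

def would_noindex(crawl_allowed: bool, headers: dict[str, str], html: str) -> bool:
    """Hypothetical crawler decision: noindex only counts if the page is fetched."""
    if not crawl_allowed:
        # Disallowed in robots.txt: the page is never fetched, so neither the
        # header nor the meta tag is seen, and the bare URL may still be indexed.
        return False
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower():
        return True
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives

page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(would_noindex(True, {}, page))   # crawlable page with the tag
print(would_noindex(False, {}, page))  # same page, but disallowed in robots.txt
print(would_noindex(True, {"X-Robots-Tag": "noindex"}, "<html></html>"))
```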
