Robots.txt
I am looking into the SEO of my sites and someone mentioned a robots.txt file that I should have ??? I am of course going to speak to my best friend Mr Google... but wondered if anyone could take 5 mins to explain what it is and what it should consist of... ?
:helpme |
In a nutshell, robots.txt is used to tell spiders/crawlers what NOT to scan.
Not every robot obeys it, however.

User-agent: *
Disallow: /

Would tell ALL robots not to look at any pages. I recommend NOT using the example above. |
I tell the robots the address to my sitemap, done!
|
Google it and read some stuff; it can do a lot for your sites, for sure.
|
Hmmm...
Quote:
I understand, however some bots are undesirable... where is there a list of the bots that are not wanted??? Someone said some bots are bad bots that, for example, harvest emails, and others are site rippers... but it appears these are better blocked by editing your .htaccess file as opposed to editing the robots.txt file... but again I would ask where you find an up-to-date list of bad bots/site rippers to put into your .htaccess |
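For what it's worth, blocking bad bots in .htaccess usually means matching on the User-Agent header with mod_rewrite. A minimal sketch, assuming Apache with mod_rewrite enabled; the bot names below are illustrative examples of well-known email harvesters and site rippers, not an authoritative or up-to-date list:

```apache
# Hypothetical sketch: deny requests whose User-Agent matches a known
# bad-bot name. The names here are illustrative only.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [NC]
RewriteRule .* - [F,L]
```

Note that bad bots can (and do) fake their User-Agent string, so this only stops the lazy ones, and the list needs ongoing maintenance.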
Robots.txt files are pretty much useless unless you're trying to get Google NOT to index a page. Google will crawl your page and index your content anyway, so you can just skip the robots.txt file. You really don't need it. :2 cents:
|
Quote:
To allow all robots complete access:

User-agent: *
Disallow:

To exclude all robots from the server:

User-agent: *
Disallow: /

To exclude all robots from parts of a server:

User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

To exclude a single robot from the server:

User-agent: Named Bot
Disallow: / |
You should hook this up on your site.
http://www.google.com/webmasters/tools/ |
Use it to keep spiders out of certain parts of your site.
|
They are pretty simple files and can do a lot for your site. I also suggest using an XML sitemap and having Google spider it often.
|
Thanks guys...
great advice, keep it coming... I was thinking of an XML sitemap, just not sure how many things will be on it as we're a paysite and not that many pages :) but if it helps the SEO then I will speak to my guy about it...
How do I get Google to look at it lots? :) |
This is what I put on all of my sites
User-agent: Fasterfox
Disallow: /

User-agent: *
Disallow:

Sitemap: http://...../.....txt

The sitemap is just a text listing of all the URLs in UTF-8. Sitemap is a recent addition to the robots.txt spec and the big 3 all grab it now. Much easier, if you have lots of sites, to do this than fuck around with the Google stuff. http://www.sitemaps.org/protocol.php#informing The Fasterfox thing is due to some FF plugin that causes "false" hits to your site because it preloads pages. |
Ok...
So the general consensus is we should create some form of sitemap... either XML or TXT... and then put within the robots.txt a directive to make sure the bots read/crawl the sitemap :)
Anyone add anything to keep the bad ones out... or is that done mainly via the .htaccess file? Either way, is there a good example for either... that shows which ones should be included? :thumbsup |
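The sitemap half of that consensus is easy to automate. A minimal sketch of generating a sitemaps.org-format XML sitemap from a list of URLs, assuming Python; the URLs below are placeholders, not a real site:

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Build a minimal XML sitemap (sitemaps.org protocol) from a list of URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u  # <loc> is the only required child
    return ET.tostring(urlset, encoding="unicode")

# Example usage with placeholder URLs:
xml_out = build_sitemap(["http://example.com/", "http://example.com/tour/"])
```

Write the result to sitemap.xml at the site root, then point to it with a `Sitemap:` line in robots.txt as described above.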
Quote:
If you submit your sitemap to Google Webmaster Tools, then they'll spider everything regularly...Amongst lots of other cool stuff. |
I add this to all of my sites' robots.txt:

User-agent: ia_archiver
Disallow: /

That keeps archive.org from spidering and keeping a copy of your site forever |
Quote:
http://www.alexa.com/site/help/webmasters |
Question
Can anyone show me an example of an XML sitemap for a paysite that they use to encourage the spiders to crawl the pages...??? :helpme
|
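For reference, a minimal sitemap in the sitemaps.org XML format might look like this. The domain, paths, and dates are placeholders, not taken from any real paysite; only `<loc>` is required, the other tags are optional hints:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2008-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/tour/</loc>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
```

For a paysite with few pages this whole file might only be a dozen entries; that's fine, the spiders don't mind a short sitemap.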
Hey
|
Google would help
|
Powered by vBulletin® Version 3.8.8
Copyright ©2000 - 2025, vBulletin Solutions, Inc.