help wanted
Can someone please direct me to instructions on how to stop search engines from crawling my site? I've officially had it with some of the requests that lead people here. There are some sick, sick people in this world and I do not want to be part of their quest to find the depraved things they are looking for. They won't find them here, of course, but the thought that people like that are on my site looking for this crap really freaks me out.
UPDATE: Thanks to everyone for their help. I'm going to try a couple of different things today and I'll let you know how it works out. You all rock, as usual.
Comments
Read this:
http://www.tamingthebeast.net/articles/robotswhoread.htm
Posted by: Jack | January 17, 2004 08:19 AM
http://www.annessa.net/blog/archives/001039.php
Posted by: Solonor | January 17, 2004 08:28 AM
LGF has a link to its search requests log...
lgf search requests
Since I don't have a web page, that's the only way I can see what sort of searches people are making.
Amazing.
Posted by: Josh Scholar | January 17, 2004 08:58 AM
Sorry if the world is treating you poorly. What specifically do they seem to be tracking? I think there are some common PGP code keys that might help you...?
Posted by: Lloyd Warren Ravlin III | January 17, 2004 09:11 AM
Create a file named robots.txt. Put the following lines in it:
User-agent: *
Disallow: /
Upload it to your public_html directory. If you have subdomains, also upload it to those directories. That will tell the search engines to stay away. (But it will not remove old caches they already have, so you'll still get hits from those.)
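If you want to sanity-check the rules before uploading, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_blocked and the example.com URL are made up for illustration. It only shows how a well-behaved crawler would read the two lines above, and says nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The two rules from above: every user agent, whole site disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may NOT fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file contents as a list of lines.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and the bot names are placeholders, not anyone's real setup.
    for agent in ("Googlebot", "SomeOtherBot"):
        blocked = is_blocked(ROBOTS_TXT, agent, "http://example.com/archives/")
        print(agent, "blocked" if blocked else "allowed")
```

Swapping "Disallow: /" for an empty "Disallow:" should flip the result to allowed, which is a quick way to confirm the file says what you meant.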
Posted by: Kathy K | January 17, 2004 09:18 AM
robots.txt is the way to go - but be warned that some search engines don't respect it (I don't remember the full list at the moment, but there are a couple). You can go and ask Google to remove your site from the index. Not sure how quickly that happens, but it's worth a shot.
Posted by: Kevin | January 17, 2004 09:33 AM
In increasing order of effectiveness:
1) <meta name="robots" content="noindex,nofollow">
2) the robots.txt file listed above
3) Complete blockage via your .htaccess file. I did this with a few of the more spam-type bots that don't respect robots.txt or anything else. Google will respect both the meta tags and the robots.txt file.
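To confirm the meta tag from option 1 actually made it into a page, something like the rough sketch below would do. It uses Python's standard html.parser. The class and function names are invented for illustration, and it only reads the tag, so it does not prove any particular search engine will obey it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in html (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
    print(sorted(robots_directives(page)))
```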
Posted by: Doc | January 17, 2004 10:40 AM
Sigh. As the "My mother was a splody dope" entry scrolls off the bottom of the log, the chances that my last rants will be answered nearly evaporate.
I think I'm going to go into withdrawal.
Posted by: Josh Scholar | January 17, 2004 10:49 AM
How do you know they're sick? If so, figure out their IP and block it. There are some sites. Google "IP Blocker"...
Posted by: Cecile | January 17, 2004 11:16 AM
I'm not going to go through my logs and block the IP of every person who comes here looking for things about sexual acts that are illegal in most countries (probably not Thailand).
Posted by: michele | January 17, 2004 11:24 AM
... and (on the scrolled thread) I'd posted this cool picture of Vash the Stampede (no it's not MY artwork)
"The world is made of Love and Peace! Love and Peace!""The world is made of Love and Peace! Love and Peace!"
Posted by: Josh Scholar | January 17, 2004 11:35 AM
oops didn't mean to double the caption
Posted by: Josh Scholar | January 17, 2004 11:35 AM
Hey, I didn't expect a pic of Vash. Yes, love and peace with a big gun, ha! Anyway, as to use of robots.txt and metatags, they do work with Google, which is the biggie. I do remember Excite not recognizing metatags, but I haven't checked the other search engines. Why bother, with Google bars in my browsers? If you are still set up to ping sites like weblogs.com, blogrolling, blo.gs, they will still work, and N.Z. Bear and Technorati will still spider you if you signed up for them too. What you will find is Google spidering your index with no description, any and all trackbacks from other blogs listing you, any comments you make elsewhere, webrings, stuff like that. I did both but did finally edit it to allow Google some spidering. Best is to do it all.
Posted by: mog | January 17, 2004 01:54 PM
Michele, the robots.txt thing works very well. I've used it to stop people from grabbing images via google images, etc.
Posted by: StrobeAlific | January 17, 2004 10:49 PM