
Linux VPS - Security and Tuning: Security and Tuning Discussion for Linux Virtual Private Servers based on Virtuozzo by SWsoft

  #1  
08-25-2005, 04:17 PM
Jad
Guest
Posts: n/a
robots.txt

Hi,
Some of my forums are being harvested by search engine bots.
What's the best robots.txt configuration to block ALL search engines from those forums only?

Thanks in advance.

  #2  
08-25-2005, 04:31 PM
BornOnline
Senior Member
Join Date: Feb 2005
Location: Earth
Posts: 173

Depending on the search engine, it may not even look at your robots.txt.
Validator

User-agent: *
Disallow: /forum/

The following allows all robots to visit all files: the wildcard "*" matches every robot, and the empty Disallow line blocks nothing.

User-agent: *
Disallow:

This one keeps all robots out.

User-agent: *
Disallow: /

The next one bars all robots from the cgi-bin and images directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

This one bans BadSearch from all files on the server:

User-agent: BadSearch
Disallow: /

This one keeps googlebot from getting at the whatever.htm file (the path has to start with a slash):

User-agent: googlebot
Disallow: /whatever.htm
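
Rules like these can be checked offline before any crawler sees them. Here is a minimal sketch using Python's standard urllib.robotparser module. The helper name `allowed` and the bot name "AnyBot" are made up for illustration, and real crawlers may interpret edge cases differently from this parser:

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, user_agent, path):
    """Return True if robots_txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

# Rule sets based on the examples above (paths written with a leading slash).
FORUM_ONLY = "User-agent: *\nDisallow: /forum/\n"
ONE_FILE = "User-agent: googlebot\nDisallow: /whatever.htm\n"

print(allowed(FORUM_ONLY, "AnyBot", "/forum/index.php"))  # the forum is blocked
print(allowed(FORUM_ONLY, "AnyBot", "/index.html"))       # the rest of the site is not
print(allowed(ONE_FILE, "googlebot", "/whatever.htm"))    # blocked for googlebot only
print(allowed(ONE_FILE, "OtherBot", "/whatever.htm"))     # other bots are unaffected
```

This only shows what a well-behaved robot should do with the file; it says nothing about bots that ignore robots.txt.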

  #3  
08-25-2005, 04:44 PM
Fred
Senior Member
Join Date: Jun 2005
Posts: 601

You can't block all bots... some of them don't respect the robots.txt standard.
If too many bots ignore your robots.txt, you can use .htaccess to block them... it's pretty easy.

Something like this:

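Code:
# Flag any request whose User-Agent starts with "The_super_bot"
# (case-insensitive; replace the placeholder with the real bot name)
SetEnvIfNoCase User-Agent "^The_super_bot" bad_bot

# Allow everyone except requests flagged as bad_bot
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>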

  #4  
08-25-2005, 04:55 PM
Jad
Guest
Posts: n/a

Thank you, I'll try it and report back.

  #5  
08-25-2005, 04:57 PM
Jad
Guest
Posts: n/a

Hey, is there any way to test whether the .htaccess method works, instead of waiting for the next crawl?

  #6  
08-25-2005, 05:00 PM
Fred
Senior Member
Join Date: Jun 2005
Posts: 601

Well... probably by using a browser that lets you change your user-agent?
If you use Firefox, there's an extension that can do it, I think...
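
The same check can also be scripted without a browser. Here is a rough sketch using only Python's standard library. The function names and the example.com URL are placeholders (not anything from this thread), so point it at a page on your own site:

```python
import urllib.error
import urllib.request

def build_request(url, user_agent):
    """Build a GET request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_status(url, user_agent, timeout=10):
    """Send the request and return the HTTP status code the server answers with."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx answers (e.g. 403 Forbidden) are raised as exceptions.
        return error.code

# Hypothetical usage, with a placeholder URL:
# fetch_status("http://www.example.com/forum/", "The_super_bot/1.0")
# fetch_status("http://www.example.com/forum/", "Mozilla/5.0")
```

If the .htaccess rule works, the first call should come back with 403 and the second with whatever a normal visitor gets (usually 200).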

  #7  
08-25-2005, 05:04 PM
Jad
Guest
Posts: n/a

I'm watching them crawl me right now, heh:

tcpdump -i venet0 port 80

  #8  
08-25-2005, 05:37 PM
Jad
Guest
Posts: n/a

I'm not sure whether I'm being slashdotted or not,
but look at this:
17930 23.03% Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com
2744 3.52% Googlebot/2.1 (+http://www.google.com/bot.html)
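
One thing worth noticing: the first of those user-agent strings starts with "Mozilla/5.0", not "Googlebot", so a pattern anchored with "^" like the one in the .htaccess example earlier in the thread would only catch the second. Here is a small sketch of the difference, using Python's re module as a stand-in for the server's regex matching (an assumption, though patterns this simple should behave the same way):

```python
import re

# User-agent strings as reported in the stats above (the first is truncated there).
AGENTS = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com",
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
]

def matching_agents(pattern, agents):
    """Return the agents flagged by a case-insensitive search for pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [agent for agent in agents if regex.search(agent)]

anchored = matching_agents(r"^Googlebot", AGENTS)   # must start with "Googlebot"
unanchored = matching_agents(r"Googlebot", AGENTS)  # "Googlebot" anywhere
print(len(anchored), len(unanchored))  # 1 2
```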

  #9  
08-25-2005, 07:42 PM
Fred
Senior Member
Join Date: Jun 2005
Posts: 601

I think that if you were being slashdotted, you would notice far more than bots.

Check the raw logs for a closer look...
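
A quick way to dig through the raw logs is to tally hits per user agent. Here is a sketch that assumes the Apache "combined" log format, where the user agent is the last quoted field on each line. The function names and sample lines are made up for illustration, and the log path is left for you to fill in:

```python
import re
from collections import Counter

# Assumption: "combined" format, user agent is the last quoted field on the line.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines):
    """Count requests per user agent in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def report(counts, top=10):
    """Format the busiest agents as 'hits percent agent' rows."""
    total = sum(counts.values())
    return [
        "%d %.2f%% %s" % (hits, 100.0 * hits / total, agent)
        for agent, hits in counts.most_common(top)
    ]

# Made-up sample lines; real input would be your own access log, e.g.
# count_user_agents(open("/path/to/access_log")).
SAMPLE = [
    '192.0.2.1 - - [25/Aug/2005:17:30:00 -0400] "GET /forum/ HTTP/1.1" 200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.2 - - [25/Aug/2005:17:30:01 -0400] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux i686)"',
    '192.0.2.1 - - [25/Aug/2005:17:30:02 -0400] "GET /forum/a HTTP/1.1" 200 256 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
for row in report(count_user_agents(SAMPLE)):
    print(row)
```

If bots account for most of the rows, it's a crawl; a slashdotting would show up as a flood of ordinary browser user agents instead.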
All times are GMT -4. The time now is 11:11 PM.

