If it's an honest crawler, then something was missing from robots.txt. If it's dishonest, then Apache's Limit or RedirectMatch directives can be used to stop it, or even iptables.
Assuming we're still on about the Yandex robot: it is a well-behaved robot.
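For the honest-crawler case, a minimal robots.txt sketch could look like the one below. The /forum/ path is an assumption for illustration; substitute whatever directory your forum actually lives in.

```
# Assumed layout: the forum lives under /forum/
User-agent: Yandex
Disallow: /forum/
```

A well-behaved robot reads this file from the site root before crawling, so it only helps against crawlers that choose to obey it.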
If you just want robot control for your forum area then
Code:
<META NAME="ROBOTS" CONTENT="NONE">
in HTML
Code:
<META NAME="ROBOTS" CONTENT="NONE" />
in XHTML
will stop all well-behaved robots (NONE is shorthand for NOINDEX, NOFOLLOW).
Googlebot has its own meta name, so you can allow Googlebot only with:
Code:
<META NAME="GOOGLEBOT" CONTENT="INDEX, FOLLOW">
Again, change > to /> in XHTML.
Put these tags in the headers for the forum area. As most forum areas use dynamic pages, you should have a module somewhere in your code that generates the headers for you.
As an example, my meta tags look like the ones below, and the header is called by each page as the FIRST thing it does. (It also handles the MIME type, charset etc. for me... ask if you need the full details.)
Code:
***php header***
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title><?php echo $title ?></title>
<meta http-equiv="Content-Type" content="<?php echo $mime ?>;charset=<?php echo $charset ?>" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<link rel="stylesheet" media="screen" type="text/css" href="./css/mgj.css" />
<link rel="stylesheet" media="print" type="text/css" href="./css/mgj_print.css" />
<meta name="description" content="hire plant parts spares engineering services"/>
<meta name="keywords" content="keys, locks, engine, parts, hire, plant, services, engineering, consumables"/>
<meta name="copyright" content="M.G. Judd Ltd., 2009. All rights Reserved."/>
<meta name="no-email-collection" content="http://www.unspam.com/noemailcollection" />
<meta name="ROBOTS" content="ALL" />
***php footer***
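If the same header module serves both the forum and the rest of the site, one way to switch the ROBOTS tag per area is sketched below. This is only an illustration, not the code from my own header: the function name robots_meta_for_path and the /forum/ prefix are hypothetical, so adjust both to your setup.

```php
<?php
// Hypothetical helper, not part of the original header module.
// Returns the robots meta tag (XHTML style) for a request path:
// forum pages are closed to robots, everything else is left open.
function robots_meta_for_path($path, $forumPrefix = '/forum/')
{
    // strncmp() returns 0 when the first strlen($forumPrefix) bytes match,
    // i.e. when $path starts with the (assumed) forum prefix.
    $inForum = strncmp($path, $forumPrefix, strlen($forumPrefix)) === 0;

    // "noindex, nofollow" is the long form of NONE; "index, follow" of ALL.
    $content = $inForum ? 'noindex, nofollow' : 'index, follow';

    return '<meta name="robots" content="' . $content . '" />';
}

// Example calls with made-up paths:
echo robots_meta_for_path('/forum/viewtopic.php'), "\n";
echo robots_meta_for_path('/index.php'), "\n";
```

In a real header you would pass in the path of the current request in place of the made-up strings, and echo the result where the hard-coded ROBOTS tag sits now.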
Phill.