If your server has capacity to spare, there is no real need to block spam crawlers. But if your bandwidth is 3 Mbps or less, I recommend blocking some of them: they bring you no traffic and only consume resources. In my own testing they ate up roughly 600 Kbps to 1 Mbps of bandwidth.
Because these are rogue, junk crawlers, they do not honor the robots.txt protocol; robots rules are basically useless against them. Instead, I restrict them in the .htaccess file, which works very well.
My system is CentOS 7.5 running Apache (httpd); I am used to it, so I have not installed Nginx.
The .htaccess code is as follows:
<IfModule mod_rewrite.c>
Options +FollowSymlinks -Multiviews
RewriteEngine On
# Block spam crawlers; add any others you want to block between the "|" separators
SetEnvIfNoCase ^User-Agent$ .*(SemrushBot|SemrushBot-SA|Bytespider|BLEXBot|CompSpyBot|Exabot|ZoominfoBot|ExtLinksBot|AlphaBot|DotBot|MauiBot|MegaIndex\.ru|SiteExplorer|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|bingbot|TurnitinBot-Agent|YisouSpider|mail\.RU|perl|Python|Wget|Xenu|ZmEu) BADBOT
Order Allow,Deny
Allow from all
Deny from env=BADBOT
# Normal rewrite rules
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# RewriteRule ^(.*)$ index.php/$1 [QSA,PT,L]
RewriteRule ^(.*)$ index.php [L,E=PATH_INFO:$1]
</IfModule>
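One caveat: CentOS 7.5 ships httpd 2.4, where the `Order`/`Allow`/`Deny` directives are deprecated and only keep working through mod_access_compat. If that compatibility module is not loaded, an equivalent rule in the native 2.4 `Require` syntax might look like this (a sketch, assuming the same BADBOT variable set by the SetEnvIfNoCase line above):

```apache
# Apache 2.4 style: allow everyone except requests tagged BADBOT
<RequireAll>
    Require all granted
    Require not env BADBOT
</RequireAll>
```

Either way, you can check that the block is in effect with something like `curl -I -A "SemrushBot" http://your-domain/` (`your-domain` is a placeholder); a blocked User-Agent should come back with 403 Forbidden.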
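Before deploying, it can help to sanity-check the User-Agent alternation offline. Here is a minimal sketch in Python, using an abridged subset of the list above (SetEnvIfNoCase matches case-insensitively, hence `re.IGNORECASE`); paste your full alternation in practice:

```python
import re

# Abridged subset of the .htaccess alternation, for illustration only
BADBOT_RE = re.compile(
    r"(SemrushBot|Bytespider|BLEXBot|DotBot|MJ12bot|AhrefsBot|YisouSpider)",
    re.IGNORECASE,  # SetEnvIfNoCase is case-insensitive
)

def is_badbot(user_agent: str) -> bool:
    """Return True if this User-Agent would be tagged BADBOT by the rule."""
    return bool(BADBOT_RE.search(user_agent))

print(is_badbot("Mozilla/5.0 (compatible; SemrushBot/7~bl)"))          # True
print(is_badbot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome"))   # False
```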