Is anyone else seeing a large amount of traffic from Bytespider?
I recently began having this problem. In Awstats, the bot with the highest hit count is identified only as "feed", but a closer look at the access logs shows that most requests come from a user-agent containing "(compatible; Bytespider; spider-feedback@bytedance.com)".
Some numbers (monthly request totals, as reported on each date):
- Oct 29: 1,208,007
- Nov 4: 259,563
- Nov 5: 344,452
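For anyone who wants to run the same kind of check on their own access logs, here is a minimal sketch in Python. It assumes the Apache "combined" log format, where the user-agent is the last double-quoted field on each line. The function name and the sample lines are made up for illustration, not taken from my logs.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, where the user-agent is the
# last double-quoted field on the line.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines):
    """Return a Counter mapping user-agent string -> number of requests."""
    counts = Counter()
    for line in lines:
        match = UA_AT_END.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines; in practice, pass an open log file instead.
BOT_UA = "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"
sample = [
    '203.0.113.5 - - [05/Nov/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "' + BOT_UA + '"',
    '203.0.113.9 - - [05/Nov/2024:10:00:01 +0000] "GET /a HTTP/1.1" 200 128 "-" "' + BOT_UA + '"',
    '198.51.100.7 - - [05/Nov/2024:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBrowser/1.0"',
]

# Print user-agents from most to least frequent.
for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```

To use it on a real log, something like `count_user_agents(open("access.log"))` should work, provided the log really is in combined format.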
The bot name that Awstats reports does not seem to correspond to the user-agent string: trying to block it with "feed" and "feedbot" in robots.txt had no effect.
So far today, contrary to reports I've read, the bot appears to be honoring robots.txt, where I now have Bytespider named specifically. I will keep monitoring.
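To illustrate why the name in robots.txt matters, here is a small sketch using Python's standard `urllib.robotparser`. The two rule sets are hypothetical stand-ins for what I tried, and Python's matching is only an approximation of how a well-behaved crawler reads the file; I can't say how Bytespider itself parses it.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, agent, path="/"):
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Hypothetical rules keyed on the label Awstats shows.
by_awstats_label = """\
User-agent: feed
Disallow: /

User-agent: feedbot
Disallow: /
"""

# Hypothetical rules keyed on the crawler's own token.
by_crawler_token = """\
User-agent: Bytespider
Disallow: /
"""

# No group names Bytespider, so the crawler is still allowed.
print(is_allowed(by_awstats_label, "Bytespider"))   # True
# The group names Bytespider, so the crawler is disallowed.
print(is_allowed(by_crawler_token, "Bytespider"))   # False
```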
Based on this thread: https://wordpress.org/support/topic/psa-bytedance-and-bytespider-bots-recommend-blocking/ I have prepared the following for my .htaccess file:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Bytedance|Bytespider [NC]
RewriteRule .* - [F,L]
This is a bit different from the example in that discussion thread,
RewriteCond %{HTTP_USER_AGENT} ^.(Bytedance|Bytespider).$ [NC]
since I believe the regular expression there is not correct, though it's been years since I wrote Perl. AFAIK, I don't need the start/end anchors or the grouping, and each dot in the linked post matches only a single character, so that pattern would match only when exactly one character comes before and after the bot name.
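The difference between the patterns can be checked with a quick sketch in Python. Python's `re` module is not PCRE, but anchors, dots, alternation and grouping behave the same way for these patterns, and `re.search` mimics the unanchored matching that RewriteCond does. The user-agent string is a shortened, made-up example, and the `.*` variant is included only for comparison; I don't know what the linked post originally intended.

```python
import re

# Shortened, made-up user-agent in the shape seen in the logs.
ua = "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"

# My version: no anchors, no grouping, case-insensitive like [NC].
unanchored = re.compile(r"Bytedance|Bytespider", re.IGNORECASE)

# The pattern as it appears in the linked post: each dot is ONE character.
as_posted = re.compile(r"^.(Bytedance|Bytespider).$", re.IGNORECASE)

# Comparison only: dots followed by stars match any run of characters.
with_stars = re.compile(r"^.*(Bytedance|Bytespider).*$", re.IGNORECASE)

for name, pattern in [("unanchored", unanchored),
                      ("as_posted", as_posted),
                      ("with_stars", with_stars)]:
    print(name, bool(pattern.search(ua)))
```

The pattern as posted only matches strings such as "xBytespiderx", with exactly one character on each side, so it would let the real user-agent through.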
I wanted to put this up here in case anyone has further suggestions or knowledge regarding this particular bot and how best to block it. So far in November, it has used over 9,500 IP addresses, so blocking it by IP isn't practical. My hosting provider, Fastcomet, has suggested enabling Cloudflare CDN to use their filtering capabilities. I'd rather avoid that complexity for a site that typically experiences low traffic.
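If you want to see how many addresses the bot is using on your own site, a rough sketch in Python follows. It assumes the client IP is the first space-separated field on each log line (true for Apache's common and combined formats) and simply looks for the bot name anywhere in the line, so it can over-count if the name shows up in a URL. The function name and sample lines are made up.

```python
def distinct_ips(lines, needle="bytespider"):
    """Return the set of client IPs whose log line mentions `needle`."""
    needle = needle.lower()
    # Assumption: the client IP is the first space-separated field.
    return {line.split(" ", 1)[0] for line in lines if needle in line.lower()}

# Made-up sample lines using documentation-only IP ranges.
sample_log = [
    '203.0.113.5 - - [05/Nov/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "(compatible; Bytespider; spider-feedback@bytedance.com)"',
    '203.0.113.9 - - [05/Nov/2024:10:00:01 +0000] "GET /a HTTP/1.1" 200 128 "-" "(compatible; Bytespider; spider-feedback@bytedance.com)"',
    '203.0.113.5 - - [05/Nov/2024:10:00:02 +0000] "GET /b HTTP/1.1" 200 128 "-" "(compatible; Bytespider; spider-feedback@bytedance.com)"',
    '198.51.100.7 - - [05/Nov/2024:10:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBrowser/1.0"',
]

print(len(distinct_ips(sample_log)))
```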
Also, I hope this helps anyone else having the same problem. TIA for any additional suggestions.