Block a bad bot by user agent

SYSTEM INFORMATION
OS type and version Ubuntu 20.05
Virtualmin version 7.10

I have been struggling with a relentless bot that keeps making thousands of requests to one of my domains. Here’s a sample from the access logs:

47.128.33.156 - - [21/Aug/2024:11:44:28 +0530] "GET /product_img/LADIES RING 410/thumb-4663.JPG HTTP/2.0" 206 566 "https://www.domain.com/product/52909/ladies-ring-410" "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.0.0 Safari/537.36"

I created a fail2ban rule to block them. It works, but they apparently have an endless supply of IPs, so it didn’t slow them down much.

^<HOST> -.*"(GET|POST|HEAD).*HTTP.*(?:Bytespider).*?"$

What else can I do? Can I block them by user agent string? If it contains “Bytespider”, block it from the entire server (not just 1 domain)?

I found this for Ubuntu: “Block badbot with fail2ban via user agents in access.log” on Ask Ubuntu. It may help you out.

Just edit your /etc/apache2/sites-enabled/yourdomain.conf and add this at the end, just before the closing </VirtualHost>:

    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^.*(MJ12bot|AhrefsBot|SemrushBot|Barkrowler|Bytespider|PetalBot|AspiegelBot|DotBot|MauiBot).*$ [NC]
    RewriteRule .* - [F,L]
    </IfModule>

reload apache2:
apachectl graceful
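
To sanity-check which user agents the pattern above would catch, the match can be approximated outside Apache. This is only a sketch: `ua_is_blocked` is a hypothetical helper name, and `grep -Ei` stands in for the case-insensitive RewriteCond, so treat it as a rough check rather than Apache's exact behaviour.

```shell
#!/bin/sh
# Same bot names as in the RewriteCond above.
BAD_BOTS='MJ12bot|AhrefsBot|SemrushBot|Barkrowler|Bytespider|PetalBot|AspiegelBot|DotBot|MauiBot'

# ua_is_blocked is a hypothetical helper, not part of Apache.
# It succeeds (exit 0) when the user agent contains one of the names, ignoring case.
ua_is_blocked() {
    printf '%s\n' "$1" | grep -Eiq "$BAD_BOTS"
}

# User agent taken from the access log sample in the first post.
ua='Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.0.0 Safari/537.36'
if ua_is_blocked "$ua"; then echo "blocked"; else echo "allowed"; fi

# Against the live site, something like this should report 403 once the rule is active:
#   curl -s -o /dev/null -w '%{http_code}\n' -A 'Bytespider' https://www.domain.com/
```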

@jimr1 I already did that, as mentioned in my post. The problem is that fail2ban only blocks an IP after the request has been processed and added to the logs. They have an endless supply of IPs, and I want even their first request to be blocked.

@toli Would it be possible to add this to a common file so that it works on all servers/domains? Like /etc/apache2/conf-enabled/security.conf?

In my example, you can choose which bots you want to block. You don’t have to apply this globally; only on selected domains where you’re experiencing issues. It’s generally not a good idea to block all bots everywhere. I don’t use a global configuration myself, so I can’t comment on how that would work.
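
For anyone who does want a server-wide rule, one untested possibility on Apache 2.4 is an `<If>` block in its own file under conf-available. This is not something from the thread: the helper name `badbot_conf` and the file name `block-badbots.conf` are assumptions, and it relies on server-level sections being merged into every virtual host, so try it on a non-production host first.

```shell
#!/bin/sh
# badbot_conf is a hypothetical helper that prints a server-wide Apache 2.4 snippet.
badbot_conf() {
    cat <<'EOF'
# Deny any request whose User-Agent contains "Bytespider" (case-insensitive).
<If "%{HTTP_USER_AGENT} =~ /Bytespider/i">
    Require all denied
</If>
EOF
}

# Possible usage on Ubuntu (the file name block-badbots.conf is an assumption):
#   badbot_conf | sudo tee /etc/apache2/conf-available/block-badbots.conf
#   sudo a2enconf block-badbots
#   sudo apachectl configtest && sudo apachectl graceful
```

Keeping the rule in a separate file rather than editing security.conf makes it easy to switch off again with a2disconf.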

Learn how to set up fail2ban properly; it works fine for me.

I use pfSense to block by IP and DNS, with a large number of compiled lists.

Blocking by user agent is a futile endeavour: bad bots will just use good bots’ agent strings, and there is no validation, i.e. no honour among thieves :smile:

I appreciate this solution will not be an option for everyone.

Enlighten me, sir. To the best of my knowledge, fail2ban reads logs, so the request has already been processed by Apache and written to the access log before fail2ban can read it and take action.

If you set fail2ban with aggressive settings in the jail, you can allow 1 or maybe 2 attempts (depending on how you set it up) and hit this condition with a long ban time. Alternatively, you could use more relaxed settings and send repeat offenders to the recidive jail, which could have a long ban time.
You could also edit jail.conf and turn this section of fail2ban on
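
The section referred to above is not shown in the post. As a separate, hedged sketch of the settings described (one strike, a long ban, the recidive jail, and growing ban times), something like the following might serve as a starting point. `jail_sketch`, the file name `badbots.local`, and the log path are assumptions, and the stock apache-badbots filter may not list Bytespider, so a custom failregex like the one in the first post may still be needed.

```shell
#!/bin/sh
# jail_sketch is a hypothetical helper that prints example jail settings.
jail_sketch() {
    cat <<'EOF'
[DEFAULT]
# Grow the ban time for repeat offenders (fail2ban 0.11 or later).
bantime.increment = true

[apache-badbots]
enabled  = true
port     = http,https
# Assumed log location; point this at the access logs your domains really use.
logpath  = /var/log/virtualmin/*_access_log
maxretry = 1
bantime  = 48h

[recidive]
enabled  = true
bantime  = 1w
findtime = 1d
EOF
}

# Possible usage (the file name badbots.local is an assumption):
#   jail_sketch | sudo tee /etc/fail2ban/jail.d/badbots.local
#   sudo fail2ban-client reload
```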

I find that abusers who send from multiple IPs or stagger their requests usually come from one or several IP ranges that are recognizable in your logs. When I see that, it irritates me (I say irritates because they are already being denied access by other methods), so I first do a whois lookup. If they are from a country my server has no need to communicate with, I take as large a range of their IPs as I feel like or, if I prefer, only a range that covers their current activity on my server.

You can add permanent “drop” or “reject” rules with IP ranges. I use “drop”

SSH to your server >

sudo firewall-cmd --permanent --zone=public --add-rich-rule="rule family='ipv4' source address='213.219.247.0/24' drop"

follow with

sudo firewall-cmd --reload

It will add a permanent rule to your config file in /etc/firewalld/zones/public.xml.
You can remove any rule with:

sudo firewall-cmd --permanent --zone=public --remove-rich-rule="rule family='ipv4' source address='213.219.247.0/24' drop"

then

sudo firewall-cmd --reload

Or go to > select and delete them.

I just restart the server after adding a rule or two so that firewalld and Fail2Ban load in the order they want without issue and Fail2Ban re-establishes its bans.

I don’t get nearly as much noise in my logs as I used to after having added quite a few of these rules.
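
When several ranges need the same treatment, the command above can be wrapped in a small loop. This is a sketch only: `drop_rule` is a hypothetical helper, and the loop prints the commands as a dry run instead of executing them.

```shell
#!/bin/sh
# drop_rule is a hypothetical helper: it builds the rich rule text for one IPv4 range.
drop_rule() {
    printf "rule family='ipv4' source address='%s' drop" "$1"
}

# The range below is the one from the post above; append others separated by spaces.
RANGES='213.219.247.0/24'

for range in $RANGES; do
    # Printed as a dry run; remove "echo" to apply the rule for real.
    echo sudo firewall-cmd --permanent --zone=public --add-rich-rule="$(drop_rule "$range")"
done

# Afterwards:
#   sudo firewall-cmd --reload
#   sudo firewall-cmd --zone=public --list-rich-rules
```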

@jimr1 I’m already doing that. The badbots filter blocks for 48 hours after a single hit. There is also an exponentially increasing ban time for repeat offenders.

@popmay I would probably do the same, and have done so in the past, but these IPs are from AWS. I hesitated to block them lest I stop a genuine service from communicating with my servers. But it seems I don’t have a choice; at least temporarily, I’ll have to block the entire subnet.

So what is the problem? fail2ban passes the IP to whichever tool you use (iptables, firewalld, hosts.deny, etc.), and that IP is then rejected before any service sees it, so it never reaches the content it was trying to request. You may end up with a lot of rules in that tool, though.
If you see IPs returning before the ban has finished (maybe fail2ban is reporting the “already banned” message), there is a problem with how you have set things up.

That’s irritating. MXToolbox has a subnet calculator that gives CIDR in smaller ranges if you want to go to the trouble.
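
After narrowing a range, a quick check that an address from the logs really falls inside it can avoid blocking more or less than intended. The sketch below does this with plain shell arithmetic; `ip_to_int` and `in_cidr` are hypothetical helpers, the /24 in the example is an illustrative range rather than a known AWS allocation, and only IPv4 addresses without leading zeros are handled.

```shell
#!/bin/sh
# ip_to_int: convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr IP NET/BITS: succeeds (exit 0) when IP lies inside the CIDR range.
in_cidr() {
    net=${2%/*}
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# The address is from the log sample in the first post; the range is illustrative.
if in_cidr 47.128.33.156 47.128.33.0/24; then echo "inside"; else echo "outside"; fi
```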

@jimr1 Seems like badbots filter should help given some time without much trouble.

@toli has a pretty good approach.

I’m trying to keep everything within the Webmin/Virtualmin GUI rather than off-roading (adding stuff to config files that Webmin/Virtualmin may be unaware of).
