My website is getting hammered by ClaudeBot and I’ve been trying to defend myself with fail2ban, but it’s rather failing2ban right now. This is the configuration I have for bad bots:
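Something along these lines (a trimmed-down sketch of a typical apache-badbots jail, not my exact file; the relevant part is maxretry = 1, which should ban on the first match, and logpath will vary by distro):

[apache-badbots]
enabled  = true
port     = http,https
filter   = apache-badbots
logpath  = /var/log/apache2/*access.log
maxretry = 1
bantime  = 86400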
Now when I grab an IP address from the access logs and search the fail2ban logs, I see something like this:
2024-12-04 13:11:01,449 fail2ban.filter [821]: INFO [apache-badbots] Found 3.137.178.122 - 2024-12-04 13:11:01
2024-12-04 13:11:01,449 fail2ban.filter [821]: INFO [apache-badbots] Found 3.137.178.122 - 2024-12-04 13:11:01
2024-12-04 13:11:01,664 fail2ban.actions [821]: NOTICE [apache-badbots] Ban 3.137.178.122
2024-12-04 13:11:02,160 fail2ban.filter [821]: INFO [apache-badbots] Found 3.137.178.122 - 2024-12-04 13:11:02
2024-12-04 13:11:02,177 fail2ban.filter [821]: INFO [apache-badbots] Found 3.137.178.122 - 2024-12-04 13:11:02
I notice two issues here. First, the IP address was found twice before being banned, even though I believe I configured badbots to trigger on the first match. Second, there are more “Found” log lines after the ban. Should this happen? Does it mean the IP can still reach the server, or am I reading the logs wrong?
I can confirm that this IP is present in the jail’s list of banned IPs for apache-badbots.
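To check, I ran roughly the following (jail name apache-badbots; the f2b-apache-badbots chain is what the default iptables-based banactions create, so the second command may differ if your banaction uses firewalld or ipset):

sudo fail2ban-client status apache-badbots
sudo iptables -L f2b-apache-badbots -n | grep 3.137.178.122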
PS - It’s not an IP-specific issue; I’m just using this one as an example. The same pattern shows up for every IP. Please help.
But, yeah, it does seem to be failing to work. There have been several discussions about fail2ban not actually updating the firewall in a way that works. I don’t remember if that was about Ubuntu 20.04 or some other distro…but there was a problem in the package, maybe, where it wasn’t doing the right thing with ipset-based firewall rules, or something like that. Vague recollections here; search the forum, as I know it’s been discussed and resolved several times.
Boot order is an issue with fail2ban and firewalld: restart firewalld and then fail2ban so fail2ban’s rules get re-applied. I also had to change a setting in firewalld to keep the fail2ban rules in place if firewalld restarted for some reason.
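On a systemd distro that’s just (the order matters: firewalld first, then fail2ban so it re-inserts its bans):

sudo systemctl restart firewalld
sudo systemctl restart fail2ban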
And that setting seems to have been reset. I’ll have to figure out where to change it so that doesn’t happen again. Sigh…
# FlushAllOnReload
# Flush all runtime rules on a reload. In previous releases some runtime
# configuration was retained during a reload, namely; interface to zone
# assignment, and direct rules. This was confusing to users. To get the old
# behavior set this to "no".
# Default: yes
FlushAllOnReload=yes
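Assuming the standard path /etc/firewalld/firewalld.conf, the comment suggests the fix is simply flipping it to “no” so fail2ban’s rules survive a reload, then restarting firewalld followed by fail2ban as above:

# /etc/firewalld/firewalld.conf
FlushAllOnReload=no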
All the major AI bots respect robots.txt, but there are certainly smaller, shadier companies that will steal content without regard for robots.txt. (They’re all stealing content, but at least some of them will stop stealing if you ask nicely. Though just about every website has been ingested into most of the copyright laundering machines by now.)
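For what it’s worth, asking Anthropic’s crawler to stay away is only two lines in robots.txt (ClaudeBot is the user-agent token Anthropic publishes; GPTBot, CCBot, and the rest follow the same pattern):

User-agent: ClaudeBot
Disallow: /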
I’m wondering how to do that myself in Virtualmin. F2B does support it, but I’m not sure how to set it up in Virtualmin, since the command uses a placeholder for the IP, <ip>. Do share if you know how to do it.
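For reference, in plain fail2ban the <ip> placeholder lives in an action file under action.d; a minimal custom action looks something like this (the file name and script paths here are made up for illustration):

# /etc/fail2ban/action.d/notify.conf - hypothetical custom action;
# fail2ban substitutes <ip> with the offending address
[Definition]
actionstart =
actionstop  =
actioncheck =
actionban   = /usr/local/bin/on-ban.sh <ip>
actionunban = /usr/local/bin/on-unban.sh <ip>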