Too many session files based on the traffic

SYSTEM INFORMATION
OS type and version Ubuntu 20.04
Virtualmin version 7.3-1

One of my domains was taking up too much disk space, which caught my eye. It turns out there are millions of session files taking up around 10 GB of space. There isn’t much traffic on this site according to Google Analytics, but Virtualmin’s reporting shows a considerable amount of bandwidth used. Here are some stats for this website, with my most active site for comparison.

Most active website on server
115 GB bandwidth
15K visits in last month (GA)
103K session files

Site in question
77 GB bandwidth
400 visits in last month (GA)
2.2M session files

Both clean up session files after 7 days. What could be going on here? Is there anything I can do?

Look at the access_log and error_log for the problem domain.

I know the domain. I’ve posted stats of that domain above (heading: Site in question)

We know that you know the domain. We can also see the stats of the domain that you posted in your earlier message. Thank you very much for that info. From this it is clear that your most active website has 103K session files and the site in question has 2.2M session files. Both clean up session files after 7 days.

At the non-Jedi level of expertise that Joe possesses, the access_log and error_log for the problem domain would need to be looked at to formulate a prognosis for this strange issue with your server.

If you are reluctant to peruse logs, I suggest:

Yes, look at the access_log and error_log for that domain. That’s the first step for troubleshooting any website problem.

Don’t go all berserk on me. I may have misread the post. I thought Joe wanted me to find out which domain was receiving the traffic.

I have enabled AWstats on this server. We’ll know more in a few days. Just from looking at the access log files, I can see an uncomfortably large number of requests from semrushbot.
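For anyone who wants numbers before the stats build up, here is a minimal Python sketch that tallies hits and bytes per user agent straight from an access log. It assumes the log uses the Apache "combined" format; the thread does not confirm the format, so adjust the pattern if yours differs. The function names `tally_user_agents` and `print_top` are made up for this example.

```python
import re
from collections import Counter

# Assumed Apache "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
    r'(?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def tally_user_agents(lines):
    """Return (hits, traffic) Counters keyed by the user-agent string."""
    hits = Counter()
    traffic = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent")
        size = match.group("bytes")
        hits[agent] += 1
        traffic[agent] += 0 if size == "-" else int(size)
    return hits, traffic


def print_top(path, limit=10):
    """Print the busiest user agents found in the log file at `path`."""
    with open(path, errors="replace") as handle:
        hits, traffic = tally_user_agents(handle)
    for agent, count in hits.most_common(limit):
        print(f"{count:>8} hits {traffic[agent]:>12} bytes  {agent}")
```

Calling `print_top` with the path to the domain's access_log lists the ten busiest user agents, which makes it easier to see whether a crawler accounts for most of the requests.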

@Vipul.K,

Based on your latest post regarding the “bot”, you may have answered your own question. Bots typically hit a website often, either to collect data (like googlebot, which is legit) or to attack, as is the case with a malicious bot.

I’d research the bot in question with a Google search, and then verify whether the IP address or range it is coming from seems legit.

If it is malicious, or you simply don’t want to receive traffic from it, create a firewall rule blocking the IP address or range to prevent that traffic from reaching your web server.

There are ways to monitor and automatically block known bots and malicious players, though this is beyond the scope of this thread.


@Vipul.K,

semrushbot likely refers to the following bot, which, while it doesn’t look malicious, may be spending an undesired amount of time scanning your site.

You may wish to learn about setting up a robots.txt file for your site which can be used to tell legit bots not to index or spider your site.

Here’s a resource on the topic:

Hi Peter. How are you?

Yes, the bot is legit. I’ll check through AWstats whether it is the main reason behind the traffic. It’s hard to tell just by looking at log entries. If so, it’s pretty easy to stop it with a robots.txt file like you said.
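As a rough illustration of the robots.txt approach, the Python sketch below holds a sample robots.txt that asks two crawlers to stay away and checks it with the standard library's `urllib.robotparser` before it gets deployed. The rules and the `is_allowed` helper are assumptions for this example, not something taken from the site in question, and robots.txt only works for bots that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Sample rules (an assumption for this sketch): ask two crawlers to
# stay away from the whole site and leave every other bot unrestricted.
ROBOTS_TXT = """\
User-agent: SemrushBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, bot_name, url="/"):
    """Return True if `bot_name` may fetch `url` under `robots_txt`.

    Pass the bot's product token (for example "SemrushBot"), not the
    full User-Agent header from the access log.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot_name, url)


for bot in ("SemrushBot", "AhrefsBot", "Googlebot"):
    print(bot, "allowed" if is_allowed(ROBOTS_TXT, bot) else "disallowed")
```

The text in `ROBOTS_TXT` is what would go into a robots.txt file at the root of the site. A firewall or web server rule is still needed for bots that ignore it.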

AWstats is unlikely to provide any new insight. You already have overviews of your traffic.

The goal is to get specific: which requests are slow and keeping sessions open for a long time, and why, if that is in fact the problem. Figuring that out starts in the logs.

So Semrush was pretty high on the list, but AhrefsBot has twice the hits and much more bandwidth usage. I’ve blocked both of them. I don’t know about the Firefox and Dalvik bots; I couldn’t find anything on them.
