Scheduled backup timeout

SYSTEM INFORMATION
OS type and version: Debian 11
Webmin version: 2.501
Virtualmin version: 7.40.0

Hello,

I am experiencing a fatal error when trying to run a backup. The process fails after 5 minutes, reporting that it cannot acquire a lock on a domain configuration file.

The error message is as follows:

Fatal Error! Backup failed : Failed to lock file /etc/webmin/virtual-server/domains/16028345518927 after 5 minutes. Last error was : Locked by PID 2013324

The process list confirms that the locking process (PID 2013324) is /usr/share/webmin/virtual-server/collectinfo.pl. This script appears to be stuck, as it has been running for a long time and is consuming an abnormally high amount of CPU (97.6%).

Here is the process list for reference:

ID       CPU     Started              Command
2013324  97.6 %  09/17/2025 07:56 AM  /usr/share/webmin/virtual-server/collectinfo.pl
1        2.4 %   09/15/2025           /sbin/init
2015096  0.9 %   09/17/2025 07:58 AM  /usr/share/webmin/virtual-server/backup.cgi
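
For reference, Webmin appears to record the owning PID in a companion lock file next to the locked file, which is how I cross-checked the PID from the error. The ".lock" suffix below is my assumption, so the exact name may differ:

cat /etc/webmin/virtual-server/domains/16028345518927.lock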

While I can resolve the immediate issue by killing the stuck process, I am concerned about the root cause.

Could you please advise on why the collectinfo.pl script would hang like this and what I can do to prevent it from happening again?

Thank you for your assistance.

Hello,

It’s expected that concurrent backups won’t run. Since the initial backup is already in progress, the next one can only be started once the first one completes.

It’s not a good idea to force-stop your regular backup processes. Instead, try bumping up the time between scheduled backups so they have enough time to finish up naturally.

Also, check out the “System Settings ⇾ Virtualmin Configuration: Backup and restore” page for some extra backup-related options!
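
If you prefer the command line, and assuming the Virtualmin CLI is installed, you can also review your existing backup schedules with a quick check like this:

virtualmin list-scheduled-backups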

Hello,

Thank you for your reply.

I would like to clarify that the issue was not caused by concurrent backups. At the time of the failure, there were no other backup processes running. The conflict was between the single backup process (backup.cgi) and the diagnostic script (collectinfo.pl).

To confirm, I only have two backups scheduled in total, and they are set for completely different times, so they do not overlap. I had also rebooted the server before this occurred.

My main concern is to understand why the collectinfo.pl script would hang and cause such high CPU load in the first place, as this seems to be the root cause of the problem.

Any insights on that specific script’s behavior would be greatly appreciated.

Thank you.

If you go to the “WP Workbench Manager ⇾ Configuration” page and set the “Update instances cache in background” option to “No,” does it affect how your backup finishes?

Additionally, could you provide the strace -p <PID> output when it hangs? It would give more insight about why it’s hanging.
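
For example, something along these lines should work; replace <PID> with the PID of the running collectinfo.pl process, and treat the output file path as just a suggestion:

strace -f -tt -T -o /tmp/collectinfo.trace -p <PID>

The -tt and -T options add timestamps and per-call timing, which makes it easier to see where the time is actually going.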

So, you had collectinfo.pl running in the background and then you also ran a backup manually using the backup.cgi page?

What is the average time for collectinfo.pl to run on your server?
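
If you are not sure, one way to get a rough idea is to check the elapsed time of a running instance, replacing <PID> with the PID that pgrep reports:

pgrep -f collectinfo.pl
ps -o etime= -p <PID>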

Hello Ilia,

I have checked my system, and I do not have the “WP Workbench Manager” module installed. Therefore, this feature is not active on my server, and I cannot change the suggested setting.

I have managed to capture a strace output while the collectinfo.pl script was running.

A specific pattern of failing system calls repeats constantly and seems to be the cause of the performance issue. The script is repeatedly calling readlink on standard directories and failing with an EINVAL error, because they are not symbolic links.

Here is a snippet from the strace log:

readlink("/home", 0x7ffd698666d0, 4095) = -1 EINVAL (Invalid argument) readlink("/home/client", 0x7ffd698666d0, 4095) = -1 EINVAL (Invalid argument) readlink("/home/client/public_html", 0x7ffd698666d0, 4095) = -1 EINVAL (Invalid argument) readlink("/home", 0x7ffd698666d0, 4095) = -1 EINVAL (Invalid argument) readlink("/home/client", 0x7ffd698666d0, 4095) = -1 EINVAL (Invalid argument)

My hypothesis is that the script is not correctly handling cases where user home directories are standard directories instead of symbolic links. This appears to cause an inefficient loop, leading to the high CPU usage and the process hanging long enough to block the backup process.
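
One way to double-check this reading of the trace is to confirm that readlink simply fails with EINVAL on a plain directory, and to count how often the same path is probed. This is only a rough check, and it assumes the trace was saved to a file (here /tmp/collectinfo.trace):

# readlink fails on a plain directory, exactly as in the trace
readlink /home || echo "/home is not a symlink, so EINVAL is expected"
stat -c '%F' /home

# count how often the same path (/home) was probed in the captured trace
grep -c 'readlink("/home",' /tmp/collectinfo.trace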

Is this a known issue?

Thank you for your help.

Hello,

Finally, each action I perform is blocked by the collectinfo.pl script while it is running in the background.
As soon as the collection process starts, any other operation (such as launching a manual backup with backup.cgi) is delayed or frozen until collectinfo.pl has finished.

This is very unusual. Do you run Virtualmin on a low-spec machine?

Also, what is the output of:

webmin -v

This is running on a VM with 8 vCPU and 8 GB RAM, so not a low-spec machine.

The output of webmin -v is:

# webmin -v
2.510