I am experiencing a fatal error when trying to run a backup. The process fails after 5 minutes, reporting that it cannot acquire a lock on a domain configuration file.
The error message is as follows:
```
Fatal Error! Backup failed : Failed to lock file /etc/webmin/virtual-server/domains/16028345518927 after 5 minutes. Last error was : Locked by PID 2013324
```
The process list confirms that the locking process (PID 2013324) is /usr/share/webmin/virtual-server/collectinfo.pl. This script appears to be stuck, as it has been running for a long time and is consuming an abnormally high amount of CPU (97.6%).
Here is the process list for reference:
```
ID       CPU     Started              Command
2013324  97.6 %  09/17/2025 07:56 AM  /usr/share/webmin/virtual-server/collectinfo.pl
1        2.4 %   09/15/2025           /sbin/init
2015096  0.9 %   09/17/2025 07:58 AM  /usr/share/webmin/virtual-server/backup.cgi
```
While I can resolve the immediate issue by killing the stuck process, I am concerned about the root cause.
Could you please advise on why the collectinfo.pl script would hang like this and what I can do to prevent it from happening again?
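In the meantime, before killing anything, I wanted to confirm that the lock really was held by a live process. Here is the minimal check I used, as a sketch: I am assuming the lock marker follows the usual Webmin convention of a "<file>.lock" file whose first line is the locking PID, which is what the "Locked by PID ..." error text suggests.

```perl
#!/usr/bin/perl
# Minimal sketch: check whether the PID recorded in a Webmin-style
# lock file still refers to a running process.
# Assumption: the marker is "<config file>.lock" and its first line
# is the locking PID, as the "Locked by PID ..." error suggests.
use strict;
use warnings;

my $lock = '/etc/webmin/virtual-server/domains/16028345518927.lock';
open(my $fh, '<', $lock) or die "Cannot open $lock: $!\n";
chomp(my $pid = <$fh>);
close($fh);

# Signal 0 probes for existence without actually signalling the process.
if (kill(0, $pid)) {
    print "PID $pid is still running; the lock is genuinely held.\n";
} else {
    print "PID $pid is gone; the lock file appears stale.\n";
}
```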
Concurrent backups are expected not to run: while one backup is already in progress, the next can only start once the first completes.
It’s not a good idea to force-stop your regular backup processes. Instead, try increasing the interval between scheduled backups so each run has enough time to finish naturally.
Also, check out the “System Settings ⇾ Virtualmin Configuration: Backup and restore” page for some extra backup-related options!
I would like to clarify that the issue was not caused by concurrent backups. At the time of the failure, there were no other backup processes running. The conflict was between the single backup process (backup.cgi) and the diagnostic script (collectinfo.pl).
To confirm, I only have two backups scheduled in total, and they are set for completely different times, so they do not overlap. I had also rebooted the server before this occurred.
My main concern is to understand why the collectinfo.pl script would hang and cause such high CPU load in the first place, as this seems to be the root cause of the problem.
Any insights on that specific script’s behavior would be greatly appreciated.
If you go to the “WP Workbench Manager ⇾ Configuration” page and set the “Update instances cache in background” option to “No,” does it affect how your backup finishes?
I have checked my system, and I do not have the “WP Workbench Manager” module installed. Therefore, this feature is not active on my server, and I cannot change the suggested setting.
I have managed to capture a strace output while the collectinfo.pl script was running.
A specific pattern of failing system calls repeats constantly and seems to be the cause of the performance issue. The script is repeatedly calling readlink on standard directories and failing with an EINVAL error, because they are not symbolic links.
My hypothesis is that the script is not correctly handling cases where user home directories are standard directories instead of symbolic links. This appears to cause an inefficient loop, leading to the high CPU usage and the process hanging long enough to block the backup process.
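To make the hypothesis concrete, here is a minimal Perl sketch of the pattern the strace output suggests. To be clear, this is my reconstruction, not the actual collectinfo.pl code, and the path is a placeholder:

```perl
#!/usr/bin/perl
# Illustrative sketch only -- not the real collectinfo.pl logic.
# readlink() on a plain directory returns undef and sets $! to EINVAL,
# so retrying it unconditionally never makes progress.
use strict;
use warnings;

my $home = '/home/example';   # placeholder path

# Suspected anti-pattern: retry readlink() until it "succeeds".
# while (!defined(readlink($home))) { }   # spins at ~100% CPU on a plain dir

# Defensive pattern: only dereference paths that really are symlinks.
my $resolved = -l $home ? readlink($home) : $home;
print "resolved: $resolved\n";
```

If the script really does retry readlink unconditionally, a simple -l test before each call would avoid both the EINVAL storm and the CPU spin.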
Finally, each action I perform is blocked by the collectinfo.pl script while it is running in the background.
As soon as the collection process starts, any other operation (such as launching a manual backup with backup.cgi) is delayed or frozen until collectinfo.pl has finished.
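For completeness, the serialization itself is easy to demonstrate. The sketch below uses flock for simplicity; as far as I can tell, Webmin implements its locking with ".lock" marker files rather than flock, but the effect on the waiting process is the same: it stalls until the holder releases the lock or the wait times out. The lock path is a placeholder.

```perl
#!/usr/bin/perl
# Minimal sketch of lock serialization: run two copies of this script
# and the second one blocks on flock() until the first exits.
# (Webmin itself appears to use ".lock" marker files, not flock, but
# the effect on the waiting process is equivalent.)
use strict;
use warnings;
use Fcntl qw(:flock);

my $lockfile = '/tmp/demo.lock';   # placeholder path
open(my $fh, '>>', $lockfile) or die "open $lockfile: $!\n";

print "$$: waiting for lock...\n";
flock($fh, LOCK_EX) or die "flock: $!\n";   # second copy blocks here
print "$$: lock acquired, working...\n";

sleep 30;   # simulate a long-running collectinfo.pl pass

flock($fh, LOCK_UN);
close($fh);
print "$$: done, lock released.\n";
```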