Cannot access VPS via SSH, and no HTTP, Webmin panel, mail, etc. after months of regular use

Today, cat /var/log/messages | grep soft reports:

Jan 24 06:13:50 host kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 41s! [ksoftirqd/2:31]
Jan 24 06:13:50 host kernel: CPU: 2 PID: 31 Comm: ksoftirqd/2 Kdump: loaded Tainted: G             L    -------  ---  5.14.0-362.13.1.el9_3.x86_64 #1
Jan 24 06:13:50 host kernel: ? __do_softirq+0x16a/0x2ac
Jan 24 06:13:50 host kernel: __do_softirq+0xca/0x2ac
Jan 24 06:13:50 host kernel: run_ksoftirqd+0x1e/0x30
Jan 24 06:14:19 host kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 46s! [/usr/libexec/we:505475]
Jan 24 07:09:23 host kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 46s! [httpd:406365]
Jan 24 07:09:23 host kernel: ? __do_softirq+0x16a/0x2ac
Jan 24 07:16:47 host kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 41s! [systemd:1]
Jan 24 07:16:47 host kernel: RIP: 0010:__do_softirq+0x78/0x2ac
Jan 24 07:16:47 host kernel: ? __do_softirq+0x78/0x2ac
Jan 24 07:16:47 host kernel: ? __do_softirq+0x60/0x2ac
Jan 24 08:27:42 host kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 53s! [khugepaged:53]
Jan 24 08:27:42 host kernel: RIP: 0010:__do_softirq+0x78/0x2ac
Jan 24 08:27:42 host kernel: ? __do_softirq+0x78/0x2ac
Jan 24 08:27:42 host kernel: ? __do_softirq+0x60/0x2ac
Jan 24 11:40:09 host kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 45s! [php-fpm:532054]
Jan 24 11:40:09 host kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 45s! [spamassassin:559650]
Jan 24 11:40:09 host kernel: __do_softirq+0xca/0x2ac
Jan 24 11:53:23 host kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 49s! [monitor.pl:560107]
Jan 24 11:53:23 host kernel: __do_softirq+0xca/0x2ac
Jan 24 13:19:15 host kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 50s! [sh:572058]
Jan 24 13:20:05 host kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 59s! [migration/2:30]
Jan 24 13:20:05 host kernel: __do_softirq+0xca/0x2ac
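
To make grep output like the above easier to read, the lockup lines can be tallied per CPU. This is only a minimal sketch: the names parse_lockups and summarize are made up for illustration, and the regular expression assumes the exact "soft lockup - CPU#N stuck for Ns! [task:pid]" wording seen in this log.

```python
import re
from collections import Counter

# Assumes the watchdog message format seen in this thread's log, e.g.
# "watchdog: BUG: soft lockup - CPU#2 stuck for 41s! [ksoftirqd/2:31]"
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)


def parse_lockups(lines):
    """Return a list of (cpu, seconds, task) tuples, one per lockup line."""
    events = []
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            events.append(
                (int(match.group("cpu")), int(match.group("secs")), match.group("task"))
            )
    return events


def summarize(events):
    """Return (lockup count per CPU, longest reported stall in seconds)."""
    per_cpu = Counter(cpu for cpu, _secs, _task in events)
    longest = max((secs for _cpu, secs, _task in events), default=0)
    return per_cpu, longest


# Three lines copied from the log above, used as sample input.
sample = [
    "Jan 24 06:13:50 host kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 41s! [ksoftirqd/2:31]",
    "Jan 24 06:14:19 host kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 46s! [/usr/libexec/we:505475]",
    "Jan 24 13:20:05 host kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 59s! [migration/2:30]",
]
per_cpu, longest = summarize(parse_lockups(sample))
print(dict(per_cpu), longest)
```

On the real machine the lines would come from the log itself, for example by passing an open file handle for /var/log/messages to parse_lockups.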

These soft lockups tend to happen when system load is high:
https://www.suse.com/support/kb/doc/?id=000018705
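
A rough way to check whether load is high relative to the number of CPUs is to compare the 1-minute load average with the CPU count. The sketch below is an illustration only: the helper names load_per_cpu and is_overloaded and the 1.0 threshold are assumptions, not something taken from the linked article, and os.getloadavg() is only available on Unix-like systems.

```python
import os


def load_per_cpu(load1, cpus):
    """Return the 1-minute load average divided by the CPU count."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load1 / cpus


def is_overloaded(load1, cpus, threshold=1.0):
    """True when load per CPU exceeds the (assumed) threshold."""
    return load_per_cpu(load1, cpus) > threshold


# Report the current state of the machine this runs on, where supported.
if hasattr(os, "getloadavg"):
    load1, _load5, _load15 = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"load1={load1:.2f} cpus={cpus} overloaded={is_overloaded(load1, cpus)}")
```

A load per CPU that stays well above 1 for long stretches suggests more runnable work than the CPUs can serve, which is worth comparing against the timestamps of the lockup messages.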

Maybe fire up another instance with more memory, even if it is virtual, and migrate? I’m not sure how painful Contabo makes that.

I opened a Contabo ticket for help. I’ve been with them for more than 5 years.
We’ll see.
I’ll be away from my city until Feb 4, so I won’t be able to share more data until then.
Thank you anyway for your help.

Well, it finally seems to be fixed. I haven’t had any more incidents since Jan 24.
Contabo changed my VNC server.
Thank you.

I once had a machine hosting lots of websites, one of our more important machines, suddenly run at high usage. After a reboot it seemed OK, but I kept checking the Nagios reports: load was markedly higher than it had been before, though not high enough to trip the alerts. Given the age and importance of the machine, we just replaced it.

Would it be nice to know the root cause? Sure. Is it nicer to just have it working properly? Yep. :wink:
