Slow login becomes time out when using totp mfa

Hi!

I have tried to enroll a user for 2fa, the enrollment worked but logging in afterwards did not.
The login was originally very slow, but succeeded after about 12 seconds.
With 2fa enabled, it loads forever and then returns to the login page.

The log I get from webmin is successfull login and I can’t find any errors really either for logging in.

Tracking the network traffic on a successfull login without 2fa sets two requests apart from the others:
Loading / has 1.25s of wait (1250ms)
Loading /xhr.cgi?xhr-info=1 has 11.93s of wait

Wether this is related or not, no idea.

Any help would be highly appreciated :slight_smile:

SYSTEM INFORMATION
OS type and version Ubuntu 24.04
Webmin version 2.641
Virtualmin version 8.1.0 Professional
Nginx version 1.24.0

Off topic, quick tip, you can easy copy paste your System Info from here.

What error message do you get when trying to login with 2fa? Also, does anything useful get logged to /var/log/secure or /var/log/auth.log (or their equivalent systemd logs) ?

I get no error message in the browser, it only returns to the lgoin page.

I tried now in a different browser, the earlier attempt was made from firefox, and this time in chrome. The load times was significantly shorter in chrome, but the result was the same.

Log from /var/log/auth.log:

2026-05-19T08:27:01.046255+02:00 web01 perl[3926278]: pam_unix(webmin:session): session opened for user username(uid=1017) by (uid=0)
2026-05-19T08:27:01.063927+02:00 web01 systemd-logind[750]: New session c461 of user username.
2026-05-19T08:27:01.081798+02:00 web01 (systemd): pam_unix(systemd-user:session): session opened for user username(uid=1017) by username(uid=0)
2026-05-19T08:27:01.300577+02:00 web01 systemd-logind[750]: Session c461 logged out. Waiting for processes to exit.
2026-05-19T08:27:01.303457+02:00 web01 systemd-logind[750]: Removed session c461.
2026-05-19T08:27:01.858778+02:00 web01 saslauthd[755]: DEBUG: auth_pam: pam_authenticate failed: Authentication failure
2026-05-19T08:27:01.858914+02:00 web01 saslauthd[755]:                 : auth failure: [user=PDMREMI] [service=smtp] [realm=] [mech=pam] [reason=PAM auth error]
2026-05-19T08:27:03.164621+02:00 web01 perl[3926299]: pam_unix(webmin:session): session opened for user username(uid=1017) by (uid=0)
2026-05-19T08:27:03.170281+02:00 web01 systemd-logind[750]: New session c462 of user username.
2026-05-19T08:27:03.404460+02:00 web01 webmin[3926299]: Successful login as username from 10.201.32.4
2026-05-19T08:27:03.430311+02:00 web01 systemd-logind[750]: Session c462 logged out. Waiting for processes to exit.
2026-05-19T08:27:03.431263+02:00 web01 systemd-logind[750]: Removed session c462.

Going back in the logs returns the same messages from the attempts made in firefox.

Incorrect time could maybe cause that? I’ve had similar before when the server time wasn’t set correctly with NTP and the time was off by a few seconds causing the code to be invalid.

Timesync should not be impacting, the offset is less than 1 second from public ntp servers.
Current offset is less than 1 ms.
Delay about 20-30 ms and jitter <10 ms.

The fact that login already takes ~12 seconds even before enabling TOTP makes me think the MFA itself may not be the root problem, but more like exposing an existing delay somewhere in authentication/session handling.

Since /xhr.cgi?xhr-info=1 is taking unusually long, I’d probably also check:

  • DNS/reverse DNS resolution delays
  • PAM authentication modules
  • entropy/randomness availability for TOTP generation
  • fail2ban or auth hooks
  • external auth integrations if any are enabled

Authentication systems can get surprisingly sensitive to small delays once MFA is added on top, especially when multiple validation steps depend on timing/session state.

So after going about this for an evening…
And ended up with four parts:

  • Fixed reverse dns
  • Missing perl plugin
  • xhr-info taking 11+ seconds (blocking TOTP login)
  • TOTP login redirecting to :10000 after success

Reverse DNS
Was just that, simple enough.

xhr-info taking 11+ seconds (blocking TOTP login)

This was out of my depth but had a helping hand.

Problem: xhr-info taking 11+ seconds (blocking TOTP login)

Root cause chain:

Every xhr-info request (which the browser sends repeatedly during and after login) called theme_list_combined_system_info(). Because post_has('xhr-info') checks %in which includes GET query parameters, nocache was always true for xhr-info requests, bypassing the theme cache entirely. This forced a full system info collection on every single request.

That collection path hit two separate slow operations:

Slow call 1: virtual_server::collect_system_info called get_startstop_links() unconditionally, which ran systemctl show for every service on the system. Each systemctl show call took ~0.9 seconds, and there were multiple invocations, adding up to ~6 seconds total. This happened because authentic-lib.pl called virtual_server::collect_system_info('manual') without checking whether the virtualmin collected cache was still fresh.

Fix: Patched authentic-lib.pl to check the age of the virtualmin collected file before triggering a full virtual_server::collect_system_info. If the file is newer than collect_interval minutes, skip the recollection and use cached data.

Also patched virtual-server/collect-lib.pl to use ||= instead of = when populating startstop, so it won’t overwrite an already-populated value from the cache.

Also disabled refresh_startstop_status() in xhr-lib.pl — this was a redundant live systemd query called before theme_list_combined_system_info on every xhr-info request.

Slow call 2: get_cpu_io_usage calls select(undef, undef, undef, 0.5) for a 0.5-second sleep between two CPU sampling reads. In miniserv’s forked child process, Perl’s select() with no file descriptor sets interacts with miniserv’s master event loop — the master polls via pselect6([5 6 10], timeout=2s) and won’t notice the child has finished until the next 2-second poll cycle. With multiple concurrent requests each triggering this sleep, the master accumulated multiple 2-second poll misses, resulting in ~10 seconds of effective serialization delay even though the actual sleep was only 0.5 seconds.

Workaround: Disabled get_cpu_io_usage collection in system-status-lib.pl (if (0 && defined(...))) — CPU I/O stats are cosmetic and 2fa seems more important to have working. Also disabled collect_pkgs to prevent apt update checks.

Supporting fixes:

  • Disabled the boot-time webmincron cron (scheduled_collect_system_info) which was acquiring the in-memory lock at startup and never releasing it, causing all subsequent collection attempts to hit “Already running”
  • Disabled the 5-minute interval webmincron cron for the same reason
  • Set collect_interval=5 and collect_pkgs=0 in /etc/webmin/system-status/config
  • Installed Net::WebSocket::Server (via CPAN + libprotocol-websocket-perl via apt) so stats.pl can start when needed

TOTP login redirecting to :10000 after success

This could only be captured after a successfull login, webmin sent back a redirect to fqdn and port 10000 which was not intercepted by the nginx config for virtualmin.
Adding this to the config resolved it:

proxy_redirect https://[fqdn]:10000/ https://[fqdn]/;

Logins are now sub second or close enough.

Files modified

File Change
/usr/share/webmin/authentic-theme/authentic-lib.pl Skip virtualmin collect_system_info if collected file is fresh
/usr/share/webmin/authentic-theme/xhr-lib.pl Disabled refresh_startstop_status() call
/usr/share/webmin/virtual-server/collect-lib.pl Use ||= for startstop to preserve cached value
/usr/share/webmin/system-status/system-status-lib.pl Disabled get_cpu_io_usage block
/etc/webmin/system-status/config collect_interval=5, collect_pkgs=0
/etc/webmin/webmincron/crons/1776370355146080.cron Renamed to .off (interval cron disabled)
/etc/webmin/webmincron/crons/1776370355146081.cron Removed boot=1 (boot cron disabled)
/etc/nginx/sites-available/virtualmin-control Added proxy_redirect for [fqdn]:10000

Would very much like a review of these changes from some one more knowledgable of virtualmin than me. Haven’t even been working with this system for a month yet.

Also, would this be considered the solution? I’ll wait a day or two for some response before selecting this as a soltuon if nothing else pops up here.

That seems like an intermediate-level problem, not something caused by Webmin itself.

If you try to log in using IP:10000, does it work normally and quickly?

Using login with curl on the server directly towards port 10000 always gave the ~11 seconds.
When I got this down to less than 2 seconds after restarting webmin and trying with a new session to validate, did I enable 2fa, and this got me to the part where nginx did not capture the redirect sent back so the browser wanted to go to [fqdn]:10000 after a successfull login from a web browser.

I’m sorry, it just doesn’t make sense, as xhr-info never loader in unauthenticated session.

I think you have something misconfigured. By the way, there were changes in Webmin 2.641 in regards it being behind proxy.

Did you check the new “Webmin ⇾ Webmin Configuration: IP Access Control / Trusted proxy addresses” option?

I’m sorry, it just doesn’t make sense, as xhr-info never loader in unauthenticated session.

That seems correct. It only did the 11 seconds after a successfull authentication with curl.

This is the curl requests I tested with:

curl -sk -c /tmp/wm34.txt -o /dev/null https://[private_ip]:10000/session_login.cgi
curl -sk -b /tmp/wm34.txt -c /tmp/wm34.txt -D /tmp/wm34-hdr.txt   -d "user=[username]&pass=[password]"   -H "Referer: https://[private_ip]:10000/session_login.cgi"   https://[private_ip]:10000/session_login.cgi
time curl -sk -b "sid=[sid string from wm34-hdr.txt]" "https://[private_ip]:10000/xhr.cgi?xhr-info=1" -o /dev/null

Return after fix:

real    0m1.636s
user    0m0.008s
sys     0m0.008s

Before fix real was between 9-12s.

Is totp another layer or secondary step?
Might be that some of these proceses ends up skewing the point of time when the totp is validated, and that all together made the totop provided invalid, and that what I have done is not the fix of the root cause?
(Cutting about 10s from login though seemed nice.)

Did you check the new “Webmin ⇾ Webmin Configuration: IP Access Control / Trusted proxy addresses” option?

No, not really. Tried to find something like it.
Would this allow me to configure nginx in a better way and also set webmin to not send a redirect to port 10000 after totp?

For this see “Webmin ⇾ Webmin Configuration: Web Server Options” page and “Internal redirect URL overrides” option.

I’ll need to look into this more. I’m not sure what’s happening yet.