Nginx Reboot problem (heads-up)

SYSTEM INFORMATION
OS type and version Ubuntu Linux 22.04.1 and Ubuntu Linux 20.04.5
Webmin version 2.001
Virtualmin version 7.3-1
Related packages SUGGESTED

Just performed a package update on 2 separate boxes and, as prompted ("One of the installed packages requires a reboot to be fully applied."), rebooted each of them.

In both cases after the reboot completed and the Dashboard redisplayed it came to my notice (because I checked) that the nginx webserver was not restarted. I have no idea if this was due to one of the packages updated or simply Virtualmin failing to restart/check that nginx was restarted.

In both cases restarting nginx from the dashboard was all that was required. Pretty critical, as no Nginx = no websites.

Again! It has done it again.
FAILED to restart Nginx!
A 3rd box, this one the Linode running Ubuntu Linux 20.04.5: somewhere in the package update and reboot process the Nginx webserver is not being restarted.

Virtualmin does not start services on boot. systemd does that.

Is the nginx service enabled?

systemctl status nginx

Yes, I have just checked all 3 boxes.

For info: they were all rebooted through the Virtualmin prompt that followed the package update (so Virtualmin must be able to determine that a reboot was required). I then went directly to the Virtualmin Dashboard once the reboot had finished. On the first box I was alerted by a user telling me the site was down! After that experience I checked the other 2 boxes following their reboots. On the Dashboard under Server Status it showed (very helpfully) that nginx was down and all the others were running. I care very little about MySQL/MariaDB, but that was up OK. Clicking on the green arrow restarted nginx. So I had assumed that Virtualmin would have attempted the restart of everything.

An assumption, and my mistake. I will now check after every update. I now wish I had taken a copy of all those updates, as I have never experienced an apt update run as the root user taking down nginx. Would it not be better if Virtualmin, on performing the check, attempted a restart? After all, nginx (or Apache, for those who use that) is probably the most important server running on the system.

I appreciate the work that has gone into informing me with a pretty graphic that it is not running and providing a quick, easy graphic to click to restart, but could a simple restart not be part of that check? (One more for the long list of upgrades, perhaps?)

I didn’t ask about boxes. I want to see the output of systemctl status nginx

If it was enabled but failed to start on boot, it will fail to start when Virtualmin tries to start it. It would just put it into a restart loop that might mask errors.

You need to figure out why it didn’t start. Which starts with looking at systemctl status nginx. It may be that we’re not interpreting systemd status correctly (maybe it was running and we didn’t know it), or it may be that something disabled nginx, or it may be that some other dependency was slow to start and caused it to fail on start but starting it manually later was able to fix it. I don’t know.

But, process management is the job of systemd, not Virtualmin, and the systemd unit file may contain Restart directives (but doesn’t always do so, because it may not be recommended to do so). Those systemd unit files are provided by the package on your OS. Our job is to accurately reflect the state of the system, and to appropriately configure systemd to enable services on boot that we need; not to reimplement parts of systemd poorly to try to make things work that can’t work.

I am unaware of anyone else with this particular issue, so it needs to be root-caused. And the root cause needs to be sorted out. It’s probably specific to your system, since it hasn’t been reported by others.

Sorry Joe, I misunderstood the question. Output is below, though I am not quite sure how useful it actually is, seeing that there has since been a reboot and a restart of nginx.

root@example:~# systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2022-12-01 09:28:04 UTC; 2 days ago
       Docs: man:nginx(8)
   Main PID: 15322 (nginx)
      Tasks: 2 (limit: 1131)
     Memory: 4.0M
     CGroup: /system.slice/nginx.service
             ├─15322 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
             └─15323 nginx: worker process

Dec 01 09:28:04 example systemd[1]: nginx.service: Succeeded.
Dec 01 09:28:04 example systemd[1]: Stopped A high performance web server and a reverse proxy server.
Dec 01 09:28:04 example systemd[1]: Starting A high performance web server and a reverse proxy server...
Dec 01 09:28:04 example systemd[1]: Started A high performance web server and a reverse proxy server.

While I understand the infinite loop problem, I also think it would have been better, with hindsight, to check systemctl status nginx before restarting nginx rather than after. It is also important to note that starting nginx from inside Virtualmin using the button in Server Status did work without evident errors, so an internal one-off attempt could have been made. But you are right, finding the reason why it stopped at all is more important than simply fixing it by restarting.

After perusing many questions here I can’t help wondering just how many folk use the LEMP build compared to the default LAMP build of Virtualmin, along with users of NodeJS and other systems. I had not really considered that other packages such as MongoDB, NodeJS, etc. that have been added to these boxes might also have been updated in the process and possibly have compromised Virtualmin.

The fact that it has happened on 3 different boxes from two different providers (DO and Linode) in 3 different locations (Canada/Toronto, UK/London, US/NY) is of concern.

That shows nginx is, in fact, enabled (i.e. configured to start on boot), which is what I wanted to be certain of.

The nginx user base is much smaller. Apache has been around longer, does more, and the performance impact of switching is negligible for the vast majority of users.

But, it is not known to be any less reliable (though less testing is inevitable when there are far fewer users).

Virtualmin isn’t the thing that went wrong, though, right? nginx isn’t provided by us, it’s a package provided by your OS, we’re just managing it. Virtualmin didn’t have any trouble starting on boot, right? We’re not trying to troubleshoot a Virtualmin problem here, as far as I know; it’s an nginx issue, maybe something in the nginx systemd unit file (again, Virtualmin does not start nginx on boot; systemd does). Virtualmin does not replace your system components, we manage the ones provided by your OS; it’s a fundamental feature of our products, one of the main reasons Virtualmin exists.

But, it’s quite common for folks to install other application environments on a system with Virtualmin, including NodeJS and Mongo. The way nginx works should not allow those kinds of things to break it, though; one of the features of Apache that nginx is missing, by design, is the ability to run applications in the core of the webserver (e.g. no mod_cgi, no mod_php, no mod_perl, in nginx). Everything gets proxied to. This should mean that problems in NodeJS cannot possibly interfere with nginx.

Look in the journal for the boot where it failed. You can list the boots that are in the journal with journalctl --list-boots and then use the -b flag with a boot number (e.g. -b -1 for the previous boot) to choose which one to look at. There will presumably be some clues about why nginx didn’t start.
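
For reference, a minimal sketch of that journal check as a small shell helper. It assumes the journal is persistent (otherwise earlier boots will not be listed) and that the unit is called nginx.service, as in the status output above; the function name nginx_boot_log is made up for this example.

```shell
#!/bin/sh
# Show what nginx.service logged during one particular boot.
# $1 is a boot offset as understood by journalctl:
#   0 = current boot, -1 = previous boot, -2 = the one before that, ...
nginx_boot_log() {
    boot="${1:-0}"
    journalctl --no-pager --boot="$boot" -u nginx.service
}

# Typical use (as root):
#   journalctl --list-boots    # find the offset of the boot where nginx failed
#   nginx_boot_log -1          # then read what nginx logged during that boot
```

If the failed boot is not the previous one, pass whichever offset journalctl --list-boots shows for it.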

I have seen (not on a Virtualmin box, though) that nginx won’t start when bind is used as the local resolver and bind is started after nginx. Might be worth a look: see if you find lines similar to “unable to resolve blah” in the nginx error log. When that happens, nginx just dies on its own.

Edit: do note that I am not that well versed in systemd, so never figured out how to actually make nginx depend on bind before starting. I just start nginx manually whenever I reboot.
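
A rough sketch for that log check. It assumes the Debian/Ubuntu default log path /var/log/nginx/error.log and relies on nginx logging fatal startup problems at the [emerg] level; the function name nginx_startup_errors is made up for illustration.

```shell
#!/bin/sh
# Print the fatal ([emerg]) entries from an nginx error log.
# $1 = log path; the Debian/Ubuntu default location is assumed when omitted.
nginx_startup_errors() {
    log="${1:-/var/log/nginx/error.log}"
    # -F matches the literal string "[emerg]" instead of treating it as a regex.
    grep -F '[emerg]' "$log"
}

# Typical use (as root), right after a boot where nginx did not come up:
#   nginx_startup_errors
```

Name resolution failures and addresses that could not be bound would normally show up among these lines, with a timestamp that can be compared against the boot time.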

Problem confirmed.

Confirmed that the problem is fixed by editing
/lib/systemd/system/nginx.service
so that the line
After=network.target nss-lookup.target
instead reads
After=network.target nss-lookup.target network-online.target
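
One caveat with editing that file directly: files under /lib/systemd/system belong to the package, so a later nginx package update may overwrite the change. A systemd drop-in carries the same change without touching the packaged file. The following is only a sketch, assuming the unit is nginx.service; the function name write_nginx_dropin and its optional root-prefix argument are invented here so it can be tried against a scratch directory first. Wants= is included because After= only orders units and does not by itself pull network-online.target into the boot.

```shell
#!/bin/sh
# Write a drop-in that orders nginx.service after network-online.target.
# $1 = optional root prefix (hypothetical helper argument; empty = the real system).
write_nginx_dropin() {
    dir="${1:-}/etc/systemd/system/nginx.service.d"
    mkdir -p "$dir" || return 1
    # After= and Wants= given in a drop-in are added to the packaged unit's
    # settings, so the existing After= line stays in effect.
    cat > "$dir/override.conf" <<'EOF'
[Unit]
After=network-online.target
Wants=network-online.target
EOF
}

# On a real system (as root):
#   write_nginx_dropin && systemctl daemon-reload
```

Running systemctl cat nginx.service afterwards should list both the packaged unit and the override.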

Is this a known issue? Is there a bug report upstream to the Ubuntu folks?

Not from me. Is it really an Ubuntu issue? Possibly more of an nginx issue.

I have to say that when I checked /lib/systemd/system/nginx.service on several boxes (not only the Virtualmin boxes), the line was After=network.target, with neither nss-lookup.target nor network-online.target.
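
For comparing boxes, something like the following can be run on each one. It is a sketch only; the function name unit_after_lines is made up, and the default path is simply the one shown in the status output earlier in this thread.

```shell
#!/bin/sh
# Print the After= lines of a unit file so they can be compared across machines.
# $1 = unit file path; defaults to the packaged nginx unit on Ubuntu/Debian.
unit_after_lines() {
    unit="${1:-/lib/systemd/system/nginx.service}"
    grep '^After=' "$unit"
}

# Related commands on a live system:
#   systemctl cat nginx.service             # packaged unit plus any drop-ins
#   systemctl show -p After nginx.service   # the ordering systemd actually applies
```

The systemctl show output is the more reliable of the two, since it also reflects drop-ins and implicit dependencies.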

Although I marked this as a solution, perhaps it needs more package updates/time to tell if it really solves the problem.

The systemd unit file in a package on Ubuntu is generally maintained by Ubuntu folks, not upstream nginx folks. (But, not always. Sometimes projects include one. Though I suspect they always get tweaked by the distro vendor.)

We won’t change the system nginx package. That’d be a problem for the Ubuntu folks, which is why I asked if it was a known issue upstream. We’re pretty religious about respecting users’ choices: you chose an OS, so you’re going to get that OS’s packages and ways of doing things (e.g. we split vhosts into their own dir in Ubuntu/Debian, and we handle enabling modules in a way that is compatible with the Ubuntu/Debian way of doing it; we do things differently on other distros).

I assume there must be a reason they made the systemd unit file behave this way, perhaps there’s some discussion about it in their issue tracker or something. I only found StackOverflow discussions about it in a quick search, but it’s apparently been a problem for some folks for some time. It might be that there is something else wrong on your system that is unrelated to nginx that breaks this. I dunno.

We don’t modify the systemd unit for nginx as part of our installation. If you have differences, they’d be because of different Ubuntu versions or because of manual modifications.

So, I think we’re still kinda in the dark about why this happens. nss-lookup and network-online are not entirely unreasonable requirements for a web server to start, unless something is broken in either of those target statuses for some reason. But, also, I’m not sure I understand why they are necessary for the web server to start. The web server will generally function locally without network online or resolution; one might even choose to do so for development purposes.

Some additional relevant information.

Nginx is instructed to listen on IPv4 and IPv6 addresses in my config. In my logs nginx complained about an IPv6 address not being available at the time nginx started. So nginx was correct not to start, and the correct solution was to add a condition to ensure all the routable IP addresses in use are up, which is what I did for my solution.

So the solution is correct for my conditions and, I assume, for the conditions of others.

It can be legitimately argued that Ubuntu and other distributions are wrong.

I don’t see any easy solution to correcting distributions.

I think Virtualmin should reconsider their decision not to adopt the solution. The solution does no harm and omitting the solution does do harm.

There is relevant information from the authors of systemd at https://systemd.io/NETWORK_ONLINE/.

network.target indicates that the network management stack has been started. Ordering after it has little meaning during start-up: whether any network interfaces are already configured when it is reached is not defined.

network-online.target is a target that actively waits until the network is “up”, where the definition of “up” is defined by the network management software. Usually it indicates a configured, routable IP address of some kind. Its primary purpose is to actively delay activation of services until the network has been set up.
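
One practical consequence of the above: ordering after network-online.target only delays anything if some service actually implements the waiting. A hedged sketch for checking that, assuming the box uses either systemd-networkd or NetworkManager (the two unit names below are the usual ones for those stacks; other network managers would need a different name). The function name wait_online_status is invented for the example.

```shell
#!/bin/sh
# Report whether a "wait until the network is up" service is enabled.
wait_online_status() {
    for svc in systemd-networkd-wait-online.service NetworkManager-wait-online.service; do
        # is-enabled prints the unit file state (enabled, disabled, ...);
        # depending on the systemd version it may print nothing for an unknown unit.
        state=$(systemctl is-enabled "$svc" 2>/dev/null) || true
        printf '%s: %s\n' "$svc" "${state:-not-found}"
    done
}

# Related: show what nginx waited for during the current boot
#   systemd-analyze critical-chain nginx.service
```

If neither service is enabled, network-online.target is reached without any real waiting, and the After= change would make little difference.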

I have IPv6 enabled but saw no complaints in the nginx logs, so I am not convinced that IPv6 is the issue here. I am still of the opinion that nginx should have those extra additions in the nginx.service file by default (especially if they do no harm).

It looks like package maintainers, including Ubuntu, think they know better than Nginx on how to start nginx with systemd. It is hard not to laugh.

First, I will show a slight improvement in the edit required with Ubuntu.

After=network-online.target nss-lookup.target

is sufficient to fix the problem. That is, just append ‘-online’ to ‘network’. All this does is tell nginx to wait until the network is up, whatever that is defined to mean (systemd’s words, not mine, at https://systemd.io/NETWORK_ONLINE/).

Now here is the juicy bit. Guess what is in current nginx source code for Debian based distributions (including Ubuntu)? Yes! It is the ‘-online’ version!

After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

The link is http://hg.nginx.org/pkg-oss/file/stable-1.22/debian/nginx.service

Maybe the less I say about the absurdity of all of this the better!

You chose an OS. We respect that decision.

We’re pretty religious about this: if there’s something that makes it impossible to provide a good shared hosting experience, we may make changes, but changing a systemd unit file that comes with an OS-provided package is not something we’d do lightly (as it’s not something we can do easily or reliably without replacing the entire package, which we definitely aren’t doing).

I should be clear that the corollary to “you chose an OS” is that if you don’t like the decisions made by your OS, you should consider other OSes. I don’t trust the Ubuntu folks to make good decisions, so I consistently choose Rocky, when I have a choice. That’s personal preference.

And, to go further, I don’t think this one is all that ridiculous of a decision on their part. All you need to do to avoid this problem is get your network configuration right.

Leaving aside policy issues, the implication here is that with a proper network configuration the Ubuntu systemd configuration for Nginx is correct, and that the suggested Nginx configuration for use of nginx with systemd is overkill designed to overcome problems.

Given what I have quoted from systemd, I don’t agree with this.

I believe the Ubuntu maintainers have misunderstood the difference between network.target and network-online.target, and they are wrong to override the suggested configuration for Nginx. From our perspective, the first can be taken to mean ‘we are able to execute network startup commands’ and the second can be taken to mean ‘we have finished network startup commands’. Systemd lets the OS decide what starting and finishing mean.

So why are there not more complaints that Ubuntu has got it wrong? Ubuntu, as a server, is regarded as a beginner’s choice and a lot of beginners on servers use low cost and low core/thread VPSs with startup processes that hog a CPU during startup, preventing true parallel startup.

With real multicore parallel startup, telling a web server it can start up when core network functionality is not ready is wrong. No amount of ‘correct network configuration’ will fix this.

I wanted binary compatibility with my desktop and wanted to avoid using virtual OS solutions, so I chose Ubuntu as a server, having used Debian before. I now see this as a mistake. Ubuntu is great as a desktop and, as far as I am concerned, is a far better choice than the choices offered by Debian desktop, where restrictions on ‘non-free’ software use are just one among many annoyances. Not that Ubuntu desktop is without annoying issues.

I have too many other problems with Ubuntu as a server. Another example with Virtualmin is fail2ban. I am going back to Debian for server use and will report if Debian does a better job with startup.

It’s possible I also misunderstand the implications of the change they’ve made. But, if they’re wrong, you’ll want to report it to them. I’m certainly willing to believe they’re wrong. Again, I don’t trust the Ubuntu devs’ judgement, in general; they’ve made too many weird and rushed decisions over the years for me to be comfortable with Ubuntu on a server, though I’m in the minority on this: Ubuntu is pretty much the most popular distro for everything these days.

I’ve given up on trying to convince folks to use something else. We just do the best we can with the circumstances we’re in. And, honestly, it’s still fine. Even with the questionable decisions. It has good package management, a lot of packages, and a very big community. That’s more important than being perfect. And, it’s probably still better than Debian, for a variety of reasons. One of the reasons is that Debian has a history of following Ubuntu into bad decisions. If Ubuntu makes a poor choice today, it’ll probably show up in Debian in six months to a year. Debian avoided the biggest hasty decisions (upstart, Unity), but still lots of other annoyances have made it in. And, Debian has a shorter lifecycle than LTS and a smaller community; so, less suitable for beginners running servers.

Anyway, if it’s a bug or a suboptimal config, I encourage you to file an issue upstream. Maybe they’ll at least explain why they made this particular choice.

I have spun up two additional nginx Virtualmin VPSs, one with Debian 11 and the other with Rocky 9.1.

Unlike with Ubuntu 22.04, nginx starts on boot on both.

For Debian 11 the systemd ‘After’ config is the same as for Ubuntu 22.04. This config, as pointed out above, is wrong and so will potentially result in nginx startup failures if network startup is incomplete.

For Rocky 9.1 the systemd ‘After/Wants’ config is the same as in the nginx source code shown above. This config is sound. Nginx won’t start up until network startup is complete.
