Would you believe the same problem bit me twice?

I wrote about an amazingly complicated technical glitch that drove me crazy for days. Would you believe it burned me again?

Here’s the short recap of what happened last time. My Virtual Private Server provider got new hardware, so they moved my VPS from the old hardware to the new hardware. All good. But that means that there are currently two instances of my VPS – one on the old hardware and one on the new – each with a different IP address. Due to a complicated configuration snafu, I was unknowingly working on the old server while thinking I was working on the new one, which left me completely baffled as to why my changes weren’t appearing on the website. I ended up having to redo a week’s worth of work that I’d done on the old server, this time on the new server.

So how did I screw up again?

Well, one of the things I’d done on the old server during that week was (finally) set up the root mail account to forward to my email account. Before, I’d log in to the server every few days and check the logs for trouble. Now notices of system errors could be emailed to my attention. For example, cron errors.

I knew there were a couple of cronjobs that were failing, but I didn’t care because those tasks were being handled instead by custom scripts I’d written myself. However, I wanted to either start using the system facilities more – so I didn’t need to worry about maintaining my own custom scripts – or at least disable those redundant cronjobs. So I wanted to start getting notices of which jobs were failing. (Also it would just be good to get alerts if any cronjob failed.)

So after I set up the root mail to forward, I started getting – as expected – emails about cron errors.

But at first I ignored them, because this was during the period that I was puzzling over the bizarre behaviours I was getting due to the other problem – the one I blogged about. They didn’t seem to be giving me any helpful info about that problem, so I figured I’d put them aside and deal with them later.

Eventually I figured out the other problem, and reproduced on the new server all the changes I’d made over that week on the old server… which means that I set up the root mail to forward on the new server as well. With that vexing problem solved, I patted myself on the back and took a break for a few days. Then I turned my attention to the cronjob failure emails that I was getting.

There were only two failures, and one was due to fstrim failing on the new hardware. No problem, expected actually, and an easy fix.

The other cronjob failure was logrotate complaining about a post-rotate script failing. After rotating the MySQL logs (technically, MariaDB on my server), it was getting a password error when trying to flush the database server logs.

mysqladmin: connect to server at 'localhost' failed
error: 'Access denied for user 'root'@'localhost' (using password: NO)'
error: error running shared postrotate script for '/var/log/mysql/mysql.log /var/log/mysql/mysql-slow.log /var/log/mysql/mariadb-slow.log /var/log/mysql/error.log '
run-parts: /etc/cron.daily/logrotate exited with return code 1

Another easy fix – all I had to do was add the missing credentials to the configuration file (as hinted at in the only answer).

Except it didn’t work. I continued to get emails about the same logrotate cronjob failing.

Maybe I missed something, I thought, so I tried running the database log flush command manually. And wouldn’t you know it? It worked perfectly. How very odd. I tried restarting all the daemons as a sort of blind swipe at the problem, and waited for the next cron run.

Again, another email about a failure.

Now I wondered if maybe the problem was somewhere else. Maybe in the logrotate operation itself. So I tried running the logrotate command manually. Worked perfectly.

This was really bizarre. Why was the exact same command working when I ran it manually, and failing when cron ran it? I puzzled over it for a couple days, on and off, then decided to simply try sticking a couple of echo statements before and after the flush-logs command.

And then a really strange thing happened. I got two cron failure emails for the same job! Except… in one of them, I got the same error message I’d gotten before about a bad password. In the other I got just the two echo messages, with no bad password error.

What in the hell?

How could the same job be running twice – at the same time – and one time failing and the other time succeeding? And where the hell were the echo messages going on the run that was failing?

And then, purely by chance, a friend made a joking comment about my previous blog post – the one about the two server problem. And it hit me.

I’d enabled email notifications of cron failures on both servers – the old one and the new one. I’d done it on the old one during that week that I mistakenly thought I was working on the new one… then I duplicated the same steps on the new server once I realized the mistake. So both servers were sending me cron failure messages. And both servers were configured identically, not counting the changes I’d made during the last couple of weeks. So both were running the same cron job at the same time.

And I’d fixed the error on the new one… but not the old one.

So now I had to log back into the old server and make the same configuration change to stop the error there. Presumably I’d already stopped getting cron failure notices from the new server and was only getting failure notices from the old one – I can’t tell because the failure notices don’t give the IP, so I don’t know which of the twin servers they’re coming from. Now the notices from the old server should stop, too.
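One way to take the guesswork out of which twin is complaining is to wrap a cron job so that anything it prints gets stamped with the machine's identity. The following is only a minimal sketch of that idea, not something from the actual server setup: the function names `host_tag` and `run_tagged` are made up for illustration, and `192.0.2.1` is just a reserved documentation address used to ask the kernel which local IP it would route from. Cloned servers may well share a hostname, so the IP address is the part that would tell them apart.

```python
import socket
import subprocess
import sys


def host_tag():
    """Return a label for this machine: 'hostname (local IP)'."""
    hostname = socket.gethostname()
    try:
        # Connecting a UDP socket sends no packets. It only makes the kernel
        # pick the local address it would use to reach the destination.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect(("192.0.2.1", 9))  # hypothetical destination, never contacted
            ip = s.getsockname()[0]
    except OSError:
        ip = "unknown-ip"
    return f"{hostname} ({ip})"


def run_tagged(argv):
    """Run argv and return (exit_code, report).

    The report is empty when the command succeeds silently, so cron
    (which only mails when there is output) stays quiet in that case.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    output = (result.stdout + result.stderr).strip()
    if result.returncode == 0 and not output:
        return 0, ""
    header = f"[{host_tag()}] {' '.join(argv)} exited with {result.returncode}"
    return result.returncode, header + ("\n" + output if output else "")


if __name__ == "__main__" and len(sys.argv) > 1:
    code, report = run_tagged(sys.argv[1:])
    if report:
        print(report)
    sys.exit(code)
```

Saved under some hypothetical path such as `/usr/local/bin/cron_tag.py`, it could be put in front of a job in a crontab entry, for example `python3 /usr/local/bin/cron_tag.py /etc/cron.daily/logrotate`. Every failure email would then start with a line naming the server and IP it came from.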

Honestly, this two-identical-servers thing is going to drive me fucking nuts until they finally take the old server offline.
