All vhosts are backed up under the same S3 account, but randomly one of them fails.

Eric, I imagine you guys already looked into all the updates that occurred near the date the issue started?

I don't have a Rackspace account, but I'm willing to help. LMK how.

thank you!

Could a firewall be preventing the Amazon connection? We have CSF on all our servers, and we block all non-public ports (SSH, FTP, etc.). We only open port 80 for incoming traffic, plus the email ports. Would Amazon S3 be connecting on specific ports, or triggering some LFD block, like too many connections too quickly? I did search for “amazon” in the LFD log but found nothing there.

I have approximately 10 servers all backing up to Amazon S3 at the same time each night (midnight), so I also wondered whether connecting to the buckets in my one S3 account simultaneously causes some issue at Amazon. Although if it did, then S3 would not be very robust for larger organisations, so it's probably not that.
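For reference, my understanding is that S3 uploads are outbound HTTP/HTTPS connections from the server to Amazon on ports 80/443, so the CSF setting that would matter is the outbound list (TCP_OUT), not the inbound rules. A quick way to check it (the csf.conf line below is a made-up sample so the snippet runs anywhere; on a real server, grep /etc/csf/csf.conf itself):

```shell
# S3 uploads are outbound connections on ports 80/443, so CSF's
# inbound port list shouldn't affect them; check TCP_OUT instead.
# A sample csf.conf line is inlined here so this is self-contained --
# on a real server, run the grep against /etc/csf/csf.conf.
printf 'TCP_OUT = "20,21,25,53,80,110,443"\n' > /tmp/csf.sample
grep '^TCP_OUT' /tmp/csf.sample
```

If 443 (and 80) are missing from that list, CSF would indeed block the upload before it ever reaches Amazon.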

I’ll review the git check-ins to double-check that nothing there changed that might be causing the issue.

Did this start occurring at the same time for everyone, roughly mid-June?

-Eric

June 13-16, at least for two people here…

Howdy,

Thanks for the info.

Reviewing the Virtualmin releases, it looks like the most recent Virtualmin version, which is where the S3 code is implemented, was released on May 14th.

The most recent Webmin version was released on May 22nd.

I suspect since so many folks began seeing this issue in mid-June, that something else may have changed around that time (assuming it’s client-side, and not related to S3 itself… which it very well may be).

I tried to answer this next question by quickly reviewing the posts in this thread, but I just wanted to confirm – is everyone who's having this issue using CentOS? It sounds like there's a mix of CentOS 5 and CentOS 6 systems, but I didn't notice any that were running Ubuntu or Debian.

I don’t imagine any of you still have yum logs from roughly the time when the problem began occurring, where you could check to see what packages were installed/updated around that time?
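For anyone who does still have those logs, a grep like the following pulls out just the mid-June window (sample log lines are inlined here so the snippet is self-contained; on a real box, run the grep against /var/log/yum.log instead):

```shell
# Extract yum.log entries from June 10-17; adjust the pattern for
# other date windows. Sample data stands in for /var/log/yum.log
# so this runs anywhere.
cat > /tmp/yum.sample <<'EOF'
Jun 04 17:12:17 Updated: gnutls-2.8.5-14.el6_5.x86_64
Jun 11 22:54:45 Updated: goaccess-0.8-1.el6.x86_64
Jun 21 09:53:12 Updated: libxml2-2.7.6-14.el6_5.2.x86_64
EOF
grep -E '^Jun 1[0-7] ' /tmp/yum.sample
```

Against the sample data above, only the Jun 11 line matches.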

-Eric

yum.log for June 2014:

Jun 03 09:32:53 Updated: tzdata-2014d-1.el6.noarch
Jun 03 09:33:00 Updated: tzdata-java-2014d-1.el6.noarch
Jun 04 17:12:17 Updated: gnutls-2.8.5-14.el6_5.x86_64
Jun 04 17:12:25 Updated: libtasn1-2.3-6.el6_5.x86_64
Jun 06 00:01:02 Updated: openssl-1.0.1e-16.el6_5.14.x86_64
Jun 06 00:01:05 Updated: openssl-devel-1.0.1e-16.el6_5.14.x86_64
Jun 11 22:54:45 Updated: goaccess-0.8-1.el6.x86_64
Jun 21 09:50:37 Updated: kernel-firmware-2.6.32-431.20.3.el6.noarch
Jun 21 09:51:03 Installed: kernel-2.6.32-431.20.3.el6.x86_64
Jun 21 09:52:58 Updated: kernel-headers-2.6.32-431.20.3.el6.x86_64
Jun 21 09:53:12 Updated: libxml2-2.7.6-14.el6_5.2.x86_64
Jun 21 09:53:13 Updated: libxml2-python-2.7.6-14.el6_5.2.x86_64
Jun 21 09:53:21 Updated: tzdata-2014e-1.el6.noarch
Jun 21 09:53:28 Updated: tzdata-java-2014e-1.el6.noarch
Jun 23 10:54:03 Updated: nodejs-packaging-7-1.el6.noarch
Jun 23 23:03:26 Updated: avahi-libs-0.6.25-12.el6_5.1.x86_64
Jun 23 23:03:33 Updated: kpartx-0.4.9-72.el6_5.3.x86_64
Jun 23 23:03:39 Updated: ql2400-firmware-7.03.00-1.el6_5.noarch
Jun 23 23:03:44 Updated: ql2500-firmware-7.03.00-1.el6_5.noarch
Jun 26 10:44:13 Updated: 1:dovecot-2.0.9-7.el6_5.1.x86_64
Jun 26 10:53:59 Updated: coreutils-libs-8.4-31.el6_5.2.x86_64
Jun 26 10:54:10 Updated: coreutils-8.4-31.el6_5.2.x86_64
Jun 28 08:10:25 Updated: clamav-db-0.98.4-1.el6.x86_64
Jun 28 08:10:35 Updated: clamav-0.98.4-1.el6.x86_64
Jun 28 08:10:37 Updated: clamd-0.98.4-1.el6.x86_64

martin

CentOS 6.

Updates in the week of June 10-17:

Updated mod_security-1:2.8.0-20.el6.art.x86_64 @asl-4.0
Updated libxml2-2.7.6-14.el6_5.1.x86_64
Updated libxml2-python-2.7.6-14.el6_5.1.x86_64

I attach my yum.log for June from one of my servers. Let me know if you want logs from more of them (all 10 have the issue). They are all more or less similar in setup, and all would have been updated at the same time; I try to run yum update once a month on all servers. They are all CentOS 6.

Just to put my +1 to this. Sadly I have now lost a client's database, as things weren't being backed up fully. :frowning:

I had this issue some time ago, and I found that the server time was a little off, which seemed to be causing the failures.

Another note for some: Amazon not so long ago changed their authentication method (IAM). Not sure if it is related or not, but I have changed my credentials now.

Not sure how either of these suggestions has anything to do with “Empty response to HTTP request”. Just clutching at straws really.

It still doesn't get everything across, but I got more servers over than I have in the past. I'm going to tweak retries now and see if that helps.

Hi,

We use Ubuntu 12.04 LTS and we have the same observations.

I contacted AWS support and they are asking for request/response headers of the failed requests. Is there any way for me to find some debugging info in logs that I could pass to AWS team?

The biggest thing I see in common that’s been updated is libxml2.

I'm reluctant to think that's the source of the issue, but I also don't want to rule it out. If one of you wanted to try rolling libxml2 back to the previous version, I'd be curious whether that makes a difference.

I’m a bit more curious about this though –

I’ve been reviewing the code used to push the backups to S3, and I’m wondering if anyone would be so kind to try making a change in /usr/libexec/webmin/virtual-server/s3-lib.pl to enable some additional debugging.

On line 208 of that file is the following:

$err = "Empty response to HTTP request";

Could you change that line to read as follows:

$err = "Empty response to HTTP request: [line: $line], [out: $out]";

That will show a bit more info about what’s really being returned by Amazon when this error occurs.

After making that change, restart Webmin (/etc/init.d/webmin restart).

Then, next time that error is thrown, could you paste in the full error output here? It’s possible those variables will be completely empty. But it’s also possible they’ll contain exactly what we need to determine what’s going on :slight_smile:

Thanks!

-Eric

Thanks Eric, I've made the change. We run backups every night, Europe time. I will respond with an update if anything comes up, unless someone else is faster than me.

Leszek

@T2thec, about the server time… Amazon does indeed check server time. If it's off by 15 minutes or more, it won't connect. I discovered this some time ago when using a plugin for Expression Engine. You can see my report here http://expressionengine.stackexchange.com/questions/10170/assets-stopped-connecting-to-amazon-s3-access-denied-by-target-host, and Amazon's FAQ on the matter at http://aws.amazon.com/articles/1109#04. BUT, my server times are now correct following that issue, so that may not be the problem here.
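To make the 15-minute rule concrete: S3 compares the Date header on the signed request against its own clock and rejects anything outside a roughly 15-minute window (the "RequestTimeTooSkewed" error). A minimal sketch of that check, using a pretend 16-minute offset rather than a real AWS call:

```shell
# Simulate S3's clock-skew check: reject if |server - client| > 900s.
client=$(date -u +%s)
server=$((client + 960))     # pretend Amazon's clock is 16 min ahead
skew=$((server - client))
abs_skew=${skew#-}           # strip a leading minus sign, if any
if [ "$abs_skew" -le 900 ]; then
  echo "request accepted"
else
  echo "RequestTimeTooSkewed"
fi
```

With the simulated 960-second offset this prints "RequestTimeTooSkewed"; keeping ntpd running (or cron-ing ntpdate) keeps real servers well inside the window.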

Two servers updated, will report back tomorrow.

martin

One domain failed, the report contained:

Uploading archive to Amazon’s S3 service …
… upload failed! Empty response to HTTP request: [line: ], [out: ]

martin

Hey. Reporting back.

• Bumped retries up to 10
• Changed to new AWS IAM passkey details
• Checked server time wasn’t out

72 VPSes fully backed up three times without a hitch.

I am a happy man again… For now.

T2thec,

What are the new AWS IAM passkey details?

And everyone: how would server time only be an issue OCCASIONALLY?

thanks

Dear Eric,

Similarly to martinmanyhats I see the same output in my backup logs:

Creating incremental TAR file of home directory ..
.. done

Uploading archive to Amazon's S3 service ..
.. upload failed! Empty response to HTTP request: [line: ], [out: ]

… completed in 1 minutes, 24 seconds

Thanks!

Hi. I had the same problem on a Debian 7 server, starting on May 27th: random failures backing up virtual servers.

I tried to set up a new scheduled backup and it failed every time until I increased the retries from 3 to 10 in the Virtualmin settings.

I’ll do the same on the server with the random failures and see what it does next time.
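For anyone wondering what bumping the retry count actually buys you: it just re-attempts the failed upload before giving up, which papers over intermittent failures like this one. A generic sketch of that pattern (not Virtualmin's actual code; the "upload" here is simulated and succeeds on the third try):

```shell
# Generic retry loop: re-attempt a flaky operation up to $max times.
# simulated_upload fails on attempts 1-2 and succeeds on attempt 3.
simulated_upload() { [ "$1" -ge 3 ]; }

attempt=1; max=10; status=failed
while [ "$attempt" -le "$max" ]; do
  if simulated_upload "$attempt"; then
    status="ok on attempt $attempt"
    break
  fi
  echo "attempt $attempt failed, retrying"
  attempt=$((attempt + 1))
done
echo "upload $status"
```

With retries raised from 3 to 10, an upload that only fails, say, 1 time in 3 becomes very unlikely to exhaust all attempts, which matches what people are seeing here.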

Thank you for sharing the additional output in the backup logs. That does look like it’s truly empty.

It sounds like Amazon is suggesting we review (or pass along to them) the headers being sent to them, and received from them, during the backup process.

I’ll work with Jamie to get a patch that retrieves that information, and we’ll get back with you shortly!

-Eric