Running on an AWS Lightsail VM.
In the last 48 hours, roughly 1 in 20 S3 backups has failed.
Prior to that, roughly 1 in 400 backups failed with a 503 error.
The hourly scheduled backup is typically under 50 MB.
If the Virtualmin backup code already implements the retry logic suggested in the AWS documentation, then the issue is on Amazon's side.
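For reference, a minimal sketch of the retry pattern AWS documents for transient S3 5xx errors (503 SlowDown / ServiceUnavailable): exponential backoff with jitter around the upload call. This is Python/boto3 purely for illustration, not Virtualmin's actual Perl code, and the bucket, key, and file path are placeholders.

```python
import random
import time

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

# Error codes AWS treats as transient and worth retrying.
RETRYABLE = {"SlowDown", "ServiceUnavailable", "InternalError", "RequestTimeout"}

def upload_with_backoff(path, bucket, key, max_attempts=5):
    """Upload one backup archive, retrying transient S3 errors with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            with open(path, "rb") as fh:
                s3.put_object(Bucket=bucket, Key=key, Body=fh)
            return
        except ClientError as err:
            code = err.response.get("Error", {}).get("Code", "")
            if code not in RETRYABLE or attempt == max_attempts:
                raise  # non-retryable error, or retries exhausted
            # Full jitter: sleep a random amount up to 2^attempt seconds, capped at 60s.
            time.sleep(random.uniform(0, min(2 ** attempt, 60)))

# Example call with placeholder names (the real hourly backup is under 50 MB).
upload_with_backoff("/tmp/hourly-backup.tar.gz", "example-backup-bucket", "hourly/backup.tar.gz")
```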
I will open an Amazon ticket if the situation continues.
I don't know for sure; Jamie wrote that code. Since it's not just happening today, it may be something we need to look at. I assumed we were gracefully handling problems, and I don't think we've had other similar reports lately, but it's worth looking into.
My Lightsail VM instance talks to an AWS region which apparently had an S3/storage issue (see the status updates quoted below).
It may be worth reviewing the backup code to take into account AWS's notion of degraded service and increased latency.
It would also help to document that, when this error is thrown, S3 service health should be checked (see the sketch after the status excerpts below).
12:31 PM PDT We are investigating increased error rates for Storage Gateway read/write operations in the US-EAST-1 Region.
12:53 PM PDT We are seeing recovery for Storage Gateway read/write operations in the US-EAST-1 Region.
1:10 PM PDT Between 11:40 AM and 12:56 PM PDT we experienced increased latencies for read/write operations in the US-EAST-1 Region. The issue has been resolved and the service is operating normally.
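To make the two suggestions above concrete, here is a hedged sketch of one way to account for degraded service: let the SDK's built-in "adaptive" retry mode absorb throttling and latency spikes, and when a request still fails, log a hint pointing the administrator at the AWS Service Health Dashboard before they debug locally. Again this is Python/boto3 for illustration only, with placeholder names, not Virtualmin's actual Perl implementation.

```python
import logging

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

log = logging.getLogger("s3-backup")

# "adaptive" retry mode adds client-side rate limiting on top of exponential backoff.
s3 = boto3.client(
    "s3",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)

def upload_backup(path, bucket, key):
    try:
        with open(path, "rb") as fh:
            s3.put_object(Bucket=bucket, Key=key, Body=fh)
    except ClientError as err:
        code = err.response.get("Error", {}).get("Code", "unknown")
        # Surface a pointer to AWS's side so a regional S3 incident isn't mistaken for a local fault.
        log.error(
            "S3 upload failed with %s; check the AWS Service Health Dashboard "
            "(https://status.aws.amazon.com) for issues in your region.",
            code,
        )
        raise
```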