University of Warwick system status

Website unavailability
Incident Report for University of Warwick
Postmortem

Over the Christmas period, the University website at warwick.ac.uk became unavailable overnight. We apologise to anyone who was trying to look up information at that time.

What happened

Part of the database backup process failed. When this happens, database changes accumulate in a queue on the standby database until the backup process can restart and collect them. During the Christmas shutdown period we don’t have the usual level of human monitoring of alerts, so the problem went unnoticed until the standby database ran out of disk space. This in turn caused the live database to stop: it streams a replica of its data to the standby and refuses to let the replica fall out of sync, because getting the two back in sync is much harder once updates have been dropped.
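To illustrate the failure chain described above, here is a toy model of it. It is only a sketch: the function name, the hourly time step and all the figures are assumptions made for illustration, not measurements or code from the real system.

```python
from typing import Optional


def hour_primary_stops(standby_free_gb: float,
                       change_gb_per_hour: float,
                       hours: int,
                       backup_running: bool = False) -> Optional[int]:
    """Toy model: return the hour at which the live database stops.

    While the backup process is down, changes queue up on the standby.
    Once the queue fills the standby's free disk space, replication
    cannot continue and the live database stops. Returns None if that
    does not happen within `hours`.
    """
    queued_gb = 0.0
    for hour in range(1, hours + 1):
        if backup_running:
            # A working backup collects the queued changes.
            queued_gb = 0.0
        else:
            # A failed backup lets the queue keep growing.
            queued_gb += change_gb_per_hour
        if queued_gb >= standby_free_gb:
            return hour
    return None
```

With hypothetical values of 10 GB free and 0.5 GB of changes per hour, `hour_primary_stops(10.0, 0.5, 48)` reports a stop at hour 20, while the same call with `backup_running=True` returns `None`.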

The issue was resolved by increasing the available disk space, which allowed the backup process to complete and replication to continue.

How can we avoid this in future?

We have monitoring in place to send alerts, and alerts were being sent a couple of days before disk space ran out, but that’s only useful if someone is around to see them. We’ll review whether some of the critical alerts can be sent through other channels, such as SMS, so that someone sees and reviews them even during a shutdown period.
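One possible shape for this kind of routing is sketched below. It assumes that alerts carry a severity and an acknowledged flag, and that a critical alert left unacknowledged for an hour should escalate to SMS. The `Alert` class, the channel names and the one-hour threshold are all hypothetical and do not describe our actual monitoring setup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    name: str
    severity: str            # "critical" or "warning" (hypothetical levels)
    raised_at: datetime
    acknowledged: bool = False


def pick_channel(alert: Alert, now: datetime,
                 escalate_after: timedelta = timedelta(hours=1)) -> str:
    """Return the channel an alert should be sent through at time `now`."""
    if alert.acknowledged:
        # Someone has seen it, so there is nothing more to send.
        return "none"
    waited = now - alert.raised_at
    if alert.severity == "critical" and waited >= escalate_after:
        # Nobody has reacted to a critical alert: escalate.
        return "sms"
    return "email"
```

Under these assumptions, a critical disk-space alert would go to email first and switch to SMS once it had sat unacknowledged for an hour, whereas a warning would stay on email.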

We may also review the backup system to see whether it can be decoupled from the live database, so that a failed backup cannot disrupt service, or at least has enough of a disk buffer to survive the length of the Christmas shutdown period.
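As a rough illustration of how such a disk buffer might be sized, the sketch below multiplies an assumed daily volume of database changes by the number of unattended days and a safety margin. The function name, the safety factor and the example figures are assumptions, not values taken from our systems.

```python
def required_buffer_gb(change_gb_per_day: float,
                       unattended_days: float,
                       safety_factor: float = 1.5) -> float:
    """Estimate the disk space needed to queue changes while unattended."""
    if change_gb_per_day < 0 or unattended_days < 0:
        raise ValueError("rates and durations must be non-negative")
    if safety_factor < 1:
        raise ValueError("safety_factor should be at least 1")
    # Queue growth over the unattended period, padded by the margin.
    return change_gb_per_day * unattended_days * safety_factor
```

For example, a hypothetical 2 GB of changes per day over a 10-day shutdown with the default margin gives `required_buffer_gb(2.0, 10)`, or 30 GB of headroom on the standby.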

Posted Jan 04, 2023 - 15:12 GMT

Resolved
This incident has been resolved.
Posted Dec 28, 2022 - 10:19 GMT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Dec 28, 2022 - 10:13 GMT
Investigating
We are aware that the website is currently unavailable. The issue should be resolved in the next few hours.
Posted Dec 28, 2022 - 07:49 GMT
This incident affected: Sitebuilder (Web pages and files, Editing, Page statistics, Form submissions, Online Payments, Sitebuilder forums).