University of Warwick system status

Tabula performance issues
Incident Report for University of Warwick
Postmortem

What happened?

At 10am, sign-up for some teaching groups opened to a large number of students. These students naturally all arrived at Tabula at the same time, loading pages that can take over 10 seconds to load, depending on the user and factors such as how many modules they are on.

We’ve previously added some mitigation for users repeatedly refreshing the page, which alleviates some load, but there is definitely more work to be done.

What will we do now?

We have already started a phase of performance improvement across Tabula, using data to identify the slowest pages and prioritise improvements there. Every slow page that becomes fast increases the amount of load that Tabula can handle overall, making a total slowdown less likely. On top of that, faster page loads simply improve the experience for everyone - it’s never ideal to have to wait several seconds for a page to load. Sometimes this is hard to avoid, as Tabula loads a lot of interrelated data about courses, modules and groups in order to display useful information to you. However, there are plenty of ways we can be smarter about loading data to keep overall page loads fast.
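
As an illustration of what using data to prioritise can look like, the sketch below ranks pages by the total time spent serving them, not only by how slow a single load is. It is a minimal example under stated assumptions: `rank_pages` is a hypothetical helper, and the page paths and timings are invented sample values, not Tabula measurements.

```python
from collections import defaultdict
from statistics import mean


def rank_pages(requests):
    """Rank pages by total time spent serving them, largest first.

    `requests` is an iterable of (path, seconds) pairs, for example
    parsed from an access log. Returns a list of tuples:
    (path, hits, mean_seconds, total_seconds).
    """
    timings = defaultdict(list)
    for path, seconds in requests:
        timings[path].append(seconds)
    rows = [
        (path, len(times), mean(times), sum(times))
        for path, times in timings.items()
    ]
    # Sort by total time: a fairly slow page that is hit very often can
    # cost more capacity than a very slow page that is hit rarely.
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Invented sample data, for illustration only.
SAMPLE = [
    ("/groups/signup", 8.0),
    ("/groups/signup", 12.0),
    ("/groups/signup", 10.0),
    ("/profiles/view", 0.5),
    ("/exams/grid", 20.0),
]

for path, hits, avg, total in rank_pages(SAMPLE):
    print(f"{path}: {hits} hits, mean {avg:.1f}s, total {total:.1f}s")
```

In this made-up sample the sign-up page ranks first even though the exam grid has the slowest single load, because it is requested more often.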

We’ll also be investigating higher-level methods of keeping Tabula running in the event that some parts become slow:

  • Segregating database connections. Currently, Tabula has a finite shared pool of database connections it can use to read and write data, meaning it’s possible for enough slow requests to use up all these connections and starve other pages that might otherwise be fast to load. By giving potentially slow pages their own dedicated pool, they can’t starve the rest of Tabula.
  • Limits/queueing. For certain operations where we expect spikes of traffic, we can look at ways to limit how many slow operations can be happening at a time, and either return a “try in a minute” message to other users or possibly place them in a virtual queue where first-come-first-served behaviour is important.
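
The two ideas above can be sketched together. This is a simplified illustration, not Tabula's implementation: the `Bulkhead` class, the pool sizes and the rejection message are all assumptions, and semaphores stand in for what would really be separate database connection pools. Slow pages get their own small allowance, and requests beyond it are turned away immediately instead of piling up.

```python
import threading


class Bulkhead:
    """Caps how many operations of one kind may be in flight at once."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_acquire(self):
        """Take a slot without waiting; return False if none is free."""
        return self._slots.acquire(blocking=False)

    def release(self):
        """Give a slot back once the operation has finished."""
        self._slots.release()

    def try_run(self, operation):
        """Run `operation` if a slot is free, otherwise reject at once.

        Returns (True, result) on success or (False, message) when full.
        """
        if not self.try_acquire():
            return False, "Busy, please try again in a minute"
        try:
            return True, operation()
        finally:
            self.release()


# Hypothetical sizes: slow pages get a small allowance of their own,
# so they cannot use up the capacity that fast pages depend on.
slow_pages = Bulkhead(max_concurrent=2)
fast_pages = Bulkhead(max_concurrent=8)

# Simulate two slow sign-up requests that are still in flight.
slow_pages.try_acquire()
slow_pages.try_acquire()

# A third slow request is turned away rather than piling up...
third_slow = slow_pages.try_run(lambda: "sign-up page")
# ...while a fast page is still served from its own allowance.
fast = fast_pages.try_run(lambda: "profile page")

# The in-flight requests finish and free their slots.
slow_pages.release()
slow_pages.release()

print(third_slow)
print(fast)
```

This sketch only covers the reject-when-full option. A virtual queue with strict first-come-first-served ordering would need an explicit ordered waiting list, which is not shown here.
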
Posted Jan 11, 2023 - 12:30 GMT

Resolved
Service should now be normal. We continue to investigate performance improvements and other ways to avoid service issues when Tabula is very busy.
Posted Jan 11, 2023 - 11:55 GMT
Update
The system seems stable now and pages are loading quickly. We restarted the system a few times.
Posted Jan 11, 2023 - 11:53 GMT
Investigating
There are some known performance issues with Tabula caused by a large influx of students signing up for small groups. We are working on remediating this, and work on Tabula performance improvements to avoid this sort of situation is already under way.

If you are currently trying to sign up for small groups and it is slow, please leave the application for a few minutes as users refreshing pages will only exacerbate the problem.
Posted Jan 11, 2023 - 11:24 GMT
This incident affected: Tabula (Coursework submission, Timetables, meeting records and profiles, Exam timetables, Assessment management, Small group teaching, Exam grids, Monitoring points, Mitigating circumstances, Tabula API).