Good afternoon Ovrture users,
This document details the cause of Ovrture’s incident on January 28, 2020 and the events immediately following it, as well as the steps we are taking to mitigate the impact of similar outages in the future.
On January 28, 2020, there was a major outage affecting microsites, login pages, and the database. The outage lasted from 6:00 AM to 9:00 AM.
The Ovrture development team was the first to notice the issue, through a notification from the Ovrture status page. We investigated and notified the Ovrture development operations team, who began looking into the outage immediately. They discovered that the outage was caused by a race condition in the reboot process between EFS, Tomcat, and Ubuntu 18's new security separation layers, which made the reboot sequence unpredictable. The regular nightly updates applied last night were themselves fine; it was the reboot that caused the problem. Subsequent reboots hit the same issue.
The underlying issues have been fixed and the race condition on reboot has been resolved. As a result, reboots are now safe.
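For readers curious what guarding against this kind of startup race can look like, here is a minimal sketch in Python. It is an illustration only and not necessarily the fix we applied: the function name `wait_for_mount` and the mount path `/mnt/efs` are hypothetical. The idea is simply to hold back a dependent service until a network file system reports as mounted, or to give up after a timeout.

```python
import os
import time
from typing import Callable


def wait_for_mount(
    path: str,
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    is_mounted: Callable[[str], bool] = os.path.ismount,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `path` is a mount point, or False if the timeout elapses.

    `is_mounted` and `sleep` are injectable so the logic can be tested
    without a real file system or real delays.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_mounted(path):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Hypothetical path: replace with the real mount point on your host.
    if wait_for_mount("/mnt/efs", timeout_s=120):
        print("mount is ready; safe to start the dependent service")
    else:
        print("mount did not appear in time; not starting the service")
```

A startup script could call a check like this before launching an application server, so the order of events no longer depends on which component happens to come up first after a reboot.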
Things we will improve to make sure issues like this do not happen again:
We are very sorry for any inconvenience this incident may have caused on January 28, and we will continue to work hard to make sure something like this doesn’t happen again.
Onward,
Gideon and the Ovrture Team