SpaceIQ Detailed Root Cause Analysis – Severity 2 – June 5, 2023
Intermittent Slowness in SpaceIQ
Description:
On June 5, 2023, at approximately 10:52am MST, internal teams and customer support started to receive reports of intermittent slowness throughout the SpaceIQ platform. Some users experienced a spinning wheel or a "Network Request failed" error when trying to access or navigate the platform.
Type of Event:
Performance Degradation
Services/Modules Impacted:
Production
Timeline:
10:52am MST – Internal teams and customer support received reports of intermittent slowness throughout the SpaceIQ platform. Some users experienced a spinning wheel or a "Network Request failed" error when trying to access or navigate the platform.
1:12pm MST – After a short investigation and information gathering to reproduce customers' experiences, the initial ticket was escalated to an S2 incident and posted to our status page to alert customers. Internal teams acknowledged the issue and began investigating. The issue was quickly identified, and a fix was implemented.
1:47pm MST – Customers were alerted on the status page that we were monitoring the fix. During monitoring, customers confirmed that the fix was working.
4:21pm MST – There were no additional reports of slowness, and the status page was marked as resolved.
Total Duration of Event:
5hrs 29mins
Root Cause Analysis:
The engineering and DevOps teams found an issue with a caching server. A manual restart of the server resolved the issue.
Preventative Action:
Our DevOps team continues to investigate why the caching server did not restart automatically. New processes have been put in place to ensure that the appropriate teams are alerted if this happens again.