S2 - Latency using SpaceIQ Platform
Incident Report for SiQ
Postmortem

SiQ Detailed Root Cause Analysis – Severity 1 & 2 – April 20, 25 & 28

Intermittent Latency Using SpaceIQ Platform

Description:

On April 20, 2023, at approximately 2:40pm MST, customer support began to receive reports of issues throughout the SpaceIQ platform. When customers tried to navigate the platform, error messages such as “Timeout” and “Network Failed Request” appeared, followed by latency, and at one point customers were unable to access SpaceIQ. Similar reports were received the following week, on April 25 and again on April 28.

Type of Event:

Performance Degradation

Services\Modules Impacted:

Production

Timeline:

April 20, 2023 |S1| SIQDEV1-5021

On Thursday, April 20, 2023, at approximately 2:40pm MST, customer support began receiving reports of issues throughout the SpaceIQ platform. When customers tried to navigate the platform, error messages such as “Timeout” and “Network Failed Request” appeared, followed by latency, and at one point customers were unable to access SpaceIQ. At 3:15pm MST, the SpaceIQ engineering team acknowledged the issue and began an initial investigation. Also at 3:15pm MST, SpaceIQ customers were notified via the SpaceIQ Status Page that we were actively investigating. At 3:21pm, our engineering team began restarting internal services to help mitigate the performance issues. Around 8:47pm, our product team began verifying stability, and at 8:57pm we received confirmation that SpaceIQ performance had improved. At 9:33pm MST, the Status Page was updated from Investigating to Monitoring. We continued to monitor the platform through Friday morning, April 21, 2023. Once customers confirmed that performance had returned to a normal level, we updated the Status Page from Monitoring to Resolved at 6:15am.

Total Duration of Event: 15HRS 26MINS

April 25, 2023 |S2| SIQDEV1-5092

On Tuesday, April 25, 2023, at approximately 3:06pm MST, customer support received an initial report of intermittent latency while using SpaceIQ. By 4:56pm MST, multiple reports had been received and the support team was able to replicate the latency. All SpaceIQ customers were notified of the intermittent latency through the Status Page, which was set to the Investigating Phase. By 6:40pm, the Status Page was updated from Investigating to the Identified Phase. At 7:36pm, the engineering team began restarting internal services to help mitigate the intermittent latency, and by 7:48pm the engineering team began seeing better performance; multiple customers also reported better performance of the platform. The Status Page was updated to the Monitoring Phase for the next hour. No additional reports were received, and the Status Page was marked as Resolved at 9:02pm MST.

Total Duration of Event: 5HRS 56MINS

April 28, 2023 |S2| SIQDEV1-5102

On Friday, April 28, 2023, at approximately 3:29am MST, customer support received an initial report of intermittent latency and errors while using SpaceIQ. By 9:56am MST, two additional customers had reported the same behavior; the engineering team was made aware of the reports and began investigating. At 12:05pm, additional customers began reporting the same behavior, and all customers were notified of the intermittent latency and errors through the Status Page, which was set to the Investigating Phase. At 3:37pm, the engineering team began restarting internal services to help mitigate the intermittent latency. The engineering team began seeing better performance, multiple customers also reported better performance of the platform, and the Status Page was updated from Investigating to the Monitoring Phase. After monitoring, no additional reports were received, and all customers reported that SpaceIQ stability was back to normal. At 6:05pm MST, the Status Page was updated from Monitoring to Resolved.

Total Duration of Event: 14HRS 36MINS

Root Cause Analysis:

Across the three days on which customers experienced this performance degradation, our engineering team found that all three incidents shared one root cause: the same internal service was failing on each day.

Preventative Action:

The engineering team investigated each incident and continued to identify and address problems in the internal service. Ultimately, a reconfiguration was implemented to correct the slowness.

Posted May 03, 2023 - 19:19 UTC

Resolved
As we have not seen further service disruptions after the fix was implemented, we have moved to the Resolved Phase.
A detailed RCA will be posted to this incident in 10 business days. Please stay subscribed to the page to receive updates automatically.
Posted Apr 26, 2023 - 03:02 UTC
Monitoring
A fix has been implemented. We are moving into the Monitoring Phase for the next hour.
Posted Apr 26, 2023 - 01:59 UTC
Identified
The issue with Latency using SpaceIQ Platform has been identified and a fix is being implemented. We will post another update at 8:40pm or sooner.
Posted Apr 26, 2023 - 00:40 UTC
Investigating
We are currently investigating an issue with some latency while using the SpaceIQ platform. Our Engineering team is currently investigating to determine the cause of the disruption. The next update will be posted at 7:00pm MST.
Posted Apr 25, 2023 - 23:01 UTC
This incident affected: System Status.