SpaceIQ by Eptura detailed Root Cause Analysis | 02/25/2024
We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.
Description:
On the evening of Sunday, February 25, 2024, at approximately 11:55pm EST. Multiple users reported the inability to access the SpaceIQ platform.
Type of Event:
Outage
Services/Modules impacted:
Production
Timeline:
On the evening of Sunday, February 25, 2024, at approximately 11:55pm EST. Multiple users reported the inability to access the SpaceIQ platform. Internal channels were alerted and at 12:10am EST, 02/26/2024, the CloudOps team acknowledged the issue and began investigation. At approximately 12:54am EST, the issue was resolved, and users were able to access the platform.
Total Duration of Event:
1 Hour
Root Cause:
The DB had reached the max storage threshold.
Remediation:
An additional 1 TB of space was added to the max threshold to resolve the issue.
Preventative Action:
RDS Instance has been added to LogicMonitor for additional Monitoring of the core metrics.
Meeting setup with Dev teams to review the usage pattern and see if the logging requirement can be reviewed\optimized.