22 Jan 03:25 UTC
Resolved
The issue has been fully resolved. All error rates have returned to normal levels. A post-mortem will be published within 48 hours. Impact was limited to EU West (fra1-b node) — fra1-a and fra1-c were unaffected. No data loss occurred.
22 Jan 03:19 UTC
Monitoring
A fix has been deployed to the affected node. Error rates are dropping. We are monitoring to confirm full recovery.
22 Jan 03:14 UTC
Identified
The root cause has been identified: connection pool exhaustion on fra1-b, caused by a misconfigured upstream timeout introduced in the v2.4.0 deployment. A rollback is being applied to the affected node.
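For illustration only: the mechanism behind this class of failure follows Little's law, where the average number of connections in use equals the request arrival rate times how long each request holds a connection. The pool size, traffic rate, and timeout values below are hypothetical, not figures from this incident.

```python
# Hypothetical figures for illustration; the actual pool size and
# timeout values were not disclosed in this incident timeline.
POOL_SIZE = 50        # max upstream connections per node
ARRIVAL_RATE = 10.0   # requests per second reaching the upstream

def connections_in_use(hold_time_s: float) -> float:
    # Little's law: average connections held = arrival rate x hold time
    return ARRIVAL_RATE * hold_time_s

# Intended timeout: connections are released quickly, pool stays healthy.
assert connections_in_use(2.0) <= POOL_SIZE     # 20 of 50 in use

# Misconfigured (much longer) timeout: slow upstream calls pin
# connections, demand exceeds the pool, and new requests fail with 500s.
assert connections_in_use(30.0) > POOL_SIZE     # 300 needed, only 50 available
```

Rolling back the deployment restores the shorter timeout, so connections are released quickly enough for the pool to keep up with incoming traffic.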
22 Jan 03:11 UTC
Investigating
We are investigating elevated HTTP 500 error rates on the EU West Ingest API endpoint. Other regions (US East, APAC) are operating normally. Impact: approximately 3% of ingest requests to fra1 are returning 500 errors.