22 Jan 03:25 UTC
Resolved
The issue has been fully resolved. All error rates have returned to normal levels. A post-mortem will be published within 48 hours. Impact was limited to EU West (fra1-b node) — fra1-a and fra1-c were unaffected. No data loss occurred.
22 Jan 03:19 UTC
Monitoring
A fix has been deployed to the affected node. Error rates are dropping. We are monitoring to confirm full recovery.
22 Jan 03:14 UTC
Identified
The root cause has been identified: connection pool exhaustion on fra1-b, caused by a misconfigured upstream timeout introduced in the v2.4.0 deployment. A rollback is being applied to the affected node.
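For illustration only: the mechanism behind this class of failure follows Little's law, where the average number of connections in use equals the request arrival rate times how long each request holds a connection. The pool size, traffic rate, and timeout values below are hypothetical, not figures from this incident.

```python
# Hypothetical figures for illustration; the actual pool size and
# timeout values were not disclosed in this incident timeline.
POOL_SIZE = 50        # max upstream connections per node
ARRIVAL_RATE = 10.0   # requests per second reaching the upstream

def connections_in_use(hold_time_s: float) -> float:
    # Little's law: average connections held = arrival rate x hold time
    return ARRIVAL_RATE * hold_time_s

# Intended timeout: connections are released quickly, pool stays healthy.
assert connections_in_use(2.0) <= POOL_SIZE     # 20 of 50 in use

# Misconfigured (much longer) timeout: slow upstream calls pin
# connections, demand exceeds the pool, and new requests fail with 500s.
assert connections_in_use(30.0) > POOL_SIZE     # 300 needed, only 50 available
```

Rolling back the deployment restores the shorter timeout, so connections are released quickly enough for the pool to keep up with incoming traffic.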
22 Jan 03:11 UTC
Investigating
We are investigating elevated HTTP 500 error rates on the EU West Ingest API endpoint. Other regions (US East, APAC) are operating normally. Impact: approximately 3% of ingest requests to fra1 are returning 500 errors.