Historical Data Delayed

Resolved·Full outage

This incident has been resolved.

Mon, May 3, 2021, 05:06 PM

Affected components

May 3, 2021, 04:19 PM to 05:05 PM

Metrics Ingestion

GraphQL API

Updates

Resolved

This incident has been resolved.

Mon, May 3, 2021, 05:06 PM

Monitoring

Everything looks to be back to 100% operational. Steps have also been put in place to prevent this from happening again and to make recovery from such an event much faster.

Mon, May 3, 2021, 05:05 PM

Monitoring

We have managed to scale our impacted services in parallel rather than in sequence. As such, we should be fully back online in around 30 minutes (allowing a grace period for K8s to scale, etc.).

As always, we appreciate your patience and will post another update in around 15 minutes.

Mon, May 3, 2021, 04:49 PM (16 minutes earlier)

Identified

We have identified a hard upper bound of around six and a half hours for all data to be back online. We are now working to reduce this time, and have established the following:

There is no data loss.
Each pod that was accidentally shut down will reclaim all of its data in around 7 minutes.
If we can work out how to sidestep our scaling, we should be able to get back up within 10 minutes (this is what we are currently investigating).

Mon, May 3, 2021, 04:35 PM (13 minutes earlier)

Investigating

Due to a bad migration, we are experiencing extreme delays in serving our historical data. This has no impact on our ingestion, but checks and many dashboards will be severely impacted.

Please note: No data has been lost.

We will report back soon with an estimate of the expected time to recovery.

Mon, May 3, 2021, 03:43 PM (51 minutes earlier)

Apollo Graph, Inc.