All indicators are pointing in the right direction. All monitors have auto-resolved, and all of our pods appear stable. We have added work to our backlog to prioritize this issue and are planning a postmortem now.
Thanks as always for your continued patience.
Posted Mar 22, 2021 - 07:47 PDT
Our fix has rolled to all pods and our success rate is back up. The original exception is growing stale in our error tracking. All flapping restarts have stopped.
Moving this to monitoring. Once we have seen 15 to 30 minutes of continued success, we will resolve.
Thanks as always for your patience!
Posted Mar 22, 2021 - 07:31 PDT
We have identified the issue and are working to mitigate.
Posted Mar 22, 2021 - 07:28 PDT
We are currently triaging an issue with our reports/metrics ingress, which is causing intermittent unavailability for all clients.