We've identified that the issues are not stemming solely from DataDog revoking keys. Our logs were cluttered with errors from keys that had, in fact, been revoked, and this noise masked the underlying lag issues on our services. We are now seeing lag drop across our infrastructure and will post an update and close this incident in 20 minutes if this trend continues.
We apologize for the inconvenience, and for the earlier update that pointed to revoked keys, which turned out to be a red herring.
Posted Aug 31, 2022 - 03:23 PDT
Identified
We've confirmed that although many graphs show elevated error rates when communicating with DataDog, most graphs are still successfully sending metrics. We are now investigating whether our own internal monitoring is simply finicky, or whether a number of client API tokens have expired or been revoked by DataDog.
We are continuing to investigate this issue.
Posted Aug 31, 2022 - 03:01 PDT
Investigating
We are currently seeing elevated error rates when forwarding Apollo Studio insights metrics to DataDog. We're still investigating why this is occurring and what steps we can take on our end to mitigate it.