Slow jobs for Functions and Engineering Diagrams in USA-E1 cluster
Incident Report for Cognite Service
Cognite engineering has resolved an incident in which a processing backlog built up for Functions and Engineering Diagrams. Customers started noticing delays on Sunday, September 18, and engineers spent a few days scaling the system up and improving the efficiency of the services. As part of the incident resolution, the engineering team implemented rate limiting for job-related endpoints, resolved a problem with unnecessarily large memory allocation, and reduced the amount of data that has to be handled simultaneously.

Further improvements have been added to the backlog of the teams that own these services and will be deployed as soon as that work is completed.
The incident resulted in slow processing of Functions and Engineering Diagrams jobs for approximately 48 hours for customers in the USA-E1 cluster.
Posted Sep 19, 2022 - 01:00 CEST