No components marked as affected
Resolved
Database migrations have resolved the capacity problem that was causing poor search freshness. The engineering team has monitored the performance of the two affected clusters and has seen no issues over the last 20 hours. This incident was fully resolved on November 1st at 12:00 (noon) UTC.
Monitoring
The backend problems have now been addressed by migrating data to a new database service that scales better. The engineering team is now monitoring the health of the two impacted clusters to validate that the new backend services handle the load as expected.
Identified
Cognite Engineering is currently performing database migrations to resolve the capacity limitation affecting the service behind the document search API. The document search API is operating in a degraded mode in which updates and new documents can no longer be indexed. Users will get search results from a search index that has not changed since this incident started. Cognite estimates that the database migrations can complete by noon UTC on Tuesday, November 1st.
Identified
Cognite Engineering is currently implementing fixes for an incident in which document search freshness is degraded because of a database sizing issue. The engineering team is starting a set of database migrations that were planned for next week but now have to be moved up because the overload set in earlier than estimated. The impact on users is that new documents and updates to existing documents are queued and not visible in search results from the document search API. Users will get search results based on the documents that were indexed at the time the incident started.