[data.search] Use incrementCounter for search telemetry #91230
Conversation
Pinging @elastic/kibana-app-services (Team:AppServices)
LGTM from a telemetry POV.
As for:
> "reset" the values for each of the properties stored in the saved object
You'd need an SO migration for that, unless we simply throw the 'old' SO out and recreate a new one after the upgrade. I speak under correction here though; I'm not an SO expert! @Bamieh might be able to give more advice.
```ts
trackSuccess: async (duration: number) => {
  const repository = await getRepository();
  await repository.incrementCounter(SAVED_OBJECT_ID, SAVED_OBJECT_ID, [
    { fieldName: 'successCount' },
```
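The diff excerpt above shows only the start of the handler. As a self-contained sketch of the pattern under discussion, here is how `trackSuccess` might increment both `successCount` and `totalDuration` in a single `incrementCounter` call. The in-memory repository below is a stand-in for the real saved-objects repository, used only so the sketch runs on its own; reusing `SAVED_OBJECT_ID` for both type and id mirrors the excerpt.

```typescript
// Assumption: the real repository's incrementCounter(type, id, fields) accepts
// fields with an optional incrementBy; this stand-in mimics only that behavior.
const SAVED_OBJECT_ID = 'search-telemetry';

interface CounterField {
  fieldName: string;
  incrementBy?: number;
}

class InMemoryRepository {
  counters: Record<string, number> = {};
  async incrementCounter(_type: string, _id: string, fields: CounterField[]) {
    for (const { fieldName, incrementBy = 1 } of fields) {
      // Scripted counter update: no read-modify-write, so no 409 conflicts.
      this.counters[fieldName] = (this.counters[fieldName] ?? 0) + incrementBy;
    }
  }
}

const repository = new InMemoryRepository();

const trackSuccess = async (duration: number) => {
  await repository.incrementCounter(SAVED_OBJECT_ID, SAVED_OBJECT_ID, [
    { fieldName: 'successCount' },
    { fieldName: 'totalDuration', incrementBy: duration },
  ]);
};
```

Accumulating `totalDuration` this way is what lets the reporting side derive the average later, instead of storing a running `averageDuration` that is racy to update.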
IIRC, we only migrate SOs between versions if the data from one version needs to be persisted into the next (i.e. end-user-facing SOs). Since the telemetry-related SO is fetched and indexed once every 24 hours, and that hasn't changed, it doesn't really matter if we blow the SO out of the water and reinitialize a new one when the cluster upgrades. If you did want to persist the data, then I guess you'd need to add a migration. @Bamieh WDYT?
I agree with Tina. If a transformation of the old data is not possible, simply using a new saved object ID and dropping this one is fine. We can also drop the existing data in the same saved object type via migrations:
```js
const migrate713 = (doc) => {
  return {
    ...doc,
    attributes: {},
  };
};
```
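Standalone, a drop-everything migration like the one sketched above behaves as follows. The document shape here is deliberately simplified for illustration (a real Kibana migration function receives a full saved-object document and a migration context):

```typescript
// Simplified document shape; only `attributes` matters for this migration.
interface SimpleSavedObjectDoc {
  id: string;
  type: string;
  attributes: Record<string, unknown>;
}

const migrate713 = (doc: SimpleSavedObjectDoc): SimpleSavedObjectDoc => ({
  ...doc,
  // Discard stale successCount/errorCount/averageDuration values so the
  // counters restart from zero after the upgrade.
  attributes: {},
});

const migrated = migrate713({
  id: 'search-telemetry',
  type: 'search-telemetry',
  attributes: { successCount: 10, errorCount: 2, averageDuration: 123 },
});
// migrated.attributes is now empty while id/type are preserved.
```

Because `id` and `type` are spread through unchanged, the saved object survives the upgrade with its identity intact; only its counters are reset.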
I've updated this PR with a migration as suggested, @Bamieh could you take one more look? Thanks!
LGTM
@elasticmachine merge upstream
```ts
    },
  ]);
} catch (e) {
  if (e.statusCode === 409 && retryCount < MAX_RETRY_COUNT) {
```
Can you use `SavedObjectsErrorHelpers.isConflictError(err)`?
```ts
    { fieldName: 'errorCount' },
  ]);
} catch (e) {
  if (e.statusCode === 409 && retryCount < MAX_RETRY_COUNT) {
```
`SavedObjectsErrorHelpers.isConflictError(err)`
I would use `SavedObjectsErrorHelpers.isConflictError(err)` to detect conflicts, but otherwise LGTM.
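The retry-on-conflict pattern the reviewers are discussing can be sketched as follows. Note this is an illustration, not the PR's actual code: `withConflictRetry` is a hypothetical helper, and the local `isConflictError` merely checks `statusCode === 409` in place of the real `SavedObjectsErrorHelpers.isConflictError`, so the sketch runs standalone.

```typescript
const MAX_RETRY_COUNT = 3;

// Stand-in for SavedObjectsErrorHelpers.isConflictError: a 409 status code
// is what that helper effectively detects.
const isConflictError = (e: { statusCode?: number }) => e.statusCode === 409;

// Hypothetical helper: retry the operation while it fails with a version
// conflict, up to MAX_RETRY_COUNT retries; rethrow anything else.
async function withConflictRetry(
  operation: () => Promise<void>,
  retryCount = 0
): Promise<void> {
  try {
    await operation();
  } catch (e: any) {
    if (isConflictError(e) && retryCount < MAX_RETRY_COUNT) {
      return withConflictRetry(operation, retryCount + 1);
    }
    throw e;
  }
}
```

Using the helper over the error-type check (rather than a bare `statusCode === 409` comparison) keeps the conflict detection consistent with how the saved-objects layer classifies its own errors.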
💚 Build Succeeded
* [data.search] Use incrementCounter for search telemetry
* Update reported type
* Retry conflicts
* Fix telemetry check
* Use saved object migration to drop previous document
* Review feedback
* Fix import

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
* master: (111 commits)
  * [Logs UI] Replace dependencies in the infra bundle (elastic#91503)
  * [Search Source] Do not request unmapped fields if source filters are provided (elastic#91921)
  * [APM] Kql Search Bar suggests values outside the selected time range (elastic#91918)
  * Refactored component edit policy tests into separate folders and using client integration testing setup (elastic#91657)
  * [Fleet] Don't error on missing package_assets value (elastic#91744)
  * [Lens] Pass used histogram interval to chart (elastic#91370)
  * [Indexpattern management] Use indexPatterns Service instead of savedObjects client (elastic#91839)
  * [Security Solutions] Fixes Cypress tests for indicator match by making the selectors more specific (elastic#91947)
  * [CI] backportrc can skip CI (elastic#91886)
  * Revert "[SOM] fix flaky suites (elastic#91809)"
  * [Fleet] Install Elastic Agent integration by default during setup (elastic#91676)
  * [Fleet] Silently swallow 404 errors when deleting ingest pipelines (elastic#91778)
  * [data.search] Use incrementCounter for search telemetry (elastic#91230)
  * [Fleet] Bootstrap functional test suite (elastic#91898)
  * [Alerts][Docs] Added API documentation for alerts plugin (elastic#91067)
  * Use correct environment in anomaly detection setup link (elastic#91877)
  * [FTSR] Convert to tasks and add jest/api integration suites (elastic#91770)
  * [CI] Build and publish storybooks (elastic#87701)
  * docs: add PHP agent info to docs (elastic#91773)
  * [DOCS] Adds and updates Visualization advanced settings (elastic#91904)
  * ...
Summary

Resolves #89744.

When reporting `search` telemetry, we query a saved object (type `search-telemetry`), which, prior to this PR, had `successCount`, `errorCount`, and `averageDuration`, and forward these to the telemetry cluster.

When search requests completed, prior to this PR, we would query the saved object and update each of these properties in the saved object. This often resulted in many `409 conflict` errors, because of race conditions in updating the saved object. In addition, there was a bug in how we calculated the `averageDuration`.

This PR updates the collection to use `incrementCounter`, which uses a script to update the counts instead of querying/updating the saved object, which should get rid of the race condition. It also replaces `averageDuration` in the collected saved object with `totalDuration`, and uses `incrementCounter` to increment that as well. Then, when reporting to the telemetry cluster, we query the saved object and calculate the `averageDuration`, which is then reported to the telemetry cluster.

Open questions

Since `averageDuration` was being calculated incorrectly, and since the saved object will now have `totalDuration` instead, we need to "reset" the values for each of the properties stored in the saved object. How do we accomplish this?
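The reporting-time calculation described in the summary can be sketched as follows. This is an illustration rather than the PR's actual code: `SearchTelemetryCounters` and `buildReport` are hypothetical names, and the zero-guard is an assumption about how an empty counter set would be handled.

```typescript
// Counter fields maintained via incrementCounter in the saved object.
interface SearchTelemetryCounters {
  successCount: number;
  errorCount: number;
  totalDuration: number;
}

// Derive averageDuration at report time from the stored totals, instead of
// keeping a racy running average in the saved object itself.
function buildReport(counters: SearchTelemetryCounters) {
  const { successCount, errorCount, totalDuration } = counters;
  return {
    successCount,
    errorCount,
    // Guard against divide-by-zero when no successful searches were recorded.
    averageDuration: successCount > 0 ? totalDuration / successCount : 0,
  };
}
```

Storing `totalDuration` and dividing once per 24-hour report makes the average exact over the counting window, which the old increment-then-average approach could not guarantee under concurrent updates.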