[ML] bucket_count is inaccurate when there are gaps in the data #30080

elasticmachine · 2017-08-25T09:41:27Z

Original comment by @davidkyle:

Open a job, send some data, and close the job. Then reopen the job and send some data timestamped a week later than the previous batch. Autodetect will create empty bucket results for the intervening period, but DataCounts::bucket_count will not reflect that.

The test MlBasicMultiNodeIT::testMiniFarequoteReopen does exactly this, but it was asserting that bucket_count == 2 rather than bucket_count == 7 days' worth of buckets. bucket_count should equal the number of buckets written by autodetect, with the caveat that old results are sometimes pruned.
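To make the expected numbers concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration, not the Elasticsearch code: `count_buckets` is a hypothetical helper, the 1-hour bucket span is an assumption (the issue does not state the span used by the test), and timestamps are taken to be epoch seconds.

```python
def count_buckets(timestamps, bucket_span):
    """Return (non_empty, total) bucket counts for epoch-second timestamps.

    non_empty counts only buckets that contain at least one record;
    total also counts the empty buckets between the first and last record.
    """
    if not timestamps:
        return 0, 0
    indices = {ts // bucket_span for ts in timestamps}
    non_empty = len(indices)
    total = max(indices) - min(indices) + 1
    return non_empty, total


HOUR = 3600             # assumed bucket span, in seconds
WEEK = 7 * 24 * HOUR

first_batch = [0, 600]              # sent before the job is closed
second_batch = [WEEK, WEEK + 600]   # sent after reopening, one week later

non_empty, total = count_buckets(first_batch + second_batch, HOUR)
print(non_empty, total)
```

Under these assumptions only two buckets contain data, while the span from the first to the last record covers a full week of buckets. The first figure is what the test was asserting; the second is closer to what autodetect writes.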

This commit refactors the DataStreamDiagnostics class achieving the following advantages:

- simpler code; by encapsulating the moving bucket histogram into its own class
- better performance; by using an array to store the buckets instead of a map
- explicit handling of gap buckets; in preparation of fixing elastic#30080

This commit fixes an issue with the data diagnostics where empty buckets are not reported even though they should be. Once a job is reopened, the diagnostics do not get initialized from the current data counts (especially the latest record timestamp). As a result, if the data that is sent has a time gap relative to the previous data, that gap is not accounted for in the empty bucket count. This commit fixes that by initializing the diagnostics with the current data counts. Closes elastic#30080
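The idea behind the fix can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual DataStreamDiagnostics implementation: the `BucketDiagnostics` class, its `latest_record_time` parameter, and the `check_record` method are hypothetical names, timestamps are epoch seconds, and out-of-order records are ignored.

```python
class BucketDiagnostics:
    """Hypothetical stand-in; not the Elasticsearch DataStreamDiagnostics API."""

    def __init__(self, bucket_span, latest_record_time=None):
        self.bucket_span = bucket_span
        self.empty_bucket_count = 0
        # Index of the last bucket known to contain data.
        # None means nothing has been seen yet (a brand-new job).
        self.last_bucket = (
            None if latest_record_time is None
            else latest_record_time // bucket_span
        )

    def check_record(self, timestamp):
        bucket = timestamp // self.bucket_span
        # Every bucket strictly between the last seen bucket and this one is empty.
        if self.last_bucket is not None and bucket > self.last_bucket + 1:
            self.empty_bucket_count += bucket - self.last_bucket - 1
        if self.last_bucket is None or bucket > self.last_bucket:
            self.last_bucket = bucket


HOUR = 3600            # assumed bucket span, in seconds
WEEK = 7 * 24 * HOUR

# Reopened job without initialization: the week-long gap is invisible.
uninitialized = BucketDiagnostics(HOUR)
uninitialized.check_record(WEEK)

# Reopened job seeded with the latest record timestamp from the stored data counts.
initialized = BucketDiagnostics(HOUR, latest_record_time=600)
initialized.check_record(WEEK)

print(uninitialized.empty_bucket_count, initialized.empty_bucket_count)
```

In this sketch the uninitialized instance treats the first record after reopening as the start of the data, so it reports no empty buckets. The instance seeded with the previous latest record timestamp counts every bucket in the gap.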

elasticmachine added :ml Machine learning >bug labels Apr 25, 2018

elasticmachine assigned dimitris-athanasiou Apr 25, 2018

dimitris-athanasiou mentioned this issue Apr 25, 2018

[ML] Refactor DataStreamDiagnostics to use array #30129

Merged

dimitris-athanasiou mentioned this issue May 1, 2018

[ML] Account for gaps in data counts after job is reopened #30294

Merged

dimitris-athanasiou closed this as completed in #30294 May 3, 2018
