Skip to content
This repository has been archived by the owner on Jan 20, 2023. It is now read-only.

Fix logic when adding a value which could go into multiple centroids #7

Merged
merged 1 commit into from
Oct 8, 2017

Conversation

spenczar
Copy link
Owner

@spenczar spenczar commented Oct 8, 2017

When adding a value, we pick a centroid for it to be added to. Usually, there is just one option, the centroid with the nearest mean. In some cases (typically when a TDigest has very few values), it's possible for multiple centroids to have means which are equidistant from the input value.

The previous logic for this was broken in that it did not carefully maintain the invariant that the centroid means should always be non-decreasing. Previously, we would first filter the candidate set to only a set of centroid which have room for more values, and then we would filter again to preserve the invariant. This was backwards.

Instead, we need to first make sure the invariant is preserved. If the right centroid is full, we just need to add a new one. Other logic (in addNewCentroid) will make sure it's added in the right spot.

Fixes #6.

When adding a value, we pick a centroid for it to be added to.
Usually, there is just one option, the centroid with the nearest mean.
In some cases (typically when a TDigest has very few values), it's
possible for multiple centroids to have means which are equidistant
from the input value.

The previous logic for this was broken in that it did not carefully
maintain the invariant that the centroid means should always be
non-decreasing. Previously, we would first filter the candidate set to
only a set of centroid which have room for more values, and then we
would filter again to preserve the invariant. This was backwards.

Instead, we need to first make sure the invariant is preserved. If the
right centroid is full, we just need to add a new one. Other logic (in
addNewCentroid) will make sure it's added in the right spot.
@spenczar spenczar merged commit ac9b52f into master Oct 8, 2017
@spenczar spenczar deleted the fix_add_target branch October 8, 2017 15:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant