Skip to content
This repository has been archived by the owner on Jan 20, 2023. It is now read-only.

Fix logic when adding a value which could go into multiple centroids #7

Merged
merged 1 commit into from
Oct 8, 2017

Commits on Oct 8, 2017

  1. Fix logic when adding a value which could go into multiple centroids

    When adding a value, we pick a centroid for it to be added to.
    Usually, there is just one option, the centroid with the nearest mean.
    In some cases (typically when a TDigest has very few values), it's
    possible for multiple centroids to have means which are equidistant
    from the input value.
    
    The previous logic for this was broken in that it did not carefully
    maintain the invariant that the centroid means should always be
    non-decreasing. Previously, we would first filter the candidate set to
    only a set of centroid which have room for more values, and then we
    would filter again to preserve the invariant. This was backwards.
    
    Instead, we need to first make sure the invariant is preserved. If the
    right centroid is full, we just need to add a new one. Other logic (in
    addNewCentroid) will make sure it's added in the right spot.
    spenczar committed Oct 8, 2017
    Configuration menu
    Copy the full SHA
    6abd573 View commit details
    Browse the repository at this point in the history