
Insert Ranks for Predictions [Resolves #357] #671

Merged 3 commits on May 3, 2019

Commits on May 3, 2019

  1. Insert Ranks for Predictions [Resolves #357]

    Adds ranking to the predictions tables.

    A few flavors of ranking are added:

    - rank_abs (already existing column): absolute rank, starting at 1,
      without ties. Ties are broken either by a random draw or by a
      user-supplied fallback clause in the predictions table
      (e.g. label_value).
    - rank_pct (already existing column): percentile rank, *without ties*,
      based on the rank_abs tiebreaking.
    - rank_abs_without_ties: absolute rank, starting at 1, with ties and
      rank skipping (e.g. if two entities are tied at 3, there will be no 4).
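    The three flavors above can be sketched in plain Python (a hypothetical
    illustration, not Triage's actual implementation; variable names are made
    up, and column names follow the commit message):

```python
# Sketch of the three ranking flavors for a list of (entity_id, score)
# predictions. Hypothetical illustration; not Triage's implementation.

def rank_predictions(predictions):
    """predictions: list of (entity_id, score) tuples."""
    # Sort best score first; ties keep input order here (the real code
    # breaks ties randomly or via a user-supplied ORDER BY clause).
    ordered = sorted(predictions, key=lambda p: -p[1])
    n = len(ordered)
    ranks = {}
    for i, (entity_id, score) in enumerate(ordered):
        rank_abs = i + 1                      # 1-based, no ties
        rank_pct = rank_abs / n               # percentile, no ties
        # Competition ranking: tied scores share a rank, and the next
        # distinct score skips ahead (1, 2, 2, 4, ...).
        rank_abs_without_ties = 1 + sum(1 for _, s in ordered if s > score)
        ranks[entity_id] = (rank_abs, rank_pct, rank_abs_without_ties)
    return ranks

ranks = rank_predictions([("a", 0.9), ("b", 0.5), ("c", 0.5), ("d", 0.1)])
```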
    
    The tiebreaking for rank_abs (which cascades to rank_pct) is done
    either randomly, using a random seed based on the model's seed, or
    through user input at the new "prediction->rank_tiebreaker_order_by"
    config value.
    
    What is the model's seed, you ask? It's a new construct that we store
    in the models table under 'random_seed'. For each model training task,
    we generate a value between -1000000000 and 1000000000. This value is
    set as the Python random seed right before an individual model is
    trained, so behavior is the same in single-threaded and multiprocess
    training contexts. How are these generated? The experiment requires
    that a random seed is passed in the config, so it becomes part of the
    experiment config that is saved.
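    As a rough sketch of how a per-model seed might drive reproducible
    random tiebreaking (function names here are hypothetical, not Triage's
    actual API):

```python
import random

# Hypothetical sketch: derive a per-model seed from an experiment-level
# seed, then use it for reproducible random tiebreaking. Names are made
# up; this is not Triage's actual API.

def generate_model_seed(experiment_seed, model_index):
    # Seed a local RNG from the experiment-level seed so the per-model
    # seeds are themselves reproducible.
    rng = random.Random(experiment_seed)
    seeds = [rng.randint(-1_000_000_000, 1_000_000_000)
             for _ in range(model_index + 1)]
    return seeds[model_index]

def tiebreak_order(entity_ids, model_seed):
    # Shuffle entity ids with the model's seed: the same seed always
    # yields the same tiebreak order, whether training runs in one
    # process or many.
    rng = random.Random(model_seed)
    ids = list(entity_ids)
    rng.shuffle(ids)
    return ids

seed = generate_model_seed(experiment_seed=42, model_index=0)
order_a = tiebreak_order(["x", "y", "z"], seed)
order_b = tiebreak_order(["x", "y", "z"], seed)
assert order_a == order_b  # reproducible across runs and processes
```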
    
    To save space in the predictions table, and to remove unnecessary
    precision that would make tiebreaking largely irrelevant, the score
    columns in the predictions tables are converted to DECIMAL(6, 5).
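    To see why reduced precision makes tiebreaking matter: rounding scores
    to five decimal places (as DECIMAL(6, 5) effectively does) can collapse
    nearby scores into ties. A small illustration with made-up scores:

```python
# Scores that differ only past the fifth decimal place become equal once
# stored as DECIMAL(6, 5), so a tiebreaking rule is needed to order them.
scores = [0.123456789, 0.123461111, 0.500000000]
stored = [round(s, 5) for s in scores]
# The first two scores both round to 0.12346: now tied.
assert stored[0] == stored[1]
assert stored[0] != stored[2]
```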
    
    To keep track of how tiebreaking was done, a new prediction_metadata
    table holds this metadata, whether it came from user configuration or
    the Triage-supplied default.
    
    Implementation-wise, ranks are filled in via an UPDATE statement after
    predictions are initially inserted with NULL ranks, to keep memory
    from ballooning.
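    The insert-then-update pattern described above can be sketched with
    sqlite3 (the schema and column names are illustrative, not Triage's
    actual tables, and Triage uses Postgres rather than SQLite):

```python
import sqlite3

# Illustrative sketch: insert predictions with NULL ranks, then fill the
# ranks in a single database-side UPDATE, so Python never has to hold
# all ranks in memory at once.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions (entity_id TEXT, score REAL, rank_abs INTEGER)"
)
conn.executemany(
    "INSERT INTO predictions (entity_id, score, rank_abs) VALUES (?, ?, NULL)",
    [("a", 0.9), ("b", 0.5), ("c", 0.1)],
)
# Fill ranks database-side: rank = 1 + number of strictly higher scores.
conn.execute(
    "UPDATE predictions SET rank_abs = "
    "(SELECT COUNT(*) + 1 FROM predictions AS p2 "
    " WHERE p2.score > predictions.score)"
)
rows = dict(conn.execute("SELECT entity_id, rank_abs FROM predictions"))
assert rows == {"a": 1, "b": 2, "c": 3}
```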
    thcrock committed May 3, 2019 · 2ff1008
  2. Changes from review

    thcrock committed May 3, 2019 · e007c4c
  3. Cast tests to float

    thcrock committed May 3, 2019 · bf70c02