
Insert Ranks for Predictions [Resolves #357] #671

Merged 3 commits on May 3, 2019

Commits on May 3, 2019

  1. Insert Ranks for Predictions [Resolves #357]

    Adds ranking to the predictions tables.

    A few flavors of ranking are added:

    - rank_abs (already existing column): absolute rank, starting at 1,
      without ties. Ties are broken either by a random draw or by a
      user-supplied fallback clause in the predictions table
      (e.g. label_value).
    - rank_pct (already existing column): percentile rank, *without ties*,
      based on the rank_abs tiebreaking.
    - rank_abs_without_ties: absolute rank, starting at 1, with ties and
      rank skipping (e.g. if two entities are tied at 3, there will be no 4).
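    The three flavors above can be sketched in plain Python (a hypothetical
    illustration, not Triage's actual implementation; variable names are made
    up, and column names follow the commit message):

```python
# Sketch of the three ranking flavors for a list of (entity_id, score)
# predictions. Hypothetical illustration; not Triage's implementation.

def rank_predictions(predictions):
    """predictions: list of (entity_id, score) tuples."""
    # Sort best score first; ties keep input order here (the real code
    # breaks ties randomly or via a user-supplied ORDER BY clause).
    ordered = sorted(predictions, key=lambda p: -p[1])
    n = len(ordered)
    ranks = {}
    for i, (entity_id, score) in enumerate(ordered):
        rank_abs = i + 1                      # 1-based, no ties
        rank_pct = rank_abs / n               # percentile, no ties
        # Competition ranking: tied scores share a rank, and the next
        # distinct score skips ahead (1, 2, 2, 4, ...).
        rank_abs_without_ties = 1 + sum(1 for _, s in ordered if s > score)
        ranks[entity_id] = (rank_abs, rank_pct, rank_abs_without_ties)
    return ranks

ranks = rank_predictions([("a", 0.9), ("b", 0.5), ("c", 0.5), ("d", 0.1)])
```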
    
    The tiebreaking for rank_abs (which cascades to rank_pct) is done
    either randomly, using a random seed based on the model's seed, or
    through user input at the new "prediction->rank_tiebreaker_order_by"
    config value.
    
    What is the model's seed, you ask? It's a new construct that we store
    in the models table under 'random_seed'. For each model training task,
    we generate a value between -1000000000 and 1000000000. This value is
    set as the Python random seed right before an individual model is
    trained, so behavior is the same in single-threaded and multiprocess
    training contexts. How are these generated? The experiment requires
    that a random seed is passed in the config, so it becomes part of the
    experiment config that is saved.
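    As a rough sketch of how a per-model seed might drive reproducible
    random tiebreaking (function names here are hypothetical, not Triage's
    actual API):

```python
import random

# Hypothetical sketch: derive a per-model seed from an experiment-level
# seed, then use it for reproducible random tiebreaking. Names are made
# up; this is not Triage's actual API.

def generate_model_seed(experiment_seed, model_index):
    # Seed a local RNG from the experiment-level seed so the per-model
    # seeds are themselves reproducible.
    rng = random.Random(experiment_seed)
    seeds = [rng.randint(-1_000_000_000, 1_000_000_000)
             for _ in range(model_index + 1)]
    return seeds[model_index]

def tiebreak_order(entity_ids, model_seed):
    # Shuffle entity ids with the model's seed: the same seed always
    # yields the same tiebreak order, whether training runs in one
    # process or many.
    rng = random.Random(model_seed)
    ids = list(entity_ids)
    rng.shuffle(ids)
    return ids

seed = generate_model_seed(experiment_seed=42, model_index=0)
order_a = tiebreak_order(["x", "y", "z"], seed)
order_b = tiebreak_order(["x", "y", "z"], seed)
assert order_a == order_b  # reproducible across runs and processes
```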
    
    To save space in the predictions table, and to remove unnecessary
    precision that would make tiebreaking largely irrelevant, the score
    columns in the predictions tables are converted to DECIMAL(6, 5).
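    To see why reduced precision makes tiebreaking matter: rounding scores
    to five decimal places (as DECIMAL(6, 5) effectively does) can collapse
    nearby scores into ties. A small illustration with made-up scores:

```python
# Scores that differ only past the fifth decimal place become equal once
# stored as DECIMAL(6, 5), so a tiebreaking rule is needed to order them.
scores = [0.123456789, 0.123461111, 0.500000000]
stored = [round(s, 5) for s in scores]
# The first two scores both round to 0.12346: now tied.
assert stored[0] == stored[1]
assert stored[0] != stored[2]
```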
    
    To keep track of how tiebreaking was done, a new prediction_metadata
    table holds this metadata, whether it came from user configuration or
    the Triage-supplied default.
    
    Implementation-wise, ranks are filled in via an UPDATE statement after
    predictions are initially inserted with NULL ranks, to keep memory
    from ballooning.
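    The insert-then-update pattern described above can be sketched with
    sqlite3 (the schema and column names are illustrative, not Triage's
    actual tables, and Triage uses Postgres rather than SQLite):

```python
import sqlite3

# Illustrative sketch: insert predictions with NULL ranks, then fill the
# ranks in a single database-side UPDATE, so Python never has to hold
# all ranks in memory at once.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions (entity_id TEXT, score REAL, rank_abs INTEGER)"
)
conn.executemany(
    "INSERT INTO predictions (entity_id, score, rank_abs) VALUES (?, ?, NULL)",
    [("a", 0.9), ("b", 0.5), ("c", 0.1)],
)
# Fill ranks database-side: rank = 1 + number of strictly higher scores.
conn.execute(
    "UPDATE predictions SET rank_abs = "
    "(SELECT COUNT(*) + 1 FROM predictions AS p2 "
    " WHERE p2.score > predictions.score)"
)
rows = dict(conn.execute("SELECT entity_id, rank_abs FROM predictions"))
assert rows == {"a": 1, "b": 2, "c": 3}
```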
    thcrock committed May 3, 2019 · 2ff1008
  2. Changes from review

    thcrock committed May 3, 2019 · e007c4c
  3. Cast tests to float

    thcrock committed May 3, 2019 · bf70c02