
Insert Ranks for Predictions [Resolves #357] #671

Merged: 3 commits merged into master from ranking_try_3 on May 3, 2019

Conversation

@thcrock thcrock commented Apr 10, 2019 (Contributor)

Adds ranking to the predictions tables.

A few flavors of ranking are added.

- rank_abs (already existing column): absolute rank, starting at 1, without ties. Ties are broken by either a random draw or a user-supplied fallback clause referencing a column in the predictions table (e.g. label_value).
- rank_pct (already existing column): percentile rank, without ties. Based on the rank_abs tiebreaking.
- rank_abs_with_ties: absolute rank, starting at 1, with ties and skipping (e.g. if two entities are tied at rank 3, there will be no rank 4).

The tiebreaking for rank_abs (which cascades to rank_pct) is done either with RANDOM(), using a random seed derived from the model's seed (converted from Python seed-space into Postgres seed-space), or through user input at the new "prediction->rank_tiebreaker_order_by" config value.

What is the model's seed, you ask? It's a new construct that we store in the models table under 'random_seed'. For each model training task, we generate a value between -1000000000 and 1000000000. This value is set as the Python seed right before training of an individual model, so behavior is the same in single-threaded and multiprocess training contexts. How is this generated? The experiment requires that an experiment-level random seed be passed in the config, so it becomes part of the experiment config that is saved, and the per-model seeds are generated from it.
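
For illustration, a minimal sketch of how such a seed might be generated and mapped into Postgres' setseed() range. The helper names and the scaling choice here are hypothetical, not necessarily what Triage does internally:

```python
import random

# Bounds taken from the PR description; helper names are hypothetical.
SEED_MIN, SEED_MAX = -1_000_000_000, 1_000_000_000

def generate_model_random_seed() -> int:
    """Draw one per-model seed from the already-seeded global generator."""
    return random.randint(SEED_MIN, SEED_MAX)

def python_to_postgres_seed(model_seed: int) -> float:
    """Postgres' setseed() takes a float in [-1.0, 1.0]; scale the Python-space seed into that range."""
    return model_seed / float(SEED_MAX)

# At the start of the experiment: seed Python once with the experiment-level
# random_seed from the config, so per-model seed generation is reproducible.
random.seed(42)  # experiment-level random_seed from the experiment config

# For each model training task: draw a model seed, store it in the models table
# under random_seed, and re-seed Python right before training that model.
model_seed = generate_model_random_seed()
random.seed(model_seed)

# Before ranking: map the model seed into Postgres seed-space, e.g. to pass to
# "SELECT setseed(%(seed)s)" ahead of an ORDER BY ... RANDOM() tiebreak.
postgres_seed = python_to_postgres_seed(model_seed)
```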

To save space in the predictions tables, and to remove unnecessary precision that would make tiebreaking largely irrelevant, the score column in the predictions tables is converted to DECIMAL(6, 5).
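
As a sketch of the corresponding column change (illustrative only; the actual Alembic migration in the PR may differ, and the schema-qualified table name here is an assumption):

```sql
-- Illustrative only: DECIMAL(6, 5) keeps five digits after the decimal point,
-- which is plenty for scores in [0, 1] while dropping float-level precision.
ALTER TABLE test_results.predictions
    ALTER COLUMN score TYPE DECIMAL(6, 5);
```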

To keep track of how tiebreaking was done, there is a new prediction_metadata table that holds this metadata, whether it comes from user configuration or the Triage-supplied default.

Implementation-wise, the ranks are filled in via an update statement after predictions are initially inserted with NULL ranks, to prevent memory from ballooning.
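
A rough sketch of what that update might look like (illustrative only; the schema name, entity/date key columns, and the exact percentile formula are assumptions rather than the merged implementation):

```sql
-- Illustrative sketch: fill in ranks for one model after its predictions were
-- inserted with NULL ranks. RANDOM() is the default tiebreaker; a user-supplied
-- clause such as "label_value desc" could be substituted in the ORDER BY.
-- (A preceding "SELECT setseed(...)" call makes the RANDOM() tiebreak reproducible.)
UPDATE test_results.predictions AS p
SET
    rank_abs = ranked.abs_rank,
    rank_pct = ranked.abs_rank::float / ranked.total  -- one possible percentile definition
FROM (
    SELECT
        entity_id,
        as_of_date,
        ROW_NUMBER() OVER (ORDER BY score DESC, RANDOM()) AS abs_rank,
        COUNT(*) OVER ()                                  AS total
    FROM test_results.predictions
    WHERE model_id = 123
) AS ranked
WHERE p.model_id = 123
  AND p.entity_id = ranked.entity_id
  AND p.as_of_date = ranked.as_of_date;
```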

@codecov-io commented Apr 11, 2019

Codecov Report

Merging #671 into master will decrease coverage by 0.18%.
The diff coverage is 72.22%.


@@            Coverage Diff             @@
##           master     #671      +/-   ##
==========================================
- Coverage   82.78%   82.59%   -0.19%     
==========================================
  Files          90       93       +3     
  Lines        6012     6114     +102     
==========================================
+ Hits         4977     5050      +73     
- Misses       1035     1064      +29
| Impacted Files | Coverage Δ |
| -------------- | ---------- |
| src/triage/component/results_schema/__init__.py | 66.66% <ø> (ø) ⬆️ |
| ...embic/versions/609c7cc51794_rankify_predictions.py | 0% <0%> (ø) |
| src/triage/component/results_schema/schema.py | 98.36% <100%> (+0.15%) ⬆️ |
| src/triage/experiments/base.py | 95.39% <100%> (+0.03%) ⬆️ |
| src/triage/component/catwalk/model_trainers.py | 91.94% <100%> (-1.11%) ⬇️ |
| src/triage/component/catwalk/storage.py | 92.47% <100%> (+0.05%) ⬆️ |
| src/triage/util/random.py | 100% <100%> (ø) |
| src/triage/util/sql.py | 100% <100%> (ø) |
| src/triage/experiments/validate.py | 77.03% <83.33%> (+0.27%) ⬆️ |
| src/triage/component/catwalk/predictors.py | 95.09% <96.29%> (+0.29%) ⬆️ |
| ... and 5 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ecsalomon ecsalomon left a comment (Contributor)

I don't really have any comments on the code, but I found some of the documentation and configuration confusing.

  • I think we need to make it really clear where and how the random seed is used and stored -- i.e., what is its scope, especially as it relates to being stored with model metadata. A model's random seed seems intuitively similar to the random_state key in sklearn classes and so would also be part of the model group definition. If we don't intend to use the experiment random seed in this way, we need to be really clear in the language around it. On the other hand, perhaps it is desirable to set a single random seed to use for all elements of an experiment, including all model random_states. I haven't really thought through the implications of either.
  • Relatedly, what is the expected behavior of re-running the same config, changing only the random seed, with and without replace? What, if anything, is updated in the models and model_groups tables? In the predictions, evaluations, and predictions_metadata tables?
  • I'm unclear on the purpose and behavior of the other configuration options, and the remainder of my comments are on that topic.
  • I think it would be cool to eventually support something like @avishekrk's tie-breaking prioritization (e.g., prioritize women for housing interventions in the case of ties) and to see the impact of this kind of prioritization on evaluation and bias metrics. But this could still lead to ties, which would then need to be broken in the best/worst/stochastic manner. But this doesn't seem like the intention of the configuration, nor does it seem like an obvious extension of the configuration.
  • Have we considered how audition and/or postmodelling will handle models that have different configuration options for tie-breaking? Or different random seeds? Are these comparable or should they blow up? I know this PR doesn't have to make those changes, but I am trying to think through what the consequences of the new configuration options will be.
  • The configuration of rank_tiebreaker_order_by allows users to do both potentially dangerous (label_value desc) and useless (score asc) things. We already support plenty of ridiculous configurations, so this isn't a strong objection. But I'm not sure that it supports all that many useful things, either.

@@ -11,6 +11,9 @@ config_version: 'v6'
# model_comment (optional) will end up in the model_comment column of the
# models table for each model created in this experiment
model_comment: 'test'
# random_seed will be set in Python at the beginning of the experiment and
# affect the generation of all model seeds
Contributor

and model groups? or are model groups independent of seed? Edit: I was thinking about this as a replacement for sklearn's seed setting, which now (usefully) creates distinct model groups.

Contributor

It might be good to document here (and elsewhere) what the scope of the model random seed is/isn't.

# Example: "RANDOM()" (default),
# Will end up in a SQL clause like: "order by score desc, RANDOM()"
#
# Example: "label_value desc"
Contributor

Hmmm, the configuration here seems maybe undesirable to support as is -- it seems like a bad idea to sort by label_value desc as it will give an unrealistically positive view of any downstream processes that make use of ranking, unless there is a giant, flashing warning letting users know what is happening. If it were possible to join to another table and then sort on a value there (e.g., we want to prioritize the lowest income people in the event of ties), I could see this being useful, though that is obviously out of scope here. So I guess my question is: Is that the intended future of this configuration?

Contributor Author

This is a good point. I don't really see how to make a non-confusing configuration option support joins here. Like, we could let people put in the join clause, but disconnected from the query it's probably hard to reliably write. They could put the whole query, with what's in the code file now as an example, but that is

Honestly, given the limited scope of the usage of these columns (see my other reply just now), I'm inclined to just remove the configuration for this entirely. My opinion is that stuff like 'prioritize lowest income people in the event of ties' would belong in after-experimenting evaluation, which could be built around the enhanced sort_predictions_and_labels in the evaluation pull request: https://github.com/dssg/triage/pull/674/files#diff-69452711d71cc1b37559e5011b372db0 (for instance, we add onto that to make it take other arbitrary arrays that may have other data, and add some convenience onto that for getting the predictions and labels and other data)

Contributor

I think if we're reporting out best, worst, and stochastic cases in the evaluations table, it makes sense to allow those to be supported here, but I wonder if it might make more sense to just have a parameter with those explicit values allowed rather than an arbitrary order by clause.

In the future, maybe that could be extended to allow specifying a join table or something along those lines for more complex cases, but this may be a little safer and more explicit (especially with respect to errors confusing ascending and descending) for the current purposes.

Contributor Author

@shaycrk how would the stochastic case (30 trials, therefore 30 rankings) map to a ranking we can materialize here?

Contributor

Yeah, I think I got my mind all twisted thinking about this in terms of what's used for the metrics, but since there is no longer a "canonical" metric value but rather a range (worst, best) and central tendency (stochastic), there is no longer any alignment between the inserted ranks and the metric value. So I assume these will just be used for downstream processes, specifically postmodeling and bias analysis. So we should think about the configuration either in terms of what we want to support for those... or punt on configuration altogether until we start working on changes to those modules.

Contributor

Just as the random tie-breaking option (one realization from the stochastic case) that you're already implementing here does. I don't think we need the flexibility of allowing the user to specify any arbitrary order by clause (which may also be error-prone) as opposed to having a fixed menu of options that mirrors what we're putting in the evaluations table.

# of Postgres' random number generator. Before ranking, the generator is seeded based on
# the *model*'s random seed.
#prediction:
# rank_tiebreaker_order_by: "RANDOM()"
Contributor

It's unclear from the documentation here how this interacts with the (future) metrics behavior, which may be addressed in the next PR in this series. But I am wondering: If I set random() here, is that intended to apply before label_value in the best/worst metric comparisons or random() in the stochastic (leading to paradoxically invariant stochastic metrics)? In the example I gave above of prioritizing based on income, I would expect the sorting to be applied and then ties within that sorting to be broken by random() or label_value, but if I gave one of the example options, I would not.

Contributor Author

This doesn't affect the metrics behavior. It'd be really hard to integrate this schema with best/worst/30 random runs. Really, I think the mere existence of the stochastic option for evaluation should discourage us from trying to pin evaluation hopes on tightly coupling the metrics with the ranking that goes into this table. We're not reasonably going to store every ranking that was used, and there isn't even a true 'ranking' to recover if we use the stochastic value.

@shaycrk shaycrk self-requested a review April 25, 2019 19:47
update {table_name(prediction_obj)} as p
set
rank_abs = abs.abs_rank,
rank_pct = pct.pct_rank,
Contributor

I realize it would mean changing column names that already exist, but I would probably lean towards naming these rank_abs_no_ties and rank_pct_no_ties (or without_ties) just to be very explicit as well as to avoid suggesting that one is the preferred/default to use in general.

Also, we might have to calculate it directly, but it seems like we may also want rank_pct_with_ties too?

@thcrock thcrock force-pushed the ranking_try_3 branch 3 times, most recently from b8fc0cf to 02f60c6 Compare May 3, 2019 19:07
@@ -184,6 +185,9 @@ feature_group_definition:

feature_group_strategies: ['all']

prediction:
rank_tiebreaker: "best"
Contributor

maybe add a comment warning that this is illustrative, but you might not want to use best in most cases unless you have good reason to do so...

Contributor Author

I actually left this in accidentally from some speed testing I was doing. I'll remove it and let them use the default (unless @nanounanue wants to change it to something else)

| Column name | Behavior |
| ----------- | ------- |
| rank_abs_with_ties | Absolute ranking, with ties. Ranks will skip after a set of ties, so if two entities are tied at rank 3, the next entity after them will have rank 5. |
| rank_pct_with_ties | Percentile ranking, with ties. Percentiles will skip after a set of ties, so if two entities out of ten are tied at 0.1 (tenth percentile), the next entity after them will have 0.3 (thirtieth percentile) |
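
A small, self-contained pandas example (hypothetical scores, not from the PR) showing how min-method ranking skips after a group of ties, in the spirit of the with-ties columns described above:

```python
import pandas as pd

# Hypothetical scores for five entities; two are tied.
df = pd.DataFrame({"score": [0.93, 0.81, 0.81, 0.54, 0.20]})

# Rank with ties: tied scores share the lowest rank in their group, and the
# rank after the group is skipped (1, 2, 2, 4, 5 here).
df["rank_abs_with_ties"] = df["score"].rank(method="min", ascending=False).astype(int)

# Percentile version of the same ranking (1/5, 2/5, 2/5, 4/5, 5/5 here),
# so the value after the tied group jumps from 0.4 to 0.8.
df["rank_pct_with_ties"] = df["score"].rank(method="min", ascending=False, pct=True)

print(df)
```
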
Contributor

might be good to mention if the percentiles are calculated to a certain precision?

ranking_df['rank_pct_no_ties'] = numpy.array([1 - (rank - 1) / len(ranking_df) for rank in ranking_df['rank_abs_no_ties']])
ranking_df['rank_pct_with_ties'] = ranking_df['score'].rank(method='min', pct=True)

# with our rankings computed, upsert these ranks into the predictions table
Contributor

just an update, not an upsert, right? That is, it assumes the records are in the predictions table already (at least if I'm reading this correctly)

@shaycrk shaycrk left a comment (Contributor)

Made a couple of small notes about comments inline, but generally looks good to me!

@shaycrk shaycrk assigned thcrock and unassigned shaycrk May 3, 2019
@thcrock thcrock merged commit 1dc8a4a into master May 3, 2019
@thcrock thcrock deleted the ranking_try_3 branch May 3, 2019 22:29