Heisenbug in fake_labels #138
Moved to dssg/architect#13
Not fixed (or rather, this should have been reopened when architect was merged back into this repo).
On the surface this looks like the test can just be fixed. But although the case where this would come up in real runs is rare, it could happen, so perhaps it's the code that should be fixed and not the test. Instead of a randomized test, there should be two deterministic test cases for the metric calculators: one with mixed labels and one with all-identical labels. I'm not sure what the expected behavior of the metric calculators should be when all labels are the same, but whatever it is should be pinned down there (see the sketch below).
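As a sketch of what those two deterministic cases might look like, using sklearn's `roc_auc_score` as a stand-in for the project's metric calculators (the test names and the assumption that uniform labels raise `ValueError` are illustrative, not taken from the codebase):

```python
import numpy
import pytest
from sklearn.metrics import roc_auc_score


def test_metric_with_mixed_labels():
    # Both classes present: the metric is well defined.
    labels = numpy.array([0, 1, 0, 1])
    scores = numpy.array([0.1, 0.9, 0.2, 0.8])
    assert roc_auc_score(labels, scores) == 1.0


def test_metric_with_uniform_labels():
    # Only one class present: AUC is undefined, and sklearn raises.
    # Whatever behavior the calculators should have here (raise, warn,
    # return None) would be asserted in this test instead of left to chance.
    labels = numpy.array([1, 1, 1, 1])
    scores = numpy.array([0.1, 0.9, 0.2, 0.8])
    with pytest.raises(ValueError):
        roc_auc_score(labels, scores)
```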
Per yesterday's conversation:
Hmm, as I am addressing the subsets issue, I thought this might also be the correct way to handle empty subsets: the "evaluations" still get written to the database, but with the relevant information that there were no labels to evaluate (a minimal sketch of this follows).
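A minimal sketch of that behavior; the helper name and signature are hypothetical, not the actual ModelEvaluator code:

```python
import numpy


def metric_or_null(metric_fn, labels, scores):
    """Hypothetical helper: return the metric value when it is defined,
    and None (stored as SQL NULL) when the subset is empty or the labels
    show no variation, so the evaluation row still gets written."""
    labels = numpy.asarray(labels)
    if labels.size == 0 or numpy.unique(labels).size < 2:
        return None
    return metric_fn(labels, scores)
```

An evaluation row would then be written with a value of `None`, which the database stores as `NULL`, rather than the whole evaluation failing.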
* Evaluate on subsets [Resolves #535, #138]

This commit adds support for evaluating models against subsets of their predictions, in both training and testing. It adds a table to the results schemas to track subsets:

- `model_metadata.subsets` stores subset metadata, including a hash, the subset configuration, and the time the row was created

The `evaluations` tables in the `train_results` and `test_results` schemas are updated to include a new column (also added to the primary key), `subset_hash`, which is an empty string for full-cohort evaluations or contains the subset hash when the evaluation is for a subset of the cohort. A new alembic upgrade script creates the subsets table and updates the evaluations tables. Testing factories are included or modified for the subsets and evaluations tables.

Most of the remaining code changes are made to the ModelEvaluator class, which can now process subset queries and write the results to the appropriate table [#535] and will record `NULL` values for undefined metrics (whether due to an empty subset or a lack of variation in labels [#138]). However, some changes are made elsewhere in the experiment to allow (optionally) including subsets in the experiment configuration file, including storing subset metadata in the `model_metadata.subsets` table and iterating over subsets in the model tester.

In addition, some changes to the documentation and `.gitignore` are included to make modifying the results schema more joyful.
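As an illustration of the `subset_hash` convention described above, here is one plausible way such a hash could be derived from a subset configuration; the hashing scheme shown is an assumption, and the project may use its own hashing utility:

```python
import hashlib
import json


def subset_hash(subset_config):
    # Hypothetical: full-cohort evaluations use the empty string; subset
    # evaluations hash a canonical serialization of the configuration, so
    # the same subset always maps to the same model_metadata.subsets row.
    if not subset_config:
        return ""
    serialized = json.dumps(subset_config, sort_keys=True)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()
```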
Closed in #552
When this code (in `tests/utils.py`):
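(The snippet itself was not preserved in this copy of the issue; the following is a plausible reconstruction of a random label generator of this kind, with the exact body being an assumption.)

```python
import random

import numpy


def fake_labels(length):
    # Draw each label independently at random; with small samples this
    # occasionally yields all-identical labels, hence the Heisenbug.
    return numpy.array([random.choice([True, False]) for _ in range(length)])
```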
comes up with all the same values, some of the metric calculators error out. We should just hardcode it to include some variation, as in the sketch below.
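One way the suggested fix could look, a sketch assuming the reconstruction above and at least two labels:

```python
import random

import numpy


def fake_labels(length):
    # Same random draw as before, but pin the first two labels so both
    # classes always appear (assumes length >= 2), removing the flakiness.
    labels = [random.choice([True, False]) for _ in range(length)]
    labels[0], labels[1] = True, False
    return numpy.array(labels)
```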