Heisenbug in fake_labels #138
Moved to dssg/architect#13
Not fixed (or rather, this should have been reopened when architect was merged back into this repo).
On the surface this looks like the test can just be fixed. But although the case where this would come up in real runs is rare, it could happen, so perhaps it's the code that should be fixed and not the test. Instead of a randomized test, there should be two deterministic test cases for the metric calculators: one with mixed labels and one with all-identical labels. I'm not sure what the expected behavior of the metric calculators should be when all labels are the same, but whatever it is should be pinned down there (see the sketch below).
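As a sketch of what those two deterministic cases might look like, using sklearn's `roc_auc_score` as a stand-in for the project's metric calculators (the test names and the assumption that uniform labels raise `ValueError` are illustrative, not taken from the codebase):

```python
import numpy
import pytest
from sklearn.metrics import roc_auc_score


def test_metric_with_mixed_labels():
    # Both classes present: the metric is well defined.
    labels = numpy.array([0, 1, 0, 1])
    scores = numpy.array([0.1, 0.9, 0.2, 0.8])
    assert roc_auc_score(labels, scores) == 1.0


def test_metric_with_uniform_labels():
    # Only one class present: AUC is undefined, and sklearn raises.
    # Whatever behavior the calculators should have here (raise, warn,
    # return None) would be asserted in this test instead of left to chance.
    labels = numpy.array([1, 1, 1, 1])
    scores = numpy.array([0.1, 0.9, 0.2, 0.8])
    with pytest.raises(ValueError):
        roc_auc_score(labels, scores)
```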
Per yesterday's conversation:
Hmm, as I am addressing the subsets issue, I thought this might also be the correct way to handle empty subsets: the "evaluations" still get written to the database, but with the relevant information that there were no labels to evaluate (a minimal sketch of this follows).
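A minimal sketch of that behavior; the helper name and signature are hypothetical, not the actual ModelEvaluator code:

```python
import numpy


def metric_or_null(metric_fn, labels, scores):
    """Hypothetical helper: return the metric value when it is defined,
    and None (stored as SQL NULL) when the subset is empty or the labels
    show no variation, so the evaluation row still gets written."""
    labels = numpy.asarray(labels)
    if labels.size == 0 or numpy.unique(labels).size < 2:
        return None
    return metric_fn(labels, scores)
```

An evaluation row would then be written with a value of `None`, which the database stores as `NULL`, rather than the whole evaluation failing.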
* Evaluate on subsets [Resolves #535, #138]

This commit adds support for evaluating models against subsets of their predictions, in both training and testing. It adds a table to the results schemas to track subsets:

- `model_metadata.subsets` stores subset metadata, including a hash, the subset configuration, and the time the row was created

The `evaluations` tables in the `train_results` and `test_results` schemas are updated to include a new column (also added to the primary key), `subset_hash`, which is an empty string for full-cohort evaluations or contains the subset hash when the evaluation is for a subset of the cohort. A new alembic upgrade script creates the subsets table and updates the evaluations tables. Testing factories are included or modified for the subsets and evaluations tables.

Most of the remaining code changes are made to the ModelEvaluator class, which can now process subset queries and write the results to the appropriate table [#535] and will record `NULL` values for undefined metrics (whether due to an empty subset or a lack of variation in labels [#138]). However, some changes are made elsewhere in the experiment to allow (optionally) including subsets in the experiment configuration file, including storing subset metadata in the `model_metadata.subsets` table and iterating over subsets in the model tester.

In addition, some changes to the documentation and `.gitignore` are included to make modifying the results schema more joyful.
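As an illustration of the `subset_hash` convention described above, here is one plausible way such a hash could be derived from a subset configuration; the hashing scheme shown is an assumption, and the project may use its own hashing utility:

```python
import hashlib
import json


def subset_hash(subset_config):
    # Hypothetical: full-cohort evaluations use the empty string; subset
    # evaluations hash a canonical serialization of the configuration, so
    # the same subset always maps to the same model_metadata.subsets row.
    if not subset_config:
        return ""
    serialized = json.dumps(subset_config, sort_keys=True)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()
```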
Closed in #552
When this code (in `tests/utils.py`):
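(The snippet itself was not preserved in this copy of the issue; the following is a plausible reconstruction of a random label generator of this kind, with the exact body being an assumption.)

```python
import random

import numpy


def fake_labels(length):
    # Draw each label independently at random; with small samples this
    # occasionally yields all-identical labels, hence the Heisenbug.
    return numpy.array([random.choice([True, False]) for _ in range(length)])
```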
comes up with all the same values, some of the metric calculators error out. We should just hardcode it to include some variation, as in the sketch below.
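One way the suggested fix could look, a sketch assuming the reconstruction above and at least two labels:

```python
import random

import numpy


def fake_labels(length):
    # Same random draw as before, but pin the first two labels so both
    # classes always appear (assumes length >= 2), removing the flakiness.
    labels = [random.choice([True, False]) for _ in range(length)]
    labels[0], labels[1] = True, False
    return numpy.array(labels)
```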