Add subset evaluations support in evaluations module #535
First step in resolving #519.
ecsalomon added a commit that referenced this issue on Dec 10, 2018:
This commit adds support for evaluating models against subsets of their predictions, in both training and testing. It adds three tables to the results schemas to track subsets and their evaluations:

- `model_metadata.subsets` stores subset metadata, including a hash, the subset configuration, and the time the row was created
- `train_results.subset_evaluations` and `test_results.subset_evaluations` store evaluations for each subset

A new alembic upgrade script creates the subsets tables. Testing factories are included for the subsets and subset_evaluations tables, and a test for the factories ensures that the foreign keys in the subset_evaluations tables are correctly configured.

Most of the remaining code changes are made to the ModelEvaluator class, which can now process subset queries and write the results to the appropriate table [#535], and which records `NULL` values for undefined metrics (whether due to an empty subset or a lack of variation in labels [#138]). Some changes are also made elsewhere in the experiment to allow (optionally) including subsets in the experiment configuration file, including storing subset metadata in the `model_metadata.subsets` table and iterating over subsets in the model tester.

In addition, some changes to the documentation and `.gitignore` are included to make modifying the results schema more joyful.
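The `subsets` row described above pairs a hash with the configuration that produced it. As a rough sketch of how such a hash might be computed deterministically, so identical subset definitions map to the same row (illustrative Python, not necessarily triage's actual implementation; the `subset_config` keys are hypothetical):

```python
import hashlib
import json
from datetime import datetime, timezone

def subset_hash(subset_config: dict) -> str:
    """Hash the subset configuration with sorted keys, so that
    identical configurations always produce the same hash."""
    payload = json.dumps(subset_config, sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

# Hypothetical subset definition: a name plus a query selecting entity_ids.
subset_config = {
    "name": "women",
    "query": "select distinct entity_id from demographics where sex = 'female'",
}

# The three pieces of metadata the commit message lists for each subsets row.
row = {
    "subset_hash": subset_hash(subset_config),
    "config": subset_config,
    "created_timestamp": datetime.now(timezone.utc),
}
```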
ecsalomon added a commit that referenced this issue on Dec 10, 2018:
ecsalomon added a commit that referenced this issue on Jan 19, 2019:
ecsalomon added a commit that referenced this issue on Feb 19, 2019:
This commit adds support for evaluating models against subsets of their predictions, in both training and testing. It adds a table to the results schemas to track subsets:

- `model_metadata.subsets` stores subset metadata, including a hash, the subset configuration, and the time the row was created

The `evaluations` tables in the `train_results` and `test_results` schemas gain a new column, `subset_hash`, which is also added to the primary key; it is an empty string for full-cohort evaluations and contains the subset hash when the evaluation is for a subset of the cohort. A new alembic upgrade script creates the subsets table and updates the evaluation tables. Testing factories are included or modified for the subsets and evaluation tables.

Most of the remaining code changes are made to the ModelEvaluator class, which can now process subset queries and write the results to the appropriate table [#535], and which records `NULL` values for undefined metrics (whether due to an empty subset or a lack of variation in labels [#138]). Some changes are also made elsewhere in the experiment to allow (optionally) including subsets in the experiment configuration file, including storing subset metadata in the `model_metadata.subsets` table and iterating over subsets in the model tester.

In addition, some changes to the documentation and `.gitignore` are included to make modifying the results schema more joyful.
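To make the primary-key change concrete, here is a hedged SQLAlchemy sketch of an evaluations table with `subset_hash` folded into the key; aside from `subset_hash`, the column names are illustrative stand-ins rather than triage's actual results schema:

```python
from sqlalchemy import Column, DateTime, Integer, Numeric, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class TestEvaluation(Base):
    """One metric value for one model over one evaluation window."""
    __tablename__ = "evaluations"
    __table_args__ = {"schema": "test_results"}

    model_id = Column(Integer, primary_key=True)
    evaluation_start_time = Column(DateTime, primary_key=True)
    evaluation_end_time = Column(DateTime, primary_key=True)
    metric = Column(String, primary_key=True)
    parameter = Column(String, primary_key=True)
    # Empty string = evaluation over the full cohort; otherwise the hash
    # of the model_metadata.subsets row this evaluation covers.
    subset_hash = Column(String, primary_key=True, default="")
    value = Column(Numeric)  # NULL when the metric is undefined
```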
ecsalomon added a commit that referenced this issue on Feb 20, 2019:
This commit adds support for evaluating models against subsets of their predictions, in both training and testing. It adds a table to the results schemas to track subsets:

- `model_metadata.subsets` stores subset metadata, including a hash, the subset configuration, and the time the row was created

The `evaluations` tables in the `train_results` and `test_results` schemas gain a new column, `subset_hash`, which is also added to the primary key; it is an empty string for full-cohort evaluations and contains the subset hash when the evaluation is for a subset of the cohort. A new alembic upgrade script creates the subsets table and updates the evaluation tables. Testing factories are included or modified for the subsets and evaluation tables.

Most of the remaining code changes are made to the ModelEvaluator class, which can now process subset queries and write the results to the appropriate table [#535], and which records `NULL` values for undefined metrics (whether due to an empty subset or a lack of variation in labels [#138]).

WIP: Preparation for a more subsets-like experience, in which a subset table is built up front from the user-input query and then used at evaluation time. The first step is renaming the cohort generators to entity_date table generators, since the code will serve a more generic function.

Some changes are also made elsewhere in the experiment to allow (optionally) including subsets in the experiment configuration file, including storing subset metadata in the `model_metadata.subsets` table and iterating over subsets in the model tester.

In addition, some changes to the documentation and `.gitignore` are included to make modifying the results schema more joyful.
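The WIP note above describes a single generic generator that materializes an (entity_id, as_of_date) table from any user-supplied query, whether that query defines the full cohort or a subset. A minimal sketch of that idea, assuming a Postgres database and a query that may use an `{as_of_date}` placeholder (the function, table names, and connection string are hypothetical, not triage's API):

```python
from sqlalchemy import create_engine

def make_entity_date_table(db_engine, query, table_name, as_of_dates):
    """Materialize (entity_id, as_of_date) rows for each as-of date,
    whether the query defines the full cohort or a subset of it."""
    with db_engine.begin() as conn:
        conn.exec_driver_sql(
            f"create table if not exists {table_name} "
            "(entity_id integer, as_of_date date)"
        )
        for as_of_date in as_of_dates:
            # The user-supplied query may reference the evaluation date.
            dated_query = query.format(as_of_date=as_of_date)
            conn.exec_driver_sql(
                f"insert into {table_name} (entity_id, as_of_date) "
                f"select entity_id, '{as_of_date}'::date "
                f"from ({dated_query}) subset_query"
            )

engine = create_engine("postgresql://localhost/triage_test")  # hypothetical DSN
make_entity_date_table(
    engine,
    "select distinct entity_id from demographics "
    "where sex = 'female' and demographic_date < '{as_of_date}'::date",
    "subset_women_abcd1234",  # e.g. subset name plus its hash
    ["2018-01-01", "2018-07-01"],
)
```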
ecsalomon added a commit that referenced this issue on Feb 22, 2019:
ecsalomon added a commit that referenced this issue on Feb 22, 2019:
ecsalomon added a commit that referenced this issue on Feb 27, 2019:
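A recurring note in these commits is that the evaluator records `NULL` rather than erroring when a metric is undefined on a subset [#138]. A minimal sketch of that guard, using scikit-learn's AUC as a stand-in for any metric (the helper name is illustrative):

```python
from sklearn.metrics import roc_auc_score

def safe_metric(labels, scores):
    """Return the metric value, or None (stored as SQL NULL) when undefined."""
    if len(labels) == 0:        # empty subset
        return None
    if len(set(labels)) < 2:    # no variation in labels: AUC is undefined
        return None
    return roc_auc_score(labels, scores)

print(safe_metric([], []))              # None -> NULL
print(safe_metric([1, 1], [0.4, 0.9]))  # None -> NULL
print(safe_metric([0, 1], [0.2, 0.8]))  # 1.0
```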