Heisenbug in fake_labels #138

Closed
thcrock opened this issue Apr 28, 2017 · 6 comments
Labels
trivial: Should be very easy to implement, in a couple lines or so



thcrock commented Apr 28, 2017

When this code (in tests/utils.py) comes up with all the same values, some of the metric calculators error out:

import random

import numpy


def fake_labels(length):
    # Can return an all-True or all-False array, which some metrics cannot handle.
    return numpy.array([random.choice([True, False]) for i in range(0, length)])

We should just hardcode this to have some variation.
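A sketch of one possible hardcoded fix (illustrative only, not the actual change): force at least one label of each class before shuffling.

import random

import numpy


def fake_labels(length):
    # Guarantee at least one True and one False so metric calculators that
    # need both classes never see a single-class array (assumes length >= 2).
    labels = [random.choice([True, False]) for _ in range(length)]
    labels[0], labels[1] = True, False
    random.shuffle(labels)
    return numpy.array(labels)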

@thcrock thcrock added the bug label Apr 28, 2017

thcrock commented Jul 19, 2017

Moved to dssg/architect#13

@thcrock thcrock closed this as completed Jul 19, 2017

thcrock commented Dec 3, 2018

Not fixed (or rather, it should have been reopened when architect was merged back in here)

@thcrock thcrock reopened this Dec 3, 2018

thcrock commented Dec 3, 2018

On the surface it looks like the test could just be fixed, but although the case where this would come up in real runs is rare, it could happen, so perhaps it's the code that should be fixed and not the test.

Instead of a random test, there should be two test cases for the metric calculators: one with mixed labels and one with all the same labels. I'm not sure what the expected behavior of the metric calculators should be for all-same labels, but it should be covered there.
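For example, something along these lines (a sketch only: roc_auc_score is a stand-in for whichever metric calculator is under test, and the uniform-labels case just pins down today's behavior rather than the desired one):

import numpy
import pytest
from sklearn.metrics import roc_auc_score  # stand-in, not necessarily the project's calculator


def test_metric_with_mixed_labels():
    labels = numpy.array([True, False, True, False])
    scores = numpy.array([0.9, 0.1, 0.8, 0.2])
    assert roc_auc_score(labels, scores) == 1.0


def test_metric_with_uniform_labels():
    labels = numpy.array([True, True, True, True])
    scores = numpy.array([0.9, 0.1, 0.8, 0.2])
    # Currently this raises; the open question is whether it should return NULL/None instead.
    with pytest.raises(ValueError):
        roc_auc_score(labels, scores)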

@ecsalomon

Per yesterday's conversation:

  • The best first version of this is probably to catch the errors, print an informative warning, and insert NULL values for the affected metrics (a rough sketch follows this list).

  • Having the NULLs will cause audition to blow up if the model groups are included in its query, but, generally, users should not be trying to run audition after experiencing this issue, so handling this in audition is a nice-to-have rather than a necessary change.
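A rough sketch of that first version (names are illustrative, not the actual ModelEvaluator code):

import warnings


def metric_or_null(metric_fn, labels, scores):
    # Compute a metric, catching the error raised when it is undefined
    # (e.g. only one label class present) and returning None, which gets
    # written to the evaluations table as NULL.
    try:
        return metric_fn(labels, scores)
    except ValueError as exc:
        warnings.warn(f"{getattr(metric_fn, '__name__', metric_fn)} undefined for these labels ({exc}); storing NULL")
        return None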

@thcrock thcrock added the trivial label Dec 6, 2018
@ecsalomon

Hmmm, as I am addressing the subsets issue, it occurred to me that this might also be the correct way to handle empty subsets: the "evaluations" still get written to the database, but with the relevant information that there were no labels to evaluate.

@ecsalomon ecsalomon self-assigned this Dec 8, 2018
ecsalomon added a commit that referenced this issue Dec 10, 2018
This commit adds support for evaluating models against subsets of their
predictions, in both training and testing. It adds three tables to the
results schemas to track subsets and their evaluations:

  - `model_metadata.subsets` stores subset metadata, including a hash,
    the subset configuration, and the time the row was created
  - `train_results.subset_evaluations` and
    `test_results.subset_evaluations` store evaluations for each subset

A new alembic upgrade script creates the subsets tables.

Testing factories are included for the subsets and subset_evaluations
tables, and a test for the factories ensures that the foreign keys in
the subset_evaluations tables are correctly configured.

Most of the remaining code changes are made to the ModelEvaluator class,
which can now process subset queries and write the results to the
appropriate table [#535] and will record `NULL` values for undefined
metrics (whether due to an empty subset or lack of variation in labels
[#138]).

However, some changes are made elsewhere in the experiment to allow
(optionally) including subsets in the experiment configuration file,
including storing subset metadata in the `model_metadata.subsets` table
and iterating over subsets in the model tester.

In addition, some changes to the documentation and `.gitignore` are
included to make modifying the results schema more joyful.
ecsalomon added a commit that referenced this issue Dec 10, 2018
ecsalomon added a commit that referenced this issue Jan 19, 2019
ecsalomon added a commit that referenced this issue Feb 19, 2019
ecsalomon added a commit that referenced this issue Feb 20, 2019
WIP: Preparation for a more subsets-like experience, where a subset
table is built initially from the user-input query and then used at
evaluation time. The first step in this is renaming the cohort
generators to entity_date table generators, as the code will have a more
generic function.

ecsalomon added a commit that referenced this issue Feb 22, 2019
ecsalomon added a commit that referenced this issue Feb 22, 2019
ecsalomon added a commit that referenced this issue Feb 27, 2019
thcrock pushed a commit that referenced this issue Feb 28, 2019
* Evaluate on subsets [Resolves #535, #138]

This commit adds support for evaluating models against subsets of their
predictions, in both training and testing. It adds a table to the
results schemas to track subsets:

  - `model_metadata.subsets` stores subset metadata, including a hash,
    the subset configuration, and the time the row was created

The `evaluations` tables in the `train_results` and `test_results`
schemas are updated to include a new column, `subset_hash` (also added
to the primary key), that is an empty string for full cohort evaluations
or contains the subset hash when the evaluation is for a subset of the
cohort.

A new alembic upgrade script creates the subsets table and updates the
evaluation tables.

Testing factories are included or modified for the subsets and
evaluation tables.

Most of the remaining code changes are made to the ModelEvaluator class,
which can now process subset queries and write the results to the
appropriate table [#535] and will record `NULL` values for undefined
metrics (whether due to an empty subset or lack of variation in labels
[#138]).

However, some changes are made elsewhere in the experiment to allow
(optionally) including subsets in the experiment configuration file,
including storing subset metadata in the `model_metadata.subsets` table
and iterating over subsets in the model tester.

In addition, some changes to the documentation and `.gitignore` are
included to make modifying the results schema more joyful.
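For reference, the schema change boils down to something like this alembic sketch (illustrative only: the table and column names follow the commit message above, but the types, the JSONB config column, and the primary-key handling are assumptions, not the actual upgrade script):

import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects.postgresql import JSONB


def upgrade():
    # One row per subset, keyed by the subset's hash.
    op.create_table(
        "subsets",
        sa.Column("subset_hash", sa.Text(), primary_key=True),
        sa.Column("config", JSONB()),
        sa.Column("created_timestamp", sa.DateTime(timezone=True), server_default=sa.func.now()),
        schema="model_metadata",
    )
    # Empty string means "full cohort"; otherwise the value is the subset hash.
    # (The real migration also adds this column to the evaluations primary keys.)
    for schema in ("train_results", "test_results"):
        op.add_column(
            "evaluations",
            sa.Column("subset_hash", sa.Text(), nullable=False, server_default=""),
            schema=schema,
        )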

thcrock commented Feb 28, 2019

Closed in #552

@thcrock thcrock closed this as completed Feb 28, 2019