Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove redundant imputation flag columns [Resolves #544] #676

Merged
merged 3 commits into from
Apr 25, 2019

Conversation

thcrock
Copy link
Contributor

@thcrock thcrock commented Apr 19, 2019

The content of the imputation flag columns across all functions for a
given timespan will be the same. This commit removes the redundant
columns, and names the imputation flag column without any function name
(e.g. 'events_entity_id_1y_outcome_imp' instead of
'events_entity_id_1y_outcome_avg_imp')

  • Change the Imputer class interface:
    • Add column_imputation_base to constructor
    • Change imputation_flag_sql to imputation_flag_select_and_alias so
      the caller can keep track of the aliases without doing SQL parsing
  • Change the Aggregation/SpacetimeAggregation to:
    • Create reverse column name -> Aggregate lookup (with some
      refactoring so it can build this without duplicating a bunch fo
      existing logic)
    • When creating the imputation SQL, query the lookup to create the
      column_imputation_base
  • Modify experiment algorithm doc to describe imputation flag behavior

The content of the imputation flag columns across all functions for a
given timespan will be the same. This commit removes the redundant
columns, and names the imputation flag column without any function name
(e.g. 'events_entity_id_1y_outcome_imp' instead of
'events_entity_id_1y_outcome_avg_imp')

- Change the Imputer class interface:
    - Add column_imputation_base to constructor
    - Change imputation_flag_sql to imputation_flag_select_and_alias so
    the caller can keep track of the aliases without doing SQL parsing
- Change the Aggregation/SpacetimeAggregation to:
    - Create reverse column name -> Aggregate lookup (with some
    refactoring so it can build this without duplicating a bunch fo
    existing logic)
    - When creating the imputation SQL, query the lookup to create the
    column_imputation_base
- Modify experiment algorithm doc to describe imputation flag behavior
@codecov-io
Copy link

codecov-io commented Apr 19, 2019

Codecov Report

Merging #676 into master will decrease coverage by 0.01%.
The diff coverage is 85.48%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #676      +/-   ##
==========================================
- Coverage   82.78%   82.76%   -0.02%     
==========================================
  Files          90       90              
  Lines        6012     6058      +46     
==========================================
+ Hits         4977     5014      +37     
- Misses       1035     1044       +9
Impacted Files Coverage Δ
src/triage/component/collate/imputations.py 100% <100%> (ø) ⬆️
src/triage/component/collate/collate.py 90.25% <72.41%> (-1.97%) ⬇️
src/triage/component/collate/spacetime.py 98.29% <94.44%> (-0.72%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec78a2a...f677953. Read the comment docs.

@shaycrk
Copy link
Contributor

shaycrk commented Apr 22, 2019

@thcrock -- I still need to take a look at the code here. Definitely a good thing to clean up the redundant flags, but I think there are some cases where the imputation flag could differ in the same time period across functions. For instance, standard deviation will return NULL if you have exactly one non-null value while many other functions (sum, min, max, etc) will not. Seems like we would want to be able to allow for these cases as well.

@thcrock
Copy link
Contributor Author

thcrock commented Apr 24, 2019

@shaycrk I just pushed a change that should hopefully fix your comment and the discussion we had yesterday.

if self.column_base_for_impflag:
return (
template.format(col=self.column),
alias_template.format(base_for_imp_flag=self.column_base_for_impflag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicky, but maybe standardize the naming on imp_flag vs impflag?

Copy link
Contributor

@shaycrk shaycrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small inline comment, but generally looks good to me

@shaycrk shaycrk requested a review from ecsalomon April 25, 2019 19:46
Copy link
Contributor

@ecsalomon ecsalomon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sooo pleased. ❤️

@ecsalomon ecsalomon merged commit 4cebba4 into master Apr 25, 2019
@ecsalomon ecsalomon deleted the redundant_imp_flags branch April 25, 2019 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants