Remove redundant imputation flag columns [Resolves #544] #676

thcrock · 2019-04-19T22:08:34Z

The content of the imputation flag columns across all functions for a
given timespan will be the same. This commit removes the redundant
columns, and names the imputation flag column without any function name
(e.g. 'events_entity_id_1y_outcome_imp' instead of
'events_entity_id_1y_outcome_avg_imp')

- Change the Imputer class interface:
  - Add column_imputation_base to the constructor
  - Change imputation_flag_sql to imputation_flag_select_and_alias so
    the caller can keep track of the aliases without doing SQL parsing
- Change Aggregation/SpacetimeAggregation to:
  - Create a reverse column name -> Aggregate lookup (with some
    refactoring so it can build this without duplicating a bunch of
    existing logic)
  - When creating the imputation SQL, query the lookup to create the
    column_imputation_base
- Modify the experiment algorithm doc to describe imputation flag behavior
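
As a rough illustration of the naming change described above, here is a minimal sketch of collapsing per-function flag columns into one flag per timespan via a reverse lookup. `ColumnParts`, `column_name`, `imputation_base`, and `flag_aliases` are hypothetical names made up for this example; they are not the actual triage classes or methods, and the way a column name splits into parts is an assumption.

```python
from typing import Dict, List, NamedTuple


class ColumnParts(NamedTuple):
    """Hypothetical pieces that make up an aggregate column name."""
    prefix: str
    group: str
    interval: str
    quantity: str
    function: str


def column_name(parts: ColumnParts) -> str:
    # Full aggregate column name, e.g. events_entity_id_1y_outcome_avg
    return "_".join(parts)


def imputation_base(parts: ColumnParts) -> str:
    # Same name without the trailing function, shared by all functions
    return "_".join(parts[:-1])


def flag_aliases(lookup: Dict[str, ColumnParts]) -> List[str]:
    # One flag alias per distinct base, in first-seen order
    aliases: List[str] = []
    for parts in lookup.values():
        alias = imputation_base(parts) + "_imp"
        if alias not in aliases:
            aliases.append(alias)
    return aliases


columns = [
    ColumnParts("events", "entity_id", "1y", "outcome", function)
    for function in ("avg", "sum", "max")
]
# Reverse lookup: column name -> the parts it was built from
lookup = {column_name(parts): parts for parts in columns}
aliases = flag_aliases(lookup)
print(aliases)
```

With three functions over the same quantity and timespan, the sketch yields a single flag alias rather than three.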

codecov-io · 2019-04-19T22:26:19Z

Codecov Report

Merging #676 into master will decrease coverage by 0.01%.
The diff coverage is 85.48%.

@@            Coverage Diff             @@
##           master     #676      +/-   ##
==========================================
- Coverage   82.78%   82.76%   -0.02%     
==========================================
  Files          90       90              
  Lines        6012     6058      +46     
==========================================
+ Hits         4977     5014      +37     
- Misses       1035     1044       +9

Impacted Files                                 Coverage Δ
src/triage/component/collate/imputations.py    100% <100%> (ø)            ⬆️
src/triage/component/collate/collate.py        90.25% <72.41%> (-1.97%)   ⬇️
src/triage/component/collate/spacetime.py      98.29% <94.44%> (-0.72%)   ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ec78a2a...f677953.

shaycrk · 2019-04-22T14:50:44Z

@thcrock -- I still need to take a look at the code here. Definitely a good thing to clean up the redundant flags, but I think there are some cases where the imputation flag could differ in the same time period across functions. For instance, standard deviation will return NULL if you have exactly one non-null value while many other functions (sum, min, max, etc) will not. Seems like we would want to be able to allow for these cases as well.
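
To make the case above concrete, here is a small sketch that emulates SQL-style aggregates in plain Python, assuming sample standard deviation semantics (NULL when fewer than two non-null inputs, as with PostgreSQL's `stddev`). The helper names are hypothetical and are not part of triage; `None` stands in for NULL.

```python
import statistics
from typing import Callable, Dict, List, Optional

Values = List[Optional[float]]


def _non_null(values: Values) -> List[float]:
    return [v for v in values if v is not None]


def sql_sum(values: Values) -> Optional[float]:
    present = _non_null(values)
    return sum(present) if present else None


def sql_max(values: Values) -> Optional[float]:
    present = _non_null(values)
    return max(present) if present else None


def sql_stddev(values: Values) -> Optional[float]:
    # Sample standard deviation is undefined for fewer than two values
    present = _non_null(values)
    return statistics.stdev(present) if len(present) >= 2 else None


AGGREGATES: Dict[str, Callable[[Values], Optional[float]]] = {
    "sum": sql_sum,
    "max": sql_max,
    "stddev": sql_stddev,
}


def imputation_flags(values: Values) -> Dict[str, bool]:
    # A flag is True when the aggregate came back NULL and needs imputing
    return {name: fn(values) is None for name, fn in AGGREGATES.items()}


one_value = imputation_flags([5.0, None, None])
no_values = imputation_flags([None, None])
print(one_value)
print(no_values)
```

With exactly one non-null value, only the standard deviation needs imputing, so its flag differs from the flags for sum and max in the same time period.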

thcrock · 2019-04-24T18:57:30Z

@shaycrk I just pushed a change that should hopefully address your comment and the discussion we had yesterday.

shaycrk · 2019-04-25T19:42:02Z

src/triage/component/collate/imputations.py

+            if self.column_base_for_impflag:
+                return (
+                    template.format(col=self.column),
+                    alias_template.format(base_for_imp_flag=self.column_base_for_impflag)


nitpicky, but maybe standardize the naming on imp_flag vs impflag?
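
For context on the fragment under review, here is a minimal sketch of a method that returns the flag's SELECT expression and its alias as a pair, so the caller can record the alias without parsing SQL. `FlagBuilder`, both template strings, and the fallback to the column name when no base is given are assumptions for illustration; only the method name and the `col` / `base_for_imp_flag` format keys come from this pull request.

```python
from typing import Optional, Tuple


class FlagBuilder:
    """Hypothetical stand-in for an imputer; not the triage class."""

    # Assumed templates; the real ones are not shown in the diff fragment
    FLAG_TEMPLATE = 'CASE WHEN "{col}" IS NULL THEN 1 ELSE 0 END'
    ALIAS_TEMPLATE = "{base_for_imp_flag}_imp"

    def __init__(self, column: str, column_base_for_imp_flag: Optional[str] = None):
        self.column = column
        self.column_base_for_imp_flag = column_base_for_imp_flag

    def imputation_flag_select_and_alias(self) -> Tuple[str, str]:
        # Assumption: fall back to the full column name when no base is set
        base = self.column_base_for_imp_flag or self.column
        return (
            self.FLAG_TEMPLATE.format(col=self.column),
            self.ALIAS_TEMPLATE.format(base_for_imp_flag=base),
        )


builder = FlagBuilder(
    column="events_entity_id_1y_outcome_avg",
    column_base_for_imp_flag="events_entity_id_1y_outcome",
)
select, alias = builder.imputation_flag_select_and_alias()
# The caller assembles the final SQL and keeps the alias for bookkeeping
flag_sql = f'{select} AS "{alias}"'
print(flag_sql)
```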

shaycrk

one small inline comment, but generally looks good to me

ecsalomon

Sooo pleased. ❤️

thcrock assigned ecsalomon Apr 19, 2019

thcrock requested a review from shaycrk April 19, 2019 22:09

Change in behavior from review

42aa9f1

shaycrk reviewed Apr 25, 2019

shaycrk approved these changes Apr 25, 2019

shaycrk requested a review from ecsalomon April 25, 2019 19:46

ecsalomon approved these changes Apr 25, 2019

Straight up impflag vs imp_flag

f677953

ecsalomon merged commit 4cebba4 into master Apr 25, 2019

ecsalomon deleted the redundant_imp_flags branch April 25, 2019 23:34
