-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove redundant imputation flag columns [Resolves #544] #676
Conversation
The content of the imputation flag columns across all functions for a given timespan will be the same. This commit removes the redundant columns, and names the imputation flag column without any function name (e.g. 'events_entity_id_1y_outcome_imp' instead of 'events_entity_id_1y_outcome_avg_imp') - Change the Imputer class interface: - Add column_imputation_base to constructor - Change imputation_flag_sql to imputation_flag_select_and_alias so the caller can keep track of the aliases without doing SQL parsing - Change the Aggregation/SpacetimeAggregation to: - Create reverse column name -> Aggregate lookup (with some refactoring so it can build this without duplicating a bunch fo existing logic) - When creating the imputation SQL, query the lookup to create the column_imputation_base - Modify experiment algorithm doc to describe imputation flag behavior
Codecov Report
@@ Coverage Diff @@
## master #676 +/- ##
==========================================
- Coverage 82.78% 82.76% -0.02%
==========================================
Files 90 90
Lines 6012 6058 +46
==========================================
+ Hits 4977 5014 +37
- Misses 1035 1044 +9
Continue to review full report at Codecov.
|
@thcrock -- I still need to take a look at the code here. Definitely a good thing to clean up the redundant flags, but I think there are some cases where the imputation flag could differ in the same time period across functions. For instance, standard deviation will return |
@shaycrk I just pushed a change that should hopefully fix your comment and the discussion we had yesterday. |
if self.column_base_for_impflag: | ||
return ( | ||
template.format(col=self.column), | ||
alias_template.format(base_for_imp_flag=self.column_base_for_impflag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicky, but maybe standardize the naming on imp_flag
vs impflag
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one small inline comment, but generally looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sooo pleased. ❤️
The content of the imputation flag columns across all functions for a
given timespan will be the same. This commit removes the redundant
columns, and names the imputation flag column without any function name
(e.g. 'events_entity_id_1y_outcome_imp' instead of
'events_entity_id_1y_outcome_avg_imp')
the caller can keep track of the aliases without doing SQL parsing
refactoring so it can build this without duplicating a bunch fo
existing logic)
column_imputation_base