Collate produces redundant imputation flags #544

ecsalomon · 2018-12-05T18:52:11Z

The imputation flags for a given categorical or quantity are the same within an aggregation period, regardless of the aggregation function. This produces a lot of redundant columns. For example, the following features will have exactly the same imputation flag columns:

Average test score in the last three years
Max test score in the last three years
Minimum test score in the last three years
Standard deviation of test score in the last three years
Modal test score in the last three years

Collate should add only one imputation column per quantity/categorical per aggregation period.
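
To illustrate the redundancy, here is a minimal sketch in plain Python rather than Collate's actual SQL. The student data, the `aggregate_with_flag` helper, and the `test_score_3y_*` column names are all hypothetical; the sketch assumes a value is imputed only when an entity has no rows in the aggregation period.

```python
from statistics import mean

# Hypothetical test scores per student over the last three years.
# An empty list means the student has no records in the period.
scores_by_student = {"a": [70, 85], "b": [], "c": [90]}

# A few aggregation functions applied to the same quantity and period.
functions = {"avg": mean, "max": max, "min": min}


def aggregate_with_flag(values, func):
    """Return (aggregate, imputation_flag); the flag is 1 when there is nothing to aggregate."""
    if not values:
        return None, 1
    return func(values), 0


# Build one flag column per function, the way the redundant output looks.
flag_columns = {
    f"test_score_3y_{name}_imp": tuple(
        aggregate_with_flag(scores_by_student[student], func)[1]
        for student in sorted(scores_by_student)
    )
    for name, func in functions.items()
}

# Every flag column carries the same values, so one column would be enough.
distinct_flag_columns = set(flag_columns.values())
print(len(flag_columns), len(distinct_flag_columns))  # prints "3 1"
```

Under that assumption the flag depends only on whether the entity has data in the period, never on the function, which is why a single column per quantity and period carries all the information.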

thcrock · 2019-04-02T22:00:57Z

I'm guessing we should name the _imp column similarly but without the aggregate function?
e.g.

zip_code_features_zip_code_1year_num_events_min_imp
zip_code_features_zip_code_1year_num_events_max_imp

become
zip_code_features_zip_code_1year_num_events_imp
?
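
A rough sketch of that naming rule, assuming the aggregate function name is known for each column and always appears as the final `_<function>` suffix. The helper `imputation_flag_name` is hypothetical and not part of Collate:

```python
def imputation_flag_name(column_name, function_name):
    """Drop a trailing '_<function>' from the column name and append '_imp'."""
    suffix = "_" + function_name
    if column_name.endswith(suffix):
        column_name = column_name[: -len(suffix)]
    return column_name + "_imp"


# The two feature columns behind the flags in the example above.
columns = [
    ("zip_code_features_zip_code_1year_num_events_min", "min"),
    ("zip_code_features_zip_code_1year_num_events_max", "max"),
]

# Both columns map to the same flag name, so the set collapses to one entry.
flag_names = sorted({imputation_flag_name(col, func) for col, func in columns})
print(flag_names)
```

Passing the function name explicitly avoids guessing it from the column string, which would be ambiguous when a quantity name itself ends in something like `_min`.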

The content of the imputation flag columns across all functions for a given timespan will be the same. This commit removes the redundant columns, and names the imputation flag column without any function name (e.g. 'events_entity_id_1y_outcome_imp' instead of 'events_entity_id_1y_outcome_avg_imp')

- Change the Imputer class interface:
    - Add column_imputation_base to constructor
    - Change imputation_flag_sql to imputation_flag_select_and_alias so the caller can keep track of the aliases without doing SQL parsing
- Change the Aggregation/SpacetimeAggregation to:
    - Create reverse column name -> Aggregate lookup
    - When creating the imputation SQL, query the lookup to create the column_imputation_base
- Modify experiment algorithm doc to describe imputation flag behavior
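
The interface change described in this commit message might look roughly like the sketch below. It is not the code from the commit: the simplified `Aggregate` tuple, the `build_lookup` and `imputation_flag_columns` helpers, the column naming scheme, and the `CASE WHEN ... IS NULL` flag expression are all assumptions made for illustration. Only the names `imputation_flag_select_and_alias` and `column_imputation_base` come from the message itself.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for an aggregate: a quantity plus a function.
Aggregate = namedtuple("Aggregate", ["quantity", "function"])


def build_lookup(prefix, group, interval, aggregates):
    """Reverse lookup from output column name to the Aggregate that produced it."""
    return {
        f"{prefix}_{group}_{interval}_{agg.quantity}_{agg.function}": agg
        for agg in aggregates
    }


def imputation_flag_select_and_alias(column, column_imputation_base):
    """Return the flag expression and its alias separately, so callers never parse SQL."""
    select = f'CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END'
    alias = f"{column_imputation_base}_imp"
    return select, alias


def imputation_flag_columns(lookup):
    """Emit one flag column per alias, skipping aliases that were already seen."""
    selects_by_alias = {}
    for column, agg in lookup.items():
        # Use the lookup to strip the function name and get the imputation base.
        base = column[: -len("_" + agg.function)]
        select, alias = imputation_flag_select_and_alias(column, base)
        selects_by_alias.setdefault(alias, select)
    return [f'{select} AS "{alias}"' for alias, select in selects_by_alias.items()]


lookup = build_lookup(
    "events", "entity_id", "1y",
    [Aggregate("outcome", "avg"), Aggregate("outcome", "max")],
)
flag_sql = imputation_flag_columns(lookup)
print(flag_sql)
```

Returning the select expression and the alias as a pair lets the caller deduplicate on the alias alone, which is the point of replacing a single SQL string with `imputation_flag_select_and_alias`.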

Remove redundant imputation flag columns [Resolves #544]

rayidghani added the urgent label Jan 22, 2019

thcrock self-assigned this Apr 19, 2019

ecsalomon closed this as completed in c9c5182 Apr 25, 2019

ecsalomon added a commit that referenced this issue Apr 25, 2019

Merge pull request #676 from dssg/redundant_imp_flags

4cebba4

Remove redundant imputation flag columns [Resolves #544]
