Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collate produces redundant imputation flags #544

Closed
ecsalomon opened this issue Dec 5, 2018 · 1 comment
Closed

Collate produces redundant imputation flags #544

ecsalomon opened this issue Dec 5, 2018 · 1 comment
Assignees
Labels

Comments

@ecsalomon
Copy link
Contributor

The imputations for a categorical or quantity will be the same for the same aggregation period, regardless of aggregation function. This produces a lot of redundant columns. For example, the following features will have exactly the same imputation flag columns:

  • Average test score in the last three years
  • Max test score in the last three years
  • Minimum test score in the last three years
  • Standard deviation of test score in the last three years
  • Modal test score in the last three years

Collate should add only one imputation column per quantity/categorical per aggregation period.

@thcrock
Copy link
Contributor

thcrock commented Apr 2, 2019

I'm guessing we should name the _imp column similarly but without the aggregate function?
e.g.

zip_code_features_zip_code_1year_num_events_min_imp
zip_code_features_zip_code_1year_num_events_max_imp

become
zip_code_features_zip_code_1year_num_events_imp
?

@thcrock thcrock self-assigned this Apr 19, 2019
thcrock added a commit that referenced this issue Apr 19, 2019
The content of the imputation flag columns across all functions for a
given timespan will be the same. This commit removes the redundant
columns, and names the imputation flag column without any function name
(e.g. 'events_entity_id_1y_outcome_imp' instead of
'events_entity_id_1y_outcome_avg_imp')

- Change the Imputer class interface:
    - Add column_imputation_base to constructor
    - Change imputation_flag_sql to imputation_flag_select_and_alias so
    the caller can keep track of the aliases without doing SQL parsing
- Change the Aggregation/SpacetimeAggregation to:
    - Create reverse column name -> Aggregate lookup
    - When creating the imputation SQL, query the lookup to create the
    column_imputation_base
- Modify experiment algorithm doc to describe imputation flag behavior
thcrock added a commit that referenced this issue Apr 19, 2019
The content of the imputation flag columns across all functions for a
given timespan will be the same. This commit removes the redundant
columns, and names the imputation flag column without any function name
(e.g. 'events_entity_id_1y_outcome_imp' instead of
'events_entity_id_1y_outcome_avg_imp')

- Change the Imputer class interface:
    - Add column_imputation_base to constructor
    - Change imputation_flag_sql to imputation_flag_select_and_alias so
    the caller can keep track of the aliases without doing SQL parsing
- Change the Aggregation/SpacetimeAggregation to:
    - Create reverse column name -> Aggregate lookup
    - When creating the imputation SQL, query the lookup to create the
    column_imputation_base
- Modify experiment algorithm doc to describe imputation flag behavior
ecsalomon added a commit that referenced this issue Apr 25, 2019
Remove redundant imputation flag columns [Resolves #544]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants