Make using groups other than entity_id in collate less error-prone #874

Closed
shaycrk opened this issue Nov 15, 2021 · 1 comment

@shaycrk
Contributor

shaycrk commented Nov 15, 2021

Mostly for discussion, but using anything other than entity_id when specifying groups in a feature aggregation (e.g., zipcode) is currently pretty error-prone. For instance, if there isn't a 1-to-1 relationship between entity_id and these other columns, you'll end up with multiple records in the matrix with the same (entity_id, as_of_date) key, which causes many problems downstream.
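
To make the failure mode concrete, here is a minimal sketch using made-up data. It does not use the actual collate code. The names `entity_zipcodes` and `zipcode_features` are hypothetical, and the join is plain Python standing in for whatever the real feature query does. An entity associated with two zipcodes picks up two rows for the same (entity_id, as_of_date) key:

```python
# Hypothetical data: entity 1 is linked to two zipcodes, entity 2 to one.
entity_zipcodes = [(1, "60601"), (1, "60614"), (2, "60601")]

# Hypothetical zipcode-level aggregate values for a single as_of_date.
as_of_date = "2021-01-01"
zipcode_features = {"60601": 12, "60614": 7}

# Joining zipcode-level features back onto entities yields one row per
# (entity_id, zipcode) pair rather than one row per entity.
matrix_rows = [
    (entity_id, as_of_date, zipcode_features[zipcode])
    for entity_id, zipcode in entity_zipcodes
]

# Collect the (entity_id, as_of_date) keys that occur more than once.
keys = [(entity_id, date) for entity_id, date, _ in matrix_rows]
duplicated_keys = sorted({key for key in keys if keys.count(key) > 1})
print(duplicated_keys)  # [(1, '2021-01-01')]
```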

Thoughts on how to improve the functionality here? Some options:

  • Remove groups from the experiment config and always assume only entity_id?
  • Allow the user to specify non-entity_id groups, but make them somehow override a validation check to make it clear this is "advanced" functionality?
  • Check that matrices have no duplicate entity_id/as_of_date pairs and raise an early error if they do?
  • Other ideas?
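
As a rough illustration of the third option, the check could look something like the sketch below. It assumes matrix rows are available as dicts with `entity_id` and `as_of_date` keys. The function and exception names are hypothetical and not part of the existing codebase.

```python
from collections import Counter


class DuplicateMatrixKeyError(ValueError):
    """Hypothetical error for repeated (entity_id, as_of_date) keys."""


def assert_unique_matrix_keys(rows):
    """Raise early if any (entity_id, as_of_date) pair appears more than once.

    `rows` is assumed to be an iterable of dicts that each contain
    'entity_id' and 'as_of_date'.
    """
    counts = Counter((row["entity_id"], row["as_of_date"]) for row in rows)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        raise DuplicateMatrixKeyError(
            f"{len(duplicates)} duplicate (entity_id, as_of_date) key(s), "
            f"e.g. {duplicates[:5]}; check for non-entity_id groups"
        )


# Passes silently: every key is unique.
assert_unique_matrix_keys([
    {"entity_id": 1, "as_of_date": "2021-01-01"},
    {"entity_id": 2, "as_of_date": "2021-01-01"},
])
```

Running a check like this right after a matrix is built would surface the problem before it reaches model training or evaluation.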
@shaycrk
Contributor Author

shaycrk commented Mar 28, 2022

Note that we decided to remove support for collate groups other than entity_id for the time being via #887, especially pending further discussion on what direction we want to go with feature engineering generally in the future. Will go ahead and close this issue for now, though it's possible someone might want to revisit this question in the future.

@shaycrk shaycrk closed this as completed Mar 28, 2022