Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error on cohort or label duplicates #889

Merged
merged 6 commits into from
Apr 18, 2022
Merged

Error on cohort or label duplicates #889

merged 6 commits into from
Apr 18, 2022

Conversation

shaycrk
Copy link
Contributor

@shaycrk shaycrk commented Mar 26, 2022

Closes #872

Adds a check for duplicates in the labels or cohort table in order to error earlier and with a more helpful message when this occurs (currently, this tends to cause a duplicate primary key error far downstream when saving predictions, which can be difficult to debug).

Note that I actually haven't added a unit test for catching duplicates in the cohort because two routes (via query or labels) for generating these already do a distinct or group by so these shouldn't actually occur. We could consider removing this logic and putting the burden on user (which might enforce better understanding of what triage is doing/expecting), but I'm not sure that's too worthwhile.

@thcrock -- is the fact that there are two (essentially identical) database_reflection.py files (one in src/triage/ and the other in src/triage/component/architect/) an artifact of the merge to a single repo? I'd like to consolidate -- thoughts/preferences on which is a better place for this it to live?

@thcrock
Copy link
Contributor

thcrock commented Mar 27, 2022

@shaycrk When it comes to the database reflection util I prefer to get rid of the arachitect version. In the now-outdated cleanup PR https://github.com/dssg/triage/pull/692/files I moved shared things to the top triage level.

@shaycrk
Copy link
Contributor Author

shaycrk commented Mar 29, 2022

Thanks @thcrock! I went ahead and consolidated into the top-level file here and didn't appear to break anything in the process 😄 so I'll go ahead and switch this PR out of draft mode since I think it's ready for you to take a look whenever you have a chance.

@shaycrk shaycrk marked this pull request as ready for review March 29, 2022 00:02
@shaycrk shaycrk changed the title Error on cohort or label duplicates (WIP) Error on cohort or label duplicates Mar 29, 2022
Copy link
Contributor

@thcrock thcrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, but this looks good

@shaycrk
Copy link
Contributor Author

shaycrk commented Apr 18, 2022

Thanks! I'll go ahead and merge in that case

@shaycrk shaycrk merged commit 335f264 into master Apr 18, 2022
@shaycrk shaycrk deleted the kit_check_dupes branch April 18, 2022 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error earlier on duplicates in label/cohort
2 participants