Dropped records from one republication event to another #5301
Comments
I agree with @dshorthouse that a pause on resources that experience a significant reduction in records during a publication event would be helpful, but I would add that it would be most helpful if the pause were accompanied by a notification describing the reason for the pause. GBIF already does this for resources when a significant number of occurrenceIDs have changed, and as an Admin on the VertNet IPT, I receive an email from @jhnwllr detailing the rationale. I can then either communicate that the change was intentional and request that the pause be lifted, or I am alerted to return to the IPT and the resource in question to look for errors. In this particular issue with Arctos, notifications would be extra helpful, because all of these resources are set to publish automatically and I cannot monitor these events in person. I expect other Admins who have resources set to publish on a regular schedule would appreciate the extra layer of safety, too. Cc'ing others on the GBIF Helpdesk here (other than JW). @ahahn-gbif @ManonGros @CecSve
Update: the backend Arctos issues have been corrected and all affected resources have been updated and corrected via the VertNet IPT, but the issue presented here is still relevant.
@dshorthouse @dbloom I think our occurrenceId checker currently only catches datasets when they have a big increase, but no decrease in records. It might be possible to have it catch both. @muttcg
It'd be good for the GBIF ingestion to have a threshold check, but the IPT 3.1.0 should also catch this at source when this issue is addressed - please can you comment on that if you have wishes?
Closing this ticket as addressed, assuming other tickets elsewhere have been created to target solutions that help monitor mishaps such as significant decreases in the number of records per dataset.
Most (all?) of the Arctos-based IPTs recently experienced a significant drop in the number of records they serve and it's unclear to me if anyone noticed:
https://ipt.vertnet.org/resource?r=mvz_herp from 290k to 10k
https://ipt.vertnet.org/resource?r=uam_herb_vascular from 202k to 21k
https://ipt.vertnet.org/resource?r=mvz_bird from 195k to 3k
https://ipt.vertnet.org/resource?r=ucm_herps 68k to 40
https://ipt.vertnet.org/resource?r=mvz_mammal 245k to 19k
I contacted the data publishers for the above & their technical support. No need to do so again.
However...
Besides checking for shifting occurrenceIDs, I suggest the processing pipeline at GBIF likewise halt ingestion and create a GitHub issue here when a republication event shows a significant drop in the number of records served, as happened with the resources above. A 25% drop might be a reasonable threshold.
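To make the suggestion concrete, here is a minimal sketch of what such a threshold check could look like. It is not GBIF's pipeline code: the function names, the `DROP_THRESHOLD` constant, and the `REPORTED_COUNTS` table are hypothetical, and the counts are the rounded figures quoted above rather than exact record totals.

```python
# Hypothetical sketch of a record-drop check between two publication events.
# None of these names come from the GBIF pipeline or the IPT.

DROP_THRESHOLD = 0.25  # the 25% figure suggested above; an assumption, not an existing setting


def fractional_drop(previous_count: int, current_count: int) -> float:
    """Return the fraction of records lost since the previous publication.

    Returns 0.0 when the record count grew or when there is no prior count.
    """
    if previous_count <= 0:
        return 0.0
    return max(0.0, (previous_count - current_count) / previous_count)


def should_pause(previous_count: int, current_count: int,
                 threshold: float = DROP_THRESHOLD) -> bool:
    """True when the drop meets or exceeds the threshold and ingestion should be held."""
    return fractional_drop(previous_count, current_count) >= threshold


# Rounded (previous, current) counts as reported in this issue.
REPORTED_COUNTS = {
    "mvz_herp": (290_000, 10_000),
    "uam_herb_vascular": (202_000, 21_000),
    "mvz_bird": (195_000, 3_000),
    "ucm_herps": (68_000, 40),
    "mvz_mammal": (245_000, 19_000),
}

# Resources that would be held for review under this check.
flagged = [name for name, (prev, curr) in REPORTED_COUNTS.items()
           if should_pause(prev, curr)]
```

With a 25% threshold, all five resources listed above would be flagged, since each lost well over half of its records. A real check would presumably also need a way for a publisher to confirm that a drop was intentional so the pause can be lifted.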