Allow auto-publication to abort when there are (far) fewer records #2133

Open
peterdesmet opened this issue Oct 24, 2023 · 9 comments

@peterdesmet
Member

Source files via URL + auto-publication is very useful for automatically publishing an active dataset. We use it for e.g. the following citizen science dataset: https://ipt.inbo.be/resource?r=dieren-planten-natuurpunt-occurrences

It would be useful, however, if the IPT offered some options for aborting the auto-publication. The dataset above, for example, had an issue in its pipeline that resulted in far fewer records in the source file. This led to the (unintentional) deletion of many records on GBIF.org. It would have been nice if the IPT could detect this and abort the auto-publication.

@mike-podolskiy90 mike-podolskiy90 self-assigned this Oct 25, 2023
@mike-podolskiy90
Contributor

Thanks Peter for the suggestion.
It sounds like a very sensible and useful feature.

@mike-podolskiy90
Contributor

But what would the threshold for the drop in records be? Should it be a percentage or an absolute number?

@peterdesmet
Member Author

I suggest a hardcoded threshold of 90% (i.e. abort if fewer than 90% of the previously published records remain), but make it an optional setting when setting up auto-publication. That also leaves room for other options without making it too complicated. Some of these checks should probably not be optional (e.g. when source data are missing) but should always result in an error.

Enable auto-publication

  • Abort when the number of records has dropped by 10%
  • Abort when mapped fields are missing in source data
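A minimal sketch of how the checks proposed above might be evaluated before each scheduled publication. Everything here is hypothetical: the function name `abort_reasons`, its parameters and the `MAX_DROP` constant are illustrative only and do not come from the IPT (which is written in Java). It assumes a percentage threshold (abort when the count has dropped by 10% or more), treats both checkboxes as optional, and treats missing source data as an unconditional error, as suggested.

```python
from typing import List, Optional, Set

# Hypothetical sketch: names and structure are illustrative, not taken from the IPT codebase.
MAX_DROP = 0.10  # abort when the record count has dropped by 10% or more


def abort_reasons(
    previous_count: Optional[int],
    new_count: Optional[int],
    mapped_fields: Set[str],
    source_fields: Set[str],
    abort_on_drop: bool = True,
    abort_on_missing_fields: bool = True,
) -> List[str]:
    """Return the reasons to abort this auto-publication; an empty list means publish."""
    # Missing source data is always an error, regardless of the optional settings.
    if new_count is None:
        return ["source data are missing"]

    reasons = []
    # Optional check 1: compare with the last published version.
    # A previous_count of None or 0 (e.g. first publication) leaves nothing to compare.
    if abort_on_drop and previous_count:
        drop = (previous_count - new_count) / previous_count
        if drop >= MAX_DROP:
            reasons.append(
                f"record count dropped by {drop:.0%} ({previous_count} -> {new_count})"
            )
    # Optional check 2: every mapped field must still exist in the source file.
    if abort_on_missing_fields:
        missing = sorted(mapped_fields - source_fields)
        if missing:
            reasons.append("mapped fields missing in source data: " + ", ".join(missing))
    return reasons
```

For example, `abort_reasons(250_000, 40_000, {"eventDate"}, {"eventDate"})` returns a single reason describing an 84% drop, so the publication would be skipped; the returned reasons could then feed whatever notification mechanism is eventually chosen.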

@dshorthouse
Contributor

+1 for support. It would help prevent downstream snafus. The only issue I see here is the secondary need for notification of the abort(s) from the IPT, otherwise an affected dataset may sleep indefinitely in purgatory.

@mike-podolskiy90
Copy link
Contributor

Thanks @dshorthouse
Email notification might be a very good idea here.

@MattBlissett
Copy link
Member

Email notification would require additional configuration by the administrator — currently the IPT doesn't send any emails.

Having this check within the IPT would prevent bad data from being published, whereas detecting it at GBIF would make email notifications easier and would allow the helpdesk to get involved.

@mike-podolskiy90
Contributor

@MattBlissett Actually, the IPT does send emails, just not directly: they go via the Registry. There is a "Click here to contact organisation" option and a link to send an organization token/password reminder, so we could probably implement this in a similar way.

@dbloom

dbloom commented Apr 22, 2024

Having just been through this with @dshorthouse, I agree that an email notification would be very helpful, especially for publications that are initiated on an automated schedule (otherwise I may not notice an issue for days). With nearly 180 resources publishing on a schedule, knowing that an event was aborted, that the number of records was (significantly) reduced, or both, would be very helpful. I also think it is important to be able to configure, from within the IPT, who receives these messages. The VertNet IPT, for example, has several admins, but not all of them need to, or should, receive notices like this.

@MattBlissett
Member

> @MattBlissett Actually, IPT does send emails, but not directly and via Registry. There is an option "Click here to contact organisation" and there is a link to send an organization token/password reminder. So we can probably implement that similarly.

I'd be reluctant for us to send emails triggered by systems (external IPTs) which we do not control. We could have IPTs with resources that are broken for months emailing users who don't want those emails (e.g. no longer work on the resource), and that risks GBIF's systems being considered spammy by Google, Microsoft etc.

TBC.


5 participants