For debugging purposes (e.g. to check the effect of data mapping fixes), it is sometimes necessary to re-trigger a crawl at a shorter interval after the last run. Due to the fixed lag period of seven days between crawl attempts, this is presently not easily possible: any earlier crawl request is refused, resulting in log messages like "DEBUG message: Not eligible to crawl [e45c7d91-81c6-4455-86e3-2965a5739b1f] - crawled 5 days ago, which is within threshold of 7 days."
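For illustration, the scheduler's eligibility check described above could look roughly like the following sketch. This is not the actual GBIF crawler code; the names (`CRAWL_THRESHOLD_DAYS`, `is_eligible_to_crawl`) are hypothetical, and only the log message is taken from the issue:

```python
from datetime import datetime, timedelta

# Hypothetical threshold matching the seven-day lag described above.
CRAWL_THRESHOLD_DAYS = 7

def is_eligible_to_crawl(dataset_key: str, last_crawled: datetime,
                         now: datetime) -> bool:
    """Refuse (and log) a crawl if the last one is within the threshold."""
    age = now - last_crawled
    if age < timedelta(days=CRAWL_THRESHOLD_DAYS):
        print(f"DEBUG message: Not eligible to crawl [{dataset_key}] - "
              f"crawled {age.days} days ago, which is within threshold of "
              f"{CRAWL_THRESHOLD_DAYS} days.")
        return False
    return True
```

An admin override, as requested below, would amount to a flag that bypasses this check.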
For a registry administrator at least, it would be very helpful to be able to request a re-crawl at any time, even before the seven-day waiting period is up (starting from a re-harvest of the local data source, e.g. a DwC-A).
Also: to an outside user (e.g. a registered publisher requesting the crawl through the UI), the enforced lag period and the silent ignoring of the crawl request are not at all transparent; they will simply find that nothing happens in response to their re-crawl request. Some feedback would be helpful here.
> Due to the fixed lag period of seven days between crawl attempts, this is presently not easily possible: any earlier crawl request will be refused
That's not actually the case -- any dataset can be recrawled as soon as any current crawl of that dataset has completed. That debug message is from the regular 7-day scheduler; the message from the click-to-crawl process is a little older: "Requested crawl for dataset [e45c7d91-81c6-4455-86e3-2965a5739b1f] but crawl already scheduled or running, ignoring"
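The click-to-crawl guard described here could be sketched as follows. Again, this is illustrative only, not GBIF's implementation; the in-memory set and `request_crawl` function are assumptions, and only the log message comes from the thread:

```python
# Hypothetical registry-side state: datasets with a crawl scheduled or running.
running_or_scheduled: set = set()

def request_crawl(dataset_key: str) -> bool:
    """Queue a crawl; return False if one is already scheduled or running."""
    if dataset_key in running_or_scheduled:
        print(f"Requested crawl for dataset [{dataset_key}] but crawl "
              "already scheduled or running, ignoring")
        return False
    running_or_scheduled.add(dataset_key)
    return True
```

Returning a status to the caller, rather than only logging, is one way the UI could surface the feedback the reporter asks for.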
It's something we need to handle, but I'm not sure it should be automated, as there's a high chance of making more of a mess when there isn't a problem -- i.e. clicking the button when a crawl is genuinely still in progress.
We have already fixed the case of an invalid archive preventing metadata updates, so I think there are now three cases where this button would be needed:

- After changing a default-value machine tag, to reprocess the data
- After changing interpretation code
I think appropriate re-runs of just the interpretation can already be done for pipelines (there is a "Rerun specific steps in a pipeline" button which can be used, but it needs some explanation of how to use it).
The third case is bugs in the old crawling, which are pretty rare anyway and will become irrelevant once we switch that off early next year.