Harvesting #813

Closed
pdurbin opened this issue Aug 8, 2014 · 26 comments

@pdurbin
Member

pdurbin commented Aug 8, 2014

In DVN 3.x we had harvesting: https://github.com/IQSS/dvn/blob/develop/doc/sphinx/source/dataverse-user-main.rst#harvesting-section

There should be an option to show federated dataverses in the featured dataverses UI developed in #750.

@scolapasta
Contributor

Assigning to @eaquigley to decide what exactly to do with this ticket. For 4.0 we have decided that we will support the infrastructure for harvested datasets (and have migrated harvested data), but not yet support new harvests.

@bencomp
Contributor

bencomp commented Apr 10, 2015

For issue discoverability: this issue concerns OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting.
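
For readers unfamiliar with the protocol, here is a minimal sketch of how a harvesting client might build an OAI-PMH `ListRecords` request. The verb and argument names (`metadataPrefix`, `set`, `from`, `resumptionToken`) come from the OAI-PMH 2.0 specification; the base URL, the set name, and the helper `list_records_url` are illustrative assumptions and not part of Dataverse.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: when continuing a
        # partial list, no other argument besides the verb is sent.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec      # restrict the harvest to one set
        if from_date:
            params["from"] = from_date    # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


# Assumed endpoint and set name, for illustration only.
print(list_records_url("https://dataverse.example.org/oai", set_spec="demo"))
```

A client would fetch this URL, parse the XML response, and repeat the request with the returned `resumptionToken` until the server stops issuing one.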

@bencomp
Contributor

bencomp commented Apr 10, 2015

For consideration: ResourceSync is a set of protocols based on Sitemaps for publishing information about, links to or dumps of changed resources - both data and metadata. As it was developed by the people from the Open Archives Initiative, some consider this the new OAI-PMH.

@bencomp
Contributor

bencomp commented Apr 14, 2015

And ResourceSync was already mentioned in #900.

@posixeleni
Contributor

@scolapasta @eaquigley please also keep in mind that we need to add a new Dataverse to our harvest per this request from the UBC Abacus Dataverse team. Should I create a separate ticket for this?

Eleni:

We are very excited to see Dataverse 4.0 alive. Is there a way for us to harvest our UBC Abacus dataverse we run for the province of BC and include the results in the main dataverse index?

Thanks!

Eugene

Ticket tracked in RT already: https://help.hmdc.harvard.edu/Ticket/Display.html?id=196381

@eaquigley
Contributor

@scolapasta @mcrosas @posixeleni do we have a list of all installations and others that we will be harvesting from? wanted to check due to @posixeleni comment above. also, @scolapasta should i have this ticket still or should it be going to someone else? we've already decided the behavior for displaying harvested materials (is searchable, has a search card with the harvested icon and text, will open in a new tab at the location where it is from)

@mercecrosas
Member

Here is the current list (harvested dataverses):

https://docs.google.com/spreadsheets/d/1xE55jFE--sEDvvS-vtXavBxc8VvKvKpGJXcjfkelcAw/edit#gid=0

Mercè Crosas, Ph.D.
Director of Data Science, IQSS
Harvard University
http://scholar.harvard.edu/mercecrosas

On Wed, Apr 29, 2015 at 12:13 PM, Elizabeth Quigley <notifications@github.com> wrote:

@scolapasta @mcrosas @posixeleni do we have a list of all installations and others that we will be harvesting from? wanted to check due to @posixeleni comment above. also, @scolapasta should i have this ticket still or should it be going to someone else? we've already decided the behavior for displaying harvested materials (is searchable, has a search card with the harvested icon and text, will open in a new tab at the location where it is from)

@bencomp
Contributor

bencomp commented May 19, 2015

I just realised this issue is not about providing an OAI-PMH endpoint for dataset metadata. Or is it? Should another issue be opened for it?

@mercecrosas
Member

@bencomp yes, supporting harvesting includes supporting OAI-PMH for dataset metadata.

@bencomp
Contributor

bencomp commented Jul 14, 2015

One of our partners needs the OAI-PMH endpoint to be available to meet project responsibilities. Is there any new information about the milestone this will be part of?

@mercecrosas
Member

The plan for now is to do this for 4.3. We'll be reviewing 4.2 and 4.3 issues this week and next (in our weekly "Issues review" meetings) and will make a decision then.

@djbrooke
Contributor

djbrooke commented Aug 4, 2016

@landreev - I met with @eaquigley to review the Usability Testing Feedback and these items should be fixed for this release:

  • Move Dashboard link to be an item in the navigation and not within the username dropdown
  • Make all boxes on Dashboard landing page same size
  • If a user tries to harvest content that has already been harvested, there is nothing in the UI that tells them this has happened. It continues to show “INPROGRESS” in the table - let's provide better messaging about what is happening
  • Add text that says the scheduling is for the local time of the installation
  • Change column heading from "Statistics" to "Number of datasets" (unless this will display anything aside from Number of datasets)
  • Use Tool tips instead of onscreen text under the specific boxes
  • Make the first input field the focus when opening a new modal

@landreev
Contributor

landreev commented Aug 9, 2016

There was one serious bug found during the usability session that is NOT on this list: After creating a new, or editing and saving an existing client, clicking on the "edit" button resulted in an empty popup; the page would stay in that state until reloaded.
This was indeed a real bug, unlike almost everything else on this list, and it has been fixed.

@landreev
Contributor

landreev commented Aug 9, 2016

@djbrooke
I would like to point out that 4 of these bullet points: 1, 2, 6 and 7 are cosmetic fixes of the page itself; that would be for Mike to work on, and not for me. They are probably very easy to fix, but seeing how it's kind of late, and nobody has yet alerted Mike about it, I propose we table them - ? (and seeing how they don't appear to be anywhere near crucially important for the actual functionality of the pages)

@landreev
Contributor

landreev commented Aug 9, 2016

Any chance we could drop number 4 from the list:

  • Add text that says the scheduling is for the local time of the installation

and just go on pretending it has never been brought up in the first place? Unless somebody could give me a single reason anyone would ever assume that this is NOT their local time, but, say, London or Central Australian time?

@landreev
Contributor

landreev commented Aug 9, 2016

Item 5:

  • Change column heading from "Statistics" to "Number of datasets" (unless this will display anything aside from Number of datasets)

it's more complex than just a "number of datasets" - it could be various numbers of datasets related to this set: "N number of datasets found; M already exported; K marked as deleted". But, I agree, "Statistics" is not the best term. I have changed it to just "Datasets" - would that be ok?
Because that's what the column is for - to show the information about the Datasets that belong to the set in question.

@landreev
Contributor

landreev commented Aug 9, 2016

Item number one:

  • Move Dashboard link to be an item in the navigation and not within the username dropdown

I can do this in about 5 seconds, most likely... but where is "navigation"?

(edit: OK, just spoke to @mheppler about this; "... in the navigation" means make it one of the buttons in the upper right corner, next to "About", "Guides", "Support" and "Account")

@mheppler
Contributor

mheppler commented Aug 9, 2016

@landreev @djbrooke -- My $0.02... Why have usability testing if we aren't going to implement suggested UI improvements? Can we meet quickly this afternoon to review these and I can give a LOE to determine if indeed I can get them into this release. Seems foolish not to.

@landreev
Contributor

landreev commented Aug 9, 2016

Number 3:
This is a super important one (it was the single most embarrassing point of the usability session), because it failed in the most ungraceful, user-unfriendly way (by hanging in the "inprogress" state).

This is the way I have resolved this:
Now, if any dataset being harvested is found to already exist in this Dataverse installation, but AS PART OF ANOTHER dataverse, the following will happen:

  • the dataset will be skipped;
  • the harvest will continue, and will complete;
  • the skipped dataset will be listed under the "failed" count in the last harvest results.

However, it will still not be immediately obvious to the admin using the page that the dataset failed to get harvested for this specific reason - because it's already present in another dataverse. To find that out, they would need to go and read the harvest log.
My solution is to explain this in the admin guide:

... You can consult the detailed harvesting log (found in ...) to learn more about the harvesting failures. For example, please note that an attempt to harvest datasets that are already present in another dataverse, will result in these datasets being skipped and listed as failed. (Deleting harvested datasets from dataverse A will make it possible to harvest them into dataverse B).

(Alternatively, a new statistics field could be added - in addition to "harvested", "deleted" and "failed", something like "exist in another dataverse, skipped" - to be shown in the results immediately. If you feel it is necessary for this release, I could put it in place today)
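
The skip-and-continue behavior described above can be sketched roughly as follows. This is not the actual Dataverse code (which is Java); it is a minimal Python illustration, and every name in it (`HarvestResult`, `run_harvest`, `existing_owner`) is a hypothetical stand-in. It assumes the installation can look up which dataverse, if any, already holds a given dataset identifier.

```python
from dataclasses import dataclass, field


@dataclass
class HarvestResult:
    harvested: int = 0
    failed: int = 0
    log: list = field(default_factory=list)  # stands in for the harvest log


def run_harvest(incoming_ids, existing_owner, target_dataverse):
    """Harvest dataset ids into target_dataverse.

    existing_owner maps a dataset id to the alias of the dataverse that
    already holds it (hypothetical lookup, for illustration only).
    """
    result = HarvestResult()
    for dataset_id in incoming_ids:
        owner = existing_owner.get(dataset_id)
        if owner is not None and owner != target_dataverse:
            # Already present as part of ANOTHER dataverse: skip it,
            # count it as failed, and record the reason in the log.
            result.failed += 1
            result.log.append(
                f"{dataset_id}: skipped, already present in dataverse '{owner}'"
            )
            continue  # the harvest continues instead of hanging
        existing_owner[dataset_id] = target_dataverse
        result.harvested += 1
    return result


result = run_harvest(["doi:1", "doi:2"], {"doi:2": "A"}, "B")
print(result.harvested, result.failed, result.log)
```

In this sketch the reason for the failure only appears in `log`, which mirrors the point above: the counts alone do not tell the admin why a dataset was not harvested.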

@landreev
Contributor

landreev commented Aug 9, 2016

@mheppler @djbrooke
Mike, that is the plan - Danny and I are meeting at 2:30 to go through the list.
I was about to ask you if you wanted to join us.
I just wanted to comment on all this before the meeting.

I wasn't talking about not implementing these (with the exception of one, the time zone)
I was talking about 4.5 vs 4.5.1.

If you feel that any of this is a low-hanging fruit that you could easily fix for 4.5, by all means do so. But I am strongly against actually holding the release on account of cosmetic changes, at this point. I've said this about a million times already, but one more time: this functionality is for admins only; these pages will be used by admins only, and very rarely - compared to the workhorse pages used by all the users every day.

@djbrooke
Contributor

djbrooke commented Aug 9, 2016

Thanks @landreev and @mheppler for meeting about this! Good to put together a plan to tackle these.

mheppler added a commit that referenced this issue Aug 9, 2016
@mheppler
Contributor

mheppler commented Aug 9, 2016

Status of my UI updates from the list above:

  • Move Dashboard link to be an item in the navigation and not within the username dropdown
  • Make all boxes on Dashboard landing page same size
  • Add text that says the scheduling is for the local time of the installation
  • Use Tool tips instead of onscreen text under the specific boxes
  • Make the first input field the focus when opening a new modal
  • NEW! Fix column widths of dataTables on manage pgs

mheppler added a commit that referenced this issue Aug 10, 2016
mheppler added a commit that referenced this issue Aug 11, 2016
@djbrooke djbrooke assigned kcondon and unassigned landreev Aug 15, 2016
pdurbin added a commit that referenced this issue Aug 16, 2016
@kcondon
Contributor

kcondon commented Aug 19, 2016

OK, looks good. Have completed baseline testing of Harvesting, functionality is delivered and issues found are opened in new individual tickets. Closing.
