Cant publish imported dataset with existing DOI #5104

Closed
tcoupin opened this issue Sep 27, 2018 · 30 comments

@tcoupin
Member

tcoupin commented Sep 27, 2018

I tested the new import feature: https://github.com/IQSS/dataverse/blob/ddc41f58c1cc21449eb5d1f9a69a13be7915ca8d/doc/sphinx-guides/source/api/native-api.rst#import-a-dataset-into-a-dataverse

I can import a dataset with an existing DOI and auto-publish it (release=yes).
I can import a dataset with an existing DOI without auto-publishing it (release=no), but the dataset then cannot be published using the UI or the API.
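
For readers who want to reproduce the two steps, here is a minimal sketch that only builds the request URLs (it sends nothing). The import route and its `pid` and `release` parameters come from the API guide linked above; the publish route shown is the one from the native API guide as far as I know, so treat it as an assumption. The server URL, dataverse alias, and DOI are placeholders, and both function names are hypothetical.

```python
from urllib.parse import quote, urlencode


def import_url(server, dataverse_alias, pid, release):
    """Build the import URL for a dataset that already has a PID.

    `release` chooses between release=yes (auto-publish) and release=no.
    """
    query = urlencode({"pid": pid, "release": "yes" if release else "no"})
    return f"{server}/api/dataverses/{quote(dataverse_alias)}/datasets/:import?{query}"


def publish_url(server, pid, version_type="major"):
    """Build the publish URL (assumed route) for a dataset addressed by PID."""
    query = urlencode({"persistentId": pid, "type": version_type})
    return f"{server}/api/datasets/:persistentId/actions/:publish?{query}"


if __name__ == "__main__":
    server = "http://localhost:8080"   # placeholder server
    pid = "doi:10.5072/FK2/EXAMPLE"    # placeholder DOI
    # Step 1: import without publishing (release=no).
    print(import_url(server, "root", pid, release=False))
    # Step 2: the later publish call, which is the step that fails in this report.
    print(publish_url(server, pid))
```

Both requests would be sent as POST with an `X-Dataverse-key` header carrying the API token; the import request also uploads the dataset JSON file.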

@pameyer
Contributor

pameyer commented Sep 27, 2018

@tcoupin Could you provide more information about the use case you're using release=no for?

@tcoupin
Member Author

tcoupin commented Sep 27, 2018

It's only a functional test, not a real use case. You can look at http://irclog.iq.harvard.edu/dataverse/2018-09-27#i_73799 for context.

@djbrooke
Contributor

Hi @tcoupin - I read through the IRC logs and I'm not sure I understand what you're trying to implement here. Are you submitting a PR with a UI change in order to fix a bug? I think we'd want to implement a fix for the underlying bug here instead, but some more context would be helpful. Thanks!

@tcoupin
Member Author

tcoupin commented Oct 1, 2018

Are you submitting a PR with a UI change in order to fix a bug?

No. While developing #5105, I could create a new dataset with an existing DOI using the UI, but I hit the same exception as in #5104. To finish my PR I fixed it, and that fix also resolves this issue.

So I'm not submitting a PR with a UI change in order to fix #5104, but the PR with the new UI feature happens to fix this issue.

I'm not sure I'm being clear, especially since my English is buggy...

@djbrooke
Contributor

djbrooke commented Oct 4, 2018

Thanks @tcoupin. @pameyer helped me get this branch running and I’ve attached a screenshot of the proposed change here.

I believe I understand the need that you have, but we’d need to do some user research about the best approach from a UI/UX perspective and to accommodate the largest number of community workflows. From a development and QA time standpoint, PIDs are pretty central to the application and this represents a very large chunk of development work and verification. If you have the time to work on this a bit more, I’d suggest working on the bug in the Publish API which is designed to move datasets from unpublished to published (instead of the UI change). Let me know what you think.

Screen Shot 2018-10-02 at 2.54.29 PM.png

@tcoupin
Member Author

tcoupin commented Oct 4, 2018

Should I split #5105 into two parts?

  • fix publish (CreateNewDatasetCommand.java, PublishDatasetCommand.java, FinalizeDatasetPublicationCommand.java)
  • UI change (xhtml + DatasetPage.java + Bundles)

For the UI, do you want a field similar to the pid parameter of the import route (https://github.com/IQSS/dataverse/blob/ddc41f58c1cc21449eb5d1f9a69a13be7915ca8d/doc/sphinx-guides/source/api/native-api.rst#import-a-dataset-into-a-dataverse), so that both DOI and Handle identifiers can be given as the provided identifier?
My change would be based on https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java#L257

@tcoupin
Member Author

tcoupin commented Oct 12, 2018

So... should I split it or not?

@pdurbin
Member

pdurbin commented Oct 12, 2018

@tcoupin hi, if you could make a pull request that doesn't change the UI, that would probably be best. Does that help? When we change the UI, adding buttons or whatever, we pull in our design team whenever possible.

tcoupin added a commit to tcoupin/dataverse that referenced this issue Oct 16, 2018
tcoupin added a commit to tcoupin/dataverse that referenced this issue Oct 16, 2018
@tcoupin
Member Author

tcoupin commented Oct 16, 2018

  • Part 1 (no UI change): submitted as PR #5199, "Do not manage external DOI. Fix #5104".
  • Part 2 (UI change): it would be great if the UI offered the same feature as the API's dataset import endpoint, i.e. letting the user provide a pid. In the API, the pid can be a DOI or a Handle. Could the design team work on a similar field for the dataset creation page?

tcoupin added a commit to tcoupin/dataverse that referenced this issue Oct 16, 2018
@djbrooke
Contributor

djbrooke commented Oct 16, 2018

Thanks @tcoupin ! We'll review PR #5199.

In terms of the UI change, it's unlikely that we would be able to work on this in the near term. Most of our design efforts are focused in other areas right now and we don't have the time to do the research, user testing, and design that would be needed for this change.

@djbrooke djbrooke self-assigned this Oct 23, 2018
@tcoupin
Member Author

tcoupin commented Oct 29, 2018

So should I close the two PRs and this issue?

@RightInTwo
Contributor

make use of the “Alternative Identifier” field for the old DOI and mint a new DOI for what’s imported into Dataverse.

Hey @djbrooke! I'm pretty sure that would go against the idea of DOIs...

About release=no, I can imagine the case where I add a dataset with the existing DOI but want to enrich the metadata before "publishing" (in this case just referencing it in my Dataverse repository: as I understand it, the DOI would exist in another repository and I would just add its metadata to my catalogue, please correct me if I'm wrong!), or where I want to prepare a collection for a certain purpose and put it live in one go.

@djbrooke
Contributor

Hey @RightInTwo, good to hear from you! We still don't have a great understanding about the use case here and whether or not these changes were for testing or not. Point taken about the alternative ID field. I discussed this with @scolapasta and @jggautier and they may include some thoughts here.

@scolapasta
Contributor

So I'm just getting caught up to speed on this, I could use some help in understanding the original use case.

What it looks like to me is that you have a dataset with a DOI that points somewhere else and you are importing it into Dataverse, but you still want the original DOI to point to the original (non-Dataverse) location? Please correct me if that's incorrect, but the rest of my comment will be based on that assumption.

If that is the case, then import is really not the right function, as it was built to migrate datasets from somewhere else into Dataverse, i.e. the DOI would end up pointing to the Dataverse's Dataset page. If the source of the dataset is supposed to be elsewhere, then what you really want to do is harvest the dataset. Harvesting brings over the metadata in order to make the dataset discoverable, while still leaving the original location as the source of truth. If you click on the card in Dataverse, you will actually be redirected to the original location. The major benefits of this are that the dataset representation is always up to date, and that permissions management and access to file downloads are still controlled by the original source.
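
To make the harvesting option concrete: harvesting here means OAI-PMH, where a client asks the source repository for its records with a `ListRecords` request. Below is a minimal sketch that only builds such a request URL. The `verb`, `metadataPrefix`, and `set` parameters are defined by OAI-PMH itself; the base URL and set name are placeholders and the function name is hypothetical. It shows what the source has to expose, not how Dataverse's harvesting client is implemented.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a source repository.

    `set_spec` optionally restricts the harvest to one OAI set.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder endpoint and set name, for illustration only.
    print(list_records_url("https://example.org/oai", set_spec="datasets"))
```

A source that cannot answer a request of this shape cannot be harvested this way.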

@RightInTwo
Contributor

RightInTwo commented Dec 13, 2018

@djbrooke And from you as well :)
@scolapasta Man, you see right through me. Yes, that's what I want to do... Though I'm facing the issue that the sources I want to harvest from are heterogeneous and don't necessarily present the datasets I need in one set. I created the issue #5402 for that.

@pdurbin
Member

pdurbin commented Jan 4, 2019

@tcoupin Happy New Year! What's the status of this issue from your perspective?

@tcoupin
Member Author

tcoupin commented Jan 7, 2019

My need is to let users provide an existing, externally managed DOI, to avoid multiple DOIs pointing to the same resource.
I don't know whether this fits the Dataverse philosophy.
Anyway, this feature is in my fork.

@pdurbin
Member

pdurbin commented Jan 7, 2019

@tcoupin you're saying that the DOI is externally managed. That sounds like something Johns Hopkins was (or is?) doing. @dheles might know the status of this. What about files? Are they externally managed? Or are the files stored in Dataverse? Can you please link to an example dataset? The reason I ask is that last week's design meeting focused on a concept called Trusted Remote Storage Agent (TRSA) that we'd like to incorporate into Dataverse some day (#5213).

@tcoupin
Member Author

tcoupin commented Jan 7, 2019

The DOIs are created with another account (by a partner institute, for example). I don't really know about file management yet, because our Dataverse is not in production for now. But I think that files will be stored in Dataverse in some cases.

@pdurbin
Member

pdurbin commented Jan 7, 2019

@tcoupin ok, did you see the note by @scolapasta above about harvesting via OAI-PMH? Would that be a solution for you? We try to avoid forks whenever possible.

@tcoupin
Member Author

tcoupin commented Jan 7, 2019

Yes but I will not have an oaipmh server for all datasets. For now, this feature will be deployed in our production instance. We will see in a few months whether it's still necessary.

@RightInTwo
Contributor

Yes but I will not have an oaipmh server for all datasets.

@tcoupin Would you like to contribute to #5402 regarding your requirements?

@tcoupin
Member Author

tcoupin commented Feb 13, 2019

Actually, there is no harvest process for the mentioned datasets.
I'm not sure I understand your goal: do you want a DataCite harvester?

@pdurbin
Member

pdurbin commented Mar 20, 2019

Related #5667

@pdurbin
Member

pdurbin commented Mar 28, 2019

@tcoupin please read today's update at #5667 (comment) by @tcpan and let us know what you think.

I'm starting to lose track of your pull requests:

Are you still hoping to get one or both of these merged? I'm sorry that they have not gotten much attention lately.

@RightInTwo
Contributor

RightInTwo commented Mar 28, 2019

Just a general +1 to keep this rolling, as the use case I presented in "Harvesting from non-OAI-PMH sources" #5402 (adding existing PIDs which will not be managed in this instance of Dataverse) would then be possible, if my understanding is correct.

Actually, there is no harvest process for the mentioned datasets.
I'm not sure I understand your goal: do you want a DataCite harvester?

Yes, but that would not solve all issues described in #5402.

@RightInTwo
Contributor

RightInTwo commented Jan 24, 2020

@tcoupin Your use case for this issue would also be covered by doi2pmh, wouldn't it?

@tcoupin
Member Author

tcoupin commented Jan 25, 2020

yes

@tcoupin tcoupin closed this as completed Jan 25, 2020
7 participants