Let file metadata (i.e. description) be specified during zip upload #723

Closed
raprasad opened this issue Jul 11, 2014 · 10 comments

Comments

@raprasad
Contributor

raprasad commented Jul 11, 2014


Author Name: Philip Durbin (@pdurbin)
Original Redmine Issue: 3232, https://redmine.hmdc.harvard.edu/issues/3232
Original Date: 2013-08-19


Currently, our zip and tar upload feature does not allow the description field to be populated on a per-file basis. After upload, the user must change the description field for each uploaded file, if desired.

In order to set file metadata fields such as "description" we could support some sort of "manifest" file within the zip or tar itself that contains a list of all the files in the archive and the metadata (description, category, possibly md5sum) for each file.
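
To make the manifest idea concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration only: the manifest name `manifest.json`, the JSON layout, and the helper names `build_archive` and `read_archive` are hypothetical and are not an agreed DVN format, BagIt, or the DSpace Simple Archive Format. It only shows that per-file description, category, and md5sum can travel inside the zip and be read back on the receiving side.

```python
import hashlib
import io
import json
import zipfile

# Hypothetical manifest name; not an existing DVN convention.
MANIFEST_NAME = "manifest.json"


def build_archive(files, metadata):
    """Return zip bytes containing `files` plus a JSON manifest.

    files: dict of filename -> bytes
    metadata: dict of filename -> {"description": str, "category": str}
    """
    manifest = []
    for name, data in files.items():
        entry = {"filename": name, "md5": hashlib.md5(data).hexdigest()}
        entry.update(metadata.get(name, {}))
        manifest.append(entry)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
        zf.writestr(MANIFEST_NAME, json.dumps(manifest, indent=2))
    return buf.getvalue()


def read_archive(zip_bytes):
    """Return filename -> metadata for every data file in the archive.

    Files not listed in the manifest get empty metadata; a checksum
    mismatch raises ValueError.
    """
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        all_names = zf.namelist()
        manifest = {}
        if MANIFEST_NAME in all_names:
            for entry in json.loads(zf.read(MANIFEST_NAME)):
                manifest[entry["filename"]] = entry
        for name in all_names:
            if name == MANIFEST_NAME:
                continue  # the manifest itself is not a data file
            entry = manifest.get(name, {})
            expected = entry.get("md5")
            if expected and hashlib.md5(zf.read(name)).hexdigest() != expected:
                raise ValueError(f"checksum mismatch for {name}")
            result[name] = {
                "description": entry.get("description", ""),
                "category": entry.get("category", ""),
            }
    return result
```

An archive without a manifest still works in this sketch: every file simply comes back with an empty description and category, which matches today's behavior.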

We could invent our own format or support an existing format such as BagIt ( http://en.wikipedia.org/wiki/BagIt ) or the DSpace Simple Archive Format: https://wiki.duraspace.org/display/DSDOC3x/Importing+and+Exporting+Items+via+Simple+Archive+Format#ImportingandExportingItemsviaSimpleArchiveFormat-ItemImporterandExporter

In addition to zip or tar upload via DVN's web interface, this functionality could also be used in the Data Deposit API (SWORDv2), which supports file upload. Some discussion of file metadata took place with Open Journal Systems (OJS) at http://irclog.iq.harvard.edu/dvn/2013-07-29#i_2752

@raprasad
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2013-08-27T19:42:23Z


Philip Durbin wrote:

Some discussion of file metadata took place with Open Journal Systems (OJS) at http://irclog.iq.harvard.edu/dvn/2013-07-29#i_2752

Jen from OJS and I discussed this again today at http://irclog.iq.harvard.edu/dvn/2013-08-27#i_3226 and she seems OK with limiting the visible fields on the OJS side to those that we can accept via zip upload:

12:49 pdurbin jwhitney: in the past we've talked about our files on the DVN side have metadata such as filename, category, and description. I don't have a way to set descriptions for files. Is this a problem for the OJS use case? See also this ticket about this: https://redmine.hmdc.harvard.edu/issues/3232
13:15 pdurbin jwhitney: does that make sense? I'm trying to ask if it's ok if we don't populate "description" for each file on the DVN side
13:17 jwhitney pdurbin: for now, yes, I think so. OJS gives authors the option to add metadata to supplementary files (title, creators, keywords, etc.) that may differ from article-level metadata for these fields.
13:19 jwhitney pdurbin: so potentially, ojs is collecting more metadata than can currently be sent over to dataverse, unless some of the file-level metadata is propagated upward to fill in absent study-level fields
13:19 jwhitney pdurbin: although that's probably not a great idea
13:20 pdurbin jwhitney: right, OJS is collecting more metadata about files and it wouldn't all appear on the DVN side
13:21 jwhitney pdurbin: yep
13:21 pdurbin jwhitney: I'm looking at your screenshot at author_describe_datafile.png - Google Drive - https://docs.google.com/file/d/0B8Zfl4GMgyejMVV2VUV6QkptN3M/edit
13:22 pdurbin looks like for a file in OJS, you can have Title, Author(s), Keywords, Brief description, Category, and Date
13:22 jwhitney pdurbin: yes, this is based on the supplementary file upload
13:23 pdurbin jwhitney: to support all this, you and I would need to agree on some sort of manifest file, I guess... or some other way to store all this information within the zip file that is sent across via SWORD
13:24 jwhitney pdurbin: I'm wondering if it's better to capture file-level metadata that's only available OJS side, or use a simpler interface to only capture what Dataverse will currently store
13:25 pdurbin jwhitney: oh, are you saying you could expose only a few fields on the OJS side? Only the fields we can receive on the DVN side? (filename and category)
13:27 jwhitney pdurbin: that's what I'm wondering: if that approach would be too limiting for submitters
13:27 pdurbin it might feel limiting, yes
13:27 posixeleni hi pdurbin and jwhitney!
13:28 pdurbin but it would probably be frustrating for submitters if they filled in a bunch of fields that don't get propagated to the DVN side
13:28 jwhitney hello!
13:28 jwhitney pdurbin: agreed
13:28 pdurbin posixeleni: are you following this?
13:28 pdurbin posixeleni: and hello! :)
13:29 posixeleni i saw you chatting about the individual file level metadata and just wanted to ask a quick question about how OJS would capture and send over to us the overall metadata for the Dataverse study
13:29 pdurbin posixeleni: oh, well, that's different... study-level metadata
13:29 posixeleni we got that covered right?
13:30 pdurbin well, let's finish the file-level metadata (i.e. description) discussion
13:30 pdurbin for now anyway :)
13:30 posixeleni cool sorry to interrupt!
13:30 pdurbin I'm in favor of limiting the visible fields on the OJS side to what we can receive on the DVN side (filename and category)
13:31 pdurbin I realize this is limiting, but I'm more worried about the frustration submitters would feel when they realize the description, etc. doesn't get propagated to the DVN side
13:31 jwhitney I agree -- otherwise, I think it's misleading to collect metadata that doesn't get deposited

I also left a note about this on the OJS mockups doc: https://docs.google.com/document/d/1T-i2a4synXIhe3DClYyALI8VYgh2hLdJJMmd6KVVXhc/edit?pli=1

@raprasad
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2013-11-06T14:29:23Z


At https://help.hmdc.harvard.edu/Ticket/Display.html?id=169905#txn-3486070 Eleni pointed out that this blog post mentions BagIt: Introducing next year’s model, the data-crate; applied standards for data-set packaging | ptsefton - http://ptsefton.com/2013/11/01/1944.htm

"Crate = Bagit + Zip + x"

@raprasad
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-05-23T15:15:53Z


Now that we're developing a "native" API, perhaps we could re-visit this ticket. I just added a Trello card for this: https://trello.com/c/LiD9Xx5u/12-let-file-metadata-i-e-description-be-specified-during-upload

@raprasad raprasad added this to the Dataverse 4.0: In Review milestone Jul 11, 2014
@scolapasta scolapasta modified the milestones: Dataverse 4.0: Final, In Review - Dataverse 4.0 Jul 15, 2014
@scolapasta scolapasta modified the milestones: In Review - Dataverse 4.0, Dataverse 4.0: Final Dec 8, 2014
@scolapasta scolapasta modified the milestones: 4.01, In Review - Dataverse 4.0 Jan 6, 2015
@pdurbin pdurbin removed their assignment Jan 20, 2016
@scolapasta scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016
@kcondon
Contributor

kcondon commented May 9, 2016

The feature is to allow automatic metadata completion by uploading a companion codebook-style file containing file metadata. This would be useful for batch file uploads, but since we currently do not recommend having lots of files per dataset, this is not a high priority.

@pdurbin
Member

pdurbin commented May 9, 2016

@kcondon yes, that's a good way of putting it. SWORD could make use of this feature as well. See also #1612, which is sort of becoming the issue where we're tracking the need to develop a non-SWORD way of uploading files, hopefully with the ability to specify metadata for files such as the description.

@pdurbin
Member

pdurbin commented Oct 15, 2016

While working on a way to add files to the native API yesterday (#1612, basically), @raprasad added the ability to send JSON along with the file. Once this includes description, we might want to close this issue and just say to use that new endpoint instead if you need to add a description to a file via API.
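
As a rough illustration of sending JSON along with a file, the Python sketch below only assembles the two parts of such a request and makes no network call. The form field name `jsonData`, the `description` and `categories` keys, and the helper name `build_form_fields` are assumptions for illustration; check the endpoint from #1612 for the names it actually accepts.

```python
import json


def build_form_fields(filename, content, description, categories=None):
    """Assemble the parts of a hypothetical multipart upload.

    Returns a dict with the file part and a JSON string carrying the
    file-level metadata. Field and key names are assumptions.
    """
    metadata = {"description": description}
    if categories:
        metadata["categories"] = list(categories)
    return {
        "file": (filename, content),          # the file itself
        "jsonData": json.dumps(metadata),     # metadata sent alongside it
    }


fields = build_form_fields(
    "data.csv", b"1,2", "Survey responses", ["Data"]
)
```

Keeping the metadata in a single JSON string means new per-file fields could be added later without changing the shape of the upload request.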

@pdurbin
Member

pdurbin commented Jan 17, 2017

I'm parking this issue in Development at https://waffle.io/IQSS/dataverse alongside #1612.

@kcondon
Contributor

kcondon commented Jan 26, 2017

It sounds like this was not part of the implementation. Putting this into code review for consideration to be moved into the backlog.

@djbrooke
Contributor

Moving to backlog

@pdurbin
Member

pdurbin commented Jun 25, 2017

Is anyone who is following this issue still interested in this feature?
