Data Package JSON schema #4747

Closed
pkiraly opened this issue Jun 12, 2018 · 5 comments

@pkiraly
Member

pkiraly commented Jun 12, 2018

I just came across an interesting paper:

Fowler, Dan, Jo Barratt, and Paul Walsh. “Frictionless Data: Making Research Data Quality Visible.” International Journal of Digital Curation 12, no. 2 (May 13, 2018): 274–85.
https://doi.org/10.2218/ijdc.v12i2.577
http://www.ijdc.net/article/view/577

This is a summary of Open Knowledge International's activities regarding data quality. One of their suggestions is a JSON schema called Data Package (the full description is available at https://frictionlessdata.io/specs/data-package/), which describes the structure of the underlying data, e.g. the column types of a CSV file, dictionaries, signals for NA, etc. They have created Python and JavaScript libraries that read this metadata along with the CSV files and tell programs how to interpret the input file. External partners have created R and Ruby packages.
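To make the idea concrete, here is a minimal sketch of how a descriptor might drive the interpretation of a CSV file. It uses only the Python standard library and does not use the Frictionless libraries. The package name, resource name, field names, and the `read_rows` helper are all hypothetical, and only three of the Table Schema field types are handled.

```python
import csv
import io
import json

# Hypothetical descriptor, shaped like a datapackage.json file.
DESCRIPTOR = json.loads("""
{
  "name": "example-package",
  "resources": [
    {
      "name": "measurements",
      "path": "measurements.csv",
      "schema": {
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "count", "type": "integer"},
          {"name": "temperature", "type": "number"}
        ],
        "missingValues": ["", "NA"]
      }
    }
  ]
}
""")

# Only three field types are covered in this sketch.
CASTERS = {"string": str, "integer": int, "number": float}


def read_rows(csv_text, schema):
    """Cast each CSV cell using the field types declared in the schema."""
    missing = set(schema.get("missingValues", [""]))
    types = {f["name"]: f.get("type", "string") for f in schema["fields"]}
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for name, value in raw.items():
            # Values listed in missingValues become None instead of being cast.
            row[name] = None if value in missing else CASTERS[types[name]](value)
        rows.append(row)
    return rows


# Inline stand-in for the contents of measurements.csv.
CSV_TEXT = "site,count,temperature\nA,3,21.5\nB,NA,19.0\n"
schema = DESCRIPTOR["resources"][0]["schema"]
rows = read_rows(CSV_TEXT, schema)
print(rows)
```

Without the descriptor, a reader of the CSV would see every cell as a string and would have to guess that `NA` means "no value"; with it, the types and the missing-value markers travel alongside the data.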

I think Dataverse support for it would be very useful. If you are interested, read the paper first, then the description of the Data Package.

@pdurbin
Member

pdurbin commented Jul 13, 2018

@pkiraly thanks for following up on the discussion at https://groups.google.com/d/msg/dataverse-community/ao-zFwN_M0M/LDlfR3hfBQAJ by creating this issue.

@lwinfree

lwinfree commented Jun 7, 2019

Hi @pkiraly & @pdurbin! I'm the product owner for the Frictionless Data reproducible research project (https://frictionlessdata.io/reproducible-research/). I came across this issue while checking out Dataverse and wanted to say hi! I would be happy to chat with y'all more about whether there is potential for a collaboration between us and Dataverse, or answer any questions you might have about datapackage.json or any of our other software or specs. 😄

@pdurbin
Member

pdurbin commented Jun 7, 2019

@lwinfree hi! I bet @pkiraly is enjoying his weekend by now but I'm in http://chat.dataverse.org for another hour and a half today. Or there's always next week or whenever. 😄

@pdurbin
Member

pdurbin commented Oct 14, 2022

Fast forward a few years and @qqmyers has been adding excellent support for BagIt as a packaging standard in Dataverse (export first and now import!). A good entry point is the docs: https://guides.dataverse.org/en/5.12/installation/config.html#bagit-file-handler

Just today the team talked about this issue...

... that mentions BagIt but another packaging standard we've talked about is RO-Crate:

Finally, we've supported SWORD for a long time but that's just for import and we never did get around to having some sort of manifest inside the zip to populate file descriptions:

(This is just as well, because we have BagIt import now.)

First, @pkiraly are you still interested in this Data Package standard from Frictionless Data (by the way, thank you, @lwinfree for attending a community call a while back!)? Second, is your vision for export or import or both? Is it a zip file? Thanks.

Oh, finally, other packaging terms I hear about are AIPs, DIPs, and SIPs. I believe these are more conceptual than specific standards. I guess they come from OAIS.

@cmbz

cmbz commented Aug 20, 2024

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

@cmbz cmbz closed this as completed Aug 20, 2024
5 participants