Add MIT license file and guidance for more license files #10426

Merged
merged 11 commits into from
May 8, 2024

jp-tosca
Contributor

What this PR does / why we need it:

The request to include the MIT License on HDV was created in IQSS/dataverse.harvard.edu#248. This PR adds a JSON file so the license can be added.

Which issue(s) this PR closes:

Closes #10425

Suggestions on how to test this:
You can add the license and check that the link is working
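
As a rough illustration of the "add the license" step, here is a minimal Python sketch that builds, but does not send, the HTTP request for submitting one license JSON. The `/api/licenses` endpoint and the `X-Dataverse-key` header are assumptions about the Dataverse native API that do not appear in this thread, the server URL and token are placeholders, and the field values are copied from the snippet under review in this PR, so the merged file may differ.

```python
import json
import urllib.request


def build_add_license_request(server_url, api_token, license_fields):
    """Build (but do not send) a POST request that submits one license JSON."""
    body = json.dumps(license_fields).encode("utf-8")
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/api/licenses",  # assumed endpoint
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Dataverse-key": api_token,  # assumed authentication header
        },
        method="POST",
    )


# Field values copied from the snippet under review in this PR; the merged
# file may differ and may carry more fields.
mit_license = {
    "name": "MIT License",
    "uri": "https://spdx.org/licenses/MIT.html",
    "shortDescription": "MIT License (MIT).",
}

request = build_add_license_request(
    "http://localhost:8080",  # hypothetical local installation
    "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # placeholder API token
    mit_license,
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Checking that the link works would then be a matter of opening the `uri` value in a browser once the license shows up in the installation.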

@jp-tosca jp-tosca added the Size: 3 (A percentage of a sprint. 2.1 hours.), GREI 6 (Connect Digital Objects), and Type: Feature (a feature request) labels Mar 25, 2024
Member

@pdurbin pdurbin left a comment

some quick feedback

{
"name": "MIT License",
"uri": "https://spdx.org/licenses/MIT.html",
"shortDescription": "MIT License (MIT).",
Member

I'm just wondering where this short description comes from.

Contributor Author

I am open to suggestions 🤣

Massachusetts Institute of Technology License (MIT).?

scripts/api/data/licenses/licenseMIT.json
@pdurbin
Member

pdurbin commented Mar 26, 2024

I asked the community for feedback on this pull request: https://groups.google.com/g/dataverse-community/c/_UUKZT4RrmM/m/HCtul3VlAQAJ

@pdurbin
Member

pdurbin commented Mar 27, 2024

@pdurbin pdurbin self-assigned this Mar 27, 2024
@pdurbin
Member

pdurbin commented Apr 11, 2024

@jp-tosca and I just pushed 5e0c73f to update the MIT license and add guidance on how to add additional licenses:

Screenshot 2024-04-11 at 12 44 09 PM

Here's a preview of the docs: https://dataverse-guide--10426.org.readthedocs.build/en/10426/installation/config.html#contributing-to-the-collection-of-standard-licenses-above

At this point we should solicit more input from the community, especially on the guidance above. The MIT license we're adding is slightly different (a different URI, at least) from the one @DieuwertjeBloemen mentioned adding in https://groups.google.com/g/dataverse-community/c/_UUKZT4RrmM/m/IxQaA7ycAQAJ

Also, I'm interested in what @philippconzett thinks since he's been leading the charge on license standardization in these issues and PRs:

I'll try to dig up the right threads on the google group to have more people look (here and here). And Zulip: https://dataverse.zulipchat.com/#narrow/stream/379673-dev/topic/first.20software.20license.20in.20guides.3A.20MIT/near/429826659

For now, I guess I'll leave myself as a reviewer.

p.s. The license facet is here in 6.2! We already updated https://demo.dataverse.org and here's how it looks:

Screenshot 2024-04-11 at 12 55 58 PM

It would be wonderful to keep these values unique!
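
One way to watch for near-duplicate facet values is a small check over the configured license names. The sketch below is only an illustration: the sample list is hypothetical, and treating names as equal after trimming whitespace and lowercasing is an assumption about what "unique" should mean here.

```python
from collections import Counter


def duplicate_license_names(licenses):
    """Return names that occur more than once, compared case-insensitively after trimming."""
    counts = Counter(lic["name"].strip().lower() for lic in licenses)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical sample data, not taken from any real installation.
licenses = [
    {"name": "CC0 1.0"},
    {"name": "CC BY 4.0"},
    {"name": "cc by 4.0 "},  # near-duplicate that would split the facet
]
print(duplicate_license_names(licenses))
```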

@philippconzett
Contributor

Thanks all for driving standardized license information forward! I have a couple of questions:

  1. Does this PR cover Feature Request/Idea: Standardize standard license configuration #8512?
  2. What is the rationale for using the actual URL that the SPDX license link (in some/most cases) redirects to as value in the uri field? I wonder whether using the SPDX license link would spare us from monitoring link rot?
  3. Does this PR include adapting the database to include table fields where all the SPDX values (name, description, uri, ...) are stored?
  4. Should the PR include scripts for how to clean up / align licence information in legacy datasets, so that the new approach is applied to the entire Dataverse installation?

@DieuwertjeBloemen
Contributor

DieuwertjeBloemen commented Apr 15, 2024

@pdurbin Looks great! I think my MIT uri is the faulty one, as the url in the SPDX list has the lower-case variant.

@philippconzett

  1. I think it covers a lot if not all of Feature Request/Idea: Standardize standard license configuration #8512 though it doesn't provide a set of standard licenses straight away, but rather guidelines on how to contribute new JSONs/Licenses to ensure they are standardized. This will probably make these JSONs grow over time. (we could already do an initial push of some JSONs we have at KU Leuven that are pretty much in line with this, I'll just check them once the guidelines are approved).
  2. We decided not to use the SPDX landing page because we looked for the URIs that harvesters expect (e.g. for Creative Commons, that's where you find the base URL that OpenAIRE and DataCite (page 40) expect). I think the only real way to prevent link rot would be to keep a local copy of each license text, but I don't think anyone wants to maintain that ;) We're also not guaranteed that the SPDX landing page URLs will stay available indefinitely.

@philippconzett
Contributor

philippconzett commented Apr 15, 2024

Thanks, @DieuwertjeBloemen. I had to revisit my issue/PR once more and now realize that the main difference between #8512 and #10426 is that #8512 is based on the DataCite recommendations, whereas #10426 is still based on the setup Dataverse uses currently, with some modifications. I think you clearly see the difference when you compare what the JSON file for CC BY 4.0 looks like in the two approaches:

JSON according to #8512:
{
"rightsName": "CC BY 4.0",
"rightsURI": "https://creativecommons.org/licenses/by/4.0/",
"rightsIdentifier": "CC-BY-4.0",
"rightsIdentifierScheme": "SPDX",
"schemeURI": "https://spdx.org/licenses/",
"rightsShortDescription": "Creative Commons Attribution 4.0 International.",
"rightsIconUrl": "https://licensebuttons.net/l/by/4.0/88x31.png",
"rightsActive": true
}

JSON according to #10426:
{
"name": "CC-BY-4.0",
"uri": "http://creativecommons.org/licenses/by/4.0",
"description": "CC BY 4.0",
"iconUrl": "https://licensebuttons.net/l/by/4.0/88x31.png",
"active": true,
"sortOrder": 2
}

I guess you want to add the MIT license as soon as possible, for which #10426 seems to be a feasible way. At DataverseNO, we still would like to be able to deliver license metadata to DataCite in line with their recommendations, which will mean implementing #8512, which will take some more resources, I guess, because fields need to be added and renamed in the database, among other things.
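
To make the difference between the two shapes concrete, here is a minimal Python sketch that renames the #10426-style fields into the #8512-style ones. The mapping is read off the two CC BY 4.0 examples above and is an assumption, not an agreed design: `rightsShortDescription` has no counterpart in the #10426 example and is left out, `sortOrder` is dropped, and the URI is passed through unchanged even though the two examples differ in scheme and trailing slash.

```python
SPDX_SCHEME_URI = "https://spdx.org/licenses/"


def to_datacite_style(license_fields):
    """Map the field names used in #10426 onto the ones proposed in #8512."""
    return {
        "rightsName": license_fields["description"],
        "rightsURI": license_fields["uri"],  # passed through, not normalized
        "rightsIdentifier": license_fields["name"],
        "rightsIdentifierScheme": "SPDX",
        "schemeURI": SPDX_SCHEME_URI,
        "rightsIconUrl": license_fields.get("iconUrl"),
        "rightsActive": license_fields["active"],
    }


# The #10426-style example from the comment above.
cc_by = {
    "name": "CC-BY-4.0",
    "uri": "http://creativecommons.org/licenses/by/4.0",
    "description": "CC BY 4.0",
    "iconUrl": "https://licensebuttons.net/l/by/4.0/88x31.png",
    "active": True,
    "sortOrder": 2,
}
converted = to_datacite_style(cc_by)
print(converted["rightsIdentifier"], "->", converted["rightsName"])
```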

@pdurbin
Member

pdurbin commented Apr 16, 2024

Hi @philippconzett and @DieuwertjeBloemen thanks for your comments!

Yes, this PR (#10426) is quite small, only adding the MIT license using existing database columns/tables and adding some new documentation/guidance on adding new licenses moving forward (still using the existing columns/tables).

As for letting the SPDX link resolve or not, I'm happy to reverse the stance we've taken and declare that we should use the SPDX link as-is without redirection. Mostly I just wanted to capture the (frustrating) fact that redirection is going on and to pick one way or the other (as-is or redirected). More on this below.

Yes, I think we should leave #8512 open to think about adding additional database columns and further improving how we store licenses in the database and standardize them.

As for SQL migration scripts to handle existing licenses that are not in compliance with the guidance we've written up (CC0 and friends), @jp-tosca and I talked about it but decided this work is out of scope for this issue. You may have noticed that we added this note to the guidance: "Note that prior to Dataverse 6.2, various licenses above have been added that do not adhere perfectly with this procedure. For example, the name for the CC0 license is CC0 1.0 (no dash) rather than CC0-1.0 (with a dash). We are keeping the existing names for backward compatibility. For more on standardizing license configuration, see https://github.com/IQSS/dataverse/issues/8512." Basically, we talked briefly about how it would be a fair amount of work to write these scripts, so we'd rather defer this until #8512.

As for providing additional licenses, yes, sure, we're open to more. After this PR (#10426) gets finalized and merged, @DieuwertjeBloemen you're welcome to add more. Thanks!

So! In the interest of keeping things moving, it sounds like we're all more or less in agreement of the scope of this PR (#10426) as well as its content, with the possible exception of this line...

- For the ``uri`` field, go to the SPDX landing page for the license and click on the link under "other web pages for this license". Let any redirection happen and then copy the URL (e.g. ``https://opensource.org/license/mit``) into the ``uri`` field.

If we change the language to use the exact URL as shown on the SPDX landing page (rather than letting redirection happen), we would change...

"uri": "https://opensource.org/license/mit",

to

"uri": "https://opensource.org/license/mit/",

Again, I don't feel strongly about this. Does anyone?

@qqmyers
Member

qqmyers commented Apr 16, 2024

CC-BY-4.0 is an interesting one: https://spdx.org/licenses/CC-BY-4.0.html points you to https://creativecommons.org/licenses/by/4.0/legalcode where the CC folks tell you the canonical URL is https://creativecommons.org/licenses/by/4.0/. There are no redirects.

In general, I'd think we'd want the canonical URL as defined by the license provider (versus wherever spdx points if that's different) but I agree the question of a trailing slash is painfully trivial (especially when CC and MIT appear to choose opposite conventions!). I wouldn't be surprised if people trying to parse these can handle that much difference, but who knows.
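
For anyone who does need to compare these URIs, a tolerant comparison could look like the Python sketch below. It is only an illustration under stated assumptions: it treats `http` and `https`, host case, and a trailing slash as insignificant, and it ignores any query string or fragment. Whether harvesters actually compare this way is not something this thread establishes.

```python
from urllib.parse import urlsplit


def comparable_uri(uri):
    """Reduce a license URI to host plus path, dropping scheme and any trailing slash."""
    parts = urlsplit(uri.strip())
    return parts.netloc.lower() + parts.path.rstrip("/")


def same_license_uri(a, b):
    """True when two URIs differ only by scheme, host case, or a trailing slash."""
    return comparable_uri(a) == comparable_uri(b)


print(same_license_uri("https://opensource.org/license/mit",
                       "https://opensource.org/license/mit/"))  # True
print(same_license_uri("http://creativecommons.org/licenses/by/4.0",
                       "https://creativecommons.org/licenses/by/4.0/"))  # True
print(same_license_uri("https://creativecommons.org/licenses/by/4.0/",
                       "https://creativecommons.org/licenses/by/4.0/legalcode"))  # False
```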

@pdurbin pdurbin changed the title from "Adds MIT license file" to "Add MIT license file and guidance for more license files" Apr 16, 2024
@DieuwertjeBloemen
Contributor

Yesterday I was also wondering whether an example (one of the CC options, perhaps) should be chosen in addition to the MIT license for the documentation, one that exemplifies whether the "name" field in the JSON is written with or without the dash, because MIT is not an example that makes this explicit. But that's just a minor detail that I thought could perhaps be improved in the above-mentioned documentation.

@pdurbin
Member

pdurbin commented Apr 17, 2024

@DieuwertjeBloemen good idea.

@jp-tosca in the docs, can you please switch from MIT to another license as the example? Please feel free to add an additional license in the process, one that exercises the rules a little more thoroughly.

@DieuwertjeBloemen as we wrote in the guidance above, we are considering the existing CC0 licenses grandfathered in: "Note that prior to Dataverse 6.2, various licenses above have been added that do not adhere perfectly with this procedure. For example, the name for the CC0 license is CC0 1.0 (no dash) rather than CC0-1.0 (with a dash). We are keeping the existing names for backward compatibility." What do you think?

@DieuwertjeBloemen
Contributor

@pdurbin I think it makes sense to 'grandfather' it for now. If someone at some point wants to make it compatible with the rest and figure out how to update it on existing datasets, that can always be done at a later stage.

@pdurbin
Member

pdurbin commented Apr 17, 2024

@DieuwertjeBloemen yeah, updating the CC license to comply with the new thinking will require an SQL migration script (in Flyway). I'm glad you're ok with this being out of scope for this PR.

@jp-tosca jp-tosca requested a review from qqmyers May 7, 2024 15:03
@jp-tosca jp-tosca removed their assignment May 7, 2024
Member

@qqmyers qqmyers left a comment

Looks good.

@sekmiller sekmiller self-assigned this May 8, 2024
@sekmiller sekmiller merged commit d923f1c into develop May 8, 2024
3 checks passed
@sekmiller sekmiller deleted the add-mit-license branch May 8, 2024 18:16
@pdurbin pdurbin added this to the 6.3 milestone May 8, 2024
Labels
GREI 6 (Connect Digital Objects), Size: 3 (A percentage of a sprint. 2.1 hours.), Type: Feature (a feature request)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add MIT License request so can be added to HDV
6 participants