Proposal to use NVS P07 as the canonical source of CF standard names #366

japamment · 2024-09-20T10:10:02Z

japamment
Sep 20, 2024
Maintainer

Topic for discussion

Currently the "official" source of the CF standard names is the XML formatted file, e.g., https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.xml. A new version of the XML file is produced every time the standard name table is updated. At the same time as publishing the XML file, identical content is submitted to the NERC Vocabulary Server (NVS) where standard names are collection P07: http://vocab.nerc.ac.uk/collection/P07/current/. The proposal is to move to recognising P07 as the canonical source of standard names.

The points below were raised in a hackathon at the CF 2024 workshop.

Advantages of adopting P07 as canonical source of standard names

Unique identifiers
Access to FAIR semantics and mappings
Machine actionability
Better functionality can be built on the semantics of the vocabulary (SPARQL endpoint)
NVS provides more formats for each vocab
Potential for streamlining publication process for new vocabulary versions
Reduces/removes the need to store all the standard name table versions in the GitHub repos and on the CF website (helps with space considerations)
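
To illustrate the SPARQL point above, here is a minimal sketch of a query that looks up a P07 concept by its standard name. It only builds the query text and a request URL; it does not contact the server. The endpoint URL, and the assumption that P07 is exposed as a SKOS collection whose members carry `skos:prefLabel` and `skos:definition`, are mine and should be checked against the NVS documentation. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint URL; verify against the NVS documentation before use.
NVS_SPARQL_ENDPOINT = "https://vocab.nerc.ac.uk/sparql/sparql"
# Collection URI as given in the proposal text.
P07_COLLECTION = "http://vocab.nerc.ac.uk/collection/P07/current/"


def build_label_query(standard_name: str) -> str:
    """Return a SPARQL query that looks up a P07 concept by its label."""
    # Escape backslashes and double quotes so the name is a valid literal.
    literal = standard_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?definition WHERE {\n"
        f"  <{P07_COLLECTION}> skos:member ?concept .\n"
        "  ?concept skos:prefLabel ?label .\n"
        "  OPTIONAL { ?concept skos:definition ?definition . }\n"
        # str() drops any language tag so the comparison is on plain text.
        f'  FILTER (str(?label) = "{literal}")\n'
        "}"
    )


def build_request_url(standard_name: str) -> str:
    """Return a GET URL for the query (no network request is made here)."""
    params = urlencode({"query": build_label_query(standard_name)})
    return f"{NVS_SPARQL_ENDPOINT}?{params}"


if __name__ == "__main__":
    print(build_label_query("air_temperature"))
```

The URL from `build_request_url` could then be fetched with any HTTP client, asking for a SPARQL results format the endpoint supports.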

Possible consequences of moving to P07

Reference to the standard names XML in the conventions document (appendix B?) will need to change to NVS P07
We will need to ensure that access to the HTML view of standard names is retained on the CF website
Need to consider the KWIC index and how this will be generated in the workflow
We are currently reliant on UDUNITS strings to recognize legal units - these are not currently implemented in NVS

Practical steps we will need to take

Make a proposal to modify the conventions document to recognise P07, modifications to appendix B?
We need to consult the community on the consequences of removing the XML and HTML versions of the standard name table: how many people's workflows depend on them, and how often do people access previous versions of the standard name table?
Tools, Python modules and workflows which rely on these files (cfchecker, cfpython?, cfplot?, cdo?, iris?) (mitigation - the XML could be created automatically from NVS)
Users might want to propose mappings to other vocabularies - how can they do this?
Viewing previous versions of the collection on NVS is currently not working; it has worked previously, so we will need to bring this back
Investigate UDUNITS-QUDT functionality (https://qudt.org/schema/qudt/udunitsCode) as a means of linking P06 to the CF UDUNITS requirement
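
As a rough sketch of the mitigation mentioned in the list above (creating the XML automatically), the following shows how an XML table could be generated from records already retrieved from NVS. The retrieval step is left out. The element names (`standard_name_table`, `version_number`, `entry`, `canonical_units`, `description`, `alias`, `entry_id`) are an assumption modelled on the existing CF XML file and should be checked against it; the alias name, description text and version string are placeholders, not real CF content.

```python
import xml.etree.ElementTree as ET


def build_table_xml(entries: dict, aliases: dict, version: str) -> str:
    """Serialise standard name records to an XML string.

    entries maps each standard name to its canonical units and description;
    aliases maps each old name to the current name that replaces it.
    """
    root = ET.Element("standard_name_table")
    ET.SubElement(root, "version_number").text = version
    # Sort names so that repeated builds give identical output.
    for name in sorted(entries):
        entry = ET.SubElement(root, "entry", id=name)
        ET.SubElement(entry, "canonical_units").text = entries[name]["canonical_units"]
        ET.SubElement(entry, "description").text = entries[name]["description"]
    for old_name in sorted(aliases):
        alias = ET.SubElement(root, "alias", id=old_name)
        ET.SubElement(alias, "entry_id").text = aliases[old_name]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder records standing in for content fetched from NVS.
    example_entries = {
        "air_temperature": {
            "canonical_units": "K",
            "description": "(description text would go here)",
        },
    }
    example_aliases = {"hypothetical_old_name": "air_temperature"}
    print(build_table_xml(example_entries, example_aliases, "0"))
```

Existing tools that read the XML file would need the generated output to match the current layout exactly, so a comparison against a published version of the table would be a sensible first test.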

ChrisBarker-NOAA · 2024-09-20T13:00:26Z

ChrisBarker-NOAA
Sep 20, 2024
Collaborator

Investigate UDUNITS-QUDT functionality (https://qudt.org/schema/qudt/udunitsCode) as a means of linking P06 to the CF UDUNITS requirement

I am particularly interested in the units question -- CF has referred to UDUNITS for units since the beginning (yes?) -- but over that time, UDUNITS saw a major upgrade that re-worked the unit database, and, at least as far as I could find, there is no nice human-readable form, nor a clear definition of what variations are acceptable (e.g. both "second" and "seconds" are accepted). So the canonical source could be the XML in the UDUNITS source -- but we really should have a way to extract something human-readable from that.
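
A minimal sketch of what such an extraction might look like, using only the Python standard library. The embedded sample is illustrative and is not copied from the UDUNITS source: the element names (`unit`, `def`, `name`, `singular`, `plural`, `symbol`) are an assumption about the layout of the UDUNITS XML database and would need checking against the real files. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: element names and content are assumed,
# not copied from the UDUNITS database.
SAMPLE_XML = """
<unit-system>
  <unit>
    <base/>
    <name><singular>second</singular></name>
    <symbol>s</symbol>
  </unit>
  <unit>
    <def>60 s</def>
    <name><singular>minute</singular><plural>minutes</plural></name>
    <symbol>min</symbol>
  </unit>
</unit-system>
"""


def summarise_units(xml_text: str) -> list:
    """Return one human-readable line per <unit> element."""
    lines = []
    for unit in ET.fromstring(xml_text.strip()).iter("unit"):
        # Collect every spelling listed for the unit, singular forms first.
        names = [el.text for el in unit.iter("singular")]
        names += [el.text for el in unit.iter("plural")]
        symbols = [el.text for el in unit.iter("symbol")]
        # Units without a <def> child are reported as base units.
        definition = unit.findtext("def", default="base unit")
        lines.append(f"{', '.join(names)} [{', '.join(symbols)}]: {definition}")
    return lines


if __name__ == "__main__":
    for line in summarise_units(SAMPLE_XML):
        print(line)
```

This only lists the spellings written explicitly in the XML. Any variations that UDUNITS accepts through its own parsing rules would not appear in such a listing and would need documenting separately.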

I'm willing to take a look at that, unless someone points us to something I missed :-).


taylor13 · 2024-09-20T15:36:31Z

taylor13
Sep 20, 2024
Collaborator

I get a 404 error when I try to reach the xml file: https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.xml

Where does it show the names of aliases on the P07 file ( http://vocab.nerc.ac.uk/collection/P07/current/ )?

Why is the JSON-LD version of the file so complicated? Couldn't we simply host the standard name information in a simpler structure with the keys being the standard names and for each we'd have a few "attributes": canonical units, description, aliases, and a flag indicating whether or not the name has been deprecated?
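
To make the idea concrete, here is a sketch of the kind of structure I have in mind, written in Python and serialised as JSON. The key names (`canonical_units`, `description`, `aliases`, `deprecated`) are only a suggestion, and the alias and description shown are placeholders rather than real CF content.

```python
import json

# Hypothetical table: the alias and description are placeholders.
TABLE = {
    "air_temperature": {
        "canonical_units": "K",
        "description": "(description text would go here)",
        "aliases": ["hypothetical_old_name"],
        "deprecated": False,
    },
}


def resolve(name, table):
    """Return the current standard name for a name or alias, else None."""
    if name in table:
        return name
    # Fall back to scanning the alias lists of every entry.
    for current, attributes in table.items():
        if name in attributes["aliases"]:
            return current
    return None


if __name__ == "__main__":
    # The whole table can be written as plain JSON and read back unchanged.
    text = json.dumps(TABLE, indent=2, sort_keys=True)
    print(text)
    print(resolve("hypothetical_old_name", json.loads(text)))
```

A tool would need nothing beyond a JSON parser to use a file like this, which is the main attraction.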

I'm sure I'm hopelessly naive, but it seems like we should host the standard name information in a form that other tools could build on without learning how to interact with the NERC vocabulary server. Is it wise to tie CF to the support of a single institution (which has been a critically important and reliable partner for so long now), which might at some time be unable to continue with CF. In that event, how hard would it be for someone to take over responsibility for the NERC-hosted vocabulary?


JonathanGregory · 2024-09-20T16:38:16Z

JonathanGregory
Sep 20, 2024
Maintainer

Dear standard names hackathon group

While I recognise the advantages of the NERC vocabulary server, I think we should change things more gradually than implied by this summary. When we discussed this possibility in the CF committee, my understanding was that the proposal was initially only to "designate" the NERC vocab server as the primary repository. The primary one is the one you'd trust in the case of an inconsistency, but mostly there would be no inconsistency. Designating NVS as primary would not change our actual practice significantly, I believe, since the standard names team prepare the updates in the CF editor and publish them to the NERC vocab server and the CF website at the same time.

In the earlier discussions, I believe we understood that we would keep the HTML and XML versions on the CF website, just as they are now. I don't see why we shouldn't do that, even if we don't regard them as primary. If we change the process of preparing updates to use NVS facilities instead, could we not continue to publish to our XML and HTML as well? There isn't a need to change Appendix B if we keep our XML files. Our HTML file is very useful for searching, especially since Abel @bzah made large and widely applauded improvements in its capabilities within the last year. In #296, Andrew @DocOtak and Antonio @cofinoa have between them devised ways in which we could keep all versions of HTML and XML accessible online by generating them on the fly from the GitHub repo. That would solve the problem with space on GitHub Pages.

The first advantage listed is the unique permanent identifier for each standard name. I agree that is valuable. When Alison and I discussed this with Gwen, some months ago, we agreed that CF would like unique URIs containing the standard name. My memory is that this had formerly been offered, although it isn't now. For instance, as an alternative to https://vocab.nerc.ac.uk/collection/P07/current/CFSN0023, we should be able to use https://vocab.nerc.ac.uk/collection/P07/current/air_temperature. For some purposes that would be more convenient, and it's definitely more CF-like.

If the standard names are stored in the NVS as well as our existing files, we can explore the advantages of NVS without losing any of our current functionality. I'm sure there are advantages, such as you outline. In time we may find that NVS can take things over from us in a satisfactory way; then we can phase out our own. Despite my reservations, I appreciate your exploration of this subject! Thanks.

Best wishes

Jonathan

