align teidata.version with Semantic Versioning Specification, closes #1993 #1996

Open, wants to merge 10 commits into base: dev

peterstadler
Member

This PR updates the regex used for defining the value of teidata.version to match the Semantic Versioning Specification. In fact, it's a slight modification of the provided regex at https://semver.org to work around the special flavor of XML Schema Regular Expressions (see https://www.regular-expressions.info/xml.html).

Additionally, the English prose was updated and some 'wrong' @version attributes were removed from the Guidelines source.
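
As a rough illustration of what the new pattern accepts, the following Python sketch applies the regex from this PR with `re.fullmatch`. XML Schema regular expressions always match the whole value, so `fullmatch` is used here to approximate that implicit anchoring. The names `SEMVER_XSD` and `is_semver` and the sample values are assumptions made for this example, and Python's regex engine is only an approximation of the XSD flavor.

```python
import re

# Pattern copied from the <dataRef restriction="..."> in this PR.
# It uses no anchors and no named or non-capturing groups, and escapes
# the hyphen inside character classes.
SEMVER_XSD = (
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(-((0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*)"
    r"(\.(0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*))*))?"
    r"(\+([\-0-9a-zA-Z]+(\.[\-0-9a-zA-Z]+)*))?"
)


def is_semver(value: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(SEMVER_XSD, value) is not None


if __name__ == "__main__":
    samples = ["4.1.0", "1.0.0-alpha.1", "1.0.0+20130313144700",
               "4.1", "01.2.3", "1.2.3-01"]
    for sample in samples:
        print(sample, is_semver(sample))
```

In this sketch the first three samples are accepted and the last three are rejected: a missing patch number, a leading zero in the major number, and a leading zero in a numeric pre-release identifier.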

to comply with the new semver format
since those are not supported by XML Schema Regular Expressions, see https://www.regular-expressions.info/xml.html
the current version (e.g. '4.1.0') is inserted in the `<editionStmt>` dynamically, while this hard coded value was referring to the 'P' number of the Guidelines.
Member

@sydb sydb left a comment

See individual comments.

@@ -9,21 +9,20 @@ See the file COPYING.txt for details
ident="teidata.version">
<desc versionDate="2013-11-20"
xml:lang="en">defines the range of attribute values which may be used to
-specify a TEI or Unicode version number.</desc>
+specify a version number following the Semantic Versioning Specification.</desc>
Member

I really don’t want to say we follow this spec. It is far too restrictive for our use, I think.

  • Our semantics for MAJOR, MINOR, and PATCH are different than theirs
  • We want build metadata considered when determining precedence
  • SemVer is about the API; we are about schemas and documentation

Thus I would much prefer we either not mention the Semantic Versioning Specification, or (better, IMHO) mention it as the inspiration for the system we use, and cite it properly in the BIB. Something like “using the syntax of the Semantic Versioning system” or “based on the Semantic Versioning system”, with a link to the BIB (which gives a pointer to the 2.0.0 spec).

<desc xml:lang="fr"
versionDate="2007-06-12">définit la gamme des valeurs d'attribut
exprimant un numéro de version TEI.</desc>
<content>
-<dataRef name="token" restriction="[\d]+(\.[\d]+){0,2}"/>
+<dataRef name="token" restriction="(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-((0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*)(\.(0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*))*))?(\+([\-0-9a-zA-Z]+(\.[\-0-9a-zA-Z]+)*))?"/>
Member

I have not hand-parsed the regular expression to check that it makes sense. I did test it against all 71 test cases, and it correctly flags the first 31 as valid and the remaining 40 as invalid. So this looks like the correct W3C regular expression for SemVer.
So hurrah! to @peterstadler for a job well done. I think he should submit this to the SemVer folks, but hope he’ll coordinate with me, as I may have an entirely different regexp for them, as well.
That said, I think we should be much more restrictive for @version of <TEI> and <teiCorpus>. Something more akin to `4\.1(\.(0|[1-9]\d*))?(-(alpha|beta))?(\+([\-0-9a-zA-Z]+(\.[\-0-9a-zA-Z]+)*))?`.
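
To make the effect of the more restrictive pattern suggested in this comment concrete, here is a small Python sketch. It uses `re.fullmatch` to approximate the whole-value matching of XML Schema regular expressions. The names `RESTRICTED` and `is_restricted_version` and the sample values are assumptions made for this example only.

```python
import re

# Pattern as suggested in the review comment: fixed "4.1" prefix,
# optional patch number, optional "-alpha"/"-beta", optional build metadata.
RESTRICTED = (
    r"4\.1(\.(0|[1-9]\d*))?"
    r"(-(alpha|beta))?"
    r"(\+([\-0-9a-zA-Z]+(\.[\-0-9a-zA-Z]+)*))?"
)


def is_restricted_version(value: str) -> bool:
    """Return True if the whole string matches the restrictive pattern."""
    return re.fullmatch(RESTRICTED, value) is not None


if __name__ == "__main__":
    for sample in ["4.1", "4.1.0", "4.1.12-beta", "4.1.0+build.5",
                   "4.2.0", "4.1.0-rc1", "4.1.01"]:
        print(sample, is_restricted_version(sample))
```

Here the first four samples are accepted, while `4.2.0` (different minor number), `4.1.0-rc1` (pre-release label other than alpha or beta), and `4.1.01` (leading zero in the patch number) are rejected.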

-number, for minor and sub-minor version numbers, may also be
-supplied.
+the Semantic Versioning Specification (<ptr target="https://semver.org"/>). A version number
+contains digits only and consists at least of three parts separated by a dot.
Member

  1. “consists at least of three parts” should probably be “consists of at least three parts”.
  2. But in truth (as alluded to above), I am not sure why a user should be required to specify the patch number in the @version of <TEI> or <teiCorpus>.
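
The practical consequence of point 2 can be sketched by comparing the old and new `restriction` patterns from the diff. This is an illustrative Python approximation using `re.fullmatch`; the names `OLD_PATTERN`, `NEW_PATTERN`, and `accepts` are assumptions for this example.

```python
import re

# Old restriction from the diff: one to three dot-separated numbers.
OLD_PATTERN = r"[\d]+(\.[\d]+){0,2}"

# New restriction from the diff: full SemVer syntax, patch number required.
NEW_PATTERN = (
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(-((0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*)"
    r"(\.(0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*))*))?"
    r"(\+([\-0-9a-zA-Z]+(\.[\-0-9a-zA-Z]+)*))?"
)


def accepts(pattern: str, value: str) -> bool:
    """Whole-string match, approximating XML Schema pattern semantics."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for sample in ["4", "4.1", "4.1.0", "4.1.0-beta"]:
        print(sample, accepts(OLD_PATTERN, sample), accepts(NEW_PATTERN, sample))
```

Under these two patterns, values such as `4` and `4.1` that were valid before would become invalid, while `4.1.0-beta` would become valid; only `4.1.0` is accepted by both.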

@duncdrum
Contributor

I have to say the question of whether TEI (Guidelines and Stylesheets) is an API according to the SemVer definition is not straightforward to me. Actually, I'm leaning more towards saying yes.

IIRC anything that is build metadata should be sorted alphabetically when determining version precedence, no?

@sydb
Member

sydb commented May 11, 2020

Maybe it’s me … I am certainly an old dog in a new trick game. But I cannot see TEI as an API. (In fact, I just yesterday got a virtual earful of how it is problematic that there is no Web API that allows one to get snippets of the source of the Guidelines.) An API “defines the kinds of calls or requests that can be made, how to make them, the data formats that should be used, the conventions to follow, etc.” [Wikipedia]. What call or request can a software system make to TEI?

As for build metadata, I wish that were so, @duncdrum, but what the SemVer spec says is “Build metadata does not figure into precedence”.

But I think we want build metadata included in our little universe, although we may want to go ahead with this ticket first, and then add it later.
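
The precedence point can be illustrated with a minimal Python sketch of the SemVer 2.0.0 ordering rules, in which build metadata is ignored. The function names and the `include_build` flag are hypothetical: the flag shows one possible way (plain lexical comparison as a tie-breaker) that build metadata could be taken into account, which is an assumption for illustration and not something decided in this thread. Inputs are assumed to be syntactically valid SemVer strings.

```python
def parse(version: str):
    """Split 'MAJOR.MINOR.PATCH[-pre][+build]' into comparable parts."""
    core_and_pre, _, build = version.partition("+")
    core, _, pre = core_and_pre.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    pre_ids = pre.split(".") if pre else []
    return (major, minor, patch), pre_ids, build


def _cmp_identifier(a: str, b: str) -> int:
    """Compare two pre-release identifiers per SemVer 2.0.0."""
    a_num, b_num = a.isdigit(), b.isdigit()
    if a_num and b_num:
        return (int(a) > int(b)) - (int(a) < int(b))
    if a_num != b_num:
        return -1 if a_num else 1  # numeric identifiers have lower precedence
    return (a > b) - (a < b)       # otherwise compare in ASCII order


def compare(v1: str, v2: str, include_build: bool = False) -> int:
    """Return -1, 0, or 1. include_build is a hypothetical extension."""
    core1, pre1, build1 = parse(v1)
    core2, pre2, build2 = parse(v2)
    if core1 != core2:
        return -1 if core1 < core2 else 1
    if bool(pre1) != bool(pre2):
        return -1 if pre1 else 1   # a pre-release precedes the normal version
    for a, b in zip(pre1, pre2):
        result = _cmp_identifier(a, b)
        if result:
            return result
    if len(pre1) != len(pre2):
        return -1 if len(pre1) < len(pre2) else 1
    if include_build:              # assumption: lexical tie-break on build
        return (build1 > build2) - (build1 < build2)
    return 0


if __name__ == "__main__":
    print(compare("1.0.0+a", "1.0.0+b"))                      # 0
    print(compare("1.0.0+a", "1.0.0+b", include_build=True))  # -1
```

With the default settings `1.0.0+a` and `1.0.0+b` compare as equal, which is the SemVer behavior quoted above; the tie-break only applies when the hypothetical flag is set.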

@sydb sydb added this to the Guidelines 4.2.0 milestone Jan 19, 2021
@raffazizzi
Contributor

VF2F subgroup thinks that it is appropriate to say that a TEI version number follows the Semantic Versioning Specification.
Before we merge, however, could @peterstadler address @sydb's other comments?

@sydb
Member

sydb commented May 7, 2021

Council SVF2F NA group thinks:

  • Still pending @peterstadler to make mods requested by me (per @raffazizzi)
  • Still need to decide whether or not the value of @version should be required to match the (beginning of) the version of the schema
  • Still need prose to explain what it means to be conforming to SemVer

@sydb
Member

sydb commented Apr 1, 2022

Council SVF2F NA group still thinks the 3 items listed on 07 May should be addressed. For 2nd bullet point we suggest that there be a soft requirement. I.e., prose that says “they should match”, not an actual schema test that will fail if they don’t.
@peterstadler should consider himself poked about the other two bullet points.
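
One way to picture the "soft requirement" is a check that only warns when the document's @version does not match the beginning of the schema's version. The Python sketch below is purely hypothetical: the function name, the component-wise prefix comparison, and the use of a warning are assumptions for illustration, not a description of any existing TEI tooling.

```python
import warnings


def check_version_against_schema(doc_version: str, schema_version: str) -> bool:
    """Return True if doc_version matches the start of schema_version.

    Comparison is done per dot-separated component, so "4.7" matches
    "4.7.0" but "4.70" does not. A mismatch only produces a warning.
    """
    doc_parts = doc_version.split(".")
    schema_parts = schema_version.split(".")
    matches = schema_parts[:len(doc_parts)] == doc_parts
    if not matches:
        warnings.warn(
            f"@version {doc_version!r} does not match schema version "
            f"{schema_version!r}; they should match"
        )
    return matches


if __name__ == "__main__":
    print(check_version_against_schema("4.7", "4.7.0"))    # True
    print(check_version_against_schema("4.6.0", "4.7.0"))  # False, with a warning
```

Because the mismatch is reported as a warning and not raised as an error, validation in this sketch would still succeed, which corresponds to prose saying "they should match" without a failing schema test.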

@ebeshero
Member

ebeshero commented Apr 8, 2022

Council meeting 2022-04-08: Add language to indicate that for the TEI Guidelines we’re inspired by the semantic versioning conventions but we’re applying our own TEI conventions to it (not following the custom for identifying “major” / “minor”.)

@sydb
Member

sydb commented Apr 8, 2022

Note: commit 7131846 did nothing except merge in the 664 commits that had been made to 'dev' branch in the interim.

@ebeshero ebeshero modified the milestones: Guidelines 4.6.0, Guidelines 4.7.0 Apr 3, 2023
@raffazizzi raffazizzi modified the milestones: Guidelines 4.7.0, Guidelines 4.8.0 Nov 9, 2023
@sabineseifert
Contributor

Council at VF2F 16 March 2024:

Discussion

  • Main problem: versioning system (major-dot-minor-dot-patch followed by optional ‘a’ or ‘b’, or supposedly ‘α’, ‘Β’, “alpha”, or “beta”) is not permitted by the datatype used for TEI/@version and teiCorpus/@version
  • datatypes for other version numbers need rationalization as well
  • teidata.version should allow for multiple variations
  • Should the TEI provide a datatype that is not used by any actual element or attribute defined by the Guidelines themselves?

Council decision: Development of a comprehensive system for datatypes of version numbers

  • An uber-datatype (e.g. teidata.version or teidata.versionIndicator or teidata.versionNumber) should be defined as an alternation of various specific datatypes which match particular kinds of version numbering systems. E.g.:
    • teidata.semVer would match the major-dot-minor-dot-patch-optional-dash-details kinda thing (TEI/@version, teiCorpus/@version)
    • teidata.calVer would match date-like version numbers (e.g.: 20240315)
    • teidata.unicodeVer would match theirs (att.gaijiProp)
    • teidata.version.versionNumber (e.g. 4.7.0)
  • teidata.version would be defined as something like
    <dataSpec ident="teidata.version">
      <content>
        <alternate minOccurs="1" maxOccurs="1">
          <dataRef key="teidata.version.semVer"/>
          <!-- follows semantic versioning syntax, see https://semver.org/ -->
          <dataRef key="teidata.version.calVer"/>
          <!-- follows calendar versioning syntax, see https://calver.org/ -->
          <dataRef key="teidata.version.unicodeVer"/>
          <!-- an enumerated list of Unicode versions (currently in att.gaijiProp) -->
          <dataRef key="teidata.version.versionNumber"/>
          <!-- for backwards compatibility -->
        </alternate>
      </content>
    </dataSpec>
  • original issue #1993: value of @version on TEI/teiCorpus will be changed to teidata.version.semVer
  • SB: a TEI schema should permit only 1 possible value on TEI/@version or teiCorpus/@version, namely that schema’s version number. That schema number might be derived by appending the schemaSpec/@ident to the version number of the schema pointed to by @source. [This would be a new ticket.]
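
The alternation idea from the Council decision can be sketched in Python as "a value is valid if at least one sub-datatype accepts it". Everything here is an illustrative assumption: the SemVer pattern is taken from this PR and the legacy pattern from the old `restriction`, but the calendar pattern (eight digits, based only on the `20240315` example) is made up for this sketch, and the enumerated Unicode datatype is left out because its value list is not given in this thread.

```python
import re

# Sub-datatype name -> pattern. Only the first and last come from this PR's
# diff; the calVer pattern is a hypothetical stand-in.
SUBTYPES = {
    "teidata.version.semVer": (
        r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
        r"(-((0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*)"
        r"(\.(0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*))*))?"
        r"(\+([\-0-9a-zA-Z]+(\.[\-0-9a-zA-Z]+)*))?"
    ),
    "teidata.version.calVer": r"\d{8}",
    "teidata.version.versionNumber": r"[\d]+(\.[\d]+){0,2}",
}


def matching_subtypes(value: str) -> list:
    """Return the names of all sub-datatypes whose pattern matches fully."""
    return [name for name, pattern in SUBTYPES.items()
            if re.fullmatch(pattern, value) is not None]


def is_version(value: str) -> bool:
    """The alternation: valid if any sub-datatype accepts the value."""
    return bool(matching_subtypes(value))


if __name__ == "__main__":
    for sample in ["4.7.0", "20240315", "4.7", "1.0.0-beta", "v1"]:
        print(sample, matching_subtypes(sample))
```

Under these assumed patterns, `4.7.0` and `20240315` each match two sub-datatypes, because the backwards-compatible pattern overlaps with the others. That does not affect validity in an alternation, but it does mean the matching sub-datatype alone would not identify which versioning system a value uses.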

7 participants