discrepancy of P5 version number vs definition of version attribute of TEI #1993
Comments
A fourth and perhaps less disruptive solution would be to leave the version unsuffixed, and use @status to indicate whether this is an alpha/beta/whatever version. This would just entail making TEI a member of the att.docStatus class. |
wouldn't it be nice if P.S.: using |
@lb42: @duncdrum:
|
If As for Unicode versions, I'm pretty sure those will always be valid sem-ver as well. |
I believe I am correctly representing the entire Council’s point of view to say that
below |
Had to rewrite it a little more to avoid validation errors (in oXygen) for (0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-((0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*)(\.(0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*))*))?(\+([\-0-9a-zA-Z]+(\.[\-0-9a-zA-Z]+)*))? |
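For anyone who wants to try the expression outside oXygen, here is a minimal Python sketch. It is not part of the ticket: the helper name `is_semver` and the sample values are made up for illustration, and Python's `re` engine is not a W3C XML Schema regex processor, so results may differ in corner cases. XSD patterns are implicitly anchored, which `re.fullmatch` imitates.

```python
import re

# Pattern copied from the comment above, split over several lines for readability.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(-((0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*)"
    r"(\.(0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*))*))?"
    r"(\+([\-0-9a-zA-Z]+(\.[\-0-9a-zA-Z]+)*))?"
)


def is_semver(value: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    return SEMVER.fullmatch(value) is not None


# Illustrative samples only; this is not the regex101 test suite.
for sample in ["4.1.0", "4.1.0-alpha", "4.3.1+Stylesheets-7.53.1", "4.1.0a", "5.0"]:
    print(sample, is_semver(sample))
```

Under this pattern a suffixed value such as "4.1.0a" is rejected, while "4.1.0-alpha" is accepted.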
I did work on this in branch |
@peterstadler from the looks of it you haven't updated the Also the French guidelines have 5.0 whereas the English ones have 5.0.0 |
I had not planned to do this until late tonight or tomorrow; will look at regexes then, but would not be surprised at failures until then. |
Oh dear. But (as @peterstadler points out), oXygen reports that regular expression is invalid. It is wrong, IMHO. This is just jing complaining of a character range (“charRange”, production 17, here used as part of a positive character group (“posCharGrp”, production 14)) which is only one character, a U+002D. I have always considered this a bug in jing (but have also been worried that I might be wrong, because rnv reports similar). If you read the spec, the productions imply that U+002D is allowed as a character used in a character range (in this case, a character range of only 1 character represented by itself). There is a constraint expressed in the prose, though (3rd bullet point):
But in all cases in the regex I wrote, the U+002D is at the end of a positive character group. So a large part of me doesn’t care. After all, other W3C regex processors do not have a problem with this. While this attitude may get me off the hook morally, it does not solve our problem, because we use jing to validate our documents dozens of times a day. So it seems to me we do have to fix this.
That said, in the meantime @duncdrum thinks there is an error in the regular expression I posted, and posted an alternative; and @peterstadler (who I belatedly realize is actually trying to implement this ticket) has come up with a version that is valid per jing. So I went about testing them.
I used the set of versions used on the regex101 page that is pointed to by the sem-ver page (and I think @duncdrum used, too) as a test suite. I tried validating each of the 71 tests against each of the 3 regular expressions. I used Schematron because I could not use jing or rnv to validate RelaxNG. All 3 of our regular expressions correctly say test cases 1–31 are valid. Then it gets weird.
So … so far, of the 3 regular expressions we have, @peterstadler’s is the clear winner. But it still says 19 of the supposedly invalid test cases are valid. I have to admit, when I created my version above, I took the regex101 expression at face value, and did not test it. (I kinda doubt the other gentlemen did, either.) But it looks like either the test suite or the expression has some problems. I am starting to think I should re-write this regex myself from scratch by reading the sem-ver spec, and ignoring other examples of regular expressions. But that will take several hours. And someone (maybe me) should look very carefully at the test suite, and see if any of those it claims are invalid are actually valid per the spec. Here are my test files: |
@sydb @peterstadler @duncdrum In a vain(?) effort to try to simplify the hideousness of these regular expressions, I tried my hand at it with Syd's handy test file. This isn't perfect, but it's a little shorter anyway:
Should we be trying to simplify this versioning system anyway? |
Despite the regex issue, I feel a little bit uneasy (now that I looked at it more closely) about changing So, I think what I'd like to see is that we keep backwards compatibility for |
I'm less hesitant about backwards compatibility than @peterstadler. Things will break only if ppl specify a new I'm more hesitant about creating our own regex though. If we continue down that road, we should add the regex tests to the TEI test suite, to make sure that we stick to the sem-ver specs. Good or bad. |
You have a long way to go to convince me on this, @peterstadler. Besides @duncdrum’s valid (pardon the pun) argument, the fact that I think there is a very strong argument to be made that the value of That is, of course, what I’m trying to do with tei_customization. And it’s not as simple as that, because I (at least at the moment) think we should allow “-alpha”, “+MoEML-6.2”, and perhaps most importantly “+Stylesheets-7.53.1” appendices. But that’s not quite right, because SemVer gives “4.3.1+Stylesheets-7.53.1” the same precedence as “4.3.1+Stylesheets-7.53.6”, “4.3.1+Stylesheets-7.58.0”, or “4.3.1+platypus”, and I think we would want the later Stylesheets version to have higher precedence. |
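The precedence point can be illustrated with a small Python sketch. It is only an approximation of the SemVer precedence rules as I read them (build metadata ignored, a release ranked above its pre-releases, numeric pre-release identifiers ranked below alphanumeric ones); the function name `precedence_key` is hypothetical, and the input is assumed to be a valid SemVer string.

```python
def precedence_key(version: str):
    """Hypothetical helper: sort key that mirrors SemVer precedence."""
    # Everything after '+' is build metadata and plays no part in precedence.
    without_build = version.split("+", 1)[0]
    core, _, pre = without_build.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not pre:
        # A plain release outranks any pre-release of the same core version.
        return (major, minor, patch, (1,))
    identifiers = tuple(
        # Numeric identifiers compare as numbers and rank below alphanumeric ones.
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in pre.split(".")
    )
    return (major, minor, patch, (0, identifiers))


a = precedence_key("4.3.1+Stylesheets-7.53.1")
b = precedence_key("4.3.1+Stylesheets-7.58.0")
print(a == b)  # True: the two differ only in build metadata
print(precedence_key("4.1.0-alpha") < precedence_key("4.1.0"))  # True
```

If the later Stylesheets version should rank higher, that ordering would have to be defined outside SemVer, for example by comparing the part after "+" separately.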
Just created the PR #1996 because after some more testing (and reverted changes) I think it's good to go. Thanks @sydb for the test files and indeed the results are strange. Yet, I believe this to be some Schematron issue because the following RelaxNG schema does validate your test file as intended (and works the same way with @sydb's provided rnc):
<grammar ns="" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="tests">
      <oneOrMore>
        <element name="test">
          <attribute name="n">
            <data type="integer"/>
          </attribute>
          <element name="sydb">
            <text/>
          </element>
          <element name="duncdrum">
            <text/>
          </element>
          <element name="peterstadler">
            <data type="token">
              <param name="pattern">(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-((0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*)(\.(0|[1-9]\d*|\d*[\-a-zA-Z][\-0-9a-zA-Z]*))*))?(\+([\-0-9a-zA-Z]+(\.[\-0-9a-zA-Z]+)*))?</param>
            </data>
          </element>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>
(NB: I couldn't paste in your regexes because they were flagged as invalid so your tests will always return ok.) So, since we're not using Schematron for validating the |
It seems I've been too quick with the PR #1996. To me it was clear that we'd follow the SemVer specification and that we'd want to update
Of course, we'd need to flesh out what MAJOR, MINOR, and PATCH changes are in the TEI Guidelines world but IMHO that's a better exercise than inventing one's own version number scheme. |
I asked on TEI-L on 2020-07-16 and on 2020-10-12 if removing the version attribute on |
Literally remove |
Yes, literally remove Keep in mind there are already two other mechanisms (xml-model processing instruction and the |
@duncdrum Just out of interest, what do you use |
To specify the version of the Guidelines that a document is using. Usually before and after a contribution is particularly important. |
VF2F meeting has discussed this ticket and the related PR #1996 and the decision is to change the datatype specification of A preliminary proposal for
|
The version number of the TEI Guidelines is (at least nowadays) stored in ./p5.xml at /TEI/teiHeader/fileDesc/editionStmt/edition/ref[2]. The value therein is copied from ./VERSION.
The current content of that file (in the dev branch) is “4.1.0a”. Makes sense. The current release is 4.0.0, and we are anticipating the release of 4.1.0 in August or thereabouts by referring to the development branch copy as an alpha. IIRC, we plan to switch that to beta at the time when we go into “refrigeration” mode, and thus will use “4.1.0b” for a couple of weeks; and then the new release will be “4.1.0”. All well and good.
BUT the definition of the @version attribute of <TEI> is teidata.version, which in turn is defined as a token that matches "[\d]+(.[\d]+){0,2}", and gets that definition from the Unicode standard. Thus the ‘a’ or ‘b’ at the end of our version number (which I submit could just as well be an ‘α’ or ‘β’) for P5 cannot be used on the @version attribute of <TEI> (or <teiCorpus>).
This strikes me as bad on general principles. But in specific, it causes a problem for building tei_customization which wants to use the version number of the p5.xml that was used as its source as both the @version of its own <TEI> and the @version of a TEI document that conforms to it. (And, I daresay, all Exemplars should do that. But that’s a different ticket.)
Possible solutions:
@version of tei_customization. I don’t like this because that means tei_customization is lying about its own pedigree.
@version of <TEI>, or @version of att.versioned to allow for an optional ultimate ‘a’, ‘b’, ‘α’, or ‘β’. I like this, but it does involve some work. First, we would have to decide on whether or not to keep @version of <TEI> and <unicodeName> the same (allowing a trailing letter where none should be allowed in the case of <unicodeName>), or have <unicodeName> inherit then modify the attribute so it is further restricted. Second, we would no longer be able to rely solely on the Unicode definition of a version number.
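To make the mismatch concrete, here is a small Python sketch (not part of the original ticket). `CURRENT` is the teidata.version pattern exactly as quoted in this issue; `RELAXED` is a hypothetical variant, assumed here only to illustrate the "optional ultimate letter" idea, and is not an agreed TEI definition. Python's `re` is not a W3C XML Schema regex engine, so `re.fullmatch` is used to approximate the implicit anchoring of XSD patterns.

```python
import re

# Pattern for teidata.version as quoted in the issue text.
CURRENT = re.compile(r"[\d]+(.[\d]+){0,2}")

# Hypothetical relaxation: the same pattern plus one optional trailing letter.
RELAXED = re.compile(r"[\d]+(.[\d]+){0,2}[abαβ]?")


def accepted(pattern: re.Pattern, value: str) -> bool:
    """Illustrative helper: True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None


for value in ["4.0.0", "4.1.0a", "4.1.0b", "4.1.0β"]:
    print(value, accepted(CURRENT, value), accepted(RELAXED, value))
```

With the quoted pattern only "4.0.0" is accepted; the relaxed sketch also accepts the suffixed development versions.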