Render integer values in <sup> simply #590

ietf-svn-bot · 2021-02-02T23:50:30Z

owner:jennifer@painless-security.com resolution_fixed type_enhancement | by martin.thomson@gmail.com

We have a number of places in QUIC where we use 2^15 and similar. Using 2<sup>15</sup> makes the HTML rendering much nicer, but the text then renders as 2^(15).

A small tweak might improve rendering with no real loss of fidelity. Patch inbound.

Issue migrated from trac:590 at 2022-02-08 07:12:21 +0000


ietf-svn-bot · 2021-02-02T23:57:35Z

@rjsparks@nostrum.com commented

Thanks for the patch.
I think we should expand on the regex to match any single token, not just an integer.
See also #574.

ietf-svn-bot · 2021-02-03T00:07:06Z

@martin.thomson@gmail.com uploaded file 0001-Render-integer-superscripts-simply.patch (1.4 KiB)

Render integer subscripts simply

ietf-svn-bot · 2021-02-03T00:09:26Z

@martin.thomson@gmail.com commented

Ahh, I didn't see that one.

I can easily change the pattern matching here, but it's not clear what the rules would be for deciding. Are you thinking ^(?:-?\d+|\w+)$? That would capture integers or single "words", according to regex definitions.
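For illustration, here is a minimal sketch of what that candidate pattern would and would not accept. The sample strings are chosen for this example and are not taken from the patch.

```python
import re

# Candidate from the comment above: an optionally negative integer,
# or a single regex "word" (letters, digits, underscore).
PATTERN = re.compile(r'^(?:-?\d+|\w+)$')

# Illustrative inputs only; these are not test cases from the patch.
samples = ['15', '-15', 'max', 'max_0', '2n', '+15', '3.0', 'x+y', '']

# True means the expression would be rendered without parentheses.
results = {s: PATTERN.match(s) is not None for s in samples}

for s, ok in results.items():
    print(f'{s!r:8} -> {"no parentheses" if ok else "keep parentheses"}')
```

Note that `max_0` is accepted because underscore is a `\w` character, while `+15` and `3.0` are not.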

ietf-svn-bot · 2021-02-03T10:51:47Z

@lars@eggert.org commented

Could we make the same change to <sub>? It would improve rendering of the subscripts of the math variables in https://ntap.github.io/rfc8312bis/draft-eggert-tcpm-rfc8312bis.txt

ietf-svn-bot · 2021-02-08T08:39:34Z

@martin.thomson@gmail.com uploaded file 0001-Render-integer-super-sub-scripts-simply.patch (2.2 KiB)

Simple rendering for super- and sub-scripts

ietf-svn-bot · 2021-02-09T19:33:36Z

@jennifer@painless-security.com changed status from new to assigned

ietf-svn-bot · 2021-02-09T19:33:36Z

@jennifer@painless-security.com changed owner from `` to jennifer@painless-security.com

ietf-svn-bot · 2021-02-09T19:38:25Z

@jennifer@painless-security.com commented

One exceptional case that jumps out at me is the case where someone explicitly wants parentheses in the HTML output. E.g., 2<sup>(x + y)</sup>. This will render as 2^((x+y)). This would also affect any other brackets.

As this is a problem with the current text renderer and solving this as a general problem is tricky, perhaps that's best left as a separate issue. (Or put aside entirely.)

ietf-svn-bot · 2021-02-09T20:28:55Z

@jennifer@painless-security.com commented

How would you feel if I simplified the pattern to ^\w+$? This matches integers and words, but keeps the parentheses for signed numbers or decimals. I think punctuation in the super/subscripted expression can be confusing. Between, e.g., 2^(3.0) and 2^3.0, I find the former to be clearer. Also, 2^(-1) vs 2^-1.

ietf-svn-bot · 2021-02-09T20:28:55Z

@jennifer@painless-security.com changed _comment0 which not transferred by tractive

ietf-svn-bot · 2021-02-09T21:02:31Z

@jennifer@painless-security.com commented

Very sorry for the spam, but I just ran the tests with patched code and I'm not enamored of the results:

-   This is regular text.  This is s_(ubscript).  This is s^(uperscript).
+   This is regular text.  This is s_ubscript.  This is s^uperscript.

and

-    | The _quick_ ^(brown) _(fox) *jumps* over    | Paragraph 1       |
+    | The _quick_ ^brown _fox *jumps* over the    | Paragraph 1       |

The examples are contrived, so in actual use things might turn out clearer. Looking at the subscript examples lars@eggert.org pointed to,

_W_(max)_ will become _W_max_
W_(cubic)(_t_ + _RTT_) will become W_cubic (_t_ + _RTT_)

I wonder if it might be preferable to keep the parentheses except for integers. I'm happy to do it either way, just wanted to point this out to be sure the effect is what's desired.

ietf-svn-bot · 2021-02-09T21:22:57Z

@martin.thomson@gmail.com commented

I find that the W_cubic example is better, but it isn't clear why the values for t and RTT are underlined in that way. Mixing subscripts and other underscores in that way ends up looking odd, but that might be something Lars can work through. Changing the W_max example is probably something Lars can do though.

This works very nicely for the numbers in QUIC. Much better than with the parentheses.

I do think that maybe we could remove '_' from the set of characters that was otherwise in \w to avoid creating confusion in rendering, but otherwise, I think that this is good. I think that authors will simply need to be aware of how this renders in text and adjust. Just like they probably shouldn't mix a literal '^' and <sup>.

ietf-svn-bot · 2021-02-10T07:35:14Z

@lars@eggert.org commented

It's _W_(max)_ because the markdown source is *W<sub>max</sub>*, i.e., the markdown formats the variable name in the body of the text in italics, to match the styling of the SVG math renderer. Ditto for _t_ and _RTT_: the math renderer uses italics for variables, and I am trying to reproduce that.

I'd prefer that italics in text form didn't get rendered with underscores and instead simply became plain text, but that needs a separate issue filed.

ietf-svn-bot · 2021-02-10T07:35:14Z

@lars@eggert.org changed _comment0 which not transferred by tractive

ietf-svn-bot · 2021-02-10T07:35:14Z

@lars@eggert.org changed _comment1 which not transferred by tractive

ietf-svn-bot · 2021-02-10T14:09:10Z

@jennifer@painless-security.com commented

This sounds good - makes sense that people will need to be careful, since there's only so much that can be done to typeset things unambiguously. I agree that keeping parentheses if the expression includes an underscore is a good idea.

I think that the pattern ^[+-]?\d*\.?[a-zA-Z0-9]*$ captures what we've discussed.

ietf-svn-bot · 2021-02-10T16:10:10Z

@jennifer@painless-security.com changed status from assigned to closed

ietf-svn-bot · 2021-02-10T16:10:10Z

@jennifer@painless-security.com changed resolution from `` to fixed

ietf-svn-bot · 2021-02-10T16:10:10Z

@jennifer@painless-security.com commented

Fixed in 65f2676:

Simplify text rendering of super/subscripts. Based on patch submitted by martin.thomson@gmail.com. Fixes #590. Commit ready for merge.

ietf-svn-bot · 2021-02-10T22:44:19Z

@martin.thomson@gmail.com commented

Hi Jennifer,

You have:

        return re.match(r'^[+-]?\d*\.?[a-zA-Z0-9]*$', expr) is not None

I don't think that is good as it allows for some weird patterns. Like '^+.word', '^+', '^23.stuff', or the empty string: '^'. I would have thought that it would be better to keep numbers and words distinct and require at least one character:

        return re.match(r'^(?:[+-]?\d+(?:\.\d+)?|[a-zA-Z0-9]+)$', expr) is not None

This doesn't allow for an empty digit string in any position for a number, nor does it allow for the string overall to be empty as your pattern did.

Not using \w means that this loses the ability to have a unicode character in super-/sub-script, which is probably worth noting.
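To make the comparison concrete, the sketch below runs the committed pattern, the stricter alternative, and the earlier `^\w+$` pattern over the inputs mentioned in this comment. The names `LOOSE`, `STRICT`, `WORD`, and `accepts` are labels invented for this example.

```python
import re

# Pattern as committed in 65f2676 (quoted above).
LOOSE = re.compile(r'^[+-]?\d*\.?[a-zA-Z0-9]*$')
# Stricter alternative proposed above: a number or an ASCII word, never empty.
STRICT = re.compile(r'^(?:[+-]?\d+(?:\.\d+)?|[a-zA-Z0-9]+)$')
# Earlier suggestion; for str patterns, \w is Unicode-aware by default in Python 3.
WORD = re.compile(r'^\w+$')


def accepts(pattern, expr):
    """Return True if the whole expression matches the pattern."""
    return pattern.match(expr) is not None


# '\u03b1' is the Greek small letter alpha.
samples = ['15', '-3.0', 'max', '2n', '-2n', '+.word', '+', '23.stuff', '', '\u03b1']

# Each value is (loose, strict, word).
table = {s: (accepts(LOOSE, s), accepts(STRICT, s), accepts(WORD, s))
         for s in samples}

for s, row in table.items():
    print(f'{s!r:12} loose={row[0]!s:5} strict={row[1]!s:5} word={row[2]!s:5}')
```

The loose pattern accepts `+.word`, `+`, `23.stuff`, and the empty string, while the strict one rejects them. The strict pattern also rejects `-2n`, and neither ASCII-only pattern accepts the Greek letter.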

ietf-svn-bot · 2021-02-11T04:56:41Z

@jennifer@painless-security.com commented

Yes, the empty string should be rejected.

The other examples are wonky, but seem contrived. If someone is using notation like that, adding parentheses to the mix is as likely to confuse the meaning as to clarify it. The reason I accept those is because it also accepts things like 2^-2n without parentheses. That seems to me less in need of parentheses than, e.g., 2^-3.14159.

So I think we should perhaps take a step back, decide what we would like to accept as a token first, then implement to that.

A few cases that have come up - I'd appreciate your thoughts on these or any I've overlooked.

Ones we seem to agree clearly do not need parentheses:

integers
non-integer decimals (at least one digit on either side of the decimal point)
ASCII letter/digit strings
positive and negative signs on numeric values

Things that may or may not need parentheses (but we don't clearly agree):

non-integer decimals (one side or empty, e.g. .5 or 1.)
non-integer decimals with letter/digit strings (0.5x)
positive and negative signs for letter/digit strings (^-num or ^+x)
unicode \w strings

Things we seem to agree clearly do need parentheses:

anything with non-\w characters
anything with an underscore

Regarding unicode, I'm inclined to keep the parentheses - I'm not sure that there's a good way to know that a character is going to be confusing without them, so it seems prudent to assume the worst. It might be nice to handle common cases, such as Greek characters, but that seems like a big project to handle well.

For decimal points without digits on one side, my inclination is to keep them. They're poor style, but I don't know that they are any less readable without the parentheses. I don't feel terribly strongly about this, though.

I do think accepting signs for things like -2n is desirable.

Sorry for the long message - I don't mean to draw this out, but it's a tricky feature and I think being deliberate will avoid revisiting it more than necessary.

ietf-svn-bot · 2021-02-11T07:15:15Z

@martin.thomson@gmail.com commented

Thanks Jennifer, that makes sense. On your questionable ones:

non-integer decimals (one side or empty, e.g. ^.5 or ^1.)

Prefer parens, I think, but only weakly.

non-integer decimals with letter/digit strings (^0.5x)

Prefer no parens, yeah.

positive and negative signs for letter/digit strings (^-num or ^+x)

Prefer no parens on -, don't care about + (it's weird, so I'm OK either way).

unicode \w strings

Prefer no parens; we could just filter out underscore. The reason is to deal with the math stuff Lars is doing, where 2^α seems pretty reasonable.

Does that help?

ietf-svn-bot · 2021-02-11T07:15:15Z

@martin.thomson@gmail.com changed _comment0 which not transferred by tractive

ietf-svn-bot · 2021-02-11T08:48:51Z

@lars@eggert.org commented

Replying to ietf-svn-conversion/xml2rfc#590 (comment:13):

Regarding unicode, I'm inclined to keep the parentheses - I'm not sure that there's a good way to know that a character is going to be confusing without them, so it seems prudent to assume the worst. It might be nice to handle common cases, such as Greek characters, but that seems like a big project to handle well.

Given that sub/sup are almost always going to be used for math, I think allowing "mathy" Unicode characters such as Greek letters would be very useful.

ietf-svn-bot · 2021-02-11T15:49:52Z

@jennifer@painless-security.com commented

Thanks for your thoughts. I'm sold on parenthesizing the bare decimal points and on accepting unicode words. I think accepting plus signs is worthwhile - it's not common, but comes up sometimes and basing the rule on its being a sign character seems to me less likely to be surprising.

Rather than trying to write all this in a RE pattern, I've expanded the is_simple_expression() method - I think this makes it more understandable. I've made it unicode-aware, but have not found a way to enter unicode characters that are rendered by the  (they turn into "&#" code points when I try the straightforward way).

I have added a check that avoids doubling up if the expression is already delimited by parentheses (so that <sup>(x+y)</sup> won't become ^((x+y)))

    def is_simple_expression(expr):
        """Can this expression be rendered without adding parentheses?"""
        def already_parenthesized(s):
            """Is the string enclosed in parentheses?

            Only considers parentheses, not other brackets. Good enough to avoid
            pointlessly doubling the parentheses, not to decide that the expression
            makes mathematical sense.
            """
            if not (len(s) >= 2 and s[0] == '(' and s[-1] == ')'):
                return False
            count = 0
            for c in s[1:-1]:
                count += 1 if c == '(' else -1 if c == ')' else 0
                if count < 0:
                    return False
            return count == 0

        expr = expr.strip()

        # Avoid (( )) if the entire expression is already in balanced parentheses
        if already_parenthesized(expr):
            return True

        # Underscore is a `\w` character, so explicitly reject it
        if '_' in expr:
            return False

        # Leading sign is allowed, so ignore it for further tests. Accept unicode
        # sign chars '\u2212' (negative sign), '\u00b1' (plus/minus), '\u2213' (minus/plus),
        # '\ufe63' (small minus),'\uff0b' (full-width plus), '\uff0d' (full-width minus)
        if expr and expr[0] in '+-\u2212\u00b1\u2213\ufe63\uff0b\uff0d':
            expr = expr[1:]

        # Empty or all-whitespace after removing sign must have parentheses for clarity
        if len(expr) == 0:
            return False

        # Regex accepts possibly decimal number followed by mixed word characters.
        # Assumes already removed sign and checked for empty string.
        return re.match(r'^(?:\d+(?:\.\d+)?)?\w*$', expr) is not None

To give you an idea of what this does, for the following input

          <t>2<sup>15</sup> 2<sup>-15</sup><sup>+15</sup></t>
          <t>2<sup>3.0</sup> 2<sup>-3.0</sup> 2<sup>+3.0</sup></t>
          <t>2<sup>(x+y)</sup> 2<sup>-(x+y)</sup></t>
          <t>2<sup>2n</sup> 2<sup>-2n</sup></t>
          <t>this is s<sup>uperscript</sup></t>
          <t>this is s<sup>-trange</sup></t>
          <t>this is <sup>multiple words</sup></t>
          <t>W<sub>max</sub></t> <t>W<sub>max_0</sub></t>
          <t><sup>+.word</sup> <sup>23.stuff</sup> <sup></sup> <sup>   </sup> <sup>-</sup></t>

it renders to

   2^15 2^-15^+15

   2^3.0 2^-3.0 2^+3.0

   2^(x+y) 2^(-(x+y))

   2^2n 2^-2n

   this is s^uperscript

   this is s^-trange

   this is ^(multiple words)

   W_max

   W_(max_0)

   ^(+.word) ^(23.stuff) ^() ^() ^(-)

What do you think?

ietf-svn-bot · 2021-02-11T21:29:46Z

@martin.thomson@gmail.com commented

Love it. Thanks for doing this.

Given the leading +/- check, why not this ordering?

        # Leading sign is allowed, so ignore it for further tests. Accept unicode
        # sign chars '\u2212' (negative sign), '\u00b1' (plus/minus), '\u2213' (minus/plus),
        # '\ufe63' (small minus),'\uff0b' (full-width plus), '\uff0d' (full-width minus)
        if expr and expr[0] in '+-\u2212\u00b1\u2213\ufe63\uff0b\uff0d':
            expr = expr[1:]

        # Avoid (( )) if the entire expression is already in balanced parentheses
        if already_parenthesized(expr):
            return True

        # Underscore is a `\w` character, so explicitly reject it
        if '_' in expr:
            return False

        # Empty or all-whitespace after removing sign must have parentheses for clarity
        if len(expr) == 0:
            return False

That would change 2<sup>-(x+y)</sup> to render as 2^-(x+y).
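As a rough sketch of the difference between the two orderings, the function below takes a `sign_first` flag that selects where the leading sign is stripped. The flag, the `SIGN_CHARS` constant, and the `render_sup` helper are assumptions made for this illustration. They are not the code that was committed.

```python
import re

# Sign characters accepted in the snippet above.
SIGN_CHARS = '+-\u2212\u00b1\u2213\ufe63\uff0b\uff0d'


def already_parenthesized(s):
    """Is the whole string enclosed in one balanced pair of parentheses?"""
    if not (len(s) >= 2 and s[0] == '(' and s[-1] == ')'):
        return False
    depth = 0
    for c in s[1:-1]:
        depth += 1 if c == '(' else -1 if c == ')' else 0
        if depth < 0:
            return False
    return depth == 0


def is_simple_expression(expr, sign_first=True):
    """Can expr be rendered without adding parentheses?

    sign_first=True strips a leading sign before the parenthesis check
    (the ordering suggested above); False strips it afterwards.
    """
    expr = expr.strip()
    if sign_first and expr and expr[0] in SIGN_CHARS:
        expr = expr[1:]
    if already_parenthesized(expr):
        return True
    if '_' in expr:
        return False
    if not sign_first and expr and expr[0] in SIGN_CHARS:
        expr = expr[1:]
    if len(expr) == 0:
        return False
    return re.match(r'^(?:\d+(?:\.\d+)?)?\w*$', expr) is not None


def render_sup(expr, sign_first=True):
    """Hypothetical helper: add parentheses only when needed."""
    simple = is_simple_expression(expr, sign_first)
    return '^%s' % expr if simple else '^(%s)' % expr


print('2' + render_sup('-(x+y)', sign_first=False))
print('2' + render_sup('-(x+y)', sign_first=True))
```

Only expressions that consist of a sign followed by a fully parenthesized group are affected by the ordering. Inputs such as `15`, `-2n`, or `max_0` give the same answer either way.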

ietf-svn-bot · 2021-02-12T13:44:24Z

@jennifer@painless-security.com commented

I went back and forth on that. I'm happy to do it the other way.

However, one thing I've realized while thinking about that is that we need to think about spaces. The issue:

x<sub>0</sub><sup>n</sup>y<sub>0</sub><sup>m</sup>

becomes

x_0^ny_0^m

which, in addition to looking like strange ascii art, is pretty ambiguous. I'm not sure how to handle this. The simple thing would be to change the render_sup method to use '^%s ' (note the space after the s), but that will cause artifacts like:

My sentence is x_0 ^n y_0 ^m .

(spaces between sub/sup and before the sentence period)

I suppose this is another case where we could leave it to the author to know that spaces are needed - certainly that'd be understood by LaTeX users.
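The trade-off can be sketched with a toy renderer. The `render` function and its `(kind, text)` input format are invented for this example and do not reflect how the text writer is actually structured.

```python
def render(parts, sub_fmt='_%s', sup_fmt='^%s'):
    """Join (kind, text) pairs, where kind is 'text', 'sub', or 'sup'.

    Toy model only: the format strings stand in for whatever the
    real writer uses when it emits a subscript or superscript.
    """
    out = []
    for kind, text in parts:
        if kind == 'sub':
            out.append(sub_fmt % text)
        elif kind == 'sup':
            out.append(sup_fmt % text)
        else:
            out.append(text)
    return ''.join(out)


sentence = [
    ('text', 'My sentence is x'), ('sub', '0'), ('sup', 'n'),
    ('text', 'y'), ('sub', '0'), ('sup', 'm'), ('text', '.'),
]

# No trailing space: compact but ambiguous.
compact = render(sentence)
# Trailing space after every sub/sup: readable factors, stray space before '.'.
spaced = render(sentence, sub_fmt='_%s ', sup_fmt='^%s ')

print(compact)
print(spaced)
```

The second form separates the factors but leaves a gap before the period, which is the artifact described above.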

ietf-svn-bot · 2021-02-12T21:10:18Z

@jennifer@painless-security.com commented

Ok - I had a look at the output of the HTML writer and found that its results without a space between factors also look a bit odd. With a space, they are much more readable. Based on that, I'm not going to worry about the lack of a trailing space in the text writer and leave it to the author to insert one.

ietf-svn-bot · 2021-02-12T21:10:18Z

@jennifer@painless-security.com changed _comment0 which not transferred by tractive

ietf-svn-bot · 2021-02-17T15:32:44Z

@jennifer@painless-security.com commented

FYI, the additional work has now been committed in 28d2f44

ietf-svn-bot · 2021-02-18T00:26:30Z

@martin.thomson@gmail.com commented

Thanks Jennifer, this is a nice improvement.

ietf-svn-bot · 2021-03-16T16:11:54Z

@rjsparks@nostrum.com commented

Fixed in 0979a66:

Merged in 65f2676 and 28d2f44 from jennifer@painless-security.com: Simplify text rendering of super/subscripts. Based on patch submitted by martin.thomson@gmail.com and refinement from subsequent list discussion. Fixes #590.

ietf-svn-bot · 2022-03-11T09:40:14Z

The attachments for these issues were lost in trac before the transition to github, and cannot be recovered. If the issue is still relevant, and the attachments can be reconstructed, please add them as new comments.

ietf-svn-bot closed this as completed Feb 10, 2021

ietf-svn-bot mentioned this issue Mar 11, 2022

Stop rendering with underscores #596

Closed
