CQL2 (Text): Alpha and Symbols #787

eseglem · 2023-02-13T16:58:15Z

The current definition of alpha has the following:

alpha = "\x0009" | "\x000A" | "\x000D" | "\x0020".."\x0029" |
        "\x0040".."\xD7FF" | "\xE000".."\xFFFD" | "\x10000".."\x10FFFF"

#707 indicates that the ranges are based on https://www.w3.org/TR/REC-xml/#charsets, but some characters are missing. I understand splitting out digit, but the definition also skips "\x002A".."\x002F" and "\x003A".."\x003F".

This means *+,-./:;<=>? are all excluded from character, and therefore from characterLiteral. Most of those are also defined as their own productions, but semicolon and question mark are not. I would understand if all the additionally defined symbols were excluded, but brackets, parens, caret, percent, and underscore are all included in the ranges in alpha.
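To see exactly which printable ASCII characters fall through the gaps, the current alpha ranges can be checked directly. A minimal Python sketch; the names `CURRENT_ALPHA`, `in_ranges` and `excluded_printable_ascii` are illustrative and not part of the spec:

```python
# Ranges from the current (pre-#789) alpha production, as inclusive code point pairs.
CURRENT_ALPHA = [
    (0x09, 0x09), (0x0A, 0x0A), (0x0D, 0x0D), (0x20, 0x29),
    (0x40, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]


def in_ranges(ch: str, ranges) -> bool:
    """Return True if the code point of ch lies inside any inclusive range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def excluded_printable_ascii(ranges) -> str:
    """Return the printable ASCII characters (0x20-0x7E) not covered by ranges."""
    return "".join(
        chr(cp) for cp in range(0x20, 0x7F) if not in_ranges(chr(cp), ranges)
    )


print(excluded_printable_ascii(CURRENT_ALPHA))
# prints: *+,-./0123456789:;<=>?
```

The digits only show up because they are split out into digit; everything else in the output is one of the symbols listed above.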

This is causing example8.txt to fail in my parser, since it contains 'HH+VV+HV+VH', which I assume is meant to be a characterLiteral. Because it contains +, it cannot be parsed as one. (I also mentioned this in #783.)
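A small sketch of why that literal is rejected, assuming a simplified character = alpha | digit (escaped quotes are deliberately not modeled). The helper names are hypothetical:

```python
# Current (pre-#789) alpha ranges plus digit, as inclusive code point pairs.
CURRENT_ALPHA = [
    (0x09, 0x09), (0x0A, 0x0A), (0x0D, 0x0D), (0x20, 0x29),
    (0x40, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]
DIGIT = [(0x30, 0x39)]


def is_character(ch: str) -> bool:
    """Simplified character production: alpha | digit (escaped quotes not modeled)."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CURRENT_ALPHA + DIGIT)


def offending_chars(literal: str) -> list:
    """Return the characters between the outer quotes that character rejects."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("expected a single-quoted literal")
    return [ch for ch in literal[1:-1] if not is_character(ch)]


print(offending_chars("'HH+VV+HV+VH'"))  # ['+', '+', '+']
print(offending_chars("'brick%'"))  # []
```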

The other issue I have run into is that the single quote "\x0027" is included in alpha, which makes the grammar ambiguous: anything with multiple characterLiterals in it can be parsed in unintended ways. example16.txt has:

swimming_pool=true AND (floors>5 OR
                        material LIKE 'brick%' OR
                        material LIKE '%brick')

This is parseable with the following span as a single characterLiteral, even though that is clearly not the intent:

'brick%' OR
                        material LIKE '%brick'
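One quick way to see the ambiguity is to turn the simplified character production into a regex character class and match it greedily against the example16.txt text. This Python sketch assumes character = alpha | digit and does not model escaped quotes; the pattern and variable names are illustrative only:

```python
import re

# Simplified character production under the current alpha: alpha | digit.
# Note that 0x20-0x29 includes the single quote (0x27). Escaped quotes are not modeled.
CHARACTER = r"[\t\n\r\x20-\x29\x30-\x39\x40-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
CHARACTER_LITERAL = re.compile("'" + CHARACTER + "*'")

EXAMPLE16 = (
    "swimming_pool=true AND (floors>5 OR\n"
    "                        material LIKE 'brick%' OR\n"
    "                        material LIKE '%brick')"
)

# Prints a single match spanning from 'brick%' through '%brick'.
for match in CHARACTER_LITERAL.findall(EXAMPLE16):
    print(repr(match))
```

The regex engine simply prefers the longest match; the grammar itself has no such preference, so the intended two-literal reading is an equally valid derivation, which is exactly the ambiguity.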

I think alpha may need to be defined slightly differently. Perhaps use "\x0020".."\x0026" and "\x0028".."\xD7FF" so that only the single quote is skipped:

alpha = "\x0009" | "\x000A" | "\x000D" | "\x0020".."\x0026" |
        "\x0028".."\xD7FF" | "\xE000".."\xFFFD" | "\x10000".."\x10FFFF"
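Swapping in the proposed ranges, the same kind of sketch (same assumptions: character = alpha | digit, escaped quotes not modeled, illustrative names) splits example16.txt into the two intended literals and also accepts the example8.txt value:

```python
import re

# Simplified character production under the proposed alpha: the single quote
# (0x27) is the only printable ASCII character left out. Digits fall inside
# 0x28-0xD7FF here, so no separate digit range is needed. Escaped quotes are not modeled.
CHARACTER = r"[\t\n\r\x20-\x26\x28-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
CHARACTER_LITERAL = re.compile("'" + CHARACTER + "*'")

EXAMPLE16 = (
    "swimming_pool=true AND (floors>5 OR\n"
    "                        material LIKE 'brick%' OR\n"
    "                        material LIKE '%brick')"
)

print(CHARACTER_LITERAL.findall(EXAMPLE16))  # ["'brick%'", "'%brick'"]
print(CHARACTER_LITERAL.fullmatch("'HH+VV+HV+VH'") is not None)  # True
```

Since the character class can no longer consume a quote, the first closing quote always ends the literal, and a literal quote would have to go through the escaped-quote alternative instead.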

It could also be reasonable to add whiteSpace and symbol definitions to capture those characters separately, and then have character = alpha | digit | escapedquote | whiteSpace | symbol;. I don't know if that makes sense, though, since they aren't going to be used elsewhere. Avoiding overlapping ranges doesn't seem necessary either, since the identifier productions already overlap with other ranges.


cportele · 2023-02-15T15:27:45Z

Meeting 2023-02-15: @pvretano will review, thanks for raising this.

pvretano · 2023-02-19T08:03:22Z

@cportele, @eseglem please review #789.
I fixed the alpha production to add the missing chars and remove the single quote.
Rather than adding more productions for whitespace and symbols, I decided to remove a lot of the unnecessary single-character productions like colon, leftParen, etc. and simply use the literals in the grammar. I think (hope) it makes the grammar a little easier to read.

cportele · 2023-02-21T15:46:10Z

Closed by #789

cportele added the CQL2 label Feb 15, 2023

cportele assigned pvretano Feb 15, 2023

pvretano mentioned this issue in CQL2 Escaping #717 (closed) Feb 19, 2023

cportele closed this as completed Feb 21, 2023
