#707 indicates that alpha is based on https://www.w3.org/TR/REC-xml/#charsets, but the definition is missing some characters. I understand having digit split out, but it also skips "\x002A".."\x002F" and "\x003A".."\x003F".
That means *+,-./:;<=>? are all excluded from being a character / characterLiteral. Most of those are also defined as their own productions, but semicolon and question mark are not. I would understand if all of the separately defined symbols were excluded, but brackets, parens, caret, percent, and underscore are all included in the ranges in alpha.
This is causing example8.txt to fail in my parser, since it contains 'HH+VV+HV+VH'. I assume that is meant to be a characterLiteral, but it contains +, so it cannot parse as one. (I also mentioned this over in #783.)
The other issue I have run into is that the single quote "\x0027" is included in alpha, so there seems to be an ambiguity in the grammar. Anything containing more than one characterLiteral can be parsed in unintended ways. example16.txt has:
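To make the gap concrete, here is a minimal Python sketch that lists the characters in the two skipped ranges. The names SKIPPED_RANGES and chars_in are made up for this illustration and are not part of the grammar.

```python
# Code point ranges the issue reports as skipped by alpha (inclusive bounds).
SKIPPED_RANGES = [(0x2A, 0x2F), (0x3A, 0x3F)]


def chars_in(lo, hi):
    """Return every character from code point lo to hi, inclusive."""
    return "".join(chr(cp) for cp in range(lo, hi + 1))


# Concatenate both ranges so the excluded characters can be read at a glance.
skipped = "".join(chars_in(lo, hi) for lo, hi in SKIPPED_RANGES)
print(skipped)
```

The digits sit between the two ranges ("\x0030".."\x0039"), which is why they do not show up here.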
swimming_pool=true AND (floors>5 OR
material LIKE 'brick%' OR
material LIKE '%brick')
Everything from the first single quote to the last is parseable as a single characterLiteral, even though that is clearly not the intent:
'brick%' OR
material LIKE '%brick'
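A rough way to reproduce this outside the grammar is a regular expression whose character class still contains the single quote. This is only a sketch: PERMISSIVE_LITERAL is a hypothetical stand-in for characterLiteral, it ignores escapedquote, and the input is flattened to one line so that nothing needs to be assumed about newlines.

```python
import re

# Assumption: stand-in for characterLiteral where the character class
# "\x20".."\uD7FF" still includes the single quote (\x27).
PERMISSIVE_LITERAL = re.compile(r"'[\x20-\uD7FF]*'")

# Part of the example16.txt filter, joined onto one line for the demo.
text = "material LIKE 'brick%' OR material LIKE '%brick'"

# The greedy match runs from the first quote to the last one.
matches = PERMISSIVE_LITERAL.findall(text)
print(matches)
```

A regex engine simply picks the longest match, whereas a grammar-based parser may see several valid parses, but the root cause is the same: the quote is allowed inside the literal.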
I would think alpha may need to be defined slightly differently. Perhaps just use "\x0020".."\x0026" and "\x0028".."\xD7FF", so that only the single quote is skipped.
It could also be reasonable to add whiteSpace and symbol definitions to capture them separately, and then have character = alpha | digit | escapedquote | whiteSpace | symbol;. I don't know if that makes sense, though, since they would not be used anywhere else. Avoiding overlapping ranges doesn't seem necessary either, since the identifier definitions already overlap.
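As a quick sanity check of the proposed ranges, the sketch below uses a regular expression as a stand-in for characterLiteral. PROPOSED_LITERAL and literals are hypothetical names for this illustration, and escapedquote is left out for brevity.

```python
import re

# Assumption: stand-in for characterLiteral using the proposed ranges
# "\x0020".."\x0026" and "\x0028".."\xD7FF", i.e. everything except \x27.
PROPOSED_LITERAL = re.compile(r"'[\x20-\x26\x28-\uD7FF]*'")


def literals(text):
    """Return every quoted literal found in text, in order."""
    return PROPOSED_LITERAL.findall(text)


# The example16.txt filter now splits into two separate literals.
print(literals("material LIKE 'brick%' OR material LIKE '%brick'"))

# The example8.txt value containing + is accepted as one literal.
print(literals("'HH+VV+HV+VH'"))
```

Because the quote can no longer appear inside the character class, each literal has to end at the next quote, which removes the ambiguity in this simplified model.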
@cportele, @eseglem please review #789.
I fixed the alpha production to add the missing chars and remove the single quote.
Rather than adding more productions for whitespace and symbols, I decided to remove a lot of the unnecessary single-character productions like colon, leftParent, etc. and simply use the literals in the grammar. I think (hope) it makes the grammar a little easier to read.