-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make yychar
long
#605
Make yychar
long
#605
Conversation
The input is generated by a custom Reader.
…er that repeats the given content, to reduce iterations when we read a large amount of data.
We actually need a lot of RAM to make this test pass.
> By default a container is given 2 CPUs and 4 GB of memory but it can be configured in .cirrus.yml https://cirrus-ci.org/guide/linux/
Don't prepare a large buffer if we read only a few characters.
And remove old `Yyxlex.java` which was submitted (probably by mistake) in commit 4db21cb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this, esp generating the large input is nice.
We should announce this change fairly prominently, because it is API-breaking: users will have to make the same change we had to make in the example.
I wonder if we should make the type configurable (in a separate PR). The ability to handle input files >2GB is not necessarily what everyone needs, some might prefer not having to change their code.
Thanks for the positive feedback, Gerwin. My opinion is that:
That being said, I think 1.8 has accumulated enough changes to be released. |
Author: Régis Décamps <regisd@google.com> Date: Wed Nov 27 00:02:51 2019 +0100 Make `yychar` long (#605) * Replace `yychar` by long in skeletons (default & nested) * Update example for long yyChar * Also, add some positivity assertion in `example/simple/Yytoken` to give some defensive programming as an example. * Update manual and READMe * Update test added in #603 . * Improve doc of RepeatContentReader * Improve RepeatContentReader Don't prepare a large buffer if we read only a few characters. * Also update testcase/example/simple for long `yychar`. * Delete testcase/simple which is just redundant with the example. Updated from target/jflex-parent-1.8.0-SNAPSHOT-sources.jar
Yes, I meant making it prominent in the release announcement (just because API breaking changes are rare for JFlex), I agree that we don't want to announce it before. |
Forgot to say: I agree that 1.8 is getting there. Still labouring under the illusion that I'll manage to pull in the char class macros (#216) into 1.8. Recently made some progress on that, but there is still some way to go. |
Introduce a fakeRead() that changes `yychar` so that we can fakely jump to yychar just before Integer.MAX_VALUE and make the test run in a snap. Follow-up of jflex-de#603 and jflex-de#605
Follow-up comment for those who hit this breaking change. There are two options to fix it:
|
Starting from JFlex 1.8.0, `yychar` is now a long. However antlr `CommonToken` only takes an int, hence explicit casting is needed. I propose to use `java.lang.Math.toIntExact()` which throws ArithmeticException in case of overflow. This is acceptable, because the application will have an illegal state today (negative character position caused by the int overflow). See jflex-de/jflex#605 for more details.
Since JFlex 1.8.0, the `yychar` is a long. However, it is used to build an antlr `CommonToken` which takes an int. I propose to cast the `yychar` with `java.lang.Math.toIntExact()` which throw an ArithmeticException in case of overflow. This is acceptable because the previous code would have returned an overflowed (negative) position. See jflex-de/jflex#605 for details.
Starting from JFlex 1.8.0, `yychar` is a long. However, it is used to build an antlr `CommonToken` which takes an int. Hence, explicit casting is needed. I propose to cast the `yychar` with `java.lang.Math.toIntExact()` which throw an ArithmeticException in case of overflow. This is acceptable because the previous code would have returned an overflowed (negative) position. See jflex-de/jflex#605 for details.
Starting from JFlex 1.8.0, `yychar` is a long. However, it is used to build an antlr `CommonToken` which takes an int. Hence, explicit casting is needed. I propose to cast the `yychar` with `java.lang.Math.toIntExact()` which throw an ArithmeticException in case of overflow. This is acceptable because the previous code would have returned an overflowed (negative) position. See jflex-de/jflex#605 for details.
Fix #536.
When the input is larger than 2GB, the scanner internal state for the number of read characters (aka
zzchar
) is negative. This obviously doesn't make sense.The test added in #603 reproduces the problem and throws an exception, when the
zzchar
is negative.The problem is caused by
yychar
being anint
, hence subject to overflow.Fix #558 by making
yychar
long. This is similar to #558 by @sarowe with a lighter approach for testing.Note that some problems remain:
yyline
andyycolumn
are still integers.zzRefill()
will throw aNegativeArraySizeException
like this