Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework C++ Runtime Interval, IntervalSet, ATN and ATNState #1930

Merged
merged 11 commits into from
Oct 29, 2017

Conversation

jm-mikkelsen
Copy link
Contributor

  • Remove the readonly status from IntervalSet.
  • Remove virtual functions from IntervalSet and Interval. These are
    passed by value throughout the C++ runtime; meaningful inheritance is
    not possible anyway.
  • Moving the atomic flag into ATNState as a "now cached" flag.
  • Return a const reference from ATN::nextStates(ATNState*) so the readonly
    status is enforced by the compiler not at runtime in the code.
  • Use value semantics using std::move to reduce the number of copies performed,
    constent with how these classes are used in the C++ runtime source.
  • Remove type-unsafe varargs constructor in IntervalSet, replace with
    type-safe varadic templates implementation.

- Remove the readonly status from IntervalSet.
- Remove virtual functions from IntervalSet and Interval. These are
  passed by value throughout the C++ runtime; meaningful inheritance is
  not possible anyway.
- Moving the atomic flag into ATNState as a "now cached" flag.
- Return a const reference from ATN::nextStates(ATNState*) so the readonly
  status is enforced by the compiler not at runtime in the code.
- Use value semantics using std::move to reduce the number of copies performed,
  constent with how these classes are used in the C++ runtime source.
- Remove type-unsafe varargs constructor in IntervalSet, replace with
  type-safe varadic templates implementation.
@jm-mikkelsen
Copy link
Contributor Author

The Xcode build failure seems to be a timeout ...

@ksedgwic
Copy link

I ran my test program at https://github.com/ksedgwic/antlr-cpp-threading-example using this branch (21a53c2). I'm still seeing crashes when the program is multi-threaded

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f80af7b4327 in std::equal_to<unsigned long>::operator() (this=0x7f800c002848, __x=@0x7f8096c8d218: 36, 
    __y=@0x8: <error reading variable>) at /usr/include/c++/5.3.1/bits/stl_function.h:357
357	      { return __x == __y; }
[Current thread is 1 (Thread 0x7f8096c8e700 (LWP 17586))]
(gdb) bt
#0  0x00007f80af7b4327 in std::equal_to<unsigned long>::operator() (this=0x7f800c002848, __x=@0x7f8096c8d218: 36, 
    __y=@0x8: <error reading variable>) at /usr/include/c++/5.3.1/bits/stl_function.h:357
#1  0x00007f80af7f576e in std::__detail::_Equal_helper<unsigned long, std::pair<unsigned long const, antlr4::dfa::DFAState*>, std::__detail::_Select1st, std::equal_to<unsigned long>, unsigned long, false>::_S_equals (__eq=..., __extract=..., __k=@0x7f8096c8d218: 36, __n=0x0)
    at /usr/include/c++/5.3.1/bits/hashtable_policy.h:1333
#2  0x00007f80af7f3922 in std::__detail::_Hashtable_base<unsigned long, std::pair<unsigned long const, antlr4::dfa::DFAState*>, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Hashtable_traits<false, false, true> >::_M_equals (this=0x7f800c002848, __k=@0x7f8096c8d218: 36, __c=36, __n=0x0)
    at /usr/include/c++/5.3.1/bits/hashtable_policy.h:1704
#3  0x00007f80af7f18d7 in std::_Hashtable<unsigned long, std::pair<unsigned long const, antlr4::dfa::DFAState*>, std::allocator<std::pair<unsigned long const, antlr4::dfa::DFAState*> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node (this=0x7f800c002848, __n=0, __k=@0x7f8096c8d218: 36, __code=36)
    at /usr/include/c++/5.3.1/bits/hashtable.h:1433
#4  0x00007f80af7ef78a in std::_Hashtable<unsigned long, std::pair<unsigned long const, antlr4::dfa::DFAState*>, std::allocator<std::pair<unsigned long const, antlr4::dfa::DFAState*> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_node (this=0x7f800c002848, __bkt=0, __key=@0x7f8096c8d218: 36, __c=36) at /usr/include/c++/5.3.1/bits/hashtable.h:632
#5  0x00007f80af7ee02c in std::_Hashtable<unsigned long, std::pair<unsigned long const, antlr4::dfa::DFAState*>, std::allocator<std::pair<unsigned long const, antlr4::dfa::DFAState*> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::find (this=0x7f800c002848, __k=@0x7f8096c8d218: 36) at /usr/include/c++/5.3.1/bits/hashtable.h:1307
#6  0x00007f80af7ec857 in std::unordered_map<unsigned long, antlr4::dfa::DFAState*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, antlr4::dfa::DFAState*> > >::find (this=0x7f800c002848, __x=@0x7f8096c8d218: 36)
    at /usr/include/c++/5.3.1/bits/unordered_map.h:615
#7  0x00007f80af7e65cc in antlr4::atn::ParserATNSimulator::getExistingTargetState (this=0x7f806c00a4d0, previousD=0x7f800c002830, t=36)
    at /usr/local/src/antlr4/runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp:261
#8  0x00007f80af7e5d32 in antlr4::atn::ParserATNSimulator::execATN (this=0x7f806c00a4d0, dfa=..., s0=0x7f800c002830, input=0x7f8096c8db30, 
    startIndex=0, outerContext=0x7f806c017d60) at /usr/local/src/antlr4/runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp:162
#9  0x00007f80af7e5beb in antlr4::atn::ParserATNSimulator::adaptivePredict (this=0x7f806c00a4d0, input=0x7f8096c8db30, decision=3, 
    outerContext=0x7f806c017d60) at /usr/local/src/antlr4/runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp:140
#10 0x00000000004212fc in PredicateExpressionParser::binary_op_expr (this=0x7f8096c8da30) at GEN/PredicateExpressionParser.cpp:526
#11 0x00000000004205c5 in PredicateExpressionParser::expr (this=0x7f8096c8da30, precedence=0) at GEN/PredicateExpressionParser.cpp:361
#12 0x000000000041f274 in PredicateExpressionParser::start (this=0x7f8096c8da30) at GEN/PredicateExpressionParser.cpp:84
#13 0x000000000043ba72 in create_parse_tree () at example.cpp:71
#14 0x000000000043bbdf in worker (argp=0x0) at example.cpp:85
#15 0x00007f80af32661a in start_thread (arg=0x7f8096c8e700) at pthread_create.c:334
#16 0x00007f80ae7c55fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109


Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f9965468bec in __gnu_cxx::__normal_iterator<std::shared_ptr<antlr4::atn::ATNConfig>*, std::vector<std::shared_ptr<antlr4::atn::ATNConfig>, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > > >::__normal_iterator (this=0x7f9953137f80, 
    __i=@0x8: <error reading variable>) at /usr/include/c++/5.3.1/bits/stl_iterator.h:741
741	      : _M_current(__i) { }
[Current thread is 1 (Thread 0x7f9953139700 (LWP 17676))]
(gdb) bt
#0  0x00007f9965468bec in __gnu_cxx::__normal_iterator<std::shared_ptr<antlr4::atn::ATNConfig>*, std::vector<std::shared_ptr<antlr4::atn::ATNConfig>, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > > >::__normal_iterator (this=0x7f9953137f80, 
    __i=@0x8: <error reading variable>) at /usr/include/c++/5.3.1/bits/stl_iterator.h:741
#1  0x00007f9965468a69 in std::vector<std::shared_ptr<antlr4::atn::ATNConfig>, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >::begin (this=0x8) at /usr/include/c++/5.3.1/bits/stl_vector.h:548
#2  0x00007f9965485203 in antlr4::atn::ParserATNSimulator::computeReachSet (this=0x7f99240150f0, closure_=0x0, t=36, fullCtx=false)
    at /usr/local/src/antlr4/runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp:455
#3  0x00007f9965484680 in antlr4::atn::ParserATNSimulator::computeTargetState (this=0x7f99240150f0, dfa=..., previousD=0x7f987c002830, t=36)
    at /usr/local/src/antlr4/runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp:271
#4  0x00007f9965483d66 in antlr4::atn::ParserATNSimulator::execATN (this=0x7f99240150f0, dfa=..., s0=0x7f987c002830, input=0x7f9953138b30, 
    startIndex=0, outerContext=0x7f9924015af0) at /usr/local/src/antlr4/runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp:164
#5  0x00007f9965483beb in antlr4::atn::ParserATNSimulator::adaptivePredict (this=0x7f99240150f0, input=0x7f9953138b30, decision=3, 
    outerContext=0x7f9924015af0) at /usr/local/src/antlr4/runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp:140
#6  0x00000000004212fc in PredicateExpressionParser::binary_op_expr (this=0x7f9953138a30) at GEN/PredicateExpressionParser.cpp:526
#7  0x00000000004205c5 in PredicateExpressionParser::expr (this=0x7f9953138a30, precedence=0) at GEN/PredicateExpressionParser.cpp:361
#8  0x000000000041f274 in PredicateExpressionParser::start (this=0x7f9953138a30) at GEN/PredicateExpressionParser.cpp:84
#9  0x000000000043ba72 in create_parse_tree () at example.cpp:71
#10 0x000000000043bbdf in worker (argp=0x0) at example.cpp:85
#11 0x00007f9964fc461a in start_thread (arg=0x7f9953139700) at pthread_create.c:334
#12 0x00007f99644635fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

@jm-mikkelsen
Copy link
Contributor Author

@ksedgwic Thanks for testing and for the stack trace. This bug is this sequence of events: locking to acquire an iterator, dropping the lock, and then dereferencing the iterator. In the mean time the iterator has been invalidated. I'll add a patch to this pull request. (On looking, there is another case in the code where the same pattern occurs; fixing that one too.)

Iterators on an unordered_map were being dereferenced after dropping a
read lock, leading to races where iterators could be invalidated before
they were used.
@jm-mikkelsen
Copy link
Contributor Author

@ksedgwic I believe the latest commit in this pull request should solve the crash you are seeing.

@@ -6,6 +6,7 @@
#pragma once

#include "misc/IntervalSet.h"
#include <atomic>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System headers should go in antlr4-common.h. I love to have them all together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will change.

private:
/// Used to cache lookahead during parsing, not used during construction.

misc::IntervalSet nextTokenWithinRule;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: private vars use a leading underscore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will change.

@mike-lischke
Copy link
Member

For some reasons I left my review pending and had to finish it now.

std::cout << std::string("reuse state ") << s->stateNumber << std::string(" edge to ") << iterator->second->stateNumber << std::endl;
}
if (iterator != s->edges.end()) {
std::cout << std::string("reuse state ") << s->stateNumber << std::string(" edge to ") << iterator->second->stateNumber << std::endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, that's still code from the automatic conversion. We don't need to create std::string here explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again -- The string conversion is from the existing code. I added the iterator check to avoid dereferencing a bad iterator in the debug code. That's a bugfix (if only in an error message).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment wasn't specifically for your patch. More loudly thinking... Would be nice if things like that get fixed while someone comes across them.

Copy link
Member

@mike-lischke mike-lischke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of the read-only flag might not work as you expect and in fact I have a bad feeling about this change. Interval sets can be made read only dynamically. That cannot be enforced by the compiler. Let's not deviate so much from the Java code, even if that seems like a nice beautification. Intervals are stored in many places and it needs really careful checks to ensure changes don't break anything.

@@ -26,59 +26,56 @@ namespace misc {
ssize_t b;

Interval();
explicit Interval(size_t a_, size_t b_); // For unsigned -> signed mappings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that one in my first review. Why did you remove the explicit keyword? I certainly don't want automatic conversion between size_t and ssize_t! Compilers behave differently here. Please put that back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to put it back. However: I don't understand what you mean about size_t and ssize_t conversions. My understanding of the explicit keyword is that it will prevent:

IntervalSize s = { 1, 2 };

from compiling, and require:

IntervalSize s(1, 2);

instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IntervalSet of course ...

Copy link
Member

@mike-lischke mike-lischke Jul 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explicit keyword marks a c-tor so that no automatic type conversion is applied. So when the compiler looks up a c-tor with e.g. int values that explicit c-tor is not considered a valid match. This way we can ensure that unsigned values only go with the explicit c-tor, while signed ones use the other c-tor (which is important because of the internal handling). I wish we could avoid using negative values. From the pure semantic of that class there cannot be a negative interval, but such values denote invalid intervals and are used when sorting intervals in an interval set (which is why we cannot use something like (size_t)-1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explicit keyword's type conversion refers to conversion to the target class through a converting constructor, and whether that conversion can be implicit or must be explicit. I'm not sure how you are using this in the code. For example:

explicit IntervalSet(size_t a, size_t b);

will prevent both:

IntervalSet s = { 1, 2 };
IntervalSet s = { -1, -2 }

and will allow both:

IntervalSet s(1, 2);
IntervalSet s(-1, -2);

What is the form you want to disallow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's surprising. From my tests I found that the explicit constructor will be used for IntervalSet s(1, 2).

Anyway, the idea is simple: use the c-tor for size_t only with unsigned values and the one with ssize_t for signed ones. That means usually that the one for unsigned values is used (since most values are unsigned) and only in specific cases take the signed version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we haven't finished this one. Are you sure we don't need the explicit keyword?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't do what you think, re: signed vs. unsigned comparisons, however, adding it back in because it doesn't hurt.

}

IntervalSet::IntervalSet(const std::vector<Interval> &intervals) : IntervalSet() {
_intervals = intervals;
IntervalSet::IntervalSet(const IntervalSet &set) : _intervals(set._intervals) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the changes to the c-tors. I took care to use delegating c-tors everywhere and want to stay with them, instead of repeating init of internal values over and over again. Please revert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you insist. However: This a class with a single member that's being constructed empty and then assigned to in a constructor body. I don't see a repeated init here. What am I missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this one is so simple, but I'd like to keep that pattern throughout the runtime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pls change this back also? As I said already, I like to stay with delegating c-tors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

for (int i = 0; i < n; i++) {
add(va_arg(vlist, int));
}
IntervalSet::IntervalSet(std::vector<Interval>&& intervals) : _intervals(std::move(intervals)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned a replacement of the variadic variant with template params, but that's not it. So where is the new code? Also using move here is not a good idea. What if you wanna pass in a constant? Better use const& for the parameter (if we stay with this variant).

Actually, the new code changes the semantic significantly. The variadic variant uses single value intervals (in fact it actually uses single elements, which are later converted to single element intervals). Your new code allows for any interval list (a case we already handled in a different c-tor). This change is not improving anything IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The varadic code starts on line 42 of IntervalSet.h, but I think you found it from your later comment.

The constructor that takes a vector rvalue reference is a private constructor used internally. If we want to use a constant here it will be local to this class and can be an explicit decision.

Removing type-unsafe va_args code certainly seems like an improvement to me: The old code is likely to fail outright, especially on machines where sizeof(int) != sizeof(ssize_t). The new code is type-safe, provides compile-time diagnostics, and allows single and multi-value intervals. For the same arguments I don't see a semantic change (excluding failures because of va_args usage problems).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I didn't ask for the variadic code, but the templated one (which I don't see in your patch). However, as you say, the new code is a private c-tor, which is not that obvious from the patch. I wonder, though, why you could remove the variadic code. Is it not used anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the varadic code with a templated version that is type safe and provides the same interface. There are two changes that are being mixed here because they're close by in the diff:

  • The va_arg code was replaced with a templated version in the header file. The old va_arg code was removed. The interface is the same and any code using the old va_args based interface will now use the templated version (including type safety).
  • At the same place in the file an internal constructor was placed to move in a vector of intervals.

They are distinct changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fault. I looked at the wrong places. The templated code is just fine (even though a bit less readable IMO). But type safety is a good thing.

@jm-mikkelsen
Copy link
Contributor Author

On the read-only flag: IntervalSets seem to copied by value throughout the code base, with const references from the cached values. The actual read only flag is only set in the ATN::nextState method, which now returns a const reference. If anything else uses the readOnly flag it will now fail to compile, making the problem obvious.

Am I missing something?

@mike-lischke
Copy link
Member

Ok, I checked again about the read-only flag and it seems you are right. I confused that with the ATNConfig read-only state. For IntervalSet there is only this single place where it's set to read-only (nextTokens()).

@jm-mikkelsen
Copy link
Contributor Author

I have a pending patch that also removes read only from ATNConfig. I spent a while making sure it was OK, but that was a while ago now.

@jm-mikkelsen
Copy link
Contributor Author

@mike-lischke Changes made as requested; sorry about the delays, had to do other work!

@jm-mikkelsen
Copy link
Contributor Author

@mike-lischke I meant to commit the whitespace and mode changes to my pending branch, did it here by accident. They remove the byte order mark from checked in files (eg. they're useless on UTF8) and removes the execute bit from source file mode flags. A side-effect of using Perforce for our local copy and moving changes around.

@mike-lischke
Copy link
Member

@jm-mikkelsen no, please remove [b106f8e]. That's a big useless patch. Next time these files are saved the BOMs are re-added. Don't let your SCMS manage styles (BOM, line endings and the like).

@jm-mikkelsen
Copy link
Contributor Author

@mike-lischke But isn't that the point of "charset = utf-8" in .editorconfig? (Although I agree it hasn't been set up for .h and .cpp, but I suspect that's just an oversight.) From the editorconfig.org front page:

charset: set to latin1, utf-8, utf-8-bom, utf-16be or utf-16le to control the character set. Use of utf-8-bom is discouraged.

@mike-lischke
Copy link
Member

mike-lischke commented Aug 24, 2017

BOMs are really not an issue. Why fixing something that is not broken? We cannot forsee what editor settings are used and I really don't want to be forced to check this everytime I use a new editor (or when I get a patch). But regardless of that, the BOM removal doesn't belong to this patch and must be reverted.

@parrt
Copy link
Member

parrt commented Oct 27, 2017

@mike-lischke Where do we stand on this PR?

@mike-lischke
Copy link
Member

I asked @jm-mikkelsen to revert patch b106f8e. This one changes almost 300 files only because of BOM clean up (+ some file mod changes). I recommend not to merge until this is removed.

@jm-mikkelsen
Copy link
Contributor Author

@parrt @mike-lischke Sorry about the absence, had project deadlines and then was travelling for 6 weeks. BOM changes reverted, merged changes from head. Back to normal now, so can respond to changes quickly again.

Copy link
Member

@mike-lischke mike-lischke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a few small things left. I think we are close now.

}

IntervalSet::IntervalSet(const std::vector<Interval> &intervals) : IntervalSet() {
_intervals = intervals;
IntervalSet::IntervalSet(const IntervalSet &set) : _intervals(set._intervals) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pls change this back also? As I said already, I like to stay with delegating c-tors.

@@ -26,59 +26,56 @@ namespace misc {
ssize_t b;

Interval();
explicit Interval(size_t a_, size_t b_); // For unsigned -> signed mappings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we haven't finished this one. Are you sure we don't need the explicit keyword?


private:
void InitializeInstanceFields();
void addItems() { /* No-op */ }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems strange. Can this be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is necessary for the variadic template constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, to end the iteration. OK.

@parrt
Copy link
Member

parrt commented Oct 29, 2017

Hmm...all of the copyright headers are showing diff so looks like huge PR. can you undo that part?

@mike-lischke
Copy link
Member

@parrt The headers have been taken back already. I see only 10 files changed in the patch.

@parrt
Copy link
Member

parrt commented Oct 29, 2017

@mike-lischke ok, ping me when this is ready to merge.

- Add "explicit" to Interval(size_t, size_t) constructor.
- Change an IntervalSet constructor to delegate part of the construction

- Add "explicit" to Interval(size_t, size_t) constructor.
- Change an IntervalSet constructor to delegate part of the construction
@jm-mikkelsen
Copy link
Contributor Author

@mike-lischke OK, changes made!


private:
void InitializeInstanceFields();
void addItems() { /* No-op */ }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, to end the iteration. OK.

@mike-lischke
Copy link
Member

@parrt I think we are good here.

@parrt parrt added this to the 4.7.1 milestone Oct 29, 2017
@parrt parrt merged commit d8996c7 into antlr:master Oct 29, 2017
@sharwell
Copy link
Member

sharwell commented Oct 30, 2017

@jm-mikkelsen The byte order marks are not unnecessary, and serve a very valuable purpose. In a world where the default encoding for everyone is UTF-8, it is unnecessary, but we are nowhere close to that. If the byte order mark is omitted, eventually someone will open up a file and make an edit in an encoding other than UTF-8, leading to eventual misinterpretation of the text content and loss of information. The place I see this most often is in the names of people from European countries, but I've seen it happen in other cases as well.

Despite the wording provided by EditorConfig for the charset property, it should in fact read that use of utf-8 is discouraged, as it will inevitably result in corrupted source files. The set of editors which incorrectly handle utf-8-bom is much smaller than the set of editors which incorrectly handle utf-8.

@jm-mikkelsen
Copy link
Contributor Author

@sharwell I agree that byte order marks serve a purpose, but that purpose is limited to detecting the difference between UTF-16LE and UTF-16BE. UTF-8 byte order marks just get in the way; if you want byte order marks, add them during the conversion to UTF-16. Perforce, for example, will do this.

I don't see how you get the "UTF-8 is discouraged" reading from the EditorConfig text, and I don't see how UTF-8 leads to corruption. The editor count seems wrong too: If an editor doesn't handle UTF-8, how does it handle a UTF-8 BOM?

A quick check of my antlr4 source tree shows 243 out of 1918 files have byte order marks. 205 of these are C++ source code (.cpp/.h). None of these are Java source code. My reading is that byte order marks have been incorrectly added for languages which are not listed in .editorconfig, with the majority of these in the C++ runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants