Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: DocumentInputStream does not handle surrogate pairs correctly #1069

Merged
merged 2 commits into from
Aug 20, 2024

Conversation

sebthom
Copy link
Member

@sebthom sebthom commented Aug 19, 2024

This PR replaces the current DocumentInputStream implementation which does not handle high/low surrogate character pairs.

@sebthom sebthom force-pushed the docinputstream branch 3 times, most recently from 70598c7 to 2a657d9 Compare August 19, 2024 20:57
this(chars::charAt, chars::length, charset, bufferSize);
}

public CharsInputStream(final CharsSupplier chars, final IntSupplier charsLength) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is unused, should we remove it?

@sebthom sebthom force-pushed the docinputstream branch 2 times, most recently from 795c624 to de41da2 Compare August 20, 2024 10:37
}

try {
charBuffer.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we not do the charBuffer.clear rather at the end of the method in a finally block so that we do not hold data that will never be used until the next call to refill?

Copy link
Member Author

@sebthom sebthom Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the charBuffer contains a fixed size char[] array. I don't see any advantage clearing the buffer at the end. I prefer clearing the buffer just before I start refilling and using it.

@rubenporras
Copy link
Contributor

@sebthom , I do not have more comments.

@sebthom
Copy link
Member Author

sebthom commented Aug 20, 2024

@sebthom , I do not have more comments.

@rubenporras oh man, you are thorough!

@sebthom sebthom merged commit be075c5 into eclipse:main Aug 20, 2024
6 checks passed
@sebthom sebthom deleted the docinputstream branch August 20, 2024 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants