Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some short UTF Strings encoded using non-canonical form #159

Closed
Sajjon opened this issue Apr 22, 2019 · 7 comments
Closed

Some short UTF Strings encoded using non-canonical form #159

Sajjon opened this issue Apr 22, 2019 · 7 comments
Labels
Milestone

Comments

@Sajjon
Copy link

Sajjon commented Apr 22, 2019

Hey!

I recently discovered that Jackson produces UTF strings on the non-shortest-possible format (non-canonical form ) sometimes.

I first reported this issue CBOR Github but realized that most other CBOR project is using canonical form where Jackson is not.

It would be great if you could support canonical encoding of strings :)

@cabo
Copy link

cabo commented Apr 22, 2019

Note that this bug report is not about the missing feature of generating "canonical" encoding (which would be a duplicate of #138), but about a set of off-by-one errors in the basic encoding logic which seem to lead to valid, but unnecessarily long encoding for ASCII strings of length 23 and 255. Please see the referenced issue for details. I'm not claiming to have read the entirety of CBORGenerator.java, but it seems that Line 1251 is not the only place that has this off-by-one error (maybe search for lines with ...MAX... that have < instead of <=).

@cowtowncoder
Copy link
Member

@Sajjon thank you for reporting this.
@cabo thank you for clarification: that does make more sense. I'll update the title.

@cowtowncoder
Copy link
Member

Added this on

https://github.com/FasterXML/jackson-future-ideas/wiki/Jackson-Work-in-Progress

and hopefully include in 2.10 (depending on fix, could consider 2.9 backport but bit worried that change not safe for patch as 2.9 is quite far along patch series -- while correction, change to encoded data length can probably trigger bogus test fails in downstream deps).

@Sajjon
Copy link
Author

Sajjon commented Apr 23, 2019

@cowtowncoder @cabo Great job guys! Thanks a lot for your prompt reply. Looking forward to the fix!

@cowtowncoder cowtowncoder changed the title UTF Strings on non canonical form Some short UTF Strings encoded using non-canonical form May 7, 2019
@cowtowncoder cowtowncoder added this to the 2.9.9 milestone May 8, 2019
@cowtowncoder
Copy link
Member

Hoping to get 2.9.9 out Real Soon Now -- just received a potential CVE, which I'll need to address, but after that.

@Sajjon
Copy link
Author

Sajjon commented May 12, 2019

Great job! 💪

@cowtowncoder
Copy link
Member

@Sajjon thank you once again for reporting this -- we take correctness and interoperability seriously, so it's good to weed out these deviations at least eventually over time :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants