Some short UTF Strings encoded using non-canonical form #159

Sajjon · 2019-04-22T08:31:59Z

Hey!

I recently discovered that Jackson sometimes produces UTF strings in a non-shortest-possible format (non-canonical form).

I first reported this issue on the CBOR GitHub, but realized that most other CBOR projects use canonical form, whereas Jackson does not.

It would be great if you could support canonical encoding of strings :)

cabo · 2019-04-22T11:34:09Z

Note that this bug report is not about the missing feature of generating "canonical" encoding (which would be a duplicate of #138), but about a set of off-by-one errors in the basic encoding logic which seem to lead to valid, but unnecessarily long encodings for ASCII strings of length 23 and 255. Please see the referenced issue for details. I'm not claiming to have read the entirety of CBORGenerator.java, but it seems that line 1251 is not the only place that has this off-by-one error (maybe search for lines with ...MAX... that have < instead of <=).
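To illustrate the kind of boundary error described above, here is a minimal sketch of CBOR text-string header encoding. It is not Jackson's actual code: the class and method names (`CborTextHeader`, `shortestHeader`, `offByOneHeader`) are hypothetical, and the buggy variant only assumes that each length boundary is tested with `<` where `<=` was intended. Both variants produce headers a decoder accepts, but the second one is longer than necessary at exactly lengths 23 and 255.

```java
final class CborTextHeader {
    // Major type 3 (text string) occupies the top three bits of the initial byte.
    private static final int TEXT = 0x60;

    /** Shortest header: lengths 0..23 inline, 24..255 in one extra byte, and so on. */
    static byte[] shortestHeader(int n) {
        if (n < 0) throw new IllegalArgumentException("negative length");
        if (n <= 23) return new byte[] { (byte) (TEXT | n) };
        if (n <= 0xFF) return new byte[] { (byte) (TEXT | 24), (byte) n };
        if (n <= 0xFFFF) return new byte[] { (byte) (TEXT | 25), (byte) (n >> 8), (byte) n };
        return new byte[] { (byte) (TEXT | 26),
                (byte) (n >> 24), (byte) (n >> 16), (byte) (n >> 8), (byte) n };
    }

    /** Hypothetical buggy variant: '<' instead of '<=' at each boundary. */
    static byte[] offByOneHeader(int n) {
        if (n < 0) throw new IllegalArgumentException("negative length");
        if (n < 23) return new byte[] { (byte) (TEXT | n) };
        if (n < 0xFF) return new byte[] { (byte) (TEXT | 24), (byte) n };
        if (n < 0xFFFF) return new byte[] { (byte) (TEXT | 25), (byte) (n >> 8), (byte) n };
        return new byte[] { (byte) (TEXT | 26),
                (byte) (n >> 24), (byte) (n >> 16), (byte) (n >> 8), (byte) n };
    }
}
```

For a 23-byte string, `shortestHeader(23)` yields the single byte `0x77`, while `offByOneHeader(23)` yields `0x78 0x17`. For 255 bytes the headers are `0x78 0xFF` and `0x79 0x00 0xFF` respectively. All other lengths below 65535 encode identically in both variants.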

cowtowncoder · 2019-04-22T17:24:02Z

@Sajjon thank you for reporting this.
@cabo thank you for the clarification: that does make more sense. I'll update the title.

cowtowncoder · 2019-04-22T17:27:52Z

Added this on

https://github.com/FasterXML/jackson-future-ideas/wiki/Jackson-Work-in-Progress

and hopefully it will be included in 2.10. Depending on the fix, I could consider a 2.9 backport, but I'm a bit worried that the change is not safe for a patch release, as 2.9 is quite far along in its patch series -- while this is a correction, a change to encoded data length can probably trigger bogus test failures in downstream dependencies.

Sajjon · 2019-04-23T07:33:56Z

@cowtowncoder @cabo Great job guys! Thanks a lot for your prompt reply. Looking forward to the fix!

cowtowncoder · 2019-05-11T17:24:46Z

Hoping to get 2.9.9 out Real Soon Now -- I just received a potential CVE report, which I'll need to address first, but the release should follow after that.

Sajjon · 2019-05-12T19:03:06Z

Great job! 💪

cowtowncoder · 2019-05-13T01:04:25Z

@Sajjon thank you once again for reporting this -- we take correctness and interoperability seriously, so it's good to weed out these deviations at least eventually over time :)

cowtowncoder added 2.9 active cbor labels May 7, 2019

cowtowncoder changed the title ~~UTF Strings on non canonical form~~ Some short UTF Strings encoded using non-canonical form May 7, 2019

cowtowncoder closed this as completed in 4ac24ab May 7, 2019

cowtowncoder removed the active label May 8, 2019

cowtowncoder added this to the 2.9.9 milestone May 8, 2019

cowtowncoder mentioned this issue May 8, 2019

Different Byte strings decodes into same UTF string, is that correct? cbor/cbor.github.io#49

Closed
