Use charset from Content-Type header #22769

jaymode · 2017-01-24T15:31:16Z

In #22691 (comment), I pointed out that our code currently ignores the charset parameter of the Content-Type header and that we should look into this. The javadocs of JsonFactory describe how different charsets are handled:

Encoding is auto-detected from contents according to JSON
specification recommended mechanism. Json specification
supports only UTF-8, UTF-16 and UTF-32 as valid encodings,
so auto-detection implemented only for this charsets.
For other charsets use {@link #createParser(java.io.Reader)}.

Unfortunately, not all clients stick to the Unicode encodings; I have seen some send data as ISO-8859-1. I think we should consider parsing the charset from the Content-Type header when it is available and handling it appropriately (failing if we cannot support it, converting, creating the parser differently, etc.).
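To make the proposal concrete, here is a minimal sketch using only the JDK. The class and method names (ContentTypeCharset, charsetOf, readerFor, decode) are hypothetical and not part of Elasticsearch, and falling back to UTF-8 when no charset is given is an assumption made for illustration. The Reader it produces is the kind of input that the javadoc quoted above suggests passing to JsonFactory#createParser(java.io.Reader).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class ContentTypeCharset {

    /** Returns the charset named in the header; assumes UTF-8 when no charset parameter is present. */
    static Charset charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the remaining parts are name=value parameters.
        for (int i = 1; i < parts.length; i++) {
            String[] nameValue = parts[i].split("=", 2);
            if (nameValue.length == 2 && nameValue[0].trim().toLowerCase(Locale.ROOT).equals("charset")) {
                String value = nameValue[1].trim().replace("\"", "");
                try {
                    return Charset.forName(value);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    // "Failing if we cannot support": surface a clear error to the caller.
                    throw new IllegalArgumentException("unsupported charset [" + value + "]", e);
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Wraps the raw body in a Reader that decodes it with the declared charset. */
    static Reader readerFor(byte[] body, String contentType) {
        return new InputStreamReader(new ByteArrayInputStream(body), charsetOf(contentType));
    }

    /** Decodes the whole body to a String with the declared charset. */
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType));
    }
}
```

With this sketch, a body sent as ISO-8859-1 with "Content-Type: application/json; charset=ISO-8859-1" is decoded correctly instead of being misread by encoding auto-detection.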


jaymode · 2017-01-27T13:54:47Z

Discussed in Fix it Friday. The plan going forward is to:

Add deprecation logging in 5.x for non-Unicode encodings, with a strict mode that defaults to off
Enable strict mode by default in 6.0 and deprecate non-strict mode
Support only strict mode in 7.0

Ultimately, this plan lets us give users notice, hopefully learn which non-Unicode encodings are in use, and give users a way to keep working while we decide whether there is anything we can do about the other encodings.
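The staged rollout can be pictured as a single check whose behaviour depends on a strict flag. This is only a sketch: CharsetPolicy, Outcome, and the strict parameter are hypothetical names, and treating every charset whose canonical name starts with "UTF-" as Unicode is a simplification.

```java
import java.nio.charset.Charset;

// Hypothetical policy check, for illustration only.
final class CharsetPolicy {

    enum Outcome { ACCEPT, ACCEPT_WITH_DEPRECATION_WARNING, REJECT }

    /** Simplified test: canonical names such as UTF-8, UTF-16LE and UTF-32 count as Unicode. */
    static boolean isUnicode(Charset charset) {
        return charset.name().startsWith("UTF-");
    }

    /**
     * strict = false models the 5.x default (warn but accept);
     * strict = true models the 6.0 default and the only mode in 7.0 (reject).
     */
    static Outcome check(Charset charset, boolean strict) {
        if (isUnicode(charset)) {
            return Outcome.ACCEPT;
        }
        return strict ? Outcome.REJECT : Outcome.ACCEPT_WITH_DEPRECATION_WARNING;
    }
}
```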

Pyppe · 2017-10-20T08:42:20Z

Not only does it ignore the charset, but if we use e.g. a header of Content-Type: application/x-ndjson; charset=UTF-8 for the bulk API, we also get a misleading deprecation warning:

[WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content type using the [Content-Type] header.

It took me a while to realize the warning was caused by the additional ; charset=UTF-8 suffix...
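One way to avoid this kind of false positive is to compare only the media type and ignore any parameters after the first semicolon. The sketch below is not how RestController actually works; MediaTypeMatcher and its KNOWN set are hypothetical and exist only to show the idea.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical matcher, for illustration only.
final class MediaTypeMatcher {

    // Assumed list of accepted media types; the real set may differ.
    private static final Set<String> KNOWN = Set.of("application/json", "application/x-ndjson");

    /** Strips parameters such as "; charset=UTF-8" and normalizes case. */
    static String mediaTypeOf(String contentType) {
        int semicolon = contentType.indexOf(';');
        String type = semicolon < 0 ? contentType : contentType.substring(0, semicolon);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** True when the media type, without parameters, is one of the known types. */
    static boolean isKnown(String contentType) {
        return KNOWN.contains(mediaTypeOf(contentType));
    }
}
```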

nik9000 · 2017-10-20T14:56:36Z

@Pyppe can you open an issue for the deprecation warning with the ; charset=UTF-8 thing? That seems like something we should discuss, and I think it deserves its own issue.

pgomulka · 2021-10-13T12:15:31Z

We might want to combine this with #72969.
The idea would be to validate the media type parameters in RestController before request routing.

We already have a way to declare the allowed parameters for a given media type, but nothing is validated.

elasticsearch/libs/x-content/src/main/java/org/elasticsearch/xcontent/XContentType.java

Line 154 in 20c9f75

Map.of(COMPATIBLE_WITH_PARAMETER_NAME, VERSION_PATTERN)),
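A rough sketch of what such validation could look like, given a map from parameter name to an allowed-value pattern like the one declared above. ParameterValidator and its methods are hypothetical, and the parameter names and patterns used in the usage note are assumptions rather than the actual values of COMPATIBLE_WITH_PARAMETER_NAME and VERSION_PATTERN.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical validator, for illustration only.
final class ParameterValidator {

    /** Extracts name=value parameters that follow the media type; names are lower-cased. */
    static Map<String, String> parseParameters(String header) {
        Map<String, String> parameters = new LinkedHashMap<>();
        String[] parts = header.split(";");
        for (int i = 1; i < parts.length; i++) {
            String[] nameValue = parts[i].split("=", 2);
            String name = nameValue[0].trim().toLowerCase(Locale.ROOT);
            // A parameter without "=" gets an empty value, which most patterns will reject.
            String value = nameValue.length == 2 ? nameValue[1].trim() : "";
            parameters.put(name, value);
        }
        return parameters;
    }

    /** Valid only if every parameter is declared and its value fully matches the declared pattern. */
    static boolean isValid(String header, Map<String, Pattern> allowed) {
        for (Map.Entry<String, String> parameter : parseParameters(header).entrySet()) {
            Pattern pattern = allowed.get(parameter.getKey());
            if (pattern == null || !pattern.matcher(parameter.getValue()).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with an assumed map of "compatible-with" to a digits-only pattern and "charset" to a case-insensitive "utf-8" pattern, a header carrying charset=ISO-8859-1 or an undeclared parameter would be rejected before routing.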

jaymode added :Core/Infra/REST API REST infrastructure and utilities discuss labels Jan 24, 2017

jaymode mentioned this issue Jan 24, 2017

Optionally require a valid content type for all rest requests with content #22691

Merged

jaymode self-assigned this Jan 27, 2017

jaymode removed the discuss label Jan 27, 2017

tlrx mentioned this issue Aug 31, 2017

Forbid direct usage of ContentType.create() methods #26457

Merged

Pyppe mentioned this issue Oct 20, 2017

Deprecation warning when having charset in the request Content-Type #27065

Closed

DaveCTurner mentioned this issue Nov 7, 2017

Add ability to parse Content-Type from content type contains charset #27301

Closed

jaymode mentioned this issue Jan 9, 2018

406 response when POSTing to _bulk with a charset specified #28123

Closed

jaymode removed their assignment Feb 8, 2018

colings86 added the >enhancement label Apr 24, 2018

rjernst added the Team:Core/Infra Meta label for core/infra team label May 4, 2020

jaymode mentioned this issue Sep 2, 2020

Allow parsing Content-Type and Accept headers with version #61427

Merged

rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020

jaymode added help wanted adoptme and removed needs:triage Requires assignment of a team area label labels Dec 14, 2020
