Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use charset from Content-Type header #22769

Open
jaymode opened this issue Jan 24, 2017 · 4 comments
Open

Use charset from Content-Type header #22769

jaymode opened this issue Jan 24, 2017 · 4 comments
Labels
:Core/Infra/REST API REST infrastructure and utilities >enhancement help wanted adoptme Team:Core/Infra Meta label for core/infra team

Comments

@jaymode
Copy link
Member

jaymode commented Jan 24, 2017

In #22691 (comment), I added a comment which points out that our code currently ignores the charset parameter of the Content-Type header and that this is something we should look into. Looking at the javadocs of JsonFactory to see how different charsets are handled:

Encoding is auto-detected from contents according to JSON
specification recommended mechanism. Json specification
supports only UTF-8, UTF-16 and UTF-32 as valid encodings,
so auto-detection implemented only for this charsets.
For other charsets use {@link #createParser(java.io.Reader)}.

Unfortunately not all clients adhere to the unicode only encodings as I have seen some send data as ISO-8859-1. I think we should consider parsing the charset from the content-type when available and handling appropriately (failing if we cannot support, convert, create parser differently etc.).

@jaymode
Copy link
Member Author

jaymode commented Jan 27, 2017

Discussed in Fix it Friday, the plan forward is to:

  1. Add deprecation logging for 5.x with non unicode encoding with a strict mode defaulted to off
  2. Enable a strict mode in 6.0 by default and deprecate non-strict mode
  3. Strict only for 7.0

Ultimately this plan allows us to provide notice to users, hopefully learn the encodings that are being used other than unicode, and allow users a way to continue working while we decide if there is anything we can do about the other encodings.

@Pyppe
Copy link

Pyppe commented Oct 20, 2017

Not only it ignores the charset, but if we e.g. use header of Content-Type: application/x-ndjson; charset=UTF-8 for the bulk-API we also get a misleading deprecation warning:

[WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content type using the [Content-Type] header.

It took me a while to realize the warning was caused by the additional ; charset=UTF-8 suffix...

@nik9000
Copy link
Member

nik9000 commented Oct 20, 2017

@Pyppe can you open an issue for the deprecation warning with the ; charset UTF-8 thing? That seems like a thing we should discuss and I think it deserves its own issue.

@pgomulka
Copy link
Contributor

we might want to combine this with #72969
So the idea would be that before the request routing in RestController we would validate the parameters

We already have a way to declare allowed parameters for given media type, but nothing is validated.

Map.of(COMPATIBLE_WITH_PARAMETER_NAME, VERSION_PATTERN)),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/REST API REST infrastructure and utilities >enhancement help wanted adoptme Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

No branches or pull requests

6 participants