Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Marked UTF-8 as the that SHOULD be used #1645

Merged
merged 4 commits into from
Apr 24, 2021
Merged

Marked UTF-8 as the that SHOULD be used #1645

merged 4 commits into from
Apr 24, 2021

Conversation

iherman
Copy link
Member

@iherman iherman commented Apr 22, 2021

This is a PR implemented the proposed solution outlined in #1628 (comment) for issue #1628.

Merging this PR must be sanctioned by a WG resolution.

Fix #1628.


Preview | Diff

@iherman iherman requested review from mattgarrish and xfq April 22, 2021 11:33
@iherman
Copy link
Member Author

iherman commented Apr 23, 2021

The issue was discussed in a meeting on 2021-04-23

List of resolutions:

View the transcript

1.1. CSS files encoding

See github issue #1628.

See github pull request #1645.

Dave Cramer: first, encoding of CSS files
… we say CSS files must be encoded in UTF8 or UTF16
… web is leaning towards UTF8 for everything
… i18n WG suggests we use UTF8 for everything
… but some existing epubs might already use UTF16
… proposal would deprecate UTF16 CSS files
… issue is the epubcheck would be obligated to display warning if it found UTF16 CSS
… not sure how big of a thing this is
… CSS WG has recommended UTF16 in past
… i'm okay with deprecating UTF16
… the proposed phrasing is in the issue

Ivan Herman: not sure how much books currently use UTF16, but epubcheck having to warn about this means that some changes will have to be made to epubcheck
… but this isn't too big an obstacle
… there is already a PR
… ready to go
… it keeps the current sentence but adds "but UTF16 is deprecated"
… also, the exact same thing is happening in the spec for XML char encoding
… so the PR makes the conforming change to XML too

Matt Garrish: i'm not sure how much epubcheck even looks at CSS
… we've talked about deprecating UTF16 before, but we opted to keep it at the time for backwards compatibility reasons

Brady Duga: are there enough positive benefits to go ahead with this?
… also, deprecation feels like it should be a statement of intent (i.e. that we will remove in future), but the truth is that we intend to keep this compatibility forever

Dave Cramer: we're trying to work with i18n WG

Brady Duga: is it deprecated in CSS?

Dave Cramer: they say you SHOULD use UTF8

Brady Duga: i would say "you should use UTF8", but i'm not sure about deprecating UTF16

Ivan Herman: i suspect there are tools and scripts which simply do not work with UTF16
… so we minimize the chance of people trying to use UTF16 CSS epubs with these tools

Brady Duga: i'd be surprised if there are a lot of tools out there that don't support both encodings

Matt Garrish: we are trying to align with the web, and what the other specs are saying
… i.e. guiding people towards only using UTF8

Dave Cramer: i understand the concerns with deprecation
… we could say CSS must be either UTF8 or UTF16, but it should be UTF8

Matt Garrish: that puts us more or less back in the same place, there's going to be an epubcheck warning

Dave Cramer: and then people also have to go look up the definition of "deprecated", so maybe not saying that is more straightforward

Ivan Herman: for sake of argument, Rust for example, which is gaining popularity, only uses UTF8

Proposed resolution: Change the PR from 'deprecated' to 'should' and then merge (Ivan Herman)

Charles LaPierre: Brackets is also only UTF8, complains it can't read UTF16 files.

Ivan Herman: but yes, let's change the PR

Matt Garrish: +1

Wendy Reid: +1

Ivan Herman: +1

Charles LaPierre: +1

Matthew Chan: +1

Dave Cramer: +1

Deborah Kaplan: +1

Ben Schroeter: +1

Masakazu Kitahara: 0

Brady Duga: +1

Toshiaki Koike: +1

Gregorio Pellegrino: 0

Resolution #1: Change the PR from 'deprecated' to 'should' and then merge

@iherman iherman mentioned this pull request Apr 23, 2021
@iherman iherman changed the title Marked UTF-16 as deprecated Marked UTF-8 as the that SHOULD be used Apr 23, 2021
@iherman
Copy link
Member Author

iherman commented Apr 23, 2021

@mattgarrish can you check whether the change is fine with the WG resolution?

@xfq following the WG resolution the text has now an extra "UTF-8 SHOULD be used". The reason that it is a SHOULD and not a MUST is due to our backward compatibility obligation. Are you o.k. merging this and, thereby, closing the original issue?

@mattgarrish
Copy link
Member

Although I wonder if it would make sense to simplify the statements to:

It SHOULD be encoded in UTF-8 but MAY be encoded in UTF-16.

We lose the "must" but maybe this is a more readable way of phrasing what we want?

@iherman
Copy link
Member Author

iherman commented Apr 23, 2021

Although I wonder if it would make sense to simplify the statements to:

It SHOULD be encoded in UTF-8 but MAY be encoded in UTF-16.

We lose the "must" but maybe this is a more readable way of phrasing what we want?

Yes, but that phrasing would allow the author to use UTF-1234... Ie, it is more readable, but relies on the knowledge that there are only these two encodings of UTF...

@mattgarrish
Copy link
Member

but that phrasing would allow the author to use UTF-1234...

I wouldn't think so. It's effectively you should use this one but may use the other. Binary choice.

But if that's not clear enough we could make that specific with "No other encodings are allowed." on the end.

It's the "should" that makes the current formulation seem awkward, as it clashes with the must. An alternative might be: "It MUST be encoded in either UTF-8 or UTF-16, with UTF-8 the RECOMMENDED encoding.

@iherman
Copy link
Member Author

iherman commented Apr 24, 2021

An alternative might be: "It MUST be encoded in either UTF-8 or UTF-16, with UTF-8 the RECOMMENDED encoding.

Yes! That seems to be perfect. I have updated the document.

Copy link
Member

@xfq xfq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

@mattgarrish mattgarrish merged commit 70ae53c into main Apr 24, 2021
@mattgarrish mattgarrish deleted the remove-utf16 branch April 24, 2021 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Encoding of CSS files
3 participants