Encoding of CSS files #1628

Closed
xfq opened this issue Apr 12, 2021 · 8 comments · Fixed by #1645
Labels
Cat-i18n: Grouping label for all internationalization related issues
EPUB33: Issues addressed in the EPUB 3.3 revision
i18n-needs-resolution: Issue the Internationalization Group has raised and looks for a response on.
Spec-EPUB3: The issue affects the core EPUB 3.3 Recommendation
Status-Proposed Solution: A proposed solution has been included in the issue for working group review
Topic-ContentDocs: The issue affects EPUB content documents

Comments

@xfq
Member

xfq commented Apr 12, 2021

CSS files in an EPUB package MUST be encoded in UTF-8 or UTF-16. The i18n WG supports the idea of using UTF-8 only. Why is UTF-16 mentioned here?

@xfq xfq added the i18n-needs-resolution Issue the Internationalization Group has raised and looks for a response on. label Apr 12, 2021
@mattgarrish
Member

There was a discussion about dropping UTF-16 in #50

It was also reconfirmed for CSS in a telecon discussion (the referenced thread in google groups appears to be lost to time now, though)

The only concern I'd have with making a change is whether we risk breaking existing content. Maybe we should just advise against the use rather than disallow it?

@iherman
Member

iherman commented Apr 12, 2021

The only concern I'd have with making a change is whether we risk breaking existing content. Maybe we should just advise against the use rather than disallow it?

I had the same concern...

@mattgarrish mattgarrish added the Topic-ContentDocs The issue affects EPUB content documents label Apr 13, 2021
@iherman iherman added the Cat-i18n Grouping label for all internationalization related issues label Apr 17, 2021
@iherman
Member

iherman commented Apr 17, 2021

@xfq @mattgarrish would it be possible to change the text as follows:

from

It MUST be encoded in UTF-8 or UTF-16 [Unicode].

to

It MUST be encoded in UTF-8. Encoding in UTF-16 is deprecated.

@xfq please follow the link to deprecated to see what it means in the context of EPUB 3.3

However, I see UTF-16 mentioned in two more places in the document:

I believe the media type registration must stay, because it is "legal" to use UTF-16 (because it is just deprecated). I would think, however, that the same change should be made on the XML case as for CSS.

WDYT?

cc @dauwhe @wareid @shiestyle

@iherman iherman added the Status-Proposed Solution A proposed solution has been included in the issue for working group review label Apr 17, 2021
@dauwhe dauwhe added the Agenda+ Issues that should be discussed during the next working group call. label Apr 21, 2021
@mattgarrish
Member

It MUST be encoded in UTF-8. Encoding in UTF-16 is deprecated.

Isn't this a bit of a contradiction? If it must be encoded in utf-8, then that negates using utf-16 at all.

This is kind of an oddball case since we're only deprecating half the requirement.

If we go this route, maybe it might make more sense to phrase it along the lines of: "It MUST be encoded in UTF-8 or UTF-16, but the use of UTF-16 is now deprecated."

That would be consistent with not invalidating any utf-16 content that might be out there while recommending it no longer be used.

@iherman
Member

iherman commented Apr 22, 2021

If we go this route, maybe it might make more sense to phrase it along the lines of: "It MUST be encoded in UTF-8 or UTF-16, but the use of UTF-16 is now deprecated."

You are right. Let us go this way if we get the blessing of the WG (on the call tomorrow)

@mattgarrish
Member

It appears the general web direction is to use utf-8 exclusively, so a broader recommendation against using utf-16 for xml, too, seems in keeping with that trend.

The Encoding spec says this:

Authors must use the UTF-8 encoding and must use the ASCII case-insensitive "utf-8" label to identify it.

New protocols and formats, as well as existing formats deployed in new contexts, must use the UTF-8 encoding exclusively.

UTF-16 is now considered a legacy encoding format.

It also appears that only a fraction of a percent of web content uses it (although that's not necessarily a good barometer of publishing use).

@dauwhe
Contributor

dauwhe commented Apr 23, 2021

Just for reference... CSS says stylesheets should be UTF-8

Though UTF-8 is the default encoding for the web, and many newer web-based file formats assume or require UTF-8 encoding, CSS was created before it was clear which encoding would win, and thus can’t automatically assume the stylesheet is UTF-8.

Stylesheet authors should author their stylesheets in UTF-8, and ensure that either an HTTP header (or equivalent method) declares the encoding of the stylesheet to be UTF-8, or that the referring document declares its encoding to be UTF-8. (In HTML, this is done by adding a <meta charset=utf-8> element to the head of the document.)
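
A stylesheet can also carry its own declaration in a leading `@charset` rule. As a rough illustration only, the Python sketch below reads that label from the first bytes of a file. The function name `declared_charset` is hypothetical, and the byte pattern is a simplified reading of what CSS Syntax describes, not a full implementation of its encoding-detection steps. It matches ASCII-compatible bytes only, so a UTF-16 encoded file will not match.

```python
import re
from typing import Optional

# A leading rule must be the very first bytes of the file, written exactly as
# `@charset "label";` with one space and double quotes (simplified assumption).
_CHARSET_RULE = re.compile(rb'^@charset "([\x16-\x21\x23-\x7f]*)";')


def declared_charset(data: bytes) -> Optional[str]:
    """Return the lowercased label of a leading @charset rule, or None."""
    # Only the start of the file is inspected; anything later is ignored.
    match = _CHARSET_RULE.match(data[:1024])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

For example, `declared_charset(b'@charset "UTF-8";\nbody{}')` yields the label `utf-8`, while a file that starts with any other bytes yields `None`.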

@iherman
Member

iherman commented Apr 23, 2021

The issue was discussed in a meeting on 2021-04-23

List of resolutions:


1.1. CSS files encoding

See github issue #1628.

See github pull request #1645.

Dave Cramer: first, encoding of CSS files
… we say CSS files must be encoded in UTF8 or UTF16
… web is leaning towards UTF8 for everything
… i18n WG suggests we use UTF8 for everything
… but some existing epubs might already use UTF16
… proposal would deprecate UTF16 CSS files
… issue is the epubcheck would be obligated to display warning if it found UTF16 CSS
… not sure how big of a thing this is
… CSS WG has recommended UTF16 in past
… i'm okay with deprecating UTF16
… the proposed phrasing is in the issue

Ivan Herman: not sure how many books currently use UTF16, but epubcheck having to warn about this means that some changes will have to be made to epubcheck
… but this isn't too big an obstacle
… there is already a PR
… ready to go
… it keeps the current sentence but adds "but UTF16 is deprecated"
… also, the exact same thing is happening in the spec for XML char encoding
… so the PR makes the conforming change to XML too

Matt Garrish: i'm not sure how much epubcheck even looks at CSS
… we've talked about deprecating UTF16 before, but we opted to keep it at the time for backwards compatibility reasons

Brady Duga: are there enough positive benefits to go ahead with this?
… also, deprecation feels like it should be a statement of intent (i.e. that we will remove in future), but the truth is that we intend to keep this compatibility forever

Dave Cramer: we're trying to work with i18n WG

Brady Duga: is it deprecated in CSS?

Dave Cramer: they say you SHOULD use UTF8

Brady Duga: i would say "you should use UTF8", but i'm not sure about deprecating UTF16

Ivan Herman: i suspect there are tools and scripts which simply do not work with UTF16
… so we minimize the chance of people trying to use UTF16 CSS epubs with these tools
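
For content that does run into such tools, transcoding is straightforward. The Python sketch below is one possible approach, not something proposed in the meeting: the helper name `utf16_css_to_utf8` is hypothetical, and it assumes the UTF-16 stylesheet starts with a byte order mark.

```python
import codecs


def utf16_css_to_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 stylesheet bytes (with a BOM) to UTF-8 without a BOM."""
    if not data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        raise ValueError("expected a UTF-16 byte order mark")
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    text = data.decode("utf-16")
    return text.encode("utf-8")
```

Any `@charset` rule inside the stylesheet text is left untouched by this sketch and would need updating separately.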

Brady Duga: i'd be surprised if there are a lot of tools out there that don't support both encodings

Matt Garrish: we are trying to align with the web, and what the other specs are saying
… i.e. guiding people towards only using UTF8

Dave Cramer: i understand the concerns with deprecation
… we could say CSS must be either UTF8 or UTF16, but it should be UTF8

Matt Garrish: that puts us more or less back in the same place, there's going to be an epubcheck warning

Dave Cramer: and then people also have to go look up the definition of "deprecated", so maybe not saying that is more straightforward

Ivan Herman: for sake of argument, Rust for example, which is gaining popularity, only uses UTF8

Proposed resolution: Change the PR from 'deprecated' to 'should' and then merge (Ivan Herman)

Charles LaPierre: Brackets is also only UTF8, complains it can't read UTF16 files.

Ivan Herman: but yes, let's change the PR

Matt Garrish: +1

Wendy Reid: +1

Ivan Herman: +1

Charles LaPierre: +1

Matthew Chan: +1

Dave Cramer: +1

Deborah Kaplan: +1

Ben Schroeter: +1

Masakazu Kitahara: 0

Brady Duga: +1

Toshiaki Koike: +1

Gregorio Pellegrino: 0

Resolution #1: Change the PR from 'deprecated' to 'should' and then merge
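
As an illustration of what the resolved wording could mean for a checking tool, the Python sketch below sorts a CSS file's bytes into three outcomes: UTF-8 is fine, UTF-16 is permitted but discouraged, and anything else is an error. This is not epubcheck's implementation. The function name `check_css_encoding`, the level names, and the messages are all assumptions, and UTF-16 is recognized only by its byte order mark, so BOM-less UTF-16 is not detected.

```python
import codecs
from typing import Tuple


def check_css_encoding(data: bytes) -> Tuple[str, str]:
    """Classify stylesheet bytes as ("ok" | "warning" | "error", message)."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        try:
            data.decode("utf-16")
        except UnicodeDecodeError:
            return ("error", "UTF-16 byte order mark found, but the data is malformed")
        # Allowed by the MUST, discouraged by the SHOULD.
        return ("warning", "UTF-16 is allowed, but stylesheets should be UTF-8")
    try:
        # "utf-8-sig" accepts UTF-8 with or without a leading BOM.
        data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return ("error", "stylesheet is neither UTF-8 nor BOM-marked UTF-16")
    return ("ok", "stylesheet is UTF-8")
```

Separating the warning level from the error level keeps existing UTF-16 content valid while still steering authors toward UTF-8, which is the trade-off the group settled on.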

@dauwhe dauwhe removed the Agenda+ Issues that should be discussed during the next working group call. label Apr 28, 2021
@mattgarrish mattgarrish added the EPUB33 Issues addressed in the EPUB 3.3 revision label May 2, 2021
@mattgarrish mattgarrish added the Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation label Sep 14, 2022

4 participants