Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rosetta as default charset conversion library #2701

Closed
wants to merge 1 commit into from
Closed

Conversation

toots
Copy link
Member

@toots toots commented Oct 21, 2022

This PRs expand the native charset conversion support to include conversions from the formats supported by rosetta and by OCaml version >= 4.14.

It also adds tests based on the camomile tests dataset.

The introduction of conf-liquidsoap-charset is because of: ocaml/dune#6284

@toots toots force-pushed the rosetta-support branch 3 times, most recently from 5224df7 to ffd2107 Compare October 21, 2022 18:26
@toots toots requested a review from smimram October 21, 2022 18:26
@toots toots force-pushed the rosetta-support branch 9 times, most recently from 81bd061 to c61c7dd Compare October 21, 2022 22:50
Copy link
Member

@smimram smimram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this work, this is much more complete than my first implementation.

Aparat from the questions in the review, I have a general one: why not simply drop camomile support in favor of rosetta instead of keeping both, which requires introducing a new opam package... ?

Comment on lines +73 to +74
Printf.printf "%s\n%!"
[%string "Testing conversion from %{enc_str} to UTF8"];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see the point of adding a dependency on ppx_string which is not standard here instead of simply using printf:

Suggested change
Printf.printf "%s\n%!"
[%string "Testing conversion from %{enc_str} to UTF8"];
Printf.printf "Testing conversion from %s to UTF8\n%!" enc_str

It makes the code longer and less readable...

(of course this comment also applies on other prints in the file...)

Comment on lines +27 to +28
(camomile -> charset_impl.camomile.ml)
(rosetta -> charset_impl.rosetta.ml)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep the "native" implementation? I agree that we should ask for rosetta or camomil in the opam file, but we could still keep the native implementation around to still be able to compile with dune if we have a problem with rosetta, or want to have a small binary etc?

Suggested change
(camomile -> charset_impl.camomile.ml)
(rosetta -> charset_impl.rosetta.ml)
(camomile -> charset_impl.camomile.ml)
(rosetta -> charset_impl.rosetta.ml)
(-> charset_impl.native.ml)

@toots
Copy link
Member Author

toots commented Oct 23, 2022

Thanks for the review. I'm not sure that I want to go this route now.. 😅

Moving forward, we want to make sure that everything is encoded with the de-facto UTF-8 standard. However, and particularly for a project like us, the ability to decode and encode in outdated character encoding is particularly important.

In the past, we've had reports about issues with legacy software. See for instance: #411.

output.shoutcast still to that day sends metadata in latin1 encoding for legacy icy protocol. The original reason was that icy metadata in mp3 streams have historically been in this encoding and many encoders will expect it. Reading more (see: https://stackoverflow.com/questions/31044773/liquidsoap-tag-encoding) it looks like we should default back to UTF-8 now.

However, in this case we do want to give the user a chance to encoding in latin1 if it's still needed.

For ocaml dependencies with external dependencies, it makes sense to make them optional but a package with pure ocaml code should be made mandatory if it's important enough and I believe that our capacity to encode and decode to more than ISO and unicode encodings is.

What we should really consider here is a friendly, community supported fork of camomile with support for ocaml 5 merged into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants