String functions UTF-8 issue #4109

Closed
DigiBC opened this issue Aug 25, 2024 · 1 comment · Fixed by #4111

DigiBC commented Aug 25, 2024

Description

The string functions work correctly with ASCII characters but not with non-ASCII Unicode characters encoded as UTF-8.
Such characters are counted as two or three units each, which matches their UTF-8 byte length rather than the number of characters, and this leads to incorrect results.

Steps to reproduce

The issue is reproducible with the official package 2.2.5 (and earlier) and with the Rolling Release 2.3.x (tested with build 4980bc0).

Here are some examples using Liquidsoap's interactive interpreter:

ASCII (working correctly)
string.length("e");; - : int = 1
string.length("o");; - : int = 1
string.length("~");; - : int = 1

UTF-8
string.length("é");; - : int = 2
string.length("ö");; - : int = 2
string.length("€");; - : int = 3
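
The counts above match the UTF-8 byte length of each character rather than the number of characters. As a rough illustration outside Liquidsoap (Python is used here only for comparison, and the helper names `char_count` and `byte_count` are hypothetical), the two measures can be put side by side:

```python
# Compare the number of Unicode code points with the number of UTF-8 bytes.
# Illustration only: this is Python, not Liquidsoap code.

def char_count(s: str) -> int:
    """Number of Unicode code points in s."""
    return len(s)

def byte_count(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))

# e, o, ~ are ASCII; \u00e9 is é, \u00f6 is ö, \u20ac is €.
for ch in ["e", "o", "~", "\u00e9", "\u00f6", "\u20ac"]:
    print(ch, char_count(ch), byte_count(ch))
```

For the three non-ASCII characters, the byte counts are the same values that `string.length` reports in the session above.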

Here is a more practical example: extracting the name of a song as a substring.
If the band name consists of ASCII characters only, the result is correct:
string.sub("Queensryche - Silent lucidity (live)", start=14, length=15);; - : string = "Silent lucidity"

With the original spelling of the band name, which contains a non-ASCII character (ÿ), the result is shifted by one character:
string.sub("Queensrÿche - Silent lucidity (live)", start=14, length=15);; - : string = " Silent lucidit"
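
The one-character shift is what byte-based indexing would produce, since ÿ occupies two bytes in UTF-8. The following sketch reproduces both results in Python, assuming the precomposed form of ÿ (U+00FF); `sub_chars` and `sub_bytes` are hypothetical helpers written for this comparison, not Liquidsoap functions:

```python
# Reproduce the shifted substring by slicing UTF-8 bytes instead of characters.
# Illustration only: this is Python, not Liquidsoap code.

title = "Queensr\u00ffche - Silent lucidity (live)"  # \u00ff is ÿ (precomposed)

def sub_chars(s: str, start: int, length: int) -> str:
    """Substring counted in Unicode code points."""
    return s[start:start + length]

def sub_bytes(s: str, start: int, length: int) -> str:
    """Substring counted in UTF-8 bytes, then decoded for display."""
    raw = s.encode("utf-8")
    # errors="replace" keeps the demo from raising if a slice cuts a character.
    return raw[start:start + length].decode("utf-8", errors="replace")

print(repr(sub_chars(title, 14, 15)))  # character-based slice
print(repr(sub_bytes(title, 14, 15)))  # byte-based slice, starts one byte early
```

Because ÿ takes one extra byte, byte offset 14 lands on the space before "Silent", and the final "y" falls outside the 15-byte window.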

Expected behavior

Each Unicode character should be counted as one character, regardless of how many bytes its UTF-8 encoding uses, so that calculations with string functions produce correct results.
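
One common way to get this behavior from a byte string is to count only the bytes that start a code point, skipping UTF-8 continuation bytes (those of the form `10xxxxxx`). The sketch below shows that technique in Python under the assumption that the input is valid UTF-8 and the arguments are non-negative; `utf8_length` and `utf8_sub` are hypothetical names, and this is not a description of how the fix in #4111 is implemented:

```python
# Code-point-aware length and substring over raw UTF-8 bytes.
# Assumes valid UTF-8 input and non-negative start/length.

def is_start_byte(b: int) -> bool:
    """True unless b is a UTF-8 continuation byte (binary 10xxxxxx)."""
    return (b & 0xC0) != 0x80

def utf8_length(raw: bytes) -> int:
    """Number of code points encoded in raw."""
    return sum(1 for b in raw if is_start_byte(b))

def utf8_sub(raw: bytes, start: int, length: int) -> bytes:
    """Bytes covering `length` code points beginning at code point `start`."""
    # Byte offset of every code point, plus a sentinel for the end of input.
    starts = [i for i, b in enumerate(raw) if is_start_byte(b)]
    starts.append(len(raw))
    last = len(starts) - 1
    begin = starts[min(start, last)]
    end = starts[min(start + length, last)]
    return raw[begin:end]

raw_title = "Queensr\u00ffche - Silent lucidity (live)".encode("utf-8")
print(utf8_length("\u20ac".encode("utf-8")))
print(utf8_sub(raw_title, 14, 15).decode("utf-8"))
```

This counts code points, not user-perceived characters: a letter followed by a combining mark would still count as two.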

Liquidsoap version

Liquidsoap 2.3.0+git@4980bc075
Copyright (c) 2003-2024 Savonet team
Liquidsoap is open-source software, released under GNU General Public License.
See <http://liquidsoap.info> for more information.

Liquidsoap build config

* Liquidsoap version  : 2.3.0+git@4980bc075

 * Compilation options
   - Release build       : false
   - Git SHA             : 4980bc075
   - OCaml version       : 4.14.2
   - OS type             : Unix
   - Libs versions       : alsa=0.3.0 angstrom=0.16.0 ao=0.2.4 asetmap=0.8.1 asn1-combinators=0.2.6 astring=0.8.5 base=v0.16.3 base.base_internalhash_types=v0.16.3 base.caml=v0.16.3 base.shadow_stdlib=v0.16.3 base64=3.5.1 bigarray=[distributed with Ocaml] bigarray-compat=1.1.0 bigstringaf=0.9.1 bjack=0.1.6 bos=0.2.1 bytes=[distributed with OCaml 4.02 or above] ca-certs=v0.2.3 camlp-streams camomile.lib=2.0 cohttp=5.3.1 cohttp-lwt=5.3.0 cohttp-lwt-unix=5.3.0 conduit=6.2.3 conduit-lwt=6.2.3 conduit-lwt-unix=6.2.3 cry=1.0.3 cstruct=6.2.0 ctypes=0.22.0 ctypes-foreign=0.22.0 ctypes.stubs=0.22.0 curl=0.9.2 domain-name=0.4.0 domain_shims dssi=0.1.5 dtools=0.4.5 dune-build-info=3.16.0 dune-private-libs.dune-section=3.16.0 dune-site=3.16.0 dune-site.private=3.16.0 duppy=0.9.4 eqaf=0.9 eqaf.bigstring=0.9 eqaf.cstruct=0.9 faad=0.5.2 fdkaac=0.3.3 ffmpeg-av=1.2.0 ffmpeg-avcodec=1.2.0 ffmpeg-avdevice=1.2.0 ffmpeg-avfilter=1.2.0 ffmpeg-avutil=1.2.0 ffmpeg-swresample=1.2.0 ffmpeg-swscale=1.2.0 fileutils=0.6.4 flac=0.5.1 flac.decoder=0.5.1 flac.ogg=0.5.1 fmt=0.9.0 fpath=0.7.3 frei0r=0.1.2 gd=1.1 gen=1.1 gmap=0.3.0 hkdf=1.0.4 inotify=2.0-62-g5e58536 integers ipaddr=5.6.0 ipaddr-sexp=5.6.0 ipaddr.unix=5.6.0 irc-client irc-client-unix ladspa=0.2.2 lame=0.3.7 lastfm=0.3.4 lilv=0.2.0 liquidsoap-lang=2.3.0 liquidsoap-lang.console=2.3.0 liquidsoap_alsa=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_ao=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_bjack=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_builtins=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_core=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_dssi=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_faad=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_fdkaac=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_ffmpeg=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_flac=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_frei0r=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_gd=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_irc=rolling-release-v2.3.x-2-g4980bc0 
liquidsoap_ladspa=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_lame=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_lastfm=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_lilv=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_lo=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_mad=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_ogg=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_ogg_flac=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_optionals=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_opus=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_osc=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_oss=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_portaudio=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_posix_time=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_prometheus=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_pulseaudio=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_runtime=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_samplerate=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_sdl=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_sdl_log_level=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_shine=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_soundtouch=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_speex=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_sqlite=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_srt=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_ssl=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_stdlib=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_stereotool=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_theora=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_tls=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_vorbis=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_xmlplaylist=rolling-release-v2.3.x-2-g4980bc0 liquidsoap_yaml=rolling-release-v2.3.x-2-g4980bc0 lo=0.2.0 logs=0.7.0 logs.fmt=0.7.0 logs.lwt=0.7.0 lwt=5.7.0 lwt.unix=5.7.0 macaddr=5.6.0 mad=0.5.3 magic-mime=1.3.1 mem_usage=0.1.1 memtrace=0.2.3 menhirLib=20231231 metadata=0.3.0 mirage-crypto=0.11.3 mirage-crypto-ec=0.11.3 mirage-crypto-pk=0.11.3 mirage-crypto-rng=0.11.3 
mirage-crypto-rng.unix=0.11.3 mm=0.8.5 mm.audio=0.8.5 mm.base=0.8.5 mm.image=0.8.5 mm.midi=0.8.5 mm.video=0.8.5 ocplib-endian ocplib-endian.bigstring ogg=0.7.4 ogg.decoder=0.7.4 opus=0.2.3 opus.decoder=0.2.3 osc osc-unix parsexp=v0.16.0 pbkdf portaudio=0.2.3 posix-base=5a7f328 posix-socket=5a7f328 posix-socket.constants=5a7f328 posix-socket.stubs=5a7f328 posix-socket.types=5a7f328 posix-time2=5a7f328 posix-time2.constants=5a7f328 posix-time2.stubs=5a7f328 posix-time2.types=5a7f328 posix-types=5a7f328 posix-types.constants=5a7f328 ppx_compare.runtime-lib=v0.16.0 ppx_hash.runtime-lib=v0.16.0 ppx_sexp_conv.runtime-lib=v0.16.0 prometheus=1.2 prometheus-app=1.2 ptime=1.1.0 ptime.clock.os=1.1.0 pulseaudio=0.1.6 re=1.11.0 result=1.5 rresult=0.7.0 samplerate=0.1.7 saturn_lockfree=0.4.1 sedlex=3.2 seq=[distributed with OCaml 4.07 or above] sexplib=v0.16.0 sexplib0=v0.16.0 shine=0.2.3 soundtouch=0.1.9 speex=0.4.2 speex.decoder=0.4.2 sqlite3=5.1.0 srt=0.3.1 srt.constants=0.3.1 srt.stubs=0.3.1 srt.stubs.locked=0.3.1 srt.types=0.3.1 ssl=0.7.0 stdlib-shims=0.3.0 stereotool=rolling-release-v2.3.x-2-g4980bc0 str=[distributed with Ocaml] stringext=1.6.0 theora=0.4.1 theora.decoder=0.4.1 threads=[distributed with Ocaml] threads.posix=[internal] tls=0.17.4 tsdl=v1.0.0 tsdl-image=0.5 tsdl-ttf=0.6 unix=[distributed with Ocaml] unix-errno=52c6ecb unix-errno.errno_bindings=52c6ecb unix-errno.errno_types=52c6ecb unix-errno.errno_types_detected=52c6ecb unix-errno.unix=52c6ecb uri=4.4.0 uri-sexp=4.4.0 uri.services=4.4.0 vorbis=0.8.1 vorbis.decoder=0.8.1 x509=0.16.5 xmlm=1.4.0 xmlplaylist=0.1.5 yaml=3.2.0 yaml.bindings=3.2.0 yaml.bindings.types=3.2.0 yaml.c=3.2.0 yaml.ffi=3.2.0 yaml.types=3.2.0 zarith=1.13
   - architecture        : amd64
   - host                : x86_64-pc-linux-gnu
   - target              : x86_64-pc-linux-gnu
   - system              : linux
   - ocamlopt_cflags     : -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC
   - native_c_compiler   : gcc -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC -D_FILE_OFFSET_BITS=64
   - native_c_libraries  : -lm

 * Configured paths
   - mode              : posix
   - standard library  : /usr/share/liquidsoap/libs
   - scripted binaries : /usr/share/liquidsoap/bin
   - rundir            : /var/run/liquidsoap
   - logdir            : /var/log/liquidsoap
   - user cache        : $HOME/.cache/liquidsoap (override with $LIQ_CACHE_USER_DIR)
   - system cache      : /var/cache/liquidsoap (override with $LIQ_CACHE_SYSTEM_DIR)
   - camomile files    : /usr/share/liquidsoap/camomile

 * Supported input formats
   - MP3               : yes
   - AAC               : yes
   - Ffmpeg            : yes
   - Flac (native)     : yes
   - Flac (ogg)        : yes
   - Opus              : yes
   - Speex             : yes
   - Theora            : yes
   - Vorbis            : yes
   - WAV/AIFF          : yes (native)

 * Supported output formats
   - FDK-AAC           : yes
   - FFmpeg            : yes
   - MP3               : yes
   - MP3 (fixed-point) : yes
   - Flac (native)     : yes
   - Flac (ogg)        : yes
   - Opus              : yes
   - Speex             : yes
   - Theora            : yes
   - Vorbis            : yes
   - WAV/AIFF          : yes (native)

 * Tags
   - AAC               : yes
   - FFmpeg            : yes
   - FLAC (native)     : yes
   - Flac (ogg)        : yes
   - Native decoder    : yes
   - Vorbis            : yes

 * Input / output
   - ALSA              : yes
   - AO                : yes
   - FFmpeg            : yes
   - JACK              : yes
   - OSS               : yes
   - Portaudio         : yes
   - Pulseaudio        : yes
   - SRT               : yes

 * Audio manipulation
   - FFmpeg            : yes
   - LADSPA            : yes
   - Lilv              : yes
   - Samplerate        : yes
   - SoundTouch        : yes
   - StereoTool        : yes

 * Video manipulation
   - camlimages        : no (requires camlimages)
   - FFmpeg            : yes
   - frei0r            : yes
   - ImageLib          : no (requires imagelib)
   - SDL               : yes

 * MIDI manipulation
   - DSSI              : yes

 * Visualization
   - GD                : yes
   - Graphics          : no (requires graphics)
   - SDL               : yes

 * Additional libraries
   - FFmpeg filters    : yes
   - FFmpeg devices    : yes
   - inotify           : yes
   - irc               : yes
   - jemalloc          : no (requires jemalloc)
   - lastfm            : yes
   - lo                : yes
   - memtrace          : no (requires memtrace)
   - osc               : yes
   - ssl               : yes
   - sqlite3           : yes
   - tls               : yes
   - posix-time2       : yes
   - windows service   : no (requires winsvc)
   - YAML support      : yes
   - XML playlists     : yes

 * Monitoring
   - Prometheus        : yes

Installation method

From official packages in the release artifacts

Additional Info

Tested with Debian 12 and Ubuntu 24.04 (AMD64).

DigiBC added the bug label Aug 25, 2024
toots added a commit that referenced this issue Aug 25, 2024
  `"utf8"`
* Add `string.chars` with encoding.
* Fix default string escaping to properly fallback to `"ascii"` when
  utf8 escaping failed.

Fixes: #4109

toots commented Aug 25, 2024

Thanks, that was a great suggestion! #4111 should take care of it.
