Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regressions with non-ASCII characters in path names #17817

Open
juj opened this issue Sep 8, 2022 · 10 comments
Open

Regressions with non-ASCII characters in path names #17817

juj opened this issue Sep 8, 2022 · 10 comments

Comments

@juj
Copy link
Collaborator

juj commented Sep 8, 2022

It looks like Emscripten has regressed to no longer work if there are difficult characters in the path names - now most recently we have gotten reports over the character ç on Windows, (Unicode U+00E7, UTF-8 encoding C3 A7).

This character is present as 0xE7 encoding in ISO-8859-1 (Latin 1), ISO-8859-2, *-3, *-14, *-15 and *-16 charsets.

That made me wonder what the current status of testing this kind of behavior is? Back long time ago when Mozilla was maintaining the test suites, the test CIs ran inside a path that contained all kinds of difficult characters (spaces, single quotes, dollar, hash signs, upper 8-bit Latin-1 chars, BMP Unicode chars and non-BMP Unicode chars).

Reading the circleci script, that no longer seems to be the case?

I wonder if this kind of hardening should be redeveloped to the circleci scripts? It would be good to have Emsdk, Emscripten compiler, the .emscripten config and build temp directories each reside inside a difficult path name, so they exercise all the different sources of path name difficulties that may arise.

@juj
Copy link
Collaborator Author

juj commented Sep 8, 2022

The most recent regression trace we have received reads as follows.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\threading.py", line 926, in _bootstrap_inner
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\threading.py", line 870, in run
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\subprocess.py", line 1238, in _readerthread
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\cp1252.py", line 23, in decode
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 62: character maps to <undefined>

Exception in thread Thread-2:
Traceback (most recent call last):
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\threading.py", line 926, in _bootstrap_inner
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\threading.py", line 870, in run
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\subprocess.py", line 1238, in _readerthread
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\cp1252.py", line 23, in decode
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 62: character maps to <undefined>

Traceback (most recent call last):
  File "C:\Program Files\Unity\2022.2.0b7_5377bbb3b245\Data\PlaybackEngines\WebGLSupport\BuildTools\Emscripten\emscripten\\emcc.py", line 3982, in <module>
    sys.exit(main(sys.argv))
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\contextlib.py", line 74, in inner
  File "C:\Program Files\Unity\2022.2.0b7_5377bbb3b245\Data\PlaybackEngines\WebGLSupport\BuildTools\Emscripten\emscripten\\emcc.py", line 3975, in main
    ret = run(args)
  File "C:\Program Files\Unity\2022.2.0b7_5377bbb3b245\Data\PlaybackEngines\WebGLSupport\BuildTools\Emscripten\emscripten\\emcc.py", line 1168, in run
    phase_calculate_system_libraries(state, linker_arguments, linker_inputs, newargs)
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\contextlib.py", line 74, in inner
  File "C:\Program Files\Unity\2022.2.0b7_5377bbb3b245\Data\PlaybackEngines\WebGLSupport\BuildTools\Emscripten\emscripten\\emcc.py", line 2697, in phase_calculate_system_libraries
    extra_files_to_link += system_libs.calculate(all_linker_inputs, newargs, forced=state.forced_stdlibs)
  File "C:\Program Files\Unity\2022.2.0b7_5377bbb3b245\Data\PlaybackEngines\WebGLSupport\BuildTools\Emscripten\emscripten\tools\system_libs.py", line 1874, in calculate
    handle_reverse_deps(input_files)
  File "C:\Program Files\Unity\2022.2.0b7_5377bbb3b245\Data\PlaybackEngines\WebGLSupport\BuildTools\Emscripten\emscripten\tools\system_libs.py", line 1718, in handle_reverse_deps
    symbolses = building.llvm_nm_multiple([os.path.abspath(t) for t in input_files])
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\contextlib.py", line 74, in inner
  File "C:\Program Files\Unity\2022.2.0b7_5377bbb3b245\Data\PlaybackEngines\WebGLSupport\BuildTools\Emscripten\emscripten\tools\building.py", line 177, in llvm_nm_multiple
    results = run_process(cmd, stdout=PIPE, stderr=PIPE, check=False)
  File "C:\Program Files\Unity\2022.2.0b7_5377bbb3b245\Data\PlaybackEngines\WebGLSupport\BuildTools\Emscripten\emscripten\tools\shared.py", line 106, in run_process
    ret = subprocess.run(cmd, check=check, input=input, *args, **kw)
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\subprocess.py", line 474, in run
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\subprocess.py", line 939, in communicate
  File "D:\obj\windows-release\37amd64_Release\msi_python\zip_amd64\subprocess.py", line 1288, in _communicate
IndexError: list index out of range

The root issue in this callstack may well be of my own making, since the function llvm_nm_multiple is involved (which I have most recently refactored), and I am trying to reproduce this specific issue locally.

Though when attempting to get to repro by cloning emsdk under a path C:\emsdç, I find a number of other breakages, hence thinking it would be good to add hardening into the CI directly.

@kripken
Copy link
Member

kripken commented Sep 12, 2022

Good find, yeah, looks like we've just not been testing this.

Once this is fixed, sgtm to add testing for this, perhaps by changing the CircleCI script for windows to create and use such a path.

@AndrewAlexArt
Copy link

I've got the same error with Russian major letter И in the COMMENT SECTION. After "//"
Example:
// window.alert("И");
It is thrown an error when I was building Unity web GL build.
damn...

@sbc100
Copy link
Collaborator

sbc100 commented Mar 31, 2023

@AndrewAlexArt can you share more about the error, like maybe the backtrace? I guess this is a JS or C/C++ source file? Do you know what encoding the file is in?

@AndrewAlexArt
Copy link

AndrewAlexArt commented Apr 1, 2023 via email

@sbc100
Copy link
Collaborator

sbc100 commented Apr 2, 2023

Do you know what version of emscripten you are using? The line kw.setdefault('encoding', 'utf-8') was added to run_process in tools/shared.py back in #16736. That change was released in 3.1.14. Can you check if you have that line in your run_process function, or if you are using 3.1.14 or above?

The fact that the error is coming from cp1251.py makes me think that encoding=utf-8 is not being passed to subprocess.run.

@AndrewAlexArt
Copy link

AndrewAlexArt commented Apr 2, 2023 via email

@AndrewAlexArt
Copy link

AndrewAlexArt commented Apr 2, 2023 via email

@sbc100
Copy link
Collaborator

sbc100 commented Apr 2, 2023

If you are using 3.1.8 then I think this issue may already have been fixed in more recent versions of emscripten (3.1.14 and above).

@AndrewAlexArt
Copy link

AndrewAlexArt commented Apr 2, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants