--embed-file fails if source path contains non-ascii characters or if destination filename contains non-ascii characters #19229

Open
juj opened this issue Apr 21, 2023 · 1 comment


juj commented Apr 21, 2023

STR:

In the Emscripten root directory,

echo foo > test_äö.txt
emcc test/hello_world.c --embed-file test_äö.txt

results in failure

E:\code\emsdk\emscripten\emäscript>emcc test/hello_world.c --embed-file test_äö.txt
C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.s:3:39: error: Expected ,, instead got: Σ
.section .rodata.__em_file_data__test_Σ÷_txt,"",@
C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.s:6:22: error: Unexpected token in operand: Σ
__em_file_data__test_Σ÷_txt_name:
C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.s:7:8: error: data directive must occur in a data segment: "/test_\0o30o300o303\0o20o240o244\0o30o300o303\0o20o260o266.txt"
.asciz "/test_\0o30o300o303\0o20o240o244\0o30o300o303\0o20o260o266.txt"
       ^
C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.s:8:28: error: Expected ,, instead got: Σ
.size __em_file_data__test_Σ÷_txt_name, 63
C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.s:11:22: error: Unexpected token in operand: Σ
__em_file_data__test_Σ÷_txt:
C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.s:12:9: error: Could not find incbin file 'test_Σ÷.txt'
.incbin "test_Σ÷.txt"
C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.s:13:28: error: Expected ,, instead got: Σ
.size __em_file_data__test_Σ÷_txt, 6
C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.s:38:28: error: unexpected token
.dc.a __em_file_data__test_Σ÷_txt_name
C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.s:42:28: error: unexpected token
.dc.a __em_file_data__test_Σ÷_txt
emcc: error: 'E:/code/emsdk/llvm/git/build_main_vs2019_64/Release/bin\clang++.exe -target wasm32-unknown-emscripten -c --target=wasm32-unknown-emscripten C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.s -o C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.o' failed (returned 1)
file_packager: error: 'E:\code\emsdk\emscripten\emäscript\emcc.bat -c --target=wasm32-unknown-emscripten -o C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.o C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.s' failed (returned 1)
emcc: error: 'E:\code\emsdk\emscripten\emäscript\tools\file_packager.bat a.out.data --from-emcc --embed test_äö.txt --obj-output=C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_50sg8plp\embedded_files.o' failed (returned 1)
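
A side note on the `.asciz` line in the log above: `0o303`, `0o244` and `0o266` are the Python 3 octal spellings of the bytes 0xC3, 0xA4 and 0xB6, i.e. the UTF-8 encoding of `äö`, so the name string seems to pass through an octal-escaping step that emits text the assembler does not accept. A minimal sketch of what a well-formed escape could look like, assuming the assembler takes GNU-as-style `\ooo` octal escapes in string literals. The helper name `escape_for_asciz` is hypothetical and is not the function in `tools/file_packager.py`:

```python
def escape_for_asciz(text: str) -> str:
    """Hypothetical helper: escape text for use inside a .asciz "..." literal."""
    out = []
    for byte in text.encode('utf-8'):
        ch = chr(byte)
        if ch in ('"', '\\'):
            # Quote and backslash must themselves be escaped.
            out.append('\\' + ch)
        elif 0x20 <= byte < 0x7f:
            # Printable ASCII passes through unchanged.
            out.append(ch)
        else:
            # Everything else becomes a fixed-width three-digit octal escape,
            # so a following digit cannot be absorbed into the escape.
            out.append('\\%03o' % byte)
    return ''.join(out)


print(escape_for_asciz('/test_äö.txt'))  # prints: /test_\303\244\303\266.txt
```

This only addresses the name string; it does nothing for the non-ASCII symbol names or the `.incbin` path.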

The above issue cannot be worked around by restricting the destination name to ASCII characters only. E.g.

STR:

In e.g. the Emscripten root directory,

echo foo > test_äö.txt
emcc test/hello_world.c --embed-file test_äö.txt@/test.txt

fails with a different error message:

C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_5mpj6c9s\embedded_files.s:12:9: error: Could not find incbin file 'test_Σ÷.txt'
.incbin "test_Σ÷.txt"
emcc: error: 'E:/code/emsdk/llvm/git/build_main_vs2019_64/Release/bin\clang++.exe -target wasm32-unknown-emscripten -c --target=wasm32-unknown-emscripten C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_5mpj6c9s\embedded_files.s -o C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_5mpj6c9s\embedded_files.o' failed (returned 1)
file_packager: error: 'E:\code\emsdk\emscripten\emäscript\emcc.bat -c --target=wasm32-unknown-emscripten -o C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_5mpj6c9s\embedded_files.o C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_5mpj6c9s\embedded_files.s' failed (returned 1)
emcc: error: 'E:\code\emsdk\emscripten\emäscript\tools\file_packager.bat a.out.data --from-emcc --embed test_äö.txt@/test.txt --obj-output=C:\Users\jukkaj\AppData\Local\Temp\emscripten_temp_5mpj6c9s\embedded_files.o' failed (returned 1)
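
The `Σ÷` in both logs is consistent with the file name having been written out as Windows-1252 bytes and then displayed through an OEM code page. A small sketch of that round trip, assuming the console was using code page 437 (the report does not state the active code page):

```python
name = 'test_äö.txt'

# Bytes as a Windows-1252 writer would produce them: ä -> 0xE4, ö -> 0xF6.
raw = name.encode('cp1252')
print(raw)    # prints: b'test_\xe4\xf6.txt'

# The same bytes interpreted as code page 437 (assumed console code page).
shown = raw.decode('cp437')
print(shown)  # prints: test_Σ÷.txt
```

So the garbled name in the error output may be partly a display artifact; it does not by itself show which bytes LLVM tried to open.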

juj commented Apr 21, 2023

One issue is that embedded_files.s gets generated with the encoding 'Western (Windows 1252)'. Changing

diff --git a/tools/file_packager.py b/tools/file_packager.py
index 9ee8e4d2e..a828c8b88 100755
--- a/tools/file_packager.py
+++ b/tools/file_packager.py
@@ -265,7 +265,7 @@ def generate_object_file(data_files):
   for f in embed_files:
     f.c_symbol_name = '__em_file_data_%s' % to_c_symbol(f.dstpath, used)

-  with open(asm_file, 'w') as out:
+  with open(asm_file, 'w', encoding='UTF-8') as out:
     out.write('# Emscripten embedded file data, generated by tools/file_packager.py\n')

     for f in embed_files:

does fix the encoding:

[screenshot]

However, it does not actually affect the execution of LLVM itself, and the error message remains the same. It looks like LLVM might be having an issue here.
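
For reference, the effect of the `encoding` argument on the bytes that land in the file can be checked in isolation. This is a minimal sketch, not the `file_packager.py` code path: it writes a single `.incbin` line to a temporary file, with `cp1252` passed explicitly as a stand-in for the Windows locale default so that it behaves the same on any platform:

```python
import os
import tempfile

line = '.incbin "test_äö.txt"\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'embedded_files.s')

    # Stand-in for open(path, 'w') on a system whose default is Windows-1252.
    with open(path, 'w', encoding='cp1252') as out:
        out.write(line)
    with open(path, 'rb') as f:
        legacy_bytes = f.read()

    # Same write with the explicit encoding from the diff.
    with open(path, 'w', encoding='utf-8') as out:
        out.write(line)
    with open(path, 'rb') as f:
        utf8_bytes = f.read()

print(b'\xe4\xf6' in legacy_bytes)        # one byte per character
print(b'\xc3\xa4\xc3\xb6' in utf8_bytes)  # two bytes per character
```

Either way the assembler receives raw bytes for the path, so whether it can locate the file depends on how LLVM maps those bytes to a Windows file name.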
