-
Notifications
You must be signed in to change notification settings - Fork 737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encode command line to UTF8 on Windows #5671
Conversation
This is essentially the orthogonal piece to #5627 (i.e. it doesn't depend on std::filesystem, which may end up being problematic) and I think it actually should accomplish the same goal as #5514 . It's not exactly done but as I'm OOO next week I thought I'd upload it. |
This is slightly cleaned up. For some reason there still seems to be an issue with the source map paths. i.e. I tried testing by running tests in a directory with an emoji (so all paths would have it) and the input and output paths seem fine, but source maps are failing to open for some reason I don't understand yet. But it does show one possibility that doesn't require std::filesystem or use of std::wstring anywhere other than right at the point of opening files. One downside is that it does require explicitly doing the conversion everywhere we open a file, so it's easy to make a mistake. I think the only way to avoid that is use an abstraction like std::filesystem (or reinventing the path part of std::filesystem). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert on the windows stuff but overall this looks reasonable and non-intrusive.
@brson maybe you can help here, I'm a bit stumped. Currently this patch looks like it should work, and it does when opening input files, but opening output files (either for e.g. wasm conversion output like in the lit tests here, or for source map output) fails. And not with a "file not found" error, but with an "invalid argument" error (you can see this in the test output on this PR). Does it have the same behavior on your machine? Do you have any ideas about what the issue might be? |
I'll get back to this next week and give this patch a test then. |
@dschuff I have tested this patch and made a fix: This seems to resolve the issue of the |
5b8b9f4
to
7cd1a54
Compare
Codecov Report
@@ Coverage Diff @@
## main #5671 +/- ##
==========================================
+ Coverage 42.63% 42.65% +0.01%
==========================================
Files 484 484
Lines 74872 74850 -22
Branches 11938 11925 -13
==========================================
+ Hits 31919 31924 +5
+ Misses 39748 39725 -23
+ Partials 3205 3201 -4
|
Here's a patch for the current windows build error: The error is:
This is because the signature of
is asking for a Changing the signature to
returning a string/wstring, works for all callers and fixes the build error on windows, still builds on linux. I tested that wasm-opt on windows still works with unicode paths, but have not run the test suite. |
e5c08bb
to
9fdb5df
Compare
That is basically what I had in 9fe2f9b (now rebased to 9fdb5df so you can see the current CI). The problem is that it doesn't work on mingw (which has GNU libstdc++ on top of windows)... libstdc++ doesn't have wchar versions of functions like |
(CI note: the linter really wants "shellapi.h" to be included before "windows.h" but my understanding is that windows.h should be included first, so I'm leaving it). |
Oh, and a note on testing: The current test case doesn't cover all the cases in the code. One easy way to test this comprehensively would be to create the build/install directory on windows such that the directory path (and therefore all file paths in use) contains an emoji. Do you think that would be worth it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an expert on the windows side, but those details look good in general, and overall this seems nonintrusive and helpful.
Testing here seems sufficient for now to me.
@brson can you confirm this fixes things for you?
This PR changes how file paths and the command line are handled. On startup on Windows, we process the wstring version of the command line (including the file paths) and re-encode it to UTF8 before handing it off to the rest of the command line handling logic. This means that all paths are stored in UTF8-encoded std::strings as they go through the program, right up until they are used to open files. At that time, they are converted to the appropriate native format with the new to_path function before passing to the stdlib open functions. This has the advantage that all of the non-file-opening code can use a single type to hold paths (which is good since std::filesystem::path has proved problematic in some cases), but has the disadvantage that someone could add new code that forgets to convert to_path before opening. That's somewhat mitigated by the fact that most of the code uses the ModuleIOBase classes for opening files. Fixes WebAssembly#4995
This PR changes how file paths and the command line are handled. On startup on Windows,
we process the wstring version of the command line (including the file paths) and re-encode
it to UTF8 before handing it off to the rest of the command line handling logic. This means
that all paths are stored in UTF8-encoded std::strings as they go through the program, right
up until they are used to open files. At that time, they are converted to the appropriate native
format with the new
to_path
function before passing to the stdlib open functions.This has the advantage that all of the non-file-opening code can use a single type to hold paths
(which is good since std::filesystem::path has proved problematic in some cases), but has the
disadvantage that someone could add new code that forgets to convert
to_path
beforeopening. That's somewhat mitigated by the fact that most of the code uses the
ModuleIOBase
classes for opening files.
Fixes #4995