C++, being an old language, has many ways to deal with Unicode code points. How should we deal with Unicode in this library?
It seems that the easiest solution is to use `std::wstring` to represent all the strings: Question on StackOverflow regarding wstring and string.
Each `wchar_t` code unit takes either 2 or 4 bytes (depending on the platform; 2 on Windows, 4 on Linux).
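To make the platform difference concrete, here is a minimal sketch that counts code points in a `std::wstring` regardless of the width of `wchar_t`. The helper name `count_code_points` is hypothetical, and the sketch assumes the string holds well-formed UTF-16 (2-byte `wchar_t`) or UTF-32 (4-byte `wchar_t`).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of Unicode code points in a wide string.
// Assumption: the string is well-formed UTF-16 when wchar_t has 2 bytes
// and UTF-32 when wchar_t has 4 bytes.
std::size_t count_code_points(const std::wstring& text) {
    if (sizeof(wchar_t) >= 4) {
        // UTF-32: every element is exactly one code point.
        return text.size();
    }

    std::size_t count = 0;
    for (wchar_t unit : text) {
        // UTF-16: a code point above U+FFFF is stored as a surrogate pair.
        // Skipping the low (trailing) surrogate counts each pair once.
        const bool is_low_surrogate = unit >= 0xDC00 && unit <= 0xDFFF;
        if (!is_low_surrogate) {
            ++count;
        }
    }
    return count;
}
```

For a string such as `L"\U0001F600"`, `size()` gives 1 with a 4-byte `wchar_t` and 2 with a 2-byte `wchar_t`, while the helper above returns 1 in both cases.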
Regular Expressions
This seems to play ok-ish with regular expressions: Question about Unicode ranges on StackOverflow.
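As a rough illustration of a Unicode range in a wide regular expression, the sketch below matches text from the Cyrillic block with `std::wregex`. The helper name `is_cyrillic` and the choice of block are assumptions made for the example only. Both range bounds lie in the Basic Multilingual Plane, so each fits in a single `wchar_t` on either platform.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` is non-empty and consists only of
// characters from the Cyrillic block (U+0400..U+04FF).
bool is_cyrillic(const std::wstring& text) {
    // The bounds are written as \u escapes so the pattern does not depend
    // on the encoding of the source file.
    static const std::wregex pattern(L"[\u0400-\u04FF]+");
    return std::regex_match(text, pattern);
}
```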
We have to be careful here, though!
Regular expressions should be encoded as UTF-32 on Linux (or any other platform which stores `wchar_t` with 4 bytes), and decomposed into multiple code units (surrogate pairs, for code points above U+FFFF) in UTF-16 on Windows (or any platform with 2 bytes for `wchar_t`).
Using UTF-8 might be more trouble with `std::regex`: Question about UTF-8 and std::regex on StackOverflow.
JSON
We are thinking about using `nlohmann/json` to parse JSON as of today (2023-06-28).
It decodes strings to UTF-8 by default so we have to be careful with `std::wstring`s: see `to_json` (nlohmann/json#1592).
Alternatively, if we ever want to use RapidJSON, we also have to be equally careful and stick to UTF-8: Question about RapidJSON and Unicode on StackOverflow.
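If the JSON parser hands back UTF-8 in a `std::string`, some conversion to `std::wstring` is needed at the boundary. The following is a minimal sketch of such a conversion using only the standard library. The function name `utf8_to_wstring` is hypothetical and not part of nlohmann/json or RapidJSON, and the sketch does not reject overlong encodings or encoded surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into a wide string.
// Produces UTF-32 when wchar_t has 4 bytes and UTF-16 when it has 2 bytes.
// Note: overlong encodings and encoded surrogates are not rejected here.
std::wstring utf8_to_wstring(const std::string& utf8) {
    std::wstring out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the first bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > utf8.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += len;

        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            // 2-byte wchar_t: split into a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Keeping the conversion in one place like this would let the rest of the library work with `std::wstring` only, whichever JSON library ends up being used.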
XML
Given its light weight, we are thinking about using PugiXML to parse XML.
PugiXML seems to support Unicode well: Section in the PugiXML docs about Unicode.