Exception when a JSON file contains international (non-ASCII) characters #1122

Closed
tan00 opened this issue Jun 6, 2018 · 5 comments
Labels: kind: question, solution: proposed fix (a fix for the issue has been proposed and waits for confirmation)


tan00 commented Jun 6, 2018

	json j;
	j["name"] = "中文";

	std::cout << j.dump() << std::endl;
	std::ofstream ofs;
	ofs.open("test.json");
	ofs << j;
	ofs.close();

	std::ifstream ifs("test.json");
	if (!ifs.is_open())
	{
		std::cout << "open error" << std::endl;
		return -1;
	}
	json j2;
	j2 << ifs;
	std::cout << j2.dump() << std::endl;

An exception is thrown at j2 << ifs;

nlohmann (Owner) commented Jun 6, 2018

Please make sure that your code is compiled as UTF-8, assuming you use MSVC (for example, with the /utf-8 compiler option).
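To see why the encoding matters here: JSON text is expected to be UTF-8, and the GBK bytes for 中文 are not a valid UTF-8 sequence. The following is a minimal sketch for checking what actually ended up in a file or string. The helper name looks_like_utf8 is hypothetical and not part of the library, and the check is simplified (it does not reject overlong encodings or surrogate code points).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of nlohmann/json): returns true if `s` has the
// byte structure of UTF-8, i.e. each lead byte is followed by the right number
// of 10xxxxxx continuation bytes. Simplified: overlong encodings and
// surrogate code points are not rejected.
bool looks_like_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (c < 0x80)                extra = 0;  // ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 4-byte sequence
        else return false;                       // stray continuation or invalid lead byte

        if (extra > s.size() - i - 1)
        {
            return false;                        // sequence is cut off at the end
        }
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
            {
                return false;                    // expected a continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

With this, the UTF-8 bytes of 中文 (E4 B8 AD E6 96 87) pass the check, while the GBK bytes (D6 D0 CE C4) fail, which is consistent with the parser rejecting a file written in the system encoding.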


OvermindDL1 commented Jun 6, 2018

j["name"] = "中文";

Note that this is not entirely portable in source code. The C++ standard only guarantees that the ASCII range is usable in source code; anything beyond that is up to the compiler. Most compilers support it, but for most of them you have to pass the proper option to tell the compiler which encoding the source uses.

In general, keep to the ASCII range in C++ source, and keep UTF-8 text and the like in separate files that are loaded at run time.
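If a non-ASCII string really has to live in the source, one way to stay within ASCII is to spell out its UTF-8 bytes with hex escapes. This is only a sketch: the function name chinese_name is made up for illustration, and the byte values are the UTF-8 encodings of U+4E2D and U+6587.

```cpp
#include <string>

// Hypothetical example: "中文" written as its UTF-8 bytes so that the source
// file itself contains only ASCII characters.
//   U+4E2D -> E4 B8 AD
//   U+6587 -> E6 96 87
// Hex escapes set the byte values directly, so the result does not depend on
// the source or execution character set the compiler assumes.
std::string chinese_name()
{
    return "\xE4\xB8\xAD\xE6\x96\x87";
}
```

The returned string can then be assigned to a JSON value like any other std::string, for example j["name"] = chinese_name();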

tan00 (Author) commented Jun 7, 2018

Tested under Linux: everything works fine with UTF-8.

On Windows 10 with Visual Studio, my system uses GBK encoding.
Whether or not the source file is saved as UTF-8, the resulting JSON file is written in the system encoding.

After converting the JSON file to UTF-8 by hand, reading it no longer throws an exception, but the output data is UTF-8 encoded and cannot be displayed correctly in the terminal.
So a conversion is needed:

// Converts a UTF-8 string to the system ANSI code page (GBK on this system).
// Requires <windows.h>, <string>, and <vector>.
std::string Utf8ToAnsi(const char *pSrc, int nLen)
{
	if (pSrc == NULL || nLen <= 0)
	{
		return "";
	}

	// Compute the number of UTF-16 code units needed
	int nCount = MultiByteToWideChar(CP_UTF8, 0, pSrc, nLen, NULL, 0);
	if (nCount <= 0)
	{
		return "";
	}
	std::vector<WCHAR> vecTmp(nCount);

	// Convert UTF-8 to UTF-16
	MultiByteToWideChar(CP_UTF8, 0, pSrc, nLen, &vecTmp[0], nCount);

	// Compute the number of bytes needed in the ANSI code page
	const int nWide = static_cast<int>(vecTmp.size());
	nCount = WideCharToMultiByte(CP_ACP, 0, &vecTmp[0], nWide, NULL, 0, NULL, NULL);
	if (nCount <= 0)
	{
		return "";
	}
	std::vector<char> vecAnsi(nCount);

	// Convert UTF-16 to ANSI
	WideCharToMultiByte(CP_ACP, 0, &vecTmp[0], nWide, &vecAnsi[0], nCount, NULL, NULL);

	// Return the result
	return std::string(&vecAnsi[0], nCount);
}

tan00 closed this as completed Jun 7, 2018
@OvermindDL1

Tested under Linux, everything is fine with utf-8.

This has nothing to do with the OS and everything to do with the compiler and its options.
Even then, the only reliable source code is source code kept within the standard ASCII range, as that is the only range guaranteed by the Standard.

As for the terminal, it sounds like a different terminal is needed. Windows, unlike Linux, does not have great methods for detecting terminal capabilities, but even then that is all user-code stuff, not JSON. ^.^
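On the user-code side, one option on Windows is to switch the console's output code page to UTF-8 rather than converting every string. A minimal sketch, assuming a console that supports code page 65001; the helper name enable_utf8_console is hypothetical, and whether the glyphs actually render still depends on the console font.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: asks the Windows console to interpret program output
// as UTF-8. Returns true on success. On other platforms it does nothing and
// returns true, assuming the terminal already expects UTF-8.
bool enable_utf8_console()
{
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```

Calling this once at the start of main, before printing j2.dump(), would be the intended use.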

nlohmann added the "kind: question" and "solution: proposed fix" labels Jun 7, 2018
@weili0677

(Quoting tan00's comment above in full, including the Utf8ToAnsi function.)

This worked around my problem for now ^^
