scan_string() return token_type::parse_error; when parse ansi file #812

Closed
sdhongjun opened this issue Nov 1, 2017 · 21 comments
Labels
platform: visual studio related to MSVC state: needs more info the author of the issue needs to provide more details

Comments

@sdhongjun

With the VS2015 IDE, when I run the demo code below to read a JSON file, the scan_string() function returns token_type::parse_error at line 2186:
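#include "json.hpp"
#include <fstream>
#include <iomanip>

using namespace std;
using json = nlohmann::json;

int main()
{
    // The backslash must be escaped: "C:\test.json" would contain a tab character
    ofstream out_json("C:\\test.json");

    json jsDefault = json();
    jsDefault["name"] = "默认";
    jsDefault["param"] = json();
    json jsArray = json::array({ jsDefault });

    json jsObj = json();
    jsObj["select"] = "默认";
    jsObj["items"] = jsArray;

    out_json << std::setw(4) << jsObj;

    out_json.close();

    ifstream in_json("C:\\test.json");
    json jsNewObj = json();
    in_json >> jsNewObj;  // scan_string() returns token_type::parse_error here
    return 0;
}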
@nlohmann nlohmann added the platform: visual studio related to MSVC label Nov 1, 2017
@nlohmann
Owner

nlohmann commented Nov 1, 2017

Could you please try the develop version and paste the full exception message? On MSVC, we had several issues regarding the encoding of the strings: the library only supports UTF-8 encoded strings.

@sdhongjun
Author

sdhongjun commented Nov 2, 2017

@nlohmann I'm already using the develop branch; scan_string() fails by returning token_type::parse_error here:

{
    if (JSON_UNLIKELY(not next_byte_in_range({0x80, 0xBF})))
    {
        return token_type::parse_error;//line 2186
    }
    break;
}

When I comment out lines 2153 to 2266 and change the default branch as shown below, it works fine. By the way, do you have a UTF-8 conversion function?

default:
{
    add(current);
    break;
    /* error_message = "invalid string: ill-formed UTF-8 byte";
    return token_type::parse_error;*/
}
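The branch around line 2186 is reached after a UTF-8 lead byte and requires the next byte to be a continuation byte in the range 0x80 to 0xBF. By default, MSVC stores narrow literals without the u8 prefix in the ANSI code page; on a Simplified Chinese system that is GBK (CP936), where 默认 is the bytes C4 AC C8 CF. C8 looks like a two-byte lead, but CF is not a continuation byte, so a strict parser has to reject the text. Below is an independent sketch of such a check, based on the well-formed byte ranges in Unicode Table 3-7; it is not the library's scanner, and the name is_valid_utf8 is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if s is well-formed UTF-8, using the byte ranges of
// Unicode Table 3-7 (rejects overlongs, surrogates and values > U+10FFFF).
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    // Byte at position k as an unsigned value 0x00-0xFF.
    auto at = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    // True if position k exists and its byte lies in [lo, hi].
    auto in_range = [&](std::size_t k, unsigned char lo, unsigned char hi) {
        return k < n && at(k) >= lo && at(k) <= hi;
    };

    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = at(i);
        if (c <= 0x7F) {                                  // ASCII
            i += 1;
        } else if (c >= 0xC2 && c <= 0xDF) {              // 2-byte sequence
            if (!in_range(i + 1, 0x80, 0xBF)) return false;
            i += 2;
        } else if (c >= 0xE0 && c <= 0xEF) {              // 3-byte sequence
            // Second byte is restricted after E0 (no overlongs) and ED (no surrogates).
            const unsigned char lo = (c == 0xE0) ? 0xA0 : 0x80;
            const unsigned char hi = (c == 0xED) ? 0x9F : 0xBF;
            if (!in_range(i + 1, lo, hi) || !in_range(i + 2, 0x80, 0xBF)) return false;
            i += 3;
        } else if (c >= 0xF0 && c <= 0xF4) {              // 4-byte sequence
            // Second byte is restricted after F0 (no overlongs) and F4 (max U+10FFFF).
            const unsigned char lo = (c == 0xF0) ? 0x90 : 0x80;
            const unsigned char hi = (c == 0xF4) ? 0x8F : 0xBF;
            if (!in_range(i + 1, lo, hi) || !in_range(i + 2, 0x80, 0xBF) ||
                !in_range(i + 3, 0x80, 0xBF)) return false;
            i += 4;
        } else {                                          // 80-C1 and F5-FF never start a sequence
            return false;
        }
    }
    return true;
}
```

With this check, is_valid_utf8("\xC4\xAC\xC8\xCF") is false, while the UTF-8 bytes of the same text, "\xE9\xBB\x98\xE8\xAE\xA4", pass.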

@gregmarr
Contributor

gregmarr commented Nov 2, 2017

Looks like you need to convert your non-ASCII strings jsDefault["name"] and jsObj["select"] to UTF-8 before you put them in the JSON object.
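One portable way to do that conversion is from UTF-16 (for example, the contents of a wide string on Windows) to UTF-8. The sketch below uses hypothetical helper names append_utf8 and utf16_to_utf8 and assumes the input is a std::u16string; on Windows, WideCharToMultiByte with CP_UTF8 does the same job:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Append the UTF-8 encoding of one Unicode scalar value to out.
void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 to UTF-8, combining surrogate pairs into one code point.
// Throws std::invalid_argument on an unpaired surrogate.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        if (c >= 0xD800 && c <= 0xDBFF) {            // high surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            c = 0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;                                     // the low surrogate is consumed too
        } else if (c >= 0xDC00 && c <= 0xDFFF) {     // low surrogate without a high one
            throw std::invalid_argument("unpaired low surrogate");
        }
        append_utf8(out, c);
    }
    return out;
}
```

For example, utf16_to_utf8(u"\u9ED8\u8BA4") yields the six bytes E9 BB 98 E8 AE A4, which can then be stored in the json object.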

@sdhongjun
Author

@gregmarr Yes! But this project should support this.

@nlohmann
Owner

nlohmann commented Nov 2, 2017

The project supports UTF-8. Could you try adding a u8 prefix to your string literals, e.g. jsDefault["name"] = u8"默认";?
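A u8 literal is only correct if the compiler decoded the source file correctly. MSVC, for instance, reads a source file without a BOM in the system code page unless told otherwise (the /utf-8 switch is available from VS2015 Update 2). A sketch that sidesteps the question by spelling the characters as universal character names or hex escapes; the constant name kDefaultName is hypothetical:

```cpp
#include <cassert>
#include <string>

// "默认" spelled with universal character names (U+9ED8 U+8BA4). UCNs are
// interpreted the same way whatever encoding the source file is saved in.
// sizeof counts the terminating NUL: 3 + 3 UTF-8 bytes + 1 = 7.
static_assert(sizeof(u8"\u9ED8\u8BA4") == 7, "expected six UTF-8 bytes plus NUL");

// The same bytes as hex escapes; unlike a u8 literal, this stays a plain
// const char[] under C++20, where u8 literals become const char8_t[].
const std::string kDefaultName = "\xE9\xBB\x98\xE8\xAE\xA4";
```

It can then be used as jsDefault["name"] = kDefaultName;.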

@sdhongjun
Author

@nlohmann Adding the u8 prefix does not work correctly.

#include "json.hpp"

#include <fstream>
#include <iomanip>  // std::setw

using namespace std;
using json = nlohmann::json;

int main()
{
    ofstream out_json("C:\\test.json");

    json jsDefault = json();
    jsDefault["name"] = u8"默认";
    jsDefault["param"] = json();
    json jsArray = json::array({ jsDefault });

    json jsObj = json();
    jsObj["select"] = u8"默认";
    jsObj["items"] = jsArray;

    out_json << std::setw(4) << jsObj;

    out_json.close();

    ifstream in_json("C:\\test.json");
    json jsNewObj = json();
    in_json >> jsNewObj;
    string strJson = jsNewObj.dump(4);

    return 0;
}

Contents of the output file C:\test.json:

{
    "items": [
        {
            "name": "默认",
            "param": null
        }
    ],
    "select": "默认"
}

And the text of strJson after string strJson = jsNewObj.dump(4); is:

{
    "items": [
        {
            "name": "榛樿",
            "param": null
        }
    ],
    "select": "榛樿"
}

@nlohmann
Owner

Can you please attach your code and the JSON file (best as a ZIP archive) so I can check this myself?

@nlohmann nlohmann added the state: needs more info the author of the issue needs to provide more details label Nov 11, 2017
@sdhongjun
Author

Here is a VS2015 test project you can try:
TestJson.zip

@nlohmann
Owner

I don't have MSVC. Can you please execute the code on your side and attach the generated JSON file?

@sdhongjun
Author

Check the attached file.

test.zip

@nlohmann
Owner

I can parse the file without problems:

#include <iostream>
#include <fstream>
#include "json.hpp"

using json = nlohmann::json;

int main(int argc, char* argv[]) {
    std::ifstream f("test.json");
    json j;
    f >> j;
    std::cout << j << std::endl;
}

@sdhongjun
Author

There is no error, but the decoded string is not the same as the source string.

@nlohmann
Owner

In #812 (comment) you mentioned a parse error. I cannot reproduce this with the file. I think your input is not UTF-8 encoded. The library only supports UTF-8.

@sdhongjun
Author

sdhongjun commented Nov 14, 2017

Without the u8 prefix, a parse error is reported. After adding the u8 prefix there is no error, but the decoded string is not the same as the source string.

File saved without the u8 prefix:
test_no_u8Prefix.zip

File saved with the u8 prefix:
test_use_u8Prefix.zip

@nlohmann
Owner

Can you check in the debugger if the string you store in the library is UTF-8 encoded?
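Without a debugger, the bytes of a stored string can be printed in hex and compared against the expected UTF-8 (默认 is E9 BB 98 E8 AE A4). A small helper with the hypothetical name hex_bytes:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format each byte of s as two uppercase hex digits separated by spaces,
// e.g. to inspect what a json string value actually contains.
std::string hex_bytes(const std::string& s)
{
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", c);  // c is a value 0-255
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

For example, std::cout << hex_bytes(jsNewObj["select"].get<std::string>()) prints the bytes of the value read back from the file. If that shows E9 BB 98 E8 AE A4 while the Visual Studio debugger displays 榛樿, the string is most likely correct UTF-8 and only the display is interpreting it in the ANSI code page.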

@sdhongjun
Author

sdhongjun commented Nov 22, 2017

I don't know how to check that, but I would expect that writing with outfile << jsonobj and reading the same file back with infile >> jsonobj gives the same string property values.

@nlohmann
Owner

The thing is that the library does not check whether stored strings are UTF-8 encoded. That's why serialization may produce non-compliant JSON text. When such a text is parsed, a parse error occurs reporting the invalid UTF-8.

@sdhongjun
Author

Can you add this feature, to reduce discussions about files saved by the library itself that then fail to parse because of invalid UTF-8 characters? Such questions will take a lot of your time. taocpp json has this feature.

@nlohmann
Owner

I won't do re-encoding, but throwing an exception when non-UTF-8 encoded text is dumped may be an option; this is discussed in #838.

@sdhongjun
Author

sdhongjun commented Nov 23, 2017

It would save you time if you implemented conversion to UTF-8 when dumping to a file. I guess the more users adopt this library, the more often this same question will come up.
I think something is wrong with your UTF-8 parser: when I save the converted UTF-8 text to a file with a UTF-8 BOM and read it back with infile >> json, the property values are not correct.

test_utf-8.zip

This is the test code on Windows:

// TestJson.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

#include "json.hpp"
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <Windows.h>

using namespace std;
using json = nlohmann::json;

// Convert a multibyte string in the ANSI code page (CP_ACP) to UTF-8
bool MBToUTF8(vector<char>& pu8, const char* pmb, int mLen)
{
    pu8.clear();

    // Step 1: convert the MBCS string to UTF-16 (wide chars)
    int nLen = MultiByteToWideChar(CP_ACP, 0, pmb, mLen, NULL, 0);
    if (nLen <= 0)
    {
        return false;
    }
    vector<WCHAR> wide(nLen);  // released automatically on every return path
    if (MultiByteToWideChar(CP_ACP, 0, pmb, mLen, wide.data(), nLen) != nLen)
    {
        return false;
    }

    // Step 2: convert the UTF-16 string to UTF-8
    int utf8Len = WideCharToMultiByte(CP_UTF8, 0, wide.data(), nLen, NULL, 0, NULL, NULL);
    if (utf8Len <= 0)
    {
        return false;
    }
    pu8.resize(utf8Len + 1);
    if (WideCharToMultiByte(CP_UTF8, 0, wide.data(), nLen, pu8.data(), utf8Len, NULL, NULL) != utf8Len)
    {
        pu8.clear();
        return false;
    }
    pu8[utf8Len] = '\0';  // NUL-terminate so data() can be streamed as a C string
    return true;
}

int main()
{
    ofstream out_json("C:\\test.json");

    json jsDefault = json();
    jsDefault["name"] = "默认";
    jsDefault["param"] = json();
    json jsArray = json::array({ jsDefault });

    json jsObj = json();
    jsObj["select"] = "默认";
    jsObj["items"] = jsArray;

    std::string strDump = jsObj.dump(4);
    vector<char> utf8Char;
    if (!MBToUTF8(utf8Char, strDump.c_str(), static_cast<int>(strDump.size())))
    {
        return 1;  // conversion to UTF-8 failed
    }
    out_json << (byte)0xEF << (byte)0xBB << (byte)0xBF;
    out_json << utf8Char.data();
    out_json.close();

    ifstream in_json("C:\\test.json");
    json jsNewObj = json();
    in_json >> jsNewObj;
    string strJson = jsNewObj.dump(4);

    return 0;
}
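The test code above writes a UTF-8 BOM (EF BB BF) before the JSON text. Whether a given version of the library skips a leading BOM is not established in this thread; if it does not, the stream can be advanced past the BOM before parsing. A sketch with the hypothetical helper skip_utf8_bom, assuming a seekable stream such as std::ifstream:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Consume a leading UTF-8 byte order mark (EF BB BF) if present.
// Returns true if a BOM was skipped; otherwise the stream is rewound to
// where it started. Assumes the stream supports tellg/seekg.
bool skip_utf8_bom(std::istream& in)
{
    const std::istream::pos_type start = in.tellg();
    char bom[3] = {};
    if (in.read(bom, 3) && bom[0] == '\xEF' && bom[1] == '\xBB' && bom[2] == '\xBF')
    {
        return true;
    }
    in.clear();       // reading fewer than 3 bytes sets failbit/eofbit
    in.seekg(start);  // rewind so no bytes are lost
    return false;
}
```

Usage would be skip_utf8_bom(in_json); followed by in_json >> jsNewObj;.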

@nlohmann
Owner

This seems to be MSVC-specific code.

@nlohmann nlohmann closed this as completed Dec 6, 2017