Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intl: new Date().toLocaleString('de') puts unicode (BiDi) markers around punctuation #599

Closed
nojvek opened this issue Mar 17, 2016 · 18 comments

Comments

@nojvek
Copy link

nojvek commented Mar 17, 2016

d = new Date().toLocaleString('de') 
"‎17‎.‎03‎.‎2016‎ ‎14‎:‎21‎:‎50"

// I only want to match the time part
// In chrome (v8)
d.match(/(\d+\:\d+\:\d+)/)[1]
"14:27:28" // returns the correct answer

// in edge (chakra)
d.match(/(\d+\:\d+\:\d+)/)
null

// edge matches \: correctly
d.match(/\:/)
[":",":"]
 //edge matches \d+ correctly as well
d.match(/d+/)
["17", "17"]
//matches \d+\: incorrectly
d.match(/\d+\:/)
null // :(
@nojvek
Copy link
Author

nojvek commented Mar 17, 2016

If you don't mind, I'd like to make it a personal exercise to fix this bug with a PR. So far I've got chakra to build. Its amazing how easy it was. I see the baseline tests and I'll spend some time this weekend to get this fixed and make sure I don't break anything else.

@goyakin goyakin added the Bug label Mar 17, 2016
@goyakin
Copy link

goyakin commented Mar 17, 2016

This isn't a RegExp issue. It looks like we put LEFT-TO-RIGHT MARK (U+200E) around punctuation marks, but Chrome and Firefox don't.

On Edge:

var d = new Date("2016-03-17T21:43:10.645Z").toLocaleString('de');
d === "‎17‎.‎03‎.‎2016‎ ‎14‎:‎43‎:‎10";
d[1] === "1";
d[2] === "7";
d.charCodeAt(3) === 8206;
d[4] === ".";
d.charCodeAt(5) === 8206;
d[6] === "0";

On Chrome:

var d = new Date("2016-03-17T21:43:10.645Z").toLocaleString('de');
d === "‎17‎.‎3‎.‎2016‎ ‎14‎:‎43‎:‎10";
d[0] === "1";
d[1] === "7";
d[2] === ".";
d[3] === "3";

I don't know why we have the "LEFT-TO-RIGHT MARK (U+200E)" around punctuation marks though.

@nojvek
Copy link
Author

nojvek commented Mar 17, 2016

Yeah that's what I just noticed.

d = new Date().toLocaleString();
WScript.Echo(d);
$ ch dateLocale.js
?3?/?17?/?2016? ?2?:?56?:?45? ?PM

I'll change the title of the bug.

@nojvek nojvek changed the title Regex Matching has a bug matching time strings d.match(/(\d+\:\d+\:\d+)/) new Date().toLocaleString('de') puts unicode markers around punctuation Mar 17, 2016
@nojvek
Copy link
Author

nojvek commented Mar 17, 2016

Could this be an issue in the windows GetDateFormatW ?

How do I start ch.exe on a file and break it so I can attach a debugger?

I don't see a debugging page.

// For Windows Vista and above GetDateFormatEx is preferred
WINBASEAPI
int
WINAPI
GetDateFormatW(
    _In_ LCID Locale,
    _In_ DWORD dwFlags,
    _In_opt_ CONST SYSTEMTIME * lpDate,
    _In_opt_ LPCWSTR lpFormat,
    _Out_writes_opt_(cchDate) LPWSTR lpDateStr,
    _In_ int cchDate
    );

@digitalinfinity
Copy link
Contributor

If you're using Visual Studio, you can just set the command line parameters for ch by right clicking on the ch project, and setting the appropriate field under the Debugging Configuration Property. Then, set your breakpoints in VS, hit F5 and have fun.
If you want to use another debugger, you can install the Windows Debugging Tools by following the instructions at https://msdn.microsoft.com/en-us/library/windows/hardware/ff551063(v=vs.85).aspx. You can then run windbg.exe ch.exe <test.js> and launch it under that.
Currently, I don't think there is a way to force breaking into the debugger from a script file in ChakraCore- @Yongqu or @curtisman would know for sure. If there isn't a way right now, feel free to file an issue to have this exposed in debug builds.

@nojvek
Copy link
Author

nojvek commented Mar 19, 2016

From what I gather, it seems, some of the logic is in Intl.js. This is probably byte code generated and I don't seem to figure out where the implementation of formatDateTime is.

I got debugging to work. I added a 10 sec sleep so I could attach VS.

WScript.Echo(d.toLocaleString());

The breakpoint doesn't seem to hit GetDateLocaleString as I previously hypothesized.

    DateImplementation::GetDateLocaleString(Js::YMD *pymd, TZD *ptzd, DateTimeFlag noDateTime,ScriptContext* scriptContext)

I'm just beginning to realize how complicated chakra can be.

@Yongqu
Copy link
Contributor

Yongqu commented Mar 21, 2016

you might want to break at JavascriptDate::EntryToLocaleDateString and continue from there.

@abchatra abchatra added this to the 1.2 milestone Mar 24, 2016
@goyakin
Copy link

goyakin commented Mar 29, 2016

According to the ECMAScript spec, the string returned by Date.prototype.toLocaleString needs to be in a human-readable form, but the details are left to the implementation. We insert the BiDi control characters (e.g., LEFT-TO-RIGHT MARK (U+200E)) in order to preserve the order of characters when the string is shown in a UI element.

There is an ongoing discussion on how to align implementations and standardize the use of BiDi control characters in the spec. We've decided to postpone the work on this issue until an agreement is reached as part of the standardization discussion.

@nojvek, if I understand correctly, you're trying to get the localized time string. Would Date.prototype.toLocaleTimeString help you achieve that?

@dilijev
Copy link
Contributor

dilijev commented Nov 16, 2016

@bterlson Any word on the TC39 Consensus aspect of this issue?

@dilijev dilijev changed the title new Date().toLocaleString('de') puts unicode markers around punctuation new Date().toLocaleString('de') puts unicode (BiDi) markers around punctuation Nov 16, 2016
@bterlson
Copy link
Contributor

@dilijev No word, no progress, and it seems unlikely to make progress from my conversations with people.

@dilijev dilijev added this to the Backlog milestone Nov 17, 2016
@dilijev dilijev self-assigned this Mar 29, 2017
@dilijev dilijev changed the title new Date().toLocaleString('de') puts unicode (BiDi) markers around punctuation Intl: new Date().toLocaleString('de') puts unicode (BiDi) markers around punctuation Apr 5, 2017
@dilijev
Copy link
Contributor

dilijev commented May 17, 2017

@BobFrankston
Copy link

I've created a test example to address possibly related issues:

  1. toLocaleString puts a comma after the date in the Nodejs.org version but the Chakra version does not.
  2. if I use dtOptions with timeZone nodejs accepts it and ignore want is doesn't like but Chakra gets an error. For my purpose {hour12:false} is sufficient so I didn't try to fully debug it.
const dtOptions = {
    timeZone: "America/New_York",
    timeZoneName: "short",
    hour12: false,
};

function dx() {
    try {
        console.log(`Version ${JSON.stringify(process.versions)}`);
        const dt = new Date();
        //const dsa = dt.toLocaleString('en-US', dtOptions);
        const dsa = dt.toLocaleString('en-US', {hour12:false});
        console.log(dsa.split('').map(d => d.charCodeAt(0).toString(16)).join(','));
        const ds = dsa.replace(/\u200e/g,"");       // work-around
        console.log(ds.split('').map(d => d.charCodeAt(0).toString(16)).join(','));
        console.log(`Date ${ds}`);
        const dm = ds.match(/^(\d+)\/(\d+)\/(\d+),? ([\d:]+)/);
        const dd = {y: dm[3], m: dm[1], d: dm[2], hms: dm[4]}
        return dd;
    }
    catch(e) {
        console.log(`Error ${e}`);
        debugger;
    }
}

console.log(dx());

@dilijev
Copy link
Contributor

dilijev commented Jun 3, 2017

toLocaleString puts a comma after the date in the Nodejs.org version but the Chakra version does not.

I see that there's a comma between the date and the time. Also allowed per spec, but the difference is noted.

> eshost -is test.js
## Source
const dtOptionsChakraBug = {
    timeZone: "America/New_York",
    timeZoneName: "short",
    hour12: false,
};
const dtOptionsSimple = {
    hour12: false,
};

function dx(dtOptions) {
    try {
        // print(`Version ${JSON.stringify(process.versions)}`);
        const dt = new Date();
        const dsa = dt.toLocaleString('en-US', dtOptions);
        print(dsa.split('').map(d => d.charCodeAt(0).toString(16)).join(','));
        const ds = dsa.replace(/\u200e/g,"");       // work-around
        print(ds.split('').map(d => d.charCodeAt(0).toString(16)).join(','));
        print(`Date ${ds}`);
        const dm = ds.match(/^(\d+)\/(\d+)\/(\d+),? ([\d:]+)/);
        const dd = {y: dm[3], m: dm[1], d: dm[2], hms: dm[4]}
        return dd;
    }
    catch(e) {
        print(`Error ${e}`);
        return null;
    }
}

print(JSON.stringify(dx(dtOptionsSimple)));
// print('\ndtOptionsChakraBug');
// print(JSON.stringify(dx(dtOptionsChakraBug)));



#### d8, sm
36,2f,32,2f,32,30,31,37,2c,20,31,37,3a,32,38,3a,32,37
36,2f,32,2f,32,30,31,37,2c,20,31,37,3a,32,38,3a,32,37
Date 6/2/2017, 17:28:27
{"y":"2017","m":"6","d":"2","hms":"17:28:27"}

#### ch-1.2.3, ch-1.3.2, ch-1.5.0, ch-1.4.4, ch-dev-rel, ch-master, ch-dev-dbg
200e,36,200e,2f,200e,32,200e,2f,200e,32,30,31,37,200e,20,200e,31,37,200e,3a,200e,32,38,200e,3a,200e,32,37
36,2f,32,2f,32,30,31,37,20,31,37,3a,32,38,3a,32,37
Date 6/2/2017 17:28:27
{"y":"2017","m":"6","d":"2","hms":"17:28:27"}

#### node
36,2f,32,2f,32,30,31,37,2c,20,31,37,3a,32,38,3a,32,37
36,2f,32,2f,32,30,31,37,2c,20,31,37,3a,32,38,3a,32,37
Date 6/2/2017, 17:28:27
{"y":"2017","m":"6","d":"2","hms":"17:28:27"}

#### node-ch
200e,36,200e,2f,200e,32,200e,2f,200e,32,30,31,37,200e,20,200e,31,37,200e,3a,200e,32,38,200e,3a,200e,32,37
36,2f,32,2f,32,30,31,37,20,31,37,3a,32,38,3a,32,37
Date 6/2/2017 17:28:27
{"y":"2017","m":"6","d":"2","hms":"17:28:27"}

if I use dtOptions with timeZone nodejs accepts it and ignore want is doesn't like but Chakra gets an error. For my purpose {hour12:false} is sufficient so I didn't try to fully debug it.

Already tracked in #3096

@dilijev dilijev removed their assignment Sep 5, 2017
@dilijev
Copy link
Contributor

dilijev commented Sep 5, 2017

Tracking as part of #3644

@zbraniecki
Copy link

I'm not sure if this should be fixed FWTW. Intl operations are by design supposed to be opaque and no expectations should be made as to the output.

Bidi marks are perfectly valid parts of the result string and any operations that are attempting to retrieve data from the formatted string (like the regexp in comment 0) are working against the logic of Intl operations.

My recommendation would be to close this issue as invalid.

@nojvek
Copy link
Author

nojvek commented Dec 20, 2017 via email

@zbraniecki
Copy link

They're only weird if you are unfamiliar with the issues of bidirectionality. Bidi chars are needed and we will be adding them. "Developers" also tend to do other bad practices like sync io. I don't believe that ignorance in the subject someone is working on is a good reason to sacrifice the quality of the API.

@BobFrankston
Copy link

At this point I'm with keeping the bidi because parsing the output has enough other problems and focusing on TC39 date improvements to get pat today's kludges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants