Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base URL Windows drive sometimes favoured over input string drive #574

Open
karwa opened this issue Jan 23, 2021 · 14 comments
Open

Base URL Windows drive sometimes favoured over input string drive #574

karwa opened this issue Jan 23, 2021 · 14 comments
Labels
topic: file Aren't file: URLs the best? topic: parser

Comments

@karwa
Copy link
Contributor

karwa commented Jan 23, 2021

Typically, the URL parser detects Windows drive letters in the parsed path's "effective first component" (term I just made up).
The path parser processes each component in turn, and pushing and popping each component as they are encountered, with normalisation checking that the existing path is empty and the shortening function checking each time if the component at path[0] is a Windows drive letter so as not to pop it. This leads to Windows drive letters getting detected even if there is leading rubbish in front of them, for example (Live viewer):

// (input, base) => result
("file:///abc/def/../../C|/../hello/", "about:blank") => "file:///C:/hello/"

The C| component gets normalised in to C: and is not popped from the resulting path, because it is detected as a Windows drive letter.

However, there is a very subtle quirk: if the input string does not have an authority (/abc or file:/abc), and the base URL also contains a drive letter, the process of detecting drive letters in the input string gets stricter, so the drive must be the literal first component, without any leading rubbish. Otherwise the drive from the base URL gets preferred.

Consider the following (input, base) pairs:

// D drive is literally the first component.
("file:/D|/../foo", "file:///C:/base1/base2/") => "file:///D:/foo"

// Add a single-dot component before the input drive.
// Even though this doesn't affect the final path, it makes us favour the base's drive.
("file:/./D|/../foo", "file:///C:/base1/base2/") => "file:///C:/foo"

// This doesn't happen if input string has an authority.
("file:///./D|/../foo", "file:///C:/base1/base2/") => "file:///D:/foo"

// Going back to no-authority, the drive in the input string
// is still recognised if base _doesn't_ have its own drive.
("file:/./D|/../foo", "file:///Cx/base1/base2/") => "file:///D:/foo"

As you can see, in most cases we take the drive "D" from the input string, even when it isn't literally the first component, except for this one case.

This doesn't seem to be exercised by any existing tests, and given that it is such an edge-case I'd like to check whether it is intentional or not. I'm using a parsing algorithm derived from the one in the standard, and I was able to pass the all of the constructor and setter tests while handling this incorrectly. So if this is intentional behaviour, I'd be happy to add some tests for it.

@annevk
Copy link
Member

annevk commented Jan 25, 2021

That does seem like an error of sorts. It might be worthwhile comparing this with browsers on Windows, especially Chrome. And either way a test would definitely be appreciated and help move this forward.

@alwinb
Copy link
Contributor

alwinb commented Feb 1, 2021

Yes, this is unpleasant. I remember two testcases in WPT that are related. (see below).

These are 'congruence bugs', in the mathematical sense. They are not exactly reparsing bugs according to the spec, but they are very similar.

You'd want to prevent components that 'look like drive letters' from 'becoming drive letters' when normalizing.
This is similar to what was done in #505 for hosts (there, path components would 'turn into a host'). You can resolve this by percent encoding (part of) the first component upon serialization, if it would otherwise turn into a drive letter. Or otherwise, prefix ./ as in #505.

For implementations and the spec, It's useful to store the drive as a property on the (internal) URL record to avoid branching on the content of the first path component.

Related tests:

{
  "input": "file:///./Y:",
  "base": "about:blank",
  "href": "file:///Y:",
  "protocol": "file:",
  "username": "",
  "password": "",
  "host": "",
  "hostname": "",
  "port": "",
  "pathname": "/Y:",
  "search": "",
  "hash": ""
}
{
  "input": "file:///./y:",
  "base": "about:blank",
  "href": "file:///y:",
  "protocol": "file:",
  "username": "",
  "password": "",
  "host": "",
  "hostname": "",
  "port": "",
  "pathname": "/y:",
  "search": "",
  "hash": ""
}

@alwinb
Copy link
Contributor

alwinb commented Apr 12, 2021

I would like to solve this issue. It seems to be somewhat problematic, but I can give it a try. I think it should be among the last file URL/ drive letter issues.

My approach would be to add some cases to wpt, as a draft/ proposal at first, then collect the results and investigate how to make changes to jsdom/whatwg-url and the standard based on that.

I propose to change the desired behaviour for cases 1. and 4. of the original issue (see below).

Case 1

file:///abc/def/../../C|/../hello/

Spec:       file:///C:/hello/
Safari:     file:///C|/hello/
Firefox:    file:///hello/
Chrome/Mac: file:///hello/
Proposal:   file:///hello/

Case 4

file:///./D|/../foo against file:///C:/base1/base2/

Safari:     file:///D|/foo
Firefox:    file:///foo
Chrome/Mac: file:///foo
Spec:       file:///D:/foo
Proposal:   file:///foo

I don't have access to a Windows machine unfortunately.

The spec would remain unchanged for the second and third cases:

  • file:/D|/../foo against file:///C:/base1/base2/ resolves to file:///D:/foo
  • file:/./D|/../foo against file:///C:/base1/base2/ resolves to file:///C:/foo

The fifth case and the one I added can maybe be kept separate, but I'm not sure yet.

Let me know if this is OK, then I'll prepare another PR (and somehow mark it as a draft?) to the web platform tests.

@domenic
Copy link
Member

domenic commented Apr 12, 2021

Chrome on Windows appears to agree with Chrome on Mac in cases 1 and 4. So it seems your changes there move us closer to most browser behavior and so from my perspective they sound great.

@achristensen07
Copy link
Collaborator

I'm not opposed to change here, but I think the tests should include "file:///C|/a", which currently serializes to "file:///C%7C/a" in Chrome on Mac, "file:///C:/a" in Firefox, and "file:///C|/a" in Safari, which currently passes all the URL constructor web platform tests.

@domenic
Copy link
Member

domenic commented Apr 12, 2021

FWIW file:///C|/a serializes to file:///C|/afile:///C:/a on Chrome on Windows, matching the current spec.

@domenic
Copy link
Member

domenic commented Apr 12, 2021

which currently passes all the URL constructor web platform tests.

Wow, it really does!! Congratulations!!!

@karwa
Copy link
Contributor Author

karwa commented Apr 12, 2021

FWIW file:///C|/a serializes to file:///C|/a on Chrome on Windows, matching the current spec.

Hm? I still see file:///C:/a on the jsdom live viewer, and that's still what I would expect from the current standard.

@alwinb
Copy link
Contributor

alwinb commented Apr 12, 2021

Yes, I just started investigating. It seems there is no test for that, surprisingly. And now I'm curious how Safari handles this internally :)

It is not related to the issue here though, which results from step 4.1. in the path state together with in-place normalisation.

@karwa
Copy link
Contributor Author

karwa commented Apr 12, 2021

FYI @alwinb, I have a lot of additional tests in my implementation of a new URL type for Swift: https://github.com/karwa/swift-url/blob/main/Tests/WebURLTests/additional_constructor_tests.json

I'm using a very different path parsing algorithm to the one in the standard, and I noticed quite a lot of situations where I could pass the entire WPT test suite and still forget to handle some edge-cases. They're in the WPT JSON format because I test them against JSDOM to ensure they really don't go out of sync with the current standard, and because I planned to upstream them in the near future (still doing final tidying-up of the implementation). The tests may include some duplicates of existing tests, because at some point I got a bit frustrated at the number of holes I was finding, so I started listing combinations.

If you're familiar with the current set of the tests, you may find it useful, and it may be easier for you to spot gaps in the WPT's coverage. They are broken up in to categories with extensive comments, so it's not just a totally unordered collection of tests :)

@domenic
Copy link
Member

domenic commented Apr 12, 2021

Hm? I still see file:///C:/a on the jsdom live viewer, and that's still what I would expect from the current standard.

Sorry, you are right!

FYI @alwinb, I have a lot of additional tests in my implementation of a new URL type for Swift: https://github.com/karwa/swift-url/blob/main/Tests/WebURLTests/additional_constructor_tests.json

Really looking forward to those getting upstreamed! The more coverage the better!

@alwinb
Copy link
Contributor

alwinb commented Apr 13, 2021

Yeah thats nice! (@karwa)

I found seven of your tests that are related to this very issue, and whose behaviour would change. Can I include them in my 'proposal' PR to the tests? I can also wait till your tests are upstreamed and adapt them after.

Quote:

As you can see, in most cases we take the drive "D" from the input string, even when it isn't literally the first component, except for this one case.

In summary, and to gather some feedback, my idea is to prevent the issue by not converting segments to a drive unless it is literally the first component (of the host/path in the input string). I think this is what people would expect, and it seems to be not far off from browser behaviour. If possible, @achristensen07, (and others) do you have an opinion about that approach? Since it will incur a few changes to Safari.

@achristensen07
Copy link
Collaborator

I don't have strong opinions about windows drive letters in general other than that I think they exist to be compatible with browsers past and present on Windows. We implement them in order to have a cross-platform spec that ideally everyone would be moving towards. If a change is agreed upon and it's reasonable, I'll make the change.

@alwinb
Copy link
Contributor

alwinb commented Jun 5, 2021

This is still on my list, sorry for the delay, I hope to get around to it someday soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: file Aren't file: URLs the best? topic: parser
Development

No branches or pull requests

5 participants