-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Base URL Windows drive sometimes favoured over input string drive #574
Comments
That does seem like an error of sorts. It might be worthwhile comparing this with browsers on Windows, especially Chrome. And either way a test would definitely be appreciated and help move this forward. |
Yes, this is unpleasant. I remember two testcases in WPT that are related. (see below). These are 'congruence bugs', in the mathematical sense. They are not exactly reparsing bugs according to the spec, but they are very similar. You'd want to prevent components that 'look like drive letters' from 'becoming drive letters' when normalizing. For implementations and the spec, It's useful to store the drive as a property on the (internal) URL record to avoid branching on the content of the first path component. Related tests:
|
I would like to solve this issue. It seems to be somewhat problematic, but I can give it a try. I think it should be among the last file URL/ drive letter issues. My approach would be to add some cases to wpt, as a draft/ proposal at first, then collect the results and investigate how to make changes to jsdom/whatwg-url and the standard based on that. I propose to change the desired behaviour for cases 1. and 4. of the original issue (see below). Case 1
Case 4
I don't have access to a Windows machine unfortunately. The spec would remain unchanged for the second and third cases:
The fifth case and the one I added can maybe be kept separate, but I'm not sure yet. Let me know if this is OK, then I'll prepare another PR (and somehow mark it as a draft?) to the web platform tests. |
Chrome on Windows appears to agree with Chrome on Mac in cases 1 and 4. So it seems your changes there move us closer to most browser behavior and so from my perspective they sound great. |
I'm not opposed to change here, but I think the tests should include "file:///C|/a", which currently serializes to "file:///C%7C/a" in Chrome on Mac, "file:///C:/a" in Firefox, and "file:///C|/a" in Safari, which currently passes all the URL constructor web platform tests. |
FWIW |
Wow, it really does!! Congratulations!!! |
Hm? I still see |
Yes, I just started investigating. It seems there is no test for that, surprisingly. And now I'm curious how Safari handles this internally :) It is not related to the issue here though, which results from step 4.1. in the path state together with in-place normalisation. |
FYI @alwinb, I have a lot of additional tests in my implementation of a new URL type for Swift: https://github.com/karwa/swift-url/blob/main/Tests/WebURLTests/additional_constructor_tests.json I'm using a very different path parsing algorithm to the one in the standard, and I noticed quite a lot of situations where I could pass the entire WPT test suite and still forget to handle some edge-cases. They're in the WPT JSON format because I test them against JSDOM to ensure they really don't go out of sync with the current standard, and because I planned to upstream them in the near future (still doing final tidying-up of the implementation). The tests may include some duplicates of existing tests, because at some point I got a bit frustrated at the number of holes I was finding, so I started listing combinations. If you're familiar with the current set of the tests, you may find it useful, and it may be easier for you to spot gaps in the WPT's coverage. They are broken up in to categories with extensive comments, so it's not just a totally unordered collection of tests :) |
Sorry, you are right!
Really looking forward to those getting upstreamed! The more coverage the better! |
Yeah thats nice! (@karwa) I found seven of your tests that are related to this very issue, and whose behaviour would change. Can I include them in my 'proposal' PR to the tests? I can also wait till your tests are upstreamed and adapt them after. Quote:
In summary, and to gather some feedback, my idea is to prevent the issue by not converting segments to a drive unless it is literally the first component (of the host/path in the input string). I think this is what people would expect, and it seems to be not far off from browser behaviour. If possible, @achristensen07, (and others) do you have an opinion about that approach? Since it will incur a few changes to Safari. |
I don't have strong opinions about windows drive letters in general other than that I think they exist to be compatible with browsers past and present on Windows. We implement them in order to have a cross-platform spec that ideally everyone would be moving towards. If a change is agreed upon and it's reasonable, I'll make the change. |
This is still on my list, sorry for the delay, I hope to get around to it someday soon. |
Typically, the URL parser detects Windows drive letters in the parsed path's "effective first component" (term I just made up).
The path parser processes each component in turn, and pushing and popping each component as they are encountered, with normalisation checking that the existing path is empty and the shortening function checking each time if the component at
path[0]
is a Windows drive letter so as not to pop it. This leads to Windows drive letters getting detected even if there is leading rubbish in front of them, for example (Live viewer):The
C|
component gets normalised in toC:
and is not popped from the resulting path, because it is detected as a Windows drive letter.However, there is a very subtle quirk: if the input string does not have an authority (
/abc
orfile:/abc
), and the base URL also contains a drive letter, the process of detecting drive letters in the input string gets stricter, so the drive must be the literal first component, without any leading rubbish. Otherwise the drive from the base URL gets preferred.Consider the following
(input, base)
pairs:As you can see, in most cases we take the drive "D" from the input string, even when it isn't literally the first component, except for this one case.
This doesn't seem to be exercised by any existing tests, and given that it is such an edge-case I'd like to check whether it is intentional or not. I'm using a parsing algorithm derived from the one in the standard, and I was able to pass the all of the constructor and setter tests while handling this incorrectly. So if this is intentional behaviour, I'd be happy to add some tests for it.
The text was updated successfully, but these errors were encountered: