Failure to parse URL that doesn't start with scheme but contains a query that contains a URL with a scheme #13

Open
mlhdeveloper opened this issue Sep 11, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@mlhdeveloper

Here's a simple example URL that fails to parse:

somedomain.com/?g=http://somedomain.com/

If you add a scheme to the front, it then parses properly:

http://somedomain.com/?g=http://somedomain.com/

mlhdeveloper commented Sep 11, 2024

I think the solution lies in changing this part so that it checks for :// only near the beginning of the URL rather than anywhere in it:

preg_rfc1808 = re.compile("://")

if not preg_rfc1808.search(url):
    url = "//%s" % url
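
The failure can be reproduced with the standard library alone. This is only a sketch: it assumes the resulting string is eventually passed to urllib.parse.urlsplit, and the helper name add_rfc1808_prefix is hypothetical, not taken from the project.

```python
import re
from urllib.parse import urlsplit

# Unanchored check quoted above: matches "://" anywhere in the string.
preg_rfc1808 = re.compile("://")


def add_rfc1808_prefix(url):
    """Hypothetical helper: prepend "//" when no "://" is found."""
    if not preg_rfc1808.search(url):
        url = "//%s" % url
    return url


url = "somedomain.com/?g=http://somedomain.com/"
parts = urlsplit(add_rfc1808_prefix(url))

# The "://" inside the query suppresses the "//" prefix, so urlsplit
# treats the host as part of the path and leaves netloc empty.
print(repr(parts.netloc))  # ''
print(parts.path)          # somedomain.com/
print(parts.query)         # g=http://somedomain.com/
```

Because the host ends up in the path rather than in netloc, any code that relies on netloc sees no domain at all.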


mlhdeveloper commented Sep 11, 2024

I think this fixes it: the pattern now matches :// only at the very beginning of the URL or directly after a scheme, i.e. after any number of lowercase letters or + characters (based on the schemes handled by urllib.parse):

preg_rfc1808 = re.compile("^[a-z+]*://")
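
A quick way to check the anchored pattern is to run both example URLs through urllib.parse.urlsplit. The sketch below is an assumption about how the check is used: split_with_prefix is a hypothetical helper, and preg_rfc3986_scheme is not part of the project; it is a possible stricter variant that follows the RFC 3986 scheme grammar (a letter, then letters, digits, "+", "-" or ".") and ignores case.

```python
import re
from urllib.parse import urlsplit

# Anchored pattern proposed in the comment above.
preg_rfc1808 = re.compile("^[a-z+]*://")

# Assumption: stricter alternative based on the RFC 3986 scheme grammar,
# matched case-insensitively. Not taken from the project.
preg_rfc3986_scheme = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def split_with_prefix(url, pattern=preg_rfc1808):
    """Hypothetical helper: add "//" unless the URL starts with a scheme."""
    if not pattern.search(url):
        url = "//%s" % url
    return urlsplit(url)


# Scheme-less URL: the "://" in the query no longer blocks the prefix.
bare = split_with_prefix("somedomain.com/?g=http://somedomain.com/")
print(bare.netloc, bare.query)

# URL with a scheme: left untouched, parsed as before.
full = split_with_prefix("http://somedomain.com/?g=http://somedomain.com/")
print(full.scheme, full.netloc)

# The proposed pattern is lowercase-only, so an uppercase scheme is missed;
# the stricter variant accepts it.
print(bool(preg_rfc1808.search("HTTP://somedomain.com/")))         # False
print(bool(preg_rfc3986_scheme.search("HTTP://somedomain.com/")))  # True
```

If uppercase schemes or schemes containing digits (for example s3://) matter for this library, the stricter variant may be worth considering; otherwise the proposed pattern already covers the case reported here.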

ggokdemir self-assigned this Sep 16, 2024
ggokdemir added the bug (Something isn't working) label Sep 16, 2024
2 participants