URL not detected correctly, when it contains a '#' #556

klartext · 2021-01-22T10:27:38Z

Describe the bug
URL not detected correctly, when it contains a '#'
http://www.mywebsite.org/#/foobar
Url-detection stops at the '#'

To Reproduce

Insert a URL containing a '#', like the one mentioned above, in edit mode
and view the result in preview mode.

Expected behavior
The complete URL is detected, and detection does not stop at the '#' symbol.

Versions:
Rednotebook 2.20, Arch-Linux, package 2.20-2 (2020-11-12).

laraconda · 2023-05-30T23:38:19Z

I believe this could be fixed by changing HASHTAG_PATTERN in rednotebook/data.py.

jendrikseipp · 2023-05-31T10:03:04Z

Probably, yes. Feel free to raise a PR :)

laraconda · 2023-05-31T21:23:57Z

Alright. I discovered that the patterns in data.py match hashtags, and those are used for the Tags section. The matching for text coloring, URL recognition and so on is done in rednotebook/files/t2t.lang.

A single source of regexes is necessary to avoid confusion like this. The second file uses XML to define how the matches are treated. Is the regex format in this file compatible with the Python format? If so, the regexes can be defined in a single place and imported in both data.py and t2t.lang.

laraconda · 2023-05-31T21:33:40Z

Another thing I noticed is that in data.py, hex colors (as in #AAA000) are excluded from being hashtags. Also excluded are what I presume are cpp directives such as "#include". I have two observations:

Because the HASHTAG pattern is compiled with the re.IGNORECASE flag, tags like #face10 and #facade are treated as hex colors (which, it seems, they are). I don't know if this is the intended effect. Given how the original regex for hex numbers is written (r"[0-9A-F]{6}"), it feels like the original intent was to match only hex colors written in uppercase.
If I'm correct that "#include" is there to exclude cpp directives, then more need to be added, such as #define and #endif. (I already coded this, so I hope I'm right.)
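
To make the two observations concrete, here is a minimal sketch. The patterns and helper names below are hypothetical stand-ins, not the actual definitions from rednotebook/data.py; only the hex fragment r"[0-9A-F]{6}" and the re.IGNORECASE flag come from this thread.

```python
import re

# Hex fragment quoted in this thread.
HEX_COLOR = r"[0-9A-F]{6}"

# Hypothetical patterns (not the real data.py code).
HEX_IGNORECASE = re.compile("#" + HEX_COLOR, re.IGNORECASE)
HEX_UPPERCASE_ONLY = re.compile("#" + HEX_COLOR)

# Hypothetical hashtag pattern that skips a few preprocessor directives.
DIRECTIVES = ("include", "define", "endif")
HASHTAG = re.compile(r"#(?!(?:%s)\b)\w+" % "|".join(DIRECTIVES))


def is_hex_color(pattern, text):
    """Return True if the whole text matches the given hex color pattern."""
    return pattern.fullmatch(text) is not None


def is_hashtag(text):
    """Return True if the whole text is a hashtag and not an excluded directive."""
    return HASHTAG.fullmatch(text) is not None
```

With IGNORECASE, is_hex_color(HEX_IGNORECASE, "#facade") is true, so the tag would be dropped as a color; without the flag it is not. The negative lookahead only rejects whole directive names, so a tag like #included would still count as a hashtag.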

klartext · 2023-06-01T00:33:07Z

I wonder whether it would make sense to use a library for the URL check.
I looked for URL parsing and URL validation.

For parsing URLs, urllib (Python standard library) can be used, but it does not check URL validity.
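
As a small illustration of that point, the sketch below uses urllib.parse.urlsplit from the standard library. The helper name fragment_of is made up for this example.

```python
from urllib.parse import urlsplit


def fragment_of(url):
    """Return the part after '#' as parsed by urlsplit (empty string if none)."""
    return urlsplit(url).fragment


# urlsplit splits the string into components but does not validate it:
# a string that is clearly not a URL is still parsed without an error.
parts = urlsplit("not a url")
```

For the URL from this issue, fragment_of("http://www.mywebsite.org/#/foobar") gives "/foobar", so the parser itself has no trouble with a '/' after the '#'. The "not a url" input simply ends up with an empty scheme and host.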

I then looked for URL validation and found the validators lib.
I have not used it so far, but it was recommended in some articles I found, and the lib seems to be used a lot and was updated last week. There are some open issues, but nothing that looks bad.
This one: https://github.com/python-validators/validators

Maybe that lib might be considered here.

laraconda · 2023-06-01T18:11:44Z

I think fixing the patterns in t2t.lang would be better. The code is already coupled to it, and using a new library would be a lot more work. Plus, it seems like GTK Python uses an XML file to identify patterns and then do something with them, like displaying them in bold, underlining them, coloring URLs, etc. So I don't think evaluating each piece of text with a Python function would work here.
Thanks for the suggestion anyway.

laraconda · 2023-06-01T18:13:27Z

On another topic: it turns out that '#' alone doesn't cause problems in URL recognition, but a '/' after the '#' does:

These examples are correctly identified as links:
http://blog.example.net/post123#comments
http://www.example.com/page.html#section1
https://www.shop.com/product#reviews
http://www.example.org/#contact

These are not:
http://www.mywebsite.org/#/foobar
http://www.mywebsite.org/#foo/bar
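
One way a pattern could produce this behaviour is a fragment part that does not allow '/'. The sketch below models that. Both patterns are hypothetical and written only for illustration; they are not copied from t2t.lang, and the real cause there may differ.

```python
import re

# Hypothetical: the fragment may contain only word characters and hyphens,
# so the match ends at the first '/' after the '#'.
URL_NARROW_FRAGMENT = re.compile(r"https?://[^\s#]+(?:#[\w-]*)?")

# Hypothetical: the fragment runs up to the next whitespace.
URL_WIDE_FRAGMENT = re.compile(r"https?://[^\s#]+(?:#\S*)?")


def detected_url(pattern, text):
    """Return the first URL the pattern finds in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None
```

Under these assumptions, the narrow pattern returns the full URL for http://www.example.org/#contact but stops early on the two failing examples, while the wide pattern returns all of them in full.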

jendrikseipp · 2023-06-01T18:17:11Z

I agree. Adding an external library is always a big pain.

I don't see an easy way of avoiding duplicating the hashtag regex in data.py and t2t.lang, since we'd have to parse the XML with Python, which seems excessive. It's probably best to just add a comment in both places that changing one line implies changing the other.
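
For reference, this is roughly what reading a pattern out of the XML would involve, here used only as a check that the two copies agree. Everything in the sketch is an assumption: the XML is a heavily simplified stand-in for t2t.lang, the context id "hashtag" and the pattern are invented, and whether the two regex dialects agree on a given pattern would still need checking by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical Python-side pattern; the real one lives in rednotebook/data.py.
HASHTAG_PATTERN = r"#\w+"

# Hypothetical, simplified stand-in for rednotebook/files/t2t.lang.
LANG_XML = r"""
<language id="t2t" version="2.0">
  <definitions>
    <context id="hashtag">
      <match>#\w+</match>
    </context>
  </definitions>
</language>
"""


def lang_pattern(xml_text, context_id):
    """Return the <match> regex of the context with the given id."""
    root = ET.fromstring(xml_text.strip())
    match = root.find(f".//context[@id='{context_id}']/match")
    if match is None or match.text is None:
        raise LookupError(f"no <match> found for context {context_id!r}")
    return match.text.strip()


def patterns_in_sync():
    """Return True if the XML copy and the Python copy are identical strings."""
    return lang_pattern(LANG_XML, "hashtag") == HASHTAG_PATTERN
```

A plain string comparison like this only catches the two copies drifting apart; it says nothing about whether both engines interpret the pattern the same way, which is one more reason the comment in both files is the simpler option.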

Regarding hex values and C++ preprocessor directives, I agree with you, good catch!

laraconda · 2023-06-01T23:20:04Z

I submitted the PR: #703

* Fixing url not recognized when hashtag symbol is followed by slash. Issue #556
* Adding more cpp directives to hashtag pattern in t2t. Adding comment regarding what each hashtag regex is used for in both files.

Co-authored-by: Jendrik Seipp <jendrikseipp@gmail.com>

jendrikseipp · 2024-02-17T21:43:15Z

Fixed in #703.

klartext added the bug label Jan 22, 2021

jendrikseipp mentioned this issue Dec 29, 2022

Hash Symbol breaks out of URL mode #625

Closed

laraconda added a commit to laraconda/rednotebook that referenced this issue Jun 1, 2023

Fixing url not recognized when hashtag symbol is followed by slash. I…

27470b4

…ssue jendrikseipp#556

jendrikseipp closed this as completed Feb 17, 2024
