Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider preventing autolinks in links #719

Open
wooorm opened this issue Sep 2, 2022 · 1 comment
Open

Consider preventing autolinks in links #719

wooorm opened this issue Sep 2, 2022 · 1 comment

Comments

@wooorm
Copy link
Contributor

wooorm commented Sep 2, 2022

Background

The CommonMark spec prevents “actual” links (those with a reference (e.g., [x]), or those with a resource (e.g., (x "y")), from occurring inside each other:

[a [b](c) d](e)
<p>[a <a href="c">b</a> d](e)</p>

The reason for that, is that the HTML spec does not allow it, and specifically the parsing algorithm of the HTML spec makes it impossible:

document.body.innerHTML = '<p><a href="e">a <a href="c">b</a> d</a></p>';
console.log(document.body.innerHTML)

Yields:

<p><a href="e">a </a><a href="c">b</a> d</p>

(the first link is closed when a new link is opened, its link end is later ignored).

CommonMark does this by, when a link is matched (the “deep” one is matched first), marking earlier link starts as “inactive”.
It does not do that for links in images, or images in links, because it doesn‘t need to: images in links are fine, and links “in” images do not generate HTML (somewhat related: #716)

Problem

However, there is another way to create links in links with markdown: autolinks (<https://example.com>) inside links:

[a <https://example.com> b](c)

Yields:

<p><a href="c">a <a href="https://example.com">https://example.com</a> b</a></p>

Solution

There are two solutions.
Which to choose, depends on which of the two links is more likely the one an author intended.

Solution A: mark link starts as inactive

The code for this would be very similar to the one already used for “normal” links: when a valid autolink is parsed, mark earlier link starts as inactive. The output then would be:

<p>[a <a href="https://example.com">https://example.com</a> b](c)</p>
Solution B: outputting the source of an autolink when compiling, when in links

The code for this would be very similar to what is likely needed for images, when compiling.
When in an image alt, no tags are generated.
Something similar can be done when inside a link: output < (instead of <a href="https://example.com">), output the text like normal, and > (instead of </a>).
The output then would be:

<p><a href="c">a &lt;https://example.com> b</a></p>

Consideration

The current state is clearly broken: it’s not something someone wants or is expecting: HTML doesn’t like it and only halve of an authors outer link is “clickable”.

With solution A, both URLs are displayed verbatim, the inner one “clickable”, but both usable.
This is like how “normal” links work in CommonMark, so I think it is the best solution.

I think solution B is cleaner output, and chances are it’s more likely what an author intended, both URLs being the same, showing the URL that someone goes to:

[For more information, see <https://example.com/some-page>!](https://example.com/some-page).

Though, solution B gets confusing if the URLs are different, a reader sees one URL, but is taken to another:

[For more information, see <https://example.com/some-page>!](https://example.com/a-completely-different-page).

I think both are acceptable and an improvement, and I’d prefer solution A a little bit more than solution B.

@jgm
Copy link
Member

jgm commented Sep 2, 2022

Thanks for raising this. It would be good to get more feedback on the proposed solutions.

wooorm added a commit to syntax-tree/mdast that referenced this issue Feb 4, 2023
Previously, there was a difference between static and dynamic phrasing
content.
There was also transparent content, to “inherit” the (static or dynamic)
content model from a parent.
This was used to prohibit links in links.

However, while it is unwise to put links in links, it is possible with
markdown to do so, by embedding autolinks in links with resources or
references.
Take for example:

```markdown
[alpha <https://bravo> charlie](delta)
```

Yields, per CommonMark:

```html
<p><a href="delta">alpha <a href="https://bravo">https://bravo</a> charlie</a></p>
```

See also: commonmark/commonmark-spec#719.

There is also the case where while it is unwise to author such markdown,
tools transforming mdast might inject links.
It’s probably good to handle such cases in `mdast-util-to-markdown` and
such?

Not allowing links in links is also something imposed by HTML (particularly
the parsing side[^1]): perhaps mdast could be used in places where this
restriction doesn’t need to be imposed.

Finally, it is hard to type such inherited content models.
As in, transparent content is not used in `@types/mdast` (I don’t think it
can be typed?):

https://github.com/DefinitelyTyped/DefinitelyTyped/blob/3b052a3e5b59b0efaa22669d9bd8f268bd689835/types/mdast/index.d.ts#L286

So, this PR is a proposal to remove the difference and simplify the AST.

[^1]: You can produce nested `a`s with the DOM:

    ```js
    let a1 = document.createElement('a'); a1.textContent = 'alpha';
    let a2 = document.createElement('a'); a2.textContent = 'bravo';
    a1.appendChild(a2)
    document.body.appendChild(a1)
    ```
wooorm added a commit to syntax-tree/mdast that referenced this issue Jun 15, 2023
Previously, there was a difference between static and dynamic phrasing
content.
There was also transparent content, to “inherit” the (static or dynamic)
content model from a parent.
This was used to prohibit links in links.

However, while it is unwise to put links in links, it is possible with
markdown to do so, by embedding autolinks in links with resources or
references.
Take for example:

```markdown
[alpha <https://bravo> charlie](delta)
```

Yields, per CommonMark:

```html
<p><a href="delta">alpha <a href="https://bravo">https://bravo</a> charlie</a></p>
```

See also: commonmark/commonmark-spec#719.

There is also the case where while it is unwise to author such markdown,
tools transforming mdast might inject links.
It’s probably good to handle such cases in `mdast-util-to-markdown` and
such?

Not allowing links in links is also something imposed by HTML (particularly
the parsing side[^1]): perhaps mdast could be used in places where this
restriction doesn’t need to be imposed.

Finally, it is hard to type such inherited content models.
As in, transparent content is not used in `@types/mdast` (I don’t think it
can be typed?):

https://github.com/DefinitelyTyped/DefinitelyTyped/blob/3b052a3e5b59b0efaa22669d9bd8f268bd689835/types/mdast/index.d.ts#L286

So, this PR is a proposal to remove the difference and simplify the AST.

Closes GH-42.

[^1]: You can produce nested `a`s with the DOM:

    ```js
    let a1 = document.createElement('a'); a1.textContent = 'alpha';
    let a2 = document.createElement('a'); a2.textContent = 'bravo';
    a1.appendChild(a2)
    document.body.appendChild(a1)
    ```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants