Consider not adding empty alt tags to rendered images #718

Open
mjbvz opened this issue Aug 12, 2022 · 12 comments

Comments

@mjbvz

mjbvz commented Aug 12, 2022

Originally reported in markdown-it/markdown-it#885

Setup

  1. Create a reference to an image that does not exist: ![](nosuch)

Problem

CommonMark currently renders this image with an empty alt attribute: ![](nosuch) is rendered as <img src="nosuch" alt="">.

Bad ![](nosuch)

Good ![alt](nosuch)

Good <img src="nosuch">

Bad <img alt src="nosuch">

Renders as:

Screen Shot 2022-08-12 at 2 20 36 PM

This is expected, since according to MDN:

Setting this attribute to an empty string (alt="") indicates that this image is not a key part of the content (it's decoration or a tracking pixel), and that non-visual browsers may omit it from rendering.


I know the spec isn't really about how markdown is rendered, so my suggestion is to change the tests for this: https://spec.commonmark.org/0.30/#example-580

It seems more likely to me that the user did mean to include the image and simply forgot the alt than that they wanted to hide the image from non-visual browsers
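
To make the two candidate outputs concrete, here is a minimal sketch of an image renderer with a switch between them. The names `renderImage`, `escapeAttr`, and the `omitEmptyAlt` option are hypothetical and are not part of cmark, commonmark.js, or markdown-it; the escaping is deliberately simplistic.

```javascript
// Hypothetical sketch: not the API of any real Markdown library.

// Escape the characters that matter inside a double-quoted attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

// Render `![alt](src)` as an <img> tag.
// omitEmptyAlt = false: behavior described in this issue, always emit `alt`.
// omitEmptyAlt = true: proposed behavior, drop `alt` when it is empty.
function renderImage(src, alt = '', {omitEmptyAlt = false} = {}) {
  const attrs = [`src="${escapeAttr(src)}"`]
  if (alt !== '' || !omitEmptyAlt) {
    attrs.push(`alt="${escapeAttr(alt)}"`)
  }
  return `<img ${attrs.join(' ')}>`
}

console.log(renderImage('nosuch'))                           // <img src="nosuch" alt="">
console.log(renderImage('nosuch', '', {omitEmptyAlt: true})) // <img src="nosuch">
console.log(renderImage('nosuch', 'alt'))                    // <img src="nosuch" alt="alt">
```

Only the empty-alt case differs between the two modes; images with explicit alt text render identically either way.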

@jgm
Member

jgm commented Aug 13, 2022

OK, I did not know this. I agree that the reference implementations and test should be changed.

@jgm jgm closed this as completed in 48ba840 Aug 13, 2022
jgm added a commit to commonmark/cmark that referenced this issue Aug 13, 2022
This is better than `alt=""`, which signals to browsers that
the image is not part of the main content.
See commonmark/commonmark-spec#718.
jgm added a commit to commonmark/commonmark.js that referenced this issue Aug 13, 2022
@jgm
Member

jgm commented Aug 13, 2022

I've now updated cmark and commonmark.js accordingly.

@wooorm
Contributor

wooorm commented Aug 14, 2022

I believe that this change is terrible for the accessibility of millions of existing, accessible blog posts and other markdown pages currently in the wild.
It is a very breaking change, and I believe it should not have been made lightly, without due diligence or a comment period.

It seems more likely to me that the user did mean to include the image and simply forgot the alt than that they wanted to hide the image from non-visual browsers

There is a chance that people did this, yes. And maybe that case is more likely, sure.
This proposed change means that conformance checkers such as the W3C validator, and also axe, will warn about these images. That, on its own, would indeed be an improvement.

But it is very valid to use an empty alt attribute in HTML. This change breaks anyone who depended on that, and marks their sites as inaccessible.

For more information, please see the HTML spec on the img element, particularly:

If the src attribute is set and the alt attribute is set to the empty string

The image is either decorative or supplemental to the rest of the content, redundant with some other information in the document.

[...]

Otherwise, the element represents nothing, and may be omitted completely from the rendering. User agents may provide the user with a notification that an image is present but has been omitted from the rendering.

https://html.spec.whatwg.org/multipage/embedded-content.html#the-img-element:attr-img-src-5


While it is nice to cause some more linting warnings with a change like this, there are very valid use cases of empty alt attribute values, and this change makes it impossible to write valid empty alt attributes, such as when images are explained in surrounding prose already.

In the past, some of the markdown parsers in the JavaScript ecosystem made this mistake of not setting an empty string. People raised bugs, they were right, and it was fixed.

This change is also inconsistent with the rest of CommonMark: [a]() generates <a href="">a</a>. In HTML, an empty href and no href are also treated differently, so why handle alt differently from href/src?

@jgm jgm reopened this Aug 15, 2022
@jgm
Member

jgm commented Aug 15, 2022

I'll reopen for further comment. The change can always be reverted; nothing has been released yet.

@jgm
Member

jgm commented Aug 15, 2022

I'll note that the original Markdown.pl did not emit an alt with empty content. Pandoc followed it in this behavior. And I haven't had complaints about this over the 15+ years I've been maintaining pandoc. But according to babelmark pandoc is an outlier in this among modern implementations.

@mjbvz
Author

mjbvz commented Aug 16, 2022

I believe omitting alt is the most pragmatic approach. A few points:

  • In an ideal world, images would always have an alt that is deliberately added by the author. alt is even mandated by the HTML spec.
  • alt="" has a very specific meaning: it marks the image as non-visual and signals that it can be ignored by screen readers.

In my view, defaulting to alt="" is worse for accessibility than omitting the attribute entirely. Omitting it is a clear error (and browsers render it as such), while implicitly adding an empty alt says, "I know what I am doing and intentionally don't want this image to be announced". I don't think that is true nearly as often (especially for markdown documents).

Instead, it's more likely that the author was either still working on the document and hadn't filled in the alt yet, or simply forgot the alt. In these cases, we want the missing alt to be obvious to the author so that they can add a proper alt. That's not the case if we implicitly add alt="" as the default.

@wooorm
Contributor

wooorm commented Aug 16, 2022

images would always have an alt that is deliberately added by the author.
alt="" has a very specific meaning: it marks the image as non-visual and signals that it can be ignored by screen readers.

I don’t understand this. Why? I linked to the HTML spec above, which refutes your statements. The spec specifically recommends using an empty alt if the image is explained in prose.

I am of the opinion that (almost) all images should be a non-essential thing, that can be omitted entirely. For people on slow internet connections. For people that are color blind, can’t look at a screen currently, for broken files, for copy/pasting, and numerous other reasons: explain what they add in surrounding prose, so that they don’t add anything essential.

This stems in part from humans being terrible at writing alt text. Most people don’t get it right.

I don't think that is true nearly as often

The problem with “chances” is that there will be either false positives or false negatives, because HTML is more expressive than markdown. The same applies to empty src or href attributes.

I don’t want valid markdown, written in an accessible way, to be generated as inaccessible HTML just because there is a chance that other people write inaccessible HTML.

If you want to check your markdown, there are markdown linters that you can enable. Or use an HTML linter that checks for empty alts if you believe that they are never okay.

@jgm
Member

jgm commented Aug 16, 2022

@coding-horror do you have any thoughts on this issue?

@mjbvz
Author

mjbvz commented Aug 16, 2022

I'm not arguing there aren't valid use cases for alt=""; I'm saying omitting it seems like a more reasonable default for how people write markdown out in the real world. From the same section of the spec (emphasis added):

[If] the src attribute is set and the alt attribute is not

The image might be a key part of the content

Right now we can't know what the author intended if they write ![](...). Ideally, Markdown syntax would make this clear, but that's something for a larger discussion. Until then, I think the rendered markdown should also reflect this uncertainty.
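
The disagreement comes down to which of three attribute states `![](...)` should map to. As a rough sketch, the helper below (a hypothetical `describeAlt`, written only for illustration) labels each state with a paraphrase of the HTML spec passages quoted in this thread; the wording of the labels is not normative.

```javascript
// Hypothetical helper for illustration; the labels paraphrase the HTML spec
// passages quoted in this thread and are not an official API.
function describeAlt(attrs) {
  const hasAlt = Object.prototype.hasOwnProperty.call(attrs, 'alt')
  if (!hasAlt) {
    // src set, alt not set
    return 'unknown: the image might be a key part of the content'
  }
  if (attrs.alt === '') {
    // src set, alt set to the empty string
    return 'decorative or supplemental: may be omitted from rendering'
  }
  return 'has a text alternative'
}

console.log(describeAlt({src: 'nosuch'}))
console.log(describeAlt({src: 'nosuch', alt: ''}))
console.log(describeAlt({src: 'nosuch', alt: 'alt'}))
```

Markdown's `![](...)` has only one spelling for the first two states, which is why either default misrepresents some authors.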

@wooorm
Contributor

wooorm commented Aug 17, 2022

how people write markdown out in the real world

Do you have examples of this? Concrete reasons to make this assertion? I wonder if the statement is correct.
Of course, some people will write wrong alt text (I assume a lot of it is).
And some people will wrongly omit it.
But where is the idea coming from that everyone is omitting alt text wrongly, and nobody is omitting it correctly?
At what percentage of wrong vs. correct would it warrant such a change?


The HTML spec goes into detail on why empty alt attributes are very valid in many cases.
I asked Twitter, and many people do write empty alt attributes: https://twitter.com/wooorm/status/1559593654486024192.
There are also many popular websites, such as Wikipedia, that use empty alt attributes in their content:

Screenshot from the Wikipedia article on Otto von Bismarck, showing a figure with the caption “Bismarck in 1836, at age 21”, and an image that well, shows that.

That figure + image w/ empty alt + caption style is also used in the HTML spec (giant page, slow to load, use $$('img[alt=""]') in console to find them) itself, putting some weight behind the fact that it is valid.
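
For anyone who wants to check a page themselves, the console one-liner can be extended to separate empty alts from missing ones. This is a small sketch using the standard `querySelectorAll` DOM method; the function name `countImagesByAlt` is made up for this example.

```javascript
// Count <img> elements by alt state under `root` (a Document or Element).
function countImagesByAlt(root) {
  return {
    emptyAlt: root.querySelectorAll('img[alt=""]').length,
    missingAlt: root.querySelectorAll('img:not([alt])').length,
    total: root.querySelectorAll('img').length
  }
}

// Only runs where a DOM is available, for example in a browser console.
if (typeof document !== 'undefined') {
  console.log(countImagesByAlt(document))
}
```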


What will happen if this goes through, in the scenario of people writing a website in markdown, is that suddenly their CI will start breaking because their HTML is no longer valid (not everyone will have that set up of course, but that is the intent behind this feature: to mark sites as inaccessible).
Assuming these people made mistakes before, they’ll add some alt text to images: good thing, sure.
But in the cases where they wrote an accessible site, they are now forced to either duplicate content:

![Bismarck in 1836, at age 21](#)

Bismarck in 1836, at age 21

or inject bogus alt text:

![&nbsp;](#)

Bismarck in 1836, at age 21

…which are both very bad for accessibility.


Why should CommonMark decide Wikipedia’s site is inaccessible?
Why can’t this be solved on your end, if you want it, with a lint rule in your markdown or on your sites?
Or, an option in the markdown parser, e.g., markdown-it?
With remark, you can write a small plugin:

/**
 * @typedef {import('mdast').Root} Root
 */

import {visit} from 'unist-util-visit'

/**
 * Warn about images and image references that have no alt text.
 *
 * @type {import('unified').Plugin<[], Root>}
 */
function myRemarkPluginCheckingEmptyAlts() {
    return function (tree, file) {
        visit(tree, (node) => {
            if ((node.type === 'image' || node.type === 'imageReference') && !node.alt) {
                // `message` adds a non-fatal message in current vfile
                // releases (older releases named this method `warn`).
                file.message('Expected `alt` text to be non-empty', node)
            }
        })
    }
}
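
The same check can be sketched without any dependencies, which makes it easy to try out. In the example below, `walk` and `findImagesWithoutAlt` are hypothetical names, and the tree is a hand-written approximation of the syntax tree a markdown parser might produce, not actual parser output.

```javascript
// Dependency-free sketch of the check above. The tree is a hand-written
// stand-in for `![](a.png) ![Bismarck in 1836, at age 21](b.png)`.

// Walk a tree depth-first, calling `visitor` on every node.
function walk(node, visitor) {
  visitor(node)
  for (const child of node.children || []) {
    walk(child, visitor)
  }
}

// Collect the image nodes whose alt text is missing or empty.
function findImagesWithoutAlt(tree) {
  const found = []
  walk(tree, (node) => {
    if ((node.type === 'image' || node.type === 'imageReference') && !node.alt) {
      found.push(node)
    }
  })
  return found
}

const tree = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        {type: 'image', url: 'a.png', alt: ''},
        {type: 'text', value: ' '},
        {type: 'image', url: 'b.png', alt: 'Bismarck in 1836, at age 21'}
      ]
    }
  ]
}

for (const node of findImagesWithoutAlt(tree)) {
  console.log(`Expected alt text to be non-empty: ${node.url}`)
}
```

Because the check lives in a lint step, authors who use empty alts deliberately can simply leave it disabled.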

@sukiletxe

Also, screen readers will read the filename of the image when there is no alt, which, except in interactive controls, is rather distracting. When the alt is empty, screen readers will skip it, or just read "image" (depending on how you navigate).

@jgm
Member

jgm commented Aug 26, 2022

I've reverted the change for now.
