Understanding WCAG 2.1 Success Criterion 4.1.1

One of the most frequently misunderstood success criteria in WCAG 2.0 and WCAG 2.1 is success criterion 4.1.1 parsing. The criterion was written when HTML 4.01 (based on SGML) was still in use and XML-based successors had been in development for several years, especially XHTML 1.0 (a reformulation of HTML 4 in XML 1.0) and XHTML 2.0. Development of XHTML 2.0 was abandoned after HTML 5 turned out to be much more viable. Instead of building on SGML or XML, HTML 5 describes how an HTML parser should behave and what parse errors are.

What distinguishes HTML 5 from its predecessors is that it is based neither on SGML nor on XML. However, when success criterion 4.1.1 was defined for WCAG 2.0, it was not yet clear that HTML 5 would radically break with XML and its clear separation between syntax (well-formedness constraints) and document type (validity constraints). The wording of the success criterion is based on XML's concept of well-formedness but avoids that term in order to remain format-agnostic and thereby more future-proof.

When we read success criterion 4.1.1 with XML in mind, we arrive at the following meanings:

The success criterion prohibits neither custom elements nor custom attributes. When WCAG 2.0 was in its final stages, development of WAI-ARIA had already started. The first public working draft of WAI-ARIA was published in February 2008, a few months after the WCAG working group had published its second Last Call Working Draft of WCAG 2.0. At the time, it was not possible to use WAI-ARIA attributes in HTML 4.01 or XHTML documents without causing validation errors; attributes of this type were still essentially custom attributes. The first public working draft of HTML 5 was published in January 2008 and did not address WAI-ARIA or custom attributes. The WCAG Working Group did not want to hamper the use of WAI-ARIA or other attributes that would improve accessibility. For this reason, success criterion 4.1.1 does not say what types of attributes are allowed or disallowed.

Some have objected to explaining the phrase elements are nested according to their specifications as a syntax constraint and insisted that it is a validation constraint because the normative text does not support [this] interpretation. However, the “validation interpretation” of the criterion introduces an inconsistency into the success criterion, since it requires only correct syntax at the level of attributes and validation at the level of element nesting, in other words two different types of constraints for two features of the same markup languages.
The “syntax interpretation”, by contrast, makes the success criterion internally consistent (same type of constraint for both attributes and element nesting), consistent with the criterion's name (“Parsing”) and with the (non-normative) Understanding Success Criterion 4.1.1: Parsing (for WCAG 2.0). If inconsistency in the requirements for attributes and element nesting had been intended, the Understanding document would have pointed this out (but it does no such thing).
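
The difference between syntactic nesting and content models can be made concrete with a small sketch. The Python code below is only an illustration, not a conformance tool: the class and function names are hypothetical, it uses the standard library's html.parser module, and it assumes that every non-void element is closed with an explicit end tag (it does not model HTML's optional end tags). It checks only whether each end tag matches the innermost open element and knows nothing about which element may contain which.

```python
from html.parser import HTMLParser

# Void elements never take an end tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class NestingChecker(HTMLParser):
    """Hypothetical helper: reports end tags that do not match the innermost open element."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open element names
        self.problems = []       # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()
        else:
            self.problems.append(
                f"end tag </{tag}> does not match open elements {self.open_elements}"
            )


def nesting_problems(markup):
    """Return a list of syntactic nesting problems; content models are ignored."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    problems = list(checker.problems)
    if checker.open_elements:
        problems.append(f"unclosed elements at end of input: {checker.open_elements}")
    return problems


# Invalid content model (div inside span), but syntactically well nested.
print(nesting_problems("<span><div>text</div></span>"))
# Overlapping elements: a genuine syntactic nesting problem.
print(nesting_problems("<b><i>text</b></i>"))
```

Under the syntax interpretation, only the second input would be relevant to the success criterion; the first would be reported by a validator as a content model error.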

Historical note: XHTML 2.0 would have supported the use of WAI-ARIA roles through its Role Attribute Module. See also XHTML Role Attribute Module (Working Group Note, December 2010) and Role Attribute 1.0 (W3C Recommendation, March 2013). However, the 16 December 2010 version of XHTML 2.0 was the last draft of that specification. XHTML 2.0 was retired in 2010 because browser developers preferred to support HTML 5.

The section Notes on the History of Success Criterion 4.1.1 attempts to reconstruct the history of the success criterion based on publicly available sources.

Test Pages

Below is a list of test pages, some of which contain errors according to the W3C's Validation Service. However, not all issues flagged by the validator are failures of success criterion 4.1.1.

Other test pages:

Validator Errors and Warnings

Even American choppers argue about success criterion 4.1.1. (Expand this summary to read the long description.)

American chopper argument meme: five stills from a video showing two choppers in an office who are having an argument.

The first chopper says, You should stop reporting HTML validation errors as SC 4.1.1 violations! The other chopper angrily insists, WCAG says elements must be nested according to their specifications!! The first chopper angrily shouts back, That's about syntactic nesting, not content models!! The other chopper hurls a chair through the office shouting, How dare you tell me I've been misreading SC 4.1.1 for 14 years! The first chopper, ready to exit the office, shouts back, You should have read the Understanding doc!

Accessibility audits often rely on tools such as the W3C's Validation Service to identify violations of success criterion 4.1.1. Examples of validator errors or warnings that are failures of the success criterion include the following:

The following errors and warnings may be failures of success criterion 4.1.1.

The following errors and warnings are not failures of success criterion 4.1.1. Some of them are violations (or potential violations) of other success criteria, whereas others are not relevant to any success criterion.

How to Check Conformance to SC 4.1.1

Steve Faulkner's articles WCAG 2.0 Parsing Criterion is a PITA (TPGi, 20.11.2015) and WCAG 2.1 parsing error bookmarklet – updated 25th February 2019 describe why conformance to SC 4.1.1 is difficult to check and provide an approach: use the W3C Markup Validation Service, then use a bookmarklet to filter the errors and warnings output by the validator. The remaining errors are to be considered violations of the success criterion. The main issue with the bookmarklet is that it retains many errors that are strictly violations of content models rather than syntax issues. Another issue is that the bookmarklet code on GitHub looks unmaintained: as of November 2022 it has open issues dating back to November 2019; newer warnings issued by the validator are not taken into account.

Another approach, suggested elsewhere, consists in using parsetree.validator.nu. This service seems to reliably find syntax errors without reporting validation errors. However, this approach has several drawbacks. First, it is an undocumented service; neither About Validator.nu nor The Validator.nu HTML Parser mentions this service or explains how to run it elsewhere. It is not mentioned in the validator wiki on GitHub either. Second, it outputs a representation of an entire parse tree, whereas users would only be interested in the presence of syntax issues, which are listed only at the very end of the results page. Finally, it does not catch duplicate IDs, since that is not a syntax issue, so another tool would still be needed to check IDs.
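
As an illustration of what such an additional check involves, here is a minimal Python sketch that looks for duplicate IDs and duplicate attributes in static markup. It is not one of the tools discussed on this page; the class and function names are hypothetical, and it relies only on the standard library's html.parser module, which passes repeated attributes through unchanged. It inspects the source as served and therefore says nothing about a DOM that scripts have modified afterwards.

```python
from collections import Counter
from html.parser import HTMLParser


class DuplicateFinder(HTMLParser):
    """Hypothetical helper: collects ID values and repeated attribute names."""

    def __init__(self):
        super().__init__()
        self.id_counts = Counter()      # how often each ID value occurs
        self.duplicate_attributes = []  # (element name, attribute name) pairs

    def handle_starttag(self, tag, attrs):
        seen = set()
        for name, value in attrs:
            if name in seen:
                # One entry per repeated occurrence of the attribute.
                self.duplicate_attributes.append((tag, name))
                continue
            seen.add(name)
            # Only the first id attribute of an element is counted.
            if name == "id" and value:
                self.id_counts[value] += 1


def find_duplicates(markup):
    """Return (sorted duplicate ID values, list of duplicate attributes)."""
    finder = DuplicateFinder()
    finder.feed(markup)
    finder.close()
    duplicate_ids = sorted(v for v, n in finder.id_counts.items() if n > 1)
    return duplicate_ids, finder.duplicate_attributes


sample = '<p id="a">x</p><p id="a">y</p><img class="c" class="d" alt="">'
print(find_duplicates(sample))
```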

For these reasons, it seems better to adapt Steve Faulkner's bookmarklet so it filters out all validation errors and retains only those issues that really are violations of the success criterion. The WCAG syntax only bookmarklet is an adapted version of Steve Faulkner's WCAG parsing only bookmarklet that filters more types of errors and warnings. Validator messages caused by incorrect content models, which are not syntax issues, are now no longer shown after using the bookmarklet. The full source code (not minified) is available elsewhere on this website and in the site's GitLab repository.
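
The filtering idea behind such a bookmarklet can be sketched in a few lines. The Python code below is not the bookmarklet's source (which is JavaScript) and does not reproduce its filter list. It assumes a validator report in the JSON shape produced by the Nu Html Checker (a "messages" array whose entries have a "message" string). The message fragments and the sample report are assumptions chosen for illustration, and the function name is hypothetical.

```python
import json

# Assumed fragments of validator messages that point to syntax problems or
# duplicate IDs. Illustrative only; not the filter list of any real bookmarklet.
SYNTAX_FRAGMENTS = (
    "Duplicate ID",
    "Duplicate attribute",
    "Unclosed element",
    "Stray end tag",
    "violates nesting rules",
)

# Made-up sample report. Real validator messages wrap element and attribute
# names in quotation marks, which are omitted here for simplicity.
SAMPLE_REPORT = """
{"messages": [
  {"type": "error", "message": "Duplicate ID nav."},
  {"type": "error", "message": "Element div not allowed as child of element span in this context."},
  {"type": "error", "message": "Unclosed element section."},
  {"type": "info", "message": "Consider adding a lang attribute to the html start tag."}
]}
"""


def syntax_only(messages):
    """Keep only messages whose text contains one of the assumed syntax fragments."""
    return [
        m for m in messages
        if any(fragment in m.get("message", "") for fragment in SYNTAX_FRAGMENTS)
    ]


report = json.loads(SAMPLE_REPORT)
for message in syntax_only(report["messages"]):
    print(message["type"], message["message"])
```

In this sample, the content model error and the informational message are dropped, while the duplicate ID and the unclosed element are retained. Any real filter list would need to be maintained as the validator's messages change.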

Notes on the History of Success Criterion 4.1.1

Even though I was a member of the WCAG Working Group when this success criterion was discussed and formulated, people sometimes want to know if there is any evidence for my explanation of the success criterion, especially the condition elements are nested according to their specifications. The history of the success criterion has become difficult to reconstruct, because when discussion about it began, the working group's minutes recorded only resolutions and to-do items (and some notes and links that were deemed important enough) instead of full discussions, and because the Bugzilla bug trackers (both the W3C public Bugzilla and the WCAG Working Group's Bugzilla hosted at the Trace Research & Development Center, then still at the University of Wisconsin at Madison) are no longer online. Publicly available evidence is based on those WCAG Working Group meeting minutes that are still publicly accessible (not all of them are) and on posts to the working group's mailing list. Below are a few pointers retrieved in mid-June 2022, especially references to the concept of well-formedness.

  1. WCAG WG meeting minutes, 23 June 2005. The wording in the November 2004 draft requires more discussion.
    Resolution: leave all of the SC out of GL 4.1 and have an ed note stating the problem and pointing people to an external page describing the problem in more detail along with our current proposals. Ed note will invite comment and comments can be added to the "problem/proposal" page. This editorial note was added to the 30 June 2005 working draft; the external page is Validity and Accessibility. The document states that although HTML is a SGML application and SGML does not have the concept of well-formedness, it does seem like a reasonable approach to say "make sure the DOM can be consistently formed" and refer to well-formedness in XML. The sub-group should also investigate PDF, Flash, and other formats and propose a parallel criterion. The section “Well-formed and valid” contains additional notes that show that well-formedness, or something similar that would translate to non-XML documents, was what some working group members wanted to see at Conformance Level A (still called Level 1 at the time).
    The minutes also contain a link to a discussion about semantics versus well-formedness on the working group's mailing list (20 June 2005) that discusses SGML, XML and well-formedness. The starting point of the discussion thread was a question about the mention of well-formedness in a draft for Guideline 4.1. See, for example, Well-formed (was: Re: F2F Proposed Resolutions Draft Updates) (17.06.2005): In Brussels, we tried to define a level of "correctness" that is lower than validity and that still makes sense for formats that are not based on XML. It is true that SGML does not define well-formedness, but if you say that a well-formed document is essentially "one that can unambiguously be parsed to create a logical tree in memory" (Jon Bosak, at http://www.isgmlug.org/n3-1/n3-1-18.htm), then you can apply this concept also to SGML. See also Gez Lemon's response with arguments against the term “well-formed”.
  2. WCAG WG meeting minutes, 6 October 2005: editors are doing 4.1.
  3. WCAG WG meeting minutes, 10 November 2005: after a discussion on SGML content models, the working group accepts the following wording for SC 4.1.1: Delivery units can be parsed unambiguously. The following definition of parsing was accepted along with it: Parsing transforms markup or other code into a data structure, usually a tree, which is suitable for later processing and which captures the implied hierarchy of the input. Parsing unambiguously means that there is only one data structure that can result.
    This wording was published in the 23 November 2005 working draft. This wording is clearly based on syntax and does not require valid content models. For XML, well-formedness would essentially give the same result. A requirement for the uniqueness of identifiers is not yet present at this stage.
  4. Charles McCathieNevile: [last call] Unambiguous parsing, 23.06.2006. In this comment on the (first) last call working draft, Charles McCathieNevile suggested returning to the wording from WCAG 1.0: conforms to formal published grammars.
  5. WCAG WG meeting minutes, 17 November 2005: resolutions related to a draft of How to meet SC 4.1.1 (referred to as “Guide doc” in the minutes). Quotes from the minutes: accept unanimously wendy's rewording SC 4.1.1 "delivery units can be parsed unambiguously and the relationships in the resulting data structure are also unambiguous" and … ensuring that the delivery unit is well-formed AND that unique ids are specified adopted (the latter referring to two proposed techniques).
    Titles for proposed techniques: Ensuring that unique ids are specified AND that opening and closing tags of all elements can be parsed unambiguously (for HTML-based content) and Ensuring that the delivery unit is well-formed AND that unique ids are specified (for XML-based content).
  6. WCAG WG meeting minutes, 2 December 2005: proposed changes to 4.1 and 4.2: http://lists.w3.org/Archives/Public/w3c-wai-gl/2005OctDec/att-0592/4.1and4.2Proposal.htm. (No change for SC 4.1.1, nor anything relevant to its understanding.)
  7. WCAG WG meeting minutes, 25 October 2007: Issue 2134, 2218: SC 4.1.1 and markup languages. Due to the WCAG bug tracker no longer being available, the content of the resolution can no longer be retrieved.
    The next public working draft, dated 11 December 2007 contained the following wording of SC 4.1.1: Content implemented using markup languages has elements with complete start and end tags except as allowed by their specifications, the elements are nested according to their specifications, and any IDs are unique. This wording was introduced after the 27 April 2006 working draft, which contained the following wording: Web units or authored components can be parsed unambiguously, and the relationships in the resulting data structure are also unambiguous.
  8. Understanding Success Criterion 4.1.1: Parsing (for WCAG 2.0) also discusses how well-formedness informed the wording of the success criterion: Note: The concept of "well formed" is close to what is required here. However, exact parsing requirements vary amongst markup languages, and most non XML-based languages do not explicitly define requirements for well formedness. Therefore, it was necessary to be more explicit in the success criterion in order to be generally applicable to markup languages. Because the term "well formed" is only defined in XML, and (because end tags are sometimes optional) valid HTML does not require well formed code, the term is not used in this success criterion.
    See also the note in Understanding Success Criterion 4.1.1: Parsing (for WCAG 2.0), which contains the same language.
    The “Intent” section uses the “data structure” concept that had been used in earlier versions of the success criterion: If the content cannot be parsed into a data structure, then different user agents may present it differently or be completely unable to parse it. Some user agents use "repair techniques" to render poorly coded content. This is based on syntax rather than validation.

A Proposal to Rephrase Success Criterion 4.1.1

Since many accessibility testers and other accessibility experts erroneously interpret the phrase elements are nested according to their specifications as referring to content models instead of syntactical nesting, a rewording that avoids this misunderstanding is highly desirable. Below is a proposed rewording. (This rewording was submitted as WCAG issue 2525 on 22.06.2022.)

In content implemented using markup languages, the following are true, except where the specifications for the markup languages being used allow exceptions to these requirements:

  1. elements have complete start and end tags,
  2. elements are nested according to the syntactical rules of their specifications,
  3. elements do not contain duplicate attributes, and
  4. any IDs are unique.

Note: Start and end tags that are missing a critical character in their formation, such as a closing angle bracket or a mismatched attribute value quotation mark are not complete.

Note: Syntactically correct nesting is distinct from nesting according to the content models specified in a technical specification. The second condition of the success criterion does not require correct content models; only correct syntax.

Note: When a scripting language is used to manipulate elements or attributes (or both) in the Document Object Model, the resulting in-memory representation is still regarded as content implemented using markup languages.

Description of the changes: The second condition has been reworded to highlight syntactical correctness (as opposed to validity of content models). The second note, which is new, draws attention to this. The first note is identical to the note in the WCAG 2.1 recommendation from June 2018.
The numbering is new; the phrase the following are true is copied from other success criteria, such as SC 1.2.1 and SC 2.2.2.
The third note addresses an issue unrelated to the distinction between correct syntax and validity.

Compatibility with existing versions of WCAG 2 and EN 301 549: Since the proposed rewording results in a requirement that is less strict than the interpretation of most accessibility testers, all documents that pass the current version of SC 4.1.1 should also pass its proposed rewording. In this sense, the proposed rewording is compatible with the current version.
Clause 9.4.1.1 of EN 301 549 says, Where ICT is a web page, it shall satisfy WCAG 2.1 Success Criterion 4.1.1 Parsing. Unless the editors of EN 301 549 want to retain the current version of the success criterion, no rewording of clause 9.4.1.1 is needed beyond, at some future point in time, an update of the referenced version of WCAG.

The intent of this rephrasing is not to “defend” the many types of validation errors that accessibility testers flag using this success criterion. My intent is merely to eliminate a common misunderstanding about what the success criterion actually means. If non-syntactical validation errors which impact accessibility are found, these should be caught either by existing success criteria or by new ones that still need to be created.

Questions and Statements about the SC's Meaning

Questions and discussions about how the success criterion should be interpreted:

Discussions on How to Check Conformance

Discussions about how to check conformance to this criterion and its potential removal from future versions of WCAG:

Removing SC 4.1.1 from WCAG

In January 2023, the Accessibility Guidelines Working Group decided to remove success criterion 4.1.1 from WCAG 2.2; see Re: CFC - Removing 4.1.1 Parsing from WCAG 2.2 on the working group's mailing list (12 January 2023). The GitHub issue Do not remove 4.1.1 Parsing from WCAG 2.2 was closed as a consequence of this decision. The GitHub issue Proposal to Rephrase Success Criterion 4.1.1 remained open, but with a note saying, Noting that this has been removed this from WCAG 2.2, but leaving open for potential updates to 2.1/2.0.

Success criterion 4.1.1 was removed from the Candidate Recommendation Draft of 25 January 2023.

For reactions and feedback on this decision, see, for example, the following resources:

The Accessibility Guidelines Working Group also discussed what to do with success criterion 4.1.1 in WCAG 2.0 and WCAG 2.1.