
Check if a string is valid HTML using JavaScript

Discover an effective way to validate HTML strings in JavaScript. Ensure correctness and efficiency with this comprehensive guide.

Validating whether a string is valid HTML can be done using the DOMParser API and its parseFromString method.

The DOMParser interface parses XML or HTML source code from a string into a DOM Document, a structured object that can be easily manipulated using JavaScript.

Essential requirements

The parseFromString method requires two arguments: string and mimeType.

The string argument must contain an HTML, XML, XHTML, or SVG document. The mimeType argument determines whether the XML parser or the HTML parser is used to parse the string.

Valid mime type values are:

  • text/html
  • text/xml
  • application/xml
  • application/xhtml+xml
  • image/svg+xml
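
As a rough illustration of the list above, the following sketch maps a MIME type to the parser it selects. The names SUPPORTED_MIME_TYPES and parserKindFor are hypothetical helpers, not part of the DOMParser API. The sketch assumes that only text/html selects the HTML parser and that the other four values select the XML parser.

```javascript
// MIME types accepted by DOMParser.parseFromString, as listed above.
const SUPPORTED_MIME_TYPES = [
  'text/html',
  'text/xml',
  'application/xml',
  'application/xhtml+xml',
  'image/svg+xml'
];

// Hypothetical helper: returns 'html' for text/html, 'xml' for the other
// supported values, and null for anything outside the list.
function parserKindFor(mimeType) {
  if (!SUPPORTED_MIME_TYPES.includes(mimeType)) {
    return null;
  }
  return mimeType === 'text/html' ? 'html' : 'xml';
}
```

A guard like this can be useful before calling parseFromString, which throws a TypeError when it receives a MIME type outside this list.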

How does the DOMParser interface parse HTML strings differently based on the mimeType argument?

The mimeType argument determines whether the XML parser or the HTML parser is used to parse the string. The XML parser is strict and reports a parser error when the string is not well-formed, while the HTML parser is lenient and tries to interpret the string as HTML even if it contains errors.
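
To make the difference concrete, here is a minimal sketch that parses the same string with several MIME types and records whether a <parsererror> element shows up. The function name compareMimeTypes and its optional parser parameter are assumptions made for illustration; the parameter only exists so the function can be exercised outside a browser with a stand-in object.

```javascript
// Parse the same source with each MIME type and report whether the
// resulting document contains a <parsererror> element.
// `parser` is any object with a DOMParser-compatible parseFromString
// method; in a browser it defaults to a real DOMParser.
function compareMimeTypes(source, mimeTypes, parser = new DOMParser()) {
  const report = {};
  for (const mimeType of mimeTypes) {
    const doc = parser.parseFromString(source, mimeType);
    report[mimeType] = doc.querySelector('parsererror') !== null;
  }
  return report;
}

// In a browser:
//   compareMimeTypes('<p>Unclosed paragraph', ['text/html', 'application/xml']);
// The HTML parser accepts the unclosed <p>, while the XML parser
// reports the string as not well-formed.
```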

Practical example

See the "Check if a string is valid HTML using JavaScript" example. Enter some HTML into the textarea and activate the submit button to determine if the provided string is valid HTML.

Notice that different MIME types give different validation results.

Code

Here are two versions of the code: TypeScript and JavaScript. Both also need to detect parsing errors.

When using the XML parser with a string that doesn’t represent well-formed XML, the XMLDocument returned by parseFromString will contain a <parsererror> node describing the nature of the parsing error.

Additionally, the parsing error may be reported to the browser’s JavaScript console. Do not rely on this technique for security-sensitive validation, sanitization, or XSS checks.

The function isStringValidHtml returns an object with the following properties:

  • isParseErrorAvailable – a boolean that indicates whether a <parsererror> element is present. true means that, for the given MIME type, the string is not valid.
  • isStringValidHtml – a boolean that indicates whether the string is valid for the given MIME type.
  • parsedDocument – the outer HTML of the <parsererror> element or, when no <parsererror> is present, the text content of the parsed document.
Check if the string is valid HTML, TypeScript version.
// Static class method; declare it inside a class.
public static isStringValidHtml(html: string, mimeType: string = 'application/xml'): { [key: string]: any } {
  const domParser: DOMParser = new DOMParser();
  const doc: Document = domParser.parseFromString(html, mimeType);
  // Query the whole document: in some browsers <parsererror> is the root element itself.
  const parseError: Element | null = doc.querySelector('parsererror');
  const result: { [key: string]: any } = {
    isParseErrorAvailable: parseError !== null,
    isStringValidHtml: false,
    parsedDocument: ''
  };

  if (parseError !== null && parseError.nodeType === Node.ELEMENT_NODE) {
    // Parsing failed: expose the error markup produced by the parser.
    result.parsedDocument = parseError.outerHTML;
  } else {
    // Parsing succeeded: expose the text content of the parsed document.
    result.isStringValidHtml = true;
    result.parsedDocument = typeof doc.documentElement.textContent === 'string' ? doc.documentElement.textContent : '';
  }

  return result;
}
Check if the string is valid HTML, JavaScript version.
function isStringValidHtml(html, mimeType) {
  const domParser = new DOMParser();
  // Fall back to the XML parser when no MIME type is provided.
  const doc = domParser.parseFromString(html, typeof mimeType === 'string' ? mimeType : 'application/xml');
  // Query the whole document: in some browsers <parsererror> is the root element itself.
  const parseError = doc.querySelector('parsererror');
  const result = {
    isParseErrorAvailable: parseError !== null,
    isStringValidHtml: false,
    parsedDocument: ''
  };

  if (parseError !== null && parseError.nodeType === Node.ELEMENT_NODE) {
    // Parsing failed: expose the error markup produced by the parser.
    result.parsedDocument = parseError.outerHTML;
  } else {
    // Parsing succeeded: expose the text content of the parsed document.
    result.isStringValidHtml = true;
    result.parsedDocument = typeof doc.documentElement.textContent === 'string' ? doc.documentElement.textContent : '';
  }

  return result;
}

Example of validation error

Example of HTML string validation using JavaScript DOMParser: "error on line 1 at column 22: Extra content at the end of the document"
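
One way to consume the object returned by isStringValidHtml is to turn it into a short message for the user. The helper below is a hypothetical sketch, not part of the original code; it relies only on the isStringValidHtml and parsedDocument properties described earlier.

```javascript
// Hypothetical helper: turns the object returned by isStringValidHtml
// into a short, human-readable message.
function summarizeValidation(result) {
  if (result.isStringValidHtml) {
    // JSON.stringify wraps the text in quotes so empty content stays visible.
    return 'Valid. Text content: ' + JSON.stringify(result.parsedDocument);
  }
  return 'Invalid. Parser reported: ' + result.parsedDocument;
}

// In a browser, with isStringValidHtml defined as in the Code section:
//   summarizeValidation(isStringValidHtml('<p>Hello</p>'));
//   summarizeValidation(isStringValidHtml('<p>Hello'));
```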

What is a MIME type and why is it used in the isStringValidHtml function?

In the context of the isStringValidHtml function, the MIME type tells the DOMParser object what type of document to expect. The parser needs to know the format of the string in order to parse it correctly, and the MIME type provides this information. For HTML strings, the MIME type would typically be text/html or application/xhtml+xml. If the mimeType argument is omitted, isStringValidHtml falls back to application/xml.

How does the DOMParser API handle invalid HTML syntax?

How the DOMParser API handles invalid markup depends on the parser selected by the MIME type. With text/html, the HTML parser applies the error recovery rules of the HTML specification and always returns an HTMLDocument, so no <parsererror> node is produced. With an XML MIME type, a string that is not well-formed yields an XMLDocument containing a <parsererror> node, which describes the nature of the parsing error.

The XML parser does not fix or correct invalid markup. It merely attempts to parse the string and reports the error it encounters.
