Blocks with JavaScript written on them. Underneath: "Escape HTML string". Image by Alltechbuzz_net from Pixabay.

Escape HTML string tags in JavaScript

Learn how to escape HTML string tags in JavaScript to prevent XSS attacks and ensure secure coding practices.

To escape HTML string tags in JavaScript, you need to replace certain special characters with their corresponding HTML entities. This is crucial for preventing issues like Cross-Site Scripting (XSS) attacks and ensuring that user-generated content is displayed correctly without being interpreted as HTML.

Why escape HTML?

Escaping HTML is a security best practice that helps prevent Cross-Site Scripting (XSS) attacks. XSS attacks occur when an attacker injects malicious code, usually JavaScript, into a website, allowing them to steal user data, take control of the user’s session, or perform other malicious actions.

What happens when HTML is not escaped?

When user-input data is not properly escaped, an attacker can inject malicious HTML code, which can lead to:

  • JavaScript injection: an attacker can inject malicious JavaScript code, which can be executed by the browser, allowing them to access sensitive data or take control of the user’s session.
  • HTML injection: an attacker can inject malicious HTML code, which can alter the structure and content of a web page, potentially leading to phishing attacks or other malicious activities.

How does escaping HTML prevent XSS attacks?

Escaping HTML converts special characters in user-input data into their corresponding HTML entities, so the browser displays them as text instead of parsing them as markup.

For example:

  • < (less than) becomes &lt;
  • > (greater than) becomes &gt;
  • & (ampersand) becomes &amp;
  • " (double quote) becomes &quot;
  • ' (single quote) becomes &#039;
  • ` (backtick) becomes &#096;

By escaping HTML, you ensure that user-input data is treated as plain text, rather than executable code, preventing XSS attacks and protecting your users’ sensitive data.
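
As a rough illustration of the mapping above, the sketch below builds a comment's markup with and without escaping. The `entities` table and the `escapeText` helper are names invented for this example and do not come from any library.

```javascript
// Hypothetical lookup table mirroring the entity list above.
const entities = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#039;',
  '`': '&#096;',
};

// Hypothetical helper: replaces each special character with its entity.
function escapeText(value) {
  return String(value).replace(/[&<>"'`]/g, (character) => entities[character]);
}

const userComment = '<img src=x onerror="alert(1)">';

// Unsafe: the payload is inserted as real markup.
const unsafeHtml = `<p>${userComment}</p>`;

// Escaped: the payload would be displayed as text.
const safeHtml = `<p>${escapeText(userComment)}</p>`;

console.log(unsafeHtml); // <p><img src=x onerror="alert(1)"></p>
console.log(safeHtml);   // <p>&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</p>
```

In the first string the browser would create an img element and run its onerror handler. In the second it would only show the characters the user typed.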

When to escape HTML?

You should escape HTML whenever you’re working with user-input data that will be displayed on a web page, such as:

  • User comments or feedback.
  • User profiles or bio data.
  • Search query results.
  • Form input data.
  • Dynamic content.
  • Data from external sources.
  • HTML attributes.
  • Displaying special characters.

In general, escaping HTML is crucial whenever you’re working with untrusted or user-provided data that will be rendered as HTML to prevent XSS (Cross-Site Scripting) attacks.

Best practices for secure HTML escaping

Escape on output, not on input

It’s generally recommended to escape HTML just before rendering the data to the page. This allows you to store the raw, unescaped data in your database, preserving its original format.
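
A minimal sketch of this idea, assuming a hypothetical in-memory `comments` array in place of a database and hypothetical `saveComment`, `renderComment` and `escapeForHtml` helpers: the raw text is stored untouched and escaped only when markup is produced.

```javascript
// Hypothetical escaping helper used only for this illustration.
const escapeForHtml = (value) =>
  String(value).replace(/[&<>"']/g, (character) => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#039;',
  })[character]);

// Hypothetical in-memory store standing in for a database.
const comments = [];

// Input: store the text exactly as the user typed it.
function saveComment(text) {
  comments.push({ text });
}

// Output: escape at the moment the HTML is produced.
function renderComment(comment) {
  return `<li>${escapeForHtml(comment.text)}</li>`;
}

saveComment('Tom & Jerry <3');
console.log(comments[0].text);           // Tom & Jerry <3
console.log(renderComment(comments[0])); // <li>Tom &amp; Jerry &lt;3</li>
```

Because the stored value stays raw, it can later be reused in contexts that need different escaping, such as JSON or plain-text email.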

Use a trusted escaping mechanism

Use a well-tested and widely accepted escaping mechanism, such as an established library or, better, the browser’s built-in HTML parsing and escaping functionality.

Avoid manual replacement

Avoid manually replacing special characters with their escaped equivalents, as this can be error-prone and lead to security vulnerabilities.

Test your escaping mechanism

Thoroughly test your escaping mechanism using unit tests to ensure it correctly handles various input scenarios and edge cases, such as:

  • Valid and invalid input.
  • Special characters and Unicode characters.
  • Nested HTML elements and attributes.
  • Edge cases, such as empty strings or null values.
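
The scenarios above could be covered with a small table of cases. The following is a framework-free sketch: `escapeHtmlUnderTest` is a stand-in for whichever escaping function you actually use, and in a real project the cases would normally live in your test runner.

```javascript
// Stand-in for the function under test; swap in your own implementation.
function escapeHtmlUnderTest(value) {
  if (typeof value !== 'string') {
    return '';
  }
  const entities = {
    '&': '&amp;', '<': '&lt;', '>': '&gt;',
    '"': '&quot;', "'": '&#039;', '`': '&#096;',
  };
  return value.replace(/[&<>"'`]/g, (character) => entities[character]);
}

// Each case pairs an input with the output we expect.
const cases = [
  { input: 'plain text', expected: 'plain text' },
  { input: '<script>alert(1)</script>', expected: '&lt;script&gt;alert(1)&lt;/script&gt;' },
  { input: '<a href="x" title=\'y\'>', expected: '&lt;a href=&quot;x&quot; title=&#039;y&#039;&gt;' },
  { input: '&amp;', expected: '&amp;amp;' }, // already-escaped input is escaped again
  { input: 'żółć 😀', expected: 'żółć 😀' },  // Unicode passes through unchanged
  { input: '', expected: '' },
  { input: null, expected: '' },
];

// Collect every case whose actual output differs from the expectation.
function runCases() {
  const failed = [];
  for (const { input, expected } of cases) {
    const actual = escapeHtmlUnderTest(input);
    if (actual !== expected) {
      failed.push({ input, expected, actual });
    }
  }
  return failed;
}

const failures = runCases();
console.log(failures.length === 0 ? 'all cases passed' : failures);
```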

Consider Content Security Policy (CSP)

CSP is a browser security mechanism that can help mitigate XSS attacks by restricting the sources from which scripts can be loaded. While not a replacement for proper escaping, it provides an additional layer of defense.

For example, you could create a CSP that specifies which sources of content are allowed. This can include scripts, styles, images, and more.

Example CSP response header (in Apache, it can be set with the Header directive of mod_headers)
Content-Security-Policy: default-src 'self'; script-src 'self' https://trusted.cdn.com; img-src 'self' data:;
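
If the header is set from application code rather than server configuration, the policy value can be assembled from a plain object. The sketch below assumes a hypothetical `buildCspHeader` helper; the directive values are the ones from the example above.

```javascript
// Hypothetical helper: joins directives into a Content-Security-Policy value.
function buildCspHeader(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(' ')}`)
    .join('; ');
}

const policy = buildCspHeader({
  'default-src': ["'self'"],
  'script-src': ["'self'", 'https://trusted.cdn.com'],
  'img-src': ["'self'", 'data:'],
});

console.log(policy);
// default-src 'self'; script-src 'self' https://trusted.cdn.com; img-src 'self' data:
```

In a Node.js HTTP server, for instance, the resulting string could be passed to response.setHeader('Content-Security-Policy', policy).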

How to escape an HTML string in JavaScript?

In JavaScript, you can escape HTML strings in two ways: by character replacement, where you swap special characters for their HTML entities, or by using a tag function for template literals, which escapes interpolated values automatically and reads more cleanly.

Character replacement

This is a straightforward approach where you replace specific characters with their HTML entity equivalents.

Escape HTML by replacing specific characters with their HTML entity equivalents
function escapeHtmlByReplacingCharacters(str) {
  if (typeof str !== 'string') {
    return '';
  }

  const escapeCharacter = (match) => {
    switch (match) {
      case '&': return '&amp;';
      case '<': return '&lt;';
      case '>': return '&gt;';
      case '"': return '&quot;';
      case '\'': return '&#039;';
      case '`': return '&#096;';
      default: return match;
    }
  };

  return str.replace(/[&<>"'`]/g, escapeCharacter);
}

Tagged template literal function

Tagged template literals in JavaScript are a powerful feature that allows you to create custom string processing functions. When it comes to escaping HTML special characters, you can create a tagged template literal function that automatically handles this for you.

Here is the code implementation:

Escape HTML by tagged template literal function
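function escapeHtml(strings, ...values) {
  // A detached element lets the browser do the escaping.
  const div = document.createElement('div');
  let output = strings[0];

  for (const [i, value] of values.entries()) {
    // textContent stores the value as plain text, never as markup.
    div.textContent = value;

    // innerHTML serializes that text with &, < and > escaped.
    output += div.innerHTML;
    output += strings[i + 1];
  }

  return output;
}

// Example usage
console.log(escapeHtml`<br> ${'<br>'}`); // <br> &lt;br&gt;

The escapeHtml function above is a tag function for template literals that escapes HTML special characters in the interpolated values. It creates a temporary div element, sets its textContent to the value to be escaped, and then reads back the escaped innerHTML. textContent is used rather than innerText because setting innerText turns line breaks into <br> elements. Note that innerHTML escapes &, < and > but not quotes, so the result is safe inside element content but not inside attribute values.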

A tag function is defined like any ordinary JavaScript function. What makes it a tag is how it is called: you place the function name (e.g., escapeHtml) directly in front of a template literal, without parentheses.

Tagged template literal functions are called with a template string, which is a string that contains embedded expressions enclosed in ${}. The function receives the template string parts as an array of strings, and the embedded expressions as separate arguments.
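
To make the calling convention concrete, here is a small sketch with a hypothetical tag function named `inspect` that simply reports what it receives.

```javascript
// Hypothetical tag function that returns its inputs for inspection.
function inspect(strings, ...values) {
  return { strings: [...strings], values };
}

const name = 'Ada';
const result = inspect`Hello ${name}, you have ${2 + 1} messages`;

console.log(result.strings); // [ 'Hello ', ', you have ', ' messages' ]
console.log(result.values);  // [ 'Ada', 3 ]
```

The strings array always has one more element than there are embedded expressions, which is why a tag function can safely read strings[i + 1] after the i-th value.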

In the case of escapeHtml, it’s defined to accept a template string with embedded expressions, like this:

escapeHtml`<br> ${'<br>'}`

The backticks (`) delimit the template literal, and the ${} syntax embeds an expression. The escapeHtml function is then called with the array of string parts as its first argument and the value of each embedded expression as a further argument.

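A tag function can technically be called with parentheses, but you then have to build the array of string parts yourself. Passing the whole template as a single string does not work:

escapeHtml(['<br> ${<br>}'], '<br>');

Here the ${} placeholder is just literal text and the array contains only one part, so the output ends with "undefined". Splitting the parts by hand does produce the same result as the tagged call:

escapeHtml(['<br> ', ''], '<br>');

However, this form is harder to read and easy to get wrong, so the function is best used as a template literal tag.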

Comparison of the provided code examples

The tagged template literal function (escapeHtml) automatically escapes any dynamic content (e.g., variables) passed into it. This ensures that all user-provided or dynamic data is sanitized before being inserted into the output, reducing the risk of XSS attacks. This is a clever and effective way to leverage the browser’s built-in HTML escaping capabilities.

In contrast, the traditional replacement function (escapeHtmlByReplacingCharacters) requires you to explicitly call the function on every string you want to escape, which can lead to errors if you forget to escape a particular variable.
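
The two approaches can also be combined. The sketch below is one possible variant, not the implementation shown earlier: a tag function, here called `safeHtml` (a name chosen for this example), escapes each interpolated value by character replacement. Under that assumption it needs no DOM, so it can run in Node.js, and it escapes quotes as well.

```javascript
// Hypothetical entity table and helper for this sketch.
const entityMap = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;',
  '"': '&quot;', "'": '&#039;', '`': '&#096;',
};

function escapeValue(value) {
  return String(value).replace(/[&<>"'`]/g, (character) => entityMap[character]);
}

// Tag function: literal parts are kept as written, values are escaped.
function safeHtml(strings, ...values) {
  return strings.reduce((output, part, index) => {
    const value = index < values.length ? escapeValue(values[index]) : '';
    return output + part + value;
  }, '');
}

const userName = '"><script>alert(1)</script>';
console.log(safeHtml`<input value="${userName}">`);
// <input value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```

Because the double quote is escaped too, the value cannot break out of the attribute in this example.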

You can also check the performance for both solutions in the test Escape HTML string tags in JavaScript.

In summary

Tagged template literals offer a more readable and maintainable way to escape HTML special characters, especially when you’re dealing with user-provided content. They reduce the risk of forgetting to escape a value and make your code easier to understand and modify. While the character-replacing method is functional, the tagged template literal approach is generally preferred for its advantages in code quality and security. If performance matters, measure both approaches for your own workload.
