
Removing Angular-specific data from HTML
Clean Angular HTML easily: remove framework-specific attributes, comments, and whitespace for pure markup.
When working with Angular applications, the rendered HTML often contains framework‑specific attributes such as _ngcontent, ng-*, or data-ng-*.
These attributes are essential for Angular’s internal mechanics, but they can clutter the markup when you want to:
- Export or snapshot HTML for testing
- Compare DOM structures
- Generate framework‑agnostic reports
- Ensure clean output for accessibility or SEO audits
To solve this, we’ll use a utility method that parses HTML, removes Angular-specific attributes, strips comments, and drops whitespace-only text at the top level, leaving only meaningful content.
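The attribute filter at the heart of the method can be tried on its own. The sketch below applies the same pattern that getCleanHtml uses to a handful of attribute names; the helper name isAngularAttribute and the sample names are made up for illustration.

```typescript
// Same pattern as in getCleanHtml: matches names starting with
// _ngcontent, _nghost, ng-, or data-ng-.
const ANGULAR_ATTR_PATTERN = /^(?:_ng(?:content|host)|ng-|data-ng-)/;

// Hypothetical helper, introduced only for this illustration.
function isAngularAttribute(name: string): boolean {
  return ANGULAR_ATTR_PATTERN.test(name);
}

const names: string[] = [
  '_ngcontent-c0',
  '_nghost-abc-1',
  'ng-reflect-name',
  'data-ng-if',
  'class',
  'onclick',
  'href',
];

// Keep only the attributes the cleaner would leave in place.
const kept: string[] = names.filter((name) => !isAngularAttribute(name));
console.log(kept); // [ 'class', 'onclick', 'href' ]
```

Ordinary attributes do not match the pattern and stay in the markup. That includes event handlers such as onclick.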
The utility method
/**
 * Removes Angular-specific attributes, HTML comments, and top-level
 * whitespace-only text nodes from an HTML string while preserving meaningful content.
 *
 * @param html Valid HTML string. Passing null/undefined throws TypeError.
 * @throws {TypeError} If html is not a string.
 * @returns Cleaned HTML string.
 */
public static getCleanHtml(html: string): string {
  if (typeof html !== 'string') {
    throw new TypeError('html must be a string');
  }
  if (html.trim().length === 0) {
    return '';
  }
  const parser: DOMParser = new DOMParser();
  const doc: Document = parser.parseFromString(html, 'text/html');
  const container: HTMLElement = doc.body;
  if (!container.firstChild) {
    return '';
  }
  const walker: TreeWalker = doc.createTreeWalker(
    container,
    NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT
  );
  // Nodes are collected and removed after traversal to avoid invalidating the TreeWalker while iterating.
  const nodesToRemove: Node[] = [];
  while (true) {
    // nextNode() returns null once the traversal is complete.
    const node: Node | null = walker.nextNode();
    if (node === null) break;
    switch (node.nodeType) {
      case Node.COMMENT_NODE:
        nodesToRemove.push(node);
        break;
      case Node.ELEMENT_NODE: {
        const element = node as Element;
        const attrsToRemove: Attr[] = [];
        for (const attr of Array.from(element.attributes)) {
          if (/^(?:_ng(?:content|host)|ng-|data-ng-)/.test(attr.name)) {
            attrsToRemove.push(attr);
          }
        }
        for (const attr of attrsToRemove) {
          element.removeAttributeNode(attr);
        }
        break;
      }
      case Node.TEXT_NODE: {
        const textNode = node as Text;
        const content = textNode.textContent;
        // Whitespace is only removed at the top level to avoid collapsing meaningful spacing inside inline elements.
        if (content && !content.trim() && textNode.parentNode === container) {
          nodesToRemove.push(textNode);
        }
        break;
      }
    }
  }
  for (const node of nodesToRemove) {
    if (node.parentNode) {
      node.parentNode.removeChild(node);
    }
  }
  let cleanedHtml = container.innerHTML;
  // Prevent adjacent tags from collapsing visually (e.g., </div><span>)
  cleanedHtml = cleanedHtml.replace(/></g, '> <');
  return cleanedHtml.trim();
}
Example
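Input HTML:
<div _ngcontent-c0 ng-reflect-name="example">
<!-- Angular comment -->
<span> Hello World </span>
</div>
Output after cleaning:
<div>

<span> Hello World </span>
</div>
The empty line is where the comment was. Whitespace inside elements is preserved, so the line breaks on either side of the removed comment remain.
Why this matters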
- Cleaner snapshots: useful for testing frameworks like Jest or Cypress.
- Accessibility audits: removes noise so tools can focus on meaningful content.
- SEO and reporting: produces framework‑agnostic HTML for analysis.
- Portability: makes markup easier to reuse outside Angular.
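One detail worth knowing when comparing snapshots is the final spacing step in getCleanHtml, which inserts a space between directly adjacent tags. A minimal sketch of that step on plain strings follows; the function name separateAdjacentTags is hypothetical and exists only for this illustration.

```typescript
// Mirrors the last replacement in getCleanHtml: every "><" becomes "> <".
// Hypothetical helper name, not part of the original utility.
function separateAdjacentTags(html: string): string {
  return html.replace(/></g, '> <');
}

const compact = '<div><span>a</span></div>';
const spaced = separateAdjacentTags(compact);
console.log(spaced); // <div> <span>a</span> </div>
```

Because this is a plain string replacement, the cleaned output can differ from the browser's own serialization of the same DOM, so compare snapshots that were both produced by the cleaner.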
Node.js and Jest limitations note
This implementation uses DOMParser, which is a browser API and is not available as a global in plain Node.js.
What this means:
- Tests running in Jest with the node test environment will throw ReferenceError: DOMParser is not defined. For Jest tests, use the jsdom test environment, which provides a browser-like DOM implementation. Jest 26 and earlier used jsdom by default; since Jest 27 the default is testEnvironment: "node", and since Jest 28 the jest-environment-jsdom package has to be installed separately.
- Server-side rendering (SSR) scenarios will fail.
- Any non-browser JavaScript runtime without a DOM implementation will encounter this issue.
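If the utility may run in more than one environment, a small guard can turn the bare ReferenceError into a clearer failure. The following is a minimal sketch, assuming a hypothetical helper named hasDomParser; it is not part of the original utility, and the scope parameter exists only to make the check easy to test.

```typescript
// Hypothetical helper: reports whether a DOMParser constructor is exposed
// on the given scope (defaults to the global object).
function hasDomParser(scope: object = globalThis): boolean {
  return typeof (scope as { DOMParser?: unknown }).DOMParser === 'function';
}

// Stand-in constructor used only to demonstrate the positive case.
class FakeDomParser {}

console.log(hasDomParser({})); // false
console.log(hasDomParser({ DOMParser: FakeDomParser })); // true

// Possible use before calling getCleanHtml:
// if (!hasDomParser()) {
//   throw new Error('getCleanHtml needs a DOM: run in a browser or a jsdom test environment');
// }
```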
Security note
What it does not do:
- Remove malicious script tags.
- Sanitize dangerous URL protocols (javascript:, data:).
- Remove event handlers (onclick, onerror, etc.).
- Protect against SVG-based XSS or other advanced injection techniques.
Intended only for:
- Cleaning trusted Angular framework output.
- Removing Angular-specific attributes from controlled HTML.
- Use cases where HTML source is already sanitized by Angular.
Closing thought
Angular’s attributes are powerful inside the framework, but they don’t belong in exported or analyzed HTML. With a simple utility like getCleanHtml, you can strip away the noise and focus on what really matters: the content.