Get XPath from the element using JavaScript

Learn how to extract XPath from any DOM element using JavaScript, enhancing your web scraping and automation skills.

To get the XPath of an element using JavaScript, you can use an approach that traverses up the DOM tree from the target element, constructing the XPath string as it goes.

Custom function to get XPath

The TypeScript function getXpath generates an XPath expression that identifies a given DOM node (el). It walks up the DOM tree from the starting node, recording each node's type and its position among siblings with the same node name, to build a relative XPath expression that starts with .//.

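Note that this function also handles SVG. Elements in the SVG namespace are matched with a predicate of the form *[name()='svg'], because a bare, unprefixed name test such as /svg does not select namespaced elements. This makes the function useful in XML-heavy contexts.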
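To illustrate the difference, here is a minimal sketch of how a single path step could be chosen. The helper name stepFor and its parameters are hypothetical and not part of getXpath; the sketch assumes the caller already knows the element's name, its namespace URI, and its position among same-named siblings.

```typescript
const SVG_NS = "http://www.w3.org/2000/svg";

// Hypothetical helper (not part of getXpath): builds one XPath step.
// position is the 1-based index among same-named siblings; sameCount is how many exist.
function stepFor(
  name: string,
  namespaceURI: string | null,
  position: number,
  sameCount: number
): string {
  // An unprefixed name test does not select SVG-namespaced elements,
  // so SVG elements get a name() predicate and keep their original casing.
  const test = namespaceURI === SVG_NS ? `*[name()='${name}']` : name.toLowerCase();
  // Add a positional predicate only when there is something to disambiguate.
  return sameCount > 1 ? `${test}[${position}]` : test;
}

console.log(stepFor("DIV", "http://www.w3.org/1999/xhtml", 2, 3)); // div[2]
console.log(stepFor("svg", SVG_NS, 1, 1)); // *[name()='svg']
console.log(stepFor("linearGradient", SVG_NS, 1, 2)); // *[name()='linearGradient'][1]
```

Keeping the predicate form limited to SVG elements leaves the paths for ordinary HTML elements short and readable.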

Here is an implementation written in TypeScript (a superset of JavaScript):

Get XPath from the element using TypeScript
function getXpath(el: Document | Element | DocumentFragment) {
    let element: Element | (Node & ParentNode) = el;
    let parent: Element | (Node & ParentNode) | null;
    let sames: Node[];
    let elementType: number;
    let result = '';

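    // Backslash-escapes colons and asterisks. XPath 1.0 itself defines no backslash escape,
    // so a name that actually contains these characters may still need separate handling.
    const escapeXPath = (name: string): string => name.replace(/([:*])/g, '\\$1');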

    const filterNode = (_node: Node): void => {
      if (_node.nodeName === element.nodeName) {
        sames.push(_node);
      }
    };

    if (!(element instanceof Node)) {
      return result;
    }

    parent = element.parentNode ?? (element.ownerDocument ?? null);

    while (parent !== null) {
      elementType = element.nodeType;
      sames = [];

      try {
        parent.childNodes.forEach(filterNode);
      } catch {
        break;
      }

      const nodeNameEscaped: string = escapeXPath(element.nodeName);

      switch (elementType) {
        case Node.ELEMENT_NODE: {
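          // Lowercase HTML names (nodeName is upper case in HTML documents), but keep
          // SVG names such as linearGradient as-is because name() is case-sensitive.
          const isSVG = (element as Element).namespaceURI === "http://www.w3.org/2000/svg";
          const name: string = isSVG
            ? `*[name()='${nodeNameEscaped}']`
            : nodeNameEscaped.toLowerCase();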
          const sameNodesCount: string = `[${[].indexOf.call(sames, element as never) + 1}]`;
          result = `/${name}${sames.length > 1 ? sameNodesCount : ''}${result}`;
          break;
        }

        case Node.TEXT_NODE: {
          const textNodes: ChildNode[] = Array.from(parent.childNodes).filter(n => n.nodeType === Node.TEXT_NODE);
          const index: number = element instanceof Node && 'remove' in element
            ? [].indexOf.call(sames, element as never) + 1
            : 1;

          result = `/text()${textNodes.length > 1 ? `[${index}]` : ''}${result}`;
          break;
        }

        case Node.ATTRIBUTE_NODE: {
          result = `/@${nodeNameEscaped.toLowerCase()}${result}`;
          break;
        }

        case Node.COMMENT_NODE: {
          const index: number =
            Array.from(parent.childNodes).filter(n => n.nodeType === Node.COMMENT_NODE).indexOf(element as never) + 1;
          result = `/comment()[${index}]${result}`;
          break;
        }

        case Node.PROCESSING_INSTRUCTION_NODE: {
          result = `/processing-instruction('${nodeNameEscaped}')${result}`;
          break;
        }

        case Node.DOCUMENT_NODE: {
          result = `/${result}`;
          break;
        }

        default:
          break;
      }

      element = parent;
      parent = element.parentNode ?? (element.ownerDocument ?? null);
    }

    return `.//${result.replace(/^\//, '')}`;
}

Breakdown of how it works

The getXpath function builds an XPath expression for a given DOM node, which is helpful in automation scripts or tools that interact with web pages programmatically. It walks up the DOM tree from the target node, checks each node's type (element, attribute, text node, or comment node), and prepends a matching path step at every level. Because each step is prepended, the finished expression reads from the outermost ancestor down to the target and mirrors the node's location in the document structure.
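The walk can be sketched on a plain object tree so that it runs without a browser. This is a simplified illustration, not the function above: the SimpleNode interface and the makeNode and simplePath helpers are hypothetical, and only element-like nodes are handled.

```typescript
// Hypothetical stand-in for a DOM element so the sketch runs outside a browser.
interface SimpleNode {
  name: string;
  parent: SimpleNode | null;
  children: SimpleNode[];
}

// Creates a node and attaches it to its parent, if one is given.
function makeNode(name: string, parent: SimpleNode | null = null): SimpleNode {
  const node: SimpleNode = { name, parent, children: [] };
  if (parent) parent.children.push(node);
  return node;
}

// Walks from the target up to the root, prepending one step per level.
function simplePath(target: SimpleNode): string {
  let path = "";
  let node: SimpleNode | null = target;
  while (node !== null) {
    const current: SimpleNode = node;
    // Siblings sharing this node's name decide whether an index is needed.
    const sames = current.parent
      ? current.parent.children.filter(c => c.name === current.name)
      : [current];
    const index = sames.length > 1 ? `[${sames.indexOf(current) + 1}]` : "";
    path = `/${current.name}${index}${path}`;
    node = current.parent;
  }
  return path;
}

const html = makeNode("html");
const body = makeNode("body", html);
makeNode("div", body);
const secondDiv = makeNode("div", body);
const para = makeNode("p", secondDiv);

console.log(simplePath(para)); // /html/body/div[2]/p
```

Only the div step carries an index, because it is the only level where two siblings share a name.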

It accounts for common and specialized scenarios, including namespaced elements like SVGs (using *[name()='svg']), sibling indexing for disambiguation, and escaping special characters in node names. The resulting XPath is intended to be evaluated with document.evaluate. The function also includes safety checks for detached nodes, handles PROCESSING_INSTRUCTION_NODE and DOCUMENT_NODE types, and begins the path with .// so that it is evaluated relative to the context node.
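A generated path can be checked by resolving it back to a node. The following is a minimal sketch of such a round trip: findByXPath, XPathDocument, and NodeResult are hypothetical names introduced here, and the small interfaces exist only so the snippet compiles outside a browser. In a page, the real document object is expected to satisfy them.

```typescript
// Minimal structural types (assumptions for this sketch, not DOM typings).
interface NodeResult {
  singleNodeValue: unknown;
}
interface XPathDocument {
  evaluate(
    expression: string,
    contextNode: unknown,
    resolver: null,
    type: number,
    result: null
  ): NodeResult;
}

// Numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE.
const FIRST_ORDERED_NODE_TYPE = 9;

// Hypothetical helper: resolves an XPath string to the first matching node.
// The context node defaults to the document itself, which suits paths starting with ".//".
function findByXPath(doc: XPathDocument, xpath: string, contextNode: unknown = doc): unknown {
  return doc.evaluate(xpath, contextNode, null, FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
}

// In a browser, assuming getXpath from this article is in scope:
//   const el = document.querySelector("main p");
//   const found = findByXPath(document, getXpath(el!));
//   // found === el is the check to make; a mismatch means the path did not round-trip.
```

Comparing the resolved node with the original is a quick way to spot cases the generator does not cover.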
