Get XPath from the element using JavaScript

Learn how to extract XPath from any DOM element using JavaScript, enhancing your web scraping and automation skills.

To get the XPath of an element using JavaScript, you can use an approach that traverses up the DOM tree from the target element, constructing the XPath string as it goes.

Custom function to get XPath

The getXpath function below, shown first in JavaScript and then in TypeScript, generates an XPath expression intended to identify a given DOM node (el). It walks up the DOM tree from that node and, at each level, records the node’s name and, when several siblings share that name, its position among them. The returned string is a relative path that is meant to be evaluated against a context node, normally the document.

Get XPath from the element using JavaScript
function getXpath(el) {
  let element = el;
  let parent;
  let sames;
  let elementType;
  let result = '';

  const filterNode = (_node) => {
    if (_node.nodeName === element.nodeName) {
      sames.push(_node);
    }
  };

  if (element instanceof Node === false) {
    return result;
  }

  parent = el.parentNode;

  while (parent !== null) {
    elementType = element.nodeType;
    sames = [];
    parent.childNodes.forEach(filterNode);

    switch (elementType) {
      case Node.ELEMENT_NODE: {

        // Plain JavaScript: no type annotations or casts here.
        const nodeName = element.nodeName.toLowerCase();
        const name = nodeName === 'svg' ? `*[name()='${nodeName}']` : nodeName;
        // XPath positions are 1-based, hence the + 1.
        const sameNodesCount = `[${sames.indexOf(element) + 1}]`;

        result = `/${name}${sames.length > 1 ? sameNodesCount : ''}${result}`;
        break;
      }

      case Node.TEXT_NODE: {
        result = `/text()${result}`;
        break;
      }

      case Node.ATTRIBUTE_NODE: {
        result = `/@${element.nodeName.toLowerCase()}${result}`;
        break;
      }

      case Node.COMMENT_NODE: {
        result = `/comment()${result}`;
        break;
      }

      default: {
        break;
      }
    }

    element = parent;
    parent = element.parentNode;
  }

  return `./${result}`;
}

Here is the same function, but written in TypeScript:

Get XPath from the element using TypeScript
function getXpath(el: Node): string {
  let element: Node = el;
  let parent: Node | null;
  let sames: Node[];
  let elementType: number;
  let result = '';

  const filterNode = (_node: Node): void => {
    if (_node.nodeName === element.nodeName) {
      sames.push(_node);
    }
  };

  if (element instanceof Node === false) {
    return result;
  }

  parent = el.parentNode;

  while (parent !== null) {
    elementType = element.nodeType;
    sames = [];
    parent.childNodes.forEach(filterNode);

    switch (elementType) {
      case Node.ELEMENT_NODE: {

        const nodeName: string = element.nodeName.toLowerCase();
        const name: string = nodeName === 'svg' ? `*[name()='${nodeName}']` : nodeName;
        // XPath positions are 1-based, hence the + 1.
        const sameNodesCount: string = `[${sames.indexOf(element) + 1}]`;

        result = `/${name}${sames.length > 1 ? sameNodesCount : ''}${result}`;
        break;
      }

      case Node.TEXT_NODE: {
        result = `/text()${result}`;
        break;
      }

      case Node.ATTRIBUTE_NODE: {
        result = `/@${element.nodeName.toLowerCase()}${result}`;
        break;
      }

      case Node.COMMENT_NODE: {
        result = `/comment()${result}`;
        break;
      }

      default: {
        break;
      }
    }

    element = parent;
    parent = element.parentNode;
  }

  return `./${result}`;
}
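
Once a path has been generated, it can be resolved back to a node with the standard document.evaluate method. The following is a minimal sketch and not part of the original function: resolveXpath is a hypothetical helper name, and the path './/html/body' is only assumed to be what getXpath would return for document.body on a typical HTML page.

```javascript
// Hypothetical helper (not part of getXpath): resolves an XPath string to the
// first matching node. `doc` is assumed to be a browser Document; it is used
// both to run the query and as the context node for the relative path.
function resolveXpath(xpath, doc) {
  // Same numeric value as XPathResult.FIRST_ORDERED_NODE_TYPE in browsers.
  const FIRST_ORDERED_NODE_TYPE = 9;
  const result = doc.evaluate(xpath, doc, null, FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}

// Browser-only usage; the guard skips it where no DOM is available.
if (typeof document !== 'undefined') {
  // Assumed output of getXpath(document.body) on a typical HTML page.
  const node = resolveXpath('.//html/body', document);
  console.log(node === document.body);
}
```

Passing the document as the context node matters here because the generated path is relative.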

Breakdown of how it works

Here’s a breakdown of how it works:

  • Initialization: The function starts by initializing several variables:
    • element: Holds the current DOM element being processed.
    • parent: Temporarily holds the parent node of the current element during traversal.
    • sames: An array that collects the children of the current parent whose nodeName matches that of the current node, including the current node itself.
    • elementType: Stores the type of the current node (e.g., Node.ELEMENT_NODE, Node.TEXT_NODE, etc.).
    • result: A string builder for constructing the XPath expression.
  • Validation check: It checks if the input el is indeed a DOM node. If not, it returns an empty string immediately.
  • Traversal loop:
    • The loop runs while the current node has a parent, so it stops once the walk reaches the document node, whose parentNode is null.
    • On each iteration, it resets sames and refills it from the parent’s child nodes. At the end of the iteration, the parent becomes the current node and parent is set to the next ancestor.
  • Node type handling:
    • Depending on the type of the current node (elementType), it appends different segments to the result string:
      • For element nodes, it constructs part of the XPath based on the node’s tag name and whether there are multiple siblings of the same type.
      • For text nodes, it adds a segment indicating a text node.
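      • For attribute nodes, it adds a segment indicating an attribute node. Note that the parentNode of an Attr node is null in current DOM implementations, so when an attribute is passed in directly the loop does not run and this branch is not reached.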
      • For comment nodes, it adds a segment indicating a comment node.
  • XPath construction logic:
    • The core logic for constructing the XPath expression is within the switch statement. It handles different node types differently, appending relevant parts to the result string.
    • For element nodes, the segment is the lowercase tag name (the svg element is matched through name() instead). If more than one sibling shares that name, the function appends the element’s index in sames plus one in square brackets, because XPath positions start at 1.
    • For other node types (text, attribute, comment), it simply appends a generic segment to the XPath.
  • Returning the result: After the traversal, the function prepends ./ to the result and returns it. Because result itself already begins with /, the returned string starts with .// (for example, .//html/body/div[2]) and is meant to be evaluated relative to a context node such as the document.
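
The element branch of the steps above can be sketched on plain objects, without a browser. This is an illustration under stated assumptions, not the original implementation: makeNode, elementStep and pathFor are hypothetical names, the objects only mimic the nodeName, parentNode and childNodes properties that the function reads, and the svg special case and the non-element node types are left out.

```javascript
// Hypothetical factory: builds a plain object that mimics the few DOM
// properties used in the walk-through (nodeName, parentNode, childNodes).
// Here childNodes is a plain array, not a real NodeList.
function makeNode(nodeName, children = []) {
  const node = { nodeName, parentNode: null, childNodes: children };
  children.forEach((child) => { child.parentNode = node; });
  return node;
}

// Hypothetical helper: computes one path segment for an element-like node,
// following the same rule as the ELEMENT_NODE branch.
function elementStep(node) {
  // Siblings (including the node itself) that share its nodeName.
  const sames = node.parentNode.childNodes.filter(
    (sibling) => sibling.nodeName === node.nodeName
  );
  const name = node.nodeName.toLowerCase();
  // XPath positions are 1-based; the index is added only when it is needed.
  const position = sames.length > 1 ? `[${sames.indexOf(node) + 1}]` : '';
  return `/${name}${position}`;
}

// Walks up from the node and prepends one segment per level, stopping at
// the node that has no parent (the stand-in for the document).
function pathFor(node) {
  let result = '';
  for (let current = node; current.parentNode !== null; current = current.parentNode) {
    result = elementStep(current) + result;
  }
  return result;
}

// A tiny stand-in tree: #document > HTML > (HEAD, BODY > (H1, P, P)).
const first = makeNode('P');
const second = makeNode('P');
const body = makeNode('BODY', [makeNode('H1'), first, second]);
const html = makeNode('HTML', [makeNode('HEAD'), body]);
makeNode('#document', [html]);

console.log(pathFor(second)); // /html/body/p[2]
console.log(pathFor(body));   // /html/body
```

The index appears only on the second paragraph’s segment because body and html have no siblings with the same name.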

The function is useful for generating XPath expressions dynamically based on the structure of the DOM, which can be particularly helpful in automation scripts or tools that interact with web pages programmatically.

Additional tools

There are several tools that can be handy when working with XPath:
