On this page
Man in a hood on the background of green letters (matrix). On the back there is an inscription: "JavaScript Capitalize First Letter"

Get and capitalize the first letter of a string in JavaScript

Learn how to capitalize the first character of a string in JavaScript correctly, handling Unicode characters.

How to capitalize the first letter of a string in JavaScript?

To capitalize the first character of a string in JavaScript while properly handling Unicode characters, you can use the String.prototype.charAt(), String.fromCodePoint() and String.prototype.substring() methods. This ensures that the first character is transformed to uppercase while the rest of the string remains unchanged, accommodating various Unicode character sets effectively, specifically surrogate pairs.

Surrogate pairs are crucial in Unicode because they allow the encoding of characters that fall outside the Basic Multilingual Plane (BMP), which includes code points above U+FFFF. In UTF-16 encoding, these characters are represented using two 16-bit code units: a high-surrogate and a low-surrogate. This mechanism is essential for supporting a wide range of characters, including many emojis and rare scripts, ensuring that applications can display and manipulate all Unicode characters effectively.

Without surrogate pairs, many characters would be inaccessible, limiting the representation of global languages and symbols in digital formats. For example, the character for the “grinning face with smiling eyes” emoji (😄) has a code point of U+1F604. In UTF-16, this is represented as the surrogate pair 0xD83D and 0xDE04.

How are strings represented in JavaScript?

In JavaScript, strings are represented as sequences of UTF-16 code units. UTF-16 is a character encoding capable of encoding all 1,112,064 valid code points in Unicode using one or two 16-bit code units. The first 65,536 code points, known as the Basic Multilingual Plane (BMP), can be represented by a single 16-bit code unit. Characters outside the BMP, known as astral characters, must be represented using two 16-bit code units, a high-surrogate code unit and a low-surrogate code unit. This pair of code units is known as a surrogate pair.

  • High-surrogate code units have values between 0xD800 and 0xDBFF.
  • Low-surrogate code units have values between 0xDC00 and 0xDFFF.

When dealing with strings that contain astral characters, it’s crucial to handle surrogate pairs correctly.

Getting and capitalizing the first character of a string in JavaScript

The below function capitalize the first character of a given text string while handling special Unicode characters correctly.

Capitalize the first character of a given text string using JavaScript
function firstCharacterToUpperCase(text) {
  if ("\uD83D\uDE00".length === 1) {
    return text.charAt(0).toLocaleUpperCase() + text.substring(1);
  }
  
  const firstChar = text.codePointAt(0);

  if (firstChar === undefined) {
    return text;
  }

  const index = firstChar > 0xFFFF ? 2 : 1;

  return String.fromCodePoint(firstChar).toLocaleUpperCase() + text.substring(index);
}

Let’s break down the code:

  • The function takes a text parameter, which represents the input string to be processed.
  • The first if statement checks if the length of the Unicode character \uD83D\uDE00 is equal to 1. This is used to determine if the browser supports UTF-32. If this condition is met, it capitalizes the first character using text.charAt(0).toLocaleUpperCase() and concatenates it with the rest of the string starting from index 1 using text.substring(1).
  • If the above condition is not met, the function then determines whether the code point for the first character is greater than 0xFFFF (which indicates a surrogate pair) and adjusts the index for the substring method accordingly.
  • Finally, it capitalizes the first character using String.fromCodePoint(firstChar).toLocaleUpperCase() and concatenates it with the rest of the string starting from either index 1 or 2 based on whether it’s a surrogate pair.
The toLocaleUpperCase() method is used to ensure that the conversion respects any locale-specific case mappings, especially for languages like Turkish where the default Unicode case mappings may not be appropriate.

The code could be more optimized, if needed, but less readable, by wrapping in an immediately invoked function expression (IIFE) and testing for UTF-32 support only once. The optimized version of getting and capitalizing the first letter code is included in the CodePen below.

You can also play and use The Surrogate Pair Calculator.

Practical example

Related posts

Comments

Leave a Reply

Search in sitelint.com

Is your site slow?

Discover performance bottlenecks to provide a better customer experience and meet your company’s revenue targets.

Real-user monitoring for Accessibility, Performance, Security, SEO & Errors (SiteLint)