Get and capitalize the first letter of a string in JavaScript

Learn how to capitalize the first character of a string in JavaScript correctly, handling Unicode characters.

To convert the first letter of a string to upper case, use toLocaleUpperCase and take surrogate pairs into account.

How are strings represented in JavaScript?

In JavaScript, strings are represented as sequences of UTF-16 code units. UTF-16 is a character encoding capable of encoding all 1,112,064 valid code points in Unicode using one or two 16-bit code units. The first 65,536 code points, known as the Basic Multilingual Plane (BMP), can be represented by a single 16-bit code unit. Characters outside the BMP, known as astral characters, must be represented using two 16-bit code units, a high-surrogate code unit and a low-surrogate code unit. This pair of code units is known as a surrogate pair.

  • High-surrogate code units have values between 0xD800 and 0xDBFF.
  • Low-surrogate code units have values between 0xDC00 and 0xDFFF.
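To make these ranges concrete, here is a small sketch of the standard UTF-16 arithmetic that maps a code point above U+FFFF to its surrogate pair and back. The helper names toSurrogatePair and fromSurrogatePair are hypothetical and exist only for illustration. In real code, String.fromCodePoint and String.prototype.codePointAt already do this work for you.

```javascript
// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("Not a supplementary code point: " + codePoint);
  }
  const offset = codePoint - 0x10000;    // a 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits -> 0xD800..0xDBFF
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> 0xDC00..0xDFFF
  return [high, low];
}

// Hypothetical helper: combine a high and low surrogate back into a code point.
function fromSurrogatePair(high, low) {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

const [high, low] = toSurrogatePair(0x1f600); // U+1F600 GRINNING FACE
console.log(high.toString(16), low.toString(16)); // d83d de00
console.log(fromSurrogatePair(high, low).toString(16)); // 1f600
```

The pair d83d/de00 is exactly the "\uD83D\uDE00" sequence used in the function later in this article.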

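When a string contains astral characters, it is crucial to handle surrogate pairs correctly: the length property and methods such as charAt and substring count UTF-16 code units, not characters, so they can split a surrogate pair in half.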
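A minimal sketch of what goes wrong, assuming a string that starts with U+10428 DESERET SMALL LETTER LONG I, a lowercase letter outside the BMP whose uppercase form is U+10400. The naive charAt(0) approach grabs only the high surrogate; reading the whole code point with codePointAt fixes it. The sketch uses toUpperCase() rather than toLocaleUpperCase() only so the result does not depend on the host's default locale.

```javascript
// U+10428 is stored as the surrogate pair \uD801\uDC28.
const text = "\u{10428}bc";

console.log(text.length); // 4: two code units for the first character, plus "b" and "c"

// Naive approach: charAt(0) returns only the high surrogate, which has no
// uppercase mapping, so the string comes back unchanged.
const naive = text.charAt(0).toUpperCase() + text.substring(1);
console.log(naive === text); // true

// Code-point-aware approach: read the whole first code point, uppercase it,
// then skip as many code units as it occupies.
const first = text.codePointAt(0);
const width = first > 0xffff ? 2 : 1;
const aware = String.fromCodePoint(first).toUpperCase() + text.substring(width);
console.log(aware === "\u{10400}bc"); // true
```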

Getting and capitalizing the first character of a string in JavaScript

The function below capitalizes the first character of a given string while correctly handling characters outside the BMP, which are stored as surrogate pairs.

Capitalize the first character of a given text string using JavaScript
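function firstCharacterToUpperCase(text) {
  // Defensive check for an engine that stores this emoji as one unit.
  // ECMAScript strings are UTF-16 code units, so in conforming engines the
  // length is 2 and this branch never runs.
  if ("\uD83D\uDE00".length === 1) {
    return text.charAt(0).toLocaleUpperCase() + text.substring(1);
  }

  // An empty string has no first code point: codePointAt(0) returns undefined
  // and String.fromCodePoint(undefined) would throw a RangeError.
  if (text.length === 0) {
    return text;
  }

  // Read the whole first code point, even if it is stored as a surrogate pair.
  const firstChar = text.codePointAt(0);
  // Code points above U+FFFF occupy two UTF-16 code units; the rest occupy one.
  const index = firstChar > 0xFFFF ? 2 : 1;
  return String.fromCodePoint(firstChar).toLocaleUpperCase() + text.substring(index);
}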

Let’s break down the code:

  • The function takes a text parameter which represents the input string to be processed.
  • The first if statement checks whether the string "\uD83D\uDE00" (the 😀 emoji written as a surrogate pair) has a length of 1, as a way of detecting an engine that does not store strings as UTF-16 code units. If so, it capitalizes the first character using text.charAt(0).toLocaleUpperCase() and concatenates it with the rest of the string from index 1 using text.substring(1). In practice, the ECMAScript specification defines strings as sequences of UTF-16 code units, so this length is always 2 and the branch never runs.
  • If the above condition is not met, the function then determines whether the code point for the first character is greater than 0xFFFF (which indicates a surrogate pair) and adjusts the index for the substring method accordingly.
  • Finally, it capitalizes the first character using String.fromCodePoint(firstChar).toLocaleUpperCase() and concatenates it with the rest of the string starting from either index 1 or 2 based on whether it’s a surrogate pair.
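The toLocaleUpperCase() method is used so that the conversion can respect locale-specific case mappings, for example in Turkish, where lowercase i uppercases to the dotted İ rather than I. Called without arguments, it uses the host environment's default locale; to apply a specific language's rules, pass a locale tag, as in toLocaleUpperCase("tr").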
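As a sketch, assuming you want the caller to choose the locale, the function could accept an optional locale argument and forward it to toLocaleUpperCase. The name capitalizeFirst and its locale parameter are hypothetical. The Turkish result assumes an engine with full Intl locale data, which current browsers and standard Node.js builds include.

```javascript
// Hypothetical variant of firstCharacterToUpperCase with an optional locale
// (a BCP 47 tag such as "tr"). When locale is undefined, toLocaleUpperCase
// falls back to the host's default locale.
function capitalizeFirst(text, locale) {
  if (text.length === 0) {
    return text; // codePointAt(0) would return undefined for ""
  }
  const firstCodePoint = text.codePointAt(0);
  const width = firstCodePoint > 0xffff ? 2 : 1; // surrogate pair or single unit
  return (
    String.fromCodePoint(firstCodePoint).toLocaleUpperCase(locale) +
    text.substring(width)
  );
}

console.log(capitalizeFirst("istanbul", "tr")); // İstanbul (U+0130, dotted capital I)
console.log(capitalizeFirst("istanbul", "en")); // Istanbul
```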

The code could be optimized further, at some cost to readability, by wrapping it in an immediately invoked function expression (IIFE) so that the length check runs only once instead of on every call. Because that check is always false in conforming engines, the branch could also simply be removed.
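A minimal sketch of the IIFE idea, not the author's own optimized version: the length check is evaluated once when the script loads, and the IIFE returns whichever implementation fits. The empty-string guard in the second implementation is an addition made here because String.fromCodePoint(undefined) throws a RangeError.

```javascript
// Sketch only: the IIFE runs the length check once and returns the matching
// implementation, so individual calls skip the check.
const firstCharacterToUpperCase = (function () {
  // Always false in conforming engines, where strings are UTF-16 code units.
  const singleUnitAstral = "\uD83D\uDE00".length === 1;

  if (singleUnitAstral) {
    return function (text) {
      return text.charAt(0).toLocaleUpperCase() + text.substring(1);
    };
  }

  return function (text) {
    if (text.length === 0) {
      return text; // avoid String.fromCodePoint(undefined), which throws
    }
    const firstChar = text.codePointAt(0);
    const index = firstChar > 0xffff ? 2 : 1;
    return String.fromCodePoint(firstChar).toLocaleUpperCase() + text.substring(index);
  };
})();

console.log(firstCharacterToUpperCase("hello")); // Hello
```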
