Get and capitalize the first letter of a string in JavaScript
Learn how to capitalize the first character of a string in JavaScript correctly, handling Unicode characters.
How to capitalize the first letter of a string in JavaScript?
To capitalize the first character of a string in JavaScript while properly handling Unicode characters, you can use the String.prototype.charAt()
, String.fromCodePoint()
and String.prototype.substring()
methods. This ensures that the first character is transformed to uppercase while the rest of the string remains unchanged, accommodating various Unicode character sets effectively, specifically surrogate pairs.
Surrogate pairs are crucial in Unicode because they allow the encoding of characters that fall outside the Basic Multilingual Plane (BMP), which includes code points above U+FFFF
. In UTF-16 encoding, these characters are represented using two 16-bit code units: a high-surrogate and a low-surrogate. This mechanism is essential for supporting a wide range of characters, including many emojis and rare scripts, ensuring that applications can display and manipulate all Unicode characters effectively.
Without surrogate pairs, many characters would be inaccessible, limiting the representation of global languages and symbols in digital formats. For example, the character for the “grinning face with smiling eyes” emoji (😄) has a code point of U+1F604
. In UTF-16, this is represented as the surrogate pair 0xD83D
and 0xDE04
.
How are strings represented in JavaScript?
In JavaScript, strings are represented as sequences of UTF-16 code units. UTF-16 is a character encoding capable of encoding all 1,112,064 valid code points in Unicode using one or two 16-bit code units. The first 65,536 code points, known as the Basic Multilingual Plane (BMP), can be represented by a single 16-bit code unit. Characters outside the BMP, known as astral characters, must be represented using two 16-bit code units, a high-surrogate code unit and a low-surrogate code unit. This pair of code units is known as a surrogate pair.
- High-surrogate code units have values between
0xD800
and0xDBFF
. - Low-surrogate code units have values between
0xDC00
and0xDFFF
.
When dealing with strings that contain astral characters, it’s crucial to handle surrogate pairs correctly.
Getting and capitalizing the first character of a string in JavaScript
The below function capitalize the first character of a given text string while handling special Unicode characters correctly.
Let’s break down the code:
- The function takes a
text
parameter, which represents the input string to be processed. - The first
if
statement checks if the length of the Unicode character\uD83D\uDE00
is equal to1
. This is used to determine if the browser supports UTF-32. If this condition is met, it capitalizes the first character usingtext.charAt(0).toLocaleUpperCase()
and concatenates it with the rest of the string starting from index1
usingtext.substring(1)
. - If the above condition is not met, the function then determines whether the code point for the first character is greater than 0xFFFF (which indicates a surrogate pair) and adjusts the index for the
substring
method accordingly. - Finally, it capitalizes the first character using
String.fromCodePoint(firstChar).toLocaleUpperCase()
and concatenates it with the rest of the string starting from either index 1 or 2 based on whether it’s a surrogate pair.
The code could be more optimized, if needed, but less readable, by wrapping in an immediately invoked function expression (IIFE) and testing for UTF-32 support only once. The optimized version of getting and capitalizing the first letter code is included in the CodePen below.
You can also play and use The Surrogate Pair Calculator
.
Comments