
Capitalize the first letter in JavaScript
Learn how to capitalize the first character of a string in JavaScript correctly, handling Unicode characters.
How to capitalize the first letter of a string in JavaScript?
To capitalize the first character of a string in JavaScript while properly handling Unicode characters, you can use the String.prototype.charAt()
, String.fromCodePoint()
and String.prototype.substring()
methods. This ensures that the first character is transformed to uppercase while the rest of the string remains unchanged, accommodating various Unicode character sets effectively, specifically surrogate pairs.
Surrogate pairs are crucial in Unicode because they allow the encoding of characters that fall outside the Basic Multilingual Plane (BMP), which includes code points above U+FFFF
. In UTF-16 encoding, these characters are represented using two 16-bit code units: a high-surrogate and a low-surrogate. This mechanism is essential for supporting a wide range of characters, including many emojis and rare scripts, ensuring that applications can display and manipulate all Unicode characters effectively.
Without surrogate pairs, many characters would be inaccessible, limiting the representation of global languages and symbols in digital formats. For example, the character for the “grinning face with smiling eyes” emoji (😄) has a code point of U+1F604
. In UTF-16, this is represented as the surrogate pair 0xD83D
and 0xDE04
.
How are strings represented in JavaScript?
In JavaScript, strings are represented as sequences of UTF-16 code units. UTF-16 is a character encoding capable of encoding all 1,112,064 valid code points in Unicode using one or two 16-bit code units. The first 65,536 code points, known as the Basic Multilingual Plane (BMP), can be represented by a single 16-bit code unit. Characters outside the BMP, known as astral characters, must be represented using two 16-bit code units, a high-surrogate code unit and a low-surrogate code unit. This pair of code units is known as a surrogate pair.
- High-surrogate code units have values between
0xD800
and0xDBFF
. - Low-surrogate code units have values between
0xDC00
and0xDFFF
.
When dealing with strings that contain astral characters, it’s crucial to handle surrogate pairs correctly.
Getting and capitalizing the first character of a string in JavaScript
The below function capitalize the first character of a given text string while handling special Unicode characters correctly.
function firstCharacterToUpperCase(text) {
// Use if ("\uD83D\uDE00".length === 1) when your requirement is to work without native support for String.prototype.isWellFormed
if (text.length === 0 || text.isWellFormed() === false) {
return text.charAt(0).toLocaleUpperCase() + text.substring(1);
}
const firstChar = text.codePointAt(0);
if (firstChar === undefined) {
return text;
}
const index = firstChar > 0xFFFF ? 2 : 1;
return String.fromCodePoint(firstChar).toLocaleUpperCase() + text.substring(index);
}
Key features
- Unicode support:
- Properly handles characters beyond the Basic Multilingual Plane.
- Correctly manages surrogate pairs. Using
isWellFormed()
method to verify proper Unicode composition. - Preserves Unicode integrity during transformation.
- Error handling:
- Gracefully handles empty strings.
- Manages malformed Unicode sequences.
- Provides fallback for unsupported scenarios.
The code could be more optimized, if needed, but less readable, by wrapping in an immediately invoked function expression (IIFE) and testing for UTF-32 support only once. The optimized version of getting and capitalizing the first letter code is included in the CodePen below.
You can also play and use The Surrogate Pair Calculator
.
Comments