HTML Encoding (Character Sets)
29 March 2025 | Category: HTML
HTML encoding, also known as character encoding, specifies how the characters in a web page are represented as bytes. The encoding determines how text, digits, symbols, and other characters are stored and displayed.
1. What is Character Encoding?
A character set assigns a unique number (code point) to every character it contains, and a character encoding defines how those numbers are stored as bytes so that computers can represent and manipulate text. Commonly used character encodings include UTF-8, ASCII, and ISO-8859-1.
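The difference between a code point and its stored bytes can be seen with a short Python sketch. Python is used here only as a convenient way to inspect bytes; the sample characters and the helper names `code_point_label` and `hex_bytes` are illustrative choices, not part of any HTML API.

```python
def code_point_label(ch: str) -> str:
    """Return the Unicode code point of a single character, e.g. 'U+0041'."""
    return f"U+{ord(ch):04X}"


def hex_bytes(ch: str, encoding: str = "utf-8") -> str:
    """Return the bytes used to store `ch` in `encoding`, as spaced hex."""
    return " ".join(f"{b:02X}" for b in ch.encode(encoding))


# The code point is the same everywhere; the stored bytes depend on the encoding.
for ch in ["A", "é", "€"]:
    print(ch, code_point_label(ch), "UTF-8 bytes:", hex_bytes(ch))
```

The same character "é" is one byte (E9) in ISO-8859-1 but two bytes (C3 A9) in UTF-8, which is why the browser must know which encoding was used.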
2. Why is Character Encoding Important?
- Ensures consistent display of text across different devices and platforms.
- Avoids issues with special characters (e.g., accented letters, symbols).
- Prevents errors like “�” or “???”, often caused by encoding mismatches.
3. Declaring Character Encoding in HTML
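To specify the character encoding of an HTML document, use the <meta> tag inside the <head> element. Place it as early as possible: the HTML standard requires the declaration to fall within the first 1024 bytes of the document.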
Example:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Character Encoding Example</title>
</head>
<body>
<p>Welcome to HTML Encoding! 😊</p>
</body>
</html>
Explanation:
<meta charset="UTF-8">: Specifies UTF-8 encoding, which can represent every Unicode character and is the standard encoding for modern web pages.
4. Common Character Encodings
Encoding | Description |
---|---|
UTF-8 | Variable-length Unicode encoding (1 to 4 bytes per character) that can represent every Unicode character. |
ASCII | Encodes 128 characters, primarily for English letters, numbers, and symbols. |
ISO-8859-1 | Western European encoding (Latin-1), now largely replaced by UTF-8. |
UTF-16 | Unicode encoding that uses 2 or 4 bytes per character; used internally by Windows, Java, and JavaScript. |
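To compare these encodings in practice, the following Python sketch counts how many bytes a string needs in each one and marks the encodings that cannot represent it at all. The function name `encoded_sizes` and the sample strings are assumptions made for illustration.

```python
def encoded_sizes(text, encodings=("ascii", "iso-8859-1", "utf-8", "utf-16-le")):
    """Map each encoding name to the byte length of `text`, or None if unsupported."""
    sizes = {}
    for name in encodings:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            # The encoding has no representation for at least one character.
            sizes[name] = None
    return sizes


print(encoded_sizes("Hello"))    # plain ASCII text fits in every encoding
print(encoded_sizes("Café"))     # "é" is outside ASCII
print(encoded_sizes("Café €5"))  # "€" is outside ISO-8859-1 as well
```

Only the Unicode encodings (UTF-8 and UTF-16) can store all three strings, and UTF-8 stays as compact as ASCII for plain English text.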
5. HTML Entities for Special Characters
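Some characters, such as <, >, and &, have a special meaning in HTML and must be written as entities to appear as literal text. Entities also let you write symbols such as © using only ASCII characters, so they render correctly regardless of the document's encoding.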
Common HTML Entities:
Character | Entity Name | Entity Number | Description |
---|---|---|---|
& | &amp;amp; | &amp;#38; | Ampersand |
< | &amp;lt; | &amp;#60; | Less Than |
> | &amp;gt; | &amp;#62; | Greater Than |
" | &amp;quot; | &amp;#34; | Double Quote |
' | &amp;apos; | &amp;#39; | Apostrophe |
© | &amp;copy; | &amp;#169; | Copyright Symbol |
® | &amp;reg; | &amp;#174; | Registered Trademark |
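When HTML is generated by a program, entity escaping is usually done by a library rather than by hand. As a minimal sketch, Python's standard `html` module provides `html.escape` and `html.unescape`; the wrapper name `to_safe_html` is an assumption made for this example.

```python
import html


def to_safe_html(text: str) -> str:
    """Escape &, <, > and quotes so `text` can be placed in HTML as literal text."""
    # quote=True also escapes " and ' so the result is safe in attribute values.
    return html.escape(text, quote=True)


print(to_safe_html('5 < 6 & "ok"'))

# Going the other way: named and numeric entities decode to the same character.
print(html.unescape("&copy; &#169; &reg; &#174;"))
```

Note that `html.escape` writes the apostrophe as the hexadecimal numeric entity `&#x27;`, which is equivalent to the decimal form `&#39;`.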
6. How UTF-8 Works
UTF-8 uses 1 to 4 bytes to represent characters:
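- 1 byte for standard ASCII characters (English letters, digits, basic symbols).
- 2 bytes for most other alphabetic scripts (e.g., accented Latin letters, Greek, Cyrillic, Arabic, Hebrew).
- 3 bytes for the rest of the Basic Multilingual Plane (e.g., most Chinese, Japanese, and Korean characters, and symbols such as €).
- 4 bytes for characters beyond U+FFFF (e.g., most emojis).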
Example of UTF-8 Encoding:
Character | Unicode Code Point | UTF-8 Encoding |
---|---|---|
A | U+0041 | 01000001 |
Ω | U+03A9 | 11001110 10101001 |
😊 | U+1F60A | 11110000 10011111 10011000 10001010 |
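The byte layout can be worked through by hand. The Python sketch below packs a code point into the 1 to 4 byte UTF-8 patterns (0xxxxxxx, 110xxxxx 10xxxxxx, and so on) and prints the result as bits. The function names `utf8_encode` and `as_bits` are illustrative; in real code you would simply call `str.encode("utf-8")`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point <= 0x7F:        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def as_bits(data: bytes) -> str:
    """Format bytes as space-separated 8-bit groups."""
    return " ".join(f"{b:08b}" for b in data)


for ch in ["A", "Ω", "😊"]:
    print(ch, f"U+{ord(ch):04X}", as_bits(utf8_encode(ord(ch))))
```

The leading bits of the first byte tell a decoder how many bytes the character occupies, and every continuation byte starts with 10.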
7. HTML Encoding Problems and Fixes
Common Problems
- Garbled Text: Characters like Ã© appear instead of é.
- Mismatched Encoding: The browser interprets the page in a different encoding than the one it was saved in.
Solutions
- Specify UTF-8 Encoding: Add the following line in the <head>: <meta charset="UTF-8">
- Save Files in UTF-8 Format: Ensure your HTML file is saved in UTF-8 encoding using your text editor.
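The garbling described above can be reproduced, and often reversed, in a few lines. This Python sketch assumes the wrong encoding was Windows-1252 (Python codec `cp1252`), which is a common case but not the only one. The helper `save_as_utf8` is a hypothetical name showing one way to write a file explicitly as UTF-8.

```python
from pathlib import Path

original = "café"

# UTF-8 bytes read as Windows-1252: each byte of "é" becomes its own character.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)

# Reversing the wrong step recovers the text, as long as no bytes were lost.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)

# The opposite mismatch: ISO-8859-1 bytes read as UTF-8 are invalid,
# so the decoder substitutes the replacement character U+FFFD.
replaced = original.encode("iso-8859-1").decode("utf-8", errors="replace")
print(replaced)


def save_as_utf8(path: str, text: str) -> None:
    """Write `text` to `path` as UTF-8, independent of the platform default."""
    Path(path).write_text(text, encoding="utf-8")
```

Passing `encoding="utf-8"` explicitly matters because the default file encoding can differ between operating systems and Python versions.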
8. Testing Character Encoding
You can test how your HTML page handles different characters by including text in various languages or symbols.
Example:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Encoding Test</title>
</head>
<body>
<p>English: Hello, World!</p>
<p>Greek: Καλημέρα!</p>
<p>Chinese: 你好!</p>
<p>Emoji: 😃🌟❤️</p>
</body>
</html>
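Besides viewing the page in a browser, you can check the raw bytes of the file. The Python sketch below looks for a `<meta charset>` declaration within the first 1024 bytes and tests whether the whole file decodes as UTF-8. The function name `check_html_encoding` and the simple regular expression are assumptions for illustration; this is not a full HTML parser and does not follow the browser's complete encoding detection rules.

```python
import re

# Matches <meta charset="..."> with optional quotes; deliberately simplistic.
META_CHARSET = re.compile(rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)',
                          re.IGNORECASE)


def check_html_encoding(raw: bytes) -> dict:
    """Report the declared charset (if any) and whether `raw` is valid UTF-8."""
    match = META_CHARSET.search(raw[:1024])  # declaration must appear early
    declared = match.group(1).decode("ascii").lower() if match else None
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"declared": declared, "valid_utf8": valid_utf8}


page = ('<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8">'
        '<title>Encoding Test</title></head>'
        '<body><p>Καλημέρα! 你好! 😃</p></body></html>')
print(check_html_encoding(page.encode("utf-8")))
```

A result where the declared charset is utf-8 but `valid_utf8` is false indicates that the file was saved in a different encoding than the one it declares.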
9. Summary
- Always use UTF-8 encoding for modern web pages.
- Declare encoding in the <meta> tag for consistent results.
- Use HTML entities for special characters when needed.
- Save your files in the correct encoding format to avoid rendering issues.
Quick Example:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>HTML Encoding</title>
</head>
<body>
<p>HTML Encoding supports special symbols like © and €, and languages like 中文 or Español.</p>
</body>
</html>