HTML Encoding (Character Sets)
29 March 2025 | Category: HTML
HTML encoding, also known as character encoding, specifies how the text, numbers, symbols, and other characters in a web page are stored as bytes and turned back into readable text by the browser.
1. What is Character Encoding?
A character set assigns a unique number (code point) to every character it contains, and a character encoding defines how those numbers are stored as bytes, enabling computers to represent and manipulate text. Commonly used character encodings include UTF-8, ASCII, and ISO-8859-1.
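The difference between a code point and its encoded bytes can be seen in a few lines of Python. This is a minimal sketch, not part of the original article; the variable names are illustrative only.

```python
# One character, one code point, but different bytes per encoding.
ch = "\u00e9"  # é (LATIN SMALL LETTER E WITH ACUTE)

code_point = ord(ch)                     # the character's number in Unicode
utf8_bytes = ch.encode("utf-8")          # two bytes in UTF-8
latin1_bytes = ch.encode("iso-8859-1")   # one byte in ISO-8859-1

print(f"U+{code_point:04X}")    # U+00E9
print(utf8_bytes.hex(" "))      # c3 a9
print(latin1_bytes.hex(" "))    # e9
```

The same character has the same code point everywhere, but the bytes on disk depend on which encoding was used to save it.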
2. Why is Character Encoding Important?
- Ensures consistent display of text across different devices and platforms.
- Avoids issues with special characters (e.g., accented letters, symbols).
- Prevents errors like “�” or “???”, often caused by encoding mismatches.
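As a rough illustration of where the “�” and “?” symbols come from, the Python sketch below decodes bytes with the wrong encoding and encodes text into a character set that lacks the character. The sample word is an assumption chosen for the demonstration.

```python
# "café" saved as ISO-8859-1: valid Latin-1, but not valid UTF-8.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")   # b'caf\xe9'

# Decoding with the wrong encoding: the invalid byte becomes U+FFFD (�).
wrong = latin1_bytes.decode("utf-8", errors="replace")
print(wrong)

# Encoding into a character set that lacks é: the character becomes "?".
lossy = "caf\u00e9".encode("ascii", errors="replace")
print(lossy)
```

In both cases the original character is lost, which is why a mismatch is easier to prevent than to repair.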
3. Declaring Character Encoding in HTML
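To specify the character encoding of an HTML document, use the <meta> tag with a charset attribute inside the <head> element. Place it as early as possible, because browsers look for the declaration only within the first 1024 bytes of the file.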
Example:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Character Encoding Example</title>
</head>
<body>
  <p>Welcome to HTML Encoding! 😊</p>
</body>
</html>
Explanation:
- <meta charset="UTF-8">: Specifies UTF-8 encoding, which can represent every Unicode character and is the standard encoding for modern web pages.
4. Common Character Encodings
| Encoding | Description | 
|---|---|
| UTF-8 | Variable-width Unicode encoding (1 to 4 bytes per character) that covers every Unicode character and is backward compatible with ASCII. | 
| ASCII | Encodes 128 characters, primarily for English letters, numbers, and symbols. | 
| ISO-8859-1 | Western European encoding (Latin-1), now largely replaced by UTF-8. | 
| UTF-16 | Unicode encoding that uses 2 or 4 bytes per character; used internally by Windows, Java, and JavaScript. | 
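To compare these encodings side by side, the sketch below measures how many bytes each one needs for a few sample characters. The helper `encoded_size` is a hypothetical name introduced here, and `utf-16-le` is used instead of `utf-16` so that Python's byte order mark is not counted.

```python
samples = ["A", "\u00e9", "\u4f60", "\U0001F60A"]  # A, é, 你, 😊
encodings = ["ascii", "iso-8859-1", "utf-8", "utf-16-le"]

def encoded_size(text, encoding):
    """Return the byte length of text in encoding, or None if it cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

sizes = {ch: {enc: encoded_size(ch, enc) for enc in encodings} for ch in samples}
for ch, row in sizes.items():
    print(f"U+{ord(ch):04X}", row)
```

A result of None means the encoding has no representation for that character, which is the case for ASCII and ISO-8859-1 with Chinese text or emojis.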
5. HTML Entities for Special Characters
Characters that have a special meaning in HTML markup (such as <, >, and &), or that are hard to type, can be written as HTML entities so that they are rendered correctly regardless of the encoding.
Common HTML Entities:
| Character | Entity Name | Entity Number | Description | 
|---|---|---|---|
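| & | &amp;amp; | &amp;#38; | Ampersand | 
| < | &amp;lt; | &amp;#60; | Less Than | 
| > | &amp;gt; | &amp;#62; | Greater Than | 
| " | &amp;quot; | &amp;#34; | Double Quote | 
| ' | &amp;apos; | &amp;#39; | Apostrophe | 
| © | &amp;copy; | &amp;#169; | Copyright Symbol | 
| ® | &amp;reg; | &amp;#174; | Registered Trademark | 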
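Entities rarely need to be typed by hand when text is generated by a program. As a hedged example, Python's standard `html` module can escape the markup-significant characters and decode named or numeric entities; the sample strings are assumptions made for illustration.

```python
import html

raw = 'Tom & Jerry <b>"quoted"</b>'

# html.escape replaces &, <, > and (by default) quotes with entities.
escaped = html.escape(raw)
print(escaped)   # Tom &amp; Jerry &lt;b&gt;&quot;quoted&quot;&lt;/b&gt;

# html.unescape converts named and numeric entities back to characters.
decoded = html.unescape("&copy; &#169; &reg; &#174;")
print(decoded)   # © © ® ®
```

Named and numeric entities decode to the same character, so either form can be used in a page.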
6. How UTF-8 Works
UTF-8 uses 1 to 4 bytes to represent characters:
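- 1 byte for standard ASCII characters (English letters, digits, basic symbols).
- 2 to 4 bytes for non-ASCII characters: 2 bytes for accented Latin letters, Greek, and Arabic; 3 bytes for most Chinese, Japanese, and Korean characters; 4 bytes for most emojis.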
Example of UTF-8 Encoding:
| Character | Unicode Code Point | UTF-8 Encoding | 
|---|---|---|
| A | U+0041 | 01000001 | 
| Ω | U+03A9 | 11001110 10101001 | 
| 😊 | U+1F60A | 11110000 10011111 10011000 10001010 | 
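The bit patterns for any character can be reproduced with a short Python sketch. The helper name `utf8_bits` is hypothetical and exists only for this example.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of ch as space-separated 8-bit binary strings."""
    return " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))

for ch in ["A", "\u03a9", "\U0001F60A"]:  # A, Ω, 😊
    print(f"U+{ord(ch):04X}", utf8_bits(ch))
```

The leading bits of the first byte show the sequence length: 0 for a single byte, 110 for two bytes, 1110 for three, and 11110 for four. Every continuation byte starts with 10.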
7. HTML Encoding Problems and Fixes
Common Problems
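- Garbled Text: Characters like Ã© appear instead of é, which typically happens when a UTF-8 file is read as ISO-8859-1 or Windows-1252.
- Mismatched Encoding: The browser interprets the page in a different encoding from the one the file was saved in.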
Solutions
- Specify UTF-8 Encoding: Add the following line in the <head>: <meta charset="UTF-8">
- Save Files in UTF-8 Format: Ensure your HTML file is saved in UTF-8 encoding using your text editor.
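The garbled-text problem can be reproduced, and in simple cases reversed, with the Python sketch below. It assumes the text was UTF-8 that was misread as ISO-8859-1 and that no bytes were lost along the way.

```python
original = "\u00e9"  # é

# UTF-8 bytes (C3 A9) misread as ISO-8859-1 become two separate characters.
garbled = original.encode("utf-8").decode("iso-8859-1")
print(garbled)    # Ã©

# Reversing the two steps recovers the text, provided nothing was lost.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)   # é
```

This kind of repair is a last resort. Saving the file as UTF-8 and declaring UTF-8 in the page avoids the problem in the first place.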
8. Testing Character Encoding
You can test how your HTML page handles different characters by including text in various languages or symbols.
Example:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Encoding Test</title>
</head>
<body>
  <p>English: Hello, World!</p>
  <p>Greek: Καλημέρα!</p>
  <p>Chinese: 你好!</p>
  <p>Emoji: 😃🌟❤️</p>
</body>
</html>
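Besides viewing the page in a browser, the raw bytes of a file can be checked directly. The following is a minimal sketch under stated assumptions: `check_encoding` is a hypothetical helper, and its regular expression only recognises the simple <meta charset="UTF-8"> form used in this article.

```python
import re

def check_encoding(data: bytes) -> dict:
    """Report whether raw HTML bytes are valid UTF-8 and declare UTF-8 early."""
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    # Only the first 1024 bytes are searched, mirroring the browser prescan.
    head = data[:1024].decode("ascii", errors="ignore").lower()
    declared = re.search(r'<meta\s+charset\s*=\s*["\']?utf-8', head) is not None
    return {"valid_utf8": valid_utf8, "declares_utf8": declared}

page = '<!DOCTYPE html><html><head><meta charset="UTF-8"></head><body><p>Caf\u00e9</p></body></html>'

good = check_encoding(page.encode("utf-8"))
bad = check_encoding(page.encode("iso-8859-1"))  # declared UTF-8, saved as Latin-1
print(good)
print(bad)
```

In a real check, the bytes would come from reading the file in binary mode, for example with open(path, "rb"). A file that declares UTF-8 but fails to decode as UTF-8 was saved in a different encoding.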
9. Summary
- Always use UTF-8 encoding for modern web pages.
- Declare encoding in the <meta> tag for consistent results.
- Use HTML entities for special characters when needed.
- Save your files in the correct encoding format to avoid rendering issues.
Quick Example:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>HTML Encoding</title>
</head>
<body>
  <p>HTML Encoding supports special symbols like © and €, and languages like 中文 or Español.</p>
</body>
</html>
