HTML Encoding (Character Sets)

29 March 2025 | Category: HTML

HTML encoding, also known as character encoding, specifies how the characters in a web page are represented as bytes. The encoding determines how letters, digits, symbols, and other characters are stored in the file and interpreted by the browser.

1. What is Character Encoding?

Character encoding builds on a character set, which assigns a unique number (code point) to every character it contains. The encoding then defines how those code points are stored as bytes, enabling computers to represent and manipulate text. Commonly used character encodings include UTF-8, ASCII, and ISO-8859-1.
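
To make the difference between a code point and its stored bytes concrete, here is a minimal Python sketch. The character "é" is just an illustrative choice; any non-ASCII character would show the same effect.

```python
# Code point: the number the character set (here Unicode) assigns to a character.
code_point = ord("é")                  # 233, usually written U+00E9
same_char = chr(0xE9)                  # from code point back to the character

# Encoding: the rule that turns that code point into bytes.
as_utf8 = "é".encode("utf-8")          # two bytes: C3 A9
as_latin1 = "é".encode("iso-8859-1")   # one byte:  E9

print(f"U+{code_point:04X}", as_utf8.hex(), as_latin1.hex())
```

The same character has one code point but different byte sequences depending on the encoding, which is why the browser needs to be told which encoding a page uses.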

2. Why is Character Encoding Important?

Ensures consistent display of text across different devices and platforms.
Avoids issues with special characters (e.g., accented letters, symbols).
Prevents errors like “�” or “???”, often caused by encoding mismatches.
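
The mismatch errors listed above can be reproduced outside the browser. The following Python sketch (the sample word "Café" is an arbitrary choice) decodes bytes with the wrong encoding to show where garbled text, "�", and "?" typically come from.

```python
text = "Café"
utf8_bytes = text.encode("utf-8")            # b'Caf\xc3\xa9'
latin1_bytes = text.encode("iso-8859-1")     # b'Caf\xe9'

# UTF-8 bytes read as ISO-8859-1: each byte becomes its own character.
mojibake = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 bytes read as UTF-8: the lone byte E9 is invalid, so a
# lenient decoder substitutes U+FFFD, the replacement character.
replaced = latin1_bytes.decode("utf-8", errors="replace")

# Encoding into a character set that lacks the character: "?" is substituted.
question = text.encode("ascii", errors="replace")

print(mojibake, replaced, question)
```

Browsers behave in a comparable way when the declared encoding and the actual file encoding disagree, although the exact fallback rules are browser-specific.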

3. Declaring Character Encoding in HTML

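To specify the character encoding of an HTML document, use a <meta charset> tag inside the <head> element. Place it as early as possible: the HTML standard requires the declaration to appear within the first 1024 bytes of the document.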

Example:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Character Encoding Example</title>
</head>
<body>
  <p>Welcome to HTML Encoding! 😊</p>
</body>
</html>

Explanation:

<meta charset="UTF-8">: Specifies UTF-8 encoding, which can represent every Unicode character and is the standard encoding for modern web pages.

4. Common Character Encodings

Encoding	Description
UTF-8	Variable-length Unicode encoding (1 to 4 bytes per character) that can represent every Unicode character.
ASCII	Encodes 128 characters, primarily for English letters, numbers, and symbols.
ISO-8859-1	Western European encoding (Latin-1), now largely replaced by UTF-8.
UTF-16	Variable-length Unicode encoding that uses 2 or 4 bytes per character; used internally by Windows, Java, and JavaScript.
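
As a rough comparison of these encodings, the Python sketch below measures how many bytes a few sample strings need in each one. The sample strings and the helper name `encoded_size` are illustrative assumptions; "utf-16-le" is used so that no byte order mark is counted.

```python
samples = ["Hello", "Olá", "你好", "😊"]
encodings = ["ascii", "iso-8859-1", "utf-8", "utf-16-le"]

def encoded_size(text, encoding):
    """Return the byte length of text in encoding, or None if it cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

sizes = {s: {e: encoded_size(s, e) for e in encodings} for s in samples}
for sample, row in sizes.items():
    print(sample, row)
```

ASCII and ISO-8859-1 return None for text outside their repertoire, while both Unicode encodings handle every sample; UTF-8 is more compact for English text and UTF-16 for the Chinese sample.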

5. HTML Entities for Special Characters

For some special characters, you can use HTML entities to ensure they are rendered correctly, regardless of the encoding.

Common HTML Entities:

Character	Entity Name	Entity Number	Description
`&`	`&amp;`	`&#38;`	Ampersand
`<`	`&lt;`	`&#60;`	Less Than
`>`	`&gt;`	`&#62;`	Greater Than
`"`	`&quot;`	`&#34;`	Double Quote
`'`	`&apos;`	`&#39;`	Apostrophe
`©`	`&copy;`	`&#169;`	Copyright Symbol
`®`	`&reg;`	`&#174;`	Registered Trademark
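
When HTML is generated by a program, entities are usually produced by a library rather than typed by hand. The sketch below uses Python's standard `html` module as one possible approach; the sample strings are made up for illustration.

```python
import html

# Escape characters that have special meaning in HTML markup.
user_text = 'Tom & Jerry <b>"quoted"</b>'
escaped = html.escape(user_text)

# Convert named and numeric entities back into characters.
decoded = html.unescape("&copy; 2025 &amp; &#174; &lt;tag&gt;")

# Turn non-ASCII characters into numeric entities for ASCII-only output.
numeric = "© ®".encode("ascii", errors="xmlcharrefreplace")

print(escaped)
print(decoded)
print(numeric)
```

Escaping `&`, `<`, `>`, and quotes matters in any encoding because those characters are part of HTML syntax; entities for symbols such as © are optional once the page is served as UTF-8.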

6. How UTF-8 Works

UTF-8 uses 1 to 4 bytes to represent characters:

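1 byte for standard ASCII characters (English letters, digits, basic symbols).
2 to 4 bytes for non-ASCII characters: 2 for most accented Latin, Greek, Cyrillic, Arabic, and Hebrew letters; 3 for most Chinese, Japanese, and Korean characters; 4 for most emoji and other characters beyond U+FFFF.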

Example of UTF-8 Encoding:

Character	Unicode Code Point	UTF-8 Encoding
`A`	U+0041	01000001
`Ω`	U+03A9	11001110 10101001
`😊`	U+1F60A	11110000 10011111 10011000 10001010
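
The bit layout behind these byte sequences can be sketched in a few lines of Python. The functions `utf8_encode` and `bits` are hypothetical helpers written for illustration; real code should simply call `str.encode("utf-8")`, which also rejects invalid code points such as surrogates.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point to UTF-8 by hand (no validation)."""
    if code_point < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

def bits(data: bytes) -> str:
    """Format bytes as space-separated 8-bit binary groups."""
    return " ".join(f"{b:08b}" for b in data)

for ch in "AΩ😊":
    print(ch, f"U+{ord(ch):04X}", bits(utf8_encode(ord(ch))))
```

The leading bits of the first byte (0, 110, 1110, or 11110) tell a decoder how many bytes the character occupies, and every continuation byte starts with 10.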

7. HTML Encoding Problems and Fixes

Common Problems

Mismatched Encoding: The browser interprets the page in a different encoding from the one the file was saved in, so non-ASCII characters appear garbled.

Solutions

Specify UTF-8 Encoding: Add the following line in the <head>: <meta charset="UTF-8">
Save Files in UTF-8 Format: Ensure your HTML file is saved in UTF-8 encoding using your text editor.
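
If an existing file was saved in a legacy encoding, it can also be converted with a short script instead of a text editor. The sketch below is one possible approach and rests on assumptions: the helper name `to_utf8` is hypothetical, the legacy encoding is assumed to be ISO-8859-1 unless stated otherwise, and the file name in the usage comment is only an example.

```python
def to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Return raw re-encoded as UTF-8; bytes that already decode as UTF-8 are kept."""
    try:
        raw.decode("utf-8")      # already valid UTF-8: nothing to do
        return raw
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in source_encoding.
        return raw.decode(source_encoding).encode("utf-8")

# Usage sketch (the path "index.html" is hypothetical):
# with open("index.html", "rb") as f:
#     data = f.read()
# with open("index.html", "wb") as f:
#     f.write(to_utf8(data))

print(to_utf8("Café".encode("iso-8859-1")))
```

The validity check is a heuristic: a legacy file can occasionally happen to be valid UTF-8 as well, so it is worth viewing the converted page before replacing the original.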

8. Testing Character Encoding

You can test how your HTML page handles different characters by including text in various languages or symbols.

Example:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Encoding Test</title>
</head>
<body>
  <p>English: Hello, World!</p>
  <p>Greek: Καλημέρα!</p>
  <p>Chinese: 你好!</p>
  <p>Emoji: 😃🌟❤️</p>
</body>
</html>

9. Summary

Always use UTF-8 encoding for modern web pages.
Declare encoding in the <meta> tag for consistent results.
Use HTML entities for special characters when needed.
Save your files in the correct encoding format to avoid rendering issues.

Quick Example:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>HTML Encoding</title>
</head>
<body>
  <p>HTML Encoding supports special symbols like &copy; and &euro;, and languages like 中文 or Español.</p>
</body>
</html>