HTML Encoding (Character Sets)

29 March 2025

HTML encoding, more precisely the character encoding of an HTML document, determines how the bytes of the file are turned into the characters shown on the page. A character set defines which characters exist and assigns each one a number; an encoding defines how those numbers are stored as bytes.


1. What is Character Encoding?

A character set such as Unicode assigns a unique number (a code point) to every character. A character encoding then maps each code point to a sequence of bytes, enabling computers to store, transmit, and display text. Commonly used character encodings include UTF-8, ASCII, and ISO-8859-1.
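
The difference between a code point and its encoded bytes is easy to see in a few lines. The following is a minimal Python sketch (Python is used here only for illustration; the variable names are arbitrary) that prints the code point of "é" and its byte form in two encodings:

```python
# A code point is an abstract number; an encoding turns it into concrete bytes.
text = "é"

code_point = ord(text)                 # Unicode code point as an integer
utf8_bytes = text.encode("utf-8")      # two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # one byte in ISO-8859-1

print(f"U+{code_point:04X}")   # U+00E9
print(utf8_bytes.hex(" "))     # c3 a9
print(latin1_bytes.hex(" "))   # e9
```

The same character has one code point but different byte representations, which is why the reader of a file must know which encoding the writer used.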


2. Why is Character Encoding Important?

  • Ensures consistent display of text across different devices and platforms.
  • Avoids issues with special characters (e.g., accented letters, symbols).
  • Prevents errors like “�” or “???”, often caused by encoding mismatches.
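
As a rough illustration of where “�” and “?” come from, the Python sketch below decodes ISO-8859-1 bytes as if they were UTF-8, then forces non-ASCII text into ASCII. The helper name decode_with_replacement is hypothetical, chosen only for this example:

```python
def decode_with_replacement(raw: bytes, encoding: str) -> str:
    """Decode bytes, substituting U+FFFD for sequences that are invalid in `encoding`."""
    return raw.decode(encoding, errors="replace")

latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'

# Reading ISO-8859-1 bytes as UTF-8 yields the replacement character.
wrong_decode = decode_with_replacement(latin1_bytes, "utf-8")
print(wrong_decode)  # caf�

# Writing non-ASCII text into an ASCII-only target yields question marks.
lossy_encode = "café".encode("ascii", errors="replace")
print(lossy_encode)  # b'caf?'
```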

3. Declaring Character Encoding in HTML

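To specify the character encoding of an HTML document, place a <meta charset> tag inside the <head> element. Put it as early as possible: the declaration must fall within the first 1024 bytes of the file.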

Example:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Character Encoding Example</title>
</head>
<body>
  <p>Welcome to HTML Encoding! 😊</p>
</body>
</html>

Explanation:

  • <meta charset="UTF-8">: Declares that the document is encoded as UTF-8, which can represent every Unicode character and is the standard encoding for modern web pages.
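
To check which encoding a page declares, one option is to scan the start of the file for the declaration. The Python sketch below is a simplified pattern match, not the full detection algorithm browsers use; the helper sniff_meta_charset and its 1024-byte default are assumptions made for this example:

```python
import re

# Simplified pattern: matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE
)

def sniff_meta_charset(raw: bytes, limit: int = 1024):
    """Return the charset declared in the first `limit` bytes, or None."""
    match = META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii").lower() if match else None

page = b'<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8"><title>t</title></head></html>'
print(sniff_meta_charset(page))              # utf-8
print(sniff_meta_charset(b"<html></html>"))  # None
```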

4. Common Character Encodings

Encoding | Description
--- | ---
UTF-8 | Variable-width encoding (1 to 4 bytes) that can represent every Unicode character; the standard on the web.
ASCII | Encodes 128 characters: English letters, digits, punctuation, and control codes.
ISO-8859-1 | Single-byte Western European encoding (Latin-1), now largely replaced by UTF-8.
UTF-16 | Unicode encoding that uses one or two 16-bit units per character; used internally by Windows, Java, and JavaScript, but rarely for web pages.
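
To make the comparison concrete, the Python sketch below reports how many bytes a few sample characters need in each encoding, with None meaning the encoding cannot represent the character. The sample characters and the helper encoded_size are chosen only for illustration:

```python
samples = ["A", "é", "你", "😊"]
encodings = ["ascii", "latin-1", "utf-8", "utf-16-le"]

def encoded_size(ch: str, encoding: str):
    """Return the number of bytes `ch` needs in `encoding`, or None if unsupported."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None

table = {ch: {enc: encoded_size(ch, enc) for enc in encodings} for ch in samples}
for ch, sizes in table.items():
    print(ch, sizes)
# A  -> ascii 1,    latin-1 1,    utf-8 1, utf-16-le 2
# é  -> ascii None, latin-1 1,    utf-8 2, utf-16-le 2
# 你 -> ascii None, latin-1 None, utf-8 3, utf-16-le 2
# 😊 -> ascii None, latin-1 None, utf-8 4, utf-16-le 4
```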

5. HTML Entities for Special Characters

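HTML entities let you write characters that would otherwise be read as markup (such as < and &) or that are hard to type. Because entities are written with plain ASCII characters, they render correctly in any ASCII-compatible encoding.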

Common HTML Entities:

Character | Entity Name | Entity Number | Description
--- | --- | --- | ---
& | &amp; | &#38; | Ampersand
< | &lt; | &#60; | Less Than
> | &gt; | &#62; | Greater Than
" | &quot; | &#34; | Double Quote
' | &apos; | &#39; | Apostrophe
© | &copy; | &#169; | Copyright Symbol
® | &reg; | &#174; | Registered Trademark
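
When HTML is generated by a program, the escaping is usually done by a library rather than by hand. As one possible illustration, Python's standard html module converts in both directions (the sample strings are made up for this sketch):

```python
import html

raw = '<a href="x">Tom & Jerry</a>'

# html.escape replaces &, <, > and (by default) both quote characters.
escaped = html.escape(raw)
print(escaped)  # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# html.unescape resolves named and numeric entities back to characters.
decoded = html.unescape("&copy; &#169; &reg;")
print(decoded)  # © © ®
```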

6. How UTF-8 Works

UTF-8 uses 1 to 4 bytes to represent characters:

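  • 1 byte for standard ASCII characters (English letters, digits, basic symbols).
  • 2 bytes for most other alphabets (e.g., accented Latin letters, Greek, Cyrillic, Arabic, Hebrew).
  • 3 bytes for most Chinese, Japanese, and Korean characters.
  • 4 bytes for emojis and other characters above U+FFFF.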

Example of UTF-8 Encoding:

Character | Unicode Code Point | UTF-8 Encoding
--- | --- | ---
A | U+0041 | 01000001
Ω | U+03A9 | 11001110 10101001
😊 | U+1F60A | 11110000 10011111 10011000 10001010
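
The bit patterns follow a fixed scheme: the leading byte signals the length of the sequence, and each continuation byte starts with 10 and carries six bits of the code point. The Python sketch below builds the bytes by hand to show that scheme. The function utf8_bits is a hypothetical helper written for this example, and it does not handle surrogate code points:

```python
def utf8_bits(ch: str) -> str:
    """Encode one character to UTF-8 by hand and return its bytes as binary strings."""
    cp = ord(ch)
    if cp < 0x80:          # 0xxxxxxx
        data = [cp]
    elif cp < 0x800:       # 110xxxxx 10xxxxxx
        data = [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    elif cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        data = [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
    else:                  # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        data = [
            0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ]
    # Cross-check the hand-built bytes against Python's built-in encoder.
    assert bytes(data) == ch.encode("utf-8")
    return " ".join(f"{b:08b}" for b in data)

print(utf8_bits("A"))   # 01000001
print(utf8_bits("Ω"))   # 11001110 10101001
print(utf8_bits("😊"))  # 11110000 10011111 10011000 10001010
```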

7. HTML Encoding Problems and Fixes

Common Problems

  1. Garbled Text: Characters like Ã© appear instead of é.
  2. Mismatched Encoding: The browser interprets the page in a different encoding from the one the file was saved in.

Solutions

  1. Specify UTF-8 Encoding: Add the following line in the <head>: <meta charset="UTF-8">
  2. Save Files in UTF-8 Format: Ensure your HTML file is saved in UTF-8 encoding using your text editor.
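
Text that has already been garbled this way can sometimes be recovered by reversing the wrong decoding step. The Python sketch below assumes the text was UTF-8 that got misread as Windows-1252 (a common case, but an assumption: other mismatches need a different wrong_encoding, and text that has lost bytes cannot be recovered). The helper repair_mojibake is hypothetical:

```python
def repair_mojibake(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Undo a wrong decode: turn the text back into bytes, then decode as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")

# Reproduce the problem: UTF-8 bytes read as Windows-1252.
garbled = "café".encode("utf-8").decode("cp1252")
print(garbled)                   # cafÃ©

# Reverse it.
print(repair_mojibake(garbled))  # café
```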

8. Testing Character Encoding

You can test how your HTML page handles different characters by including text in various languages or symbols.

Example:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Encoding Test</title>
</head>
<body>
  <p>English: Hello, World!</p>
  <p>Greek: Καλημέρα!</p>
  <p>Chinese: 你好!</p>
  <p>Emoji: 😃🌟❤️</p>
</body>
</html>
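
Besides viewing the page in a browser, you can check that the saved file is valid UTF-8 at all. The Python sketch below writes two small files to a temporary directory and tests each one; the file names and the helper is_valid_utf8 are made up for this example:

```python
import tempfile
from pathlib import Path

def is_valid_utf8(path: Path) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.html"
    bad = Path(tmp) / "bad.html"
    good.write_text("<p>Greek: Καλημέρα!</p>", encoding="utf-8")
    bad.write_bytes("<p>café</p>".encode("latin-1"))  # saved in the wrong encoding
    results = (is_valid_utf8(good), is_valid_utf8(bad))

print(results)  # (True, False)
```

A file that passes this check can still display incorrectly if the page declares a different encoding, so the check complements the <meta> declaration rather than replacing it.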

9. Summary

  • Always use UTF-8 encoding for modern web pages.
  • Declare the encoding with a <meta charset> tag in the <head> for consistent results.
  • Use HTML entities for special characters when needed.
  • Save your files in the correct encoding format to avoid rendering issues.

Quick Example:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>HTML Encoding</title>
</head>
<body>
  <p>HTML Encoding supports special symbols like &copy; and &euro;, and languages like 中文 or Español.</p>
</body>
</html>