HTML – Character Encodings

Character encoding is a fundamental aspect of HTML that ensures proper representation and rendering of text on the web. HTML supports various character encoding standards, allowing you to handle different languages, special characters, and symbols. In this post, we will explore HTML character encoding and provide a reference guide to help you work with character encoding effectively.

Character Sets

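HTML relies on a declared character encoding to tell the browser how to turn the bytes of a document into characters. The encoding is declared with the charset attribute of a <meta> tag in the document’s <head> section, and the tag should appear within the first 1024 bytes of the file so the browser sees it before it starts decoding content.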

Example:

<meta charset="UTF-8">

In the above example, the encoding is set to UTF-8, which can represent every Unicode character and is the recommended encoding for HTML documents.
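A quick way to see why the declaration matters is to decode UTF-8 bytes with the wrong encoding. The following is a minimal Python sketch rather than anything HTML-specific; windows-1252 is chosen here only as an example of a legacy encoding a browser might fall back to when no charset is declared.

```python
# Bytes that a server would send for the text "é" in a UTF-8 document.
utf8_bytes = "é".encode("utf-8")

# Decoding with the declared (correct) encoding recovers the text.
correct = utf8_bytes.decode("utf-8")

# Decoding with a mismatched legacy encoding produces garbled text (mojibake).
garbled = utf8_bytes.decode("windows-1252")

print(utf8_bytes)  # b'\xc3\xa9'
print(correct)     # é
print(garbled)     # Ã©
```

The two bytes of the UTF-8 sequence are each read as a separate character, which is the typical symptom of a missing or wrong charset declaration.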

Special Characters

HTML reserves a few characters for its own syntax: < and > delimit tags, & starts a character reference, and quotes delimit attribute values. To display these characters literally, use a named character entity or a numeric character reference. Here are a few examples:

  • < (Less Than): &lt; or &#60;
  • > (Greater Than): &gt; or &#62;
  • & (Ampersand): &amp; or &#38;
  • " (Double Quote): &quot; or &#34;
  • ' (Single Quote): &apos; or &#39;
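When HTML is generated by a program, this escaping is usually done by a library rather than by hand. As a small sketch, Python's standard html module provides escape and unescape; the sample string below is made up for illustration.

```python
import html

# A made-up string containing all five reserved characters.
raw = '<a href="x">Tom & Jerry\'s</a>'

# html.escape replaces &, <, > and (by default) both quote characters.
escaped = html.escape(raw)
print(escaped)
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;

# html.unescape reverses the process.
restored = html.unescape(escaped)
print(restored == raw)  # True
```

Note that this function emits the single quote as the hexadecimal reference &#x27; rather than &apos;, which is equivalent to the decimal form in the list above.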

Non-ASCII Characters

To include non-ASCII characters, such as accented letters or characters from other languages, you can use character entities or numeric character references as well. Here are a few examples:

  • é: &eacute; or &#233;
  • ñ: &ntilde; or &#241;
  • 漢: &#28450; (no named entity exists, so only a numeric reference is available)
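To check what a given reference resolves to, one option is to decode it outside the browser. The sketch below uses Python's standard html module and its html.entities table; it only illustrates the mapping and is not how a browser is implemented.

```python
import html
from html.entities import name2codepoint

# Named entity -> code point -> character.
print(name2codepoint["eacute"], chr(name2codepoint["eacute"]))  # 233 é

# html.unescape resolves named, decimal, and hexadecimal references alike.
decoded = html.unescape("&eacute; &ntilde; &#28450; &#x6F22;")
print(decoded)  # é ñ 漢 漢
```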

Unicode and UTF-8

Unicode is a character standard that assigns a unique number, called a code point, to almost every character in every writing system. UTF-8 is the most widely used encoding of Unicode; it is variable-length, storing each code point in one to four bytes, with ASCII characters taking a single byte.
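The variable-length property is easy to observe by encoding a few characters and counting bytes. This is a minimal Python sketch; the four sample characters are chosen to cover the one-, two-, three-, and four-byte cases.

```python
samples = ["A", "é", "漢", "😁"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the character, and its UTF-8 bytes in hex.
    print(f"U+{ord(ch):04X} {ch} -> {len(encoded)} byte(s): {encoded.hex()}")

byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(byte_lengths)  # [1, 2, 3, 4]
```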

To include a Unicode character by its code point, write a numeric character reference in hexadecimal (&#xHHHH;) or decimal (&#DDDD;) form. For example:

  • é: &#xE9; or &#233;
  • ñ: &#xF1; or &#241;
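Both forms can be derived mechanically from a character's code point. The helper below, numeric_refs, is a hypothetical name introduced only for this sketch.

```python
def numeric_refs(ch):
    """Return (hexadecimal, decimal) numeric character references for one character."""
    code_point = ord(ch)
    return f"&#x{code_point:X};", f"&#{code_point};"

for ch in "éñ漢😁":
    hex_ref, dec_ref = numeric_refs(ch)
    print(ch, hex_ref, dec_ref)
# é &#xE9; &#233;
# ñ &#xF1; &#241;
# 漢 &#x6F22; &#28450;
# 😁 &#x1F601; &#128513;
```

In a document that is itself saved and declared as UTF-8, these references are optional: the characters can simply be typed as they are.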

Internationalization and Localization

HTML provides mechanisms for internationalization and localization to handle different languages and cultural conventions. This includes support for bidirectional text, right-to-left scripts, and language-specific attributes such as lang.

To accommodate bidirectional text, you can use the dir attribute with values of “ltr” (left-to-right), “rtl” (right-to-left), or “auto” (let the browser decide from the content). For example:

<p dir="rtl">مرحبًا بالعالم</p>

In the above example, the text within the paragraph is set to right-to-left direction.
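When the direction of user-supplied text is not known in advance, a common heuristic is to look at the first character with strong directionality. The sketch below shows that idea using Python's unicodedata module; guess_dir is a hypothetical helper, and it is a simplification rather than the full Unicode bidirectional algorithm.

```python
import unicodedata

def guess_dir(text):
    """Hypothetical helper: pick "rtl" or "ltr" from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew, Arabic, and similar scripts
            return "rtl"
        if bidi_class == "L":          # Latin, Cyrillic, CJK, and similar scripts
            return "ltr"
    return "ltr"  # no strong character found; fall back to left-to-right

print(guess_dir("مرحبًا بالعالم"))  # rtl
print(guess_dir("Hello, world"))    # ltr
```

Digits, spaces, and punctuation are skipped because they have no strong direction of their own, so a string such as "123 abc" is still classified as left-to-right.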

Example usage:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>HTML Character Encoding</title>
</head>
<body>
  <h1>Special Characters</h1>
  <p>&lt;HTML&gt;</p>
  <p>&hearts;</p>

  <h1>Non-ASCII Characters</h1>
  <p>&eacute; - &#233;</p>
  <p>&ntilde; - &#241;</p>

  <h1>Unicode Characters</h1>
  <p>&#128513;</p>
  <p>&#128516;</p>
</body>
</html>

In the above example, various special characters, non-ASCII characters, and Unicode characters are properly displayed using character entities and numeric character references.

Understanding and properly handling character encoding in HTML is essential for creating web content that accurately represents text from different languages and character sets. Use this reference guide to ensure the correct display and handling of characters in your HTML documents.