TL;DR
An XML formatter pretty-prints raw XML with proper indentation and validates its structure. Use DOMParser + XMLSerializer in browsers, xml.etree.ElementTree or lxml in Python, and fast-xml-parser in Node.js. Validate XML against an XSD schema using lxml in Python or the javax.xml.validation API in Java. Query XML with XPath and transform it with XSLT. Try our free online XML formatter for instant formatting and validation, or follow the code examples below.
What Is XML? Structure, Elements, Attributes, and Namespaces
XML (eXtensible Markup Language) is a W3C standard markup language designed for storing and transporting structured data in a human-readable format. Unlike HTML, XML has no predefined tags — you define your own vocabulary to describe your data. XML separates data from presentation, making it ideal for data interchange between systems.
A well-formed XML document has this fundamental structure:
<?xml version="1.0" encoding="UTF-8"?>
<!-- XML declaration specifies version and character encoding -->
<library xmlns="http://example.com/library"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://example.com/library library.xsd">
<book id="978-0-13-110362-7" category="programming">
<title lang="en">The C Programming Language</title>
<authors>
<author>Brian W. Kernighan</author>
<author>Dennis M. Ritchie</author>
</authors>
<price currency="USD">45.99</price>
<description><![CDATA[
Classic reference for the C language.
Contains <examples> and & special chars.
]]></description>
<tags>
<tag>c</tag>
<tag>programming</tag>
<tag>systems</tag>
</tags>
</book>
</library>
Key XML concepts:
- Elements: The basic building blocks, written as <element>content</element>. Every element must have a closing tag or be self-closing: <br/>.
- Attributes: Name-value pairs inside the opening tag. Values must always be quoted: id="123" or id='123'.
- Root element: Every XML document must have exactly one root element that contains all others.
- CDATA sections: <![CDATA[ ... ]]> embeds raw text containing special characters without escaping.
- Namespaces: xmlns:prefix="URI" declarations prevent naming conflicts when combining XML vocabularies.
- Processing instructions: <?xml-stylesheet type="text/xsl" href="style.xsl"?> passes instructions to applications.
- Comments: <!-- comment -->; a comment cannot contain a double hyphen (--) anywhere inside it.
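To see how a parser exposes these concepts, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. It was chosen only because it is a convenient parser with no dependencies, and the document is an abbreviated copy of the example above. The sketch assumes default parser settings. Under those settings, comments are discarded, namespace-qualified names are expanded to {URI}localname, and CDATA content comes back as ordinary text.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" encoding="UTF-8"?>
<library xmlns="http://example.com/library">
  <!-- comments are discarded by the default parser -->
  <book id="978-0-13-110362-7" category="programming">
    <title lang="en">The C Programming Language</title>
    <description><![CDATA[Contains <examples> and & special chars.]]></description>
  </book>
</library>"""

# Map a prefix of our choosing to the document's namespace URI
NS = {'lib': 'http://example.com/library'}
root = ET.fromstring(doc)

# Namespaced element names are expanded to {URI}localname
print(root.tag)  # {http://example.com/library}library

book = root.find('lib:book', NS)
# Attributes are a plain dict; unprefixed attributes are in no namespace
print(book.get('id'), book.get('category'))  # 978-0-13-110362-7 programming

# Once parsed, CDATA content is ordinary text with nothing left to unescape
print(book.find('lib:description', NS).text)  # Contains <examples> and & special chars.
```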
XML vs HTML — Key Differences
| Feature | XML | HTML |
|---|---|---|
| Purpose | Store and transport data | Display data in browsers |
| Tags | User-defined (any name) | Predefined (div, p, span...) |
| Case sensitivity | Case-sensitive (Name ≠ name) | Case-insensitive |
| Closing tags | Mandatory for all elements (or self-closing <x/>) | Optional for some (p, li); void elements (br, img) have none |
| Attribute values | Must always be quoted | Quotes optional in HTML5 |
| Error handling | Fatal error on any malformed XML | Lenient — browsers auto-correct |
| White space | Preserved (significant) | Collapsed by browsers |
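Any well-formedness error is fatal, so a formatter is more useful when it reports where parsing stopped. Below is a minimal sketch with Python's xml.etree.ElementTree, whose ParseError carries a (line, column) position. The helper name describe_parse_error is hypothetical. The caret is only approximate on lines that contain non-ASCII characters.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def describe_parse_error(xml_text: str) -> Optional[str]:
    """Return a readable message for the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        lines = xml_text.splitlines()
        source = lines[line - 1] if 0 < line <= len(lines) else ''
        return f'{err}\n{source}\n{" " * column}^'

# An unescaped ampersand, as in the Kernighan & Ritchie example later on
broken = '<root>\n  <title>Kernighan & Ritchie</title>\n</root>'
print(describe_parse_error(broken))
print(describe_parse_error('<root><title>Kernighan &amp; Ritchie</title></root>'))  # None
```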
XML Formatting and Pretty-Printing — Why Indentation Matters
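Not all whitespace between elements means the same thing. In element-only content (an element whose children are only other elements), whitespace between the tags is ignorable. In mixed content (text interleaved with elements), it is part of the data. Parsers still report both kinds to the application. For data-centric XML such as configuration files, indentation can therefore be added or removed freely without changing the meaning.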
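A quick way to confirm that indentation is insignificant for data-centric XML is to parse a minified copy and an indented copy, then compare the data. This Python sketch uses only the standard library; the helper name data_view is hypothetical. It strips leading and trailing whitespace from every text value, which is only appropriate when the document has no mixed content. It also shows that the parser still reports the indentation, as .text and .tail strings.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

minified = '<root><person id="1"><name>Alice</name><age>30</age></person></root>'
pretty = """<root>
  <person id="1">
    <name>Alice</name>
    <age>30</age>
  </person>
</root>"""

def data_view(xml_text: str) -> List[Tuple[str, dict, str]]:
    """Flatten a document to (tag, attributes, stripped text) per element."""
    return [(el.tag, dict(el.attrib), (el.text or '').strip())
            for el in ET.fromstring(xml_text).iter()]

# Same data once the indentation is ignored...
print(data_view(minified) == data_view(pretty))  # True
# ...but the parser does report the whitespace, here as the root's .text
print(repr(ET.fromstring(pretty).text))  # '\n  '
```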
Minified XML is well-formed but hard to read. Pretty-printed XML uses consistent indentation (typically 2 or 4 spaces per level). The xml:space="preserve" attribute tells applications, formatters included, that whitespace inside that element is significant and should be left as is:
<!-- Minified XML — valid but unreadable -->
<root><person id="1"><name>Alice</name><age>30</age></person></root>
<!-- Pretty-printed XML — same data, readable -->
<root>
  <person id="1">
    <name>Alice</name>
    <age>30</age>
  </person>
</root>
<!-- xml:space="preserve" preserves whitespace in pre-like content -->
<code xml:space="preserve">
function hello() {
  return "world";
}
</code>
JavaScript — DOMParser and XMLSerializer (Browser)
The browser provides DOMParser to parse XML strings into DOM documents and XMLSerializer to serialize DOM back to strings. For pretty-printing, you must implement indentation manually since browsers do not format XML by default.
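Any formatter that rebuilds markup by string concatenation, as this section's TypeScript does, must re-escape text and attribute values. The DOM returns them already unescaped, so a literal & or < copied straight into the output makes it malformed. The sketch below shows the idea in Python with xml.sax.saxutils as a language-neutral illustration. start_tag and element are hypothetical helper names, and in the browser the same replacements would be written by hand.

```python
import xml.etree.ElementTree as ET
from typing import Dict
from xml.sax.saxutils import escape, quoteattr

def start_tag(name: str, attrs: Dict[str, str]) -> str:
    # quoteattr() escapes &, < and > and picks quotes that keep the value valid
    parts = [name] + [f'{key}={quoteattr(value)}' for key, value in attrs.items()]
    return '<' + ' '.join(parts) + '>'

def element(name: str, attrs: Dict[str, str], text: str) -> str:
    # escape() replaces &, < and > in character data
    return start_tag(name, attrs) + escape(text) + f'</{name}>'

snippet = element('title', {'note': 'say "hi" & <bye>'}, 'Kernighan & Ritchie')
print(snippet)

# Round-trip check: parsing the output gives back the original strings
parsed = ET.fromstring(snippet)
print(parsed.get('note'), '|', parsed.text)
```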
// Parse XML string in browser
function parseXml(xmlString: string): Document | null {
const parser = new DOMParser();
const doc = parser.parseFromString(xmlString, 'application/xml');
// Check for parse errors
const parseError = doc.querySelector('parsererror');
if (parseError) {
console.error('XML parse error:', parseError.textContent);
return null;
}
return doc;
}
// Serialize DOM document back to string
function serializeXml(doc: Document): string {
const serializer = new XMLSerializer();
return serializer.serializeToString(doc);
}
// Pretty-print XML with indentation
function prettyPrintXml(xmlString: string, indent = '  '): string {
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(xmlString, 'application/xml');
// Check for parse errors
const error = xmlDoc.querySelector('parsererror');
if (error) throw new Error('Invalid XML: ' + error.textContent);
return formatNode(xmlDoc.documentElement, 0, indent);
}
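// Escape characters that cannot appear raw in text or attribute values
function escapeXml(value: string, inAttribute = false): string {
  const escaped = value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
  return inAttribute ? escaped.replace(/"/g, '&quot;') : escaped;
}
function formatNode(node: Element, depth: number, indent: string): string {
  const pad = indent.repeat(depth);
  const childPad = indent.repeat(depth + 1);
  // Build opening tag; nodeName keeps namespace prefixes, and xmlns
  // declarations are ordinary entries in node.attributes
  let result = pad + '<' + node.nodeName;
  for (const attr of Array.from(node.attributes)) {
    result += ` ${attr.name}="${escapeXml(attr.value, true)}"`;
  }
  // Keep elements, CDATA sections, comments and non-blank text; blank text
  // nodes are the old indentation and are regenerated below
  const children = Array.from(node.childNodes).filter(
    n => n.nodeType === Node.ELEMENT_NODE ||
      n.nodeType === Node.CDATA_SECTION_NODE ||
      n.nodeType === Node.COMMENT_NODE ||
      (n.nodeType === Node.TEXT_NODE && n.textContent?.trim())
  );
  if (children.length === 0) {
    return result + '/>';
  }
  result += '>';
  // A single text child stays inline: <name>Alice</name>
  // (text is trimmed, so this formatter targets data-centric XML,
  // not mixed content or xml:space="preserve")
  if (children.length === 1 && children[0].nodeType === Node.TEXT_NODE) {
    return result + escapeXml(children[0].textContent!.trim()) + '</' + node.nodeName + '>';
  }
  result += '\n';
  for (const child of children) {
    if (child.nodeType === Node.ELEMENT_NODE) {
      result += formatNode(child as Element, depth + 1, indent);
    } else if (child.nodeType === Node.CDATA_SECTION_NODE) {
      result += childPad + '<![CDATA[' + child.nodeValue + ']]>';
    } else if (child.nodeType === Node.COMMENT_NODE) {
      result += childPad + '<!--' + child.nodeValue + '-->';
    } else {
      result += childPad + escapeXml(child.textContent!.trim());
    }
    result += '\n';
  }
  return result + pad + '</' + node.nodeName + '>';
}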
// Usage
const xml = '<root><person id="1"><name>Alice</name><age>30</age></person></root>';
console.log(prettyPrintXml(xml));
/*
<root>
  <person id="1">
    <name>Alice</name>
    <age>30</age>
  </person>
</root>
*/
// Extract data from XML
function getPersonNames(xmlString: string): string[] {
const doc = parseXml(xmlString);
if (!doc) return [];
return Array.from(doc.querySelectorAll('name')).map(el => el.textContent || '');
}
// Check if XML is well-formed
function isValidXml(xmlString: string): boolean {
const doc = parseXml(xmlString);
return doc !== null;
}
Node.js — xml2js and fast-xml-parser
Node.js does not include a built-in XML parser, so you need an npm package. fast-xml-parser is a fast, pure-JavaScript parser, builder and validator. xml2js is older but very widely used. Both convert XML to JavaScript objects and back.
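Both libraries use a similar mapping: attributes become prefixed keys, repeated child elements become arrays, and text-only elements collapse to plain values. The sketch below expresses that idea with Python's standard library. It is language-neutral and is not either library's actual algorithm. to_obj is a hypothetical helper, and the @_ prefix mirrors the fast-xml-parser option used below. Unlike parseAttributeValue, it keeps every value as a string. Text that follows a child element (mixed content) is not captured.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

def to_obj(el: ET.Element, attr_prefix: str = '@_') -> Any:
    """Hypothetical helper: map an element to a dict, or to a string for text-only leaves."""
    obj: Dict[str, Any] = {attr_prefix + k: v for k, v in el.attrib.items()}
    for child in el:
        value = to_obj(child, attr_prefix)
        if child.tag not in obj:
            obj[child.tag] = value
        elif isinstance(obj[child.tag], list):
            obj[child.tag].append(value)
        else:
            # Second occurrence of a tag: switch to a list
            obj[child.tag] = [obj[child.tag], value]
    text = (el.text or '').strip()
    if not obj:
        return text  # leaf element without attributes
    if text:
        obj['#text'] = text
    return obj

xml_string = """<library>
  <book id="1"><title>Clean Code</title><price>39.99</price></book>
  <book id="2"><title>The Pragmatic Programmer</title><price>44.99</price></book>
</library>"""
root = ET.fromstring(xml_string)
result = {root.tag: to_obj(root)}
print(result['library']['book'][0])  # {'@_id': '1', 'title': 'Clean Code', 'price': '39.99'}
```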
# Install
npm install fast-xml-parser
npm install xml2js  # alternative
// fast-xml-parser — recommended for performance
import { XMLParser, XMLBuilder, XMLValidator } from 'fast-xml-parser';
const xmlString = `<?xml version="1.0" encoding="UTF-8"?>
<library>
<book id="1">
<title>Clean Code</title>
<author>Robert C. Martin</author>
<price>39.99</price>
</book>
<book id="2">
<title>The Pragmatic Programmer</title>
<author>Andrew Hunt</author>
<price>44.99</price>
</book>
</library>`;
// Parse XML to JavaScript object
const parser = new XMLParser({
ignoreAttributes: false, // include XML attributes
attributeNamePrefix: '@_', // prefix attribute keys with @_
parseAttributeValue: true, // parse attribute values as numbers/booleans
});
const result = parser.parse(xmlString);
console.log(result.library.book);
// => [{ '@_id': 1, title: 'Clean Code', author: 'Robert C. Martin', price: 39.99 }, ...]
// Validate XML before parsing
const validationResult = XMLValidator.validate(xmlString);
if (validationResult !== true) {
console.error('XML validation error:', validationResult.err);
}
// Build XML from JavaScript object
const builder = new XMLBuilder({
ignoreAttributes: false,
attributeNamePrefix: '@_',
format: true, // pretty-print with indentation
indentBy: '  ', // 2-space indent
suppressEmptyNode: true, // <empty/> instead of <empty></empty>
});
const jsObject = {
library: {
book: [
{ '@_id': 1, title: 'Refactoring', author: 'Martin Fowler', price: 49.99 },
{ '@_id': 2, title: 'SICP', author: 'Harold Abelson', price: 39.99 },
]
}
};
const newXml = builder.build(jsObject);
console.log(newXml);
// xml2js — alternative with promise-based API
import { parseStringPromise, Builder } from 'xml2js';
const parsed = await parseStringPromise(xmlString, {
explicitArray: false, // don't wrap single elements in arrays
mergeAttrs: true, // merge attributes into the element object
trim: true,
});
console.log(parsed.library.book);
// Build XML from object with xml2js
const xmlBuilder = new Builder({
renderOpts: { pretty: true, indent: '  ' },
xmldec: { version: '1.0', encoding: 'UTF-8' },
});
const xmlOutput = xmlBuilder.buildObject({ library: { book: [/* ... */] } });
Python — xml.etree.ElementTree and minidom
Python's standard library includes xml.etree.ElementTree (fast, minimal) and xml.dom.minidom (DOM-based, better for pretty-printing). Python 3.9+ added the indent() function to ElementTree for easy pretty-printing.
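One caveat: ET.indent() does not look at xml:space. Inside an xml:space="preserve" element that has child elements, any whitespace-only text between those children is overwritten. The sketch below is a hypothetical variant (indent_preserving is not a standard-library function). It mirrors ET.indent()'s rules but skips preserved subtrees. It does not handle an inner xml:space="default" that re-enables indentation deeper down.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI
XML_SPACE = '{http://www.w3.org/XML/1998/namespace}space'

def indent_preserving(elem: ET.Element, space: str = '  ', level: int = 0) -> None:
    """Indent in place like ET.indent(), but leave xml:space="preserve" subtrees alone."""
    if elem.get(XML_SPACE) == 'preserve' or len(elem) == 0:
        return  # whitespace here is significant, or there are no children to indent
    child_pad = '\n' + space * (level + 1)
    # Only whitespace-only text is replaced; real text (mixed content) is kept
    if not elem.text or not elem.text.strip():
        elem.text = child_pad
    for child in elem:
        indent_preserving(child, space, level + 1)
        if not child.tail or not child.tail.strip():
            child.tail = child_pad
    # The last child's tail sits before the parent's end tag, so dedent it
    if not elem[-1].tail.strip():
        elem[-1].tail = '\n' + space * level

source = '<doc><p>a</p><pre xml:space="preserve"><b>x</b>  <i>y</i></pre></doc>'
root = ET.fromstring(source)
indent_preserving(root)
print(ET.tostring(root, encoding='unicode'))

# For comparison, ET.indent() rewrites the two spaces between <b> and <i>
naive = ET.fromstring(source)
ET.indent(naive)
print('<b>x</b>  <i>y</i>' in ET.tostring(naive, encoding='unicode'))  # False
```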
import xml.etree.ElementTree as ET
from xml.dom import minidom
xml_string = """<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book id="1">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
    <price currency="USD">39.99</price>
  </book>
</library>"""
# Parse XML
# tree = ET.parse('library.xml')  # from a file (returns an ElementTree)
root = ET.fromstring(xml_string)  # from a string (returns the root Element)
# Access elements
for book in root.findall('book'):
    title = book.find('title').text
    author = book.find('author').text
    price = book.find('price').text
    currency = book.find('price').get('currency', 'USD')
    book_id = book.get('id')
    print(f"Book {book_id}: {title} by {author} ({currency} {price})")
# Find all titles using findall with a path
titles = [el.text for el in root.findall('./book/title')]
# Iterate all elements recursively
for element in root.iter():
    print(f"Tag: {element.tag}, Text: {element.text}")
# Modify XML
for book in root.findall('book'):
    price_el = book.find('price')
    if price_el is not None:
        old_price = float(price_el.text)
        price_el.text = str(round(old_price * 1.1, 2))  # 10% increase
# Pretty-print with the Python 3.9+ indent() function (modifies the tree in place)
ET.indent(root, space='  ')
print(ET.tostring(root, encoding='unicode'))
# Pretty-print with minidom (works in older Python)
def pretty_print_xml(xml_string: str) -> str:
    parsed = minidom.parseString(xml_string)
    pretty = parsed.toprettyxml(indent='  ')
    # toprettyxml() keeps whitespace text nodes from already-indented input,
    # which shows up as blank lines; drop them
    return '\n'.join(line for line in pretty.splitlines() if line.strip())
pretty = pretty_print_xml(xml_string)
print(pretty)
# Create new XML from scratch
root = ET.Element('library')
# Note: this only writes an xmlns attribute; the elements themselves are not
# namespace-qualified in the tree (use '{uri}tag' names for that)
root.set('xmlns', 'http://example.com/library')
book = ET.SubElement(root, 'book')
book.set('id', '1')
title = ET.SubElement(book, 'title')
title.text = 'The Pragmatic Programmer'
author = ET.SubElement(book, 'author')
author.text = 'Andrew Hunt'
# Serialize to string (xml_declaration requires Python 3.8+)
ET.indent(root, space='  ')
xml_output = ET.tostring(root, encoding='unicode', xml_declaration=True)
# Write to file (the tree is already indented)
tree = ET.ElementTree(root)
tree.write('output.xml', encoding='utf-8', xml_declaration=True)
Python — lxml Library (XPath, XSD Validation, XSLT)
lxml is the most feature-complete Python XML library, built on the C libraries libxml2 and libxslt. It supports XPath 1.0, DTD, XSD and RELAX NG validation, and XSLT 1.0 transformations. It is often faster than the standard library, though the gap depends on the workload. Install with pip install lxml.
from lxml import etree
xml_string = b"""<?xml version="1.0" encoding="UTF-8"?>
<library xmlns="http://example.com/library">
<book id="1" price="39.99">
<title>Clean Code</title>
<author>Robert C. Martin</author>
</book>
<book id="2" price="29.99">
<title>The Pragmatic Programmer</title>
<author>Andrew Hunt</author>
</book>
</library>"""
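# Parse XML; remove_blank_text drops the existing indentation so that
# pretty_print can re-indent (lxml otherwise keeps whitespace-only text as is)
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(xml_string, parser)
# or from file: root = etree.parse('file.xml', parser).getroot()
# Pretty-print
pretty = etree.tostring(root, pretty_print=True, encoding='unicode')
print(pretty)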
# XPath queries — powerful element selection
ns = {'lib': 'http://example.com/library'}
# Get all book titles
titles = root.xpath('//lib:book/lib:title/text()', namespaces=ns)
print(titles) # => ['Clean Code', 'The Pragmatic Programmer']
# Get books with price under 35
cheap_books = root.xpath('//lib:book[@price < 35]', namespaces=ns)
for book in cheap_books:
    print(book.xpath('lib:title/text()', namespaces=ns))
# Get the first book; xpath() always returns a list
first_book = root.xpath('//lib:book[1]', namespaces=ns)[0]
# XSD Schema validation
xsd_string = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://example.com/library"
xmlns="http://example.com/library">
<xs:element name="library">
<xs:complexType>
<xs:sequence>
<xs:element name="book" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
</xs:sequence>
<xs:attribute name="id" type="xs:integer" use="required"/>
<xs:attribute name="price" type="xs:decimal" use="required"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>"""
xsd_doc = etree.fromstring(xsd_string)
schema = etree.XMLSchema(xsd_doc)
# Validate XML against schema
if schema.validate(root):
    print("XML is valid!")
else:
    for error in schema.error_log:
        print(f"Validation error: {error.message} (line {error.line})")
# XSLT transformation
xslt_string = b"""<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:lib="http://example.com/library">
<xsl:template match="/">
<html>
<body>
<table border="1">
<tr><th>ID</th><th>Title</th><th>Author</th></tr>
<xsl:for-each select="//lib:book">
<tr>
<td><xsl:value-of select="@id"/></td>
<td><xsl:value-of select="lib:title"/></td>
<td><xsl:value-of select="lib:author"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>"""
xslt_doc = etree.fromstring(xslt_string)
transform = etree.XSLT(xslt_doc)
result_tree = transform(root)
print(str(result_tree))
XML Validation — DTD vs XSD vs RELAX NG
XML validation checks that a document conforms to a defined structure beyond just being well-formed. There are three major schema languages:
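- DTD (Document Type Definition): The original XML schema format, inherited from SGML. It has a simple syntax and declares elements, attributes and entities. It has no data types, so it cannot check that an element contains a number rather than arbitrary text. DTDs are used in legacy XML formats and XHTML 1.x. HTML5's <!DOCTYPE html> is only a vestige and references no DTD.
- XSD (XML Schema Definition): The W3C standard, written in XML itself. It supports dozens of built-in data types (string, integer, date, boolean, etc.), namespaces, type derivation by extension and restriction, and regular-expression patterns for value constraints. XSD is the industry standard for SOAP and enterprise XML.
- RELAX NG: An ISO-standardized schema language that is simpler to write than XSD and more expressive for some structures, such as interleaved content. It is available in XML syntax and a compact notation. It has no data types of its own and usually borrows the XSD datatype library, including facets such as minInclusive via <param>. RELAX NG is used in document-centric XML such as DocBook, OpenDocument Format and EPUB.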
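Python's standard library ships no DTD or XSD validator; lxml, shown earlier, fills that gap. Where adding a dependency is not an option, a few critical rules can be hand-coded instead. The sketch below is such a hand-written check, not schema validation. check_book is a hypothetical helper that enforces only some of the BookType rules from the library.xsd example in this section: a positive integer id, a title, one to five authors, and a price between 0 and 9999.99. It ignores element order, namespaces and everything else.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation
from typing import List

def check_book(book: ET.Element) -> List[str]:
    """Hand-written check of a few BookType rules; not a schema validator."""
    errors = []
    book_id = book.get('id', '')
    # xs:positiveInteger: decimal digits with a value of at least 1
    if not (book_id.isdecimal() and int(book_id) > 0):
        errors.append(f'id must be a positive integer, got {book_id!r}')
    if book.find('title') is None:
        errors.append('missing <title>')
    authors = book.findall('author')
    if not 1 <= len(authors) <= 5:
        errors.append(f'expected 1-5 <author> elements, got {len(authors)}')
    price = book.findtext('price')
    try:
        if price is None or not Decimal('0') <= Decimal(price) <= Decimal('9999.99'):
            errors.append(f'price missing or out of range: {price!r}')
    except InvalidOperation:
        errors.append(f'price is not a decimal: {price!r}')
    return errors

library = ET.fromstring(
    '<library>'
    '<book id="1"><title>Clean Code</title><author>Robert C. Martin</author>'
    '<price>39.99</price></book>'
    '<book id="b1"><title>SICP</title><price>-5</price></book>'
    '</library>'
)
for book in library.findall('book'):
    print(book.get('id'), check_book(book))
```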
<!-- DTD (Document Type Definition) — inline or external -->
<!DOCTYPE library [
<!ELEMENT library (book+)>
<!ELEMENT book (title, author, price)>
<!ATTLIST book
id ID #REQUIRED
category CDATA #IMPLIED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<library>
<book id="b1">
<title>Clean Code</title>
<author>Robert C. Martin</author>
<price>39.99</price>
</book>
</library>
<?xml version="1.0" encoding="UTF-8"?>
<!-- XSD Schema — external file (library.xsd) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="library">
<xs:complexType>
<xs:sequence>
<xs:element name="book" type="BookType"
minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="BookType">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string" maxOccurs="5"/>
<xs:element name="price">
<xs:simpleType>
<xs:restriction base="xs:decimal">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="9999.99"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="isbn" minOccurs="0">
<xs:simpleType>
<xs:restriction base="xs:string">
<!-- ISBN-13 pattern: 978-x-xxx-xxxxx-x -->
<xs:pattern value="\d{3}-\d-\d{3}-\d{5}-\d"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
<xs:attribute name="id" type="xs:positiveInteger" use="required"/>
<xs:attribute name="category">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="programming"/>
<xs:enumeration value="science"/>
<xs:enumeration value="fiction"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
</xs:schema>
XPath — Querying XML Documents
XPath (XML Path Language) is a query language for selecting nodes from an XML document. It uses a path expression syntax similar to filesystem paths, plus predicates for filtering.
<!-- Sample XML for XPath examples -->
<bookstore>
<book category="programming" lang="en">
<title>Clean Code</title>
<author>Robert C. Martin</author>
<price>39.99</price>
<year>2008</year>
</book>
<book category="programming" lang="en">
<title>SICP</title>
<author>Harold Abelson</author>
<price>29.99</price>
<year>1996</year>
</book>
<book category="science" lang="en">
<title>A Brief History of Time</title>
<author>Stephen Hawking</author>
<price>14.99</price>
<year>1988</year>
</book>
</bookstore>
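Python's standard-library ElementTree implements only a limited subset of XPath: paths, attribute and child-text equality tests, and position predicates such as [1] and [last()]. The sketch below runs a few of the expressions from the reference that follows against an abbreviated copy of this sample. Numeric comparisons like [price > 20] and functions like contains() are outside that subset, so the sketch filters in Python instead. lxml, covered earlier, supports full XPath 1.0.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring("""<bookstore>
  <book category="programming"><title>Clean Code</title><price>39.99</price><year>2008</year></book>
  <book category="programming"><title>SICP</title><price>29.99</price><year>1996</year></book>
  <book category="science"><title>A Brief History of Time</title><price>14.99</price><year>1988</year></book>
</bookstore>""")

# Inside ElementTree's subset: attribute and child-text equality, positions
programming = bookstore.findall("book[@category='programming']")
first_title = bookstore.findtext('book[1]/title')
last_title = bookstore.findtext('book[last()]/title')
sicp_year = bookstore.findtext("book[title='SICP']/year")

# Outside the subset (like //book[price > 20]/title): filter in Python
expensive = [b.findtext('title') for b in bookstore.iter('book')
             if float(b.findtext('price')) > 20]

print(len(programming), first_title, last_title, sicp_year)
print(expensive)  # ['Clean Code', 'SICP']
```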
XPath Expression Reference:
| Expression | Selects |
|---|---|
| /bookstore | The root element bookstore |
| //book | All book elements anywhere in the document |
| /bookstore/book[1] | The first book child of bookstore |
| /bookstore/book[last()] | The last book child of bookstore |
| //book/@category | The category attribute of every book |
| //book[@category] | Books that have a category attribute |
| //book[@category='programming'] | Programming books only |
| //book[price > 20] | Books priced above 20 |
| //book[price > 20 and @category='programming'] | Combined predicate |
| //title/text() | Text content of all titles |
| //book[contains(title, 'Code')] | Books whose title contains "Code" |
| //book[starts-with(title, 'Clean')] | Books whose title starts with "Clean" |
| count(//book) | Number of books (returns a number) |
| sum(//price) | Sum of all prices (returns a number) |
| //book[year < 2000]/title | Titles of books published before 2000 |
// XPath in JavaScript (browser) — XPathEvaluator API
function xpath(expression: string, doc: Document): string[] {
const result = doc.evaluate(
expression,
doc,
null,
XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
null
);
const nodes: string[] = [];
for (let i = 0; i < result.snapshotLength; i++) {
nodes.push(result.snapshotItem(i)?.textContent || '');
}
return nodes;
}
// Usage
const doc = new DOMParser().parseFromString(xmlString, 'application/xml'); // xmlString holds the <bookstore> sample above
const titles = xpath('//title/text()', doc);
// => ['Clean Code', 'SICP', 'A Brief History of Time']
const expensiveBooks = xpath('//book[price > 20]/title/text()', doc);
// => ['Clean Code', 'SICP']
// String value XPath result
function xpathString(expression: string, doc: Document): string {
const result = doc.evaluate(expression, doc, null, XPathResult.STRING_TYPE, null);
return result.stringValue;
}
const firstTitle = xpathString('//book[1]/title', doc);
// => 'Clean Code'
XSLT — Transforming XML to HTML, Text, or Other XML
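XSLT (eXtensible Stylesheet Language Transformations) transforms XML documents into another format using template-based rules. An XSLT processor applies the stylesheet to the source XML and produces output. Browsers have long exposed an XSLT 1.0 processor through the XSLTProcessor API, but browser vendors have discussed removing it, so check current support before relying on it. lxml (via libxslt) also implements XSLT 1.0. XSLT 2.0 and 3.0 require a dedicated processor such as Saxon.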
<?xml version="1.0" encoding="UTF-8"?>
<!-- XSLT 1.0 stylesheet — books XML to HTML table -->
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Root template — matches the document root -->
<xsl:template match="/">
<html>
<head>
<title>Book Catalog</title>
<style>
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid #ddd; padding: 8px; }
th { background: #f4f4f4; }
</style>
</head>
<body>
<h1>Book Catalog</h1>
<table>
<tr>
<th>Title</th>
<th>Author</th>
<th>Price</th>
<th>Year</th>
</tr>
<!-- For each book element, create a table row -->
<xsl:for-each select="bookstore/book">
<!-- Sort by price descending -->
<xsl:sort select="price" data-type="number" order="descending"/>
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="author"/></td>
<td>$<xsl:value-of select="price"/></td>
<td><xsl:value-of select="year"/></td>
</tr>
</xsl:for-each>
</table>
<!-- Conditional output -->
<p>
Total books: <xsl:value-of select="count(bookstore/book)"/>
</p>
<xsl:if test="count(bookstore/book) > 5">
<p>Large catalog!</p>
</xsl:if>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
// Apply XSLT in browser using XSLTProcessor API
async function transformXml(xmlString: string, xsltString: string): Promise<string> {
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(xmlString, 'application/xml');
const xsltDoc = parser.parseFromString(xsltString, 'application/xml');
const xsltProcessor = new XSLTProcessor();
xsltProcessor.importStylesheet(xsltDoc);
// Apply transformation
const resultDoc = xsltProcessor.transformToDocument(xmlDoc);
const serializer = new XMLSerializer();
return serializer.serializeToString(resultDoc);
}
// Usage
const htmlOutput = await transformXml(booksXml, xsltStylesheet);
document.getElementById('output')!.innerHTML = htmlOutput;
XML Namespaces — Avoiding Name Conflicts
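XML namespaces allow elements from different vocabularies to coexist in the same document without name collisions. A namespace is declared with an xmlns or xmlns:prefix attribute. The declaration binds a namespace URI to the element that carries it and to that element's descendants. A default namespace (xmlns) applies to unprefixed element names but not to unprefixed attributes.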
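In Python's xml.etree.ElementTree, namespaces are matched by URI rather than by prefix. The minimal sketch below uses the same lib: document as the JavaScript example in this section. The prefix map passed to find() may use any prefix (b here), and namespaced attributes are keyed as {URI}name. register_namespace() sets the prefix used when serializing; without it, ElementTree generates ns0, ns1, and so on. Note that register_namespace() changes a global registry.

```python
import xml.etree.ElementTree as ET

BOOKS = 'http://books.example.com'
xml_with_ns = f"""<lib:library xmlns:lib="{BOOKS}">
  <lib:book lib:id="1"><lib:title>Clean Code</lib:title></lib:book>
</lib:library>"""
root = ET.fromstring(xml_with_ns)

# Queries match on the URI; the prefix in this map need not match the document's
ns = {'b': BOOKS}
book = root.find('b:book', ns)
print(book.get(f'{{{BOOKS}}}id'))  # 1  (namespaced attributes are keyed {URI}name)
print(book.findtext('b:title', namespaces=ns))  # Clean Code

# Choose the prefix used on output (global registry); otherwise ns0, ns1, ... are generated
ET.register_namespace('lib', BOOKS)
print(ET.tostring(root, encoding='unicode'))
```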
<?xml version="1.0" encoding="UTF-8"?>
<!-- Multiple namespaces in one document -->
<!-- xmlns="..." on the root declares the default namespace (comments cannot go inside a tag) -->
<root
xmlns="http://default.example.com"
xmlns:book="http://books.example.com"
xmlns:price="http://pricing.example.com"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<!-- Uses default namespace -->
<title>Book Catalog</title>
<!-- Uses book: namespace prefix -->
<book:catalog version="2.0">
<book:item book:id="1">
<book:title>Clean Code</book:title>
<!-- Mixed namespaces in one element tree -->
<price:cost price:currency="USD">39.99</price:cost>
</book:item>
</book:catalog>
<!-- Embedded XHTML with its own namespace -->
<description>
<xhtml:p>This is a <xhtml:strong>great</xhtml:strong> book.</xhtml:p>
</description>
</root>
<!-- Real-world: SOAP envelope uses multiple namespaces -->
<soap:Envelope
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Header/>
<soap:Body>
<GetStockPrice xmlns="http://myservice.example.com/">
<StockName>IBM</StockName>
</GetStockPrice>
</soap:Body>
</soap:Envelope>
// Handling namespaces in JavaScript
const xmlWithNs = `<lib:library xmlns:lib="http://books.example.com">
<lib:book lib:id="1">
<lib:title>Clean Code</lib:title>
</lib:book>
</lib:library>`;
const parser = new DOMParser();
const doc = parser.parseFromString(xmlWithNs, 'application/xml');
// Use getElementsByTagNameNS for namespace-aware queries
const books = doc.getElementsByTagNameNS('http://books.example.com', 'book');
console.log(books.length); // => 1
// getAttributeNS looks up a namespaced attribute by namespace URI and local name
const book = books[0];
const id = book.getAttributeNS('http://books.example.com', 'id');
console.log(id); // => '1'
// XPathEvaluator with namespace resolver
const nsResolver = {
lookupNamespaceURI: (prefix: string | null) => {
if (prefix === 'lib') return 'http://books.example.com';
return null;
}
};
const result = doc.evaluate(
'//lib:book/lib:title',
doc,
nsResolver as XPathNSResolver,
XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
null
);
CDATA Sections — Embedding Special Characters
CDATA sections let you include text that contains XML special characters (<, > and &) without escaping them. The parser treats the entire CDATA block as raw character data.
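Once parsed, a CDATA section is just character data. This Python sketch uses only the standard library. It shows that a CDATA-wrapped string and its entity-escaped equivalent yield the same text. It also shows that ElementTree does not preserve the CDATA wrapper when writing the tree back out; it escapes the characters instead. Tools that must emit literal CDATA need a serializer that supports it, such as lxml's etree.CDATA.

```python
import xml.etree.ElementTree as ET

with_cdata = '<code><![CDATA[if (x < 10 && y > 0) { go(); }]]></code>'
escaped = '<code>if (x &lt; 10 &amp;&amp; y &gt; 0) { go(); }</code>'

a, b = ET.fromstring(with_cdata), ET.fromstring(escaped)
# After parsing, CDATA and entity-escaped text are indistinguishable
print(a.text == b.text)  # True
print(a.text)  # if (x < 10 && y > 0) { go(); }

# ElementTree drops the CDATA wrapper and re-escapes on output
print(ET.tostring(a, encoding='unicode'))
```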
<!-- CDATA vs entity references — choose based on amount of special chars -->
<!-- Without CDATA — requires escaping every special character -->
<code>
if (x &lt; 10 &amp;&amp; y &gt; 0) {
document.write("Hello &amp; World");
}
</code>
<!-- With CDATA — no escaping needed, much more readable -->
<code><![CDATA[
if (x < 10 && y > 0) {
document.write("Hello & World");
}
]]></code>
<!-- CDATA is commonly used for: -->
<!-- 1. Embedding script code in XML-based formats -->
<script type="text/javascript">
<![CDATA[
function validate() {
var re = /^[a-z]+$/i;
return re.test(document.getElementById('name').value);
}
]]>
</script>
<!-- 2. Embedding SQL in configuration XML -->
<query><![CDATA[
SELECT * FROM users
WHERE name LIKE '%O''Brien%'
AND age > 21 AND age < 65;
]]></query>
<!-- 3. Embedding HTML content in XML -->
<description><![CDATA[
<p>Learn <strong>XML formatting</strong> & validation.</p>
<ul>
<li>Parse XML</li>
<li>Validate with XSD</li>
</ul>
]]></description>
<!-- CDATA cannot contain ]]> (the closing sequence) -->
<!-- Workaround: split into two CDATA sections -->
<text><![CDATA[First part ]]]]><![CDATA[> second part]]></text>
<!-- XML Entity References for single special characters -->
<!-- &amp; = & (ampersand) -->
<!-- &lt; = < (less than) -->
<!-- &gt; = > (greater than) -->
<!-- &quot; = " (double quote) -->
<!-- &apos; = ' (single quote / apostrophe) -->
XML vs JSON — Comparison Table and When to Use Each
Both XML and JSON are widely used for data interchange, but they have different strengths. JSON dominates modern REST APIs while XML remains essential for SOAP, RSS, SVG, and document formats.
| Feature | XML | JSON |
|---|---|---|
| Verbosity | More verbose (closing tags) | Compact, less overhead |
| Parsing speed | Slower (more complex) | Faster (JSON.parse) |
| Schema support | DTD, XSD, RELAX NG (mature) | JSON Schema (less mature) |
| Attributes | Elements + attributes | Keys and values only |
| Comments | Supported | Not supported |
| Namespaces | Full namespace support | No native namespaces |
| Query language | XPath, XQuery | JSONPath, jq |
| Transform | XSLT | jq, JavaScript |
| Binary data | Base64 (or hex) text; CDATA cannot hold arbitrary bytes | Base64 string |
| Use cases | SOAP, RSS, SVG, Office formats, config | REST APIs, web storage, config |
Choose XML when: working with SOAP/WSDL web services, generating or parsing RSS/Atom feeds, working with SVG graphics, processing Microsoft Office Open XML (.docx, .xlsx), or building Android UI layouts. Choose JSON when: building REST APIs, storing data in NoSQL databases, sending data between frontend and backend, or working with modern JavaScript frameworks.
Common XML Errors and How to Fix Them
XML parsers are strict — any well-formedness violation is a fatal error. Here are the most common XML errors and how to fix them:
<!-- ERROR 1: Unclosed tags -->
<!-- Invalid -->
<root>
<name>Alice
<age>30</age>
</root>
<!-- Fixed -->
<root>
<name>Alice</name>
<age>30</age>
</root>
<!-- ERROR 2: Tags not properly nested (overlapping) -->
<!-- Invalid -->
<bold><italic>text</bold></italic>
<!-- Fixed -->
<bold><italic>text</italic></bold>
<!-- ERROR 3: Unquoted attribute values -->
<!-- Invalid -->
<book id=1 category=programming>
<!-- Fixed -->
<book id="1" category="programming">
<!-- ERROR 4: Unescaped special characters in text content -->
<!-- Invalid — ampersand must be escaped -->
<title>Kernighan & Ritchie</title>
<!-- Fixed -->
<title>Kernighan &amp; Ritchie</title>
<!-- Or use CDATA -->
<title><![CDATA[Kernighan & Ritchie]]></title>
<!-- ERROR 5: Unescaped angle brackets in attribute values -->
<!-- Invalid -->
<filter condition="price < 50">
<!-- Fixed -->
<filter condition="price < 50">
<!-- ERROR 6: Invalid characters in element names -->
<!-- Invalid — element names cannot start with a number or contain spaces -->
<1st-book>, <book name>, <book@store>
<!-- Fixed -->
<first-book>, <book-name>, <book-store>
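<!-- ERROR 7: Missing encoding declaration in a non-UTF-8 file -->
<!-- Without a declaration, parsers assume UTF-8 (or UTF-16 when a BOM is present), -->
<!-- so a file saved as ISO-8859-1 with accented characters fails to parse -->
<!-- Fix: save the file as UTF-8, or declare the encoding it really uses -->
<?xml version="1.0" encoding="ISO-8859-1"?>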
<!-- ERROR 8: Byte Order Mark (BOM) issues -->
<!-- Some editors add a UTF-8 BOM (EF BB BF) before the XML declaration -->
<!-- Mainstream parsers accept it, but tools that read the file as text and -->
<!-- then parse or concatenate it can choke; save without a BOM if in doubt -->
<!-- In vim: :set nobomb | :w -->
<!-- In Python: use the 'utf-8-sig' codec for BOM-aware reading -->
import xml.etree.ElementTree as ET
# 'utf-8-sig' strips a leading BOM if present
with open('file.xml', 'r', encoding='utf-8-sig') as f:
    content = f.read()
root = ET.fromstring(content)
<!-- ERROR 9: Duplicate attribute names -->
<!-- Invalid — attributes must be unique within an element -->
<book id="1" id="2">
<!-- Fixed — use different attribute names or child elements -->
<book primary-id="1" alternate-id="2">
<!-- ERROR 10: Multiple root elements -->
<!-- Invalid -->
<root1></root1>
<root2></root2>
<!-- Fixed — wrap in a single root -->
<root>
<root1></root1>
<root2></root2>
</root>
Large XML Streaming — SAX Parsers and iterparse
DOM parsers load the entire XML document into memory as a tree structure. For large XML files (hundreds of MB or GB), this is impractical. SAX (Simple API for XML) and streaming parsers process XML as a sequence of events without loading the whole document.
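Sometimes the document arrives in pieces, from a socket or an HTTP response body, rather than as a file. In that case ElementTree's XMLPullParser accepts chunks through feed() and hands back completed elements through read_events(). The minimal sketch below assumes <book> records like those in the examples that follow. books_from_chunks is a hypothetical helper. As with element.clear() elsewhere, the emptied <book> elements stay attached to the root.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Optional, Tuple

def books_from_chunks(chunks: Iterable[str]) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Yield (id, title) for each <book> as soon as its end tag has arrived."""
    parser = ET.XMLPullParser(events=('end',))

    def finished_books():
        for _event, elem in parser.read_events():
            if elem.tag == 'book':
                yield elem.get('id'), elem.findtext('title')
                elem.clear()  # drop the book's children and text once consumed

    for chunk in chunks:
        parser.feed(chunk)
        yield from finished_books()
    parser.close()  # flush anything the parser was still buffering
    yield from finished_books()

# Chunk boundaries can fall anywhere, even in the middle of a tag name
chunks = ['<library><book id="1"><ti', 'tle>Clean Code</title></bo',
          'ok><book id="2"><title>SICP</title></book></library>']
print(list(books_from_chunks(chunks)))  # [('1', 'Clean Code'), ('2', 'SICP')]
```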
# Python — ElementTree iterparse (streaming, memory-efficient)
import xml.etree.ElementTree as ET
def count_books_streaming(filepath: str) -> int:
    """Process a large XML file with millions of books"""
    count = 0
    # iterparse yields (event, element) tuples; the first one is the
    # 'start' event of the root element, so keep a handle on the root
    context = ET.iterparse(filepath, events=('start', 'end'))
    _, root = next(context)
    for event, element in context:
        if event == 'end' and element.tag == 'book':
            count += 1
            # CRITICAL: free memory. element.clear() alone would still leave an
            # empty <book> attached to the root for every record, so clear the root
            root.clear()
    return count
# Extract data from large XML with iterparse
def extract_books(filepath: str, max_price: float) -> list[dict]:
    books = []
    current_book = {}
    for event, element in ET.iterparse(filepath, events=('start', 'end')):
        if event == 'start':
            if element.tag == 'book':
                current_book = {'id': element.get('id')}
        elif event == 'end':
            if element.tag == 'title':
                current_book['title'] = element.text
            elif element.tag == 'price':
                current_book['price'] = float(element.text or 0)
            elif element.tag == 'book':
                if current_book.get('price', 0) <= max_price:
                    books.append(current_book.copy())
                element.clear()  # Free memory
    return books
# SAX Parser in Python — even more memory efficient
import xml.sax
import xml.sax.handler
class BookHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.books = []
        self.current_book = {}
        self.current_element = ''
        self.current_text = ''
    def startElement(self, name, attrs):
        self.current_element = name
        self.current_text = ''
        if name == 'book':
            self.current_book = {'id': attrs.get('id', '')}
    def characters(self, content):
        # May be called several times for one text node, so accumulate
        self.current_text += content
    def endElement(self, name):
        if name in ('title', 'author', 'price'):
            self.current_book[name] = self.current_text.strip()
        elif name == 'book':
            self.books.append(self.current_book.copy())
            self.current_book = {}
handler = BookHandler()
xml.sax.parse('large_library.xml', handler)
print(f"Parsed {len(handler.books)} books")
// Node.js SAX streaming with 'sax' package
// npm install sax @types/sax
import sax from 'sax';
import { createReadStream } from 'fs';
interface Book {
id: string;
title: string;
author: string;
price: number;
}
function streamParseBooks(filePath: string): Promise<Book[]> {
return new Promise((resolve, reject) => {
const books: Book[] = [];
let currentBook: Partial<Book> = {};
let currentElement = '';
let currentText = '';
const parser = sax.createStream(true, { lowercase: false });
parser.on('opentag', (node) => {
currentElement = node.name;
currentText = '';
if (node.name === 'book') {
currentBook = { id: node.attributes['id'] as string };
}
});
parser.on('text', (text) => {
currentText += text;
});
parser.on('closetag', (tagName) => {
const text = currentText.trim();
if (tagName === 'title') currentBook.title = text;
else if (tagName === 'author') currentBook.author = text;
else if (tagName === 'price') currentBook.price = parseFloat(text);
else if (tagName === 'book') {
books.push(currentBook as Book);
currentBook = {};
}
currentText = '';
});
parser.on('error', reject);
parser.on('end', () => resolve(books));
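// pipe() does not forward read-stream errors (e.g. a missing file), so reject on them explicitly
createReadStream(filePath).on('error', reject).pipe(parser);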
});
}
// Usage (top-level await requires an ES module)
const books = await streamParseBooks('library.xml');
console.log(`Parsed ${books.length} books from large file`);
Real-World XML Formats: RSS, SOAP, Maven, Android, Office
RSS 2.0 Feed
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Dev Blog</title>
<link>https://example.com</link>
<description>Latest developer articles</description>
<language>en-us</language>
<lastBuildDate>Fri, 27 Feb 2026 00:00:00 +0000</lastBuildDate>
<atom:link href="https://example.com/rss.xml" rel="self" type="application/rss+xml"/>
<item>
<title>XML Formatting Guide 2026</title>
<link>https://example.com/xml-guide</link>
<description><![CDATA[Complete guide to XML formatting and validation.]]></description>
<pubDate>Fri, 27 Feb 2026 12:00:00 +0000</pubDate>
<guid isPermaLink="true">https://example.com/xml-guide</guid>
<category>XML</category>
</item>
</channel>
</rss>
Maven pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example</groupId>
<artifactId>my-app</artifactId>
<version>1.0.0</version>
<packaging>jar</packaging>
<properties>
<maven.compiler.source>21</maven.compiler.source>
<maven.compiler.target>21</maven.compiler.target>
<spring.boot.version>3.3.0</spring.boot.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<version>${spring.boot.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.2</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<version>${spring.boot.version}</version>
</plugin>
</plugins>
</build>
</project>
Android Layout XML (res/layout/activity_main.xml)
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout
xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:orientation="vertical"
android:padding="16dp">
<TextView
android:id="@+id/title"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="@string/app_title"
android:textSize="24sp"
android:textStyle="bold"/>
<EditText
android:id="@+id/input"
android:layout_width="match_parent"
android:layout_height="wrap_content"
android:hint="@string/enter_text"
android:inputType="text"/>
<Button
android:id="@+id/submit"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="@string/submit"
app:backgroundTint="@color/primary"/>
</LinearLayout>
SOAP Request and Response
<!-- SOAP 1.1 Request -->
POST /StockPrice HTTP/1.1
Host: www.example.com
Content-Type: text/xml; charset=utf-8
SOAPAction: "GetStockPrice"

<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:m="http://www.example.com/stock/">
<soap:Header>
<m:AuthToken>Bearer eyJhbGciOiJSUzI1NiJ9...</m:AuthToken>
</soap:Header>
<soap:Body>
<m:GetStockPrice>
<m:StockName>IBM</m:StockName>
<m:Currency>USD</m:Currency>
</m:GetStockPrice>
</soap:Body>
</soap:Envelope>
<!-- SOAP Response -->
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<m:GetStockPriceResponse xmlns:m="http://www.example.com/stock/">
<m:Price>175.43</m:Price>
<m:Currency>USD</m:Currency>
<m:Timestamp>2026-02-27T12:00:00Z</m:Timestamp>
</m:GetStockPriceResponse>
</soap:Body>
</soap:Envelope>
<!-- SOAP Fault (error response) -->
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<soap:Fault>
<faultcode>soap:Client</faultcode>
<faultstring>Invalid stock symbol</faultstring>
<detail>
<m:StockError xmlns:m="http://www.example.com/stock/">
<m:ErrorCode>INVALID_SYMBOL</m:ErrorCode>
</m:StockError>
</detail>
</soap:Fault>
</soap:Body>
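</soap:Envelope>
Java: XML Validation with JAXB and javax.xml
Java's standard library covers most XML work: DocumentBuilder parses XML, javax.xml.validation.Validator validates documents against XSD schemas, Transformer pretty-prints them, and javax.xml.xpath runs XPath queries. JAXB, which marshals between Java objects and XML, was removed from the JDK in Java 11 and now ships separately as Jakarta XML Binding, so the example below sticks to javax.xml.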
import javax.xml.parsers.*;
import javax.xml.validation.*;
import javax.xml.transform.*;
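import javax.xml.transform.stream.*;
import javax.xml.transform.dom.DOMSource; // DOMSource lives in javax.xml.transform.dom, not javax.xml.transform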
import org.w3c.dom.*;
import java.io.*;
// Parse and validate XML against XSD in Java
public class XmlValidator {
public static Document parseXml(String xmlFilePath) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // Required for namespace processing
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new File(xmlFilePath));
}
public static boolean validateAgainstXsd(String xmlFilePath, String xsdFilePath) {
try {
SchemaFactory schemaFactory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = schemaFactory.newSchema(new File(xsdFilePath));
Validator validator = schema.newValidator();
// Collect all errors instead of throwing on first error
final java.util.List<String> errors = new java.util.ArrayList<>();
validator.setErrorHandler(new org.xml.sax.ErrorHandler() {
public void warning(org.xml.sax.SAXParseException e) {
errors.add("Warning: " + e.getMessage());
}
public void error(org.xml.sax.SAXParseException e) {
errors.add("Error at line " + e.getLineNumber() + ": " + e.getMessage());
}
public void fatalError(org.xml.sax.SAXParseException e) {
errors.add("Fatal: " + e.getMessage());
}
});
validator.validate(new StreamSource(new File(xmlFilePath)));
if (errors.isEmpty()) {
System.out.println("XML is valid!");
return true;
} else {
errors.forEach(System.err::println);
return false;
}
} catch (Exception e) {
System.err.println("Validation failed: " + e.getMessage());
return false;
}
}
public static void prettyPrint(Document doc) throws Exception {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
StringWriter writer = new StringWriter();
transformer.transform(
new DOMSource(doc),
new StreamResult(writer)
);
System.out.println(writer.toString());
}
// XPath queries in Java
public static String xpathQuery(Document doc, String expression) throws Exception {
javax.xml.xpath.XPathFactory xpathFactory =
javax.xml.xpath.XPathFactory.newInstance();
javax.xml.xpath.XPath xpath = xpathFactory.newXPath();
return xpath.evaluate(expression, doc);
}
public static void main(String[] args) throws Exception {
Document doc = parseXml("library.xml");
boolean valid = validateAgainstXsd("library.xml", "library.xsd");
prettyPrint(doc);
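// XPath query: library.xml declares a default namespace and the parser is namespace-aware,
// so a bare //book/title matches nothing; local-name() ignores the namespace
String firstTitle = xpathQuery(doc, "(//*[local-name()='book'])[1]/*[local-name()='title']");
System.out.println("First book: " + firstTitle);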
}
}
Key Takeaways
- XML is strict: every tag must be closed, attribute values must be quoted, and a malformed document is a fatal error rather than something the parser silently repairs. HTML parsers are lenient; XML parsers are not.
- Use DOMParser + XMLSerializer for XML parsing in browsers. XMLSerializer does not indent its output, so pretty-printing needs custom indentation logic.
- fast-xml-parser is a popular, lightweight Node.js library for parsing and building XML. Use XMLValidator.validate() to check well-formedness before parsing.
- Python 3.9+ added ET.indent() to ElementTree, making pretty-printing simple without minidom. Use lxml for XPath, XSD validation, and XSLT.
- XSD (XML Schema) is the industry standard for XML validation: it checks data types, patterns, cardinality, and namespaces. DTD is legacy; RELAX NG is simpler than XSD.
- XPath expressions select nodes using path syntax. Use XPathEvaluator (document.evaluate()) in browsers and element.xpath() in Python lxml. Always map namespace prefixes to their URIs when querying namespaced XML.
- For large XML files, use streaming parsers: Python iterparse() with element.clear(), Python xml.sax, or the Node.js sax stream. DOM parsers load the entire document tree into memory.
- CDATA sections (<![CDATA[ ... ]]>) embed raw text containing special characters. Use them for code, SQL, or HTML inside XML so you do not have to escape every & and <; the only sequence a CDATA section cannot contain is ]]>.
- XML namespaces prevent element name conflicts. Always pass the namespace URI when processing namespaced XML: querySelector() cannot select by namespace prefix (soap:Body, for example), so use getElementsByTagNameNS() instead.
- Real-world XML: RSS/Atom feeds, SOAP web services, Maven pom.xml builds, Android layouts, and Office Open XML documents (.docx and .xlsx files are ZIP archives of XML parts) all rely on XML.
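The namespace and streaming takeaways interact in practice. ElementTree reports namespaced tags in Clark notation ({uri}local), so a bare lookup such as .//book silently matches nothing on a document that declares a default namespace, as the library example does. The sketch below assumes a trimmed, hypothetical copy of that file (LIBRARY_XML, with a made-up second book). It shows a prefix-mapped findall() query plus an iterparse() loop that compares fully qualified tags. The loop clears the root after each book rather than only the book element. This common variant also releases the emptied elements the root would otherwise keep. The names titles and stream_cheap_books are illustrative, not part of any library.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of the library.xml example. The root declares a
# default namespace, so every element below it lives in that namespace.
# The second <book> is made up for illustration.
LIBRARY_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<library xmlns="http://example.com/library">
  <book id="978-0-13-110362-7">
    <title lang="en">The C Programming Language</title>
    <price currency="USD">45.99</price>
  </book>
  <book id="example-2">
    <title lang="en">Example Title</title>
    <price currency="USD">54.99</price>
  </book>
</library>"""

NS = {"lib": "http://example.com/library"}  # prefix -> namespace URI map for find/findall
LIB = "{http://example.com/library}"        # Clark-notation prefix that appears in elem.tag


def titles(data: bytes) -> list:
    """Query a namespaced tree by mapping a prefix to the namespace URI."""
    root = ET.fromstring(data)
    return [t.text for t in root.findall(".//lib:book/lib:title", NS)]


def stream_cheap_books(source, max_price: float) -> list:
    """Stream <book> elements with iterparse, matching fully qualified tags."""
    books = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == LIB + "book":
            price = float(elem.findtext(LIB + "price", default="0"))
            if price <= max_price:
                books.append({
                    "id": elem.get("id"),
                    "title": elem.findtext(LIB + "title"),
                    "price": price,
                })
            root.clear()  # detach processed books from the root to cap memory use
    return books


# A bare ".//book" finds nothing, because the tags are really "{uri}book".
print(ET.fromstring(LIBRARY_XML).findall(".//book"))  # []
print(titles(LIBRARY_XML))  # ['The C Programming Language', 'Example Title']
print(stream_cheap_books(io.BytesIO(LIBRARY_XML), max_price=50))
```

With lxml, the same prefix map can be passed to XPath, for example root.xpath('//lib:book/lib:title', namespaces=NS).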