XML External Entity (XXE) Attack

This article covers the XML External Entity (XXE) attack. The attack is only possible when the application has logic for parsing XML input.

The injection succeeds when the XML parser is weakly configured. In a successful attack, the attacker can view files on the application server and interact with the backend systems the application can reach. An XXE vulnerability can also be used to perform server-side request forgery (SSRF) attacks, denial of service (DoS) attacks such as Billion Laughs, and more.

What are XXE types?

There is no strict classification of XXE attacks, but we can divide them into two types: in-band and out-of-band (blind).

· In-band attacks are more common than out-of-band ones. In this case, the attacker receives an immediate response to the XXE payload.

· Out-of-band attacks, also called blind XXE, produce no immediate response. This type involves creating an external Document Type Definition (DTD), and the XML parser must make an additional request to an attacker-controlled server.

When can an attacker execute this injection?

· In old applications that use a SOAP version earlier than 1.2.

· In applications that log users in through SAML, a single sign-on (SSO) login standard. The chance of an attack is high here because SAML uses XML for identity assertions.

· When XML input or XML uploads from untrusted sources are inserted into XML documents and then parsed by an XML processor.

· When Document Type Definition (DTD) processing is enabled, which carries a high risk.
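Since DTD processing is the riskiest item on this list, one common mitigation is to refuse any untrusted document that carries a DOCTYPE at all. The sketch below shows that idea with Python's standard-library expat bindings. The names DoctypeForbidden and parse_without_dtd are my own, and this is only one possible approach, not a complete defense.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def parse_without_dtd(xml_text):
    """Parse xml_text and return the element names seen, in document order.

    Parsing is aborted as soon as a DOCTYPE declaration is reported,
    so no entity declared in it is ever used.
    """
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat reports <!DOCTYPE ...> through this callback.
        raise DoctypeForbidden("DOCTYPE is not allowed in untrusted XML")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names
```

A document without a DOCTYPE parses normally, while one that declares an entity is rejected before the entity can be referenced.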

When would an application parse XML?

XML is often used in both frontend and backend web development.

Examples:

The frontend of an application can, for example, request an XML file from an API and build a UI form from the data in it. The user may then add a new field to the form and save the change, at which point the XML input is added to the XML document.

On the backend, XML is used to transfer data in a standard format. In mobile development, Android applications also use it to define layouts and store configuration.

On the OWASP site, you can find more examples of XXE attacks. PortSwigger has a nicely explained example of this attack:

For example, suppose a shopping application checks for the stock level of a product by submitting the following XML to the server:

The application performs no particular defenses against XXE attacks, so you can exploit the XXE vulnerability to retrieve the /etc/passwd file by submitting the following XXE payload:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<stockCheck><productId>&xxe;</productId></stockCheck>

This XXE payload defines an external entity &xxe; whose value is the contents of the /etc/passwd file and uses the entity within the productId value. This causes the application’s response to include the contents of the file:

Invalid product ID:

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
…
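For contrast, here is a minimal sketch of how a stock-check handler might read the product ID defensively. It assumes Python's xml.etree.ElementTree, which according to the Python documentation does not expand external entities and raises ParseError when one occurs. The function name read_product_id is hypothetical, and treating the ID as digits only is my assumption.

```python
import re
import xml.etree.ElementTree as ET


def read_product_id(xml_text):
    """Return the numeric product ID, or None if the request is not acceptable."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Malformed XML, or an entity reference the parser refuses to resolve.
        return None
    if root.tag != "stockCheck":
        return None
    value = (root.findtext("productId") or "").strip()
    # Accept ASCII digits only (assumed ID format), so nothing unexpected
    # is ever echoed back in an error message.
    if not re.fullmatch(r"[0-9]+", value):
        return None
    return int(value)
```

With this approach the payload above yields no product ID at all, and the error response has no file content to leak.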

List of preventions for XXE

· Use JSON instead of XML where possible, and avoid serializing sensitive data
· As mentioned earlier, this attack can happen easily when the application uses SOAP earlier than 1.2, so upgrade to SOAP 1.2 or later
· Implement XSD validation (“XML Schemas”) in your application for all XML file inputs
· Patch or upgrade all XML libraries
· Use SAST tools to check for XXE vulnerabilities

How to prevent attacks if you are using SAML?

The SAML language is used to construct authorization statements whose authenticity is protected by an XML digital signature applied over the statements.

Many attacks happen because of wrong assumptions made by developers, for example that the token is always properly formed XML that complies with the SAML schema.

Developers may assume that a SAML document contains just one Assertion tag (as a properly formed one would). Based on that assumption, they validate only the first element they get when searching the XML document for elements by tag name.

To get the list of nodes, the DOM “getElementsByTagName” method can be used (shown here in Java):

NodeList xmlNodes = doc.getElementsByTagName("saml:Assertion");

xmlNodes is assigned the list of elements in the document whose tag name is “saml:Assertion”.

Because developers assume a properly formed SAML document with a single Assertion tag, they take the first element and validate only that one:

Element firstElement = (Element) xmlNodes.item(0);

*As you can guess, this is not a proper way to validate the tag, because the attacker can also assume that developers used this approach. In this case, the attacker can place a malicious assertion before the original one so that it becomes the first element, and the substitution is never detected.

By the same logic, some developers use “getElementsByTagNameNS”, but the result is the same: a malicious assertion is easily inserted as the first element.
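One way to avoid the first-match assumption is to count the assertions and refuse the document unless there is exactly one, in the expected place. The sketch below illustrates this in Python. It assumes the SAML 2.0 assertion namespace and an Assertion that is a direct child of the root element, and the function name single_assertion is my own. It does not replace signature validation: the element returned must still be the one the signature covers.

```python
import xml.etree.ElementTree as ET

# Assumption: SAML 2.0 assertion namespace.
SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
ASSERTION_TAG = "{%s}Assertion" % SAML_NS


def single_assertion(response_xml):
    """Return the only Assertion in the response, or raise ValueError."""
    root = ET.fromstring(response_xml)
    # Every Assertion anywhere in the tree, however deeply it is nested.
    everywhere = list(root.iter(ASSERTION_TAG))
    # Assertions that are direct children of the root element.
    top_level = root.findall(ASSERTION_TAG)
    if len(everywhere) != 1 or len(top_level) != 1:
        raise ValueError(
            "expected exactly one top-level Assertion, found %d in total"
            % len(everywhere)
        )
    return top_level[0]
```

A response with an extra assertion injected ahead of the original is rejected outright instead of having its first match silently accepted.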

Proper prevention would be:

· Parse the XML document and validate its structure against the supplied schema. Never allow schemas to be downloaded automatically from third parties; use local trusted copies instead. Where possible, also inspect the schemas and harden them, for example by disabling wildcard types or relaxed processing statements.

· Validate the digital signature, which verifies the authenticity and integrity of the assertion embedded in the SAML document. This prevents forgery.

**The most important thing when writing a schema is to describe the intended document’s structure precisely.

How to prevent using XSD validation?

I will explain how to create a C# solution to validate XML data.

The most important reason to use XSD (XML Schema Definition) validation is that we want the sender and receiver to have the same “expectations” about the content. With schemas, we describe the data exactly, so both parties are clear about it.

Steps:

· Add an XML file to the code

When you add an XML file, you will see just the XML declaration:

<?xml version="1.0" encoding="utf-8" ?>

I will add a User object with the properties FirstName, LastName and Address, so the XML file would look like this:

· Create an XML schema for this file

You will get an XML schema structure like this:

· Modify the XSD

Now you can modify the file and add validations for FirstName and Address. Here I only show how to add validations for these fields. They will, of course, not prevent the attack by themselves; they only validate the length and type of the mentioned fields.

· Validate the XML using the XSD

What am I doing in the code?

· Getting the local path of the assembly, so that I can append the XML and XSD file names to get their full paths
· Creating the schema using XmlSchemaSet and XmlSeverityType, which are from System.Xml.Schema
· Using XmlReader from System.Xml so I can create an XDocument, imported from System.Xml.Linq
· Once the document is created, calling its Validate method and passing the schema to validate against together with the ValidationEventHandler method (my own name for it), which throws an exception if the severity type is Error. This method is where you should add all validation logic.

This is just an example of how to create an XSD for an XML file and which libraries you can use for the validation.
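To make the flow of those steps concrete outside C#, here is a rough Python sketch. Python's standard library has no XSD validator, so this is not XSD validation: hand-written rules stand in for the schema, and only the shape of the process is the same (declare the expectations, walk the document, report each violation to a handler that throws on errors). The User, FirstName, LastName and Address names come from the example above, while the length limits and the function names are assumptions of mine.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for XSD facets: field name -> maximum length.
RULES = {"FirstName": 20, "LastName": 30, "Address": 50}


def validation_event_handler(severity, message):
    """Counterpart of the handler described above: errors stop processing."""
    if severity == "error":
        raise ValueError(message)


def validate_user(xml_text, handler=validation_event_handler):
    """Check a User document against RULES, reporting problems to handler."""
    root = ET.fromstring(xml_text)
    if root.tag != "User":
        handler("error", "root element must be User")
    seen = [child.tag for child in root]
    if seen != list(RULES):
        handler("error", "expected children %s, got %s" % (list(RULES), seen))
    for child in root:
        limit = RULES.get(child.tag)
        if limit is None:
            continue  # unknown element, already reported above
        text = (child.text or "").strip()
        if not text or len(text) > limit:
            handler("error", "%s is empty or longer than %d" % (child.tag, limit))
    return True
```

As with the XSD rules, these checks only constrain the structure and length of the fields. They do not stop entity-based attacks, which depend on how the parser itself is configured.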

How to prevent using DTD validation?

We can also validate an XML file using a DTD, which differs from XSD in several ways.

In this example, I am validating an XML file against a DTD file using DtdProcessing.

Steps:

· Setting the validation settings using XmlReaderSettings
· Creating the XmlReader object so I can parse the file using the Read() method
· Creating the ValidationEventHandler method, which throws an exception if the severity type is Error. This method is where you should add all validation logic.
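The same three steps (configure the reader, create it, attach a handler that throws on errors) can be sketched with Python's xml.sax module. Note the swap: the expat-based SAX reader does not validate against a DTD, so this sketch only shows a reader configured not to load external entities plus an error handler that raises. The class and function names are my own.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    ErrorHandler,
    feature_external_ges,
    feature_external_pes,
)


class RaiseOnError(ErrorHandler):
    """Error handler that turns every reported error into an exception."""

    def error(self, exception):
        raise exception

    def fatalError(self, exception):
        raise exception

    def warning(self, exception):
        pass  # warnings are ignored in this sketch


class TextCollector(ContentHandler):
    """Collects all character data found in the document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def read_text(xml_text):
    # Step 1: settings. Do not load external general or parameter entities.
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    # Steps 2 and 3: create the reader's handlers and parse.
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.setErrorHandler(RaiseOnError())
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return "".join(collector.chunks)
```

With external entities switched off, a reference such as &xxe; contributes no text to the result, and malformed input surfaces as a SAXParseException.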

List of SAST tools

SAST tools help you perform static application security testing.

SAST tools can be free, commercial, or open source.

Some of the most popular SAST tools currently are:

Veracode
LGTM
Checkmarx
Klocwork
Reshift
SpectralOps
HCL AppScan
Codacy
Insider CLI
Argon

Why is SOAP earlier than 1.2 vulnerable to XXE attacks, and why should you use later versions?

Before version 1.2, external entities were allowed within SOAP messages.

Since version 1.2 some changes were introduced to the envelope and encoding schemas. Both schemas have been updated to be compliant with the XML Schema Recommendation.

You can see the list of recommendations which were used:

· http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/

· http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/

· http://www.w3.org/TR/1999/REC-xml-names-19990114

· http://www.w3.org/TR/2000/REC-xml-20001006

· http://www.w3.org/TR/2000/PR-xlink-20001220/

Additional changes in this version affected the names of datatypes in the XML Schema specification, and some datatypes were removed. If you want to check out all the changes that were made, you can go to this site.

Conclusion

This article presented some prevention steps that can help you defend your application against XXE attacks.

The OWASP team, which constantly works to discover new ways attackers can exploit your application and perform malicious actions, keeps its Prevention Cheat Sheet up to date.

The best way to secure your application is to stay up to date with the latest prevention methods: the best libraries to use, the best detection tools, and so on.

In the end, secure code is the cheapest code!


2022 vRx
