At PH-Neutral, I recently presented a bunch of information about how no two PDF readers will see a PDF file in the same way, which is useful if you’re trying to sneak an exploit past a smart A/V scanner. [Unfortunately, most A/V scanners are not even smart enough to find an exploit sitting in easy-to-read plaintext at the top of a well-formed file.]
Someone took a picture of one of my slides, which has been quite popular, based upon the number of retweets and views.
So, I’ll explain how this works, for the benefit of everyone who wasn’t there at the time…
How Object References Work in PDF
You can take any other PDF data type and give it a number by wrapping it in “obj” and “endobj”. Then later on, when you want to use that chunk of data, you can reference it, by number, with the “R” operator. (See Figure 1.)
These two examples are equivalent to Acrobat…
2 0 obj (Hello World) endobj 3 0 obj << /Example 2 0 R >> endobj
3 0 obj << /Example (Hello World) >> endobj
Acrobat, and I assume most other PDF readers, expands references at the time of use, not at the time of parsing. So if you define object number one to be the number two, “2”, and then try to use it like this: “1 0 R 0 R”, it doesn’t work. That is not equivalent to “2 0 R”. If anyone knows of a PDF reader that actually does this, that would be really neat, because that parser would unintentionally make PDF equivalent to Lisp.
Think about it, you could write Ω = (λx. x x) (λx. x x) as “1 0 obj 1 0 R 1 0 R endobj”.
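The expand-at-time-of-use behaviour described above can be modelled in a few lines of Python. This is only a toy sketch, not how Acrobat is implemented: the `Ref` class, the `resolve` function and the dictionary-based object table are all names made up for illustration, and PDF strings and names are represented as plain Python strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference such as "2 0 R"."""
    num: int
    gen: int = 0


def resolve(value, objects, _active=frozenset()):
    """Expand indirect references when a value is used, not when it is stored."""
    if isinstance(value, Ref):
        if value in _active:
            # An object that (directly or indirectly) contains itself.
            raise ValueError(f"cyclic reference: {value}")
        return resolve(objects[value], objects, _active | {value})
    if isinstance(value, dict):
        return {key: resolve(item, objects, _active) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, objects, _active) for item in value]
    return value


# "2 0 obj (Hello World) endobj" and "3 0 obj << /Example 2 0 R >> endobj"
objects = {
    Ref(2): "Hello World",
    Ref(3): {"/Example": Ref(2)},
}

expanded = resolve(Ref(3), objects)
print(expanded)
```

In this model the object table keeps `Ref(2)` unexpanded until object 3 is actually used, and the result matches the dictionary with the string written inline. A self-referential object in the style of the Ω example is rejected by the cycle check rather than looping forever.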
What you can leave out:
- All Page data
- All Whitespace, except for End-Of-Line after comments
- The version number part of %PDF-1.1
- The %%EOF
- The xref table
- And thus also startxref
- Most Object /Types
What you still need:
- %PDF-anything, but if the file is too confusing for Acrobat, you need at least the first number. Like %PDF-1.
- A trailer with a /Root dictionary for the Catalog
- A /Pages dictionary, but this can be empty, just as long as it’s a dictionary type.
- An /OpenAction if you want to launch your Javascript upon file open.
- The Javascript Action.
Using Didier Stevens’s well-formed PDF example as a starting point…
%PDF-1.1
1 0 obj
<<
 /Type /Catalog
 /Outlines 2 0 R
 /Pages 3 0 R
 /OpenAction 7 0 R
>>
endobj
2 0 obj
<<
 /Type /Outlines
 /Count 0
>>
endobj
3 0 obj
<<
 /Type /Pages
 /Kids [4 0 R]
 /Count 1
>>
endobj
4 0 obj
<<
 /Type /Page
 /Parent 3 0 R
 /MediaBox [0 0 612 792]
 /Contents 5 0 R
 /Resources <<
  /ProcSet [/PDF /Text]
  /Font << /F1 6 0 R >>
 >>
>>
endobj
5 0 obj
<< /Length 56 >>
stream
BT /F1 12 Tf 100 700 Td 15 TL (JavaScript example) Tj ET
endstream
endobj
6 0 obj
<<
 /Type /Font
 /Subtype /Type1
 /Name /F1
 /BaseFont /Helvetica
 /Encoding /MacRomanEncoding
>>
endobj
7 0 obj
<<
 /Type /Action
 /S /JavaScript
 /JS (app.alert({cMsg: 'Hello from PDF JavaScript', cTitle: 'Testing PDF JavaScript', nIcon: 3});)
>>
endobj
xref
0 8
0000000000 65535 f
0000000010 00000 n
0000000098 00000 n
0000000147 00000 n
0000000208 00000 n
0000000400 00000 n
0000000507 00000 n
0000000621 00000 n
trailer
<<
 /Size 8
 /Root 1 0 R
>>
startxref
773
%%EOF
And remember, all object references can be replaced by their contents. And so we get…
Note: I’ve only tested this in Acrobat v9.1.3. From what I’ve been told, Acrobat v8 will throw an error on this file.
%PDF-1.
trailer<</Root<</Pages<<>>/OpenAction<</S/JavaScript/JS(app.alert({cMsg:'Stuff Goes Here'});)>>>>>>
There are only 71 bytes required, aside from the Javascript code. With the improvements below, it drops to only 58 bytes required.
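As a quick sanity check on that byte count, here is a minimal Python sketch that assembles the stripped-down file around an arbitrary script and measures the overhead with an empty one. The function name `minimal_pdf` is made up for illustration, and the sketch assumes the `%PDF-1.` line ends with a single newline byte.

```python
def minimal_pdf(js: bytes = b"") -> bytes:
    """Assemble the stripped-down file shown above around a JavaScript payload.

    Assumption: the payload contains no unbalanced parentheses, which would
    need escaping inside a PDF literal string.
    """
    header = b"%PDF-1.\n"
    body = (
        b"trailer<</Root<</Pages<<>>"
        b"/OpenAction<</S/JavaScript/JS(" + js + b")>>>>>>"
    )
    return header + body


overhead = len(minimal_pdf())
print(overhead)  # 71

sample = minimal_pdf(b"app.alert({cMsg:'Stuff Goes Here'});")
print(len(sample) - overhead)  # length of the script alone
```

Writing `sample` to a file in binary mode gives the same bytes as the two-line listing above.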
Without any pages, Acrobat kinda sits there for a while going duh; a scroll or a click event will make it realize there’s an error.
An /OpenAction gets to happen before the above error, but if you use the “Will Close” (/WC) Action mentioned below, the Javascript execution happens after this error.
Tavis Ormandy pointed out that you can terminate the “%PDF-” with a NULL “\0” byte, which saves two bytes (compared to “1.\n”).
Ryan MacArthur pointed out that you can use a “Will Close” “Additional Action” rather than an “OpenAction”, which saves quite a few bytes, but with a null page object, Acrobat won’t actually perform the Will Close action until some other action is performed by the user, such as clicking or scrolling on the non-existent page. Immediately closing or quitting after opening the document won’t trigger the “Will Close” action.
00000000  25 50 44 46 2d 00 74 72 61 69 6c 65 72 3c 3c 2f  |%PDF-.trailer<</|
00000010  52 6f 6f 74 3c 3c 2f 50 61 67 65 73 3c 3c 3e 3e  |Root<</Pages<<>>|
00000020  2f 41 41 3c 3c 2f 57 43 3c 3c 2f 53 2f 2f 4a 53  |/AA<</WC<</S//JS|
00000030  28 29 3e 3e 3e 3e 3e 3e 3e 3e                    |()>>>>>>>>|
0000003a
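Because this version contains a NULL byte, it is easier to rebuild from the hex dump than to type by hand. The following Python sketch converts the dump above back into bytes and confirms the length; the helper name `hexdump_to_bytes` is hypothetical, and the parser only assumes the offset, hex bytes, ASCII column layout shown here.

```python
HEXDUMP = """\
00000000  25 50 44 46 2d 00 74 72 61 69 6c 65 72 3c 3c 2f  |%PDF-.trailer<</|
00000010  52 6f 6f 74 3c 3c 2f 50 61 67 65 73 3c 3c 3e 3e  |Root<</Pages<<>>|
00000020  2f 41 41 3c 3c 2f 57 43 3c 3c 2f 53 2f 2f 4a 53  |/AA<</WC<</S//JS|
00000030  28 29 3e 3e 3e 3e 3e 3e 3e 3e                    |()>>>>>>>>|
0000003a
"""


def hexdump_to_bytes(text: str) -> bytes:
    """Rebuild raw bytes from a dump laid out as: offset, hex bytes, |ascii|."""
    out = bytearray()
    for line in text.splitlines():
        # Drop the ASCII column, then skip the leading offset field.
        fields = line.split("|")[0].split()
        out.extend(int(field, 16) for field in fields[1:])
    return bytes(out)


data = hexdump_to_bytes(HEXDUMP)
print(len(data))  # 58
print(data)
```

The final `0000003a` line carries only the end offset, so it contributes no bytes; 0x3a is 58, which matches the length of `data`.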
Julia Wolf @ FireEye Malware Intelligence Lab
Questions/Comments to research [@] fireeye [.] com







evince in Debian fails to read the 71 byte version:
Error: PDF file is damaged - attempting to reconstruct xref table…
Error: Couldn’t find trailer dictionary
Error: Couldn’t read xref table
Error: PDF file is damaged - attempting to reconstruct xref table…
Error: Couldn’t find trailer dictionary
Error: Couldn’t read xref table
Same for okular on gentoo.
Error: PDF file is damaged - attempting to reconstruct xref table…
Error: Couldn’t find trailer dictionary
Error: Couldn’t read xref table
Yeah, I expect that *everything* except for Adobe Acrobat 9.1.3 is going to error out. This PDF file is way-way-way out of spec. Older versions of Acrobat won’t even read this.
ePDFViewer on Ubuntu shows an “encrypted document. Enter password” popup.
Do you want to make a post / analysis about this http://seclists.org/fulldisclosure/2010/Jul/7 case?
> Do you want to make a post / analysis about this http://seclists.org/fulldisclosure/2010/Jul/7 case?
I suppose I could, though it’s not terribly new or interesting. There is a ton of this sort of activity every day, and has been for years. This particular spam campaign was also pretending to be from wordpress.com, also saying that you’d just signed up for an account, with all links leading to that infectious PDF. (Which used one of three possible exploits, all at least a year old.)
Using the message attached to that FD post, this particular piece of spam was sent from 202.133.62.5. Observe:
[...]
> Received: from TUHWJATY (unknown [202.133.62.5])
> by stg.iki.fi (Postfix) with ESMTP id 26EC819D5C
> for ; Thu, 1 Jul 2010 13:25:28 +0300 (EEST)
> Received: from 202.133.62.5 (port=0267 helo=[swaraj])
> by mail.ragoarts.com with asmtp
> id 981EFE-000841-91
> for ______hack.fi; Thu, 1 Jul 2010 15:56:02 +0530
> Someone from the IP address 202.133.62.5 has registered the account “fgeek” with [...]
The IP address of the spam drone is included in the body text, as well as the recipient username. Every single message is like this, just with the corresponding values filled in.
If it wasn’t 2:30am, and I didn’t have something else to be finishing right now, I’d probably look up the name of whichever particular spam bot this is.
(I’ve redacted the email address of the recipient, just to avoid that much more spam sent to them.)
I think you’ll find this illuminating:
http://www.google.com/search?q=http%3A%2F%2Fchipsnchils.com%2Fwordpress.html