FireEye Malware Intelligence Lab:Do AntiVirus Products Detect Bots?

Less than half the time in practice.

At least if you believe the results of a quick study I just spent the last few days doing.

Let me first introduce myself, since I'm hoping to now devote a fraction of my effort to blogging here to augment Atif and the rest of the reseach team's efforts over the last year. I'm Stuart Staniford, the Chief Scientist here at FireEye. Prior to FireEye, I have a long history in the intrusion detection field, starting in the research arena at UC Davis back in 1994, and then doing a variety of research projects with a small government contractor I started called Silicon Defense (you can see my research here). I've been working for various Silicon Valley security companies (mainly as a consultant) since 2004. I've been working with FireEye for about three years in total, initially as an outside consultant, and joined full time in Jan 2008 when I became very excited about the potential of the technology I was involved in creating here to detect malicious websites. That formed a core part of our 4.0 product version which has just been released, thus freeing me up to do a little blogging.

Now to the antivirus study - here's what I did. FireEye has a number of appliances where the customer allows us access to the malware the box discovers in operation. I took a couple of the boxes that have been running the 4.0 software (beta and/or production versions). This is the software capable of detecting web infections as they occur. It inspects all objects being carried by HTTP, parses them rapidly, and then uses a variety of statistical anomaly techniques to find "strange" looking objects, relative to the normal syntax for that kind of object. It takes these strange objects and tries them in a browser inside FireEye's heavily instrumented virtual machine on the appliance (running an actual browser in an actual operating system). If bad things happen to the virtual machine (execution of inappropriate parts of memory, registry changes, extra processes, etc, etc ad nauseum) then we know for sure that the object is malicious. By this means, we believe we are able to detect in excess of 90% of drive-by downloads with few false positives (basically, false positives only arise when we have a bug in the product - in theory we should have none and we fix them when we find them).

One of the side effects of the virtual machine analysis is that we capture a number of Windows binaries. The typical scenario for a web-driven bot is that you accidentally brush up against a compromised website that has had an <iframe> inserted which brings you (possibly via a chain of other sites) into contact with an exploit server which delivers you some malicious javascript (usually) that exploits your browser to take control of the machine. At that point, the payload will download a number of binaries (sometimes just one, but often more) which perform the various bot functions. If we find a virtual machine going out and trying to get a file, and then executing it as a binary, it's pretty much an open and shut case, and those binaries are obviously of some interest.

I took a pretty-much-random sample of 217 of these binaries which were originally found by our appliances between the last week of September and yesterday. Over the course of the last three days (11/17-11/19 Pacific time) I (manually) uploaded each of these 217 binaries to VirusTotal which runs 36 current antivirus programs on it. In most cases, VirusTotal had already seen that binary (it had the same MD5 hash as a binary someone else had uploaded). In that case, it gave me a page that looked something like this:

As you can see, this records the first time VirusTotal saw that MD5, the most recent time someone chose to run the analysis (with the 36 AV programs) on a binary with that MD5, and the score at that time (how many of the 36 programs thought the file was bad - in this case only 4/36 detected this particular binary).

I used two main pieces of information from this analysis. The first piece was the initial time that the file was first submitted to VirusTotal (which I take to be some kind of proxy for "when did this particular MD5 become known to the security community"). The second piece was the detection rate - what fraction of the 36 AV products covered by VirusTotal where successful in detecting this binary. If this was most recently assessed within the period when I was doing my submissions (17th Nov-19th Nov), I used the existing assessment. If not, I asked VirusTotal to rescan the file. Wherever VirusTotal had not seen this particular MD5 before, that also triggered a scan. An example of part of the scan results for a given file looks like this:

The metric I used here was just the fraction of products that detected a particular file. In some sense, this represents the chance that a randomly picked antivirus product would detect this particular file.

So let's look at some results. This first graph plots a scattergram for two dates. The date on the x-axis is the file creation date for this particular sample in our central repository at FireEye. The date on the y-axis is the date at which VirusTotal recorded the first submission of a binary with that particular MD5. The straight line is where a particular sample would lie if it was first tested at VirusTotal at the exact same time we added it to our repository after an appliance captured it.

Ok, so what the appliances are doing is essentially sampling the binaries that are out there on the malweb being delivered to victim machines at any given time. So as you can see, the points mostly cluster along the line. What this shows is that mostly VirusTotal hears about a given MD5 about the same time (give or take a few days or weeks) as a FireEye appliance captured it. It's rarer for us to see executables that have been very long known to the security community (which would show up deep in the lower right of the graph). It's also less common (but not unknown) for us to find executables with an MD5 which was previously unknown to VirusTotal until I uploaded the sample there weeks or months later (which show up in the balloon I've drawn along the top of the graph). My interpretation of these cases is that these are highly polymorphic situations where a particular binary is produced with so much variation in the actual byte sequence that any given MD5 of it is rarely used and never becomes known to the community. (There are dedicated tools for producing superficially different binary versions of the same functionality).

One caveat about this analysis - in doing this in haste I decided to use the file creation date in our repository for the analysis. The samples get moved from the appliances to the repository by a manually initiated process which in theory happens every day, but in practice seems to sometimes get neglected for a few days and then caught up on (you can see this in the vertical line structure in places in the graph above). So the FireEye file creation dates used in the graph are actually 1-3 days later than when the appliance really captured the binary. I could have gone back and manually verified the original dates on the appliance, but that would have been a lot more work than just doing a 'find' command on the repository, and I was too lazy to do it for a blog post. So just bear in mind that there is some variable lag of 1-3 days in our timings, and treat the conclusions here as qualified by that degree of approximation.

If we compress the graph above into a cumulative density plot of the difference between the two times (when a particular executable was captured by us, and when someone first gave it to VirusTotal) it looks like the following:

On the left side, you can see that there are about 5-10% of the cases where the binary MD5 appears to be rare (and I take this to mean highly polymorphic). On the right side, you can see there are about 5% of cases where we find old hoary binaries that have been out for weeks. However, 80%+ of cases are in the middle where we are finding binaries a few days before, or up to a week after, VirusTotal first hears about them.

This tells us the rough modus operandi of the malweb operators with respect to these binaries. They generally create a new binary packing, use that exact version for a few days to a week or so, and then discard it once it becomes widely known. Bear that in mind as we go forward - the lifetime of a newly created malweb binary is a few days to a week before it's discarded. Thus in assessing antivirus detection, it's much more important to look at ability to detect new binaries than old ones.

It's also worth noting in passing that, allowing for the lag, about half of the binaries FireEye finds are unknown to VirusTotal at the time we find them (in an MD5 sense).

Next, let's look at detection rates. Again, recall that I updated all the AV runs on VirusTotal during the period 11/17-11/19 (I have shown that interval in the red window in this next graph). So what I am plotting next is the fraction of AV products that find a given sample during those three days as a function of the date when VirusTotal first saw it (the graph against when FireEye saw it looks roughly similar).

As you can see, on the whole, the longer a MD5 has been known to VirusTotal, the more products will detect it. There's a lot of scatter, but the R² is a statistical measure which tells us that 28% of the variance in the data is explained by how long the sample has been known. I've shown a linear fit, which clearly doesn't capture everything that's happening in this data, but gives a rough idea of the trend.

So the conclusion is that AV works better and better on old stuff - by the time something has been out for a couple of months, and is still in use, it's likely that 70-80% of products will detect it (still falling quite short of 100% though).

However, from the analysis of the time data, we know that what's really important is the performance on malware samples that are only a few days old, because the sample is likely to get discarded by the bad guys pretty soon after that. If you look just at the samples that fall in the three day window when I was actually updating samples at VirusTotal, the average detection rate is only 40% of products. The trend in the data breaks sharply downwards on very recent executables. This justifies my claim above that AV is likely to detect a currently used bot binary "less than half" the time.

I actually suspect it's worse than that if I got more precise with my timings. What I really need to do is figure out how to add some scripting on the appliance so that I can automatically look up a new binary on VirusTotal right when we very first capture it. I have a feeling that measured that way, the average VirusTotal score would be even lower. Perhaps that can make the subject of a future blog post.

FireEye Malware

Intelligence Lab

Do AntiVirus Products Detect Bots?

TrackBack

Recent Comments

Search this Blog

Subscribe