File identification


File identification is frequently the first step in binary analysis. For packer and compiler identification, many people still use PEiD. To understand how it works, have a look at an entry from the userdb.txt database file used by PEiD:

[Themida/WinLicense V1.8.0.2 + -> Oreans Technologies]
signature = B8 00 00 00 00 60 0B C0 74 68 E8 00 00 00 00 58 05 ?? 00 00 00 80 38 E9 75 ?? 61 EB ?? DB 2D ?? ?? ?? ?? FF FF FF FF FF FF FF FF 3D 40 E8 00 00 00 00
ep_only = true

The example above will detect Themida/WinLicense v.1.8.0.2 or higher if the byte sequence shown in the signature field is found at the entry point (the ep_only = true option). The only metacharacter supported by PEiD is ??, which means "any byte here", that is, any value from 0x00 to 0xFF.
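To make the matching rule concrete, here is a minimal sketch of how a signature with ?? wildcards can be compared against a sequence of bytes. It is not PEiD's actual code (which is closed source); the function names parseSignature and matchesAt, and the sample bytes, are mine and purely illustrative.

```javascript
// Parse a PEiD-style signature string into an array where each element is
// either a byte value (0-255) or null for the "??" wildcard.
function parseSignature(signature) {
  return signature.trim().split(/\s+/).map(function (token) {
    if (token === "??") return null;
    if (!/^[0-9A-Fa-f]{2}$/.test(token)) {
      throw new Error("Invalid signature token: " + token);
    }
    return parseInt(token, 16);
  });
}

// Return true if `pattern` matches `bytes` starting at `offset`.
// A null entry in the pattern accepts any byte value.
function matchesAt(bytes, pattern, offset) {
  if (offset < 0 || offset + pattern.length > bytes.length) return false;
  for (var i = 0; i < pattern.length; i++) {
    if (pattern[i] !== null && bytes[offset + i] !== pattern[i]) return false;
  }
  return true;
}

// Hypothetical bytes read from the entry point of some file.
var entryPointBytes = [0xB8, 0x00, 0x00, 0x00, 0x00, 0x60, 0x0B, 0xC0, 0x74, 0x68];
var pattern = parseSignature("B8 00 00 00 00 60 ?? C0 74");
console.log(matchesAt(entryPointBytes, pattern, 0));
```

With ep_only = true, a tool only needs to try the offset of the entry point; without it, the same comparison would have to be repeated at every offset of the file, which is slower and more prone to false positives.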

PEiD is great, but it has its limitations: it is not scriptable, it was designed as a Windows GUI application, it is closed source, etc. To fix that, I rewrote the PEiD engine in pepack (part of the pev PE analysis toolkit) back in 2012, so I could check hundreds of binaries and get the results in a structured format like JSON. However, I still had an old database to rely on.

Others came up with different solutions, like converting PEiD signatures to Yara rules. A Yara version of the Themida signature above would be as follows:

import "pe"

rule themida1802 {
    strings:
        $a = { B8 00 00 00 00 60 0B C0 74 68 E8 00 00 00 00 58 05 ?? 00 00 00 80 38 E9 75 ?? 61 EB ?? DB 2D ?? ?? ?? ?? FF FF FF FF FF FF FF FF 3D 40 E8 00 00 00 00 }
    condition:
        $a at pe.entry_point
}
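The conversion itself is mostly mechanical, because PEiD's ?? wildcard has the same meaning inside a Yara hex string. The sketch below shows one way it could be done. The function name peidToYara is hypothetical, the rule name is assumed to be a valid Yara identifier chosen by the caller, and the entry point is referenced as pe.entry_point, which is the field name in Yara's pe module.

```javascript
// Build the text of a Yara rule from one PEiD entry.
// `ruleName` must already be a valid Yara identifier (assumption of this sketch).
function peidToYara(ruleName, signature, epOnly) {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(ruleName)) {
    throw new Error("Invalid Yara rule name: " + ruleName);
  }
  // Normalize whitespace and case; "??" is unaffected by toUpperCase().
  var hex = signature.trim().split(/\s+/).join(" ").toUpperCase();
  var lines = [];
  if (epOnly) {
    // The pe module is only needed when the match is tied to the entry point.
    lines.push('import "pe"', "");
  }
  lines.push(
    "rule " + ruleName + " {",
    "    strings:",
    "        $a = { " + hex + " }",
    "    condition:",
    "        " + (epOnly ? "$a at pe.entry_point" : "$a"),
    "}"
  );
  return lines.join("\n");
}

// Shortened, hypothetical signature just to show the output shape.
console.log(peidToYara("example_packer", "B8 00 00 00 00 60 0b c0 74 ?? e8", true));
```

A real converter would also have to derive rule names from the bracketed PEiD titles, which contain spaces and punctuation that Yara identifiers do not allow.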

Of course, Yara can do much more than this. It is a must for binary analysts, but for packer identification it is nice to have an up-to-date signature database ready for use. The closest I got to 100% accuracy was with a tool called DIE (Detect It Easy):

Screenshot of DIE 0.98 running on OS X El Capitan

DIE has a lot of useful features and, just like PEiD, its signatures are open, so everyone can see how they work and create new ones. The Armadillo signature shown in the screenshot is as follows:

// DIE's signature file
init("protector","Armadillo");

function detect(bShowType,bShowVersion,bShowOptions)
{
    if(PE.compareEP("60E8000000005D50510FCAF7D29CF7D20FCAEB0FB9EB0FB8EB07B9EB0F90EB08FDEB0BF2EBF5EBF6F2EB08FDEBE9F3EBE4FCE99D0FC98BCAF7D1595850510FCAF7D29CF7D20FCAEB0FB9EB0FB8EB07B9EB0F90EB08"))
    {
        sVersion="3.X-9.X";
        bDetected=1;
    }
    else if(PE.compareEP("558BEC83EC0C5356578B450850FF15........83C4048945FC8B45FC51B900080000B906000000"))
    {
        sVersion="4.44a public build";
        bDetected=1;
    }
    else if(PE.compareEP("E8E3400000E916FEFFFF6A0C68........E8441500008B4D0833FF3BCF762E6AE05833D2F7F13B"))
    {
        sVersion="5.00";
        bDetected=1;
    }
    else if(PE.compareEP("837C2408017505E8DE4B0000FF7424048B4C24108B54240CE8EDFEFFFF59C20C006A0C68"))
    {
        sVersion="5.00";
        sOptions="DLL";
        bDetected=1;
    }
    else if(PE.compareEP("6A..8BB5........C1E6048B85........2507....8079054883C8F84033C98A88........8B95........81E207....8079054A83CAF84233C08A82"))
    {
        sVersion="2.xx";
        sOptions="CopyMem II";
        bDetected=1;
    }
    else if(PE.compareEP("60E8........5D5051EB0FB9EB0FB8EB07B9EB0F90EB08FDEB0BF2EBF5EBF6F2EB08FDEBE9F3EBE4FCE959586033C9"))
    {
        sVersion="3.00";
        bDetected=1;
    }
    else if(PE.compareEP("60E8........5D5051EB0FB9EB0FB8EB07B9EB0F90EB08FDEB0BF2EBF5EBF6F2EB08FDEBE9F3EBE4FCE959585051EB"))
    {
        sVersion="3.00a-3.70a";
        bDetected=1;
    }
    else
    {
        if((PE.getMajorLinkerVersion()==0x53)&&(PE.getMinorLinkerVersion()==0x52))
        {
            for(var i=0;i<=PE.nLastSection;i++)
            {
                var nOffset=PE.section[i].FileOffset;
                if(PE.compare("'PDATA000'",nOffset))
                {
                    sVersion="6.X-9.X";
                    break;
                }
            }
            if(sVersion=="")
            {
                if(PE.section.length>7)
                {
                    sVersion="6.X-9.X";
                }
            }
            bDetected=1;
        }
    }

    return result(bShowType,bShowVersion,bShowOptions);
}
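In the patterns passed to PE.compareEP above, the runs of dots sit where the bytes vary between builds, such as call targets. Assuming each dot stands for one wildcard hex digit, the comparison can be sketched in plain JavaScript as below. This is only my illustration of the idea, not DIE's implementation; compareHex and the sample bytes are hypothetical.

```javascript
// Return true if the hex `pattern` matches `bytes` starting at `offset`.
// Assumption: each "." in the pattern is a wildcard for one hex digit (nibble).
function compareHex(bytes, pattern, offset) {
  if (pattern.length % 2 !== 0) {
    throw new Error("Pattern needs an even number of hex digits");
  }
  var count = pattern.length / 2;
  if (offset < 0 || offset + count > bytes.length) return false;
  for (var i = 0; i < count; i++) {
    // Two-digit uppercase hex representation of the current byte.
    var byteHex = (bytes[offset + i] & 0xFF).toString(16).toUpperCase();
    if (byteHex.length < 2) byteHex = "0" + byteHex;
    for (var n = 0; n < 2; n++) {
      var want = pattern.charAt(2 * i + n).toUpperCase();
      if (want !== "." && want !== byteHex.charAt(n)) return false;
    }
  }
  return true;
}

// Hypothetical entry point bytes: 60 E8 <4 variable bytes> 5D 50 51
var sample = [0x60, 0xE8, 0x12, 0x34, 0x56, 0x78, 0x5D, 0x50, 0x51];
console.log(compareHex(sample, "60E8........5D5051", 0));
```

Working on hex digits rather than whole bytes means a pattern like 6. could accept any byte from 0x60 to 0x6F, something the byte-oriented ?? of PEiD cannot express.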

DIE signatures are written in JavaScript, so you have the full power of a Turing-complete programming language to use loops, conditions, etc. DIE also allows you to write custom scripts and plugins to run against files. For example, in a recent malware analysis I noticed that the C&C server URL was hex-encoded at a fixed file offset (0xa6748) in many PE samples, so I wrote a script called MalwareC2 to quickly extract this specific information from the samples:

function load()
{
    script.addMessage("Loading me...");
}

function name()
{
    return "Malware C2";
}

function info()
{
    return "Malware C&C URL extractor 1.0";
}

// https://stackoverflow.com/questions/3745666/how-to-convert-from-hex-to-ascii-in-javascript
function hex2a(hexx)
{
    var hex = hexx.toString();
    var str = "";
    for (var i=0; i < hex.length; i+=2)
        str += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
    return str;
}

function run()
{
    var pe = PEFile;
    var szCurrentFile = script.getCurrentFileName();
    pe.setFileName(szCurrentFile); // Required call!!!
    var szURL = pe.getString(0xa6748, 255); // get "687474703A2F2F"
    if (szURL.length < 1)
    {
        script.addMessage("Encrypted URL not found. Aborting...");
        return;
    }
    script.addMessage("Decrypted C&C URL:\n" + hex2a(szURL));
}
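The same extraction logic can be tried outside DIE, which is handy for testing the decoding before turning it into a script. The sketch below works on a plain byte array instead of DIE's PEFile object. The helper names are hypothetical, the buffer is made up, and I am assuming the stored string ends at the first NUL byte.

```javascript
// Read an ASCII string from `bytes` at `offset`, stopping at a NUL byte
// or after `maxLength` bytes, whichever comes first.
function readAsciiString(bytes, offset, maxLength) {
  var out = "";
  for (var i = offset; i < bytes.length && i < offset + maxLength; i++) {
    if (bytes[i] === 0) break;
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Decode a hex string such as "687474703A2F2F" into text.
// Returns null when the input is empty or is not valid hex.
function hexToAscii(hex) {
  if (hex.length === 0 || hex.length % 2 !== 0 || !/^[0-9A-Fa-f]+$/.test(hex)) {
    return null;
  }
  var out = "";
  for (var i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return out;
}

function extractHexEncodedUrl(bytes, offset) {
  return hexToAscii(readAsciiString(bytes, offset, 255));
}

// Hypothetical file content: a hex-encoded URL stored at offset 16.
var encodedUrl = "687474703A2F2F6578616D706C652E636F6D";
var fakeFile = new Uint8Array(64);
for (var k = 0; k < encodedUrl.length; k++) {
  fakeFile[16 + k] = encodedUrl.charCodeAt(k);
}
console.log(extractHexEncodedUrl(fakeFile, 16));
```

Validating the string before decoding it avoids printing garbage when a sample does not store the URL at the expected offset.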

When analyzing samples of this particular family, I just need to run the MalwareC2.sg script against them, using the Script tab, to get the C&C URL.

DIE is very powerful. The following is an incomplete list of its main features:

  • Includes a signature editor, a debugger and examples.

  • Supports scripts and even more advanced plugins.

  • Analyzes any file type, not only PE files.

  • For some file types like ELF and PE, DIE has a structure parser and editor.

  • Hexadecimal editor.

  • Overlay, resources and other objects extractor (dump).

One last example with DIE: when you are taking notes on your findings while analyzing a PE file with ASLR enabled, you may mix up the addresses, as they change every time you load the file. Using DIE, you can quickly clear the ASLR bit in the DLL characteristics bitmask field. Just go to PE -> NT Headers -> Optional Headers and click the button labeled ... beside the DLL characteristics field. In the next window, uncheck the Read only and DYNAMIC_BASE checkboxes and click Apply.
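For reference, the change DIE makes here is a single bit flip in the PE optional header. A minimal sketch of the same edit on a file loaded into memory follows. It relies on the PE format layout, where DllCharacteristics sits at offset 70 of the optional header in both PE32 and PE32+, and IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE is 0x0040. The function name clearDynamicBase and the fake image are hypothetical, and the sketch does not recompute the header checksum.

```javascript
var IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040;

// Clear the DYNAMIC_BASE (ASLR) flag of a PE image held in a Uint8Array.
// Modifies the array in place and returns true if the flag was set before.
function clearDynamicBase(bytes) {
  var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (bytes.length < 0x40 || view.getUint16(0, true) !== 0x5A4D) { // "MZ"
    throw new Error("Not an MZ executable");
  }
  var peOffset = view.getUint32(0x3C, true); // e_lfanew
  if (peOffset + 96 > bytes.length ||
      view.getUint32(peOffset, true) !== 0x00004550) { // "PE\0\0"
    throw new Error("PE signature not found");
  }
  // 4-byte signature + 20-byte COFF file header, then offset 70
  // inside the optional header.
  var fieldOffset = peOffset + 4 + 20 + 70;
  var flags = view.getUint16(fieldOffset, true);
  var wasSet = (flags & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) !== 0;
  view.setUint16(fieldOffset, flags & ~IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE, true);
  return wasSet;
}

// Hypothetical, mostly empty image just large enough to hold the headers.
var fakeImage = new Uint8Array(256);
var fakeView = new DataView(fakeImage.buffer);
fakeImage[0] = 0x4D; fakeImage[1] = 0x5A;       // "MZ"
fakeView.setUint32(0x3C, 0x80, true);           // e_lfanew
fakeImage[0x80] = 0x50; fakeImage[0x81] = 0x45; // "PE\0\0"
fakeView.setUint16(0x80 + 94, 0x8140, true);    // flags with DYNAMIC_BASE set
console.log(clearDynamicBase(fakeImage));
```

Only work on a copy of the sample when patching headers like this, so the original file and its hash stay intact for the rest of the analysis.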

DIE can detect protectors, libraries, and the versions of the compiler, linker and other tools used to build a program. You can even extend DIE to detect crypters or anything else you need by using custom signatures, scripts and plugins.