Saturday, April 08, 2017

Understanding File and Data Formats

When I started down my path of studying techniques and methods for computer forensic analysis, I'll admit that I didn't start out using a hex editor...that was a bit daunting and more than a little overwhelming at the time.  Sure, I'd heard and read about those folks who could, and did, conduct a modicum of analysis using a hex editor, but at that point, I wasn't seeing "blondes, brunettes, and redheads...".  Over time and with a LOT of practice, however, I found that I could pick out certain data types within hex data.  For example, within a hex dump, my eyes now pick out repeating patterns of data, as well as specific data types, such as FILETIME objects (64-bit Windows timestamps).
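To make the FILETIME example a bit more concrete, here is a minimal Python sketch.  A FILETIME is a 64-bit little-endian count of 100-nanosecond intervals since January 1, 1601 (UTC).  The function names and the "plausible date window" heuristic are assumptions made for illustration; they are not taken from any particular tool.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks from this epoch (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def ticks_to_datetime(ticks):
    """Convert a FILETIME tick count to an aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def datetime_to_ticks(dt):
    """Convert an aware datetime to a FILETIME tick count."""
    return ((dt - FILETIME_EPOCH) // timedelta(microseconds=1)) * 10


def find_filetimes(buf, earliest, latest):
    """Yield (offset, datetime) for every 8-byte window in buf that decodes
    to a FILETIME between earliest and latest (hypothetical heuristic)."""
    lo, hi = datetime_to_ticks(earliest), datetime_to_ticks(latest)
    for offset in range(len(buf) - 7):
        (ticks,) = struct.unpack_from("<Q", buf, offset)
        # Compare raw tick counts so garbage values never reach datetime math.
        if lo <= ticks <= hi:
            yield offset, ticks_to_datetime(ticks)
```

Restricting hits to a narrow date window is what keeps the false positives down; it is the programmatic version of an eye trained to spot the telltale high-order bytes of a recent timestamp.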

Something that's come out of that is the understanding that knowing the structure or format of specific data types can provide valuable clues and even significant artifacts.  For example, understanding the structure of Event Log records (binary format used for Windows NT, 2000, XP, and 2003 Event Logs) has led to the ability to parse for records on a binary level and completely bypass limitations imposed by using the API.  The first time I did this, I found several valid records in a *.evt file that the API "said" shouldn't have been there.  From there, I have been able to carve unstructured blobs of data for such records.
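As a rough sketch of what parsing on a binary level can look like, the Python below scans an arbitrary blob for the "LfLe" signature that follows the 4-byte length field of each record in the older *.evt format, then checks that the record's trailing length field matches the leading one.  The function name and the returned field names are assumptions chosen for illustration, and only the 56-byte fixed portion of the record is decoded.

```python
import struct
from datetime import datetime, timezone

MAGIC = b"LfLe"
# Fixed portion of a record: 6 DWORDs, 4 WORDs, 6 DWORDs = 56 bytes.
FIXED = struct.Struct("<6I4H6I")


def carve_evt_records(blob):
    """Yield a dict for each candidate Event Log record found in blob."""
    pos = blob.find(MAGIC)
    while pos != -1:
        start = pos - 4  # the length field precedes the signature
        if start >= 0 and start + FIXED.size <= len(blob):
            fields = FIXED.unpack_from(blob, start)
            length = fields[0]
            end = start + length
            # The 0x30-byte file header also carries "LfLe"; the minimum
            # size check below skips it along with other noise.
            if length >= FIXED.size + 4 and end <= len(blob):
                (trailer,) = struct.unpack_from("<I", blob, end - 4)
                if trailer == length:  # length repeats at the end of a record
                    yield {
                        "offset": start,
                        "length": length,
                        "record_number": fields[2],
                        "time_generated": datetime.fromtimestamp(
                            fields[3], tz=timezone.utc),
                        "event_id": fields[5] & 0xFFFF,  # low 16 bits
                    }
        pos = blob.find(MAGIC, pos + 1)
```

Because the search never consults the file header, it works the same way on an intact *.evt file, a damaged one, or a chunk of unallocated space.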

Back when I was part of the IBM ISS ERS Team, an understanding of the structure of Windows Registry hive files led us to being able to determine the difference between credit card numbers being stored "in" Registry keys and values, and being found in hive file slack space.  The distinction was (and still is) extremely important.

Developing an understanding of data structures and file formats has led to tools such as Willi Ballenthin's EVTXtract, as well as to the ability to parse Registry hive files for deleted keys and values.  Both have proven to be extremely valuable during a wide variety of investigations.

Other excellent examples of this include the parsing of OLE file formats from Decalage, James Habben's parsing of Prefetch files, and Mari's parsing of data deleted from SQLite databases.

Understanding data structures has also led to parsing the Windows shortcut (LNK) files that were sent to victims of phishing campaigns.  This NViso blog post discusses tracking threat actors through the .lnk file they sent their victims, and this JPCert blog post from 2016 discusses finding indications of an adversary's development environment through the same resource.
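For a sense of how much sits in even the first few bytes of an LNK file, here is a minimal Python sketch that decodes the fixed 76-byte header: the header size, the shell link class identifier, the flags, and the target file's three FILETIME timestamps and size.  The function and key names are assumptions for illustration, and this is not the method used in either blog post; the host-identifying details those posts discuss live in later, variable-length structures that this sketch does not touch.

```python
import struct
from datetime import datetime, timedelta, timezone

# First 64 bytes of the 76-byte header; the rest is hotkey/reserved fields.
LNK_HEADER = struct.Struct("<I16sIIQQQIII")
LNK_HEADER_SIZE = 0x4C
# Shell link CLSID 00021401-0000-0000-C000-000000000046 as stored on disk.
LNK_CLSID = bytes.fromhex("0114020000000000c000000000000046")
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def _filetime(ticks):
    """Return a UTC datetime, or None for unset or out-of-range values."""
    if ticks == 0:
        return None
    try:
        return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
    except OverflowError:
        return None


def parse_lnk_header(data):
    """Decode the fixed header of a shortcut file from raw bytes."""
    if len(data) < LNK_HEADER_SIZE:
        raise ValueError("too short to hold an LNK header")
    (size, clsid, flags, attrs, created, accessed, written,
     file_size, _icon_index, show_cmd) = LNK_HEADER.unpack_from(data, 0)
    if size != LNK_HEADER_SIZE or clsid != LNK_CLSID:
        raise ValueError("not an LNK header")
    return {
        "link_flags": flags,
        "file_attributes": attrs,
        "target_created": _filetime(created),
        "target_accessed": _filetime(accessed),
        "target_written": _filetime(written),
        "target_size": file_size,
        "show_command": show_cmd,
    }
```

A typical use would be `parse_lnk_header(open(path, "rb").read())`, where `path` is a hypothetical shortcut file pulled from an image or an email attachment.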

Now, I'm not suggesting that every analyst needs to be intimately familiar with file formats, and be able to parse them by hand using just a hex editor.  However, I am suggesting that analysts should at least become aware of what is available in various formats (or ask someone), and understand that many of these formats can provide a great deal of data that will assist in an investigation.
