Do You Know Your NSM Data Types?

When I first began writing about network security monitoring in 2002, I based my understanding on my experience in the Air Force Computer Emergency Response Team (AFCERT) and the tools and processes we used to detect criminal and nation-state intruders. Our Automated Security Incident Measurement (ASIM) platform, built on code developed by Todd Heberlein as his “NSM,” offered analysts three forms of data: events, in the form of network intrusion detection system (NIDS) alerts; session data, similar to the Zeek or Corelight conn.log; and full content data, rendered as a transcript of human-readable text or a PCAP file to be opened in a new tool, then called Ethereal (now Wireshark).

I wrote about NSM and these three data types for the first time in late 2002 while working at Foundstone as an incident response consultant for Kevin Mandia. Our consultants were writing the fourth edition of the wildly popular Hacking Exposed book, and they asked me to contribute a case study on NSM. I included an example of the three data types and explained why events or alerts were insufficient, and that session and full content data were needed to answer the common “now what?” question analysts posed when staring at an independent NIDS alert.

Over the years I considered adding more NSM data types, and by 2013 my fourth book, The Practice of Network Security Monitoring (PNSM), had expanded the NSM data types list to seven. The entire first chapter of the book is available for free on the publisher’s Web site, and includes example data for each type. Briefly, the seven types were the familiar full content, alert, and session data, plus extracted data, transaction data, statistical data, and metadata, as shown below, from page 16.

Recently I’ve been trying to simplify this collection, mainly because at least two of the data types are more about doing something with the data, and less about the data itself. To that end, statistical data and metadata don’t fit the bill anymore.

Statistical data is the result of mathematical, computational, or even human analysis of one or more aspects of network traffic. Statistical data isn’t inherent to the data itself, but builds on the other types. Similarly, metadata, as described in PNSM, is “data about data,” pulled from sources outside the network traffic, such as WHOIS information, or Autonomous System Numbers (ASNs), or other enrichment.

While both types are incredibly important to NSM, threat intelligence, and security operations, they are not truly NSM data types if one defines NSM data as forms of data present on the wire or in the air.

We can make another simplification when we consider that session data and transaction data are essentially the same data type, except session data is transaction data at a lower level. Session data is basically layer 4 conversation information, recording source IP, source port, destination IP, destination port, IP protocol, timestamp and duration, and byte and packet counts. I used the term “transaction data” to describe logs that went higher than layer 4, such as those provided by Bro, now called Zeek.
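To make the layer 4 fields listed above concrete, here is a minimal sketch of a session record as a Python data class. The class name, the field names, and the per-direction split of byte and packet counts are assumptions made for illustration; they do not reproduce the schema of NetFlow, Argus, or the Zeek conn.log, and the addresses come from the reserved documentation ranges.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SessionRecord:
    """One layer 4 conversation (hypothetical structure, not a real log schema)."""

    start: datetime       # timestamp of the first packet
    duration: timedelta   # how long the conversation lasted
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    ip_protocol: int      # IANA protocol number, e.g. 6 = TCP, 17 = UDP
    src_bytes: int        # assumption: counts are kept per direction
    dst_bytes: int
    src_packets: int
    dst_packets: int

    def total_bytes(self) -> int:
        """Bytes sent in both directions."""
        return self.src_bytes + self.dst_bytes

    def total_packets(self) -> int:
        """Packets sent in both directions."""
        return self.src_packets + self.dst_packets


# Example values are invented for illustration only.
record = SessionRecord(
    start=datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    duration=timedelta(seconds=3.5),
    src_ip="192.0.2.10",
    src_port=49152,
    dst_ip="198.51.100.5",
    dst_port=443,
    ip_protocol=6,
    src_bytes=1200,
    dst_bytes=48000,
    src_packets=14,
    dst_packets=40,
)

print(record.total_bytes(), record.total_packets())
```

Nothing in this record describes what was said above layer 4, which is the gap the higher-level logs fill.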

In brief, session data was what one could collect via NetFlow, Argus, or the Zeek conn.log, and transaction data included the rich panoply of log data generated almost exclusively by Zeek. For that reason, there doesn’t seem to be a need to have a separate session data type, and going forward I plan to refer to all such logs as “transaction data.” (This has the added benefit of avoiding questions about “sessionless” protocols, for which “session logs” can nonetheless be generated.)

Without too much further thought, I stepped into the Twitterverse and claimed:

“There are basically 3 outputs from a *passive* network visibility platform: 1) summarize traffic (@Zeekurity/@corelight_inc), 2) record traffic (pcap), and 3) judge traffic (“IDS”). #networksecuritymonitoring relies on this data, and tools tend to converge on these capabilities.”

Unfortunately, I conveniently ignored one final form of NSM data from my 2013 book — extracted content!

This was a shame because extracted content is one of the main reasons customers love Corelight sensors. Customers deploy our technology to address many use cases, but I’ve heard of several who just wanted to extract files from the network in order to feed their malware sandbox or analysis tools. The omission was a serious oversight, because these extracted files (imagine Office documents, PDFs, executables, and so on) aren’t well addressed by the “full content” NSM data type. Sure, the content is present, but full content refers to everything on the wire: headers, footers, encoding, and whatever else surrounds the file or other data transferred between endpoints.
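The difference between full content and extracted content can be sketched with a single HTTP response. This is a simplified illustration under stated assumptions, not how Zeek or a Corelight sensor performs file extraction: it assumes the whole response is already reassembled into one byte string, handles only gzip content encoding, and ignores chunked transfer encoding. The function name `extract_body` and the sample file contents are hypothetical.

```python
import gzip


def extract_body(raw_response: bytes) -> bytes:
    """Return the transferred file from a raw HTTP/1.x response, minus its framing."""
    head, separator, body = raw_response.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no header/body separator found")

    # Parse header lines into a lowercase name -> value mapping.
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip().lower()

    # Undo the encoding that exists only for transport.
    if headers.get(b"content-encoding") == b"gzip":
        body = gzip.decompress(body)
    return body


# Hypothetical file as it exists on the server's disk.
document = b"%PDF-1.4 hypothetical file contents"
compressed = gzip.compress(document)

# "Full content": everything that crossed the wire for this response.
full_content = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/pdf\r\n"
    b"Content-Encoding: gzip\r\n"
    b"Content-Length: " + str(len(compressed)).encode() + b"\r\n"
    b"\r\n"
) + compressed

# "Extracted content": only the file itself, ready for a sandbox.
extracted = extract_body(full_content)
print(extracted == document)
```

The file is present in both forms, but only the extracted form can be handed to an analysis tool without first stripping headers and reversing the encoding.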

Fortunately my friend hogfly (@4n6ir) reminded me of this oversight, and I realized I needed to add extracted content back to my simplified NSM data type summary.
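The resulting simplification can be written down as a small lookup table. The sketch below maps the seven PNSM types to the four that remain, using `None` for the two types that are treated here as analysis or enrichment rather than data on the wire. The enum and variable names are hypothetical, chosen only for this illustration.

```python
from enum import Enum


class PnsmType(Enum):
    """The seven data types listed in PNSM (2013)."""
    FULL_CONTENT = "full content data"
    EXTRACTED = "extracted data"
    SESSION = "session data"
    TRANSACTION = "transaction data"
    STATISTICAL = "statistical data"
    METADATA = "metadata"
    ALERT = "alert data"


class SimplifiedType(Enum):
    """The four data types kept after simplification."""
    FULL_CONTENT = "full content"
    EXTRACTED_CONTENT = "extracted content"
    TRANSACTION = "transaction data"
    ALERT = "alert data"


# None marks types that are derived from, or added to, the traffic
# rather than being present in the traffic itself.
SIMPLIFIED = {
    PnsmType.FULL_CONTENT: SimplifiedType.FULL_CONTENT,
    PnsmType.EXTRACTED: SimplifiedType.EXTRACTED_CONTENT,
    PnsmType.SESSION: SimplifiedType.TRANSACTION,      # folded into transaction data
    PnsmType.TRANSACTION: SimplifiedType.TRANSACTION,
    PnsmType.STATISTICAL: None,                        # analysis of other types
    PnsmType.METADATA: None,                           # outside enrichment
    PnsmType.ALERT: SimplifiedType.ALERT,
}

kept = {t for t in SIMPLIFIED.values() if t is not None}
print(len(SIMPLIFIED), "types reduce to", len(kept))
```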

If I were to annotate the book excerpt from page 16 to account for these changes, it would look like this.