July 29, 2019 by Richard Bejtlich
In the course of my network security monitoring work at Corelight, I’ve encountered the terms detection, inference, and identification. In this post I will examine what these terms mean, and how they can help you describe the work you do when investigating normal, suspicious, and malicious activity in your environment.
Let’s start with detection. I previously defined it in my 2004 book The Tao of Network Security Monitoring, where I wrote:
Detection is the process of identifying intrusions. Intrusions are policy violations or computer security incidents. Kevin Mandia and Chris Prosise define an incident as any “unlawful, unauthorized, or unacceptable action that involves a computer system or a computer network.” (bold emphasis added)
Note that I cited the 2001 book Incident Response: Investigating Computer Crime, a foundational text in the digital forensics and incident response (DFIR) field.
Walking back up this chain of definitions, it becomes clear that in order to detect an intrusion, an organization must define “unlawful, unauthorized, or unacceptable,” generally via a security policy. Without that definition, we cannot say if activity qualifies as an incident.
For example, I recently witnessed a discussion on Reddit concerning logins to a File Transfer Protocol (FTP) server. The original poster was concerned that he had created a security incident by logging into an FTP server that accepted any username and password.
I saw that situation in a different light. If the owner of the FTP server deployed it to allow such access, then using it in the expected manner was not a security incident. In fact, so-called “anonymous FTP servers” have been a popular means of distributing software for decades.
A solid example of detection involves an organization stating that it is unacceptable for a device in their care, such as an FTP server, to be used to host pirated software. If intruders compromised the server, either by using an exploit or abusing a misconfiguration, they could take advantage of it to distribute such unwelcome files. Discovering this situation, via any number of technical or non-technical means, would qualify as detecting a security incident.
How does detection relate to inference and identification? As much as I am loath to quote dictionaries, the first two definitions of “inference” offered by Merriam-Webster are illuminating:
I am especially interested in definition 2, as it mentions that an inference is a product — a conclusion based on evidence. Definition 1 also encourages the use of “if-then” thinking, i.e., “if this aspect of network traffic is present, then it is likely that the following condition is in play.”
For example, in 2002 Steve Bellovin published “A Technique for Counting NATted Hosts.” His goals were first to identify the use of network address translation systems, and then to count the number of clients using those NAT devices to access the Internet. By observing the value of the Internet Protocol identification field in packets leaving a designated IP address, Bellovin was able (at least in 2002) to have some success with both goals.
Bellovin could infer the presence of a NAT device and count the number of clients using it. If he had visibility into both sides of the NAT device, i.e., behind the NAT and in front of the NAT, he could directly observe its presence. Without this access, he instead relied on inferring the existence and effects of NAT.
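To make the if-then flavor of this inference concrete, here is a minimal sketch in Python. It is not Bellovin's actual algorithm. It assumes each host behind the NAT stamps packets from a single, slowly incrementing IP ID counter, and it greedily groups the IDs seen leaving one address into runs. The function name count_id_sequences and the max_gap and min_len thresholds are hypothetical choices made for illustration.

```python
from typing import List

ID_SPACE = 1 << 16  # the IP identification field is 16 bits wide


def count_id_sequences(ip_ids: List[int], max_gap: int = 64, min_len: int = 3) -> int:
    """Estimate how many hosts share one address by grouping IP IDs into runs.

    Assumption (illustrative only): each host increments one IP ID counter,
    so consecutive packets from a host carry IDs that are close together.
    """
    last_seen: List[int] = []  # most recent ID assigned to each candidate run
    lengths: List[int] = []    # number of packets assigned to each run

    for ip_id in ip_ids:
        best, best_gap = None, max_gap + 1
        for i, last in enumerate(last_seen):
            # Forward distance from the run's last ID, allowing 16-bit wraparound.
            gap = (ip_id - last) % ID_SPACE
            if 0 < gap < best_gap:
                best, best_gap = i, gap
        if best is None:
            # No run is close enough, so treat this packet as a new host.
            last_seen.append(ip_id)
            lengths.append(1)
        else:
            last_seen[best] = ip_id
            lengths[best] += 1

    # Ignore very short runs, which are more likely noise than a real host.
    return sum(1 for n in lengths if n >= min_len)


# Hypothetical IDs observed leaving a single IP address, in arrival order.
observed = [100, 40000, 101, 40001, 103, 40002, 104]
print(count_id_sequences(observed))
```

With these made-up values the sketch finds two interleaved runs, which supports the inference that two clients sit behind the address. The result remains an inference: the evidence is indirect, and hosts that do not use a simple counter for the IP ID would defeat the assumption.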
Therefore, an inference is an interpretation or a conclusion based on evidence, despite the absence of direct observation and confirmation, and without judgement of its nature. Inference is also the process by which this conclusion is reached. We can speak of “an inference” and “inference” fairly naturally, as the former refers to a conclusion and the latter to a process.
It’s worth asking if we could say the same for the previously defined term “detection.” Detection is definitely a process, but is it also a conclusion? This would require us to speak of “a detection,” or “detections,” which to me sounds unnatural. I prefer to preserve detection as a process, with the product being an incident. Inferring an unexpected NAT device in a corporate environment could qualify as a security incident if defined by policy as being unauthorized, or it could be completely innocent and within acceptable norms.
On a related note, the term “hypothesis” comes to mind. A hypothesis is also an interpretation, but it is often formed before the questioner possesses evidence. The hypothesis is a reasonable explanation that requires testing against evidence. An inference is farther down the line of reasoning, as one derives it from evidence.
With detection and inference defined, let’s turn to identification. To me, identification refers to a firm discovery based on close observation. Whereas inference implies a degree of uncertainty, identification carries an air of authority. We might say that scientists infer the presence of water ice at the lunar poles due to various observations from orbit, but it could take a rover and sampling lunar materials to positively identify water ice in the same locations. There is a sense of finality with identification.
Like inference, identification is neither inherently good nor bad. Also as with inference, I prefer to avoid speaking of “an identification” or “identifications.”
Therefore, identification is the process of discovering and confirming a fact, thanks to direct observation, without judgement of its nature. We can say that we infer the presence of a NAT in a corporate environment based on investigating network traffic, but we positively identify it by speaking with the device’s owner, or by gaining access to the device itself, perhaps via an unsecured administrative portal. If the NAT is unauthorized, we have detected a security incident.
I recognize that people with good intentions can disagree with these definitions. However, this is my attempt at differentiating between three terms that we use when describing how we understand network activity. To summarize:
Thinking in these terms, you can see how data from Corelight and Zeek can help you understand activity in your network environment. If you can accurately and precisely define an activity in policy and in Zeek’s scripting language, and it manifests, you are likely to have detected a security incident. If you are able to directly observe and confirm network activity, but do not assign benefit or malfeasance to it, you are performing identification. If you lack the ability to directly observe and confirm the nature of a network event, but you can draw conclusions based on available evidence, then you are making an inference.
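One way to keep the three terms straight is to write the distinction down as a small decision rule. The Python sketch below is only my reading of the definitions in this post, not a Corelight or Zeek feature. The Finding class, its fields, and the classify function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    description: str
    directly_confirmed: bool         # was the fact observed and confirmed first-hand?
    violates_policy: Optional[bool]  # None means no policy judgement has been made


def classify(finding: Finding) -> str:
    """Label a finding using the three terms as defined in this post."""
    if finding.violates_policy:
        # Policy says the activity is unlawful, unauthorized, or unacceptable.
        return "detection"
    if finding.directly_confirmed:
        # A confirmed fact, with no judgement of its nature.
        return "identification"
    # A conclusion drawn from evidence without direct confirmation.
    return "inference"


findings = [
    Finding("IP ID patterns suggest a NAT device", False, None),
    Finding("Device owner confirms the NAT device", True, None),
    Finding("Confirmed NAT device is unauthorized by policy", True, True),
]
for f in findings:
    print(f"{classify(f)}: {f.description}")
```

The three sample findings follow the NAT example from earlier in the post, moving from inference to identification to detection as confirmation and a policy judgement are added.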
I hope you find these definitions useful in your organization!