Editor's note: This is the fifth in a series of Corelight blog posts focusing on evidence-based security strategy. Catch up on all of the posts here.
The saying “data is king” has been around for quite a while, and we all know that the world operates and makes decisions on digital data 24x7x365. But is data king in the field of cybersecurity? I believe that evidence, not data, is what defenders need to speed up their knowledge and response capabilities, so let's talk about both.
Data is a precursor to evidence. It is often collected from the network, endpoints, infrastructure (on-premises and cloud), applications (in the form of logs), and even people (e.g., law enforcement, partners). In large enterprises, the sheer volume of collected data can be overwhelming and raises questions like: Where do you store all that data? How long do you need to keep it? How do you correlate it to make sense of it, or turn it into evidence that enables an organization to speak with confidence about what happened?
Evidence, on the other hand, begins with the data but extends and enhances it through contextual enrichment (e.g., GeoIP, blocklist/allowlist, CMDB asset information, CVE information) and correlation (e.g., what happened before and what happened after). Evidence, with its context and correlation, is what is needed when revealing details of an incident, intrusion, or breach to constituents, boards of directors, government oversight agencies, or a court of law. Richard Bejtlich illuminated an important use case for evidence in an earlier blog post. So how do you turn data into evidence?
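To make enrichment and correlation concrete, here is a minimal Python sketch. The lookup tables (`GEOIP`, `BLOCKLIST`, `CMDB`, `VULNS`), the `Event` record, and the five-minute correlation window are all hypothetical stand-ins for real GeoIP databases, threat-intel feeds, asset inventories, and vulnerability data. It illustrates the idea, not any product's implementation.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network

# Hypothetical, in-memory stand-ins for real context sources (GeoIP database,
# threat-intel blocklist, CMDB, vulnerability feed). Addresses come from
# documentation/private ranges; "XX" and the CVE ID are placeholders.
GEOIP = {ip_network("203.0.113.0/24"): "XX"}
BLOCKLIST = {ip_address("203.0.113.66")}
CMDB = {ip_address("10.0.0.5"): {"owner": "finance", "criticality": "high"}}
VULNS = {ip_address("10.0.0.5"): ["CVE-YYYY-NNNN"]}


@dataclass
class Event:
    ts: float   # epoch seconds
    src: str    # originating host
    dst: str    # responding host
    note: str   # short human-readable description
    context: dict = field(default_factory=dict)


def enrich(ev: Event) -> Event:
    """Attach context from each lookup table to the event (mutates and returns it)."""
    src, dst = ip_address(ev.src), ip_address(ev.dst)
    ev.context["dst_geo"] = next((cc for net, cc in GEOIP.items() if dst in net), "unknown")
    ev.context["dst_blocklisted"] = dst in BLOCKLIST
    ev.context["src_asset"] = CMDB.get(src, {})
    ev.context["src_cves"] = VULNS.get(src, [])
    return ev


def correlate(alert: Event, events: list, window: float = 300.0) -> dict:
    """Return the same host's other events within `window` seconds, split into before/after."""
    related = [e for e in events
               if e is not alert and e.src == alert.src and abs(e.ts - alert.ts) <= window]
    related.sort(key=lambda e: e.ts)
    return {"before": [e for e in related if e.ts < alert.ts],
            "after": [e for e in related if e.ts >= alert.ts]}


# Made-up sample events for one host.
events = [
    Event(1000.0, "10.0.0.5", "198.51.100.7", "DNS query"),
    Event(1060.0, "10.0.0.5", "203.0.113.66", "alert: beacon-like connection"),
    Event(1090.0, "10.0.0.5", "203.0.113.66", "large outbound transfer"),
    Event(5000.0, "10.0.0.5", "198.51.100.7", "later activity, outside the window"),
]
alert = enrich(events[1])
timeline = correlate(alert, events)
print(alert.context["dst_geo"], alert.context["dst_blocklisted"])
print([e.note for e in timeline["before"]], [e.note for e in timeline["after"]])
```

Running it prints `XX True`, then `['DNS query'] ['large outbound transfer']`: the alert now carries its destination's reputation, the asset's owner and criticality, and the events on either side of it.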
Let's start with a relatively easy example, the most popular tool used in security operations centers: the SIEM (security information and event management). A SIEM such as Splunk, Humio, Elastic, or Sentinel, in its simplest form, collects data from different data sources and sends alerts on potential security threats and vulnerabilities. A necessary, but not sufficient, data source is network data. Network data can be passively collected (so an adversary wouldn’t know whether their activity is being seen) and is immutable (it can’t be manipulated, bypassed, or deleted). The network data is turned into evidence as it is parsed, analyzed, normalized, given context, and correlated with preceding and subsequent actions. This evidence (i.e., a contextual understanding of what the data means) elevates defenders’ capabilities, allowing them to focus on the higher-risk detections in their unique environment. Zeek, the de facto open source network security standard, is the basis for collecting network data and transforming it into network evidence. The Zeek open source community has more than 25 years of experience building network detections for known bad behaviors (not just signatures), and these detections continue to grow based on the real-world experience of members across the globe.
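As an illustration of the "parsed and normalized" step, the sketch below reads Zeek's default tab-separated log format (header lines beginning with `#`, a `#fields` line naming the columns, `-` for unset values, `(empty)` for empty ones) into typed Python dictionaries. It assumes the default tab separator and converts only a few numeric `conn.log` fields; the sample records, uids, and addresses are made up.

```python
# Columns converted from text to numbers; everything else stays a string.
INT_FIELDS = {"id.orig_p", "id.resp_p", "orig_bytes", "resp_bytes"}
FLOAT_FIELDS = {"ts", "duration"}


def parse_zeek_tsv(lines):
    """Yield one normalized dict per record in a Zeek TSV log (assumes the default tab separator)."""
    fields, unset, empty = None, "-", "(empty)"
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            # Header line: remember column names and the unset/empty markers.
            parts = line.split("\t")
            if parts[0] == "#fields":
                fields = parts[1:]
            elif parts[0] == "#unset_field" and len(parts) > 1:
                unset = parts[1]
            elif parts[0] == "#empty_field" and len(parts) > 1:
                empty = parts[1]
            continue
        if fields is None or not line:
            continue
        record = {}
        for name, value in zip(fields, line.split("\t")):
            if value == unset:
                record[name] = None
            elif value == empty:
                record[name] = ""
            elif name in INT_FIELDS:
                record[name] = int(value)
            elif name in FLOAT_FIELDS:
                record[name] = float(value)
            else:
                record[name] = value
        yield record


# Hypothetical two-record excerpt of a conn.log (uids and addresses are made up).
SAMPLE = "\n".join([
    "#separator \\x09",
    "#unset_field\t-",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\tservice\tduration",
    "1700000000.123456\tC1\t10.0.0.5\t51514\t198.51.100.7\t53\tudp\tdns\t0.010",
    "1700000001.000000\tC2\t10.0.0.5\t51515\t203.0.113.66\t443\ttcp\t-\t-",
])

for rec in parse_zeek_tsv(SAMPLE.splitlines()):
    print(rec["uid"], rec["id.resp_h"], rec["id.resp_p"], rec["service"])
```

Once records are typed dictionaries rather than raw text, enrichment and correlation become ordinary lookups and comparisons.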
Let's dig a bit deeper into how raw data can be turned into evidence, again using network data as the example. Network data can and should be examined and analyzed as it flows, a practice usually referred to as network traffic analysis (NTA). But what is actually examined, and is it really that useful? The protocols (e.g., HTTP, DNS), the timing of network sessions (e.g., human keystrokes over SSH), the metadata of encrypted traffic (e.g., SSL, RDP, SSH), or even just the identification of the VPNs in use are all examples of how network data can be analyzed. But what happens next? Notifications or alerts are sent to the security operations center, but how do analysts know what to investigate first or what is most important or dangerous? It is easy to say that every alert should be investigated, but in reality hundreds of alerts might arrive in an hour, and no one has enough staff to investigate them all. Having the context and correlated activities surrounding each alert (i.e., the evidence) can greatly speed up human analysis and can be used to fine-tune and prioritize actionable alerts.
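One way to picture context-driven prioritization is a simple scoring function. The weights, field names, and formula below are hypothetical; the point is only that enrichment (asset criticality, blocklist hits) and correlation (related events) can reorder a queue that detection severity alone would rank differently.

```python
from dataclasses import dataclass

# Hypothetical weights; real values would come from each organization's own risk model.
SEVERITY = {"low": 1, "medium": 3, "high": 5}
CRITICALITY = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Alert:
    name: str
    severity: str           # severity assigned by the detection itself
    asset_criticality: str  # context: from CMDB enrichment
    dst_blocklisted: bool   # context: from threat-intel enrichment
    related_events: int     # correlation: events seen around the alert on the same host


def score(a: Alert) -> int:
    s = SEVERITY[a.severity] * CRITICALITY[a.asset_criticality]
    if a.dst_blocklisted:
        s += 10
    s += min(a.related_events, 5)  # capped so a chatty host cannot dominate the queue
    return s


def triage(alerts):
    """Highest-scoring (most urgent) alerts first."""
    return sorted(alerts, key=score, reverse=True)


queue = triage([
    Alert("Self-signed certificate", "low", "low", False, 0),
    Alert("Port scan", "high", "low", False, 0),
    Alert("Interactive SSH session", "medium", "high", False, 4),
    Alert("Possible DGA activity", "medium", "low", True, 2),
])
for a in queue:
    print(score(a), a.name)
```

Ranked by detection severity alone, the port scan would come first. With context, the DGA alert with a blocklisted destination (score 15) and the SSH session on a high-criticality asset (13) move ahead of it (5).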
As a more complex example, Corelight Investigator has a component that uses machine learning to analyze a series of DNS lookups and assess whether their temporal clustering, failure rates, and lexical structure strongly suggest malware using a Domain Generation Algorithm (DGA) to establish a command-and-control connection. Here we take one form of evidence, the information that Zeek traffic analysis provides, and develop higher-level evidence about the nature of the activity occurring on a potentially infected system.
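The model inside Investigator is not described here, so the following is only a hand-written heuristic over the same three signals: temporal clustering (the largest number of queries inside a sliding window), failure rate (the share of `NXDOMAIN` responses, using the `rcode_name` values Zeek records in `dns.log`), and lexical structure (Shannon entropy of the left-most label). The thresholds in the hypothetical `looks_like_dga` function are placeholders for where a trained classifier would normally sit.

```python
import math
from collections import Counter


def label_entropy(domain: str) -> float:
    """Shannon entropy, in bits per character, of the left-most DNS label."""
    label = domain.split(".")[0].lower()
    if not label:
        return 0.0
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in Counter(label).values())


def dga_features(queries, window: float = 10.0) -> dict:
    """queries: (ts, query, rcode_name) tuples for one host, e.g. drawn from Zeek's dns.log."""
    if not queries:
        return {"nxdomain_rate": 0.0, "mean_entropy": 0.0, "max_burst": 0}
    times = sorted(q[0] for q in queries)
    # Temporal clustering: the most queries seen inside any `window`-second span.
    max_burst, start = 0, 0
    for end in range(len(times)):
        while times[end] - times[start] > window:
            start += 1
        max_burst = max(max_burst, end - start + 1)
    return {
        "nxdomain_rate": sum(q[2] == "NXDOMAIN" for q in queries) / len(queries),  # failure rate
        "mean_entropy": sum(label_entropy(q[1]) for q in queries) / len(queries),   # lexical structure
        "max_burst": max_burst,
    }


def looks_like_dga(f: dict, min_nx=0.5, min_entropy=3.5, min_burst=5) -> bool:
    """Hypothetical hand-tuned thresholds standing in for a trained model."""
    return (f["nxdomain_rate"] >= min_nx and f["mean_entropy"] >= min_entropy
            and f["max_burst"] >= min_burst)


# Made-up sample: six rotations of a 16-distinct-letter label, mostly NXDOMAIN, within 5 s.
base = "qzxvbnmkljhgfdsw"
suspicious = [(float(i), base[i:] + base[:i] + ".example", "NXDOMAIN" if i else "NOERROR")
              for i in range(6)]
benign = [(0.0, "www.example.com", "NOERROR"), (60.0, "mail.example.com", "NOERROR")]
print(looks_like_dga(dga_features(suspicious)), looks_like_dga(dga_features(benign)))
```

Fixed thresholds like these misfire easily (some legitimate hostnames also look random), which is part of why a learned model weighing several features together is the better fit for this job.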
Whether your organization uses only a SIEM, a combination of a SIEM and custom analytics in a data lake, or a SaaS-based NDR (network detection and response) solution (e.g., Corelight Investigator) alongside a SIEM, the most important thing is to ensure that your organization has the evidence necessary to support effective investigations. Comprehensive network evidence, supported by machine learning and other analytics in a fast, intuitive search platform, takes security operations to the next level. It dramatically simplifies tier-one workflows, giving teams more time for hunting and response, activities that move faster than ever when coupled with an intuitive log query engine. With the ability to quickly pivot to the raw data, teams have the evidence to authoritatively detail what happened and understand how the event occurred.
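Zeek records a unique connection identifier (`uid`) in `conn.log` and in protocol logs such as `dns.log`, `http.log`, and `ssl.log`, which is one thing that makes a quick pivot from an alert to the underlying records possible. A minimal sketch of that pivot follows, assuming the logs have already been parsed into dictionaries; the sample records and alert are hypothetical.

```python
from collections import defaultdict


def index_by_uid(logs: dict) -> dict:
    """Map each Zeek connection uid to every (log_name, record) that carries it."""
    index = defaultdict(list)
    for log_name, records in logs.items():
        for rec in records:
            if rec.get("uid"):
                index[rec["uid"]].append((log_name, rec))
    return index


def pivot(alert: dict, index: dict) -> list:
    """Return the records that share the alert's uid, ordered by timestamp."""
    return sorted(index.get(alert.get("uid"), []), key=lambda pair: pair[1].get("ts", 0.0))


# Hypothetical, already-parsed records (field names follow Zeek's logs; values are made up).
logs = {
    "conn": [{"uid": "C1", "ts": 100.0, "id.resp_h": "203.0.113.66", "id.resp_p": 443}],
    "ssl": [{"uid": "C1", "ts": 100.2, "server_name": "update.example"}],
    "dns": [{"uid": "C9", "ts": 99.9, "query": "update.example"}],
}
alert = {"uid": "C1", "msg": "hypothetical beacon alert"}
for log_name, rec in pivot(alert, index_by_uid(logs)):
    print(log_name, rec)
```

Only the `conn` and `ssl` records print. The DNS lookup for the same name has its own uid, because Zeek logs it as a separate connection, so linking it back requires host-and-time correlation rather than a uid join.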
The takeaway: you need to turn data into evidence. Evidence powers detections. Detection results, coupled with evidence, give an organization the knowledge for a ‘defensible disclosure’ (i.e., the ability to articulate exactly what occurred, when it occurred, how long it lasted, and how the activity was mitigated). Your organization, stakeholders, and oversight bodies will thank you.
By Jean Schaffer, Federal CTO, Corelight