Editor's note: This is the fourth in a series of Corelight blog posts focusing on evidence-based security strategy. Catch up on all of the posts here.
Evidence is the currency cyber defenders use to pay down security debt, balancing the value equation between adversaries and the enterprise. Defenders can use evidence proactively, identifying and protecting against structural risks within our zone of control. Evidence can also be used reactively, supporting detection (re)engineering, response, and recovery activities that guide us back to identifying and protecting against structural risks. Security events are unavoidable, but which side we spend most of our cycles on depends on our overall data strategy and how we nurture our evidence.
Much has been written on data lakes and data marts. Corelight has previously discovered that for a number of organizations one SIEM is not enough. Organizations often implement a data collection strategy out of fear, collecting everything “just in case.” I challenge the assumption that we must collect EVERYTHING and determine its usage at the point of incident.
The Security Operations Intelligence Supply Chain
Defenders have a responsibility to proactively inform the business of risk to the organization or reactively provide evidence to mitigate realized risk. Our ability to gain decision advantage over adversaries will depend on the selection and conversion of raw data into accurate decision-making knowledge. We view the world through two lenses: the structural context of our organization and the situational context that exists within the structure. Sounil Yu’s Cyber Defense Matrix provides a framework that effectively captures the balance between the asset classes we’re enlisted to protect and the activities that defenders perform. He goes on to contextualize activities as left and right of “boom,” with boom signifying a security event. The identify and protect activities (left of boom) focus on the structural context related to risk in the organization. The detect, respond, and recover activities (right of boom) are largely based on the situational context related to risk.
Security teams rely on vendor support or in-house engineering to convert raw data into actionable intelligence through a series of operational processes. The net result of those actions catalyzes protection or response initiatives across the organization. If we reframe our thinking, we can recognize this as a supply chain. Strong risk assessment activities (threat modeling, penetration testing, compliance engagements, etc.) are fundamental to developing the use cases (products) our cybersecurity supply chain supports.
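To make the supply chain framing a little more concrete, here is a minimal Python sketch that treats each use case as a product with a "bill of materials" of log types. Everything in it is an illustrative assumption: the use case names, the log type names, and the `supply_chain_gaps` helper are hypothetical and not drawn from any particular product or from the framework above.

```python
# Hypothetical use cases ("products") and the log types ("raw materials")
# each one needs. All names here are illustrative assumptions.
use_cases = {
    "detect lateral movement": {"network_conn", "windows_auth"},
    "investigate phishing": {"email_gateway", "dns", "web_proxy"},
    "track privileged access": {"windows_auth", "identity_provider"},
}

# Hypothetical set of log types the organization currently collects.
collected = {"network_conn", "windows_auth", "dns", "web_proxy", "printer_audit"}


def supply_chain_gaps(use_cases, collected):
    """Compare what the use cases require with what is actually collected.

    Returns a tuple of:
      - a sorted list of collected log types that feed no use case
      - a dict mapping each under-supplied use case to its missing log types
    """
    # Union of every log type required by at least one use case.
    required = set().union(*use_cases.values())

    # Collected but never consumed: candidates for review, not automatic removal.
    unused = sorted(collected - required)

    # Use cases that cannot be fully supported with current collection.
    blocked = {
        name: sorted(needs - collected)
        for name, needs in use_cases.items()
        if needs - collected
    }
    return unused, blocked


unused, blocked = supply_chain_gaps(use_cases, collected)
print("Collected but feeding no use case:", unused)
print("Use cases missing raw material:", blocked)
```

With the sample data above, `printer_audit` shows up as collected but unused, while the phishing and privileged access use cases are each missing one log type. In practice the use case inventory would come out of the risk assessment activities described above rather than a hard-coded dictionary.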
Data Strategy
A complete data strategy allows an organization to work backward from risk to raw logs and create a supply chain that generates information critical to risk reduction activities. It is imperative that your data strategy include the following attributes:
- Relevant Data: Rather than collecting everything, start with risk. Leverage risk assessment activities such as threat modeling, penetration testing, and vulnerability scanning to determine which log types and underlying attributes provide the most value. Starting with a use case allows you to maximize the usage of currently collected logs before onboarding lower value logs. If you are not starting from zero, ask the following questions:
  - Am I able to determine which log or subset of logs provides us with maximum value?
  - How many log types are used on a regular basis to support threat hunting or incident response activities? Which specific log attributes provide the maximum value and how can I measure that?
  - How many log types are not providing any value (just being collected and stored)?
- The ability to learn, unlearn, and relearn: Leverage threat hunting, incident post-mortems, and intelligence feedback to consistently re-evaluate your data strategy. Identify opportunities to provide additional context through the collection of additional log types or the fusion of previously disparate data (e.g., mapping the IP address in a network log to the IP address in a device log lets us pick up the username field from the device log and combine it with an identity log).
  - How many current log types can be combined to add context to hunting or response workflows?
  - Am I able to improve my understanding of the relationships between the resources I’m protecting? What happens if those relationships change? Do my data collection and transformation processes allow for agility based on what we learn?
- Stakeholder inputs: Without the business, there is nothing to secure and if the business is unable to translate security outputs into action, there is no value. Listen to the stakeholders and use their feedback to re-evaluate your data strategy. I believe that the security team has the deepest and most complete visibility into all business activities by nature of the logs we collect and/or have access to. If we are able to incorporate business logic into the way we craft our outputs or provide additional analytic value through the logs we collect, it’s a win-win for everyone.
  - Are you able to measure the impact of evidence with respect to risk reduction velocity across the business?
  - Are there ways that my data collection and transformation processes can increase velocity? Do the business units/stakeholders have enough context?
  - Do we have an opportunity to provide stakeholders value outside of security use-cases?
- Automated outputs: Executing ad-hoc queries to support threat hunting or incident response is highly inefficient. Do the work up front to categorize and classify activities within logs. Doing this not only supports analytic workflows but allows defenders to build dashboard-like views of the relevant activities, spot short-term and long-term trends, and identify blind spots in coverage.
  - Are you able to identify the activities associated with your detection coverage heatmap (MITRE ATT&CK)? Can you develop a secondary coverage map that reflects the logs required to generate classes/categories of activities?
  - What is your ratio of dashboards to collected log types? Is there heavy bias in one area or do you have broad coverage?
- Extensible: The ability to add additional log types, transformation processes, or generate new outputs should be unrestricted. If your security teams or business units develop new use cases, your data strategy and underlying supply chain architecture must allow for friction-free and timely implementation.
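The log fusion idea mentioned in the list above (network log IP address to device log, then device log username to identity log) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layouts, the field names (`src_ip`, `ip`, `username`, `department`, `mfa_enabled`), and the `fuse` helper are hypothetical and do not reflect any specific log schema. The sketch also ignores time; because IP assignments change, a real join would likely need to be bounded by timestamps.

```python
# Hypothetical, heavily simplified records. Real logs carry many more fields.
network_logs = [
    {"ts": "2023-01-10T09:15:00", "src_ip": "10.0.0.5", "dest_host": "files.example.com"},
    {"ts": "2023-01-10T09:16:30", "src_ip": "10.0.0.9", "dest_host": "mail.example.com"},
    {"ts": "2023-01-10T09:20:10", "src_ip": "10.0.0.77", "dest_host": "files.example.com"},
]
device_logs = [
    {"ip": "10.0.0.5", "hostname": "laptop-17", "username": "asmith"},
    {"ip": "10.0.0.9", "hostname": "desktop-03", "username": "bjones"},
]
identity_logs = [
    {"username": "asmith", "department": "Finance", "mfa_enabled": True},
    {"username": "bjones", "department": "Engineering", "mfa_enabled": False},
]


def index_by(records, key):
    """Build a lookup table from a list of records, keyed on one field."""
    return {record[key]: record for record in records}


def fuse(network, devices, identities):
    """Enrich each network event with device and identity context.

    Join path: network.src_ip -> device.ip -> device.username -> identity.username.
    Events with no matching device or identity keep None for the missing fields,
    which also makes coverage gaps visible.
    """
    device_by_ip = index_by(devices, "ip")
    identity_by_user = index_by(identities, "username")

    fused = []
    for event in network:
        device = device_by_ip.get(event["src_ip"], {})
        identity = identity_by_user.get(device.get("username"), {})
        fused.append({
            **event,
            "hostname": device.get("hostname"),
            "username": device.get("username"),
            "department": identity.get("department"),
            "mfa_enabled": identity.get("mfa_enabled"),
        })
    return fused


for row in fuse(network_logs, device_logs, identity_logs):
    print(row)
```

In this sample data the third event comes from an address with no device record, so its enrichment fields stay `None`. Counting such rows is one rough way to measure where fusion is adding context and where collection has blind spots.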
Whether you choose a data lake, a data mart, or a SIEM depends on your individual environment, previous investments, and stakeholder requirements. Whatever the choice, it is imperative that we view our data and evidence as raw materials in the intelligence supply chain and seek opportunities to extract maximum value. In all cases, we leverage the currency in our evidence bank to buy us time through proactive structural change or to buy our way out of unnecessary adversarial impact.
By Bernard Brantley, CISO, Corelight