July 16, 2018 by Alan Saldich
Incident response, threat hunting and cybersecurity in general relies on great data. Just like the rest of the world where virtually everything these days is data-driven, from self-driving cars to personalized medicine, effective security strategies also need to be data-driven.
Whatever security solution, service or business process you implement, it probably relies on data from just four sources: logs, networks, hosts and third party intelligence feeds. Some or all of that data is fed into an analytics stack like Apache Spark™ where advanced analytics, artificial intelligence, machine learning and other techniques can be applied.
When it comes to the data about network traffic though, most organizations are stuck between “not enough data” and “way too much.” The former is normally NetFlow data that provides basic information about network traffic, source / destination IPs, time and date, bytes sent / received and few other important (but sparse) pieces of information. The latter is typically PCAP (packet capture) where every bit on the wire is stored. Since big networks carry a tremendous amount of data, storing it all is non-trivial and can become expensive and cumbersome.
So what’s the alternative? Well for over 20 years, an open source project called Bro has been used in production at some of the world’s largest organizations and biggest networks, namely government agencies, research universities and very large / web-scale companies.
Bro is a network monitoring framework that ingests a copy of all traffic on a network, parses and analyzes it in real time using its event processing engines and scripts, and then outputs data (“Bro logs”) to some external analytics system like Spark. There are dozens of Bro logs for most common protocols like SMTP, SSL, SMB, DHCP, DNS and many others. Each Bro log contains 10 to 40 fields describing that part of the network traffic.
Furthermore, Bro logs include key pivot points in the data that allow incident responders to follow their instincts or observations by taking advantage of unique identifiers for critical aspects of network traffic: all connections (Connection UID) and files (FUID) are uniquely identified. Along with consistent and precise timestamps across all logs, Bro gives incident responders access to everything they’ll need to resolve most security incidents quickly without having to resort to PCAP except in rare instances.
Corelight was founded by the creators of Bro to deliver network security solutions built on this powerful and widely used framework. Since Corelight does not provide an analytics application, we are excited to work with leading companies like Databricks to combine the power of Bro data with the intelligence of Apache Spark and all of its analytic capability.
Staying abreast of the latest threat isn’t the only challenge. The increasing volume and complexity of threats require security teams to capture and mine mountains of data in order to avoid a breach. Yet, the Security Information and Event Management (SIEM) and threat detection tools they’ve come to rely on were not built with big data in mind resulting in a number of challenges:
In order to effectively detect and remediate threats in today’s environment, security teams need to find a better way to process and correlate massive amounts of real-time and historical data, detect patterns that exist outside pre-defined rules and reduce the number of false positives.
Databricks offers security teams a new set of tools to combat the growing challenges of big data and sophisticated threats. Where existing tools fall short, the Databricks Unified Analytics Platform fills the void with a platform for data scientists and cybersecurity analysts to easily build, scale, and deploy real-time analytics and machine learning models in minutes, leading to better detection and remediation.
Databricks complements existing threat detection efforts with the following capabilities:
A leading technology company employs a large cybersecurity operations center to monitor, analyze and investigate trillions of threat signals each day. Data flows in from a diverse set of sources including intrusion detection systems, network infrastructure and server logs, application logs and more, totaling petabytes in size.
When a suspicious event is identified, threat response teams need to run queries in real-time against large historical datasets to verify the extent and validity of a potential breach. To keep pace with the threat environment the team needed a solution capable of:
As an example of how customers are using advanced cybersecurity analytics, check out this recent video from Apple.
For another more general explanation of how Corelight and Databricks can be deployed together, Databricks produced this video that includes an explanation of how they ingest Bro logs as part of their security solution.
It took a team of twenty engineers over six months to build their legacy architecture that consisted of various data lakes, data warehouses, and ETL tools to try to meet these requirements. Even then, the team was only able to store two weeks of data in its data warehouses due to cost, limiting its ability to look backward in time. Furthermore, the data warehouses chosen were not able to run machine learning.
Using the Databricks Unified Analytics platform the company was able to put their new architecture into production in just two weeks with a team of five engineers.
Their new architecture is simple and performant. End-to-end latency is low (seconds to minutes) and the threat response team saw up to 100x query speed improvements over open source Apache Spark on Parquet. Moreover, using Databricks, the team is now able to run interactive queries on all its historical data — not just two weeks worth — making it possible to better detect threats over longer time horizons and conduct deep forensic reviews. They also gain the ability to leverage Apache Spark for machine learning and advanced analytics.
As cybercriminals continue to evolve their techniques, so do cybersecurity teams need to evolve how they detect and prevent threats. Comprehensive network traffic extracted by Bro is the highest quality data available to incident responders and threat hunters, and an essential ingredient to the Databricks analytics platform for cybersecurity. Big data analytics and AI offer a new hope for organizations looking to improve their security posture, but choosing the right platform is critical to success.
Download our Cybersecurity Analytics Solution Brief or watch the replay of our recent webinar “Enhancing Threat Detection with Big Data and AI” to learn how Databricks can enhance your security posture, including ingestion of Bro logs for network traffic monitoring.
Alan Saldich - CMO, Corelight
Brian Dirking - Sr. Director Partner Marketing, Databricks