Machine learning in cybersecurity: use cases & benefits
- Executive summary
- What is machine learning in cybersecurity?
- A brief history of machine learning and network security
- Relevant use cases for machine learning in cybersecurity
- ML use cases for network threat detection
- The importance of explainability
- Challenges of using machine learning in cybersecurity
- Considerations for ML models applied to network security
- How Corelight utilizes machine learning in network security
Executive summary
- Machine learning (ML) in cybersecurity has many applications that can improve efficiency in security operations centers, enable threat hunting, and improve detection and analysis.
- ML represents a powerful extension of the security toolset, and ML models are a significant enhancement to existing security tools. SOCs can leverage ML-enhanced alerting and analysis to uplevel their skill sets and take a more proactive approach to threat hunting.
- As a rapidly evolving technology, ML deployment can be overwhelming to consider. Security teams should invest in partnerships with industry leaders with an established understanding and use of the technology, as well as its privacy and security implications.
What is machine learning in cybersecurity?
Machine learning (ML) is a subcategory of artificial intelligence (AI) and involves training computer models, or algorithms, to recognize highly complex patterns. This process is similar to, but not the same as, rational decision making. The creation, training, and tuning of the models depends on experts who specialize in computer science, data science, or a combination of the two disciplines.
While the term “AI” is extremely broad and describes a continuum of technology (existing, in development, and theoretical), ML represents a segment that already impacts most industries and the public sector. Cybersecurity use cases are among the most consequential and useful applications of machine learning today. Large Language Models (LLMs), Agentic AI, and Generative AI are additional technologies on the AI spectrum that are rapidly expanding and also having a significant impact on cybersecurity. While those technologies matter, this article focuses on the impact of machine learning on cybersecurity.
Machine learning represents a significant jump in the complexity of computer programming. Rather than following sets of explicit, hand-written instructions as in traditional programming, ML models learn to predict outputs from sets of training data.
As recent advancements in AI have shown, machine learning models’ predictions have improved considerably, and will continue to do so as the technology evolves, new data is processed, and human oversight continues to tune the models. Creators are training their models to tackle complex problems that would be very hard to solve through traditional programming approaches.
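The contrast between explicit instructions and learned behavior can be illustrated with a deliberately tiny sketch. Everything here is hypothetical, including the feature (bytes sent per connection), the labels, and the midpoint "training" rule; it only illustrates the idea of deriving a decision from data rather than hard-coding it:

```python
# Hypothetical labeled training data: bytes sent per connection.
benign = [120, 340, 200, 180]      # labeled benign connections
malicious = [9000, 12000, 15000]   # labeled malicious exfiltration

# Traditional programming: an analyst hard-codes the threshold.
def rule_based(bytes_sent):
    return bytes_sent > 5000  # explicit instruction

# Minimal "learning": derive the threshold from the labeled data instead.
def train_threshold(benign, malicious):
    return (max(benign) + min(malicious)) / 2

threshold = train_threshold(benign, malicious)

def learned(bytes_sent):
    return bytes_sent > threshold

print(threshold)        # 4670.0 -- midpoint between the two labeled classes
print(learned(10_000))  # True: classified as malicious
print(learned(250))     # False: classified as benign
```

If the training data changes, the learned threshold moves with it, while the hard-coded rule stays fixed until a human edits it.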
The newest ML models can greatly assist humans who must discern patterns in enormous and complex datasets and improve the accuracy of their predictions. Like other analytics (e.g., Zeek scripts, Suricata, LogScale queries), they can help automate critical but repetitive tasks. They can also provide new or more expansive insights when multiple aspects of new activity must be jointly analyzed and/or compared against historical data and baselines.
Machine learning models are delivering much-needed assistance to the cybersecurity industry. If they haven’t already, security teams should take advantage of the technology and target use cases where ML excels, such as behavioral and anomaly detection.
| Training methods for cybersecurity ML models | Definition |
| --- | --- |
| Supervised machine learning | Labeled data sets are used to train algorithms, create models, and predict outcomes. Human intervention is necessary for labeling and tuning the data sets. For threat detection, supervised models may help create rules or script logic that combine multiple features, or learn from samples of benign and malicious code to evaluate new samples. In addition to detecting known malware, models trained on patterns from prior threat activity and techniques can detect new, previously unseen attacker tools that exhibit the same types of malicious behavior. |
| Unsupervised machine learning | Data sets are delivered to the ML model without human-provided labeling or tagging for classification purposes. Unsupervised models can identify common or anomalous patterns in observed activity. Use cases include data analysis, identifying new attack methods, and flagging unusual behaviors that may indicate adversary activity. |
| Deep learning | A subset of ML, deep learning models are multi-layer neural networks that can learn more complex features from the data, and can be leveraged in supervised or unsupervised scenarios. They enable more complex traffic analysis and facilitate detection of many types of adversary activity associated with reconnaissance, delivery (e.g., malicious downloads, social engineering domains), and command and control. Several deep learning techniques are currently in use, each suited to different detection and recognition tasks, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Neural Collaborative Filtering (NCF). |
| Reinforcement learning | Involves a trial-and-error, reward/punishment approach, in which different actions are assigned positive or negative values to train a system through experience. Cybersecurity use cases include adversarial simulations and monitoring cyber-physical systems (e.g., driverless cars and device sensors). |
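As a complement to the definitions above, the core idea of unsupervised baselining can be sketched in a few lines of pure Python. The traffic numbers are invented for illustration, and a real system would model far more features than a single count:

```python
import statistics

# Unlabeled observations: e.g., DNS queries per host per hour (hypothetical).
baseline = [42, 38, 51, 45, 40, 47, 44, 39, 46, 43]

mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)

def is_anomalous(value, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the baseline mean."""
    return abs(value - mean) / stdev > threshold

print(is_anomalous(44))   # typical volume -> False
print(is_anomalous(400))  # sudden spike, e.g., possible DNS tunneling -> True
```

No one labeled any observation as benign or malicious; the "normal" region was learned entirely from the data, which is the defining property of the unsupervised approach.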
A brief history of machine learning and network security
Machine learning developed as data scientists and computer scientists sought more efficient and faster ways to improve on predictive analytics and pattern recognition. As more computing power became available, leaders in both fields worked to build and train algorithms with increasingly large datasets. Within cybersecurity, machine learning models might process telemetry from endpoints, network traffic, industrial control systems, or any other portion of the enterprise.
Machine learning in cybersecurity has evolved from early rule-based systems in the 1980s to highly sophisticated algorithms analyzing vast datasets for threat detection and response. Initially, ML focused on anomaly and behavior detection, but advancements in deep learning, natural language processing (NLP), and reinforcement learning have expanded its capabilities and targets. This evolution has in part been driven by the increasing sophistication and volume of cyber threats, making ML a crucial tool for proactive and adaptive cybersecurity.
Corelight founders Dr. Vern Paxson and Robin Sommer have been active in ML research, publishing influential work as early as 2010 on applying machine learning to network intrusion detection systems. The paper identified challenges with using ML for network intrusion detection and provided recommendations for future research and development in machine learning for cybersecurity.
The use of ML for network detection has advanced significantly since that paper was published, driven by the suggestions provided by Paxson, Sommer, and others, along with other significant technical advancements. ML has evolved and transformed cybersecurity from a traditionally reactive field relying on signature-based detection to a proactive and adaptive one, capable of detecting the earliest stages of attacks through sophisticated algorithms and data analysis. In an increasingly digitized and connected world, ML has become indispensable.
Relevant use cases for machine learning in cybersecurity
Applications for ML tools have proliferated and continue to expand. Most fall into a few general categories:
Threat detection. Machine learning models trained on data from real-world attack tactics, techniques, and procedures (TTPs) help analysts detect indicators of known and unknown threats. Used in conjunction with other detection methods, such as behavioral analysis and signature-based detection, machine learning models enhance threat detection toolkits and can give analysts an edge against adversaries sophisticated enough to evade more static analytic methods.
Anomaly detection. Anomaly detection targets new, novel threats, including those that utilize living off the land (LOTL) techniques, by baselining a monitored environment in order to identify suspicious usage of otherwise legitimate tools. It uses unsupervised machine learning to identify activity that deviates from a learned or observed baseline.
Alert correlation. Alert correlation is the process of using machine learning algorithms to analyze and link seemingly unrelated security alerts to identify patterns, connections, and potential threats that might be missed when examining alerts in isolation. This helps security teams reduce alert fatigue, reduce false positives, prioritize critical incidents, and improve their overall security posture.
Increased visibility. The automation and analytic capacity of ML tools delivers a critical force multiplier for data classification. They assist in the collection and categorization of:
- Entities, such as apps, subnets, hosts, devices, domains, users, etc.
- Network traffic, such as certificates, logs, and server scans.
- Beaconing, where malicious software (malware) periodically sends out communication signals to a command and control (C2) server to check in, report its presence, and receive further instructions.
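Beaconing in particular lends itself to statistical detection: automated check-ins to a C2 server produce connection inter-arrival times with unusually low variation. The sketch below uses fabricated timestamps and a single summary statistic; real detectors must also handle deliberate jitter, sleep intervals, and far noisier traffic:

```python
import statistics

def beaconing_score(timestamps):
    """Coefficient of variation of inter-arrival times.
    Values near 0 suggest machine-like periodicity (possible C2 beaconing)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.stdev(gaps) / statistics.mean(gaps)

# Hypothetical connection times (seconds) from two hosts to one external IP.
human_browsing = [0, 7, 95, 110, 400, 460, 900]
suspect_beacon = [0, 60, 121, 180, 241, 300, 361]  # ~60s check-ins

print(round(beaconing_score(human_browsing), 2))
print(round(beaconing_score(suspect_beacon), 2))   # much closer to 0
```

The suspect host scores near zero because its gaps are almost identical, while human-driven traffic produces irregular gaps and a much higher score.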
ML use cases for network threat detection
As the industry has evolved beyond intrusion detection systems to network detection and response platforms, ML has bolstered the effectiveness of threat detection. Models trained on datasets of normal network traffic and malicious activity patterns continue to improve with tuning and have become better at predicting variations on known attack signatures. Unsupervised ML has added important anomaly detection capabilities to detect new and novel threats, including threats that may be using EDR-evasive techniques. ML, used in tandem with signature-based tools and behavioral analytics, makes threat detection more effective and impactful for SOCs, and can support a more proactive approach to security incidents and routine operations.
Network security leaders regularly develop ML models trained on specific malicious behaviors, such as creation of C2 channels, data exfiltration through domain name system (DNS), domain generation algorithms (DGA), and malicious software downloads. With regular tuning and a supply of new datasets, these models’ outputs have become more detailed and today are an important part of the threat detection toolkit.
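One feature commonly fed to DGA-detection models is the character entropy of a domain's leftmost label, since algorithmically generated names tend toward random-looking strings. The stdlib-only sketch below computes just that single feature; a production model would combine many such features with trained weights, and entropy alone produces plenty of false positives:

```python
import math
from collections import Counter

def label_entropy(domain):
    """Shannon entropy (bits per character) of the leftmost DNS label."""
    label = domain.split(".")[0].lower()
    counts = Counter(label)
    n = len(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Dictionary-word domains tend to score lower than random-looking DGA output.
print(round(label_entropy("microsoft.com"), 2))
print(round(label_entropy("q7m2x9rk4tp8wz3v.com"), 2))  # higher entropy
```

The example domains here are illustrative only; the second is an invented random string, not a known DGA sample.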
The importance of explainability
Despite their sophistication, ML models’ output usefulness is still a direct function of human oversight, training data quality, and modeling choices. The creator of a model may not provide a full or even partial view of its inner workings, assumptions, and biases to those who use it. While this may not be important in certain use cases, a lack of explainability in most security environments can leave SOCs unable to easily validate outputs generated by these models.
While there are various means for estimating a model’s performance (e.g., receiver operating characteristic (ROC) and precision-recall (PR) curves), the model’s training data may not be representative of real-world activity. A model’s performance can’t be fully assessed until it is deployed in the wild.
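Precision and recall, the quantities behind a PR curve, can be computed directly from labeled validation outcomes. A minimal sketch with an invented validation set (the alert results are fabricated for illustration):

```python
def precision_recall(results):
    """results: list of (predicted_malicious, actually_malicious) booleans."""
    tp = sum(1 for pred, actual in results if pred and actual)
    fp = sum(1 for pred, actual in results if pred and not actual)
    fn = sum(1 for pred, actual in results if not pred and actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical validation run: 4 alerts fired, 3 were real, 1 real threat missed.
validation = [(True, True), (True, True), (True, True), (True, False),
              (False, True), (False, False), (False, False)]
print(precision_recall(validation))  # (0.75, 0.75)
```

The caveat in the text applies here too: these numbers describe performance only on the data evaluated, and a validation set that does not resemble real-world traffic will yield misleadingly optimistic curves.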
Machine learning tools need to incorporate explainability into their development, to ensure security teams have a complete story from the model. When a detection occurs, the model should include an easy-to-understand explanation for the SOC analyst. By including both detailed and easy-to-understand explanations, any false positives will be easier to identify, and incident response time can be significantly reduced.
Challenges of using machine learning in cybersecurity
An ongoing challenge of ML and cybersecurity is the resource-intensive nature of model development, maintenance, and usage. ML models consume vast amounts of computing power and in the past required the dedication of teams with specialized expertise. Today, the barrier to entry has lowered significantly, allowing SOC teams to leverage highly sophisticated machine learning models by relying on the development and/or tuning of machine learning models by cybersecurity vendors.
Even when model creators and users have an abundance of talent, creating and utilizing ML may come with a variety of challenges, requiring SOC teams to evaluate and compare vendors who supply ML-based solutions:
- Calibrating models’ performance and creating a good signal. ML models that scan for malicious software or behavior can generate a high number of false positives, which can impact operational performance and leave security teams chasing alerts. They also can be tuned to thresholds that increase the likelihood of a true positive being overlooked. Finding the right balance requires SOCs to work with vendors who have the teams and resources to work on ongoing tuning and training with new datasets.
- Securing sufficient, high-quality data. ML models require regular infusions of new and relevant datasets, which can be difficult to obtain due to regulatory constraints, privacy considerations, or a lack of mapping with threat frameworks. Furthermore, data that lacks robust classification or accurate labeling may lower the quality of the model’s predictions or scoring.
- Overpromising from vendors. Like any technology, ML can be subject to hype and hyperbolic marketing that exaggerates its capabilities. In terms of security, it is important to remember that ML is a powerful addition to the security toolbox, and not a silver bullet.
- Choosing the right use case. Given the significant investment of time and resources, security teams must consider which ML use cases will deliver maximum benefit to their organization.
Considerations for ML models applied to network security
The needs of every organization and security team are unique. When evaluating the efficacy and value of ML models, some general questions can help SOC teams determine which specific use cases and models will bring the greatest benefit to their security apparatus and bottom line (note: the same considerations should apply to other detection approaches):
- What type of environment is being protected?
- What is the cost of missed attacks?
- What can the model detect, and how?
- What can the model not detect?
- What is the model’s reliability?
How Corelight utilizes machine learning in network security
Corelight has approached machine learning with a focus on the essential role of quality evidence, deep expertise, and a consideration of our customers’ journey. Moving from proof of concept to deployment with technology as complex and potentially game-changing as ML can be difficult for security teams. Networks are extremely complex and normally present a great deal of variability. When investing in ML-based tools, successful deployment often depends on an engaged approach that elevates use cases with the most potential for return on investment and enhanced system visibility.
To that end, Corelight’s unsupervised, supervised, and deep learning models are supported by an evidence-first approach and can significantly expand an organization’s threat detection coverage. Corelight provides the richest contextual evidence around ML-driven detections and alerts, while using cutting-edge ML to make alerts and analysis explainable.
Corelight has incorporated ML, Agentic AI, Generative AI, and LLMs into its Open NDR Platform to summarize and explain alerts and suggest investigative next steps, all without sending any customer data to the models. The approach balances privacy concerns with the substantial value delivered by ML in terms of SOC efficiency, improved analysis, and opportunity for analysts to uplevel skills. Corelight’s Polaris Program provides the crucial additional advantage of validating and tuning ML models within real-time, organic, and large-scale environments.
Learn more about how ML enhances detection and analytics on the Corelight platform, powers an intuitive SaaS solution, and simplifies SOC workflows.
Book a demo
We’re proud to protect some of the most sensitive, mission-critical enterprises and government agencies in the world. Learn how Corelight’s Open NDR Platform can help your organization mitigate cybersecurity risk.
