March 25, 2025 by Keith J. Jones
In my security research role at Corelight, I often have to sift through large, complex data sets to detect subtle anomalies and threats. It reminds me of a famous quote often attributed to Abraham Lincoln:
Give me six hours to chop down a tree and I will spend the first four sharpening the axe.
For me, that means investing time up front to build tools that allow a large language model (LLM) to do the heavy lifting on key tasks, namely those that teams of analysts would have handled in the past.
One such tool is our map-reduce script, which overcomes the inherent context limitations of LLMs by processing vast amounts of data in smaller, manageable chunks.
Modern LLMs are incredibly powerful but are constrained by a fixed context window: they can only process a certain amount of data at once. Memory (RAM) is often the biggest practical constraint when running LLMs, since a larger context window requires more of it.
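To make the constraint concrete, here is a back-of-the-envelope sketch in Python. It assumes roughly four characters per token, which is only a common rule of thumb for English text and not the behavior of any specific tokenizer, and the function names are hypothetical. The numbers 100000 and 37500 are the ones that appear beside --chunk_size and --num_ctx in the option list later in this post; treating the chunk size as a character count and using a fixed-stride sliding window are likewise assumptions.

```python
import math

# Assumption: ~4 characters per token. A rough heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(num_chars: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Roughly estimate how many tokens a text of num_chars characters uses."""
    return math.ceil(num_chars / chars_per_token)


def fits_in_context(num_chars: int, num_ctx: int, reserved_tokens: int = 0) -> bool:
    """True if the estimated tokens plus a reserve (prompt, answer) fit in num_ctx."""
    return estimate_tokens(num_chars) + reserved_tokens <= num_ctx


def chunks_needed(total_chars: int, chunk_size: int, chunk_overlap: int) -> int:
    """Number of chunks for an idealized fixed-stride sliding window with overlap."""
    if total_chars <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap
    return 1 + math.ceil((total_chars - chunk_size) / step)
```

Under these assumptions, a 100000-character chunk comes to about 25000 tokens, which leaves headroom inside a 37500-token window for the prompt and the answer, while a 1000000-character file would not fit in one pass and would need to be split.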
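Our map-reduce approach addresses this challenge by splitting the input documents into chunks, sending each chunk to the model along with the query (the map stage), and then combining the per-chunk answers into a single final response (the reduce stage).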
This approach makes it possible to analyze huge datasets efficiently, and also transforms the way we extract actionable insights from data in cybersecurity.
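A minimal sketch of the map-reduce idea in Python follows. It is not the script itself: `split_text`, `map_reduce`, and the prompt wording are hypothetical, the splitter is a simple fixed-stride character window, and `llm` stands in for whatever function sends a prompt to a model and returns its text. The sketch also performs a single reduce call and does not handle the case where the combined map outputs exceed the context limit.

```python
from typing import Callable, Dict, List, Tuple


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character chunks that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def map_reduce(
    documents: Dict[str, str],
    query: str,
    llm: Callable[[str], str],
    chunk_size: int = 100_000,
    chunk_overlap: int = 100,
) -> str:
    """Answer query over documents that may be too large for one model call."""
    # Split: break every document into chunks, remembering the source name.
    chunks: List[Tuple[str, str]] = [
        (name, chunk)
        for name, text in documents.items()
        for chunk in split_text(text, chunk_size, chunk_overlap)
    ]
    total = len(chunks)

    # Map: ask the model the same question about each chunk independently.
    partial_answers: List[str] = []
    for index, (name, chunk) in enumerate(chunks, start=1):
        prompt = (
            f"Question: {query}\n"
            f"Source: {name} (chunk {index}/{total})\n\n{chunk}"
        )
        partial_answers.append(llm(prompt))

    # Reduce: combine the per-chunk answers into one final answer.
    combined = "\n\n".join(partial_answers)
    return llm(
        f"Question: {query}\n"
        f"Combine these partial answers into one answer:\n\n{combined}"
    )
```

Passing the model call in as a plain function keeps the sketch independent of any particular LLM client and makes it easy to exercise with a stub before pointing it at a real model.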
(Note that LangChain and LangGraph offer some map-reduce functionality of their own: https://python.langchain.com/docs/versions/migrating_chains/map_reduce_chain/. Still, we will implement our method in this article without complicating the process with a state graph.)
This map-reduce methodology is transformative for several key cybersecurity activities, including threat hunting, incident response, and forensic analysis.
Below is an overview of the prerequisites, installation, and step-by-step usage instructions. After that, I will show two examples where this script could be used to analyze Zeek scripts and logs.
Once you have the prerequisites in place, install the required Python packages by running:
pip install -r requirements.txt
To execute the script, use the following command:
python map-reduce.py --directory /path/to/your/documents --query "Your query here"
The script accepts the following options:
-d, --directory
-p, --path
-q, --query
-f, --query_file
-m, --model (default: phi4)
-c, --chunk_size (default: 100000)
-o, --chunk_overlap (default: 100)
-t, --temperature (default: 0.0)
-x, --num_ctx (default: 37500)
-u, --output
-s, --tika_server (default: http://localhost:9998)
-z, --debug
The script splits text into chunks with RecursiveCharacterTextSplitter, making it easier to handle large documents.
To illustrate how this tool works in practice, consider the following example using the ZeekNetSupport detector developed by Corelight. This Zeek package monitors network traffic to detect the usage of NetSupport, an administrative tool that is often exploited by malware operators to facilitate unauthorized remote access.
Example Command:
% python map-reduce.py -d ~/Source/zeek-netsupport-detector -q "How does this Zeek package detect NetSupport?." --path "(?i).*readme\.md,.*/scripts/.*\.(zeek|sig)"
Output:
Ingesting file: /Users/keith.jones/Source/zeek-netsupport-detector/README.md
Ingesting file: /Users/keith.jones/Source/zeek-netsupport-detector/scripts/netsupport.sig
Ingesting file: /Users/keith.jones/Source/zeek-netsupport-detector/scripts/main.zeek
Ingesting file: /Users/keith.jones/Source/zeek-netsupport-detector/scripts/__load__.zeek
File README.md produced 1 chunks.
File netsupport.sig produced 1 chunks.
File main.zeek produced 1 chunks.
File __load__.zeek produced 1 chunks.
[Map] Sending chunk from README.md (chunk 1/4) to the model...
[Map] Sending chunk from netsupport.sig (chunk 2/4) to the model...
[Map] Sending chunk from main.zeek (chunk 3/4) to the model...
[Map] Sending chunk from __load__.zeek (chunk 4/4) to the model...
[Reduce] Starting reduction stage...
[Reduce] Combining map outputs for reduction...
[Reduce] Combined output within context limit. Sending final prompt to model...
Final Answer:
The Zeek package detects NetSupport Command and Control (C2) traffic by analyzing network traffic captured in PCAP files for specific patterns associated with the use of NetSupport—a legitimate administrative tool that is often exploited by malware operators for remote administration. The detection process involves two primary mechanisms: HTTP headers analysis and command strings detection within TCP payloads.
1. **HTTP Headers Detection**:
- The package identifies suspicious C2 traffic by examining HTTP headers in the network packets. It looks for characteristics such as "USER-AGENT" or "SERVER" headers containing the string "NetSupport". When these patterns are observed, a notice is triggered indicating potential NetSupport-related activity (`NetSupport::C2_Traffic_Observed_HTTP_Headers`). This mechanism is detailed in `main.zeek` (Global chunk 3 of 4).
2. **Command Strings Detection**:
- The package also detects specific command strings within TCP payloads, such as `CMD=POLL` and `CMD=ENCD`, which are indicative of NetSupport's use in remote administration.
- For the pattern `CMD=POLL`, a regular expression `/.*(\x0a|\x0d)CMD=POLL(\x0a|\x0d)/` is used to match occurrences bounded by newline characters. Detection triggers the function `NetSupport::netsupport_cmd_poll_match`.
- Similarly, for `CMD=ENCD`, the corresponding pattern is matched using `/.*(\x0a|\x0d)CMD=ENCD(\x0a|\x0d)/`. Upon detection, the function `NetSupport::netsupport_cmd_encd_match` is executed.
- These detection signatures are defined in `netsupport.sig` (Global chunk 2 of 4).
The process is automated through Zeek scripts that parse PCAP files and apply these rules to identify potential misuse of NetSupport. Detailed logs capture information such as timestamps, IP addresses, ports, and notes on the detected events.
**Citations:**
- `README.md`
- `netsupport.sig` (Global chunk 2 of 4)
- `main.zeek` (Global chunk 3 of 4)
- `__load__.zeek` (Global chunk 4 of 4)
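The two signature patterns quoted in the answer above can be tried out in Python to see what they describe. This is only an approximation for illustration: Zeek's signature engine has its own matching semantics, the `re.DOTALL` flag and the use of `match` are assumptions made so that the leading `.*` can span line breaks, and the function name is hypothetical.

```python
import re
from typing import List

# Python approximations of the patterns quoted from netsupport.sig in the
# answer above. re.DOTALL lets the leading ".*" cross newlines (an assumption).
PATTERNS = {
    "CMD=POLL": re.compile(rb".*(\x0a|\x0d)CMD=POLL(\x0a|\x0d)", re.DOTALL),
    "CMD=ENCD": re.compile(rb".*(\x0a|\x0d)CMD=ENCD(\x0a|\x0d)", re.DOTALL),
}


def matched_commands(payload: bytes) -> List[str]:
    """Return the names of the command patterns found in a TCP payload."""
    return [name for name, pattern in PATTERNS.items() if pattern.match(payload)]
```

Because each pattern requires a line feed or carriage return on both sides of the command, a payload such as `b"\nCMD=POLLING\n"` or one that begins directly with `CMD=POLL` would not match in this sketch.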
You can see in the final LLM output that it successfully distilled the complex Zeek detection logic into three methods: HTTP header inspection, CMD=POLL signature matching, and CMD=ENCD signature matching.
This LLM response makes the detection mechanisms easier to understand for those who may not be familiar with Zeek source code. It also illustrates how the LLM can translate intricate code into actionable insights. By breaking down the logic into distinct methods and citing the relevant source files, the output bridges the gap between complex technical implementations and practical security analysis.
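Both example commands in this post pass a --path value that looks like one or more regular expressions separated by commas. The sketch below shows how such a filter could work; splitting on commas, compiling each piece separately, and using `re.search` are assumptions about the behavior rather than a description of the script, and `filter_paths` is a hypothetical name.

```python
import re
from typing import Iterable, List


def filter_paths(paths: Iterable[str], path_arg: str) -> List[str]:
    """Keep the paths that match at least one comma-separated regex in path_arg."""
    # Assumption: a comma separates patterns, so a regex that itself contains
    # a comma (for example a {1,3} quantifier) would be split incorrectly here.
    patterns = [re.compile(piece) for piece in path_arg.split(",") if piece]
    return [path for path in paths if any(rx.search(path) for rx in patterns)]
```

With the first command's value, `(?i).*readme\.md,.*/scripts/.*\.(zeek|sig)`, the case-insensitive flag applies only to the first pattern in this sketch, which would still select `README.md` along with the `.zeek` and `.sig` files under a `scripts` directory.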
Below is another example where we use the map-reduce script to analyze just the Zeek logs (no source code) produced from the testing PCAP in the NetSupport repository. In this case, the tool reviews multiple log files and returns a consolidated analysis of suspicious or malicious activities, complete with direct quotes from the raw logs for context:
% time python map-reduce.py -d ~/Desktop/logs -q "Review these Zeek logs representing network traffic and tell me about any suspicious or malicious cybersecurity activities. Quote the raw logs to support your arguments, for context." --path "(?i).+[^/]\.log$"
Ingesting file: /Users/keith.jones/Desktop/logs/notice.log
Ingesting file: /Users/keith.jones/Desktop/logs/x509.log
Ingesting file: /Users/keith.jones/Desktop/logs/conn.log
Ingesting file: /Users/keith.jones/Desktop/logs/ssl.log
Ingesting file: /Users/keith.jones/Desktop/logs/files.log
Ingesting file: /Users/keith.jones/Desktop/logs/analyzer.log
Ingesting file: /Users/keith.jones/Desktop/logs/http.log
Ingesting file: /Users/keith.jones/Desktop/logs/packet_filter.log
Ingesting file: /Users/keith.jones/Desktop/logs/weird.log
Ingesting file: /Users/keith.jones/Desktop/logs/dns.log
Ingesting file: /Users/keith.jones/Desktop/logs/ocsp.log
File notice.log produced 1 chunks.
File x509.log produced 1 chunks.
File conn.log produced 1 chunks.
File ssl.log produced 1 chunks.
File files.log produced 1 chunks.
File analyzer.log produced 1 chunks.
File http.log produced 1 chunks.
File packet_filter.log produced 1 chunks.
File weird.log produced 1 chunks.
File dns.log produced 1 chunks.
File ocsp.log produced 1 chunks.
[Map] Sending chunk from notice.log (chunk 1/11) to the model...
[Map] Sending chunk from x509.log (chunk 2/11) to the model...
[Map] Sending chunk from conn.log (chunk 3/11) to the model...
[Map] Sending chunk from ssl.log (chunk 4/11) to the model...
[Map] Sending chunk from files.log (chunk 5/11) to the model...
[Map] Sending chunk from analyzer.log (chunk 6/11) to the model...
[Map] Sending chunk from http.log (chunk 7/11) to the model...
[Map] Sending chunk from packet_filter.log (chunk 8/11) to the model...
[Map] Sending chunk from weird.log (chunk 9/11) to the model...
[Map] Sending chunk from dns.log (chunk 10/11) to the model...
[Map] Sending chunk from ocsp.log (chunk 11/11) to the model...
[Reduce] Starting reduction stage...
[Reduce] Combining map outputs for reduction...
[Reduce] Combined output within context limit. Sending final prompt to model...
Final Answer:
The analysis of the provided Zeek log data reveals several instances of potentially suspicious or malicious network activity associated with NetSupport malware and other cybersecurity concerns. Here is a consolidated summary:
### Suspicious Activities Identified
1. **NetSupport Malware Command-and-Control (C2) Traffic:**
- The `notice.log` entries indicate multiple instances of NetSupport C2 traffic, identified through specific HTTP headers and commands (`CMD=POLL`, `CMD=ENCD`). These patterns are consistent with malware operations using NetSupport for remote control and data exfiltration.
- **HTTP Headers Detection:**
```
1717442617.920239 CQ7b0y4Vd4NVQ3nJRi 192.168.100.146 49741 45.134.174.143 443 tcp NetSupport::C2_Traffic_Observed_HTTP_Headers
```
- **CMD=POLL Detection:**
```
1717442617.920239 CQ7b0y4Vd4NVQ3nJRi 192.168.100.146 49741 45.134.174.143 443 tcp NetSupport::C2_Traffic_Observed_CMD_POLL
```
- **CMD=ENCD Detection:**
```
1717442617.955368 CQ7b0y4Vd4NVQ3nJRi 192.168.100.146 49741 45.134.174.143 443 tcp NetSupport::C2_Traffic_Observed_CMD_ENCD
```
2. **Repeated Connections and Long Duration Traffic:**
- The `conn.log` entries show repeated connections to a single IP address (`4.231.128.59`) with no data transferred, which could indicate scanning or probing activities.
```
1717442509.310809 ClEkJM2Vm5giqnMf4h 192.168.100.146 49676 4.231.128.59 443 tcp -
```
- Long duration connections, such as to `40.126.32.76`, suggest potential data exfiltration or sustained C2 communication.
```
1717442518.264335 C9rXSW3KSpTYvPrlI1 192.168.100.146 49689 40.126.32.76 443 tcp ssl
```
3. **DNS and HTTP Anomalies:**
- The `dns.log` entries reveal repeated queries to suspicious domains like `gurjlxyozkjnjqf.gurjlxyozkjnjqf`, which could indicate DNS tunneling or automated scripts.
```
1717442517.960287 CwjjYJ2WqgTbAqiHl6 192.168.100.146 50229 192.168.100.2 53 udp gurjlxyozkjnjqf.gurjlxyozkjnjqf
```
- The `http.log` entries show multiple OCSP and CRL requests, suggesting a compromised system querying certificate statuses.
```
1717442517.843005 C3eiCBGOLw3VtHfOj 192.168.100.146 49687 192.229.221.95 80 GET ocsp.digicert.com
```
4. **TCP Connection Anomalies:**
- The `weird.log` entries indicate TCP acknowledgment and sequence number irregularities, which could be exploited for network attacks.
```
1717442550.344203 CfQutTEgs3g8sUrsa 192.168.100.146 49700 23.14.28.77 443 TCP_ack_underflow_or_misorder
```
5. **Invalid TLS Version:**
- An entry in `analyzer.log` shows an invalid TLS version being used, potentially pointing to attempts to exploit vulnerabilities.
```
1717442617.920239 violation protocol SSL CQ7b0y4Vd4NVQ3nJRi - 192.168.100.146 49741 45.134.174.143 443 Invalid version in TLS connection. Version: 20307
```
### Conclusion
The consolidated analysis of logs from `notice.log`, `conn.log`, `dns.log`, `http.log`, `weird.log`, and `analyzer.log` suggests several cybersecurity threats, including potential malware activity, scanning or probing attempts, DNS tunneling, and protocol anomalies. These insights, backed by direct log quotes, highlight the value of using the map-reduce approach to rapidly sift through extensive log data and identify actionable intelligence.
**Citations:**
- `notice.log`
- `conn.log`
- `dns.log`
- `http.log`
- `weird.log`
- `analyzer.log`
python map-reduce.py -d ~/Desktop/logs -q --path "(?i).+[^/]\.log$" 6.93s user 1.53s system 0% cpu 19:58.50 total
This example demonstrates the power of the map-reduce approach in cybersecurity log analysis. Even if the LLM’s response is imperfect (for instance, OCSP requests to ocsp.digicert.com are a routine part of certificate validation rather than a sign of compromise), it gives the network analyst a head start when looking into these logs. By breaking down extensive Zeek log data into manageable chunks and then consolidating the results, the tool distills complex network activities into actionable insights. The final output not only highlights key suspicious behaviors, such as potential NetSupport malware communications, anomalous connection patterns, DNS irregularities, and protocol issues, but also directly references raw log excerpts to provide context.
This detailed yet concise summary enables security teams to rapidly assess threats and prioritize further investigation, ultimately enhancing their incident response and forensic capabilities.
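Because the model's quoted log lines are abridged, it can help to confirm that each quote is actually backed by a raw log entry before acting on it. The following sketch is one way to do that; it assumes the logs are in Zeek's tab-separated format, where metadata lines begin with `#`, and both function names are hypothetical.

```python
from typing import Iterable, List, Optional


def tokens_in_order(quote_tokens: List[str], line_tokens: List[str]) -> bool:
    """True if quote_tokens appear in line_tokens in the same order (gaps allowed)."""
    remaining = iter(line_tokens)
    # "token in remaining" advances the iterator, so order is enforced.
    return all(token in remaining for token in quote_tokens)


def find_supporting_line(quote: str, raw_lines: Iterable[str]) -> Optional[str]:
    """Return the first raw log line containing every field of the quote, in order."""
    quote_tokens = quote.split()
    if not quote_tokens:
        return None
    for line in raw_lines:
        if line.startswith("#"):  # skip header and metadata lines
            continue
        if tokens_in_order(quote_tokens, line.split()):
            return line
    return None
```

A quote for which `find_supporting_line` returns `None` is a candidate for manual review, since it may have been paraphrased or invented by the model.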
By investing time in sharpening our analytical tools—just as Lincoln advised with his axe—we enable LLMs to process complex, large-scale data efficiently. The map-reduce approach allows us to extract actionable insights from massive datasets, fundamentally transforming threat hunting, incident response, and forensic analysis.
Whether you’re analyzing network logs, dissecting source code like the Zeek NetSupport detector, or exploring new ways to automate data analysis, this methodology paves the way for more agile and accurate cybersecurity practices.
Explore this approach further in the LLM-Ninja repository, and join me in harnessing the full power of LLMs to stay ahead of evolving cyber threats.