Recently the research team at Intezer described a very interesting Linux-based command-and-control (C2) malware, RedXOR. As usual there is a set of simple network-based IOCs in the form of domains and IPs that organizations can search for in their Zeek dns.log, http.log and conn.log. Detecting this threat at a deeper layer with Zeek is also relatively straightforward, and this C2 provides a good demonstration of how Zeek’s extensive state tracking can augment simple IOC-based logic.
The ability to create behavioral detections – such as the state-based detection described in this blog – is a powerful option because threat actors can easily change traditional IOC factors (e.g. IP addresses, URLs and domain names), which causes simpler IOC-based detections to fail. Another benefit of lower-level behavioral detection logic is that it is far less prone to false positives.
Let’s start with a few interesting points about this malware:
The implant is specifically designed for Linux systems, having been compiled on a legacy compiler which is the default on RHEL6 – this may be of interest to those in threat intelligence and perhaps those seeking attribution, but is not further relevant to this detection.
This malware, like many others, uses HTTP as a C2 channel. While modern C2/exfiltration traffic is often encrypted, it’s not always being sent over HTTPS. “Pre-encrypted” data can be sent using POSTs over HTTP. While we certainly have other tools to help shine a light on encrypted network traffic, we should not neglect good old HTTP analysis – even today.
RedXOR payloads and exfiltration can be decoded with enough effort, as shown in Intezer’s research. However, we don’t have to rely on the payload being decoded. We can treat RedXOR as yet another example of malware “pre-encrypting” data and then sending that encrypted data over HTTP, as Emotet does, for example.
The C2 sends commands to the implant within HTTP cookies. For the purpose of this detection, we will focus solely on commands used in the registration of the C2 implant (highlighted below).
The detection logic we use in this demonstration involves looking for a consecutive pattern of cookie transactions between the implant and the C2 as follows:
This series of cookies indicates an initial infection. We can use similar state-based logic for other cookie values that represent the various commands that may run subsequent to infection.
Using Zeek statefully to implement the detection logic
Zeek is an event-based engine, which means Zeek runs particular code only when it sees an associated network event occur. In our case we are interested in inspecting HTTP cookies, and these are passed within HTTP headers. The most relevant Zeek event is http_all_headers. This event provides us a mime_header_list called hlist. We simply need to fish out the cookie component from mime_header_list by referencing the correct element, and then check whether what we find is the next in the sequence we are looking for. This is the only Zeek event required to detect this threat.
The script creates a state-keeping variable and attaches it to each HTTP connection. It is aptly called c$http_state$redxor_cookies_seen_so_far and it keeps track of how many matching cookies we’ve seen so far in that connection. When it reaches five we know that we’ve seen all five cookies in the correct order and we raise a “notice” in notice.log.
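To make the counter idea concrete, here is a minimal Python model of the per-connection state described above. It is not the Zeek script itself: ConnState stands in for c$http_state, on_cookie is a hypothetical helper, and the COOKIE_n strings are placeholders rather than the real RedXOR cookie values. The sketch also assumes that a non-matching cookie simply leaves the counter untouched, which is only one possible design choice.

```python
from dataclasses import dataclass

# Hypothetical placeholders: the real RedXOR cookie strings are not
# reproduced here and would need to come from the malware research.
EXPECTED_COOKIES = ["COOKIE_1", "COOKIE_2", "COOKIE_3", "COOKIE_4", "COOKIE_5"]


@dataclass
class ConnState:
    """Per-connection state, playing the role of c$http_state."""
    cookies_seen_so_far: int = 0
    noticed: bool = False


def on_cookie(state: ConnState, cookie: str) -> bool:
    """Advance the counter if `cookie` is the next expected value.

    Returns True exactly once per connection: when the fifth cookie
    arrives and all earlier cookies were seen in order.
    """
    if state.noticed:
        return False  # already reported on this connection
    if cookie != EXPECTED_COOKIES[state.cookies_seen_so_far]:
        return False  # assumption: a mismatch leaves the counter unchanged
    state.cookies_seen_so_far += 1
    if state.cookies_seen_so_far == len(EXPECTED_COOKIES):
        state.noticed = True
        return True  # the Zeek script would raise its notice here
    return False
```

Because the state object belongs to a single connection, two connections that each carry only part of the sequence never combine into a match.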
Keen reviewers of the code may note that a proxy can potentially add and even reorder headers, so the cookie won’t always be in the same location. This can be readily accounted for by cycling through each element of the header list to find the cookie header. However, so as not to over-complicate this demonstration, let’s assume that the headers are not re-ordered.
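For readers who do want the proxy-tolerant variant, the idea of cycling through the header list can be sketched in Python as follows. The header list is modeled as a dictionary keyed by position, which only loosely mirrors Zeek’s mime_header_list, and find_cookie is a hypothetical helper. Checking both Cookie and Set-Cookie is an assumption made for illustration, not something confirmed about this malware.

```python
from typing import Dict, Optional, Tuple

# Simplified model of a header list: {position: (header name, header value)}.
HeaderList = Dict[int, Tuple[str, str]]

# Assumption: cookies may appear under either of these header names.
COOKIE_HEADER_NAMES = {"COOKIE", "SET-COOKIE"}


def find_cookie(hlist: HeaderList) -> Optional[str]:
    """Return the value of the first cookie header, wherever it sits."""
    for position in sorted(hlist):
        name, value = hlist[position]
        if name.upper() in COOKIE_HEADER_NAMES:
            return value
    return None  # no cookie header in this list
```

The cost is a scan of every header instead of a single indexed lookup, which is worth keeping in mind given how often the header event fires on a busy network.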
Since all the pieces required for this detection exist in the same TCP connection, they all share the same Zeek uid, which has two important consequences:
We can use the variable c$http$trans_depth to determine how far into the HTTP connection we are. There are two header lists per trans_depth – one from the client and one from the server. Since we are only looking for the first five cookies, these will be contained in the first three trans_depth values as shown below.
We also don’t need to consider support for clustered environments, since all artifacts will have the same uid, and thus the same worker will handle them. Making efficient stateful detections in clustered environments requires a surprisingly nuanced layer of logic to account for streams traversing different workers, as well as latency issues that may arise in such a distributed system. Perhaps this is a good topic for a future malware detection demonstration.
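As a quick check on the trans_depth arithmetic above, the following Python snippet maps each header list to its transaction. It assumes the header lists strictly alternate between client and server and that each of the first five lists carries one of the cookies. The function name trans_depth_of is hypothetical.

```python
def trans_depth_of(header_list_number: int) -> int:
    """Map the Nth header list on a connection (1-based) to its trans_depth.

    With one client list and one server list per transaction, lists 1 and 2
    belong to trans_depth 1, lists 3 and 4 to trans_depth 2, and so on.
    """
    if header_list_number < 1:
        raise ValueError("header lists are numbered from 1")
    return (header_list_number + 1) // 2


depths = [trans_depth_of(n) for n in range(1, 6)]
print(depths)  # [1, 1, 2, 2, 3]
```

The fifth cookie lands in the client half of the third transaction, which is why a bound of three on trans_depth is enough.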
Having performance in mind
In a busy network with a lot of HTTP headers, http_all_headers will occur a great deal, so it’s helpful to look at some tactics to ensure that the script doesn’t waste resources. A good reference for performance tuning is the “Profiling in Production” presentation by Corelight’s Justin Azoff at ZeekWeek 2019. Note that I’m not an expert at Zeek performance, and I’ve only learned some of these things the hard way and with expert advice from fellow Corelighters – so I’ll share some of the things that have become part of my workflow as I build out a detection like this:
Look for reasons why an HTTP event is not relevant and use this as the first piece of logic that can be applied to release Zeek’s resources by returning from the event handler. For example, if the HTTP method is not “POST”, then return straight away. We are only interested in the POST method, not GET or any other method.
Since the server header contains four items and the client header contains five, if the header list we are examining does not contain at least four items there’s no need to look further, because it is too short. Apart from performance, you also need to consider what happens when you reference a particular element (in our case the fourth element) of the header list. If the header list contains fewer than four items, you’ll get runtime errors.
Return from the script as soon as you have found the artifact of interest. This sounds obvious, but can sometimes be nuanced. For instance if we have found a match for the client/implant’s cookie in this header list, then there is no need to check that list again for server cookies – just return straight away.
Put a tight bound on your search logic. In this case, we know that all cookies will be contained within the first three client/server header lists. This means if trans_depth > 3 we can return straight away. This is important because many HTTP connections carry a large number of transactions, and each one would otherwise be inspected. This is also not an aspect that surfaces easily during script testing on a small pcap – it’s often only when you run the script at scale that these issues will arise, so it’s a good habit to put bounds checking into your script development mindset.
Efficacy vs efficiency: Changing to a less intuitive logic flow in the pursuit of optimal efficiency (i.e. the detection logic uses a minimal amount of resources) could potentially have a positive OR negative effect on efficacy (i.e. the detection logic detects as expected).
Having a super-performing script that misses your pathological case because a logic bug snuck in while you were performance-tuning to the nth degree isn’t the goal! Making things more efficient at the expense of readability, simplicity and workability is a balancing act in my humble opinion.
Test your script on real-world traffic with as much volume and variety as you can.
Assume there are benign edge cases that will cause false positives, and that you just need to find them. Our experience is that there are almost always such cases. Fine-tuning your detection is not the main issue; the hard part is finding the edge cases in the first place.
Assume there will be circumstances that make your script over-utilize the resources. Try to guard against these, and test at production volumes.
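The early-return tactics from the list above can be modeled in a few lines of Python. This is a sketch of the ordering of the checks rather than the actual Zeek event handler: should_inspect and the constant names are hypothetical, and the header list is again simplified to a dictionary keyed by position.

```python
from typing import Dict, Tuple

# Simplified model of a header list: {position: (header name, header value)}.
HeaderList = Dict[int, Tuple[str, str]]

MAX_TRANS_DEPTH = 3   # all five cookies fall within the first three transactions
MIN_HEADER_COUNT = 4  # the shorter (server) header list has four items
COOKIE_POSITION = 4   # the text refers to the fourth element of the list


def should_inspect(method: str, trans_depth: int, hlist: HeaderList) -> bool:
    """Return False as early as possible for header lists that cannot match."""
    if method != "POST":
        return False  # only POST transactions are of interest
    if trans_depth > MAX_TRANS_DEPTH:
        return False  # past the point where the sequence can occur
    if len(hlist) < MIN_HEADER_COUNT:
        return False  # too short to contain the element we index
    # Guard the indexed lookup so a sparse list cannot cause an error.
    return COOKIE_POSITION in hlist
```

Each check is cheap and runs before any cookie is examined, so the vast majority of HTTP traffic exits after one or two comparisons.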
There are various other ways to detect this malware with Zeek, and we build detections like this into the Corelight C2 collection. This script has been prepared as a tutorial-style demonstration of one such technique, as it highlights how Zeek’s state keeping can be used as a fairly intuitive and practical way to detect modern C2 malware, as well as demonstrating some performance issues to keep in mind when writing Zeek scripts yourself.
Credit to Intezer for their research on RedXOR and their collaboration with us at Corelight. Refer to this writeup for a low level description of the malware:
The script was prepared from an abstraction of the actual pcap (which could not be shared in its native format due to sensitive information contained within). This abstraction was prepared by the Intezer Research team and shared with Corelight Labs for the purpose of writing this demonstration.