Security workers across the world have been busy since last Friday dealing with CVE-2021-44228, the log4j 0-day known as Log4Shell, which is already being heavily exploited across the Internet. Given the huge number of systems that embed the vulnerable library, the myriad ways that attackers can exploit the vulnerability, and the fact that automated exploitation has already begun, defenders should expect to be dealing with it for the foreseeable future.
The good news is that detections already exist - and defenders working with Corelight and Zeek have a leg up on investigating hits on those detections, as well as a historical record that can reveal exploits that took place before the public was made aware of the vulnerability.
The nature of the vulnerability itself is fairly straightforward. The Java Naming and Directory Interface (JNDI) is a core Java service, which allows programs to look up objects such as class files by name. A sample call to it would look like this:
Several other protocols beyond LDAP are possible transport vectors, including DNS, RMI, and LDAPS.
Attackers wishing to trigger the vulnerability simply need to send a JNDI string to a system that will log that string via log4j. That string will typically point to a malicious resource that they control. Early exploits simply set their HTTP User-Agent strings to the malicious link, but variations quickly popped up. Corelight has observed attacks using other HTTP headers, and other researchers have already been discussing using DNS lookups. Commonly logged protocols like SMTP would also be obvious vectors. Whatever the vector, once the malicious string is logged, the Java resource pointed to by that string will be executed on the vulnerable system.
Note that the vulnerable system may not be the one to which the malicious string was originally sent. For example, log aggregation systems or proxy servers could easily read in the exploit string from a target system, and trigger the bug on themselves in the process.
There are myriad ways to obfuscate the actual attack string. Constructs such as this are entirely valid:
Since Java string parsing is highly complex, the list of possibilities is essentially infinite. As a result, defenders are creating multiple detection strategies, focused both on high-signal but easily evadable strings like ${jndi:ldap and on more generic but potentially false-positive-prone strings like ${.
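The trade-off between the two strategies can be sketched in a few lines of Python. This is an illustrative sketch only, not one of the published signatures: the pattern names, the classify function, and the sample strings are all assumptions made for this example.

```python
import re

# High-signal pattern: the literal, unobfuscated JNDI/LDAP prefix.
# Very few false positives, but trivially evaded by nested lookups.
HIGH_SIGNAL = re.compile(r"\$\{jndi:ldap", re.IGNORECASE)

# Generic pattern: any "${" sequence.
# Much harder to evade, but more prone to false positives.
GENERIC = re.compile(r"\$\{")


def classify(value: str) -> str:
    """Return 'high', 'generic', or 'none' for a logged string."""
    if HIGH_SIGNAL.search(value):
        return "high"
    if GENERIC.search(value):
        return "generic"
    return "none"
```

With these assumed patterns, a plain string such as ${jndi:ldap://example.com/a} is classified as "high", while a nested form such as ${${lower:j}ndi:ldap://example.com/a} slips past the high-signal pattern and is caught only by the generic one.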
As is often the case, Suricata signatures were one of the first mechanisms released for detecting exploits in the wild. As of the writing of this post, 26 signatures were available within the Emerging Threats rule set, which is part of the default Corelight configuration. We recommend that customers running Suricata enable them all immediately, since even the most generic ones among them are producing minimal false positives in testing on our Polaris research network, and the Zeek context around them will make triaging any false positives extremely simple. Note that your $HTTP_SERVERS and/or $HOME_NET variables must include the IP addresses of any servers you’re defending, as observed by your sensor given its placement in your network. The signatures rely on those variables and will not function properly if your servers are not included.
In the meantime, Corelight has been actively developing a Zeek script to detect exploit attempts. Indicators of compromise from before its installation can be found by manually searching historic logs for the string ${, for example:
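One way to run such a search is sketched below in Python. It assumes that historic Zeek logs are stored under a single directory as plain-text or gzip-compressed files whose names begin with "http"; the function names, the directory layout, and the file-name pattern are assumptions for this example and should be adapted to your own log archive.

```python
import gzip
from pathlib import Path
from typing import Iterator, Tuple

NEEDLE = "${"  # the generic indicator discussed above


def open_text(path: Path):
    """Open a plain or gzip-compressed log file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def find_hits(log_dir: str, pattern: str = "http*.log*") -> Iterator[Tuple[str, int, str]]:
    """Yield (file, line number, line) for every record containing '${'."""
    for path in sorted(Path(log_dir).rglob(pattern)):
        with open_text(path) as handle:
            for lineno, line in enumerate(handle, start=1):
                if line.startswith("#"):  # skip Zeek TSV header/footer lines
                    continue
                if NEEDLE in line:
                    yield str(path), lineno, line.rstrip("\n")
```

Calling find_hits on the archive directory and printing each result gives a list of candidate records to review by hand. Because the match is a plain substring test, it works on both TSV and JSON formatted logs, at the cost of not telling you which field matched.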
Once installed, the Zeek script will highlight these attempts in the notice log and apply logic to help home in on malicious behavior. The script is designed to search across all HTTP headers and to extract relevant details about the detected string into the notice log and an optional log appropriately named log4j, making it easy for analysts to see the payload URI and thus determine whether it represents an exploit attempt or a legitimate use of JNDI. If connections are made to the resource identified in the notice log, then a successful attack has most likely occurred.
The code described below is on Corelight’s GitHub (and updated versions can be found in the same repo at the latest revision). To cast a wide net, we treat as suspicious all HTTP connections whose header keys or values contain the exploit pattern below, which matches the ${ characters anywhere in the string. When the log option is true (T), a log named log4j with this schema will be generated. Change this to false (F) if you just want to see notices.
Figure 1: Initial exploit pattern and flag to generate (optional) log.
The bulk of the detection takes place in the http_header event, which is raised for each HTTP header seen on the network. On lines 76 and 77, we check whether the header’s name or value matches the exploit pattern defined above. If a match is due to non-ASCII data, however, it is ignored and we return from the event (lines 81-84), as such data causes a common class of false positives. Finally, we add a tag to the HTTP connection denoting a possible RCE attempt.
Figure 2: Handling the http_header event to detect potential log4j exploit attempts.
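For readers who want to experiment with the same decision logic outside Zeek, the following Python sketch approximates the handler as described above. It is not the Zeek script itself: the pattern is assumed to be equivalent to a simple ${ match, and the function name and tag value are hypothetical.

```python
import re

# Assumed equivalent of the script's exploit pattern: any "${" sequence.
EXPLOIT_PATTERN = re.compile(r"\$\{")

# Hypothetical tag value; the real script defines its own.
TAG = "LOG4J_RCE"


def check_header(name: str, value: str, tags: set) -> bool:
    """Tag the connection if a header name or value contains '${'.

    Matches that occur in non-ASCII (binary) data are ignored, since
    they are a common source of false positives.
    """
    name_hit = bool(EXPLOIT_PATTERN.search(name))
    value_hit = bool(EXPLOIT_PATTERN.search(value))
    if not (name_hit or value_hit):
        return False
    if (name_hit and not name.isascii()) or (value_hit and not value.isascii()):
        return False
    tags.add(TAG)
    return True
```

Checking the cheap pattern match first and the ASCII test second keeps the common case (no match at all) as fast as possible, which matters for an event raised on every header.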
After identifying exploit attempts, we parse the payload, generate notices, and write log entries. The payload parsing function is shown below in figure 3:
Figure 3: Parsing the log4j exploit payload.
We look for payloads that exactly match the simplest possible payload:
Keeping in mind that more complicated payloads are already in the wild, we gracefully handle failure so payloads that don’t exactly match the above format are still logged successfully. Just look for “-” in either the logs or notices this package generates to find creative attempts at exploitation.
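The parse-or-fall-back behaviour can be illustrated with a short Python sketch. The payload shape used here, ${jndi:<scheme>://<host>[:<port>]/<path>}, is an assumption about what the simplest payload looks like, and the names SIMPLE_PAYLOAD, Payload, and parse_payload are hypothetical; the Zeek package's own parsing may differ.

```python
import re
from typing import NamedTuple

# Assumed shape of the simplest payload:
#   ${jndi:<scheme>://<host>[:<port>]/<path>}
SIMPLE_PAYLOAD = re.compile(
    r"\$\{jndi:(?P<scheme>[a-z]+)://(?P<host>[^:/}]+)"
    r"(?::(?P<port>\d+))?/(?P<path>[^}]*)\}",
    re.IGNORECASE,
)


class Payload(NamedTuple):
    scheme: str
    host: str
    port: str
    path: str


# Returned when the string does not match the simple format.
UNPARSED = Payload("-", "-", "-", "-")


def parse_payload(text: str) -> Payload:
    """Extract scheme, host, port, and path, or '-' fields on failure."""
    match = SIMPLE_PAYLOAD.search(text)
    if match is None:
        return UNPARSED
    return Payload(
        match.group("scheme").lower(),
        match.group("host"),
        match.group("port") or "-",  # '-' when the URL carries no port
        match.group("path"),
    )
```

A string that does not match still produces a record, with every field set to "-", so obfuscated attempts remain visible for manual review rather than being dropped.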
Now that we have our script searching for exploit attempts arriving over inbound HTTP traffic, it’s time to start looking for attempts that were successful. The steps are to extract the hostnames or IPs that are serving the next stage of the exploit chain and then look for signs that a device actually downloaded the payload. For now, we’ll walk through this process manually in order to make it as clear as possible. In the near future, this will be automated as part of the Zeek scripts that we just reviewed.
First, we need to identify the payload hosts that our Zeek script is identifying. This information is easily extracted from the log4j.log that our Zeek script creates.
This list contains the hosts and ports that are serving malicious payloads. A port number shown as ‘-’ indicates that no port was provided in the URL. In this case there are a number of bare IP addresses and a handful of hostnames.
To detect if any of these inbound exploit attempts was successful, we turn back to Zeek data flowing from our network sensors and look for signs of outbound traffic heading toward any of these sites. First, search for DNS queries for any of the hostnames from this list:
Here we can see that two devices, 10.0.0.11 and 10.1.1.5, resolved hostnames that are very similar to those that we identified. These hosts probably need to be examined for signs of compromise.
Finally, we’ll use the conn.log from our sensors to search for outbound connections to the IP addresses of the payload servers that we identified. Copy the IPs into a separate file (log4j-scanners in this case) and then use the following command to search for matching records:
In this case, we found some matching outbound traffic! But notice that it’s all ICMP, and the values in the Zeek port fields (which hold the ICMP type and code for ICMP traffic) correspond to a network telling a scanning device that ports and hosts are unreachable. Digging further confirms this suspicion:
Had there been any unexplained outbound connections to one of the payload servers, then the host(s) that originated those connections could have been compromised and would need further investigation.
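The conn.log lookup described above can also be sketched in Python. This is a minimal sketch under stated assumptions: the conn.log is in Zeek's tab-separated format with a #fields header line, the payload server IPs are listed one per line, and the function names load_ips and outbound_matches are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set


def load_ips(lines: Iterable[str]) -> Set[str]:
    """Read one IP address per line, ignoring blanks and comments."""
    return {
        line.strip()
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def outbound_matches(
    conn_lines: Iterable[str], payload_ips: Set[str]
) -> Iterator[Dict[str, str]]:
    """Yield conn.log records whose responder is a payload server."""
    fields = []
    for line in conn_lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Zeek TSV header: "#fields<TAB>ts<TAB>uid<TAB>id.orig_h..."
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        record = dict(zip(fields, line.split("\t")))
        if record.get("id.resp_h") in payload_ips:
            yield record
```

To use it, open the log4j-scanners file and pass it to load_ips, then open conn.log and iterate over outbound_matches. Each returned record includes the proto field, which makes it straightforward to separate ICMP unreachable messages of the kind seen here from TCP connections that would warrant investigation.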
The above search can be reduced to a single compound search statement within Splunk:
We have successfully validated this search in the wild on the Polaris network:
A major part of our workflow is identifying and eliminating false positives. In this case, as the indicator is simply ${, we were expecting to have to deal with a large number of false positives. As an example, we quickly discovered this string can appear legitimately in WinRM traffic, which makes extensive use of HTTP headers. We were able to remove these quite easily using Zeek’s is_ascii() function, since the binary strings that caused false positives cannot be used for legitimate attacks; see figure 2, lines 81-84. Apart from this class of false positives, the indicator is surprisingly robust. There are some genuine false positives on legitimate web applications, but we have also caught other malicious activity like web shells with this detection. Both “false” detections are extremely low volume in comparison with log4j exploits. Based on this, we have not eliminated these false positives from our initial code release.
Speed of the package is also a key concern, because the HTTP header event is very common in busy networks. We have tested speed with two methodologies, which have both come up with positive results. First, we used “worst case” PCAPs with millions of HTTP requests, and no other traffic, which led to some performance refinements. From there, we put the script onto our Polaris research network, which both validated performance in real-world environments and allowed us to watch for evolving obfuscations in the wild. We have seen no significant performance issues in Polaris, so we expect that the script can be used in the real world without negative side effects.
Corelight will continue to watch this attack in the wild as it becomes further weaponized and as new obfuscations arise. Check back with us for updated detection as we improve our Zeek script for detecting Log4j exploit attempts and automate the process of identifying successful exploits.
By Corelight Labs Team