October 24, 2018 by Christian Kreibich
One of the first tasks for any incident responder when looking at network logs is to figure out the host names that were associated with an IP address in prior network activity. With Corelight’s 1.15 release we help automate the process and I would like to explain how this works.
Zeek (formerly known as Bro) provides a logging framework that gives users great control over summarization and reporting of network activity. Equipped with dozens of logs by default, it provides convenient features to extend these logs with additional fields, filter log entries according to user-defined criteria, create new log types, and hook new activity into logging events. Several log types provide identifiers that allow convenient pivoting from one log type to another, such as conn.log’s UID that many other log types use to link app-layer activity to the underlying TCP/IP flows.
Other information is only implicitly linked across log types, so analysts need to reveal it in manual SIEM-based post-processing. One example of such implicitly available information is host naming, which lets analysts look past IP addresses like 184.108.40.206 to corresponding (and often revealing) DNS names like ujkwwvftddjk.ru, a recent example from Spamhaus’s DBL. While Zeek’s dns.log closely tracks address-name associations, other logs do not repeat this information. Manually establishing the cross-log linkage can prove tedious since offline resolution of those names generally does not provide accurate results. Instead, one needs to identify historic name lookups that temporally most closely preceded TCP/IP flows to/from resulting IP addresses. (Other approaches, such as leveraging HTTP Host headers, also exist but here we were looking for the most generic approach.)
Zeek’s stateful network-oriented scripting language makes it ideally suited to automate such linkage: we can enrich desired logs with DNS host names in response to network events unfolding in real time. In Corelight’s 1.15 release we provide this ability via the Namecache feature. When enabled, Zeek starts monitoring forward and reverse DNS name lookups and establishes address-name mappings that allow subsequent conn.log entries to include names and the source of the naming (here, DNS A or PTR queries). For analysts requiring immediate access to host names, conn.log now readily provides this information. The following (slightly pruned) log snippet using Zeek’s JSON format shows an example:
Our data analysis shows that for the most relevant addresses — those outside of local networks — Namecache can establish names in more than 90% of log entries. In addition to the conn.log enrichment the feature adds a separate log, reporting operational statistics (powered by the SumStats framework) such as the cache hit rate in various contexts. Starting with the 1.16 release, you’ll see local vs non-local hit rates for your network as well.
None of the above required patching the core Zeek distribution. All functionality exists in form of new event handlers and state managed via the scripting language. Nevertheless, implementing Namecache posed some interesting technical challenges. Most immediately, Bro’s multiprocessing architecture and flow distribution mean that in a cluster setting (which we do use in our Sensors) the Zeek worker observing a DNS lookup most likely is not the one observing the TCP/IP connection to the resulting IP address. Moreover, since their respective processing is fully asynchronous we also cannot guarantee that processing the DNS query finishes prior to that of the subsequent TCP/IP connection. Finally, to approach global visibility of the address–name mappings, we need to communicate the mappings across the cluster via Bro events, raising questions about event communication patterns, sustainable event rates, and processing races.
One key observation immediately simplified the problem: Zeek writes conn.log entries only when it expires its state for a given flow, i.e., at the very end of the flow’s lifetime. This means we have at least several seconds to propagate naming information for this flow across the cluster before needing to access it.
This left the event flow to tackle. In a first iteration we decided to centralize mapping ownership in the manager process: workers communicate new mappings to the manager process, which propagates additions to other workers and tracks mapping size and age. When mapping state needs to get pruned, the manager sends explicit pruning events to the workers. This proved clearly inferior to a distributed approach where the workers manage mappings autonomously, including expirations, and only communicate new mappings to the manager. The manager in turn only relays additions across the workers, saving the memory needed for an extra copy of the mappings. This approach worked quite well but induced a few percent of packet loss on our most heavily loaded AP-1000 appliances. In a final tweak, we tuned the rate at which workers transmit mapping additions. With this change we no longer observed any operational overhead of the activated Namecache feature while preserving its effectiveness.
The Namecache feature is only one example of a wide range of log enrichments we envision. We’ll soon migrate the cluster communication to the new Broker framework, add improved multicast DNS support, and we’re considering other sources of naming as well as inverse mappings where names get enriched with corresponding IP addresses.