Corelight launches the Entity Collection

Corelight Labs, our amazing research team, has been hard at work on another content collection, which we are excited to introduce: the Corelight Entity Collection.

Corelight evidence is powerful and comprehensive. So comprehensive, in fact, that it can sometimes be hard to know where to start. Providing customers faster ways to find meaningful context in our data was the driving force behind the creation of the Entity Collection.

Originally shipped with our v26 software release and recently updated in our v27 release, the Entity Collection includes three new packages:

Known Entities

Perhaps the most important part of the new collection is the Known Entities package, which provides summaries of the information that analysts and security teams often end up hand-crafting to accelerate incident response and threat hunting.

In this context, we define an entity as an element of an enterprise network: a system, a server, a user, a domain, a certificate, and so on. We’ve collected these attributes into a set of new, interlinked logs that are summarized from the full Corelight log streams for faster searching. These logs include entity information about everything on your networks, from IT devices (laptops, servers, phones, printers) to industrial control system (ICS) and operational technology (OT) devices (building automation, cameras, industrial controllers). In principle, all of this data was already available in the existing Corelight logs, but it was sometimes difficult to extract. The Known Entities package synthesizes it into a much more compact form.

The Known Entities logs also operate differently from most of Corelight’s other logs: they summarize activity over 15-minute windows, and by default they only cover your internal networks. The initial set of logs is: hosts, devices, services, names, domains, certs, users, and remotes (this last log tracks external network hosts; because of its high volume, it is optional and turned off by default).
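To make the 15-minute summarization concrete, here is a minimal Python sketch of the idea, not Corelight’s implementation. It assumes hypothetical per-connection events shaped as dicts with `ts` (epoch seconds), `host`, and `bytes`, plus a hypothetical `LOCAL_NETS` list standing in for your local network configuration. Each event is floored to the start of its 15-minute window and collapsed into one record per window and internal host; hosts outside the local networks are skipped, mirroring the internal-only default.

```python
import ipaddress
from collections import defaultdict

WINDOW_SECONDS = 15 * 60  # length of one summary window
LOCAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]  # hypothetical local network config


def is_internal(ip):
    """True if the address falls inside one of the configured local networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in LOCAL_NETS)


def window_start(ts):
    """Floor an epoch timestamp to the start of its 15-minute window."""
    return ts - (ts % WINDOW_SECONDS)


def summarize_hosts(conn_events):
    """Collapse events into one record per (window start, internal host)."""
    summary = defaultdict(lambda: {"conns": 0, "bytes": 0})
    for ev in conn_events:
        if not is_internal(ev["host"]):
            continue  # only internal hosts are summarized by default
        key = (window_start(ev["ts"]), ev["host"])
        summary[key]["conns"] += 1
        summary[key]["bytes"] += ev["bytes"]
    return dict(summary)
```

For example, events for host `10.0.0.5` at 0, 100, and 900 seconds produce two summary records (two connections in the first window, one in the second), while an event whose host is `8.8.8.8` is dropped.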

There are three primary use cases for this new set of known entity logs:

Context: Allows an analyst to quickly find the important context about an entity or set of entities related to an alert or investigation.

For example, imagine starting the incident response process with a single internal host IP from an alert. An analyst will first want to understand everything available about that host, which means asking questions like:

Is there a hostname associated with this IP? What is its MAC address?
Have I seen any authentication events that would tie this host to a user?
Are there services offered by this IP that could give me a clue to the type of machine or its function in the network? How busy is the host on the network?
Can I tell where this device is and how it might be related to the organization?

The Known Entities logs help you quickly take a single host IP address and find all of its associated entities. Instead of searching across the connection log and multiple protocol logs, you can drill down into that context quickly and efficiently. The quick entity search may also surface additional entities to explore, and the small size of the logs allows quick pivoting without breaking your investigation flow.
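At its core, this pivot is a matter of collecting every summary record that mentions the IP. The sketch below assumes each known log has been loaded as a list of dicts, keyed by log name, and that records share a hypothetical `host` field; the real log schemas may differ.

```python
def entity_context(ip, known_logs):
    """Group every summary record that mentions `ip` by the log it came from.

    `known_logs` maps a log name (e.g. "hosts", "names", "users") to a list of
    record dicts; matching on a "host" field is a hypothetical schema choice.
    """
    context = {}
    for log_name, records in known_logs.items():
        matches = [rec for rec in records if rec.get("host") == ip]
        if matches:
            context[log_name] = matches
    return context


logs = {
    "hosts": [{"host": "10.1.2.3", "mac": "aa:bb:cc:dd:ee:ff"}],
    "names": [{"host": "10.1.2.3", "name": "lab-ws-07"}],
    "users": [{"host": "10.9.9.9", "user": "bob"}],
}
ctx = entity_context("10.1.2.3", logs)
print(sorted(ctx))  # ['hosts', 'names']
```

Any new values found this way (a hostname, a MAC address) can be fed back into the same function as the next pivot.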

Asset identification/inventory: Provides the data an analyst needs to help create an accurate inventory of what devices are on the network.

By succinctly summarizing the activity for every host on the network in 15-minute windows, it’s easy to answer questions like:

How many hosts were active on subnet B today?

Which devices have MAC addresses associated with Canon printers?

By drawing information from network connections, DNS, DHCP, and more, analysts can generate an inventory and check it for changes over any time window that matters. It’s also a quick way to take stock of all the potentially unmanaged devices on a network: a few quick searches comparing a list of managed hosts with everything seen over the past 24 hours give an idea not only of what may be unmanaged, but also of some properties of those unmanaged assets.
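The unmanaged-device comparison is essentially a set difference with some context attached. This sketch assumes hypothetical summary records with `host` and `ts` fields plus whatever attributes the summary carries (the `mac_vendor` field below is illustrative), and keeps the most recent record for each host that was seen since a cutoff and is not on the managed list.

```python
def unmanaged_hosts(seen_records, managed_hosts, since_ts):
    """Return {host: most recent record} for hosts seen since `since_ts`
    that do not appear in `managed_hosts`."""
    managed = set(managed_hosts)
    latest = {}
    for rec in seen_records:
        host = rec["host"]
        if rec["ts"] < since_ts or host in managed:
            continue
        if host not in latest or rec["ts"] > latest[host]["ts"]:
            latest[host] = rec
    return latest


now = 86_400 * 2
records = [
    {"host": "10.0.0.1", "ts": now - 600},
    {"host": "10.0.0.2", "ts": now - 3_600, "mac_vendor": "Canon"},
    {"host": "10.0.0.2", "ts": now - 300, "mac_vendor": "Canon"},
    {"host": "10.0.0.3", "ts": now - 90_000},  # last seen more than 24 hours ago
]
result = unmanaged_hosts(records, managed_hosts=["10.0.0.1"], since_ts=now - 86_400)
print(list(result))  # ['10.0.0.2']
```

Returning the latest record rather than a bare host list keeps the "properties" side of the question in view: the analyst sees what the unmanaged asset looked like most recently.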

Indexing: Provides a faster way for an analyst to search for many indicators or across a long time horizon and quickly pivot to a subset of the full logs for deeper investigation.

Think of it as the TL;DR for the Corelight logs. Searches for multiple IOCs across large swaths of time can simply take too long in many SIEMs, and sometimes teams just want quick answers to questions like:

Did this IP address interact with any other hosts on Tuesday?
Was this internal hostname looked up on the network in the past 30 days?

The known logs help you quickly identify the relevant time window and summaries for the traffic, at which point you can zoom in on the other important data in the full logs. This provides a shortcut around coffee-break searches and saves valuable time in the analysis process. These summary logs are nearly two orders of magnitude smaller than the full logs, making it more affordable to store them, or to keep them in a faster index, for much longer.
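The indexing workflow can be sketched in two steps: find the 15-minute windows in which each IOC appears in the summary logs, then merge adjacent windows into time ranges that bound the follow-up query against the full logs. The record shape used here (`window` as the window start in epoch seconds, `values` as the indicators observed in it) is a hypothetical simplification, not the actual log format.

```python
WINDOW_SECONDS = 15 * 60


def windows_for_iocs(summary_records, iocs):
    """Map each IOC to the sorted window starts whose summary mentions it."""
    hits = {ioc: set() for ioc in iocs}
    for rec in summary_records:
        for value in rec["values"]:
            if value in hits:
                hits[value].add(rec["window"])
    return {ioc: sorted(ws) for ioc, ws in hits.items() if ws}


def to_time_ranges(window_starts):
    """Merge consecutive window starts into (start, end) ranges in epoch seconds."""
    ranges = []
    for start in sorted(set(window_starts)):
        end = start + WINDOW_SECONDS
        if ranges and ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))
    return ranges


summary = [
    {"window": 0, "values": ["10.0.0.5", "evil.example"]},
    {"window": 900, "values": ["evil.example"]},
    {"window": 2700, "values": ["evil.example"]},
]
hits = windows_for_iocs(summary, ["evil.example", "203.0.113.9"])
print(to_time_ranges(hits["evil.example"]))  # [(0, 1800), (2700, 3600)]
```

Only the resulting ranges need to be searched in the full logs, which is where the time savings come from.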

There are a TON of other ways to make use of the incredible data coming from these known logs. We’ve been showing this to customers for many months, and the feedback has been incredibly positive. Many of the summary indexes, custom dashboards, and inventory scripts our customers use can be replaced or simplified thanks to the summarized data in the known logs. We have also been working on new dashboards in our Investigator platform, as well as with our SIEM partners.

Here are a couple of examples of what this looks like in Splunk:

This first screenshot shows a dashboard that uses all of the known logs together. It quickly displays the top entries over a time window but, importantly, also provides a search widget that lets you zoom in on a host, domain, user, or any other IOC or string you want to investigate quickly.

This second example uses the known_devices log to highlight just how much information can be extracted from the MAC addresses in DHCP, RADIUS, and other logs around device type, location, and more.

And here are the built-in dashboards in our Investigator product, powered by CrowdStrike’s LogScale (formerly Humio) platform:

Application Identification

Being able to identify the applications in use on the network gives security analysts helpful context. While Corelight data identifies the underlying ports and protocols, sometimes more detail is needed to understand which applications are using those protocols, especially in encrypted traffic.

To this end, the Entity Collection also includes our new Application Identification package. Using a variety of techniques, from DNS queries to certificate SNIs and protocol metadata, we categorize 80+ types of applications and write a new field directly to the connection log for easy correlation (these also feed into the Known Entity logs described above). We’re busy building more application identifiers to continue growing the suite of known applications.
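The identification rules themselves aren’t described here, so the following only illustrates one technique the paragraph mentions: matching a DNS query or TLS SNI against a table of domain suffixes, most specific suffix first. The `APP_RULES` table, the application labels, and the `app` field written onto the connection record are all hypothetical.

```python
APP_RULES = {
    "zoom.us": "zoom",
    "slack.com": "slack",
    "dropbox.com": "dropbox",
}


def identify_app(server_name):
    """Return the app label for the most specific matching domain suffix, or None."""
    labels = server_name.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])  # longest suffix is tried first
        if suffix in APP_RULES:
            return APP_RULES[suffix]
    return None


def tag_conn(conn):
    """Write a hypothetical 'app' field onto a connection record dict."""
    name = conn.get("sni") or conn.get("dns_query")
    conn["app"] = identify_app(name) if name else None
    return conn


print(tag_conn({"sni": "us04web.zoom.us"})["app"])  # zoom
print(identify_app("notslack.com"))  # None
```

Matching on whole DNS labels rather than a plain string `endswith` keeps `notslack.com` from being mistaken for `slack.com`.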

Local Subnets

Many of Corelight’s detections and data generation capabilities depend on knowing which subnets are local to your network. Without correctly knowing what’s “inside” and what’s “outside” from the vantage point of your Corelight sensor, it can be hard to trigger the appropriate alert or provide context for an investigation. We have always provided the ability to manually configure your local network settings; however, we find that customers don’t always know all of their local networks (and sometimes simply forget to configure them). The Entity Collection’s new “Local Subnets” package uses sophisticated algorithms to determine which local subnets have been seen and then generates a summary (as a “notice”).

You can use the list to help configure your local network settings, or to identify any potential overlooked networks in the data.
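The package’s own algorithms aren’t described here, so this is a deliberately simple stand-in heuristic, not Corelight’s method: bucket observed private IPv4 addresses into /24 networks, keep buckets with at least a few distinct hosts, merge adjacent buckets with `ipaddress.collapse_addresses`, and report any inferred subnet that no configured local network fully covers. The `prefix` and `min_hosts` thresholds are arbitrary assumptions.

```python
import ipaddress
from collections import Counter


def infer_local_subnets(observed_ips, prefix=24, min_hosts=2):
    """Guess local subnets from observed addresses (simple /prefix bucketing)."""
    counts = Counter()
    for ip in set(observed_ips):
        addr = ipaddress.ip_address(ip)
        if addr.version == 4 and addr.is_private:
            counts[ipaddress.ip_network(f"{addr}/{prefix}", strict=False)] += 1
    candidates = [net for net, n in counts.items() if n >= min_hosts]
    return list(ipaddress.collapse_addresses(candidates))  # merge adjacent buckets


def unconfigured(inferred, configured):
    """Inferred subnets not fully covered by any configured local network."""
    configured_nets = [ipaddress.ip_network(c) for c in configured]
    return [
        net for net in inferred
        if not any(c.version == net.version and net.subnet_of(c) for c in configured_nets)
    ]


ips = ["10.0.0.5", "10.0.0.6", "10.0.1.7", "10.0.1.8", "192.168.5.1", "8.8.8.8"]
inferred = infer_local_subnets(ips)
print(inferred)  # [IPv4Network('10.0.0.0/23')]
print(unconfigured(inferred, ["10.0.0.0/24"]))  # [IPv4Network('10.0.0.0/23')]
```

Here the two busy /24 buckets merge into 10.0.0.0/23, the single host in 192.168.5.0/24 falls below the threshold, and the public 8.8.8.8 is ignored; with only 10.0.0.0/24 configured, the /23 is flagged as not fully covered.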

As you can tell, we’re pretty excited about the value that the new Entity Collection can unlock for our customers. Not only can it help you quickly identify the applications and subnets in use on your local networks, but the Known Entity logs can also accelerate your incident response and threat hunting with powerful summarized data. We have a ton of exciting updates to the Entity Collection in the works; in fact, our next update is right around the corner. We’d love to hear how you’re using the Entity Collection and how we can make it even better.

Read our white paper to learn more about the Entity Collection.

By Vince Stoffer, Senior Director of Product Management, Corelight