November 18, 2019 by Anthony Kasza
Corelight recently released a new package focused on SSH inferences as part of our Encrypted Traffic Collection. The package installs on sensors with a few clicks and provides network traffic analysis (NTA) inferences on live SSH traffic. Which SSH connections transferred files? Which transferred keystrokes? Approximately how many commands were typed during each connection? The SSH Inference package answers these questions and provides other insights, detailed below. These insights bring valuable context to threat hunters and incident responders who struggle with visibility in encrypted environments.
What is an SSH inference? Richard Bejtlich provided a great explanation in a previous blog post. Here is the analogy I like to use. If you break your arm, your doctor doesn’t take a cross section of your arm; there’s no need to cut it open and inspect the bone. She takes an X-ray. Similarly, purposely breaking or downgrading encryption is often overkill and a violation of privacy. You don’t need to see inside an encrypted tunnel to infer what’s occurring within it.
By loading the SSH Inference package on a Corelight sensor, customers automatically get access to a bunch of new capabilities and insights around SSH traffic. These new features are briefly outlined below. If you’re a customer and would like a more detailed look at the feature set, see the technical documentation.
Inference tags based on SSH usage – SSH can be used in many different ways, including transferring files, executing a single command, or providing an interactive terminal. When the corresponding behavior is observed during an SSH connection, the following tags are added to a new field, inferences, in the SSH log:
An aside on keystroke inferences:
It would be very useful to infer the approximate size of the commands a client sends to a server. For example, the string “sudo su” is always 7 characters long (and is rarely completed using the tab key), the server’s response is generally small, and the client-provided password is not line-buffered. This could enable more complex analyses, such as identifying password lengths. Corelight Labs attempted to add such an inference to this package but found too many edge cases to consider our prototype robust enough to release. Ben Reardon, now a member of the Corelight team and the developer of packetStrider, agrees with our assessment. Visual editors, screen and tmux, ncurses applications, tab completion, the backspace key, history navigation (up/down arrow keys), command aliases, and client keystroke buffering all make determining client command size difficult. (Note: I did not say “impossible”.)
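To see why, consider a naive version of such a heuristic. The sketch below is hypothetical and is not the package’s code: it assumes an echoed keystroke appears as a client packet followed by a same-size server packet, and that Enter appears as a client packet answered by a larger server packet. Lengths are signed, positive for client packets and negative for server packets.

```python
from typing import List


def estimate_commands(seq: List[int]) -> List[int]:
    """Split an interactive session into approximate commands.

    Hypothetical heuristic (not the package's logic):
      * client packet followed by a same-size server packet -> one echoed keystroke
      * client packet followed by a larger server packet -> Enter, ending a command
    Returns the number of echoed keystrokes counted for each command.
    """
    commands: List[int] = []
    keystrokes = 0
    i = 0
    while i < len(seq) - 1:
        client, server = seq[i], seq[i + 1]
        if client > 0 and server < 0:
            if -server == client:
                keystrokes += 1              # echoed keystroke
            elif -server > client:
                commands.append(keystrokes)  # Enter plus command output
                keystrokes = 0
            # smaller server replies are ignored
            i += 2                           # consume the client/server pair
        else:
            i += 1                           # not a client->server pair; move on
    return commands


if __name__ == "__main__":
    # Made-up lengths: two keystrokes, then Enter with a 200-byte response.
    print(estimate_commands([36, -36, 36, -36, 36, -200]))  # [2]
```

Tab completion, backspace, and history navigation all produce server responses that need not mirror a single keystroke, so even this simple count drifts quickly on real sessions.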
An improved SSH authentication result – open source Zeek employs some very clever packet-level processing in its core SSH analyzer, which raises authentication attempt, success, and failure events to scriptland. We improved this logic, as well as the logic that determines the logged authentication result, so users should see fewer unset auth_success fields in their SSH logs.
Tunable configuration options – what merits an analyst’s attention at one site may be very normal at another. This is one of the core tenets of Zeek’s policy-neutral event system. By exposing tunable knobs, we let customers decide which inferences are worth turning on or being notified about.
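As an illustration of the idea only, per-site knobs might be grouped as below. The option names, defaults, and the should_notify helper are hypothetical and are not the package’s actual configuration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical per-site settings; names are illustrative only."""
    # Inferences the site wants computed at all.
    enabled: FrozenSet[str] = frozenset({"interactive", "bulk_transfer"})
    # Subset of enabled inferences that should raise a notification.
    notify_on: FrozenSet[str] = frozenset()
    # Byte volume above which a connection is labeled a bulk transfer.
    bulk_threshold_bytes: int = 1000


def should_notify(cfg: InferenceConfig, tag: str) -> bool:
    """Notify only for tags that are both enabled and selected for notification."""
    return tag in cfg.enabled and tag in cfg.notify_on


if __name__ == "__main__":
    # A site where interactive sessions are unusual but bulk copies are routine.
    site = InferenceConfig(notify_on=frozenset({"interactive"}))
    print(should_notify(site, "interactive"), should_notify(site, "bulk_transfer"))
```

Separating “compute this inference” from “alert on this inference” lets a site keep a tag in its logs for hunting without generating notices for behavior that is normal there.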
The following video demonstrates, at a high level, how the SSH Inference package analyzes the lengths, order, and direction of encrypted SSH packets. By hooking the ssh_encrypted_packet() event and printing each packet’s size to the screen, we can see what the packet sequence looks like for an interactive session containing keystrokes. Positive sizes are packets sent by the client; negative sizes are packets sent by the server.
The following video demonstrates the extensions the SSH Inference package makes to the SSH log. In it, the client issues three commands to the server via keystrokes.
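Outside Zeek, the same signed view can be reproduced from any list of (direction, length) observations. This Python sketch assumes each observation mirrors the direction flag and length that the ssh_encrypted_packet() event reports; the sample lengths are made up for illustration.

```python
from typing import List, Tuple

# (is_orig, length): assumed to mirror the direction flag and length
# reported by Zeek's ssh_encrypted_packet() event.
PacketEvent = Tuple[bool, int]


def signed_lengths(events: List[PacketEvent]) -> List[int]:
    """Sign each length by direction: positive = client, negative = server."""
    return [length if is_orig else -length for is_orig, length in events]


if __name__ == "__main__":
    # Made-up values: three echoed keystrokes, then a larger server reply.
    demo = [(True, 36), (False, 36), (True, 36), (False, 36),
            (True, 36), (False, 36), (True, 36), (False, 180)]
    for value in signed_lengths(demo):
        print(value)  # one signed length per line
```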
If you’d like to see a live demo of the SSH Inference package in action on your network, contact us!
Inferences are based on the concept of sequences of packet lengths. During an SSH connection, packets are exchanged between clients and servers. By analyzing the size, order, and direction of these packets, the state machines of the SSH sub-protocols can be modeled and tracked throughout the life of a connection, even though encryption prevents parsing their content. Once the SSH connection sub-protocol begins, a client’s mode of use can be inferred from the structure of the packet sequences.
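A minimal sketch of inferring mode of use from a signed length sequence follows. Both rules and the default thresholds are hypothetical simplifications, not the package’s logic: repeated client packets echoed by same-size server packets suggest keystrokes, and a large byte volume in either direction suggests a bulk transfer.

```python
from typing import List


def count_echo_pairs(seq: List[int]) -> int:
    """Count client packets immediately answered by a server packet of equal size."""
    pairs = 0
    i = 0
    while i < len(seq) - 1:
        if seq[i] > 0 and seq[i + 1] == -seq[i]:
            pairs += 1
            i += 2  # skip past the matched pair
        else:
            i += 1
    return pairs


def infer_mode(seq: List[int], bulk_threshold: int = 1000, min_echoes: int = 3) -> str:
    """Return a rough, hypothetical mode-of-use label for one connection."""
    client_bytes = sum(x for x in seq if x > 0)
    server_bytes = -sum(x for x in seq if x < 0)
    if count_echo_pairs(seq) >= min_echoes:
        return "interactive"
    if max(client_bytes, server_bytes) >= bulk_threshold:
        return "bulk_transfer"
    return "unknown"


if __name__ == "__main__":
    print(infer_mode([36, -36, 36, -36, 36, -36, 36, -180]))  # interactive
    print(infer_mode([1400, 1400, 1400, -52]))                 # bulk_transfer
```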
I began by visualizing the first 30 encrypted packets of every SSH connection in a sample of SSH traffic. These encrypted packets are exchanged immediately after the NewKeys messages are sent. In Figure 1, each line represents a single SSH connection. The x-axis represents packet order, while the y-axis represents packet direction and size: positive values are packets sent by the client and negative values are packets sent by the server.
By making each connection’s line slightly transparent, natural clusters emerge visually. These clusters were then manually teased out (sounds like a job for machine learning) and labeled. We then reviewed the SSH RFCs to identify what each cluster of connections likely represented. Recall that SSH consists of three sub-protocols (transport, user authentication, and connection), and these sub-protocols interact in a specific way. Figure 2 illustrates an approximate overlay of the three sub-protocols on an SSH connection’s sequence of packet lengths.
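Preparing the data for a plot like Figure 1 amounts to truncating each connection to its first 30 signed lengths and padding shorter connections. This is a sketch, assuming each connection is already a list of signed lengths starting after NewKeys; zero serves as padding because no real packet has length zero.

```python
from typing import List


def first_n_padded(seq: List[int], n: int = 30, pad: int = 0) -> List[int]:
    """Keep the first n signed lengths, padding short connections to n entries."""
    head = seq[:n]
    return head + [pad] * (n - len(head))


def to_matrix(connections: List[List[int]], n: int = 30) -> List[List[int]]:
    """One row per connection, one column per packet position."""
    return [first_n_padded(seq, n) for seq in connections]


if __name__ == "__main__":
    rows = to_matrix([[36, -36, 52], list(range(1, 41))])
    print(len(rows), [len(r) for r in rows])  # 2 [30, 30]
```

Each row can then be drawn as one semi-transparent line, for example with matplotlib’s plot(range(30), row, alpha=0.05), so that overlapping connections darken into visible clusters.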
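The manual clustering step could be roughly approximated in code by grouping connections whose first few signed lengths match after coarse rounding. This is a crude stand-in chosen for illustration, not the method used to build the package; the prefix length k and the bucket size are arbitrary assumptions, and rounding is needed because exact lengths vary with the negotiated cipher and MAC.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def quantize(x: int, bucket: int) -> int:
    """Round |x| down to a multiple of bucket while keeping the direction sign."""
    sign = 1 if x > 0 else -1
    return sign * (abs(x) // bucket) * bucket


def cluster_by_prefix(
    connections: Dict[str, List[int]], k: int = 8, bucket: int = 16
) -> Dict[Tuple[int, ...], List[str]]:
    """Group connection IDs whose first k quantized signed lengths are identical."""
    clusters: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for uid, seq in connections.items():
        key = tuple(quantize(x, bucket) for x in seq[:k])
        clusters[key].append(uid)
    return dict(clusters)


if __name__ == "__main__":
    conns = {"a": [36, -36, 100], "b": [40, -44, 200], "c": [500, -36]}
    for key, uids in cluster_by_prefix(conns, k=2).items():
        print(key, uids)
```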
The next step in creating inferences on SSH traffic was to identify patterns and codify them. Some examples of patterns follow:
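One generic way to codify a pattern is as a sequence of predicates evaluated over consecutive signed lengths. The pattern below and its SMALL and LARGE thresholds are hypothetical examples for illustration, not patterns taken from the package.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[int], bool]

# Hypothetical thresholds in bytes, chosen only for illustration.
SMALL = 64
LARGE = 200

# Hypothetical pattern: small client packet, small server packet,
# then a large server packet.
EXAMPLE_PATTERN: List[Predicate] = [
    lambda x: 0 < x <= SMALL,     # small client packet
    lambda x: -SMALL <= x < 0,    # small server packet
    lambda x: x <= -LARGE,        # large server packet
]


def find_pattern(seq: Sequence[int], pattern: Sequence[Predicate]) -> List[int]:
    """Return every start index where the predicates match consecutive packets."""
    m = len(pattern)
    return [
        start
        for start in range(len(seq) - m + 1)
        if all(pred(seq[start + j]) for j, pred in enumerate(pattern))
    ]


if __name__ == "__main__":
    print(find_pattern([36, -36, -500, 36, -36], EXAMPLE_PATTERN))  # [0]
```

Expressing each pattern as data rather than hand-written control flow makes it easy to add, tune, or disable patterns as testing on real networks reveals edge cases.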
As our SSH traffic sample was small and manageable, it provided a good initial data set for building institutional knowledge about SSH behavior. Once we had some understanding of that behavior, we tested it across our Polaris deployments. Evaluating inferences built from a few hundred connections against a few hundred thousand connections exposed many assumptions we had made during the initial prototype of this package. Iteratively testing the package and incorporating lessons from real networks allowed us to develop robust and scalable inferences. We’d like to thank all of our Polaris partners and champions for helping shape the foundation of this package as well as other research ideas.
Corelight is releasing the SSH Inference package to customers as part of the Encrypted Traffic Collection preview. We’re calling it a preview because more is to come. While length, order, and direction were used to build the SSH Inference package, we did not incorporate timing into the analyses; doing so potentially unlocks additional inferences. New features based on timing are currently in the planning stages for the next release of the package.
How frequently is SSH being used on your network? Are the connections long-lived or short? Are the SSH connections primarily interactive or bulk transfers? Do you want to know more about the existing SSH traffic on your network? If so, Corelight can provide you with the insights outlined above.