Episode 6 - Detecting DNS Covert Channels in the Wild (Part 2)
Guest Speaker: Vern Paxson
January 15, 2026


About the episode

In Episode 6 of Corelight DefeNDRs, we delve deeper into the fascinating world of DNS covert channels with Vern Paxson, our chief scientist and co-founder. Continuing from our previous discussion, Vern shares his insights on techniques developed to detect these stealthy channels utilized by intruders to evade security measures. We explore the innovative approach of leveraging time series analysis of DNS lookups, how to distinguish benign traffic from potential threats, and the real-world implications of our findings across significant datasets. This episode is a must-listen for anyone interested in enhancing their understanding of network detection and response, as we uncover the delicate balance between legitimate data communication and covert malicious activity. Join me as we navigate these complex yet critical aspects of cybersecurity.

Episode transcript



Welcome to Corelight Defenders. I'm Richard Bejtlich, strategist and author in residence at Corelight. In each episode, we explore insights from the front lines of NDR, Network Detection and Response. Today I'm continuing a conversation with Vern Paxson, our chief scientist and co-founder at Corelight. In the previous episode, Vern and I discussed covert channels, mechanisms used by stealthy intruders to try to evade detection. Vern told me about techniques he and his team developed to identify covert channels abusing the domain name system. In this second part, Vern provides details and findings.

This was my first paper, as I recall, that made what we in academics call a complete orbit, which is you submit to one of the top four conferences, it gets rejected.

You submit to the next, it gets rejected, et cetera. All four rejected it. And then on the fifth submission it was accepted by the one we tried originally. But it got better each time. Where we wound up was essentially a very powerful, principled approach, and this is why I really enjoy talking about this work, because it achieved what one really wants to go after in detection work, which is that you can say in sound terms why it works, and it's general and it's powerful. So, here's the idea.

The attacker, observing these lookups to kittens.com, the information they receive, you can consider it like a time series of lookups: at this time, this name was looked up, with some resolution on the clock. We used 10 milliseconds, I think. And they've got a whole bunch of those, and that's what they have to work from. The idea then is, because we're observing all these lookups, we can similarly construct that time series. We take some care in how we represent it, and I'll skip those details.

And now you can take a lossless compressor, because in real life the benign versions of this are often highly redundant. So you can take a lossless compressor and try to reduce that time series to as small as you can make it. The size you achieve with the compressor is an upper bound on how much information could possibly be there, because you've represented it in that much space and it's lossless, so you can recover the original. In addition, you don't need to stick to just one compressor. You can try, and we tried, I think it was four different algorithms, and just pick the one that, for that particular time series, does the best, makes it the smallest.

And that's still a bound. Given this, you can take what are often huge initial time series in benign traffic and reduce them to much, much smaller representations. Then, for any representation that is below your threshold, let's stick with 10 kilobytes now, it's easy to keep in mind, you can say, "Well, if that was actually an attacker going to that domain, it was less than my bound, so I don't need to worry about it further, because the deal I'm making with the analyst is I'll find everything above the bound." If it's above the bound, then you say, "Well, it looks like a lot of information was actually conveyed," and you flag it for the analysts. And what's cool is we included even the timing.
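To make the core idea concrete, here is a minimal sketch in Python of the compression-bound check described above. It is not the detector from the paper: the serialization, the delta-encoded 10 ms timestamps, the particular standard-library compressors (zlib, bz2, lzma), and the 10-kilobyte threshold are all assumptions used for illustration.

    # Minimal sketch (not the paper's implementation) of the compression-bound
    # idea: compress a per-(client, domain) time series of lookups losslessly,
    # and treat the smallest compressed size as an upper bound on how much
    # information those lookups could have conveyed. The representation, the
    # compressors tried, and the threshold are assumptions for illustration.
    import bz2
    import lzma
    import zlib

    CLOCK_RESOLUTION_S = 0.010      # 10 ms clock resolution, as mentioned above
    ALERT_THRESHOLD_BYTES = 10_000  # the "10 kilobytes" running example

    def encode_time_series(lookups):
        """Serialize (timestamp, queried_name) pairs into bytes.

        Timestamps are quantized to the clock resolution and delta-encoded,
        so regularly spaced benign traffic compresses very well.
        """
        lines = []
        prev_tick = None
        for ts, name in sorted(lookups):
            tick = int(ts / CLOCK_RESOLUTION_S)
            delta = tick if prev_tick is None else tick - prev_tick
            lines.append(f"{delta} {name}")
            prev_tick = tick
        return "\n".join(lines).encode()

    def information_upper_bound(lookups):
        """Smallest size achieved by any of several lossless compressors.

        Because each codec is lossless, the original series is recoverable
        from the compressed form, so this size bounds the information an
        observer of the lookups could have received.
        """
        raw = encode_time_series(lookups)
        return min(len(zlib.compress(raw, 9)),
                   len(bz2.compress(raw, 9)),
                   len(lzma.compress(raw)))

    def should_alert(lookups, threshold=ALERT_THRESHOLD_BYTES):
        """Flag a (client, domain) pair whose bound exceeds the threshold."""
        return information_upper_bound(lookups) > threshold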

And so if they're using one of these sneaky timing channels, we'll still find them. Vern, can you tell me how this might have worked outside the lab? How does that actually work in practice? Yeah. We gathered an enormous amount of DNS lookups. In the end, we had more than 200 billion from a variety of sites. It's wonderful having contacts with fellow data pack rats who can make stuff available.

And we ran this detector on that data; at some of the sites, it was years of data. And sure enough, we find tunnels. Nearly all of them are benign. It turns out, for example, that your antivirus, Sophos, when it wants to ask the mothership, "Hey, is this something I should be concerned about?" encodes all that into a bunch of DNS queries and sends it out. And back comes the answer to the query as either "That's benign" or "That's a problem." So Sophos would pop up for each of these enterprises where we were analyzing all this data; it would pop up over and over. Was it in the clear? Could you see it, or did you have to do some kind of interpretation? You couldn't tell. I mean, it's just a whole bunch of gobbledygook, letters and numbers. Who knows what it is. Yeah. And that's sort of the point. And yet you trust Sophos, so you don't think it's actually exfiltrating your information. Another one, Spotify, would do this a lot. I'm like, "What the heck's up with that?"

You know, is it busy trying to tell the mothership, "This is what's on this computer. What do you think they would like as music?" I mean, I don't know. And you would like to know what's in these. The last part of the algorithm is that first you need some sort of initial period where you just say, "We're learning all of the things at the site that do a bunch of DNS communication." Either you ask the analyst, "Please look at these 50," or you just assume they're okay. So you've got a sort of startup transient where you're finding all those. And then the idea is, going forward, as you find new ones, you alert the analysts to those. The analysts have got to go run them down.
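A rough sketch of that startup-then-alert flow might look like the following. The class structure and the choice to key on (client, domain) pairs whose compressed series exceeded the bound are illustrative assumptions, not the paper's actual design.

    # Illustrative sketch (assumed design, not the paper's) of the startup
    # transient followed by alerting on newly seen heavy DNS talkers.
    class Baseline:
        def __init__(self):
            self.known = set()   # (client, domain) pairs vetted during startup
            self.training = True

        def observe(self, client, domain, exceeds_bound):
            """Return True if this pair should be raised to an analyst."""
            if not exceeds_bound:
                return False      # below the bound: nothing to worry about
            pair = (client, domain)
            if self.training:
                # Startup transient: collect the site's heavy DNS talkers
                # (e.g., Sophos, Spotify) for a one-time analyst review,
                # or simply assume they're okay.
                self.known.add(pair)
                return False
            if pair in self.known:
                return False
            self.known.add(pair)  # alert once per newly seen pair
            return True

        def finish_training(self):
            self.training = False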

They're very often gonna be benign. We showed that at the site where we had the most longitudinal data, which was Lawrence Berkeley National Lab, once you've done all that pre-training or whatever you want to call it, the analysts would be bugged in the long term about once a week. And so, for sites that really care, that are really worried about their security, having the analysts bugged once a week or even once a day is fine, because a positive finding is incredibly important. And in addition, we found actual tunneling tools.

Some of these you can recognize because they have very identifiable patterns. Like, there's a popular one called Iodine. Iodine has a keep-alive in it, meaning even if you're not doing anything with the covert channel, it'll still send out periodic or quasi-periodic lookups. It's doing that because Iodine is meant for a two-way channel, so you can communicate in both directions, and so the internal system that's making the queries has to ask repeatedly, "Hey, do you have anything for me? Do you have anything for me?" So Iodine has this structure that's very easy to recognize, and we were able to look at some of the tunnels we found and say, "You know, that's clearly Iodine." Mm-hmm. That was great.

So that meant that in that enormous quantity of lookups, there were actual tunnels that were not these big corporate quasi-tunnels like Sophos or Spotify. Yeah. And we were able to find them. So that's the algorithm as a whole. That's pretty cool. And it's really fascinating to think that intruders would use DNS because they have a high likelihood of being able to communicate through these corporate defenses, and legitimate providers make the same decision.

Yes. As they're trying to figure out a way to communicate, they're like, "Oh, well, DNS, we'll do it that way." Yes, yes, exactly. It is such a difficult channel to fully suppress. You just gotta have it. And so it really makes sense as the go-to covert channel. So, you mentioned that with the way you implemented it, you could get, say, maybe a detection a day or perhaps a week. What if you were to tune your thresholds lower? You talked about the 10 kilobyte limit; if you went down to four, I would just assume you're getting more of those alerts per week or day. Yeah, exactly.

It goes up as you lower that, and once you get much below four, it goes up a lot. Yeah. But you can do the opposite, which is to say, "You know, in my environment, I really don't want to be bugged unless it's important," and set it at 100 kilobytes, and you're not gonna get very many at all. Hm. Yeah. So, I think this also points to the fact that you need evidence to look at once you've had one of these alerts. Oh, absolutely.

Yeah, yeah. You gotta understand what that system is doing, and also just analyze long and hard: why is it talking to that domain? What's it trying to achieve? Is it benign? And very often, it is. But it takes a lot of context to make that determination. Are you familiar with how any of that was done in your examples, like how, once you delivered an alert to someone, they investigated what they found? Yeah. So for some of those, at Lawrence Berkeley National Lab, one of our co-authors was there, and he was part of the security team, so he would run them down. And then similarly at IBM, they were able to work with the security team at IBM Watson, which is where we got the IBM data. And so, for those two, we could go in depth. For the others, we would just have to do it contextually. Well, some of them we can just recognize, "Hey, it's Iodine." We know. Yeah. And some of them, you're just doing whois on the name and Googling the name and stuff like that, a sort of secondary assessment, and that would generally resolve it. So, there were a number where we're like, "We've just done that."

I'm guessing at Berkeley they were looking at Zeek data to try to help understand what was going on. Yes, yes. Lawrence Berkeley runs the world's longest-running Zeek installation. It wonderfully has logs going all the way back to even before I started running Zeek there, when I had scripts that produced similar logs. And they've kept all that. Little footnote on that: there have been multiple times when they've needed to go back five or more years to try to understand something. And that's been gold, having that data for that long.

And really quite striking. Oh, yeah. In the very first Mandiant M-Trends report, back in 2011, we reported that the median dwell time, I think, was 411 days. It was well over a year. Oh, yeah. And we had reliable case data where the forensic evidence would lead us back as far as six or seven years, and that was as far as we were able to go back because the data simply ended.

The intruders could have been there even longer. That's amazing. Yeah. It's sobering. Thankfully, things have improved quite a bit. I think the last report was down to 11 days. You know, who knows, maybe we're...

If we could actually get down to single-digit days, that would be quite an improvement. That would be great. On the other hand, that's an average, right? So, you're gonna have a tail... Yeah. ...that you're gonna care about. And let's say you get it down to a week, that's still a week of the intruder being able to do whatever they want. Yeah, yeah. Which is why we talk about trying to get down to an hour. If you can get down to an hour, that severely restricts the intruder's ability to accomplish their mission, unless they're really on point and they know what they're doing. Yep, yep. So as they build up their automation, they'll get better and better at doing it quickly.

Yes. It also points out, and this is a comment I wanted to make earlier: detecting these, the DNS stuff, is what I term, it's not really the term used in the community, behavioral detection. Namely, you're finding behavior that means you already have a problem. It's not about preventing the initial problem, and yet it's highly useful for what you just framed, which is to reduce dwell time. Mm-hmm. In my experience, the best way to find out what an adversary knows about you is to compromise their infrastructure, get inside their decision loop, be on their computers. But that is a luxury that only a few in the world have. It is not open... That sounds so fun. ...to private organizations. Yeah.

So the next best is to do the sort of thing you're describing: this really thoughtful, high-end planning for certain types of activity and looking for it that way. And I think it's really clever. So many people try to approach this problem and try to solve everything, and you set some boundaries where you said, "No, we're gonna think about what we can do and keep it within what the adversary would be able to do," by setting those four and 10 kilobyte limits. I think that's pretty clever. I actually have a talk I give sometimes about finding very damaging needles in enormous haystacks. I've gone after several problems like this one, and this is one of the ones I really like to talk about, where that's indeed part of the overall strategy. You have to find a subset of the problem that you still care about, and not try to tackle the whole problem. Yeah. Maybe we'll chat about one of those on another podcast.

I think that would be great. Vern, I really appreciate you spending this time with me today and explaining it so I could understand it, and I'm sure our audience will appreciate that as well. So, thank you for joining us. Sure. It was a lot of fun. Thank you. Thank you for joining us on the Network Defenders podcast, sponsored by Corelight. We will see you on the network.

You've been listening to Corelight Defenders. To stay informed with expert intelligence on today's cybersecurity challenges, please subscribe to ensure you never miss an episode. We'll see you on the network.