The first time an MCP (Model Context Protocol) server felt real to me, it wasn't because of a clean demo. It was because of the noise.
TL;DR: The harness matters more than the protocol, and the evidence matters more than both. MCP earns its keep when it shortens the path from a good security question to trustworthy evidence, and almost everything interesting about making that work happens in the harness wrapped around the model. In this series, I will cover how to build an MCP for an AI SOC.
To effectively defend against modern threats, security leaders must first understand how their tools behave under extreme pressure. The Black Hat NOC provides a unique crucible for testing these systems, exposing the critical visibility gaps and human bottlenecks that traditional workflows fail to address. In this chapter, I explore the realities of high-noise environments and the initial growing pains of building a system to handle them.
The Black Hat NOC isn't a normal SOC. People in classrooms are learning exploits, trainers are running labs, researchers are demoing techniques, and CTF traffic is constantly moving across the network. The show floor has its own activities going on. Behavior that would light up an enterprise SOC is not just allowed here, it is the point of the event.
That creates a detection category most organizations don't have. We call it a Black Hat positive. The alert is real. The detection did its job. The activity is genuinely suspicious. At Black Hat, it may also be expected. A paid classroom running a sanctioned exploit isn't a false positive, but it also isn't something I need a human burning cycles on. The harder job is finding what is truly malicious in a place where so much malicious-looking activity is legitimate by design.
One example sticks with me. We got an alert on one of our own access points reaching a destination flagged by an intel feed. In a normal enterprise, that's an immediate investigate-now, drop-everything kind of alert. Our own APs should not be calling out to known bad destinations. In the Black Hat NOC, this alert landed in a queue next to every other suspicious-looking-but-legitimate thing. It didn't get the instant attention it deserved, because humans were busy running down other alerts that also looked real.
It turned out to be a bug in how the AP was routing traffic. The connection and DNS evidence lined up with a misrouted lookup, not C2 behavior, and the destination was a stale listing, not active infrastructure. Not an incident, but the lesson stuck. When noise is high, genuinely critical signals can sit longer than they should, not because anyone is asleep, but because there are only so many cycles to go around. That's the whole reason ML- and AI-assisted triage matters here. It does not replace the analyst. It lowers the noise floor, so the signal that actually matters reaches a human sooner and fewer truly bad things get buried.
Most analysts already know what they want to ask.
The problem is getting from the question to the right data quickly. In most environments, that means remembering which tool has the data, which query language it wants, which field names matter, how to pivot, and how to explain the result to the next person. That's the translation tax. It shows up in every SOC and it gets painfully obvious in the Black Hat NOC, because the volume is high, the environment is weird, and chasing the wrong thing is expensive.
MCP lets an AI client talk to tools through a common interface. A user asks in natural language, and the model uses the MCP server to reach the system underneath: a SIEM, a data lake, an endpoint platform, a database, an internal API, or whatever holds the answer. Instead of forcing every analyst to think in Splunk, Elastic, LogScale, XSIAM, or SQL syntax, you move that translation into the AI layer. The analyst still owns judgment. The system still needs evidence, and the telemetry still has to be good. But the path from "What is this?" to "Here's what we know" gets shorter.
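A minimal sketch of that translation layer, not the server described in this series: the analyst's question becomes a structured tool call, and the tool owns the query syntax. Everything named here is an assumption for illustration, including the `search_connections` tool, the `conn` table, and its column names. An in-memory SQLite database stands in for whatever backend actually holds the data.

```python
import sqlite3

def make_demo_backend() -> sqlite3.Connection:
    """Hypothetical stand-in for a SIEM or data lake: one table of connection logs."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE conn (ts INTEGER, uid TEXT, orig_h TEXT, resp_h TEXT, resp_p INTEGER)"
    )
    db.executemany(
        "INSERT INTO conn VALUES (?, ?, ?, ?, ?)",
        [
            (1000, "C1", "10.0.0.5", "203.0.113.9", 443),
            (1060, "C2", "10.0.0.7", "198.51.100.4", 53),
            (1120, "C3", "10.0.0.5", "203.0.113.9", 443),
        ],
    )
    return db

def search_connections(db: sqlite3.Connection, dest_ip: str,
                       since_ts: int = 0, limit: int = 100) -> dict:
    """Hypothetical tool: the model supplies parameters, never raw query text."""
    query = (
        "SELECT ts, uid, orig_h, resp_h, resp_p FROM conn "
        "WHERE resp_h = ? AND ts >= ? ORDER BY ts LIMIT ?"
    )
    rows = db.execute(query, (dest_ip, since_ts, limit)).fetchall()
    columns = ["ts", "uid", "orig_h", "resp_h", "resp_p"]
    # Return the query and parameters with the rows so a human can check the evidence.
    return {
        "query": query,
        "params": [dest_ip, since_ts, limit],
        "rows": [dict(zip(columns, row)) for row in rows],
    }

db = make_demo_backend()
result = search_connections(db, "203.0.113.9")
for row in result["rows"]:
    print(row["ts"], row["uid"], row["orig_h"], "->", row["resp_h"])
```

The analyst never sees the SQL unless they want to, but the tool returns it anyway. Swapping the backend means rewriting one function, not retraining every analyst.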
I started taking AI-assisted workflows seriously in late 2024. The first few months were mostly frustration. Small demos were easy, but the real work wasn't. I'd get one part working, and then the model would break something adjacent. Context windows were too small. Tool surfaces got too large. The system would solve the next request while damaging the previous one.
The breakthrough wasn't a new model. It was learning to constrain the work. In February 2025, a friend named John Gillis sat with me for two hours at the Après-Cyber Slopes Summit and walked me through how he built things with AI. He helped me get past the issues I'd been running into developing with AI, and it unlocked something. I stayed up until 5am that night working on a project, and I haven't stopped building with AI since. The frame I'd been missing was simple. You don't let the model wander. You tell it what it can touch, what it can't, which tools are visible for this task, what good looks like, and what evidence it has to return so a human can trust the answer.
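That frame can be sketched as a small per-task policy: which tools are visible, and which evidence fields an answer must carry before a human should trust it. This is an illustrative sketch under assumptions, not the harness from this series. The `TaskPolicy` class, the tool names, and the evidence field names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskPolicy:
    """Hypothetical per-task boundary: visible tools plus required evidence."""
    name: str
    visible_tools: frozenset
    required_evidence: frozenset

def check_tool_call(policy: TaskPolicy, tool_name: str) -> None:
    """Refuse any tool the task did not explicitly expose."""
    if tool_name not in policy.visible_tools:
        raise PermissionError(f"{tool_name!r} is not visible for task {policy.name!r}")

def missing_evidence(policy: TaskPolicy, answer: dict) -> list:
    """Return the required evidence fields that are absent or empty."""
    evidence = answer.get("evidence", {})
    return sorted(f for f in policy.required_evidence if not evidence.get(f))

triage_policy = TaskPolicy(
    name="triage_intel_hit",
    visible_tools=frozenset({"search_connections", "search_dns"}),
    required_evidence=frozenset({"query", "time_range", "uids"}),
)

check_tool_call(triage_policy, "search_dns")  # visible for this task, so no error

try:
    check_tool_call(triage_policy, "isolate_host")  # not exposed for triage
    blocked = False
except PermissionError:
    blocked = True

answer = {
    "summary": "Misrouted DNS lookup, not C2.",
    "evidence": {"query": "resp_h = 203.0.113.9", "uids": ["C1", "C3"]},
}
gaps = missing_evidence(triage_policy, answer)
print(blocked, gaps)
```

Here the answer would be sent back for rework because it omits `time_range`. The point of the design is that the boundary lives in code the model cannot talk its way around.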
MCP itself was brand new at that point. The first MCP I built came the following month, March 2025, and it went live for Black Hat USA that summer. Once I started putting rules around the system, the MCP got useful. Connecting a model to tools is the easy headline, but keeping the workflow bounded, inspectable, and useful takes work.
My first MCP connected to a SIEM pulling demo Corelight logs. It was useful, but nothing like running against Black Hat traffic. This is the line where a lot of AI security projects get humbled, including mine. A sample dataset isn't the same as live, messy, high-volume data in an environment that's already weird. At Black Hat, a lot of traffic is intentionally suspicious. A normal enterprise baseline doesn't apply cleanly there. You can't just block everything that looks like an exploit, because some of those exploits are the classroom.
So the question gets more subtle. Is this malicious, expected, educational, accidental, or just weird conference traffic?
We answer it with frequency analysis, the threat hunter's bread and butter. Stack count by destination, JA3, user-agent, and URI, then bucket those by time and classroom VLAN. If a big chunk of hosts in the same room are hitting the same destinations with the same signature in the same window, it's a strong sign that the behavior is expected for that room. The inverse is equally diagnostic. A single host doing something the rest of the room isn't, or one host that should be part of a pattern and isn't, gets a human. The lone outlier is as informative as the stack.
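The stacking logic above can be sketched in a few lines of Python. This is a simplified illustration, not the production hunt: the record layout, the sample values, and the 0.6 `expected_share` threshold are assumptions, and only destination and JA3 are stacked to keep it short.

```python
from collections import defaultdict

# Hypothetical records: (time_bucket, vlan, src_host, dest, ja3)
events = [
    (0, "room-a", "h1", "lab.example", "ja3-x"),
    (0, "room-a", "h2", "lab.example", "ja3-x"),
    (0, "room-a", "h3", "lab.example", "ja3-x"),
    (0, "room-a", "h4", "lab.example", "ja3-x"),
    (0, "room-a", "h4", "203.0.113.9", "ja3-y"),
    (0, "room-a", "h5", "other.example", "ja3-z"),
]

def stack(events, expected_share=0.6):
    """Label each (bucket, vlan, dest, ja3) pattern by the share of room hosts showing it."""
    room_hosts = defaultdict(set)     # (bucket, vlan) -> hosts seen in that window
    pattern_hosts = defaultdict(set)  # (bucket, vlan, dest, ja3) -> hosts with that pattern
    for bucket, vlan, src, dest, ja3 in events:
        room_hosts[(bucket, vlan)].add(src)
        pattern_hosts[(bucket, vlan, dest, ja3)].add(src)

    findings = []
    for (bucket, vlan, dest, ja3), hosts in pattern_hosts.items():
        room = room_hosts[(bucket, vlan)]
        share = len(hosts) / len(room)
        if share >= expected_share:
            label = "expected"   # most of the room is doing it
        elif len(hosts) == 1:
            label = "outlier"    # one host on its own
        else:
            label = "uncommon"   # a few hosts, neither the stack nor a loner
        # Hosts absent from an expected pattern are also worth a look.
        absent = sorted(room - hosts) if label == "expected" else []
        findings.append((label, vlan, dest, sorted(hosts), absent))
    return findings

findings = stack(events)
for label, vlan, dest, hosts, absent in findings:
    print(label, vlan, dest, hosts, absent)
```

In this sample, four of five hosts in the room hit `lab.example` with the same JA3, so that pattern is labeled expected and `h5` is flagged as absent from it. The single-host connections from `h4` and `h5` come back as outliers for a human to review.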
That kind of context has to be encoded into the workflow. The model may know what a Zeek UID is. It may know what an IP address is. It doesn't know your room schedule, your classroom context, your acceptable-use boundaries, or which systems you're actually trying to protect until you give it that context. MCP can help pull from the systems that hold it, but the design has to tell the model what matters.
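As a rough sketch of what encoding that context might look like, the example below checks an alert against a room schedule and a list of protected assets before deciding who should see it. The schedule entries, asset names, alert fields, and label strings are all hypothetical. A real deployment would pull them from the systems that hold them instead of hard-coding them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassSession:
    vlan: str
    start_hour: int        # inclusive, 24-hour clock
    end_hour: int          # exclusive
    sanctioned: frozenset  # alert categories the class is expected to generate

# Hypothetical site context the model cannot know unless it is supplied.
SCHEDULE = [
    ClassSession("room-a", 9, 17, frozenset({"exploit_attempt", "port_scan"})),
]
PROTECTED_ASSETS = {"ap-17", "noc-core-1"}  # hypothetical names for NOC-owned gear

def label_alert(alert: dict) -> str:
    """Label an alert using site context, not just the detection itself."""
    # Alerts on our own infrastructure always go to a human first.
    if alert["source"] in PROTECTED_ASSETS:
        return "escalate"
    for session in SCHEDULE:
        in_window = session.start_hour <= alert["hour"] < session.end_hour
        if (alert["vlan"] == session.vlan and in_window
                and alert["category"] in session.sanctioned):
            # Real detection, real activity, but expected for this room and time.
            return "black_hat_positive"
    return "needs_review"

in_class = label_alert(
    {"source": "h3", "vlan": "room-a", "hour": 10, "category": "exploit_attempt"})
own_ap = label_alert(
    {"source": "ap-17", "vlan": "infra", "hour": 10, "category": "intel_hit"})
after_hours = label_alert(
    {"source": "h3", "vlan": "room-a", "hour": 22, "category": "exploit_attempt"})
print(in_class, own_ap, after_hours)
```

The same exploit attempt gets a different label at 10:00 and at 22:00, and an alert on protected gear skips the classroom logic entirely. None of that is knowledge the model brings with it.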
Filtering through the noise and surviving the transition from concept to real-world tool are important first steps, but they are not enough to build an intelligent workflow. To move toward automation that actually helps analysts, we have to get the harness right. In Chapter 2: Building the harness: Constraints, context, and code, I will unpack how to encode investigative pivots, manage tool sprawl, and make tools useful for those they are meant to serve.