Quick FIX® log management: How metadata simplifies financial protocol tracking (and how Corelight’s platform can help)

March 7, 2025 by Steve Smoot

Financial institutions, such as banks and trading houses, have a strong interest in recording key transaction activity within their networks. In the face of daunting data storage requirements, many are finding that Corelight’s network metadata, notably metadata produced by Zeek®, is the key to a simplified tracking and storage process.

Many of our customers used to rely on packet capture (PCAP). However, the challenges of PCAP are well known: keeping everything can require a tremendous amount of disk space, and finding what you are looking for can take hours instead of minutes when a quick response counts.

But now financial institutions are streamlining investigations and meeting their internal or compliance-driven record keeping goals with a combination of Zeek logs and Smart PCAP. This pairing is a feature of Corelight’s Open NDR platform, providing organizations a means to create a record of traffic of interest alongside their Zeek metadata repository, and delivering a useful, additional distillation of the PCAP record.

Zeek is extremely extensible: users can create both custom behaviors (scripts) and custom protocol analyzers (Spicy parsers). Here is one example that applies to the Financial Information eXchange (FIX) protocol, which is used widely across the industry for a variety of purposes. (To maintain strict customer privacy, all examples cited here come from Wikipedia and a public PCAP, and have been edited for clarity.)

A day's work on the FIX protocol extended the sensor to handle the ISO 3531 versions (FIX 4.0-5.0SP2 and FIXT1.1).

A transaction on the wire may look like:

    
     8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS | 52=20071123-05:30:00.000 | 11=ATOMNOCCC9990900 | 20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 | 44=15 | 58=PHLX EQUITY TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 | 10=128 |

In plain English, this is an execution report in the FIX 4.2 version of the protocol: in 2007, an order for Microsoft stock (MSFT) has a pending_replace event (open for a day) under “PROGRAM_ORDER_INDEX_ARB_FOR_MEMBER_FIRMORG”, with no previous trade information, leaving 15 shares, and so on. As you can see, a great deal of information about a proposed or executed transaction can be encoded very precisely in a compact form.
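
To make the tag=value structure concrete, here is a minimal Python sketch that splits the example above into ordered (tag, value) pairs. It assumes the " | " display separator used in this post; on the wire, FIX fields are delimited by the SOH (0x01) byte instead. The function name `parse_fix` is ours for illustration and is not part of Zeek or the Corelight platform.

```python
def parse_fix(raw, sep="|"):
    """Split a FIX message into an ordered list of (tag, value) pairs.

    A list is used rather than a dict because FIX repeating groups
    can carry the same tag more than once.
    """
    pairs = []
    for field in raw.split(sep):
        field = field.strip()
        if not field:
            continue  # skip the empty piece after a trailing separator
        tag, _, value = field.partition("=")
        pairs.append((int(tag), value))
    return pairs


raw = ("8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS | "
       "52=20071123-05:30:00.000 | 11=ATOMNOCCC9990900 | 20=3 | 150=E | "
       "39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 | 44=15 | "
       "58=PHLX EQUITY TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | "
       "14=0 | 6=0 | 10=128 |")

fields = parse_fix(raw)
print(fields[0])         # (8, 'FIX.4.2')
print(dict(fields)[55])  # MSFT
```

Keeping the pairs in order matters later, because the position of a repeated tag tells you which group entry it belongs to.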

Now with Corelight, a FIX log record would look like this:

    
     {
  "ts": 1448733590.539525,
  "uid": "CY0ARQ2Xj5BOVTyzCi",
  "id.orig_h": "10.100.10.66",
  "id.orig_p": 53867,
  "id.resp_h": "10.100.10.10",
  "id.resp_p": 11001,
  "fix": "1.1",
  "msg": "879=869.0 | 888=Blechnod | 11=ord1 | 78=1 | 458=OverFoo | 804=1 | 40=2 | 34=3 | 524=nestedpartyID1 | 52=20241120-17:59:50.538 | 887=5 | 49=DLD_TEX | 44=481.19398 | 59=4 | 60=20241120-17:59:50.535 | 457=2 | 545=subnestedpartyID1 | 539=1 | 35=D | 79=Account1 | 55=BHP | 38=5472.0 | 54=1 | 56=TEX1_DLD | 711=3 | 453=0 | 311=BOOM"
}

This annotates the log with the network sender/receiver and timestamp, as well as the FIX version and a unique ID (UID) for the connection, which enables tracing of multiple messages. This log can live in a data lake, log storage system, or SIEM, which makes it far more accessible than a wire-protocol message buried in a drive full of PCAPs.

This format is great for FIX experts, but nonexperts may prefer a wordier version of a log that spells things out. So, one option is to convert the log generated into:

    
     {
  "ts": 1448733590.539525,
  "uid": "CMJkbU2duNyzHsGYdl",
  "id.orig_h": "10.100.10.66",
  "id.orig_p": 53867,
  "id.resp_h": "10.100.10.10",
  "id.resp_p": 11001,
  "fix": "1.1",
  "msg": "MsgType=ORDER_SINGLE | SenderCompID=DLD_TEX | TargetCompID=TEX1_DLD | MsgSeqNum=3 | SendingTime=20241120-17:59:50.538 | ClOrdID=ord1 | NoPartyIDs=0 | NoAllocs=1 | AllocAccount=Account1 | NoNestedPartyIDs=1 | NestedPartyID=nestedpartyID1 | NoNestedPartySubIDs=1 | NestedPartySubID=subnestedpartyID1 | Symbol=BHP | NoUnderlyings=3 | UnderlyingSymbol=BLAH | UnderlyingQty=869.0 | UnderlyingSymbol=FOO | NoUnderlyingSecurityAltID=2 | UnderlyingSecurityAltID=UnderBlah | UnderlyingSecurityAltID=OverFoo | UnderlyingSymbol=BOOM | NoUnderlyingStips=5 | UnderlyingStipType=Reverera | UnderlyingStipType=Orlanda | UnderlyingStipType=Withroon | UnderlyingStipType=Longweed | UnderlyingStipType=Blechnod | Side=BUY | TransactTime=20241120-17:59:50.535 | OrderQty=5472.0 | OrdType=LIMIT | Price=481.19398 | TimeInForce=FILL_OR_KILL"
}

Here the standard field names and enums replace the numbers (unknown ones remain as numbers, such as internal extensions to the protocol). You can easily extend the script to take in a table of custom definitions (e.g., 5000 = "AuthorizingAgent") or any other protocol fields you wish to capture.
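
The translation step can be sketched in a few lines of Python. This is an illustration of the idea, not the Zeek script itself: the tables cover only a handful of tags, the function name `verbose` is hypothetical, and the value `DeskA` for custom tag 5000 is invented for the example.

```python
# A tiny subset of the standard FIX dictionary; a real table would cover
# every tag and enum defined for the FIX versions in use.
TAG_NAMES = {35: "MsgType", 49: "SenderCompID", 56: "TargetCompID",
             55: "Symbol", 54: "Side", 40: "OrdType", 59: "TimeInForce"}
ENUM_NAMES = {
    35: {"D": "ORDER_SINGLE", "8": "EXECUTION_REPORT"},
    54: {"1": "BUY", "2": "SELL"},
    40: {"1": "MARKET", "2": "LIMIT"},
    59: {"0": "DAY", "4": "FILL_OR_KILL"},
}


def verbose(pairs, custom_tags=None):
    """Render (tag, value) pairs with names; unknown tags stay numeric."""
    names = {**TAG_NAMES, **(custom_tags or {})}
    out = []
    for tag, value in pairs:
        name = names.get(tag, str(tag))
        value = ENUM_NAMES.get(tag, {}).get(value, value)
        out.append(f"{name}={value}")
    return " | ".join(out)


# (5000, "DeskA") is a made-up internal extension field.
pairs = [(35, "D"), (49, "DLD_TEX"), (55, "BHP"), (54, "1"),
         (40, "2"), (59, "4"), (5000, "DeskA")]
print(verbose(pairs))
# MsgType=ORDER_SINGLE | SenderCompID=DLD_TEX | Symbol=BHP | Side=BUY | OrdType=LIMIT | TimeInForce=FILL_OR_KILL | 5000=DeskA
print(verbose(pairs, custom_tags={5000: "AuthorizingAgent"}))
# MsgType=ORDER_SINGLE | SenderCompID=DLD_TEX | Symbol=BHP | Side=BUY | OrdType=LIMIT | TimeInForce=FILL_OR_KILL | AuthorizingAgent=DeskA
```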

These logs, while easier for the user, will take up more space. However, a test on the Wireshark PCAP shows only a 10% increase in gzipped size between the numeric and string versions, so they may be suitable for many applications leveraging compression.
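
If you want to check the overhead on your own logs, a comparison along these lines may help. It is a rough sketch using Python's standard `gzip` module; the helper names are ours, and the sample lines are repeated copies of a single message, so the printed figure is not representative of real traffic.

```python
import gzip


def gzip_size(lines):
    """Return the gzip-compressed size in bytes of newline-joined log lines."""
    return len(gzip.compress("\n".join(lines).encode("utf-8")))


def growth(numeric_lines, verbose_lines):
    """Fractional size increase of verbose logs over numeric ones after gzip."""
    return gzip_size(verbose_lines) / gzip_size(numeric_lines) - 1.0


# Placeholder data: substitute the msg fields from your own FIX logs.
numeric = ["35=D | 55=BHP | 54=1 | 40=2 | 59=4"] * 1000
wordy = ["MsgType=ORDER_SINGLE | Symbol=BHP | Side=BUY | "
         "OrdType=LIMIT | TimeInForce=FILL_OR_KILL"] * 1000
print(f"{growth(numeric, wordy):+.1%}")
```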

To enable multiyear record keeping of this protocol, it makes sense to identify just the transaction types of interest. For example, you may want only message types of 8 (EXECUTION_REPORTs).

Additionally, you may want to only record specific fields, or to drop specific fields; these can easily be identified in the Zeek options (onlyFields and neverFields). A super-minimized version might only save symbols and underlying symbols (message types of "D" and onlyFields={55, 311}). If you change the options in the script, then it might slim down to:

    
     "msg": "Symbol=BHP | UnderlyingSymbol=BLAH | UnderlyingSymbol=FOO | UnderlyingSymbol=BOOM"

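The filtering logic behind that result can be sketched in Python as follows. The option names `onlyFields` and `neverFields` come from the script described above; the function `slim`, its parameter names, and the abbreviated sample message are illustrative assumptions rather than the script's actual interface.

```python
TAG_NAMES = {55: "Symbol", 311: "UnderlyingSymbol"}


def slim(pairs, msg_types=None, only_fields=None, never_fields=None):
    """Return a reduced msg string, or None if the message type is unwanted.

    pairs is an ordered list of (tag, value) tuples for one FIX message.
    """
    msg_type = next((v for t, v in pairs if t == 35), None)
    if msg_types is not None and msg_type not in msg_types:
        return None
    kept = []
    for tag, value in pairs:
        if only_fields is not None and tag not in only_fields:
            continue
        if never_fields is not None and tag in never_fields:
            continue
        kept.append(f"{TAG_NAMES.get(tag, tag)}={value}")
    return " | ".join(kept)


# Abbreviated version of the ORDER_SINGLE message shown earlier.
order = [(35, "D"), (49, "DLD_TEX"), (55, "BHP"), (311, "BLAH"),
         (311, "FOO"), (311, "BOOM"), (54, "1"), (38, "5472.0")]
print(slim(order, msg_types={"D"}, only_fields={55, 311}))
# Symbol=BHP | UnderlyingSymbol=BLAH | UnderlyingSymbol=FOO | UnderlyingSymbol=BOOM
```

Dropping unwanted message types before any per-field work keeps the cost of the filter low for traffic you never intend to store.
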
Looking at other network data

While keeping the pure FIX logs may be sufficient, especially for multiyear storage, you may want to record more network activity to provide context. Suppose your tapping location gives you all trader network activity, not just FIX traffic. In that case, it might be interesting to record all their connections. For example, you could correlate chat apps and web browsing with trading activity. By default, Zeek will capture and analyze all network traffic, so you might have a log like:

    
     {
  "ts": "2024-11-21T07:46:17.047199Z",
  "uid": "CUliK93lh2Ov0OKnhj",
  "id.orig_h": "10.100.10.66",
  "id.orig_p": 52179,
  "id.resp_h": "52.123.188.44",
  "id.resp_p": 443,
  "proto": "tcp",
  "service": "ssl",
  "duration": 11.984806060791016,
  "orig_bytes": 119635,
  "resp_bytes": 19194,
  "conn_state": "S1",
  "local_orig": true,
  "local_resp": false,
  "missed_bytes": 0,
  "history": "ShADad",
  "orig_pkts": 128,
  "orig_ip_bytes": 126303,
  "resp_pkts": 68,
  "resp_ip_bytes": 22738,
  "resp_cc": "US",
  "app": [
    "microsoft",
    "ms-teams"
  ],
  "id.resp_h_name.src": "SSL_SNI",
  "id.resp_h_name.vals": [
    "api.flightproxy.teams.microsoft.com"
  ]
}

Here you can use the network IPs to connect trader workstation activity and timestamps for temporal locality. If your network tap is much wider, however, you might not want to store all network logs, and could get fancy and just record logs for machines that send/receive FIX messages. Code left as an exercise for the interested reader.
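
For readers who would rather not do the exercise inside the sensor, the same idea can be approximated after the fact on exported JSON logs. The Python sketch below is an assumption-laden illustration: it filters connection records by the hosts seen in FIX records, the function names are ours, and the second connection record (including its UID and the 10.100.10.99 address) is invented for the example.

```python
def fix_hosts(fix_records):
    """Collect every IP address seen as either endpoint of a FIX log record."""
    hosts = set()
    for rec in fix_records:
        hosts.add(rec["id.orig_h"])
        hosts.add(rec["id.resp_h"])
    return hosts


def related_conns(conn_records, hosts):
    """Keep only connection records that involve one of the given hosts."""
    return [rec for rec in conn_records
            if rec["id.orig_h"] in hosts or rec["id.resp_h"] in hosts]


fix_log = [{"uid": "CY0ARQ2Xj5BOVTyzCi",
            "id.orig_h": "10.100.10.66", "id.resp_h": "10.100.10.10"}]
conn_log = [
    {"uid": "CUliK93lh2Ov0OKnhj",
     "id.orig_h": "10.100.10.66", "id.resp_h": "52.123.188.44"},
    # Hypothetical record for a workstation that never spoke FIX.
    {"uid": "Cexample000000000",
     "id.orig_h": "10.100.10.99", "id.resp_h": "52.123.188.44"},
]
kept = related_conns(conn_log, fix_hosts(fix_log))
print([rec["uid"] for rec in kept])  # ['CUliK93lh2Ov0OKnhj']
```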

Of course, if you need different sensors for different networks you'll want a different way to identify traffic, such as subnets that you can just configure in the user interface for Corelight Sensor configuration.

As described above, the UIDs will link a single connection, making it easy to look up related activity:

    
     {"uid":"CR5tND1dyPMOffqEDh",..."msg":"NoUnderlyingSecurityAltID=2 | TargetCompID=TEX1_DLD | Side=BUY | UnderlyingSymbol=...}
{"uid":"CR5tND1dyPMOffqEDh",..."msg":"NoUnderlyingSecurityAltID=2 | TargetCompID=TEX1_DLD | Side=BUY | UnderlyingSymbol=...}
{"uid":"CR5tND1dyPMOffqEDh",..."msg":"NoUnderlyingSecurityAltID=2 | TargetCompID=TEX1_DLD | Side=BUY | UnderlyingSymbol=...}

Our example PCAP contains only one connection, so the UID is not especially useful here, but in real-world traffic it can be.
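
As a rough sketch of how a downstream tool might use the UID, the Python below groups FIX log records by connection and orders each group by timestamp. The function name and the shortened sample records (including their timestamps and messages) are made up for illustration.

```python
from collections import defaultdict


def group_by_uid(records):
    """Map each connection UID to its FIX messages in timestamp order."""
    sessions = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["ts"]):
        sessions[rec["uid"]].append(rec["msg"])
    return dict(sessions)


# Hypothetical, heavily shortened records.
records = [
    {"ts": 2.0, "uid": "CR5tND1dyPMOffqEDh", "msg": "35=8 | 11=ord1"},
    {"ts": 1.0, "uid": "CR5tND1dyPMOffqEDh", "msg": "35=D | 11=ord1"},
    {"ts": 1.5, "uid": "CY0ARQ2Xj5BOVTyzCi", "msg": "35=D | 11=ord2"},
]
sessions = group_by_uid(records)
print(sessions["CR5tND1dyPMOffqEDh"])  # ['35=D | 11=ord1', '35=8 | 11=ord1']
```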

For organizations actively using these logs, it may be easier to investigate the message (msg) as JSON, which SIEMs, data lakes, and other tools can more easily handle. For example:

    
     {
  "ts": 1448733607.949709,
  "uid": "CYqlUEm1vT0DHPQkf",
  "id.orig_h": "10.100.10.66",
  "id.orig_p": 53867,
  "id.resp_h": "10.100.10.10",
  "id.resp_p": 11001,
  "fix": "1.1",
  "msg": "{\"35\":\"8\",\"49\":\"TEX1_DLD\",\"56\":\"DLD_TEX\",\"34\":\"584\",\"52\":\"20241120-18:00:07.949\",\"37\":\"ord51\",\"11\":\"ord1\",\"453\":\"0\",\"17\":\"exec530\",\"150\":\"0\",\"39\":\"1\",\"55\":\"BHP\",\"711\":\"3\",\"311\":\"BLAH\",\"879\":\"877.0\",\"311\":\"FOO\",\"457\":\"2\",\"458\":\"UnderBlah\",\"458\":\"OverFoo\",\"311\":\"BOOM\",\"887\":\"5\",\"888\":\"Reverera\",\"888\":\"Orlanda\",\"888\":\"Withroon\",\"888\":\"Longweed\",\"888\":\"Blechnod\",\"54\":\"1\",\"38\":\"3449.0\",\"40\":\"2\",\"44\":\"379.62\",\"59\":\"4\",\"32\":\"1.0\",\"151\":\"0.0\",\"14\":\"3449.0\",\"6\":\"379.62\",\"60\":\"20241120-18:00:07.948\",\"78\":\"1\",\"79\":\"Account1\",\"539\":\"1\",\"524\":\"nestedpartyID1\",\"804\":\"1\",\"545\":\"subnestedpartyID1\"}"
}

This looks sort of annoying with all the quoting, but the tools will handle it. Pretty printing just msg and switching to the text version would instead be:

    
     {
  "MsgType": "EXECUTION_REPORT",
  "SenderCompID": "TEX1_DLD",
  "TargetCompID": "DLD_TEX",
  "MsgSeqNum": "584",
  "SendingTime": "20241120-18:00:07.949",
  "OrderID": "ord51",
  "ClOrdID": "ord1",
  "NoPartyIDs": "0",
  "ExecID": "exec530",
  "ExecType": "NEW",
  "OrdStatus": "PARTIALLY_FILLED",
  "Symbol": "BHP",
  "NoUnderlyings": "3",
  "UnderlyingSymbol": "BOOM",
  "UnderlyingQty": "877.0",
  "NoUnderlyingSecurityAltID": "2",
  "UnderlyingSecurityAltID": "OverFoo",
  "NoUnderlyingStips": "5",
  "UnderlyingStipType": "Blechnod",
  "Side": "BUY",
  "OrderQty": "3449.0",
  "OrdType": "LIMIT",
  "Price": "379.62",
  "TimeInForce": "FILL_OR_KILL",
  "LastQty": "1.0",
  "LeavesQty": "0.0",
  "CumQty": "3449.0",
  "AvgPx": "379.62",
  "TransactTime": "20241120-18:00:07.948",
  "NoAllocs": "1",
  "AllocAccount": "Account1",
  "NoNestedPartyIDs": "1",
  "NestedPartyID": "nestedpartyID1",
  "NoNestedPartySubIDs": "1",
  "NestedPartySubID": "subnestedpartyID1"
}

Alternatively, you could pull all of the FIX fields up into the master log record fields. However, because FIX repeating groups reuse tags, this approach can produce multiple JSON fields with the same name, which JSON parsers handle inconsistently. Such repeated fields require indexing or additional refinements for production use, depending on the toolchain, so we don’t recommend this approach. If you've solved this problem, let us know!
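
One possible refinement on the consuming side, sketched here in Python, is to collect repeated names into lists when parsing. It relies on the `object_pairs_hook` parameter of the standard `json.loads`; the helper name `load_repeating` and the shortened sample message are ours, and this is only one of several ways a toolchain might cope.

```python
import json
from collections import defaultdict


def load_repeating(msg_json):
    """Parse a msg JSON string, keeping every value of repeated tags."""
    def collect(pairs):
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        # Single-valued tags stay scalars; repeated tags become lists.
        return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}
    return json.loads(msg_json, object_pairs_hook=collect)


msg = '{"55":"BHP","711":"3","311":"BLAH","311":"FOO","311":"BOOM"}'
print(json.loads(msg)["311"])      # BOOM (Python keeps only the last value)
print(load_repeating(msg)["311"])  # ['BLAH', 'FOO', 'BOOM']
```

Note that flattening into lists discards which group entry each value belonged to (for example, which UnderlyingQty goes with which UnderlyingSymbol), so it suits searching better than faithful reconstruction.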

Program correctness or LINTing

In security, network information is considered “ground truth.” While adversaries can disguise their trail on endpoints once they are breached, what is sent on the network remains on the network. The network’s ground truth is also useful for debugging or certifying that traffic has the expected properties as it moves between applications.

Suppose you have two applications, one producing messages and one consuming them. The consumer requires certain properties to be set; in that case, you can add checks within the Zeek script to verify they have been applied and to let you know which application to look at for troubleshooting. Similarly, you can detect specific custom FIX extensions that the downstream applications may not handle (or may require!). Some of these can be evaluated in the downstream data lake/SIEM, but even complex requirements can be handled in the Zeek scripting layer.
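
A check of this kind might look like the following Python sketch. The required-tag table is a hypothetical consumer requirement, not something defined by FIX or by Corelight, and in practice the equivalent logic would live in the Zeek script or in a data lake query.

```python
# Hypothetical consumer requirements: tags that must be present per MsgType.
REQUIRED = {
    "D": {11, 55, 54, 38, 40},   # ClOrdID, Symbol, Side, OrderQty, OrdType
    "8": {37, 17, 150, 39},      # OrderID, ExecID, ExecType, OrdStatus
}


def missing_tags(pairs):
    """Return the sorted required tags absent from one FIX message."""
    present = {tag for tag, _ in pairs}
    msg_type = next((v for t, v in pairs if t == 35), None)
    return sorted(REQUIRED.get(msg_type, set()) - present)


# An order that is missing OrderQty (tag 38).
order = [(35, "D"), (11, "ord1"), (55, "BHP"), (54, "1"), (40, "2")]
print(missing_tags(order))  # [38]
```

Because the log record also carries the sender and receiver addresses, a non-empty result points directly at the producing application.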

Behavior change or packet capture in defense

Because Zeek is event-based, further customization is possible. You can use the presence of certain fields, or fields with particular values, to set off alerts (“NOTICEs” in Zeek-speak) or to trigger a packet capture for a specific session if the message is near the start. For example, very suspicious trades could trigger packet captures for other sessions as well (e.g., all the sessions from a trader’s workstation), or they could influence severity (e.g., bumping up the priority of alerts from systems with a certain kind of FIX transaction).
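
The decision logic can be modeled in a few lines of Python. This sketch only illustrates the rule evaluation; it does not use Zeek's notice framework or Smart PCAP, and both rules, the quantity threshold, and the assumption that internal extensions use tags 5000 and above are hypothetical.

```python
def evaluate(pairs, max_qty=5000.0):
    """Return the actions a message would trigger under two sample rules."""
    fields = dict(pairs)  # last value wins; fine for the single-valued tags used here
    actions = []
    # Rule 1 (hypothetical): unusually large new orders raise an alert.
    if fields.get(35) == "D" and float(fields.get(38, 0)) > max_qty:
        actions.append("notice")
    # Rule 2 (hypothetical): any custom tag (5000 or above) starts a capture.
    if any(tag >= 5000 for tag, _ in pairs):
        actions.append("capture")
    return actions


order = [(35, "D"), (55, "BHP"), (38, "5472.0")]
print(evaluate(order))                      # ['notice']
print(evaluate(order + [(5000, "DeskA")]))  # ['notice', 'capture']
```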

Summary

Compliance and internal security are critical objectives for financial organizations, but given the volume of network traffic, network administrators and monitors need tools that scale and simplify long-term data storage. The customization of Corelight Sensors we’ve described here gives organizations across the financial services vertical a new tool that provides both types of functionality.

We encourage potential customers to consider the value of these features. Existing customers who are not yet using Corelight out of band to monitor low-latency trading networks can also derive additional value from Smart PCAP and Zeek’s extensibility.
