Configuration Guide for VMX-Capture

In the previous example, we saw how a stack probe can perform complex functions efficiently, and be configured to send summarised statistics or anomalies to VMX-Analysis and provide an overview of system performance.

This example looks at how a probe for order entry protocols might be created in VMX-Capture. It uses configuration that was generated for the Beeks Analytics for Markets (BAM) templated deployment as an example.

Within BAM, order entry traffic differs notably from market data for the following reasons:

- The traffic is bidirectional (BAM makes the assumption that market data is unidirectional, from venue to internal systems).
- Although the traffic is summarised as statistics, individual messages are also sent (as Agent Events) to VMX-Analysis for it to perform more complex correlations (and to persist to the database, for querying in order searches).
- The fact that individual messages are sent as Agent Events also means that there is more emphasis on normalising key message fields with orders than there is with market data. This allows, for example, FIX orders and OUCH orders to be viewed side-by-side in the Item Trace view.

We will examine how these differences are implemented in the configuration in the following sections.

See the Beeks Analytics Data Guide for a fuller description of what the Beeks Analytics for Markets template covers.

Stack probe configuration for NYSE FIX

We will take the processing of NYSE FIX data as our example configuration.

As mentioned above, order traffic is captured in both directions. The Beeks Analytics for Markets configuration implements that with two separate stack probes:

nyse_fix_ingress.stack.json describes the stack probe configuration for processing order messages that are received from NYSE.
nyse_fix_egress.stack.json describes the stack probe configuration for processing order messages that are sent to NYSE.

Both of the above files are found in the standard directory for stack probe configuration for a Beeks Analytics for Markets configuration:

<confdir>agent/pmux/<PMUX_NAME>/

Here’s an example of what the start of the nyse_fix_ingress.stack.json configuration file looks like:

{
    "probe": {
        "parameters": {
            "name": "nyse_fix_ingress",
            "filter": "tcp and ((src 172.18.10.36 and src port 5001)) ",
            "debug": false,
            "protocols": [
                {
                    "type": "module",
                    "value": "ethernet"
                }
                ...

Stack probes configured per direction and filter settings

The direction of traffic that each probe handles is set by swapping source and destination in the BPF filter for the probe:

Stack Config File	Filter Value
nyse_fix_ingress.stack.json	tcp and ((src 172.18.10.36 and src port 5001))
nyse_fix_egress.stack.json	tcp and ((dst 172.18.10.36 and dst port 5001))
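The symmetry between the two filters can be sketched in a few lines of Python. This is only an illustration of how the two strings in the table relate to each other; the helper name direction_filters is hypothetical and is not part of VMX-Capture.

```python
def direction_filters(venue_host: str, venue_port: int) -> dict:
    """Build the ingress and egress BPF filter strings for one venue endpoint.

    Hypothetical helper: it only reproduces the filter strings shown in the
    table above, it does not talk to VMX-Capture.
    """
    def bpf(qualifier: str) -> str:
        # "src" selects traffic coming from the venue (ingress),
        # "dst" selects traffic going to the venue (egress).
        return (f"tcp and (({qualifier} {venue_host} "
                f"and {qualifier} port {venue_port}))")

    return {"ingress": bpf("src"), "egress": bpf("dst")}


filters = direction_filters("172.18.10.36", 5001)
print(filters["ingress"])
print(filters["egress"])
```

Generating both strings from one host and port pair keeps the two stack files from drifting apart when a venue endpoint changes.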

Stack layer overview

The following layers are present in the stacks:

            "protocols": [
                {
                    "type": "module",
                    "value": "ethernet"
                },
                {
                    "type": "module",
                    "value": "ip"
                },
                {
                    "type": "module",
                    "value": "fix",
                    "id": "dec_payload"
                }
            ]

With the BAM configuration, the IP and Ethernet modules are only required because some of their fields are needed to determine the direction of traffic or to label traffic. Statistics for IP (or indeed TCP) are relevant to order sessions, but these are captured via separate stack probes.

Decoder layer (FIX)

The dec_payload module has some additional configuration in the stack configuration as follows:

    "dec_payload": {
        "parameters": {
            "ddc": [
                {
                    "type": "dynamicDatafield",
                    "id": "52",
                    "datafieldType": "timestamp",
                    "name": "beeks.payload_timestamp"
                },
                {
                    "type": "dynamicDatafield",
                    "id": "60",
                    "datafieldType": "timestamp",
                    "name": "fix.TransactTime"
                },
                {
                    "type": "dynamicDatafield",
                    "id": "17",
                    "datafieldType": "int",
                    "name": "fix.ExecID"
                }
            ]
        }
    }

The dynamicDatafield entries perform the following transformations as part of the protocol decode:

Converts FIX tag 52 (the SendingTime field) and FIX tag 60 (the TransactTime field) into a timestamp format, and gives them appropriate names for use elsewhere in the configuration (“beeks.payload_timestamp” and “fix.TransactTime”).
Converts FIX tag 17 (the ExecID field) into an integer, with the name “fix.ExecID”.
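As a rough model of what these conversions amount to, the Python sketch below decodes a raw FIX string and applies the same three tag-to-datafield rules. It is not the VMX-Capture decoder: the DDC table and function names are hypothetical, the sample message is invented, timestamps are assumed to be FIX UTCTimestamp values with at most microsecond precision, and ExecID is assumed to be numeric, as the configuration implies.

```python
from datetime import datetime, timezone

SOH = "\x01"  # standard FIX field delimiter

# Hypothetical mirror of the "ddc" entries: tag -> (datafield name, type)
DDC = {
    "52": ("beeks.payload_timestamp", "timestamp"),
    "60": ("fix.TransactTime", "timestamp"),
    "17": ("fix.ExecID", "int"),
}


def parse_fix_timestamp(value: str) -> datetime:
    """Parse a FIX UTCTimestamp, with or without fractional seconds."""
    fmt = "%Y%m%d-%H:%M:%S.%f" if "." in value else "%Y%m%d-%H:%M:%S"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def decode(raw: str) -> dict:
    """Return only the datafields named in DDC, converted to their types."""
    fields = {}
    for pair in raw.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        if tag not in DDC:
            continue  # tags without a rule are ignored in this sketch
        name, kind = DDC[tag]
        if kind == "timestamp":
            fields[name] = parse_fix_timestamp(value)
        else:
            fields[name] = int(value)
    return fields


# Invented sample message, for illustration only
raw = SOH.join([
    "35=8",
    "52=20240102-14:30:00.123",
    "60=20240102-14:30:00.120",
    "17=1001",
]) + SOH
decoded = decode(raw)
print(decoded)
```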

Transform collector configuration

The transform collector provides the mapping functions. These ensure that the properties that are passed to VMX-Analysis as Agent Events or statistics are correctly mapped from fields that have been decoded.

            "transform_collector": [
                {
                    "type": "module",
                    "value": "mapper",
                    "id": "entitiesMapper"
                },
                {
                    "type": "module",
                    "value": "mapper",
                    "id": "protocolNormalizingMapper"
                }
            ]

entitiesMapper

Unlike for market data, the entitiesMapper can now map internal IP addresses to specific hostnames, if that will help users to understand the data that they are seeing in the Analytics system. Here is an example of that mapping in the entsMapper/nyse_fix_egress.stack.mapper.json file:

{
    "datafields": {
        "beeks.extgroup": "string",
        "beeks.switchport": "string",
        "beeks.intentity": "string"
    },
    "actions": [
        {
            "assign": {
                "beeks.extgroup": "unassigned",
                "beeks.switchport": "unassigned",
                "beeks.intentity": "unassigned"
            }
        },
        {
            "compositeMap": {
                "key": [
                    "ip.dst_host",
                    "ip.dst_port"
                ],
                "mapping": [
                    {
                        "key": {
                            "ip.dst_host": "172.18.10.36",
                            "ip.dst_port": "5001"
                        },
                        "actions": [
                            {
                                "assign": {
                                    "beeks.extgroup": "NYSE_NY_Primary",
                                    "beeks.switchport": "Port5"
                                }
                            }
                        ]
                    }
                ]
            }
        },
        {
            "subnet": {
                "datafield": "ip.src_host",
                "mapping": [
                    {
                        "netmask": "172.18.10.38/32",
                        "actions": [
                            {
                                "assign": {
                                    "beeks.intentity": "Dedicated_Server_1"
                                }
                            }
                        ]
                    }
                ]
            }
        }
    ]
}
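The effect of the three actions in this file can be modelled in Python as follows. This is a sketch of the behaviour as described, not the mapper implementation: it assumes that actions run in the order listed, that later assignments overwrite earlier ones, and that the first matching subnet wins. The function name map_entities is hypothetical.

```python
import ipaddress

DEFAULTS = {
    "beeks.extgroup": "unassigned",
    "beeks.switchport": "unassigned",
    "beeks.intentity": "unassigned",
}

# (ip.dst_host, ip.dst_port) -> assignments, as in the compositeMap action
COMPOSITE = {
    ("172.18.10.36", "5001"): {
        "beeks.extgroup": "NYSE_NY_Primary",
        "beeks.switchport": "Port5",
    },
}

# CIDR block -> assignments, as in the subnet action on ip.src_host
SUBNETS = [
    ("172.18.10.38/32", {"beeks.intentity": "Dedicated_Server_1"}),
]


def map_entities(df: dict) -> dict:
    """Apply defaults, then the composite-key map, then the subnet map."""
    out = dict(df)
    out.update(DEFAULTS)

    key = (str(df.get("ip.dst_host")), str(df.get("ip.dst_port")))
    out.update(COMPOSITE.get(key, {}))

    src = df.get("ip.src_host")
    if src is not None:
        addr = ipaddress.ip_address(src)
        for cidr, assignments in SUBNETS:
            if addr in ipaddress.ip_network(cidr):
                out.update(assignments)
                break  # assumption: first matching subnet wins
    return out


known = map_entities({
    "ip.src_host": "172.18.10.38",
    "ip.dst_host": "172.18.10.36",
    "ip.dst_port": "5001",
})
unknown = map_entities({
    "ip.src_host": "10.0.0.1",
    "ip.dst_host": "10.0.0.2",
    "ip.dst_port": "9999",
})
print(known["beeks.extgroup"], known["beeks.intentity"])
print(unknown["beeks.extgroup"], unknown["beeks.intentity"])
```

Assigning "unassigned" first means every event carries all three fields, so traffic that matches no mapping is still visible downstream rather than silently missing a label.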

protocolNormalizingMapper

The protocolNormalizingMapper for FIX has two sections:

A datafields section which defines the datafields and their type for the different fields that will be processed in the FIX messages.
An actions section which details the transformations that the message will go through. These can be conditional on other fields.

Here is an example that maps a Time in Force code in the message to a more human-readable string, which can be passed to VMX-Analysis as part of the Agent Event:

      "isSet": {
        "datafield": "fix.TimeInForce",
        "true": [
          {
            "map": {
              "comment": "Map Time in Force",
              "key": "fix.TimeInForce",
              "mapping": {
                "0": [
                  {
                    "assign": {
                      "beeks.tif": "Day"
                    }
                  }
                ],
                "1": [
                  {
                    "assign": {
                      "beeks.tif": "GTC"
                    }
                  }
                ]
                ...

Here is an example of where the mapping deals with a field that isn’t present in every message. So that these messages can be displayed consistently using the same data model, the field is populated with the placeholder value NULL if it doesn’t exist in the original FIX message:

      "isSet": {
        "datafield": "fix.Price",
        "true": [
          {
            "assignExpr": {
              "beeks.price": "df['fix.Price']"
            }
          }
        ],
        "false": [
          {
            "assign": {
              "beeks.price": "NULL"
            }
          }
        ]
      }
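Taken together, the Time in Force and Price fragments behave roughly like the Python function below. This is a minimal sketch under stated assumptions: only the two Time in Force codes shown above are included (the rest of the mapping is elided in the source), an unmapped code is assumed to leave beeks.tif unset, and the function name normalise is hypothetical.

```python
# Only the two codes visible in the fragment above; the full mapping is elided.
TIF_NAMES = {"0": "Day", "1": "GTC"}


def normalise(df: dict) -> dict:
    """Model the isSet/map and isSet/assignExpr fragments for one message."""
    out = dict(df)

    # isSet fix.TimeInForce -> map the code to a readable name
    if "fix.TimeInForce" in df:
        name = TIF_NAMES.get(df["fix.TimeInForce"])
        if name is not None:  # assumption: unmapped codes assign nothing
            out["beeks.tif"] = name

    # isSet fix.Price -> copy the value, otherwise use the NULL placeholder
    if "fix.Price" in df:
        out["beeks.price"] = df["fix.Price"]
    else:
        out["beeks.price"] = "NULL"
    return out


limit_order = normalise({"fix.TimeInForce": "1", "fix.Price": "101.25"})
no_price = normalise({"fix.TimeInForce": "0"})
print(limit_order["beeks.tif"], limit_order["beeks.price"])
print(no_price["beeks.tif"], no_price["beeks.price"])
```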

This is an example from the same file of where FIX messages of Message Type 5 (logout messages) are excluded from the App-to-wire wiretime latency calculation:

      "isSet": {
        "datafield": "beeks.payload_timestamp",
        "true": [
          {
            "map": {
              "comment": "Msg Type Wiretime Exclusion List",
              "key": "fix.MsgType",
              "mapping": {
                "5": [
                  {
                    "assignExpr": {
                      "beeks.session_id": "df['fix.TargetCompID'] + ':' + df['fix.SenderCompID']"
                    }
                  }
                ]
              },
              "default": [
                {
                  "assignExpr": {
                    "beeks.wiretime": "df['TIMESTAMP'] - df['beeks.payload_timestamp']",
                    "beeks.session_id": "df['fix.TargetCompID'] + ':' + df['fix.SenderCompID']"
                  }
                }
              ]
            }
          }
        ]
      }
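The intent of this fragment can be modelled in Python as shown below. It is a sketch, not the mapper itself: it assumes that the session ID is built from the two CompID fields for every message type, that TIMESTAMP and beeks.payload_timestamp are numbers in the same unit (integer nanoseconds in the sample values, which are invented), and the function name add_wiretime is hypothetical.

```python
# Message types excluded from the wiretime calculation (5 = Logout)
WIRETIME_EXCLUDED_MSG_TYPES = {"5"}


def add_wiretime(df: dict) -> dict:
    """Set beeks.session_id, and beeks.wiretime unless the type is excluded."""
    out = dict(df)
    if "beeks.payload_timestamp" not in df:
        return out  # isSet is false: the fragment above defines no actions

    out["beeks.session_id"] = df["fix.TargetCompID"] + ":" + df["fix.SenderCompID"]

    if df.get("fix.MsgType") not in WIRETIME_EXCLUDED_MSG_TYPES:
        # Capture time minus the time the application stamped on the message
        out["beeks.wiretime"] = df["TIMESTAMP"] - df["beeks.payload_timestamp"]
    return out


base = {
    "fix.TargetCompID": "NYSE",
    "fix.SenderCompID": "CLIENT1",
    "TIMESTAMP": 1_000_500,
    "beeks.payload_timestamp": 1_000_000,
}
new_order = add_wiretime({**base, "fix.MsgType": "D"})
logout = add_wiretime({**base, "fix.MsgType": "5"})
print(new_order["beeks.session_id"], new_order["beeks.wiretime"])
print(logout["beeks.session_id"], "beeks.wiretime" in logout)
```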

Stat Collector configuration (vmxgenericaggregatorinputeventconnector, aka GAIE)

The stat collectors specify which statistics to compute for the packets that match the BPF. There are five stat collectors defined for most BAM order processing:

            "stat_collector": [
                {
                    "type": "module",
                    "value": "vmxgenericaggregatorinputeventconnector",
                    "id": "tradingStatsExtgrpDS"
                },
                {
                    "type": "module",
                    "value": "vmxgenericaggregatorinputeventconnector",
                    "id": "tradingStatsDS"
                },
                {
                    "type": "module",
                    "value": "vmxgenericaggregatorinputeventconnector",
                    "id": "tradingStatsExtGrp"
                },
                {
                    "type": "module",
                    "value": "vmxgenericaggregatorinputeventconnector",
                    "id": "gatewayStatsExtgrpDS"
                },
                {
                    "type": "module",
                    "value": "vmxgenericaggregatorinputeventconnector",
                    "id": "gatewayStatsExtgrp"
                }
            ]

Unlike the stat collectors for market data, all of these stat collectors are vmxgenericaggregatorinputeventconnector modules. Separate collectors are defined only so that the statistics can be output to different aggregators.
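Conceptually, each processed message is handed to every stat collector, and each collector feeds its own aggregator. The Python sketch below illustrates that fan-out with simple message counts. The grouping keys per collector are purely hypothetical (the real keys live in the aggregator definitions described in the Beeks Analytics Data Guide), as are the function name fan_out and the sample events.

```python
from collections import defaultdict

# Hypothetical grouping keys per collector id; NOT taken from the BAM template.
GROUPINGS = {
    "tradingStatsExtgrpDS": ("beeks.extgroup", "beeks.intentity"),
    "tradingStatsExtGrp": ("beeks.extgroup",),
}


def fan_out(events):
    """Count every event once per collector, keyed by that collector's fields."""
    counts = {cid: defaultdict(int) for cid in GROUPINGS}
    for event in events:
        for cid, keys in GROUPINGS.items():
            counts[cid][tuple(event[k] for k in keys)] += 1
    return counts


# Invented sample events
events = [
    {"beeks.extgroup": "NYSE_NY_Primary", "beeks.intentity": "Dedicated_Server_1"},
    {"beeks.extgroup": "NYSE_NY_Primary", "beeks.intentity": "Dedicated_Server_2"},
]
counts = fan_out(events)
for cid, table in counts.items():
    print(cid, dict(table))
```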

For more information on the aggregator structure in Beeks Analytics for Markets templated deployments, please see the Beeks Analytics Data Guide.

Message Collector configuration (vmxconnector)

Market Data stack probes did not have any message collectors defined. For orders, we do need to define this collector, as it will have responsibility for sending the Agent Events to VMX-Analysis:

            "msg_collector": [
                {
                    "type": "module",
                    "value": "vmxconnector",
                    "id": "coll_vmxconnector"
                }
            ]

The coll_vmxconnector configuration is defined later in the configuration file:

    "coll_vmxconnector": {
        "parameters": {
            "blocking": false,
            "buffer_size": "1224",
            "timestamp": "TIMESTAMP",
            "datafield_list": [
                "ip.src_host",
                "ip.dst_host",
                "ip.src_port",
                "ip.dst_port",
                "beeks.correlationKey",
                "beeks.request_id",
                "beeks.request_index",
                "beeks.previous_request_id",
                "beeks.session_id",
                "beeks.request_type",
                "beeks.response_type",
                "beeks.exchange_order_id",
                "beeks.intentity",
                "beeks.extgroup",
                "beeks.wiretime",
                "beeks.gateway",
                "beeks.vp",
                "beeks.native_msg_type",
                "beeks.price",
                "beeks.side",
                "beeks.quantity",
                "beeks.tif",
                "beeks.symbol",
                "beeks.unsolicited_id",
                "beeks.transact_time"
            ],
            "filter_gen": {
                "type": "module",
                "value": "variable",
                "id": "uid_gen_agent_routing"
            }
        }
    },
    "uid_gen_agent_routing": {
        "parameters": {
            "uid": "{beeks.target_agent:=Request}Request|{beeks.target_agent:=Response}Response|{beeks.target_agent:=Ack_Response}Ack_Response|{beeks.target_agent:=Reject}Reject|{beeks.target_agent:=Unsolicited}Unsolicited",
            "name": "Request"
        }
    }

The datafield_list is a list of all of the fields which are made available to VMX-Analysis. You can reference these fields by name in your VEx.
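One way to picture the datafield_list is as a projection: only the listed fields travel with the Agent Event. The Python sketch below illustrates that idea with a shortened list. How the real connector treats a listed field that is absent from a message is not stated here, so the sketch simply omits it; the function name build_agent_event is hypothetical.

```python
# Shortened, for illustration; the full list is shown in the configuration above.
DATAFIELD_LIST = [
    "ip.src_host",
    "ip.dst_host",
    "beeks.session_id",
    "beeks.price",
]


def build_agent_event(df: dict, datafield_list=DATAFIELD_LIST) -> dict:
    """Keep only the listed datafields, in list order.

    Assumption: listed fields missing from the message are left out.
    """
    return {name: df[name] for name in datafield_list if name in df}


event = build_agent_event({
    "ip.src_host": "172.18.10.38",
    "ip.dst_host": "172.18.10.36",
    "beeks.price": "101.25",
    "fix.Price": "101.25",  # not in the list, so it is dropped
})
print(event)
```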

The uid_gen_agent_routing filter_gen is used to ensure that the message is sent to the correct Agent in the VMX-Analysis configuration.
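The uid template can be read as a list of alternatives separated by "|", each pairing a condition on beeks.target_agent with an Agent name. The Python sketch below parses the template under that reading and picks the matching Agent. The exact semantics of the template syntax are an assumption made for illustration, as are the function names parse_uid and route; what happens when no alternative matches is not modelled, so the sketch returns None.

```python
import re

UID_TEMPLATE = (
    "{beeks.target_agent:=Request}Request|"
    "{beeks.target_agent:=Response}Response|"
    "{beeks.target_agent:=Ack_Response}Ack_Response|"
    "{beeks.target_agent:=Reject}Reject|"
    "{beeks.target_agent:=Unsolicited}Unsolicited"
)

# One alternative: {<datafield>:=<expected value>}<agent name>
ALTERNATIVE = re.compile(r"\{([^:}]+):=([^}]*)\}(.*)")


def parse_uid(template: str) -> list:
    """Split the template into (datafield, expected value, agent) rules."""
    rules = []
    for alternative in template.split("|"):
        match = ALTERNATIVE.fullmatch(alternative)
        if match:
            rules.append(match.groups())
    return rules


def route(df: dict, rules: list):
    """Return the agent of the first rule whose datafield has the expected value."""
    for datafield, expected, agent in rules:
        if df.get(datafield) == expected:
            return agent
    return None  # behaviour when nothing matches is not modelled here


rules = parse_uid(UID_TEMPLATE)
print(route({"beeks.target_agent": "Reject"}, rules))
```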

You also have to ensure that aggregator definitions and agent definitions are all set to match the expected fields.

See the Core Data Feed Guide for examples of how to configure the kafka message collector module, which allows you to output via CDF-M. The rest of the configuration in this section can easily be adapted to output order details via the kafka stat and message collectors.