Stack probes provide information, either to VMX-Analysis or via the Core Data Feed, in two main ways:

  • As individual messages

  • As summarised statistics

Message publishing is covered in the following section on Message Collectors.

Summarised statistics are useful, for example, for network data where downstream systems are not sized to cope with the full message rate from the network. They are also useful as an overview of performance over longer time periods. Careful use of the stored percentiles makes these statistics even more valuable.

Both VMX-Capture and VMX-Analysis share the concept of aggregations as the data structure in which statistics are presented. See the Analytics Concepts Guide or Beeks Analytics Data Guide for a high-level introduction to aggregations.

VMX-Analysis computes stats based on the Agent events it receives, correlates, and associates.

VMX-Capture can generate stats about the quality of the Visibility Points within which the Agents reside, such as network, middleware, and market data statistics.

Aggregations of the results of calculations that are performed in the VMX-Capture layer are known as pre-aggregated statistics. This distinguishes them from the aggregators that are defined only in the VMX-Analysis layer.

Where statistics are pre-aggregated by VMX-Capture, this can be performed in two places:

  • The stat_collector within an individual probe can perform pre-aggregation of the traffic that it has visibility of, and can pass these statistics (including aggregations) to VMX-Analysis.

  • If multiple stack probes are processing messages at high volume, and the results need to be combined before they are passed to VMX-Analysis, then the P3 pre-aggregation function can be used for this.

This section describes the stat_collector within the stack probe, which is the standard way of performing pre-aggregation. See Advanced VMX-Capture Configuration for an overview of the P3 process, including P3 pre-aggregation.

Once the statistics are calculated, they can be output to another system using one of the following methods:

  • The statistics can be sent to VMX-Analysis as statistics events. The VMX-Analysis server does not need to calculate the statistics, but does take responsibility for persisting them and making them available to query. See the Market Data Worked Example for a full worked example of a configuration which sends statistics to VMX-Analysis.

  • The statistics can be sent to a customer application via the Core Data Feed. See the CDF-T section of the Core Data Feed Guide for more information. The kafka collector is the stat collector responsible for CDF-T output; a sketch of such a collector entry follows below.
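The following minimal sketch shows what a kafka collector entry in a stat_collector list might look like. The id collKafka matches the name used for the Kafka Stack Statistics Collector later in this guide, but the module value shown here is illustrative only; check your release documentation for the exact module name:

{
  "type": "module",
  "value": "vmxkafkastatisticscollector", // illustrative module name; not confirmed against any specific release
  "id": "collKafka" // as with other collectors, this id is defined in more detail later in the configuration
}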

Example Stat Collector Configuration

The stat collector defines which statistics to compute for the packets that match the BPF.
For example:

"stat_collector": [
{
"type": "module",
"value": "vmxgenericaggregatorinputeventconnector",
"id": "mdStats" // md stats is configured for gap detection, wire latency, micro bursts etc
},
{
"type": "module",
"value": "vmxanomalyconnector",
"id": "coll_vmxanomalyconnector" // anomaly connector creates stats on anomalies.
}
]

As with the decoder definitions, where an id is defined there will be a further definition for that particular stat collector later in the configuration file.

Statistics are provided to VMX-Analysis as aggregations. In the above example, a separate file containing an aggregator definition is referenced in the mdStats part of the configuration:

"mdStats": {
"parameters": {
"blocking": false,
"pool_size": 1000,
"buffer_size": 336,
"publish_interval_us": 10000000,
"timestamp": "TIMESTAMP",
"connector_id": "MD_statsAgent",
"node_path_stats_json_filename": "$VMX_HOME/../../server/config/agent/global/preagg/MD_stats.stack.agg.json"
}
},

Example Aggregator Configuration

Below is an example aggregation configuration file. This might be referenced by the vmxgenericaggregatorinputeventconnector configuration above, or by other stats collectors such as the Kafka Stack Statistics Collector (collKafka).

For more background on the Kafka Stack Statistics Collector, see the Core Data Feed Guide, which describes how the CDF-T is configured to output statistics.

{
  "fieldDefinitions": {
    "packet_bytes": {
      "statistic": {
        "name": "ip.packet_bytes"
      }
    },
    "packet_bytes_1ms": {
      "statistic": {
        "name": "ip.packet_bytes",
        "microburstNS": 1000000
      }
    },
    "tcp_conversations": {
      "activeStreams": {
        "type": "tcp"
      }
    }
  },
  "fieldSets": {
    "allFields": [
      "packet_bytes",
      "packet_bytes_1ms",
      "tcp_conversations"
    ]
  },
  "expressionDefinitions": {
    "srcDstTypeNameExpression": {
      "static": {
        "value": "srcDstType"
      }
    },
    "srcIpExpression": {
      "datafield": {
        "name": "ip.src_host"
      }
    },
    "dstIpExpression": {
      "datafield": {
        "name": "ip.dst_host"
      }
    }
  },
  "propertySets": {
    "srcDstTypeProperties": {
      "name": "srcDstTypeNameExpression"
    }
  },
  "aggregations": {
    "srcDst": {
      "keys": [
        {
          "srcIp": "srcIpExpression"
        },
        {
          "dstIp": "dstIpExpression"
        }
      ],
      "fieldSets": [
        "allFields"
      ],
      "propertySets": [
        "srcDstTypeProperties"
      ]
    }
  }
}

The above aggregation configuration file could produce a statistics record similar to the following:

{
  "srcIp": "10.1.1.1",
  "dstIp": "10.1.1.2",
  "name": "srcDstType",
  "packet_bytes": 700.0,
  "packet_bytes_1ms": 420000.0,
  "tcp_conversations": 1.0
}
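In this configuration, one such record would be produced for each distinct combination of key values, in this case each (srcIp, dstIp) pair observed in the traffic, at each publish interval (publish_interval_us in the stat collector parameters).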

Field Definitions (Required)

Each entry in the mandatory fieldDefinitions object defines a data source, which generates a statistic.

This is a list of the possible objects that can form each statistic that you define.

List of Field Definition Data Sources

It is also possible to calculate distributions in an aggregation - speak to your Beeks contact about how to enable this in the configuration.

A commonly used parameter is microburstNS, which takes an integer defining the microburst interval, in nanoseconds, over which to calculate the measurement. The following aggregator configuration extract shows how to calculate microburst measurements at different granularities:

"fieldDefinitions": {
"packets": {
"statistic": {
"name": "ip.packets"
}
},
"packets_1ms": {
"statistic": {
"name": "ip.packets",
"microburstNS": 1000000
}
},
"packets_10ms": {
"statistic": {
"name": "ip.packets",
"microburstNS": 10000000
}
},
"packets_100ms": {
"statistic": {
"name": "ip.packets",
"microburstNS": 100000000
}
},
"packets_1s": {
"statistic": {
"name": "ip.packets",
"microburstNS": 1000000000
}
}
}

Field Sets (Required)

The mandatory fieldSets object is a map of field set name to a list of fields which have been defined in the field definitions. Each field set must be an array of definition names. The example above creates a field set called allFields which contains all the fields defined in the earlier Field Definitions section; a sketch of splitting fields across multiple field sets follows below.
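Fields can also be split across several smaller field sets, so that different aggregations can calculate different subsets of fields. The following sketch assumes the field definitions from the example above; the field set names byteFields and conversationFields are illustrative only:

"fieldSets": {
  "byteFields": [
    "packet_bytes",
    "packet_bytes_1ms"
  ],
  "conversationFields": [
    "tcp_conversations"
  ]
}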

Expression Definitions (Optional)

The optional expressionDefinitions object can contain one or more expression generators, each of which results in a string.

The most common types of expression are datafield expressions, which return the value of a datafield, and static expressions, which always return the same value.
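For example, in the aggregator configuration above, srcIpExpression is a datafield expression and srcDstTypeNameExpression is a static expression:

"srcIpExpression": {
  "datafield": {
    "name": "ip.src_host" // evaluates to the source IP of each packet
  }
},
"srcDstTypeNameExpression": {
  "static": {
    "value": "srcDstType" // always evaluates to this literal string
  }
}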

Property Sets (Optional)

The optional propertySets object provides a map of property set name to expression name. This mapping is used in the later aggregations section of the configuration.
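In the example above, the srcDstTypeProperties property set defines a property called name whose value is produced by srcDstTypeNameExpression. Because that expression is static, every record produced by the srcDst aggregation carries "name": "srcDstType", as shown in the example output.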

Aggregations (Required)

The mandatory aggregations object defines the specific aggregations.

Each additional property must conform to the following schema:

Parameter      Type               Required   Description
keys           array of objects   N          Array of single-entry objects, each mapping a key name to an expression name.
fieldSets      array of strings   Y          Array of field sets to calculate for the aggregation.
propertySets   array of strings   N          Array of property sets which are static for each instance of the aggregation.
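Since only fieldSets is mandatory, an aggregation can omit keys and propertySets entirely. Assuming behaviour consistent with the schema above, such an aggregation would calculate a single set of statistics across all matching traffic rather than one record per key combination. The aggregation name globalStats below is illustrative:

"aggregations": {
  "globalStats": {
    "fieldSets": [
      "allFields"
    ]
  }
}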