Stack probes provide information, either to VMX-Analysis or via the Core Data Feed, in two main ways:
As individual messages
As summarised statistics.
Message publishing is covered in the following section on Message Collectors.
Summarised statistics are useful, for example, for network data where downstream systems are not sized to cope with the full message rate from the network. They are also useful as an overview of performance over longer time periods, and careful use of the stored percentiles makes these statistics even more valuable.
Both VMX-Capture and VMX-Analysis share the concept of aggregations as the data structure in which statistics are presented. See the Analytics Concepts Guide or Beeks Analytics Data Guide for a high-level introduction to aggregations.
VMX-Analysis computes stats based on the Agent events it receives, correlates, and associates.
VMX-Capture can generate stats about the quality of the Visibility Points that the Agents reside within - such as network, middleware, and market data stats.
Aggregations of the results of calculations that are performed in the VMX-Capture layer are known as pre-aggregated statistics. This distinguishes them from the aggregators that are defined only in the VMX-Analysis layer.
Where statistics are pre-aggregated by VMX-Capture, this can be performed in two places:
The stat_collector within an individual probe can perform pre-aggregation of the traffic that it has visibility of, and can pass these statistics to VMX-Analysis (including aggregations).
If multiple stack probes are processing messages at high volume, and the results need to be combined before they are passed to VMX-Analysis, then the P3 pre-aggregation function can be used for this.
This section describes the stat_collector within the stack probe, which is the standard way of performing pre-aggregation. See Advanced VMX-Capture Configuration for an overview of the P3 process, including P3 pre-aggregation.
Once the statistics are calculated, they can be output to another system using one of the following methods:
The statistics can be sent to VMX-Analysis as statistics events. The VMX-Analysis server does not need to calculate the statistics, but it does take responsibility for persisting them and making them available to query. See the Market Data Worked Example for a complete worked example of a configuration which sends statistics to VMX-Analysis.
The statistics can be sent to a customer application via the Core Data Feed. See the CDF-T section of the Core Data Feed Guide for more information. The kafka collector is the stat collector responsible for CDF-T output.
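As an illustration, a stat_collector entry routing statistics to the kafka collector might look like the following sketch. The module value shown is hypothetical (check your release for the exact name); the id collKafka matches the collector referenced later in this section:

```json
"stat_collector": [
    {
        "type": "module",
        "value": "vmxkafkastatscollector",  // hypothetical module name - check your release for the exact value
        "id": "collKafka"                   // referenced later in the configuration with its own definition block
    }
]
```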
Example Stat Collector Configuration
The stat collector defines which statistics to compute for the packets that match the BPF.
For example:
"stat_collector"
: [
{
"type"
:
"module"
,
"value"
:
"vmxgenericaggregatorinputeventconnector"
,
"id"
:
"mdStats"
// md stats is configured for gap detection, wire latency, micro bursts etc
},
{
"type"
:
"module"
,
"value"
:
"vmxanomalyconnector"
,
"id"
:
"coll_vmxanomalyconnector"
// anomaly connector creates stats on anomalies.
}
]
As with the decoder definitions, where an id is defined there will be a further definition for that particular statistic collector later in the configuration file.
Statistics are provided to VMX-Analysis as aggregations. In the above example, the mdStats part of the configuration references a separate file containing an aggregator definition:
"mdStats"
: {
"parameters"
: {
"blocking"
:
false
,
"pool_size"
:
1000
,
"buffer_size"
:
336
,
"publish_interval_us"
:
10000000
,
"timestamp"
:
"TIMESTAMP"
,
"connector_id"
:
"MD_statsAgent"
,
"node_path_stats_json_filename"
:
"$VMX_HOME/../../server/config/agent/global/preagg/MD_stats.stack.agg.json"
}
},
Example Aggregator Configuration
Below is an example aggregation configuration file. This might be referenced by the vmxgenericaggregatorinputeventconnector configuration above, or by other stats collectors such as the Kafka Stack Statistics Collector (collKafka).
For more background on the Kafka Stack Statistics Collector, including how the CDF-T is configured to output statistics, see the Core Data Feed Guide.
```json
{
    "fieldDefinitions": {
        "packet_bytes": {
            "statistic": { "name": "ip.packet_bytes" }
        },
        "packet_bytes_1ms": {
            "statistic": { "name": "ip.packet_bytes", "microburstNS": 1000000 }
        },
        "tcp_conversations": {
            "activeStreams": { "type": "tcp" }
        }
    },
    "fieldSets": {
        "allFields": [ "packet_bytes", "packet_bytes_1ms", "tcp_conversations" ]
    },
    "expressionDefinitions": {
        "srcDstTypeNameExpression": {
            "static": { "value": "srcDstType" }
        },
        "srcIpExpression": {
            "datafield": { "name": "ip.src_host" }
        },
        "dstIpExpression": {
            "datafield": { "name": "ip.dst_host" }
        }
    },
    "propertySets": {
        "srcDstTypeProperties": {
            "name": "srcDstTypeNameExpression"
        }
    },
    "aggregations": {
        "srcDst": {
            "keys": [
                { "srcIp": "srcIpExpression" },
                { "dstIp": "dstIpExpression" }
            ],
            "fieldSets": [ "allFields" ],
            "propertySets": [ "srcDstTypeProperties" ]
        }
    }
}
```
The above aggregation configuration file could produce statistics output similar to the following:
```json
{
    "srcIp": "10.1.1.1",
    "dstIp": "10.1.1.2",
    "name": "srcDstType",
    "packet_bytes": 700.0,
    "packet_bytes_1ms": 420000.0,
    "tcp_conversations": 1.0
}
```
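In this output, each key defined in the aggregation (srcIp and dstIp) appears as a field, the name property is the string produced by the static expression in the property set, and each field in the allFields field set contributes one value.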
Field Definitions (Required)
The mandatory fieldDefinitions object defines the data sources that generate statistics. The table below lists the possible objects that can form each statistic that you define.
Data Source | Type | Additional Properties | Description |
---|---|---|---|
statistic | object | name (Required, string) microburstNS (optional, integer) | The name of a particular stack statistic. See below for a description of microburstNS. |
activeStreams | object | type (enum of string) | Predefined configurations to count active streams of different types. The different types are: ip, tcp, or udp. It is also possible to define your own custom count of active streams - speak to your Beeks contact about how to enable this in the configuration. |
count | object | datafield (string) | An accumulator count of a particular stack datafield. |
countPerSecond | object | datafield (string) | An accumulator count of a particular stack datafield, as a rate. |
minimum | object | datafield (string) | An accumulator minimum of a particular stack datafield. |
maximum | object | datafield (string) | An accumulator maximum of a particular stack datafield. |
mean | object | datafield (string) | An accumulator mean of a particular stack datafield. |
last | object | datafield (string) | The last seen value of a particular stack datafield. |
standardDeviation | object | datafield (string) | An accumulator standard deviation of a particular stack datafield. |
skewness | object | datafield (string) | An accumulator skewness of a particular stack datafield. |
kurtosis | object | datafield (string) | An accumulator kurtosis of a particular stack datafield. |
percentile | object | datafield (string, Required) value (number, Required, greater than or equal to 0 and strictly less than 100) | An accumulator percentile of a particular stack datafield. The percentile that is calculated is defined by the value property. |
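As a sketch of the accumulator-style data sources, the following fieldDefinitions extract combines a mean, a maximum, and a 99th percentile. The datafield name ip.packet_size is hypothetical, and the exact shape of the percentile object is inferred from the table above:

```json
"fieldDefinitions": {
    "mean_packet_size": {
        // "ip.packet_size" is a hypothetical datafield name - substitute one available in your deployment
        "mean": { "datafield": "ip.packet_size" }
    },
    "max_packet_size": {
        "maximum": { "datafield": "ip.packet_size" }
    },
    "p99_packet_size": {
        // "value" selects the percentile: >= 0 and strictly less than 100
        "percentile": { "datafield": "ip.packet_size", "value": 99 }
    }
}
```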
It is also possible to calculate distributions in an aggregation - speak to your Beeks contact about how to enable this in the configuration.
A commonly used option is microburstNS, which takes an integer defining, in nanoseconds, the microburst interval that you want to calculate. The following aggregator configuration extract shows how to calculate microburst measurements at different granularities using the stack microburstNS function:
"fieldDefinitions"
: {
"packets"
: {
"statistic"
: {
"name"
:
"ip.packets"
}
},
"packets_1ms"
: {
"statistic"
: {
"name"
:
"ip.packets"
,
"microburstNS"
:
1000000
}
},
"packets_10ms"
: {
"statistic"
: {
"name"
:
"ip.packets"
,
"microburstNS"
:
10000000
}
},
"packets_100ms"
: {
"statistic"
: {
"name"
:
"ip.packets"
,
"microburstNS"
:
100000000
}
},
"packets_1s"
: {
"statistic"
: {
"name"
:
"ip.packets"
,
"microburstNS"
:
1000000000
}
}
}
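Each granularity above is a separate field definition over the same underlying ip.packets statistic, so a single aggregation can report all four microburst intervals side by side by listing them in one field set.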
Field Sets (Required)
The mandatory fieldSets object is a map of field set name to a list of fields which have been defined in the field definitions. Each field set must be an array of definition names. The example above creates a field set called allFields, which contains all the fields defined in the earlier Field Definitions section.
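For example, assuming the microburst field definitions above, one field set could be defined for coarse reporting and another for full detail (the field set names here are illustrative):

```json
"fieldSets": {
    "coarseFields": [ "packets", "packets_1s" ],
    "allMicroburstFields": [ "packets", "packets_1ms", "packets_10ms", "packets_100ms", "packets_1s" ]
}
```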
Expression Definitions (Optional)
The optional expressionDefinitions object defines one or more expression generators, each of which produces a string.

The most common types of expression are datafield expressions, which return the value of a datafield, and static expressions, which always return the same value.
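As a minimal sketch of the two expression types, reusing the ip.src_host datafield from the earlier example (the expression names and the static value are illustrative):

```json
"expressionDefinitions": {
    "feedNameExpression": {
        // always produces the literal string "feedA"
        "static": { "value": "feedA" }
    },
    "sourceHostExpression": {
        // produces the value of the ip.src_host datafield for each instance
        "datafield": { "name": "ip.src_host" }
    }
}
```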
Property Sets (Optional)
The optional propertySets object provides a map of property set name to expression name. This mapping is used in the later aggregations section of the configuration.
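For instance, a property set referencing the illustrative feedNameExpression above would attach a constant feed field to every instance of any aggregation that uses it. As the worked example output shows, the property set key becomes the field name in the output and the expression result becomes its value:

```json
"propertySets": {
    "feedProperties": {
        "feed": "feedNameExpression"
    }
}
```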
Aggregations (Required)
The mandatory aggregations object defines the specific aggregations. Each aggregation must conform to the following schema:
Parameter | Type | Required | Description |
---|---|---|---|
keys | array of objects | N | Array of single-entry objects, each mapping a key name to an expression name (see the srcDst example above). |
fieldSets | array of strings | Y | Array of field sets to calculate for the aggregation. |
propertySets | array of strings | N | Array of property sets which are static for each instance of the aggregation. |
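To illustrate the schema, a second aggregation keyed on destination only could be added alongside srcDst, reusing the definitions from the example file above:

```json
"aggregations": {
    "byDestination": {
        // keys and propertySets are optional; fieldSets is required
        "keys": [
            { "dstIp": "dstIpExpression" }
        ],
        "fieldSets": [ "allFields" ]
    }
}
```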