VMX-Analysis uses Aggregators to transform data into useful insights that are visualised in VMX-Explorer. An Aggregator analyses Agent Events in real-time, allowing you to perform calculations on specific Attributes and roll-up (or slice and dice) the result using other business properties of those Agent Events.

Aggregator structure

Each Aggregator can be thought of as a spreadsheet representing a flattened tree structure, where each cell tracks a different statistic. Each row is typically ‘something that is being measured’ and each column is typically a metric.

In the example below, the nodepath column on the far-left shows the level of the breakdown of the recorded data. There is generally a top level summary row followed by multiple levels of breakdown by technical or business attributes.

Example aggregator structure shown in VMX-Explorer
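The flattened-tree layout described above might be sketched as follows. This is purely illustrative: the node paths, attribute names, and column names are assumptions for the example, not the product's actual schema.

```python
# Illustrative sketch of an Aggregator's flattened tree: each row is keyed
# by a node path, with a top-level summary row followed by breakdowns by
# technical or business attributes (here: location, then host).
rows = {
    "/ALL":                       {"count": 1200, "avg_latency_us": 42.0},
    "/ALL/location=LD4":          {"count": 700,  "avg_latency_us": 35.0},
    "/ALL/location=LD4/host=gw1": {"count": 300,  "avg_latency_us": 30.0},
    "/ALL/location=NY4":          {"count": 500,  "avg_latency_us": 52.0},
}

def children(rows, node_path):
    # Drill down exactly one level below the given node path.
    prefix = node_path + "/"
    return [p for p in rows
            if p.startswith(prefix) and "/" not in p[len(prefix):]]

print(children(rows, "/ALL"))  # the location-level breakdown rows
```

Keying rows by node path is what makes the drill-down navigation cheap: finding the next level of breakdown is a simple prefix match.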

A key benefit of Beeks Analytics is the ability to be flexible with defining the particular breakdown that is useful for your own individual business purposes, without needing to do a lot of configuration. It is enough to define the technical or business attributes that you want to aggregate by, and then Beeks Analytics will build that structure for you.

Another advantage of the aggregator structure over fixed session-based configuration is that it supports, without any further customisation, a 'drill-down' experience that maps well to the kind of problem solving clients need when analysing network data in capital markets environments.

This structure provides fast answers to questions like 'how is my latency budget being used between different components of the system?', 'which market data feeds are causing that spike in volume?' and 'some clients are reporting delays to order entry messages: is this a common problem or specific to one client?'.

Example business use for an aggregator

As a simple example, consider VMX-Analysis receiving a series of interval events for an Item. Each interval event carries data about the measurement between two Agents, including the time interval, the start host and location, and the end host and location. Imagine you want to view the average latency of the interval events. To do this, you add a moving average calculator to the Aggregator. This calculator takes the time interval value from each interval event and computes the average value. Beeks Analytics provides a number of pre-defined calculators for use in Aggregators.
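A moving average calculator of this kind can be sketched as below. The class and field names (`MovingAverageCalculator`, `time_interval`) are hypothetical, chosen for the example rather than taken from the Beeks Analytics API.

```python
from collections import deque

class MovingAverageCalculator:
    """Maintains a moving average over the last `window` interval events."""

    def __init__(self, window=100):
        self.values = deque(maxlen=window)  # old values drop off automatically

    def on_interval_event(self, event):
        # 'time_interval' is the measured latency between the two Agents
        self.values.append(event["time_interval"])

    @property
    def value(self):
        return sum(self.values) / len(self.values) if self.values else None

calc = MovingAverageCalculator(window=3)
for latency in (10.0, 20.0, 30.0, 40.0):
    calc.on_interval_event({"time_interval": latency})
print(calc.value)  # average of the last 3 events: 30.0
```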

Aggregator input filters

An input filter lets you control which interval events are presented to a given Aggregator. For example, you can select specific intervals or intervals from one or more Flows within the monitored environment.
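Conceptually, an input filter is a predicate applied to each interval event before it reaches the Aggregator. The sketch below is illustrative only; the `flow` field and factory function are assumptions, not the product's configuration syntax.

```python
def make_flow_filter(allowed_flows):
    """Build a predicate that accepts only events from the given Flows."""
    allowed = set(allowed_flows)

    def accepts(event):
        return event.get("flow") in allowed

    return accepts

events = [
    {"flow": "FIX-ORDERS", "time_interval": 12.5},
    {"flow": "MARKET-DATA", "time_interval": 0.8},
]
order_filter = make_flow_filter(["FIX-ORDERS"])
filtered = [e for e in events if order_filter(e)]
print([e["flow"] for e in filtered])  # ['FIX-ORDERS']
```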

Aggregator columns and calculators

Each column in an Aggregator represents a calculation that is performed based on the incoming interval data. This data can be timing measurements or any captured business data. For example, you can compute average latency by averaging the interval time property or the total quantity of orders for a given stock. As well as the pre-defined calculators provided out-of-the-box, advanced users can also define their own calculators.

Calculators can also be linked together. For example, you can use the output from a moving average latency calculation as the input to a best or worst calculation before recording the highest or lowest average latency seen over time.
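A sketch of that chaining, under the same caveat that these names are illustrative: the output of a moving average feeds a 'worst' calculator, which remembers the highest average latency seen so far.

```python
from collections import deque

def moving_average(values, window):
    """Yield the moving average after each new value arrives."""
    buf = deque(maxlen=window)
    for v in values:
        buf.append(v)
        yield sum(buf) / len(buf)

class WorstCalculator:
    """Tracks the highest value ever presented to it."""

    def __init__(self):
        self.worst = None

    def update(self, value):
        if self.worst is None or value > self.worst:
            self.worst = value
        return self.worst

worst = WorstCalculator()
latencies = [10.0, 50.0, 20.0, 15.0]
for avg in moving_average(latencies, window=2):  # chain: average -> worst
    worst.update(avg)
print(worst.worst)  # worst 2-event moving average: 35.0
```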

Pre-Aggregated statistics within VMX-Analysis

As well as Agent Event data, Aggregators are also used to display statistics that have been pre-aggregated by VMX-Capture. This is useful for displaying statistics based around a high volume of messages, e.g. market data.

Historic tick data capture

Any ticking Aggregator cell value can be persisted to a database to provide a historical record of how the value changed over time. For example, you can record overall moving average latency and flow, or some sub-aggregation such as location, subsystem, and so on.

Historic data for a given cell is known as a time series. The ticking cell value is monitored over a short configurable interval, for example 10 seconds, and the open, close, high, low, mean, and time-weighted mean are recorded in the database along with an identifier to indicate the time series and a time-stamp.
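The per-interval summary can be sketched as follows, assuming the ticking cell produces timestamped values within the recording interval. The function name and tick representation are assumptions for illustration; in particular, the time-weighted mean here assumes each value holds until the next tick.

```python
def summarise_interval(ticks, interval_end):
    """Summarise (timestamp, value) ticks over one recording interval."""
    values = [v for _, v in ticks]
    open_, close = values[0], values[-1]
    high, low = max(values), min(values)
    mean = sum(values) / len(values)
    # Time-weighted mean: each value is weighted by how long it was current,
    # i.e. until the next tick (or the end of the interval for the last one).
    next_times = [t for t, _ in ticks[1:]] + [interval_end]
    weighted = sum(v * (nxt - t) for (t, v), nxt in zip(ticks, next_times))
    twm = weighted / (interval_end - ticks[0][0])
    return {"open": open_, "close": close, "high": high, "low": low,
            "mean": mean, "time_weighted_mean": twm}

# Three ticks within a 10-second interval
summary = summarise_interval([(0, 10.0), (2, 20.0), (8, 30.0)], interval_end=10)
print(summary["time_weighted_mean"])  # (10*2 + 20*6 + 30*2) / 10 = 20.0
```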

By selecting a given time series, a chart can be generated from a set of value points in the database. To reduce the amount of data stored in the database, a compression scheme may be applied to store older data at a lower resolution.

Historic data charts are therefore available for each day of operation. They can be used in more advanced statistical operations, such as computing a 30-day moving average. This data can also be plotted on charts and used as a benchmark for relative alerts.

Although we refer to this historic data as ‘charts’ for ease of reference, it is more accurately a set of timeseries data covering a particular period of time for a particular Aggregator cell at a particular interval. Timeseries data that is further in the past may be compressed to a lower-resolution interval (e.g. one data point every minute instead of one data point every 10 seconds).
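One simple form such compression could take is downsampling: replacing each run of consecutive base-resolution points with their mean. This is a sketch of the idea, not the product's actual compression scheme.

```python
def downsample(points, factor):
    """Reduce (timestamp, value) points by keeping one mean per `factor` points."""
    out = []
    for i in range(0, len(points), factor):
        chunk = points[i:i + factor]
        ts = chunk[0][0]  # stamp the compressed point with the chunk's start
        out.append((ts, sum(v for _, v in chunk) / len(chunk)))
    return out

# Six 10-second points compressed to one point per minute
points = [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0), (40, 5.0), (50, 6.0)]
print(downsample(points, 6))  # [(0, 3.5)]
```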

Time series sets

You can record time series for a set of related cells. This is called a Time Series Set (TSS). Managing time series as a set simplifies configuration and offers new possibilities when charting and alerting on new time series. The benefits of using a Time Series Set rather than a single time series include:

  • A Time Series Set can expand automatically to include new time series.
    For example, if a TSS includes all discovered clients, then as new clients are added to the monitored system, they are discovered and automatically added to the TSS.

  • A large number of time series can be configured in a single action, regardless of whether they are currently required.

  • A Time Series Set is efficient at bulk recording.

  • Because Time Series Sets have a hierarchical structure, you can drill down to explore details when viewing a TSS on a chart.

  • Because a Time Series Set represents a group of related time series, you can rank them for display purposes; for example, you can show the top 5 customers by order volume this month.
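The ranking idea in the last bullet can be sketched as a simple top-N selection over the latest value of each series in the set. The series names and values below are invented for illustration.

```python
def top_n(series_set, n):
    """Rank a set of related series by value and keep the top n."""
    return sorted(series_set.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical TSS: latest order volume per discovered client
orders = {"clientA": 120, "clientB": 340, "clientC": 90,
          "clientD": 510, "clientE": 75, "clientF": 260}
print(top_n(orders, 5))  # clientD, clientB, clientF, clientA, clientC
```

Because the set expands automatically as new clients are discovered, a ranking like this stays correct without reconfiguration.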

Types of Time Series Set

There are two kinds of Time Series Set:

  • Aggregator region Time Series Set
    An Aggregator region is a subset of all the cells in an Aggregator. All cells in a defined region are recorded in bulk. You can define multiple overlapping Aggregator regions to work with charts.

  • Externally recorded Time Series Set
An externally recorded Time Series Set provides access to time series data recorded outside the Beeks Analytics server, most typically by an external capture device. By defining an external TSS you can use the externally captured data in VMX-Explorer dashboards, potentially combining external series with locally recorded series on a single chart.

Using Time Series Sets on Charts

Charts can help you to group and explore the related items in the set. You can use a region to refer to a group of related items rather than defining individual trend lines. For example, by defining a region from the children of a customer node, you can plot trends for a specific per-client measure and rank and display the top five customers by that measure. As new children are discovered, they are automatically added to the chart. This makes it possible to build dynamic charts without prior knowledge of future data content.

Charts also allow drill-down into a time series so you can navigate through the tree structure of the data to discover more detail. For example, if your Aggregator breaks down customer data further by connection, you could drill down through a given customer's trend line to show the individual connections that customer uses.

Distributions

Appropriate Aggregator cell values can also be set up to store distributions.

Beeks Analytics uses High Dynamic Range (HDR) distributions. This is distinct from distributions which use so-called linear buckets. In a linear distribution, each linear bucket is the same “width”. This is useful where the shape of the input data is well understood and the shape of the bucket can be tuned appropriately. By contrast, HDR distributions record the data in buckets of varying “width”. This makes the distribution easy to tune and able to cope well with varied input data.
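The contrast between the two bucketing schemes can be illustrated with a toy bucket-index function. This is a deliberately simplified sketch of the HDR idea (bucket width doubling with magnitude, giving roughly constant relative precision), not the actual HDR Histogram algorithm, which subdivides each power-of-two range further.

```python
import math

def linear_bucket(value, width=100):
    # Linear buckets: every bucket spans the same fixed width,
    # e.g. 0-99, 100-199, 200-299, ...
    return int(value // width)

def hdr_bucket(value):
    # Simplified HDR-style bucketing: bucket width doubles with magnitude,
    # so values from microseconds to seconds fit in a handful of buckets.
    return 0 if value < 1 else int(math.log2(value)) + 1

# A latency range spanning six orders of magnitude:
print(linear_bucket(1_000_000))  # 10000 linear buckets needed to reach here
print(hdr_bucket(1_000_000))     # but only ~20 HDR-style buckets
```

This is why HDR distributions cope well with input data whose shape is not known in advance: no per-metric tuning of bucket widths is required.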