The previous validation example demonstrates how to filter packet data at the application level. For UDP data, which normally carries each message whole in a single packet, this is straightforward. TCP protocols, however, can fragment messages across multiple packets, so the data must be treated as a stream.

Understanding validation is a prerequisite for understanding streams. The example in Validation: Packet Filtering has the “data” key set to “packet”. The majority of market data protocols define each message entirely within a single packet's data, so messages rarely cross packet boundaries. TCP protocols and some complex UDP protocols can fragment messages across multiple packets: TCP because it delivers a byte stream that does not preserve message boundaries, and certain UDP protocols because their reliability mechanisms split messages across packets.

To decode these types of protocols, we have to define a “stream”. The packet data is combined into a continuous stream, from which complete (unfragmented) individual messages can be decoded.

The diagram below shows the relationship between packet data and messages. The Start of Message (SOM) is always at offset 0 for UDP messages. For TCP, the SOM is a set of bytes that identifies the beginning of a message and can appear at any offset within the packet data.

For “data”: “packet”, “minimum_size” tells the decoder that the packet data must be at least this many bytes long before any validation is attempted. Packet data below this size is discarded.

For “data”: “stream”, “minimum_size” tells the decoder how many bytes to buffer before attempting to validate them.
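The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not ACD code; the function name and the returned strings are hypothetical.

```python
def minimum_size_action(data_mode: str, available: int, minimum_size: int) -> str:
    """Decide what happens to `available` bytes of data (hypothetical helper)."""
    if data_mode not in ("packet", "stream"):
        raise ValueError(f"unknown data mode: {data_mode!r}")
    if available >= minimum_size:
        return "validate"   # enough bytes: apply the validation rules
    # Too few bytes: a packet is dropped, a stream waits for more data.
    return "discard" if data_mode == "packet" else "buffer"
```

With “minimum_size” set to 7, five bytes of packet data would be discarded, while five bytes of stream data would be held until at least two more arrive.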

For stream protocols, we need to define the validation rules that identify the SOM. The protocol we’re attempting to decode must contain a message size field, which is flagged “msg_size” in the validation fields. Protocols give this field many names, but the underlying meaning is the same: it is the byte size of the message that follows in the stream. Typically, these protocols have a few fixed identification bytes at the start and a message size immediately after.

The validation rules are applied at the current offset (e.g. 0). If the data there is invalid, the decoder skips one byte and applies the rules again at offset 1, and so on, until it finds an offset at which the validation rules succeed. That offset is the SOM.
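The byte-by-byte scan can be sketched as follows. This is a minimal illustration rather than ACD's implementation: it hard-codes the rules from the Simple Streams example in this section (a UINT8 equal to 255 at offset 2 and the string “v2.0” at offset 3), and the function names are hypothetical.

```python
from typing import Optional

MINIMUM_SIZE = 7  # bytes that must be available before the rules are applied


def header_is_valid(buf: bytes, pos: int) -> bool:
    """Apply the example rules relative to `pos` (assumed rule set)."""
    return buf[pos + 2] == 255 and buf[pos + 3:pos + 7] == b"v2.0"


def find_som(buf: bytes, start: int = 0) -> Optional[int]:
    """Return the first offset at which the rules succeed, or None."""
    pos = start
    while pos + MINIMUM_SIZE <= len(buf):
        if header_is_valid(buf, pos):
            return pos
        pos += 1  # invalid here: skip one byte and try again
    return None  # no SOM in the bytes buffered so far
```

For example, if two stray bytes precede a valid header, the scan fails at offsets 0 and 1 and succeeds at offset 2.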

Once the SOM is identified and the message size decoded, ACD buffers the required number of bytes into a complete message. Once buffered, it attempts to decode the data according to its message definitions. In the simple case, only one set of messages is defined and used to decode the protocol. In a complex scenario, multiple streams can be defined (see next section).
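The sketch below shows how fragmented packet data might be reassembled into complete messages. It is an illustration under stated assumptions, not ACD's implementation: the class name is hypothetical, the message size is assumed to be a little-endian UINT16 at offset 0, and it is assumed to count the bytes that follow the 7-byte header. The header rules are those of the Simple Streams example in this section.

```python
import struct
from typing import List

HEADER_SIZE = 7  # matches "minimum_size" in the Simple Streams example


class StreamReassembler:
    """Hypothetical helper: turns packet fragments into whole messages."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, packet_data: bytes) -> List[bytes]:
        """Append one packet's data and return every message now complete."""
        self._buf += packet_data
        messages = []
        while len(self._buf) >= HEADER_SIZE:
            # Validation rules: UINT8 255 at offset 2, "v2.0" at offset 3.
            if not (self._buf[2] == 255 and self._buf[3:7] == b"v2.0"):
                del self._buf[0]  # not a SOM: skip one byte and retry
                continue
            # ASSUMPTION: little-endian size that excludes the header.
            (msg_size,) = struct.unpack_from("<H", self._buf, 0)
            total = HEADER_SIZE + msg_size
            if len(self._buf) < total:
                break  # message incomplete: wait for more packet data
            messages.append(bytes(self._buf[:total]))
            del self._buf[:total]
        return messages
```

Each call to feed() corresponds to one received packet. A message split across two packets is returned only by the second call, and any bytes left over stay buffered as the start of the next message.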

Streams: UDP vs TCP

Simple Streams example

"Validation": {
    "data": "stream",
    "minimum_size": 7,
    "fields": [
        {
            "name": "MessageSize",
            "type": "UINT16",
            "flags": ["msg_size"]
        },
        {
            "name": "string_id", "type": "STRING4", "offset": 3, "validate": ["v2.0"]
        },
        {
            "name": "numeric_id2", "type": "UINT8", "offset": 2, "validate": [255]
        }
    ]
},
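To make the byte layout implied by this example concrete, the following Python sketch decodes the three fields from a 7-byte header. Several points are assumptions rather than documented behaviour: that “MessageSize”, which has no “offset” key, sits at offset 0; that integers are little-endian; and that STRING4 is four ASCII bytes. The function name is hypothetical.

```python
import struct

# Offsets 0-1: MessageSize (UINT16), offset 2: numeric_id2 (UINT8),
# offsets 3-6: string_id (STRING4). "<" = assumed little-endian, no padding.
HEADER = struct.Struct("<HB4s")
assert HEADER.size == 7  # equals "minimum_size" in the example


def decode_header(data: bytes) -> dict:
    """Decode the example header and report whether validation passes."""
    msg_size, numeric_id2, string_id = HEADER.unpack_from(data, 0)
    text = string_id.decode("ascii", errors="replace")
    return {
        "MessageSize": msg_size,
        "numeric_id2": numeric_id2,
        "string_id": text,
        # Both "validate" rules must hold for the header to be accepted.
        "valid": numeric_id2 == 255 and text == "v2.0",
    }
```

The three fields occupy 2 + 1 + 4 = 7 bytes, which is why a “minimum_size” of 7 is enough to apply every rule in the example.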

Validation object:

JSON key: "data" - required
JSON value: A string containing “packet” or “stream”.

JSON key: "minimum_size" - required
JSON value: The minimum number of bytes required before the validation rules are applied.

JSON key: "fields" and "validate" - optional
The set of fields to validate. For each field decoded, the JSON key “validate” is optional, but it must be added to each listed field that is to be validated.