IRIS: An Interactive Programmable Fine-Grained Network Telemetry System

by

Mana Atarod

A thesis accepted and approved in partial fulfillment of the 

requirements for the degree of

Master of Science

in Computer Science

Dissertation Committee:

Prof. Reza Rejaie, Co-Chair

Prof. Ram Durairajan, Co-Chair

Dr. Christopher Misa, Core Member

University of Oregon

Winter 2025


© 2025 Mana Atarod
All rights reserved.



THESIS ABSTRACT

Mana Atarod

Master of Science in Computer Science

Title: IRIS: An Interactive Programmable Fine-Grained Network Telemetry System

Network telemetry systems provide insights into network traffic that are 

essential for maintaining performance and security. While significant progress has 

been made in the area of telemetry system design and implementation, existing 

solutions focus on specific design aspects rather than providing a comprehensive 

end-to-end approach and require extensive expertise in data plane technology

and programming. In response to these issues, this work presents IRIS: an 

interactive, end-to-end, and programmable fine-grained telemetry system built on 

the BroadScan module found in ASICs from Broadcom, Inc., which are deployed in 

a wide range of present-day networks. IRIS provides a flexible, high-level interface 

that enables users to define queries using a simple Python interface. It then 

translates the user-defined queries into hardware-level configurations for BroadScan, 

and efficiently retrieves the generated results from it. This work provides an

overview of IRIS’s architecture and evaluates its ability to utilize BroadScan’s

flow table at maximum capacity, produce time-based and loss-based statistics, and 

implement example use cases based on real-world network telemetry tasks.



ACKNOWLEDGEMENTS

This section is dedicated to all the people who have supported and guided

me throughout my journey as a graduate student.

I would like to sincerely thank my advisors and committee members, Prof.

Reza Rejaie and Prof. Ram Durairajan, for believing in me and giving me the

opportunity to start my graduate career by joining the Oregon Network Research

Group (ONRG). Their unwavering support and guidance were essential in bringing

this journey to completion. I am deeply grateful to Dr. Chris Misa, my mentor and

committee member, whose constant help, support, patience, and wisdom guided

me every step of the way throughout my entire journey as a graduate student.

Not only did his mentorship help me grow in countless ways, but he also laid the

foundation for IRIS and implemented the initial version of it, the expansion of

which made this thesis possible. I would also like to thank Prof. Suyash Gupta,

faculty member of the ONRG lab, for his continued support during the process of

preparing this thesis.

I am thankful to Shahram Davari and the Broadcom Inc. team for their

support throughout the completion of this thesis.

I would also like to extend my thanks to the graduate members of the

ONRG lab—Emad Taghiye, Nima Nikkhah, and River Bartz—who I had the

pleasure of collaborating with and learning from. I am also grateful to the previous

generations of ONRG students whose work laid the foundation for this thesis, and

the future generation who will be carrying on the work.



Lastly, I want to express my deepest gratitude to my family and friends,

who were always there for me and provided unwavering support throughout this

entire journey.

This work is supported by the National Science Foundation through CNS 2212590
and OAC 2126281. The views and conclusions contained herein are those of the
authors and should not be interpreted as necessarily representing the official
policies or endorsements, either expressed or implied, of NSF.



TABLE OF CONTENTS

Chapter

I. INTRODUCTION
II. BACKGROUND
III. IRIS
3.1. IRIS Architecture
3.1.1. BroadScan
3.1.2. Agent
3.1.3. Remote Server
3.2. Python Interface
3.2.1. Query Definition
3.2.2. Visualization
3.3. Controller
3.3.1. Running Queries
3.3.2. Updating Queries
3.3.3. Tuning Queries
3.3.4. Stopping Queries and Clearing Switch Configurations
3.4. Collector
3.5. Adding New Capabilities
3.6. Available Switch Layout
3.7. Required System Configurations
IV. EVALUATION
4.1. BroadScan Flow Table Capacity
4.2. Collector Performance Analysis
4.2.1. IPFIX Record Collection Time
4.2.2. IPFIX Record Loss Percentage
4.2.3. IPFIX Record Parsing Time
4.2.4. IPFIX Record Post-Processing Time
4.3. Example Use Cases
4.3.1. Tuned Byte Count per Source and Destination IPs
4.3.2. Top Destination Host by Prefix Zooming
4.3.3. Anomaly Detection
4.3.4. TTL Histogram
V. CONCLUSION
5.1. Future Work
REFERENCES CITED



LIST OF FIGURES

Figure

1. High-Level Overview of ASICs Supporting P4, Such as Tofino Switches
2. High-Level Overview of the BroadScan Module
3. IRIS Architecture
4. Remote Server Class Diagram
5. Translation of a Query to its JSON Representation
6. Quick Tuning FSM
7. Continuous Tuning FSM
8. IRIS Collector
9. Border Deployment of IRIS
10. Lab Deployment of IRIS
11. Flow Byte Count and Packet Count Analysis
12. Lab Switch IPFIX Record Collection Time Based on Record Count
13. Border Switch IPFIX Record Collection Time Based on Record Count
14. Border Switch IPFIX Record Loss Percentage Per Epoch
15. Border Switch IPFIX Record Loss Percentage Based on Record Count
16. Lab Switch IPFIX Record Parsing Time Based on Record Count
17. Border Switch IPFIX Record Parsing Time Based on Record Count
18. Lab Switch IPFIX Record Post-processing Time Based on Record Count
19. Border Switch IPFIX Record Post-processing Time Based on Record Count
20. Query Tuning Progress
21. Time Series of Record Count for the Executed Tuned Query
22. Query Tuning Progress on the Border Switch
23. Time Series of Record Count for the Executed Tuned Query on the Border Switch
24. Lab Switch Anomaly Detection
25. Border Switch Anomaly Detection
26. Border Switch TTL Histogram



LIST OF TABLES

Table

1. Top Destination Prefix Based on Byte Count per Epoch with the Zooming Strategy



CHAPTER I

INTRODUCTION

Network operators often need to monitor the state of networks to analyze

performance, identify potential issues, detect attacks, or understand user behavior.

This process involves constant and near real-time measurement and analysis of

network components and events [6]. To this end, they deploy network telemetry

systems, which are technologies designed to provide insights into networks and

enable efficient, automated network management [17]. A network telemetry system

must be able to handle high volumes of data, provide real-time and accurate

visibility into the network, and scale efficiently with the growing demands of

networks.

An early and widely adopted tool for network telemetry is the Simple

Network Management Protocol (SNMP) [7]. This protocol involves managers

sending requests to agents (deployed on network devices), with the agents either

responding with the requested data or performing an action. SNMP is useful when

users need to retrieve specific information, such as CPU usage, interface counters,

or memory usage, from a device. However, it only exposes a limited set of counter

measurements. In addition, because SNMP relies on polling, it struggles to

scale in larger networks.

Another approach for network telemetry is the use of packet-level monitoring

tools, such as Wireshark [1, 16, 15]. Extracting specific information from the

data captured by these tools often requires extensive post-processing, which

can introduce significant overhead and be resource-intensive. For instance, in a

typical university campus network, tens of thousands of users may be using the

network and generating millions of packets. Using Wireshark requires parsing every



individual packet and extracting information from it, which is not feasible in such

a network. This makes such tools inefficient for real-time traffic monitoring, as the

post-processing cannot keep up with the high volume of data generated by the network.

Additionally, these tools can consume considerable storage space, as all captured

data must be saved for later analysis.

Flow-level monitoring tools, such as NetFlow [5], provide an alternative by

exporting network information at a coarser level. This reduces overall storage usage

but can still be inefficient for larger networks. Techniques like packet sampling can

help to an extent, but at the cost of reduced accuracy, so the system may miss

critical information. Additionally, these tools

capture only a specific subset of flow parameters, including source and destination

IP addresses, port numbers, protocol type, type of service, TCP flags, packet and

byte counts, start and end timestamps, interface numbers, and routing information.

Any diagnostic requiring information beyond the predefined features captured

during data collection, such as TCP window size, would be unable to use these

records effectively. Moreover, these tools do not capture details about how the

behavior of individual flows changes over time.

A relatively recent approach to network telemetry is programmable data

plane telemetry. This category of systems allows operators to use programmable

data planes to customize how traffic data is collected directly at the network’s

core, often in switches, enabling fine-grained monitoring with minimal overhead and

allowing for scalability in larger networks. Moreover, since the processing is handled

by dedicated hardware pipelines in application-specific integrated circuits (ASICs)

rather than relying on the CPU, programmable dataplanes can operate at line-rate,



delivering real-time information to operators. They can also be programmed on the

fly, allowing operators to dynamically adjust them to meet their needs.

A network telemetry system based on programmable data planes must meet

several key requirements to properly harness their capabilities. First, it should be flexible

enough to allow operators to customize data collection based on their specific needs

and network conditions. Second, it must report results quickly enough to meet

user needs. For example, if network telemetry is used for detecting and mitigating

security attacks, reports must be generated with sufficient granularity to enable

users to identify and respond to threats in real-time—rather than after an attack

has already occurred. Third, it should be able to scale with the traffic

volume and the number of users in the network. For instance, a campus network could

have tens of thousands of users, and a telemetry system deployed on it should be

able to generate reports in a timely manner. Lastly, it should be resource-efficient,

as programmable dataplanes often have limited resources that must be utilized

efficiently and effectively.

To this end, this work presents IRIS, an interactive programmable fine-

grained network telemetry system, developed by the Oregon Network Research

Group (ONRG) at the University of Oregon. IRIS is built on BroadScan, the flow-

analytics engine present on the widely deployed Broadcom StrataXGS ASICs

[3]. IRIS focuses on providing an easy-to-use and flexible interface for network

operators to configure BroadScan with custom queries and collect real-time results.

The rest of this thesis is organized as follows: Chapter II discusses the

background of programmable network telemetry and related work in the

field, and describes the contributions of IRIS in light of their limitations. Chapter

III provides a detailed description of the IRIS architecture and its key components.

In Chapter IV, IRIS is first evaluated to confirm its ability to operate at maximum

capacity. Then, its performance is evaluated using time-based and loss-based

statistics on the receiving, parsing, and post-processing of records exported from

BroadScan. This chapter also presents several use cases to demonstrate IRIS in

action. Finally, Chapter V concludes the work and outlines future directions.



CHAPTER II

BACKGROUND

Traditionally, the data plane is responsible for processing network packets

by executing a series of operations, which include parsing a packet, determining

the processing steps to be applied to it, and forwarding the packet based on the

outcomes of these operations. When programmable data planes were introduced,

dynamic and programmatic changing of the basic packet processing functionality

became possible [11]. Programmable dataplanes allow for customizable, real-time,

fine-grained data collection directly at the network’s core using dedicated ASICs.

This technology resulted in the emergence of a new generation of solutions for

programmable telemetry. By configuring hardware with specific queries, these

solutions can efficiently report only the relevant information, making them faster

and more resource-efficient with performance at line-rate.

Two types of programmable data plane switch ASICs are commonly used for

network telemetry:

1. Switches with ASICs supporting P4 (e.g., Tofino switches):

These ASICs provide a fully programmable match-action pipeline,

allowing operators to define the entire sequence of switch operations.

Sketch algorithms are commonly used with this technology. Accessible

documentation on the inner workings of these switches, combined with their

flexible programmability, contributes to their popularity among researchers.

However, deploying telemetry solutions on these devices is only feasible in

certain types of networks, such as hyperscalers and research and education

(R&E) networks with access to research talent. This is due to the need to



define all switch operations from scratch on these devices. Figure 1 shows a

high-level view of this type of technology.

[Figure 1 diagram labels: Packets, Programmable Parser, Programmable Match-Action Pipeline (repeated M/A stages), Programmable Deparser, Records.]

Figure 1. High-Level Overview of ASICs Supporting P4, Such as Tofino Switches

2. Broadcom-based switches: These types of switches are already well

integrated into existing network stacks and support many common network

features out of the box. Switches that have Broadcom’s StrataXGS ASIC

contain a certain module called the BroadScan flow analytics engine, which

adds monitoring capabilities without the need to rebuild the network stack.

Unlike the previous category, these switches have a pre-defined pipeline

that can be configured with Broadcom’s software development kit (SDK).

However, documentation on how to achieve that is not easily accessible,

making it less popular in the research community. Due to the nature of

the non-disclosure agreement (NDA) signed with Broadcom, details of the

pipeline cannot be shared. However, a high-level overview of the BroadScan

module is shown in Figure 2.

Building network telemetry solutions on programmable dataplanes is not

without challenges. Using these technologies often requires extensive knowledge

about the data plane technology and details about their hardware configurations.

Moreover, the limited resources available on these ASICs, including memory and

table lengths, require optimal resource usage to allow the ASICs to work to their

full potential.

[Figure 2 diagram labels: Packets, BroadScan Module Pipeline (Filter and Learn, Update, Export), Flow Table, Broadcom SDK, Configurations, Records.]

Figure 2. High-Level Overview of the BroadScan Module

Numerous works have been introduced to facilitate the use of programmable

data planes for network telemetry to address these challenges. For example,

Marple [14], Sonata [6], and ESnet High Touch Services [10] provide declarative

interfaces that allow users to express queries for various networking tasks on

programmable switches using high-level abstractions. Another group of solutions

focuses on building highly scalable systems by balancing improvements in coverage,

accuracy, and resource efficiency [19, 9, 8, 18]. Additionally, some works introduce

novel algorithms that enable new capabilities and features in network telemetry

solutions. For example, Chen et al. focus on enabling the simultaneous execution

of multiple queries [4]. Misa et al. present DynATOS [13], a dynamic scheduling

algorithm for telemetry queries built on BroadScan, and later its extended version

DynATOS+ [12], which adds support for user-specified accuracy goals

in queries. One limitation of these works is that they typically focus on only one

aspect of programmable network telemetry (declarative interfaces, balancing of

coverage, accuracy and resource efficiency, adding new capabilities, etc.). Moreover,

they often rely on dedicated P4-supporting technology, and only present prototypes



with no real-world deployment. In addition, to the best of our knowledge, no

ready-to-use platform for writing dynamic traffic monitoring applications has been

presented.

Building a ready-to-use dynamic data plane telemetry platform comes

with a few requirements. First, it needs to provide an easy-to-use and unified

interface for defining queries, receiving results, and plugging in external tools for

extended applications. It should also handle proper query translation to hardware

configuration. Balancing resources, accuracy, and coverage is another requirement,

based on the limitations of programmable dataplanes. Additionally, operations

must be efficient enough to keep pace with received records from these telemetry

solutions and deliver near real-time results to the operator after parsing and post-

processing the records.

In this work, IRIS is developed on BroadScan, the flow-analytics engine of a programmable

data plane switch ASIC, to provide such a platform. It is deployed both in a controlled lab

environment and on a real campus network to showcase its applicability as a real-

world programmable data plane telemetry solution.



CHAPTER III

IRIS

IRIS is ONRG’s switch hardware-based network traffic monitoring system.

It is built on Broadcom’s System Verification Kits (SVKs) which contain ASICs

with the BroadScan module. This chapter describes the overall architecture

of IRIS, the available layout of the switches in the ONRG lab, and the system

requirements for the proper deployment of IRIS.

3.1 IRIS Architecture

[Figure 3 diagram labels: Remote Server (Unified Interface, Query, Results, Controller, Collector, IPFIX Parser, Post-processing function); Broadcom SVK switch (CPU: Agent, Broadcom SDK, BCM Drivers, soc_libraries; ASIC: BroadScan); JSON Configurations (Query + Metadata); Hardware Configurations; IPFIX Packets; Network Traffic.]

Figure 3. IRIS Architecture

Figure 3 demonstrates the remote-server architecture of IRIS, with the

sections specifically implemented by IRIS highlighted. A user operates IRIS

through the remote server’s interface to send custom queries via the controller to

the agent on the switch. The agent receives these queries and employs the BCM

drivers through the soc libraries available in the Broadcom SDK [2] to configure

the BroadScan module in the ASIC to track flows, collect flow data, and export



them, all based on the input queries. The user can then collect the exported

data through the collector, and if needed, do post-processing on the results. The

following sections describe each of IRIS’s components in more detail.

3.1.1 BroadScan. The BroadScan flow-analytics engine is a hardware

module included in various Broadcom ASICs that can be used to track flows,

collect flow data, and report it to a local or remote flow collector. Due to the NDA

with Broadcom, the specific hardware details of BroadScan cannot be included in

this document. However, a high-level and abstract overview is provided to explain

IRIS’s capabilities.

BroadScan uses a pipeline with the following high-level components to

collect and report data from network traffic:

1. Filter: Filter packets based on certain header values. This step consists of two

parts:

(a) Pre-selection: Select which packets enter BroadScan

(b) Selection: Select a subset of flows to be considered for a configuration

(If multiple queries are to be executed, their selected subflows should be

mutually exclusive)

2. Learn: Learn flows based on a custom flow definition, e.g., the five-tuple

(source IP, destination IP, source port, destination port, and protocol)

3. Update: Update a custom subset of stored flow parameters when a new

packet from a learned flow is received

4. Export: Export the flow parameters periodically or based on a specific

threshold
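As a rough illustration of the four stages listed above, the following Python sketch models them in software. It is not BroadScan's implementation, whose hardware details are confidential. The class ToyFlowPipeline, the packet dictionary layout, the capacity limit, and the choices to ignore new flows when the table is full and to clear the table on export are all assumptions made for illustration.

```python
def five_tuple(pkt):
    """Flow key: the five-tuple named in the Learn step."""
    return (pkt["src_ip"], pkt["dst_ip"],
            pkt["src_port"], pkt["dst_port"], pkt["proto"])


def is_tcp(pkt):
    """Example selector: keep only TCP packets (IP protocol number 6)."""
    return pkt["proto"] == 6


class ToyFlowPipeline:
    """Software stand-in for the Filter -> Learn -> Update -> Export stages."""

    def __init__(self, selector, key_fn, capacity):
        self.selector = selector  # Filter: predicate over packet headers
        self.key_fn = key_fn      # Learn: custom flow definition
        self.capacity = capacity  # hypothetical flow table size
        self.table = {}           # flow key -> stored flow parameters

    def on_packet(self, pkt):
        if not self.selector(pkt):                 # Filter
            return
        key = self.key_fn(pkt)
        if key not in self.table:                  # Learn
            if len(self.table) >= self.capacity:
                return                             # assumed: untracked when full
            self.table[key] = {"packets": 0, "bytes": 0}
        entry = self.table[key]                    # Update
        entry["packets"] += 1
        entry["bytes"] += pkt["length"]

    def export(self):
        """Export: emit one record per flow and start a new epoch."""
        records = [dict(flow=key, **params) for key, params in self.table.items()]
        self.table.clear()
        return records


pipe = ToyFlowPipeline(selector=is_tcp, key_fn=five_tuple, capacity=1024)
base = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
        "src_port": 40000, "dst_port": 443}
for proto, length in [(6, 100), (6, 50), (17, 80)]:
    pipe.on_packet({**base, "proto": proto, "length": length})
records = pipe.export()
```

In this toy run the UDP packet is rejected by the filter, and the two TCP packets are merged into a single flow record.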



A user can use the Broadcom SVK to configure the BroadScan module to

define how each of the components in the pipeline must behave. This is feasible

through the BCM Drivers, which provide a mapping between human-readable table

and register names and machine-readable memory addresses. Broadcom provides

these drivers as an open-source SDK [2] which communicates with the ASIC’s

kernel drivers.

To effectively use BroadScan directly, users must have an in-depth

understanding of its pipeline, including the memories and registers involved in its

configuration for specific tasks. The SDK provides a set of flowtracker APIs that

simplify its usage to some extent. However, these flowtrackers do not expose the

full potential of BroadScan, and still require some knowledge of the underlying

hardware. While flowtrackers may be sufficient for many industry applications,

research often requires greater flexibility for customization.

An alternative to using flowtrackers is programming in C at the System-on-

Chip (SoC) level, which grants direct access to memories and registers in order to

access all the capabilities of BroadScan. This can be achieved through the APIs

available in the SDK’s header files and libraries.

IRIS was developed using this latter approach to maximize flexibility and

customization. Additionally, it enables users to operate BroadScan through a

Python interface without needing to understand the intricacies of BroadScan

hardware. IRIS leverages SoC-level APIs, including variations of soc_mem_read(),

soc_mem_write(), soc_mem_field_set(), soc_format_field_set(), soc_mem_clear(),

soc_reg_field_set(), soc_reg_field_get(), and specialized per-memory read/write

APIs provided by the SDK through the BCM drivers. These APIs allow direct

interaction with the memories and facilitate the translation of high-level queries



into hardware configurations. Essentially, IRIS can serve as a replacement for

the flowtracker APIs, providing a higher-level, query-based interface for defining

BroadScan configurations while enabling more flexible access to hardware

configurations.

3.1.2 Agent. The agent deployed on the SVK is responsible for

receiving the JSON representation of the queries sent by the user from the remote

server and translating them into hardware configurations for BroadScan to execute.

To receive queries sent from the remote server, a basic JSON server is

implemented using socket programming. Currently, this implementation is designed

to handle queries from only a single client.
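The exchange between the controller and the agent can be pictured with the short Python sketch below. The actual agent is written in C and its wire format is not described here, so the newline-terminated framing, the helper names send_json() and recv_json(), and the example payload keys are assumptions. A connected socket pair stands in for the network link between the remote server and the switch.

```python
import json
import socket


def send_json(sock, obj):
    """Send one JSON object terminated by a newline (assumed framing)."""
    sock.sendall(json.dumps(obj).encode("utf-8") + b"\n")


def recv_json(sock, bufsize=4096):
    """Read until the first newline, then decode a single JSON object."""
    buf = b""
    while b"\n" not in buf:
        data = sock.recv(bufsize)
        if not data:
            raise ConnectionError("peer closed before a full message arrived")
        buf += data
    line, _, _ = buf.partition(b"\n")  # anything after the newline is ignored
    return json.loads(line.decode("utf-8"))


# A local socket pair plays the roles of the controller and the agent.
controller_end, agent_end = socket.socketpair()
send_json(controller_end, {"query": {"name": "q0"}, "metadata": {"epoch_s": 1}})
message = recv_json(agent_end)
controller_end.close()
agent_end.close()
```

Reading until a delimiter, rather than assuming one recv() call returns the whole message, matters because TCP delivers a byte stream and may split a JSON object across several reads.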

Once the agent receives a JSON object through the socket containing the

queries and metadata sent by the user, it utilizes the Broadcom SDK to convert

them into commands that modify the values of the memories and registers in

BroadScan, configuring the BroadScan pipeline to generate results based on the

queries.

This agent is entirely programmed in C and uses the SDK provided by

Broadcom to implement the hardware configurations of BroadScan via the BCM

drivers. Although the majority of the contributions of IRIS are in this section,

due to the nature of the NDA with Broadcom, details of this implementation

that expose BroadScan’s hardware are confidential and cannot be included in this

document.

3.1.3 Remote Server. The remote server in IRIS provides an

interface for users to define queries, converts them into their JSON representation,

and sends them to the agent on the switch. It also collects exported IPFIX packets



containing query results from the switch and returns the processed results to the

user, which are accessible through the same interface.

The IRIS remote server was designed with ease of use, speed, and flexibility as

its primary goals, and it is implemented in a combination of Python and Cython.

The interactive

interface is fully developed in Python, offering better flexibility and usability. This

setup allows operators to either configure and reconfigure BroadScan interactively

with new queries as needed, or create algorithms to automatically reconfigure

BroadScan based on received results. Most other parts of the remote server are

also implemented in Python, with Cython used in performance-critical sections.

Figure 4 demonstrates the class diagram of the remote server, specifying the

components implemented in Cython and Python.

The user specifies the high-level BroadScan configurations by creating

instances of the Query class. The user must also create an instance of the Switch

class, which is used to facilitate communication between the remote server and

the switch.

The remote server also provides a Logger class for logging. The logger

can be used to keep track of IRIS’s performance statistics and

record possible problems in the system.

In addition to providing the Python interface, the remote server performs operations

that can logically be divided into two roles: those of a controller and those of a

collector. It acts as a controller when defining queries and transmitting them to

the switch, and as a collector when receiving and processing the exported results.

The controller and collector are described in more detail in Sections 3.3 and 3.4,

respectively, along with their relationship to the Python interface. These sections

also explain other major classes defined in the remote server.

[Figure 4 class diagram labels: Query; Field (ttl, dos_vector, src_port, dst_port, ipv4_src, ipv4_dst, ipv4_len, ipg, tcp_seq, tcp_flags, protocol); Condition; Reducer (byte_count, count, add_value, min_value, max_value, latest_value, ewma_value); Exporter (periodic, threshold); PostProcessor; Logger; Q_Tune_FSM; Switch; IPFIXParser. Relation labels: select/group, reduce, export, post_process, run/update/tune. Legend: Cython, Python.]

Figure 4. Remote Server Class Diagram

3.2 Python Interface

IRIS provides a unified, easy-to-use, and flexible interactive interface for

network operators to configure and reconfigure BroadScan with custom queries on

the fly, either interactively or by defining algorithms to update the queries based



on received results. Operators can also define custom post-processing functions for

the IPFIX records and receive real-time results, all through the same interface.

This interface is implemented entirely in Python, allowing easy extension and

integration with other Python libraries and modules.

3.2.1 Query Definition.

To define a query through the Python Interface, a user must make an instance of

the Query class. Several abstract classes are defined to simplify query definition, as

depicted in Figure 4:

– Field: This abstract class is defined to provide a uniform interface for the

fields accessible through IRIS. These Fields can be packet header fields, such

as the five-tuple flow definition values (source IP, destination IP, source port,

destination port, and protocol), TTL value, or TCP flags. They can also be

other types of values that can be calculated by BroadScan’s hardware, such as

the DoS vector or inter-packet gap (IPG). They can be optionally passed with

custom masks and values depending on where they are used.

– Reducer: This abstract class provides a common structure for the

computations that can be applied to a flow. Each concrete reducer subclass

defines the specific operation and optional fields and conditions it takes.

Examples of reducers are packet count, byte count, and min/max values. To

define conditions in reducers, the Condition class is used to define the range

of the target field.

– Exporter: This abstract class can be used to customize the export

operation of the query. It has two concrete subclasses:



1. Periodic: This type of exporter can be used to periodically export flows

with a customizable epoch duration.

2. Threshold: This type of exporter can be used with a reducer to export

flows when the reducer result reaches a specific value. For example, with a

byte count reducer, it can export flows only when the byte count for a flow

exceeds 1000 bytes.
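To make the roles of these classes concrete, the sketch below shows one way such a hierarchy could be organized in Python. It is a simplified illustration rather than the IRIS source. The method names update() and should_export(), the constructor arguments, the use of a plain (non-abstract) Field holder, and the decision to treat a threshold as reached when the value is greater than or equal to the limit are all assumptions.

```python
from abc import ABC, abstractmethod


class Field:
    """Plain holder for a field name with an optional value and mask."""

    def __init__(self, name, val=None, mask=None):
        self.name = name
        self.val = val
        self.mask = mask


class Reducer(ABC):
    """Per-flow computation; `state` is None until the first sample arrives."""

    def __init__(self, field=None):
        self.field = field

    @abstractmethod
    def update(self, state, sample):
        """Return the new per-flow state after observing `sample`."""


class Count(Reducer):
    def update(self, state, sample):
        return (state or 0) + 1          # counts packets, ignores the sample


class MaxValue(Reducer):
    def update(self, state, sample):
        return sample if state is None else max(state, sample)


class Exporter(ABC):
    @abstractmethod
    def should_export(self, reducer_value, elapsed_s):
        """Decide whether a flow record should be exported now."""


class Periodic(Exporter):
    def __init__(self, epoch_s):
        self.epoch_s = epoch_s

    def should_export(self, reducer_value, elapsed_s):
        return elapsed_s >= self.epoch_s  # export once the epoch has elapsed


class Threshold(Exporter):
    def __init__(self, limit):
        self.limit = limit

    def should_export(self, reducer_value, elapsed_s):
        return reducer_value >= self.limit  # assumed: "reaches" means >=


# Track the maximum TTL seen for one flow.
reducer, state = MaxValue(Field("ttl")), None
for ttl_sample in (64, 128, 57):
    state = reducer.update(state, ttl_sample)
```

Keeping the export decision in its own class means a reducer such as a byte counter can be paired with either export policy without modification.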

Using the abstract classes outlined above, the user can then use the

following methods in the Query class to customize the query:

– select(): Takes a list of Fields as its argument with set values and optional

masks to filter packets and group a subset of flows. This list logically

maps to the Filter operation of the BroadScan pipeline. It should be

noted that the selected Fields for grouping a subset of flows in the selectors

list must be unique per query if multiple queries are to be executed in

parallel.

– group_by(): Takes a list of Fields as its argument with optional masks to

define flows, e.g., source IP and destination IP pairs, or the five-tuple (source

IP, destination IP, source port, destination port, and protocol). This list maps

to the Learn and Track operations of the BroadScan pipeline.

– process(): Takes a list of Reducers along with the optional Fields and

Conditions as its argument, e.g., keep track of the maximum value of

TTL per flow. This list configures the Store Flow parameters and Update

operations of the BroadScan pipeline.

– export(): Takes a list of Exporters as its argument, which can either be

Periodic or Threshold based. This list defines the export condition of flows,

mapping to the Export operation of the BroadScan pipeline.
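The chaining style of these four methods can be sketched as follows. This is only an illustration of the fluent pattern, not IRIS’s implementation: the Field and Query classes below are hypothetical stand-ins, reducers are represented by plain strings, and the exporter dictionary keys are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for an IRIS Field: a name plus optional value and mask."""
    name: str
    val: Optional[int] = None
    mask: Optional[int] = None


@dataclass
class Query:
    """Hypothetical fluent builder; each method records one stage and returns self."""
    name: str
    selectors: List[Field] = field(default_factory=list)
    groupers: List[Field] = field(default_factory=list)
    reducers: List[str] = field(default_factory=list)
    exporters: List[Dict[str, Any]] = field(default_factory=list)

    def select(self, fields: List[Field]) -> "Query":
        self.selectors = list(fields)      # Filter operation
        return self

    def group_by(self, fields: List[Field]) -> "Query":
        self.groupers = list(fields)       # Learn and Track operations
        return self

    def process(self, reducers: List[str]) -> "Query":
        self.reducers = list(reducers)     # Store Flow parameters and Update
        return self

    def export(self, exporters: List[Dict[str, Any]]) -> "Query":
        self.exporters = list(exporters)   # Export operation
        return self


# Because every method returns the query itself, the stages can be chained.
q = (
    Query("low_port_packets")
    .select([Field("protocol", val=6), Field("dst_port", val=0x0000, mask=0xFC00)])
    .group_by([Field("dst_port", mask=0xFFFF)])
    .process(["count"])
    .export([{"type": "periodic", "epoch_len": 1}])
)
```

Returning the query object from each method is what allows the four stages to be written as a single expression, which mirrors the order of the BroadScan pipeline operations they map to.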

A few simple examples of how a user can define a query are demonstrated

below:

– Define flows as packets with the same source IP and destination IP, and

report the number of bytes per flow once it exceeds 1000:

q = Query(q_name) \
.group_by([ipv4_src(), ipv4_dst()]) \
.process(byte_count()) \
.export(threshold('reducer_0', 1000))

– Count the total number of TCP packets sent to each well-known destination port

(< 1024) and report them every second:

q = Query(q_name) \
.select([protocol(6), dst_port(val=0x0000, mask=0xFC00)]) \
.group_by([dst_port(mask=0xFFFF)]) \
.process(count()) \
.export(periodic(1))

– Report maximum IPG per five tuple flow every 5 seconds:

q = Query(q_name) \
.group_by([

ipv4_src(mask=0xFFFFFFFF),
ipv4_dst(mask=0xFFFFFFFF),
src_port(mask=0xFFFF),
dst_port(mask=0xFFFF),
protocol(mask=0xFF),
tcp_flags(mask=0xFF)

]) \
.process([max_value(ipg())]) \
.export(periodic(5))
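The value and mask pair in the second example can be checked with a few lines of Python. The sketch below assumes the usual ternary-match semantics, in which a field matches when its masked bits equal the masked bits of the configured value; the matches() helper is hypothetical and is not part of IRIS.

```python
def matches(field_value: int, val: int, mask: int) -> bool:
    """Return True when the masked bits of field_value equal the masked bits of val."""
    return (field_value & mask) == (val & mask)


# Selector from the second example: dst_port(val=0x0000, mask=0xFC00).
# 0xFC00 covers the six most significant bits of a 16-bit port number, so the
# match succeeds exactly when those bits are zero, i.e. for ports 0 to 1023.
low_ports = [port for port in range(65536) if matches(port, 0x0000, 0xFC00)]
```

Under this assumption, a single value and mask pair is enough to express the "< 1024" condition, which is why the selector needs no explicit range.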

3.2.2 Visualization.

Users can use visualization tools with IRIS to better present the post-processed

query results. This can be achieved in two ways:

– VIZ: This is IRIS’s simple visualization tool developed using the Dash

and Plotly libraries in Python that has a few common types of plots pre-

implemented. To use this tool, the user needs to create a multiprocessing

manager and a new process. VIZ’s start() function should be the new

process’s target, and the query name, a new manager event, and a manager

list must also be passed as arguments. VIZ will then be hosted on localhost

on the port that it prints for the user, ready to be used.

– Direct implementation: The user can also define custom plots and figures by

directly using libraries such as matplotlib or Plotly.

3.3 Controller

The controller’s main responsibility is to take the queries the user defines

through the interactive Python interface, create a JSON representation of the

queries with the necessary metadata, and send them to the switch. It also offers

features such as updating a query running on the switch, fine-tuning the epoch

length of a query, or stopping a query and clearing the switch’s configurations. In

the rest of this section, a JSON representation of a query is simply referred to as

a query for better readability.

An example of the process in which the controller takes a query defined

through the high-level Python interface and translates it into its JSON

representation is shown in Figure 5. The query used in this example simply

calculates maximum inter-packet gap (IPG) for each 4-tuple TCP flow (source IP

address, destination IP address, source port, and destination port) in each 1-second

epoch.

Figure 5. Translation of a Query to its JSON Representation

3.3.1 Running Queries.

The run() method of the Switch class instance can be used to send a query to the

agent on the switch. This method takes a query or a list of queries as a parameter,

along with an optional integer to specify how many epochs the query should

be executed for. It then converts the queries into dictionaries and creates a list

containing all the dictionary representations of the queries. In addition to that, it

creates a new dictionary with metadata key and value pairs, including the reset

type. There are two reset types available with IRIS:

– Full reset: When this type of resetting is selected, the agent on the switch

clears all control memories of the BroadScan module as well as the flow table

and flow parameters.

– Partial reset: In this reset mode, the agent only updates the specific control

memories of BroadScan based on the query it receives but still clears the

flow table and flow parameters. This mode should only be used if the user

is familiar with BroadScan’s hardware details and understands the impact of

sending a partial update of a query.

For the run() method, the reset type is always full reset.

Afterward, this method creates a JSON object containing the list of queries

and metadata.
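As a rough illustration of this step, the snippet below assembles a list of query dictionaries and a metadata dictionary into one JSON object. The key names ("metadata", "reset_type", "queries") and the build_payload() helper are assumptions made for this sketch; the actual schema used by IRIS is the one shown in Figure 5.

```python
import json
from typing import Any, Dict, List


def build_payload(queries: List[Dict[str, Any]], reset_type: str = "full") -> str:
    """Bundle query dictionaries and metadata into a single JSON string."""
    if reset_type not in ("full", "partial"):
        raise ValueError(f"unknown reset type: {reset_type!r}")
    payload = {
        "metadata": {"reset_type": reset_type},  # hypothetical metadata layout
        "queries": queries,                      # one dictionary per query
    }
    return json.dumps(payload)
```

Keeping the reset type in the metadata rather than in each query reflects the description above, where one reset type applies to the whole set of queries sent in a call.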

The next step involves preparing the collector by creating a new IPFIX

parser based on the queries to define the expected results’ format, setting up a

UDP socket for the collector to receive IPFIX packets from the switch, and then

starting a new process to read from the socket and collect the results, optionally for

a set number of epochs. The structure of the collector is explained in more detail in

section 3.4.
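The reading side of this preparation can be sketched as a small loop that drains a UDP socket into a queue. This is a simplified illustration rather than the IRIS collector: receive_loop() is a hypothetical name, the stopping condition counts datagrams instead of epochs, and the caller is assumed to run the loop in its own process or thread.

```python
import queue
import socket
from typing import Optional


def receive_loop(sock: socket.socket,
                 out_queue: "queue.Queue[bytes]",
                 max_datagrams: Optional[int] = None,
                 bufsize: int = 65535) -> int:
    """Read datagrams from sock and append each one to out_queue.

    Stops after max_datagrams datagrams. When max_datagrams is None, the loop
    runs until the socket raises, for example on a timeout set by the caller.
    """
    received = 0
    while max_datagrams is None or received < max_datagrams:
        data, _addr = sock.recvfrom(bufsize)  # one IPFIX packet per datagram
        out_queue.put(data)                   # hand off without parsing
        received += 1
    return received
```

The loop deliberately does nothing but copy bytes into the queue, so that the socket buffer is emptied as quickly as possible and the slower parsing work happens elsewhere.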

Once the collector is ready, the controller can finally send the new JSON

object with the queries and metadata to the agent. This happens through the

send_queries() method of the Switch object, which creates a new TCP socket and

sends the JSON object to the agent on the switch. If it gets an OK response from

the agent, it closes the socket. Otherwise, it warns the user if the agent responded

with an error or timed out.
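A minimal sketch of this exchange is shown below. The wire format is an assumption: the agent is taken to answer with the literal bytes OK on success, and the helper names exchange() and send_queries() are used here only for illustration.

```python
import socket


def exchange(conn: socket.socket, payload: bytes, timeout: float = 5.0) -> bool:
    """Send payload on a connected socket and report whether the peer answered OK."""
    conn.settimeout(timeout)
    try:
        conn.sendall(payload)
        reply = conn.recv(1024)
    except socket.timeout:
        print("warning: the agent timed out")
        return False
    if reply.strip() == b"OK":  # assumed success reply
        return True
    print(f"warning: the agent responded with an error: {reply!r}")
    return False


def send_queries(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bool:
    """Open a new TCP connection, send the JSON payload, and close the socket."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return exchange(conn, payload, timeout)
```

Separating the send and receive logic from the connection setup keeps the protocol step testable on any connected socket, while send_queries() keeps the one-connection-per-call behavior described above.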

3.3.2 Updating Queries.

The update() method of the Switch class is very similar to the run() method

in structure. However, in addition to the query or list of queries, it can take an

argument from the user to select the reset type. Additionally, if a previous socket is

already available for receiving IPFIX results due to a previous call of the run() or

update() method, it can reuse that socket when preparing the collector.

3.3.3 Tuning Queries.

The controller also offers a tune() method in the Switch class in collaboration with

the collector. This method uses a finite state machine (FSM) to find the optimal

epoch duration for a query based on a target maximum number of records. It takes

the following arguments:

– q: The query to tune

– epoch_len: The initial epoch duration

– exp_epoch_cnt: Number of epochs executed during a single experiment (An

“experiment” refers to the execution of a query within a state of the FSM.

In other words, exp_epoch_cnt defines how many epochs are to be completed

while the query runs in each FSM state.)

– max_target_rec_cnt: Maximum desired record count

– fsm_type (optional): There are two modes available here:

* Quick (0): This mode is the default and aims to find a stable epoch

duration for which at least 90% of the results per experiment do not

exceed the max_target_rec_cnt for three consecutive experiments. This

mode is useful when a user wants to determine a generally acceptable

epoch duration based on the current network conditions, and then run

the query using that epoch duration. To this end, the tune() function

uses the FSM depicted in Figure 6. The state name abbreviations used in

this figure are:

· Init: Initial State

· US_H: Unsafe Halve

· S_I: Safe Increment

· US_D: Unsafe Decrement

· IRR: Irreducible

· S: Safe

· T: Tuned

This mode starts off in a multiplicative decrease mode to quickly approach the general vicinity of the target epoch duration and then transitions to an incremental decrease/additive increase mode to fine-tune it. If more than 10% of the results in an experiment exceed the max_target_rec_cnt at any state of the FSM, the epoch length at that time is set as the upper threshold, and the epoch duration will not be increased beyond that value.

Figure 6. Quick Tuning FSM (state diagram over the states listed above; transitions are guarded by the OK and MRR signals and update the epoch duration E by halving or by steps of 0.1, bounded by MIN and MAX)

* Continuous (1): This mode continuously runs the query and aims to

find the maximum epoch duration for which at least 90% of the results per

experiment do not exceed the max_target_rec_cnt. This mode uses the

FSM portrayed in Figure 7. The state name abbreviations used in this

figure are:

· Init: Initial State

· US_H: Unsafe Halve

· S_I: Safe Increment

· US_D: Unsafe Decrement

· S_D: Safe Double

· IRR: Irreducible

· S: Safe

This mode starts off in a multiplicative increase/decrease mode to get

to the general vicinity of the target epoch duration faster, and then, like

the quick mode, transitions to an incremental decrease/additive increase

mode to fine-tune it. However, unlike the quick mode, it does not set

an upper threshold on the epoch duration as network conditions may

change and a longer epoch duration may result in an acceptable number

of records.

Figure 7. Continuous Tuning FSM (state diagram over the states listed above; transitions are guarded by the OK and MRR signals and update the epoch duration E by doubling, halving, or steps of 0.1, bounded below by MIN)

– tune_dur (optional): How long to run the tuning process. There is no

tuning duration set by default.

– min_epoch_dur (optional): This is the minimum accepted epoch duration,

which is 0.3 seconds by default.

The tune() function creates an instance of the Q_Tune_FSM class based on epoch_len and fsm_type, and runs the query afterward. While tuning is not complete (tune_dur has not elapsed, or the FSM has yet to either converge to a final epoch duration or declare tuning impossible within the bounded range in quick mode), the collector collects the results of an experiment and computes the percentage of epochs in the experiment in which the number of records exceeded the max_target_rec_cnt. If this percentage is higher than 10%, it sends an “MRR” signal to the FSM. Otherwise, it sends an “OK” signal. Upon receiving the signal, the FSM moves to the next state and updates its internal epoch value accordingly. After that, the tune() function retrieves the updated epoch value from the FSM and uses the update() function of the Switch class to adjust the epoch length of the periodic exporter for the query and send the updated value to the switch. Once tuning is complete (if not set to run indefinitely in continuous mode), it returns the final epoch value or declares that tuning failed if it could not determine an epoch duration where the max_target_rec_cnt is not exceeded.
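The signal computation and the overall tuning loop can be sketched as follows. The experiment_signal() function follows the 10% rule described above. The quick_tune_sketch() function is a deliberately reduced version of the quick mode: it keeps only the halving phase, the 0.1-step decrease, and the three-consecutive-OK stopping rule, and it omits the additive increase toward MAX and the exact state structure of the FSM. Both function names and the run_experiment callback are hypothetical.

```python
from typing import Callable, List, Optional


def experiment_signal(record_counts: List[int], max_target_rec_cnt: int) -> str:
    """Return "MRR" if more than 10% of the epochs exceeded the target, else "OK".

    record_counts holds one record count per epoch and must not be empty.
    """
    over = sum(1 for n in record_counts if n > max_target_rec_cnt)
    return "MRR" if over / len(record_counts) > 0.10 else "OK"


def quick_tune_sketch(run_experiment: Callable[[float], List[int]],
                      epoch_len: float,
                      max_target_rec_cnt: int,
                      min_epoch_dur: float = 0.3,
                      step: float = 0.1,
                      max_experiments: int = 50) -> Optional[float]:
    """Simplified quick tuning; returns the tuned epoch or None if tuning fails."""
    epoch = epoch_len
    halving = True   # multiplicative decrease until the first OK signal
    ok_streak = 0
    for _ in range(max_experiments):
        signal = experiment_signal(run_experiment(epoch), max_target_rec_cnt)
        if signal == "OK":
            halving = False
            ok_streak += 1
            if ok_streak == 3:      # three consecutive safe experiments
                return epoch
            continue
        ok_streak = 0
        if epoch <= min_epoch_dur:  # cannot be reduced any further
            return None
        lowered = epoch / 2 if halving else epoch - step
        epoch = round(max(min_epoch_dur, lowered), 3)
    return None
```

For example, with a run_experiment callback whose record count grows linearly with the epoch length, the sketch halves the epoch until the first safe experiment and then confirms that value over two more experiments before returning it.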

3.3.4 Stopping Queries and Clearing Switch Configurations.

Users can call the stop() method of the Switch instance to reset its state and to

command the Agent on the switch to stop all currently running queries and clear

all control memories as well as the flow table and flow parameters.

This method also aggregates all the statistical metadata collected during the

query runs and reports them, including details such as the cumulative number of

IPFIX packets, the number of lost IPFIX records, and timing statistics.

3.4 Collector

The collector’s responsibility is receiving all the exported IPFIX records

from the switch, parsing them, performing post-processing on them based on

the user’s custom post-processing function, and reporting the results to the user

through the Python interface as a Pandas data frame. With the possibly thousands

of IPFIX packets exported in each burst, each packet containing up to 64 IPFIX

records with the current BroadScan configurations, there are several potential

sources of performance concerns and bottlenecks in the collector regarding IPFIX

records:

– Packet/record loss: How many records are lost due to IPFIX packet loss?

– Receiving time: How long does it take to receive a burst of IPFIX packets from

BroadScan?

– Parsing time: How long does it take to extract the IPFIX records from the raw

byte stream of IPFIX packets and format them?

– Post-processing time: How long does it take to execute the custom

post-processing function provided by the user?

Motivated by these questions and to ensure the collector operates

efficiently, several key factors had to be considered in its design:

– Slow performance of pure Python: Although convenient and easy to

use, Python is slower than lower-level

languages such as C. Therefore, to operate more efficiently where speed is

critical while maintaining compatibility with the Python components, Cython

is a useful tool. In the IRIS collector, the parsing of IPFIX records, observed

to be the main bottleneck, is implemented in Cython for better efficiency.

– Limited UDP socket buffer: If packets are not received quickly enough

from the socket buffer, it will fill up, resulting in packet loss. Python UDP

sockets can read from the buffer fast enough to prevent this. However, if the

system waits to parse each burst before reading the next batch, it will not

be able to keep up with the speed of the incoming IPFIX packets from the

switch, resulting in lost records.

To ensure all packets can be received, a separate process called the IPFIX

Receiving Process (IRP) is initialized after the socket is prepared for the

connection to the switch. A multiprocessing queue is also instantiated to

allow the IRP and the main process to exchange data. This occurs right

before sending a query to the switch in the main process, and the IRP begins

reading from the buffer continuously. Every time that the IRP gets a burst

of IPFIX packets, it adds it to the back of the queue along with the relevant

metadata (e.g., epoch number if the query is with an epoch-based export).

While the IRP collects the IPFIX packets, the main process is then able to

read from the multiprocessing queue, parse the results based on the queries

that were sent to the switch, and then pass them as Pandas data frames

to the post-processor unit for any custom post-processing function. Figure

8 depicts the structure of the collector and how its different components

communicate.

– Custom Data Post-Processing: The user is able to define a custom post-

processing function for a query, allowing for variable overhead depending on

the query. To facilitate this customization, a PostProcessor class is defined.

The class constructor takes the switch instance, a custom post-processing

function, and an initial state (which is empty by default) as inputs. This class has a get_next() method that takes the next sequence of IPFIX records from the multiprocessing queue of the switch and applies the custom post-processing function to it, which must return the updated state.

Figure 8. IRIS Collector (the IPFIX Receiving Process passes each burst byte stream plus metadata through the multiprocessing queue to the main process, whose query parser hands records to the post-processor, where the custom post-processing function produces the results)

It should be noted that this customization also results in variable post-

processing overhead completely dependent on the query and the user’s self-

defined post-processing function. It is advised that precautions are taken to

optimize this function to ensure acceptable performance.
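The state-passing contract of the post-processor can be illustrated with a short sketch. It departs from the description above in two labeled ways: the constructor takes the record queue directly instead of the switch instance, and records are plain dictionaries rather than Pandas data frames. The class below is therefore a hypothetical stand-in, not the IRIS PostProcessor.

```python
import queue
from typing import Any, Callable, Optional


class PostProcessor:
    """Hypothetical stand-in: applies a user function to each batch of records."""

    def __init__(self,
                 record_queue: "queue.Queue[Any]",
                 func: Callable[[Any, Any], Any],
                 state: Any = None) -> None:
        self.record_queue = record_queue
        self.func = func
        self.state = {} if state is None else state  # empty state by default

    def get_next(self, timeout: Optional[float] = None) -> Any:
        """Take the next batch from the queue and fold it into the state."""
        records = self.record_queue.get(timeout=timeout)
        # The user function receives the batch and the current state and
        # must return the updated state.
        self.state = self.func(records, self.state)
        return self.state


def total_packets(records, state):
    """Example user function: keep a running packet total across epochs."""
    state["packets"] = state.get("packets", 0) + sum(r["count"] for r in records)
    return state
```

Because the user function runs once per batch on the same process that reads the queue, any expensive work inside it directly delays the next call to get_next(), which is the overhead the paragraph above warns about.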

3.5 Adding New Capabilities

To add a new capability to IRIS, the following modifications must happen:

– The Python interface needs to be updated, either by creating a new API

or modifying an existing one, to enable the user to interact with the new

capability.

– The Controller must be modified to generate a JSON representation of the

new capability from the user-defined query.

– The Agent should be modified in two aspects:

1. The JSON server should be able to parse the JSON representation of the

query with the added capability.

2. The agent should then be modified to configure BroadScan to perform

the new capability.

– Based on how BroadScan outputs the results with the added capability, the

controller should modify the IPFIX parser to support the new capability

and pass it to the collector when a query is generated.

3.6 Available Switch Layout

The ONRG has two distinct deployments of the IRIS system: one in an

isolated lab environment and the other co-located with the network infrastructure

of the University of Oregon. Both deployments use Trident3 switches with the

BCM56470 SVK, which contains the BroadScan module.

1. Border deployment: The border switch in this setting gets a mirror feed

of all traffic traversing the border of the campus of the University of Oregon.

While it is not used for actual switching and forwarding operations, it can

be used to monitor campus traffic via its BroadScan module. In this setting,

the remote server is deployed on a virtual machine (VM), which introduces

the overhead of shared resources. Moreover, instead of a 40Gb Ethernet link, the

remote server and the border switch communicate through a VLAN. Figure 9

shows the border setup of IRIS.

2. Lab deployment: The lab switch in this setting is completely isolated and

has no real traffic running through it. This setting is typically used for testing

purposes with synthetic traffic, which can be generated with libraries such as Scapy, replayed from packet traces with tools such as tcpreplay, or mirrored from another source. The programmable lab switch is connected to the local network through a bottom-of-the-rack switch, and an additional 40Gbps link directly connects it to the remote server to allow for optimal speed when sending queries and receiving exported IPFIX records. This link can be used to send the synthetic traffic to the switch as well. This switch does not perform any actual forwarding action. In addition, the remote server has a bare-metal setup, allowing maximum utilization of its resources. Figure 10 demonstrates this setup.

Figure 9. Border Deployment of IRIS (the IRIS border switch receives a mirror feed of the traffic between the UO campus switch network and the Internet, and communicates with the remote server VM over a VLAN)

Figure 10. Lab Deployment of IRIS (the IRIS lab switch receives synthetic traffic and is connected to the remote server by a 40 Gb link)

Each switch executes a separate instance of IRIS and is managed independently. The lab switch is mostly used for testing purposes with synthetic traffic during software development. The border switch, on the other hand, can be used for feasibility checks during the software development process as well as for actual telemetry purposes.

3.7 Required System Configurations

To ensure IRIS can function to the best of its abilities, there are certain

environment factors that have to be considered:

– Interface MTU: The IPFIX packets received by the collector from

BroadScan can contain up to 64 records, each record 128 bytes long, as

configured in the current IRIS settings. Including the added headers, the

IPv4 packets carrying these IPFIX packets can be as large as 8250 bytes. If

the MTU is smaller than the size required to accommodate these packets,

they will not be delivered to the collector. Therefore, all interfaces between

BroadScan and the collector must have their MTU set to a value large enough

to ensure these packets can be delivered successfully.

– UDP socket receive buffer: Given the fast speed of the IRP and the

slower speeds of the parser and post-processor described in section 3.4, it is

essential to set the UDP socket receive buffer size to a sufficiently large value.

This ensures that the parser and post-processor have enough time to read the

packets and prevents them from being dropped. The UDP receive buffer size

of the remote servers is currently at 39321600 bytes.

– Remote server and switch variables: To enable communication between

the remote server and the switch for sending and receiving queries and IPFIX

packets, details such as their IP addresses, MAC addresses, designated ports,

and other relevant information must be provided for each new environment.

This is done by creating a binary file containing these details and placing a

copy on both the SVK and the remote server.
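The sizes quoted in this section can be related to each other with a short calculation. The constants below are taken from the text; the derived values are simple arithmetic, and the comparison with the receive buffer ignores the per-packet bookkeeping overhead that an operating system adds, so it should be read as a rough estimate only. The set_receive_buffer() helper is a hypothetical name.

```python
import math
import socket

RECORDS_PER_PACKET = 64        # records per IPFIX packet (current IRIS settings)
RECORD_LEN = 128               # bytes per record
MAX_PACKET_LEN = 8250          # largest IPv4 packet carrying an IPFIX packet
FLOW_TABLE_ENTRIES = 262143    # maximum number of tracked flows
RCVBUF_BYTES = 39321600        # receive buffer size of the remote servers

record_bytes = RECORDS_PER_PACKET * RECORD_LEN    # bytes of records per packet
header_bytes = MAX_PACKET_LEN - record_bytes      # bytes left for the added headers
packets_per_full_export = math.ceil(FLOW_TABLE_ENTRIES / RECORDS_PER_PACKET)
burst_bytes = packets_per_full_export * MAX_PACKET_LEN  # one full-table export


def set_receive_buffer(sock: socket.socket, nbytes: int = RCVBUF_BYTES) -> int:
    """Request a receive buffer size and return the size the system reports."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    # The operating system may cap or adjust the request, so read it back.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

With these numbers, a full packet carries 8192 bytes of records, a full-table export needs 4096 packets, and the resulting burst of roughly 33.8 MB is smaller than the configured receive buffer. Reading the buffer size back after setting it is useful because the system-wide limit may silently reduce the requested value.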

CHAPTER IV

EVALUATION

This chapter aims to evaluate IRIS’s performance in multiple ways. First,

its ability to configure BroadScan to its maximum capacity is tested. Afterward,

a series of experiments are performed to show IRIS’s collector performance, as it

contains a few bottlenecks. Lastly, a few telemetry use cases are deployed on IRIS

to show its potential as a telemetry solution.

4.1 BroadScan Flow Table Capacity

IRIS should allow users to use BroadScan to monitor the maximum number

of flows it can track, which is 262143, fill the entire width of the flow table (256 bits

of data to update), and collect all results that the switch exports.

To fill all available entries in the flow table, the following query can be

used with custom synthetic traffic sent to the isolated lab switch. The

query tracks all 5-tuple flows and reports the number of packets

per flow every five seconds.

q = Query(q_name) \
.group_by([

ipv4_src(mask=0xFFFFFFFF),
ipv4_dst(mask=0xFFFFFFFF),
src_port(mask=0xFFFF),
dst_port(mask=0xFFFF),
protocol(mask=0xFF)

]) \
.process([count()]) \
.export(periodic(5))

To assess if the maximum number of flows can be tracked, synthetic traffic

including 280 thousand flows was sent to the switch. Given that this number is

larger than the maximum size of the flow table, the switch should report the packet

count for the 262143 flows it can track per epoch.

The above query was executed over 20 five-second epochs, and the expected

262143 flows were observed to be reported every epoch.

IRIS should also be able to track as many reducers as are needed to fill the width of

the data section of the flow table. Since the data section of the flow table is 256

bits wide, it should be able to fit eight 32-bit reducers. The following query, which targets

the sizes of the packets in a flow, can be used to evaluate the feasibility of this:

q = Query(q_name) \
.group_by([

ipv4_src(mask=0xFFFFFFFF),
ipv4_dst(mask=0xFFFFFFFF),
src_port(mask=0xFFFF),
dst_port(mask=0xFFFF),
protocol(mask=0xFF)]) \

.process([
count(),
count(ipv4_len().in_range(100, 1000)),
byte_count(),
min_value(ipv4_len()),
ewma_value(ipv4_len()),
max_value(ipv4_len()),
latest_value(ipv4_len()),
add_value(ipv4_len())]) \

.export(periodic(EPOCH_LEN))

For evaluation purposes, synthetic traffic consisting of packets from a single

flow, with payload sizes drawn from a normal distribution, was sent for twenty

epochs to the lab switch to monitor whether all reducers could effectively track the

defined byte size statistics. Results presented in Figure 11 show IRIS can configure

BroadScan to track the maximum number of reducers it can fit in the flow table,

receive exported records, and produce results.

It should be mentioned that IRIS is able to use all flow table entries with

reducers filling the flow table to the maximum width at the same time as well.

Figure 11. Flow Byte Count and Packet Count Analysis (x-axis: epoch, 0 to 19; left y-axis: bytes; right y-axis: count; series: total bytes; min, EWMA, max, latest-packet, and cumulative IPv4 header length; total packet count; count of packets with 100 to 1000 IPv4 header bytes)

4.2 Collector Performance Analysis

To evaluate the performance of IRIS’s collector, which contains the majority of IRIS’s bottlenecks, this section presents statistics on IPFIX record collection time, IPFIX record loss percentage, parsing time, and post-processing time. These statistics are automatically presented by IRIS after each stop of the Switch instance. The query selected for these assessments is a simple one that reports the packet count per five-tuple flow every five seconds with minimal post-processing, with the goal of demonstrating the actual overhead of IRIS without the influence of intensive custom post-processing methods. The query used on IRIS for these assessments is presented here.

q = Query(q_name) \
.group_by([

ipv4_src(mask=0xFFFFFFFF),
ipv4_dst(mask=0xFFFFFFFF),
src_port(mask=0xFFFF),
dst_port(mask=0xFFFF),
protocol(mask=0xFF)

]) \
.process([count()]) \
.export(periodic(EPOCH_LEN))

To have a controlled assessment of IRIS with ground truth available for

comparison, synthetic traffic is used on the lab switch. The traffic is designed to

have a set number of flows, ranging from 20k to 280k, with 20k increments per

experiment. For each experiment, the target number of distinct flows is created

using the Scapy library in Python, and the flows are then exported as pcap files,

prepared to be sent to the switch using the tcpreplay tool. On the lab switch, the

query is executed for ten epochs per experiment.
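The experiment sweep described above can be sketched in plain Python. The snippet only enumerates the flow counts and generates distinct five-tuples; building the actual packets with Scapy, writing the pcap files, and replaying them with tcpreplay are left out. The distinct_flows() helper and its default addresses and ports are assumptions made for illustration.

```python
import ipaddress
from typing import Iterator, Tuple

FlowKey = Tuple[str, str, int, int, int]  # src IP, dst IP, src port, dst port, protocol


def distinct_flows(n: int,
                   base_src: str = "10.0.0.0",
                   dst: str = "192.168.0.1",
                   src_port: int = 40000,
                   dst_port: int = 80,
                   proto: int = 6) -> Iterator[FlowKey]:
    """Yield n distinct five-tuples by varying only the source IP address."""
    base = int(ipaddress.IPv4Address(base_src))
    for i in range(n):
        yield (str(ipaddress.IPv4Address(base + i)), dst, src_port, dst_port, proto)


# Flow counts from 20k to 280k in 20k increments, one per experiment.
flow_counts = list(range(20_000, 280_001, 20_000))
```

Varying a single field is enough to make every five-tuple distinct, which keeps the ground truth for each experiment trivial to state: the number of generated keys is the number of flows the switch should learn, up to the capacity of the flow table.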

To assess IRIS’s performance in a real-world deployment, an experiment is

also conducted on the border switch. The traffic in this experiment is real traffic fed to

the switch from the border of the University of Oregon and cannot be controlled.

For that reason, the query is simply executed for 100 consecutive epochs instead to

observe results.

4.2.1 IPFIX Record Collection Time. The IPFIX record collection

time refers to the time it takes IRIS to read all exported records from the IPFIX

receiver socket set up by the controller. To assess the performance of this module in the

collector, the IPFIX record collection time is measured and reported based on the

total number of records per epoch.

The first experiment for measuring IPFIX record collection time was carried

out on the lab switch. The query was executed over ten epochs for each

record count to obtain a more accurate measurement. Figure 12 illustrates the

results of this experiment.

The second experiment was conducted on the border switch using the same

query, except that IPv4 length was added to its groupers to increase the number

of flows stored in the flow table, and the query was executed for 100 epochs. Figure

13 shows the results of the experiment on the border switch.

Figure 12. Lab Switch IPFIX Record Collection Time Based on Record Count (x-axis: IPFIX record count; y-axis: IPFIX receiving time in seconds)

Figure 13. Border Switch IPFIX Record Collection Time Based on Record Count (x-axis: IPFIX record count; y-axis: IPFIX receiving time in seconds)

The figures suggest that the duration of the record collection, although typically increasing with the record count, is rather variable even for the same number of records. This can be due to the burstiness of IPFIX packet arrival. However, even in the worst-case scenario in this experiment, when the lab switch exports the maximum number of flows that fit in the flow table, IRIS takes just under 130 milliseconds to collect all IPFIX records, showcasing its efficiency.

4.2.2 IPFIX Record Loss Percentage. IPFIX packet loss in IRIS

can occur due to reasons such as congestion on the network. In the current lab

switch setup, where the switch is connected to a dedicated 40Gbps link, packet loss

is extremely rare and is therefore not reported. However, packet loss is possible

on the border switch. Figure 14 shows the percentage

of IPFIX records lost per epoch in this experiment, while Figure 15 illustrates the

record loss percentage based on total record count.

Figure 14. Border Switch IPFIX Record Loss Percentage Per Epoch (x-axis: epoch; y-axis: IPFIX record loss (%); average loss: 10.04%)

Figure 15. Border Switch IPFIX Record Loss Percentage Based on Record Count (x-axis: IPFIX record count; y-axis: IPFIX record loss (%); average loss: 10.04%)

This record loss is most likely due to the VLAN path between the border switch and its

remote server, in contrast to the direct, dedicated 40Gb link used in the lab.

4.2.3 IPFIX Record Parsing Time. With IRIS’s settings,

an IPFIX packet can contain a maximum of 64 records, and the triggering of

a periodic or threshold-based export on the switch may result in thousands

of IPFIX packets being sent to the collector. While the previous experiment

demonstrated that receiving these records from the IPFIX receiver socket buffer

is not particularly time-consuming, the parsing process involves extracting the

records from all the collected IPFIX packets and formatting them according to the

structure of each query that was sent. Although it is implemented in Cython

for better performance, this step can take significant time.
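The extraction step can be pictured with a pure-Python sketch that cuts the record portion of one IPFIX packet into fixed-length records. It assumes that the packet headers have already been stripped and that every record is 128 bytes long, as stated in Section 3.7. It does not decode individual fields, since their layout depends on the query, and it is not the Cython parser used by IRIS; split_records() is a hypothetical name.

```python
from typing import List

RECORD_LEN = 128   # bytes per record in the current IRIS settings
MAX_RECORDS = 64   # records per IPFIX packet in the current IRIS settings


def split_records(record_bytes: bytes, record_len: int = RECORD_LEN) -> List[bytes]:
    """Split the record portion of one packet into fixed-length records."""
    if len(record_bytes) % record_len != 0:
        raise ValueError("record section is not a whole number of records")
    if len(record_bytes) // record_len > MAX_RECORDS:
        raise ValueError("more records than one packet can carry")
    return [record_bytes[i:i + record_len]
            for i in range(0, len(record_bytes), record_len)]
```

Even in this reduced form, the work grows linearly with the number of records, and a full flow table means repeating it for thousands of packets per epoch, which is consistent with parsing being the dominant cost in the collector.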

To evaluate IRIS’s performance during the parsing process, the parsing time

of IPFIX records was measured while executing the query on the lab switch. Figure

16 illustrates how the parsing time increases with the added number of records per

epoch. The figure suggests that the parsing time for records is very dependent on

the number of records, and can accumulate to significant amounts, which could

cause IRIS to fall behind in scenarios where the flow table is nearly full, especially

if the epoch length is shorter than five seconds.

Figure 17 presents the parsing time of the query when executed on the

border switch. The figure shows a pattern of parsing time increasing as the record

count grows, similar to the lab switch. Additionally, within the observed range of

record counts, the parsing time on the border switch appears consistent with the

parsing time measured on the lab switch for a similar number of records.

Figure 16. Lab Switch IPFIX Record Parsing Time Based on Record Count (x-axis: IPFIX record count; y-axis: IPFIX parsing time in seconds)

Figure 17. Border Switch IPFIX Record Parsing Time Based on Record Count (x-axis: IPFIX record count; y-axis: IPFIX parsing time in seconds)

This experiment shows that IRIS’s IPFIX parsing can take just over 4 seconds when the flow table is full (the maximum number of entries in the flow table is 262143), meaning that the main parsing process does not fall behind the receiving process with epoch lengths of 5 seconds or more.

4.2.4 IPFIX Record Post-Processing Time. Post-processing in IRIS is highly query-dependent, as it can be fully customized by the user. For this reason, to show the actual overhead of IRIS itself, the query selected for these experiments has little to no post-processing involved. Therefore, running the post-processing function here is quite quick.

To illustrate this point, the query was executed on the lab switch while synthetic traffic with a varying number of flows per second was sent to the switch, with results presented in Figure 18. The experiment was then repeated on the border switch, with results shown in Figure 19.

[Figure 18 plot. x-axis: IPFIX Record Count (0 to 260000); y-axis: IPFIX Post-Processing Time (s) (0.0 to 0.4)]

Figure 18. Lab Switch IPFIX Record Post-processing Time Based on Record Count

[Figure 19 plot. x-axis: IPFIX Record Count (0 to 260000); y-axis: IPFIX Post-Processing Time (s) (0.0 to 0.4)]

Figure 19. Border Switch IPFIX Record Post-processing Time Based on Record Count

Figure 18 presents the results of this experiment on the lab switch, showing
that although it depends on the number of records, post-processing in its
simplest form takes only a fraction of a second to complete.

Figure 19 shows that on the border switch, post-processing time for the same
number of records can vary greatly. This can be attributed to the virtual
machine (VM) deployment of the border remote server, which introduces
additional overheads compared to the bare-metal implementation of the lab
remote server.

4.3 Example Use Cases

This section demonstrates several use cases for IRIS and shows how IRIS can
be used to implement each of them.

4.3.1 Tuned Byte Count per Source and Destination IPs. This
example calculates the byte count per source and destination IP pair, with the
epoch length tuned to monitor up to 220k pairs, reports it periodically, and
visualizes the results.

It uses a simple query to count bytes and average packet count per source IP
and destination IP pair and reports them periodically. The query is first
quick-tuned: starting from an initial epoch length of 2 seconds, tuning selects
an epoch length (less than or equal to the initial 2 seconds) that results in
fewer records per epoch than the target record count. After the appropriate
epoch length is determined, the query is executed for 60 epochs.
Post-processing for this execution aggregates the results to show the success
of the tuning by plotting the number of learned source IP and destination IP
pairs per epoch, which corresponds to the number of records. All results are
visualized using VIZ.

q = Query(q_name) \
    .group_by([ipv4_src(mask=0xFFFFFFFF), ipv4_dst(mask=0xFFFFFFFF)]) \
    .process([byte_count()]) \
    .export(periodic(EPOCH_LEN))
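
To make the idea of epoch tuning concrete, the following is a minimal sketch of one possible tuning loop. It is not IRIS's tuning algorithm, which is not reproduced here: the halving strategy, the function names `quick_tune` and `simulated_switch`, the `min_epoch_len` floor, and the toy traffic model are all assumptions made for illustration. The only ideas taken from the text are that tuning starts from an initial epoch length, never exceeds it, observes several epochs per state, and stops once the per-epoch record count is below the target.

```python
from typing import Callable


def quick_tune(measure: Callable[[float], int],
               initial_epoch_len: float,
               target_records: int,
               epochs_per_state: int = 3,
               min_epoch_len: float = 0.125) -> float:
    """Halve the epoch length until every observed epoch stays under target.

    `measure(epoch_len)` is a stand-in for running the query for one epoch
    and returning the number of records reported. If no tested epoch length
    satisfies the target, `min_epoch_len` is returned as a floor.
    """
    epoch_len = initial_epoch_len
    while epoch_len > min_epoch_len:
        # Observe several epochs in the current state before deciding.
        counts = [measure(epoch_len) for _ in range(epochs_per_state)]
        if max(counts) < target_records:
            return epoch_len
        epoch_len /= 2  # ASSUMED strategy: shorter epochs see fewer pairs
    return min_epoch_len


def simulated_switch(epoch_len: float) -> int:
    """Toy traffic model: new pairs appear at a fixed rate, capped at 270k."""
    pairs_per_second = 150_000
    return min(270_000, int(pairs_per_second * epoch_len))


if __name__ == "__main__":
    tuned = quick_tune(simulated_switch, initial_epoch_len=2.0,
                       target_records=220_000)
    print(f"tuned epoch length: {tuned} s")
```

In this toy model a 2 second epoch reports all 270k pairs, so the loop halves the epoch once and settles on 1 second, where the simulated record count falls below the 220k target.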

To test this query, it was first executed on the lab switch. The synthetic
traffic consists of packets from 270k different source IP and destination IP
pairs, continuously sent to the switch. As set in the program, the initial
epoch length for the query is two seconds, and the maximum target record count
is 220k. Figures 20 and 21 show the epoch tuning process and the run of the
updated query, respectively.

Figure 20. Query Tuning Progress

Figure 21. Time Series of Record Count for the Executed Tuned Query

This program was then executed on the border switch. Based on the network
conditions, the target record count was changed to 35k. Moreover, the number
of epochs per state was increased to 10 for more stability. The results of
this experiment are shown in Figures 22 and 23 for the epoch tuning process
and the run of the updated query, respectively.

Figure 22. Query Tuning Progress on the Border Switch

Figure 23. Time Series of Record Count for the Executed Tuned Query on the
Border Switch

4.3.2 Top Destination Host by Prefix Zooming. IRIS provides
the flexibility to update the configurations on BroadScan on the fly. To
demonstrate this capability, this example use case uses a prefix zooming
strategy to find the top destination host based on byte count. Such a strategy
can be useful for networks with too many connections to keep in the flow
table, as only a maximum of 255 flows needs to be tracked at a time. The query
used for this use case and the pseudocode for the zooming algorithm are shown
below.

q = Query(q_name) \
    .select([ipv4_dst(str(dst_prefix))]) \
    .group_by([
        ipv4_dst(
            mask=int(next(
                dst_prefix.subnets(BITS_PER_ITERATION)
            ).netmask)
        )
    ]) \
    .process(byte_count()) \
    .export(periodic(1))
s.run(q)

– Initialize dst_prefix ← 0.0.0.0/0
– Initialize BITS_PER_ITERATION ← 8
– Do
    * Define the Query object with the current dst_prefix
        · Select destination IPs masked based on dst_prefix
        · Group destination IPs masked based on dst_prefix but with its
          prefix length + BITS_PER_ITERATION
    * top_host ← Top destination IP with the highest byte count extracted
      from the query results
    * Update dst_prefix to the next subnet based on top_host and:
        · Increase the prefix length by BITS_PER_ITERATION
        · Generate the next subnet from dst_prefix with the updated prefix
          length
– While the prefix length of dst_prefix is less than 32
– Return top_host
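
The zooming loop above can be sketched in plain Python as follows. This is an offline illustration under stated assumptions, not the IRIS program: the switch is replaced by a hypothetical `bytes_per_subnet` function that aggregates a dictionary of per-destination byte counts, and the sample traffic is invented. Only the standard-library `ipaddress` module is used.

```python
import ipaddress
from collections import defaultdict

BITS_PER_ITERATION = 8


def bytes_per_subnet(traffic, dst_prefix, new_prefix_len):
    """Stand-in for one query epoch: sum bytes per subnet inside dst_prefix."""
    totals = defaultdict(int)
    for dst, byte_count in traffic.items():
        if ipaddress.ip_address(dst) in dst_prefix:  # mirrors select(...)
            # Mirrors group_by(...): mask the address to the longer prefix.
            subnet = ipaddress.ip_network(f"{dst}/{new_prefix_len}",
                                          strict=False)
            totals[subnet] += byte_count
    return totals


def zoom_top_destination(traffic):
    """Greedily follow the heaviest subnet until a /32 host is reached."""
    dst_prefix = ipaddress.ip_network("0.0.0.0/0")
    while dst_prefix.prefixlen < 32:
        new_len = dst_prefix.prefixlen + BITS_PER_ITERATION
        totals = bytes_per_subnet(traffic, dst_prefix, new_len)
        dst_prefix = max(totals, key=totals.get)  # assumes traffic is present
    return dst_prefix.network_address


if __name__ == "__main__":
    # Invented sample traffic: destination IP -> bytes observed.
    sample = {
        "10.10.9.10": 9000, "10.10.9.11": 100, "10.10.3.7": 300,
        "10.20.1.1": 500, "192.168.1.5": 800,
    }
    print(zoom_top_destination(sample))
```

Each iteration keeps at most 2^BITS_PER_ITERATION groups, which is what keeps the number of tracked records small. Note that this greedy descent follows the heaviest aggregate at each level, so it identifies the top host reliably only when that host's traffic dominates its enclosing prefixes, as it does in the sample above.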

To test this use case, the program was deployed on the isolated switch with
synthetic traffic consisting of packets continuously sent to 160000 different
destination IP addresses, with those sent to the destination IP 10.10.9.10
carrying the heaviest payloads. With this strategy, BroadScan has to monitor
and report only up to 20 records each epoch, instead of all 160000. This not
only allows IRIS to operate in networks that have more flows than BroadScan’s
flow table is able to keep track of, but also reduces the collection and
post-processing overhead. The top prefix selected at each step is shown in
Table 1, which confirms that the correct destination IP address was
identified.

epoch   top prefix

0       10.0.0.0
1       10.10.0.0
2       10.10.9.0
3       10.10.9.10

Table 1. Top Destination Prefix Based on Byte Count per Epoch with the Zooming Strategy

4.3.3 Anomaly Detection. IRIS’s Python interface allows seamless
integration with other Python libraries to build more complex programs. In this
use case, we demonstrate this capability by employing a Long Short-Term Memory
(LSTM) machine learning model, built using TensorFlow and Keras. The model is
trained on flow-level features: flow count, average IPG, average packet count,
average byte count, total packet count, and total byte count. IRIS generates
these same features, which are then passed to the model for anomaly detection
by calculating the reconstruction error. Since this model uses a context
length of 512 and a prediction length of 196, the experiment in this use case
was carried out for 800 epochs with an epoch length of 1 second. The
implemented Query and a section of the post-processing function that generates
the required features for this experiment are presented below.

import pandas as pd

# Track each 5-tuple flow and report three reducers per flow:
# maximum IPG, packet count, and byte count.
q = Query(q_name) \
    .group_by([
        ipv4_src(mask=0xFFFFFFFF),
        ipv4_dst(mask=0xFFFFFFFF),
        src_port(mask=0xFFFF),
        dst_port(mask=0xFFFF),
        protocol(mask=0xFF)
    ]) \
    .process([max_value(ipg()), count(), byte_count()]) \
    .export(periodic(EPOCH_LEN))

def post(state, x):
    # Records reported for this query.
    df = x[q.id]
    # Collapse the records of each epoch into one row of model features.
    res = df.groupby('epoch', as_index=False).agg(
        time=('time', 'first'),
        flow_cnt=('time', 'count'),
        avg_ipg=('reducer_0', 'mean'),
        avg_pkt_cnt=('reducer_1', 'mean'),
        avg_byte_cnt=('reducer_2', 'mean'),
        tot_pkt_cnt=('reducer_1', 'sum'),
        tot_byte_cnt=('reducer_2', 'sum'),
    ).reset_index(drop=True)
    # Append the new feature rows to the accumulated state.
    state = pd.concat([state, res])
    ...

In the initial test on the lab switch, synthetic traffic consisting of packets
from 32k unique 5-tuple flows was sent to the switch using tcpreplay. Figure 24
shows the result of running the code described above in this scenario, with
detected anomalies colored red.
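
The step from model output to flagged epochs can be illustrated with a small, dependency-free sketch. It does not reproduce the LSTM model or the thesis's detection rule: the error metric (mean absolute error across features) and the threshold rule (mean plus `k` standard deviations) are assumptions chosen for illustration, and `mean_abs_error` and `flag_anomalies` are hypothetical names.

```python
from statistics import mean, pstdev


def mean_abs_error(predicted, observed):
    """Mean absolute error between two equal-length feature vectors."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def flag_anomalies(errors, k=3.0):
    """Return indices of epochs whose error exceeds mean + k * std.

    ASSUMED rule for illustration; the actual criterion may differ.
    """
    threshold = mean(errors) + k * pstdev(errors)
    return [i for i, e in enumerate(errors) if e > threshold]


if __name__ == "__main__":
    # Invented per-epoch errors: steady background with one spike.
    errors = [0.1] * 20 + [4.0]
    print(flag_anomalies(errors))
```

With a per-epoch error series such as the one plotted in Figure 24, a rule of this kind marks the epochs whose error stands out from the rest of the series.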

[Figure 24 plot. Title: Mean Error Values per Epoch with Anomaly Annotations; x-axis: Epoch (500 to 800); y-axis: Mean Error (0.0 to 4.0)]

Figure 24. Lab Switch Anomaly Detection

For the next test, the query was executed on the border switch. The rest of
the program is the same as the one shown above. Figure 25 shows the anomalies
detected in campus traffic.

4.3.4 TTL Histogram. IRIS’s reducers can be used to produce
histograms of various fields. This example use case presents a query that
periodically reports the histogram of TTL values for TCP flows with
destination port 443 (HTTPS).

histogram = [count(ttl().in_range(x, x + 31)) for x in range(0, 255, 32)]
q = Query() \
    .select([protocol(6), dst_port(443)]) \
    .group_by([ipv4_src(mask=0x00000000)]) \
    .process(histogram) \
    .export(periodic(1))
s.run(q)
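
The binning performed by the list comprehension above can be reproduced offline to see which TTL values land in which counter. The sketch below is plain Python rather than IRIS code: `ttl_bins` and `ttl_histogram` are hypothetical helpers, the sample TTL values are invented, and the `ttl_<low>_<high>` labels are only chosen to resemble the bin labels in Figure 26.

```python
def ttl_bins(width=32, max_ttl=255):
    """Return inclusive (low, high) ranges, mirroring range(0, 255, 32)."""
    return [(low, low + width - 1) for low in range(0, max_ttl, width)]


def ttl_histogram(ttls, width=32):
    """Count TTL values (0 to 255) per bin, keyed like 'ttl_32_63'."""
    counts = {f"ttl_{lo}_{hi}": 0 for lo, hi in ttl_bins(width)}
    for ttl in ttls:
        lo = (ttl // width) * width  # lower edge of the bin holding ttl
        counts[f"ttl_{lo}_{lo + width - 1}"] += 1
    return counts


if __name__ == "__main__":
    # Invented sample: packets that started at TTL 64 or 128 and crossed
    # a few hops before being observed.
    observed = [57, 60, 52, 118, 121, 57]
    for label, n in ttl_histogram(observed).items():
        print(label, n)
```

With a bin width of 32 there are eight counters covering TTL values 0 through 255, matching the eight reducers that the query defines.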

The query was executed on the border switch to analyze the distribution of

TTL values in campus traffic. The resulting TTL histogram is shown in Figure 26.

[Figure 25 plot. Title: Mean Error Values per Epoch with Anomaly Annotations; x-axis: Epoch (500 to 800); y-axis: Mean Error (0.0 to 4.0)]

Figure 25. Border Switch Anomaly Detection

The observed values align with TTL values typically seen in networks: the
default initial TTL settings used by Linux (64) and Windows (128) hosts place
most packets in the 32 to 63 and 96 to 127 ranges after a few hops.

[Figure 26 plot. Title: TTL Histogram; x-axis: TTL Range; y-axis: Packet Count (×1e6). Bar labels, in bin order: ttl_0_31: 389; ttl_32_63: 3345347; ttl_64_95: 527; ttl_96_127: 499685; ttl_128_159: 499; ttl_160_191: 741; ttl_192_223: 51; ttl_224_255: 3725]

Figure 26. Border Switch TTL Histogram

CHAPTER V

CONCLUSION

Current telemetry solutions, while effective, are often limited: they either
focus on specific aspects rather than providing a complete end-to-end
solution, require extensive knowledge of the underlying technology and of data
plane programming, rely on P4-based implementations that are not deployable on
Broadcom ASICs, or lack real-world deployments.

To address this gap, this work introduces IRIS, a ready-to-use platform for
interactive and dynamic programmable telemetry built on top of the BroadScan
module in Broadcom ASICs. IRIS allows users to define high-level switch
configurations, translates them into actual hardware configurations on the
ASIC by directly managing register and memory values, and collects results
from BroadScan. Evaluations show that IRIS can provide real-time results for
custom user queries with epoch durations over 5 seconds.

5.1 Future Work

IRIS can be improved in several aspects:

– Expanding the list of supported BroadScan features in the interface

– Adding security features to IRIS and using a more secure method of
communication between the remote server and the agent

– Adding multi-client support to the agent

– Improving performance for parsing IPFIX records

– Improving the border switch configurations to minimize IPFIX record loss

REFERENCES CITED

[1] Wireshark. https://www.wireshark.org/.

[2] Broadcom. Openbcm broadcom core switch software development kit (sdk).
https://github.com/Broadcom-Network-Switching-Software/OpenBCM.

[3] Broadcom. Strataxgs switch solutions. https://www.broadcom.com/
products/ethernet-connectivity/switching/strataxgs.

[4] Chen, X., Landau-Feibish, S., Braverman, M., and Rexford, J.
Beaucoup: Answering many network traffic queries, one memory update at a
time. In Proceedings of the Annual conference of the ACM Special Interest
Group on Data Communication on the applications, technologies,
architectures, and protocols for computer communication (2020),
pp. 226–239.

[5] Claise, B. Cisco systems netflow services export version 9. Tech. rep., 2004.

[6] Gupta, A., Harrison, R., Canini, M., Feamster, N., Rexford, J., and
Willinger, W. Sonata: Query-driven streaming network telemetry. In
Proceedings of the 2018 conference of the ACM special interest group on data
communication (2018), pp. 357–371.

[7] Harrington, D., Wijnen, B., and Presuhn, R. An Architecture for
Describing Simple Network Management Protocol (SNMP) Management
Frameworks. RFC 3411, Dec. 2002.

[8] Huang, Q., Sheng, S., Chen, X., Bao, Y., Zhang, R., Xu, Y., and
Zhang, G. Toward {Nearly-Zero-Error} sketching via compressive sensing.
In 18th USENIX Symposium on Networked Systems Design and
Implementation (NSDI 21) (2021), pp. 1027–1044.

[9] Huang, Q., Sun, H., Lee, P. P., Bai, W., Zhu, F., and Bao, Y.
Omnimon: Re-architecting network telemetry with resource efficiency and
full accuracy. In Proceedings of the Annual conference of the ACM Special
Interest Group on Data Communication on the applications, technologies,
architectures, and protocols for computer communication (2020),
pp. 404–421.

[10] Liu, Z., Mah, B., Kumar, Y., Guok, C., and Cziva, R. Programmable
per-packet network telemetry: From wire to kafka at scale. In Proceedings of
the 2021 on Systems and Network Telemetry and Analytics. 2020, pp. 33–36.

61

https://www.wireshark.org/
https://github.com/Broadcom-Network-Switching-Software/OpenBCM
https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs
https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs


[11] Michel, O., Bifulco, R., Retvari, G., and Schmid, S. The
programmable data plane: Abstractions, architectures, algorithms, and
applications. ACM Computing Surveys (CSUR) 54, 4 (2021), 1–36.

[12] Misa, C., Durairajan, R., Rejaie, R., and Willinger, W. Dynatos +:
A network telemetry system for dynamic traffic and query workloads.
IEEE/ACM Transactions on Networking (2024).

[13] Misa, C., O’Connor, W., Durairajan, R., Rejaie, R., and Willinger,
W. Dynamic scheduling of approximate telemetry queries. In 19th USENIX
Symposium on Networked Systems Design and Implementation (NSDI 22)
(2022), pp. 701–717.

[14] Narayana, S., Sivaraman, A., Nathan, V., Goyal, P., Arun, V.,
Alizadeh, M., Jeyakumar, V., and Kim, C. Language-directed
hardware design for network performance monitoring. In Proceedings of the
conference of the ACM special interest group on data communication (2017),
pp. 85–98.

[15] Ponomarev, S., and Atkison, T. Industrial control system network
intrusion detection by telemetry analysis. IEEE Transactions on Dependable
and Secure Computing 13, 2 (2015), 252–260.

[16] Ponomarev, S., Wallace, N., and Atkison, T. Detection of ssh host
spoofing in control systems through network telemetry analysis. In
Proceedings of the 9th Annual Cyber and Information Security Research
Conference (2014), pp. 21–24.

[17] Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and Wang, A.
Network Telemetry Framework. RFC 9232, May 2022.

[18] Sun, H., Huang, Q., Lee, P. P., Bai, W., Zhu, F., and Bao, Y.
Distributed network telemetry with resource efficiency and full accuracy.
IEEE/ACM Transactions on Networking 32, 3 (2024), 1857–1872.

[19] Zhou, Y., Sun, C., Liu, H. H., Miao, R., Bai, S., Li, B., Zheng, Z.,
Zhu, L., Shen, Z., Xi, Y., et al. Flow event telemetry on programmable
data plane. In Proceedings of the Annual conference of the ACM Special
Interest Group on Data Communication on the applications, technologies,
architectures, and protocols for computer communication (2020), pp. 76–89.

62

