ON THE MULTI-FRACTAL NATURE OF OBSERVED IP ADDRESSES IN MEASURED INTERNET TRAFFIC by WALTON G. O’CONNOR A THESIS Presented to the Department of Computer Science and the Division of Graduate Studies of the University of Oregon in partial fulfillment of the requirements for the degree of Master of Science March 2023 THESIS APPROVAL PAGE Student: Walton G. O’Connor Title: On the Multi-Fractal Nature of Observed IP Addresses in Measured Internet Traffic This thesis has been accepted and approved in partial fulfillment of the requirements for the Master of Science degree in the Department of Computer Science by: Reza Rejaie Chair Ram Durairajan Member and Krista Chronister Vice Provost for Graduate Studies Original approval signatures are on file with the University of Oregon Division of Graduate Studies. Degree awarded March 2023 ii © 2023 Walton G. O’Connor All rights reserved. iii THESIS ABSTRACT Walton G. O’Connor Master of Science Department of Computer Science March 2023 Title: On the Multi-Fractal Nature of Observed IP Addresses in Measured Internet Traffic We examine the presence of multifractal properties in the spatial structure of observed IPv4 addresses in measured Internet traffic. A collection of traffic samples from a variety of network settings are assembled and their spatial structures evaluated for multifractal properties using the method of moments approach. We show that all collected traces have properties consistent with multifractal scaling, but that the scaling behaviors vary by trace. We propose mechanisms which may give rise to these behaviors, and then discuss a number of ways by which our empirical finding concerning the spatial structure of observed IP addresses in measured network traffic can be utilized in practice, including its use in modern dataplane network monitor settings, both as a metric to monitor and as a means to increase hardware utilization efficiency. iv ACKNOWLEDGEMENTS I am deeply indebted to Dr. Reza Rejaie and Chris Misa for patiently mentoring me and putting up with my antics for the better part of four years. I have grown as a programmer, an academic, and most importantly as a person under your guidance. Additionally, this would not have been possible without Dr. Walter Willinger, who took an inordinate amount of time out of his life to get the ball rolling on this project and to make sure I was applying mathematics correctly. Finally I would like to thank Dr. Arpit Gupta and his students at UCSB, who kindly took the time to provide access to their network traffic for this project. v TABLE OF CONTENTS Chapter Page I. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1. Traffic of Interest: Definition and Examples . . . . . . . . . . . . 2 1.1.1. Examples of Traffic of Interest . . . . . . . . . . . . . . 2 1.2. Utility of ToIs to Telemetry . . . . . . . . . . . . . . . . . . . 3 1.2.1. Background on Switch Based Telemetry Systems . . . . . . 4 1.2.2. The Case for Leveraging Address Distributions for the Dataplane . . . . . . . . . . . . . . . . . . . . 5 II. CONTRIBUTION . . . . . . . . . . . . . . . . . . . . . . . . 6 III. DATASETS . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.1. Collected Traffic Traces . . . . . . . . . . . . . . . . . . . . 9 IV. METHODOLOGY . . . . . . . . . . . . . . . . . . . . . . . . 12 4.1. Technique to Detect Multifractal Properties . . . . . . . . . . . . 12 4.1.1. Multiresolution Quantities . . . . . . . . . . . . . . . . 12 4.1.2. Structure Function . . . . . . . . . . . . . . . . . . . 13 4.1.3. Partition Function . . . . . . . . . . . . . . . . . . . 13 4.1.4. Final Analysis . . . . . . . . . . . . . . . . . . . . . 14 V. RESULTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 5.1. An Example of Nonfractal Data . . . . . . . . . . . . . . . . . 15 5.2. Multifractal Analysis of a Representative Sample . . . . . . . . . 18 5.3. Aggregate Analysis . . . . . . . . . . . . . . . . . . . . . . 19 5.4. Examination of Method of Moments . . . . . . . . . . . . . . . 21 5.4.1. Structure Function . . . . . . . . . . . . . . . . . . . 21 5.4.2. Partition Function . . . . . . . . . . . . . . . . . . . 22 vi Chapter Page 5.4.3. Interpretation of Moments . . . . . . . . . . . . . . . . 22 5.4.3.1. Heatmaps . . . . . . . . . . . . . . . . . . . 25 5.4.4. Influence of Sparsely Populated Subnets on Negative Moments . . . . . . . . . . . . . . . . . . . 25 VI. PHYSICAL EXPLANATION . . . . . . . . . . . . . . . . . . . 28 6.1. Preliminary Findings . . . . . . . . . . . . . . . . . . . . . 28 6.1.1. Physical Interpretation . . . . . . . . . . . . . . . . . 28 6.1.2. Level Weights . . . . . . . . . . . . . . . . . . . . . 30 6.2. Generative Cascade Model . . . . . . . . . . . . . . . . . . . 31 6.2.1. Cascade Model Results . . . . . . . . . . . . . . . . . 32 6.2.2. Cascade Model Limitations . . . . . . . . . . . . . . . . 34 6.2.3. Beta Regression . . . . . . . . . . . . . . . . . . . . . 35 VII. USE CASES FOR MULTIFRACTAL ADDRESS STRUCTURE . . . . . . . . . . . . . . . . . . . . . . . . . . 36 7.1. Clustered Prefix Tracking . . . . . . . . . . . . . . . . . . . 36 7.2. Spectral Signature . . . . . . . . . . . . . . . . . . . . . . . 39 7.3. Implementation and Viability of Deployment to Dataplane . . . . . 40 VIII.DISCUSSION AND OUTLOOK . . . . . . . . . . . . . . . . . . 42 8.1. Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . 42 8.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 43 IX. CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . 46 REFERENCES CITED . . . . . . . . . . . . . . . . . . . . . . . . 47 vii LIST OF FIGURES Figure Page 1. UO-20220517 structure function graph . . . . . . . . . . . . . . . 13 2. UO-20220517 partition function graph . . . . . . . . . . . . . . . 14 3. Structure and partition function graphs for selected datasets . . . . . . 16 4. Graph of the structure and partition function for 100K IP addresses generated from uniformly random integers (i.e. nonmultifractal data). Note that the partition function is computed between /0 and /16 instead of /8 and /24 . . . . . . . . . 17 5. Graph of the structure and partition function for the first 100K IPs of the MAWI-20210110 trace . . . . . . . . . . . . . . . 18 6. CDF of slopes of partition function for q ∈ [−4,−1] . . . . . . . . . 20 7. CDF of slopes of partition function for q ∈ [0, 4] . . . . . . . . . . . 20 8. CDF of ratio of slope of “bent part” over slope of “flat part” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 9. Heatmap showing the portion of addresses in /8 subnets for the CAIDA-dir-A trace . . . . . . . . . . . . . . . . . . . . . . 25 10. Heatmap showing the portion of addresses in /8 subnets for the MAWI-20210110 trace . . . . . . . . . . . . . . . . . . . . 25 11. Level weights for CAIDA-dir-A . . . . . . . . . . . . . . . . . . 29 12. Level weights for UCSB-20220921 . . . . . . . . . . . . . . . . . 29 13. Level weights for MAWI-20210110 . . . . . . . . . . . . . . . . . 29 14. Level weights for cascade model synthetic data. a = 2, b = 2 . . . . . . 33 15. Structure function for 100K synthetic IPs (a=2, b=2) . . . . . . . . . 33 16. Structure function for 100K synthetic IPs (a=3, b=3) . . . . . . . . . 33 17. Structure function for 100K synthetic IPs (a=0.5, b=0.5) . . . . . . . 33 viii Figure Page 18. Partition function for 100K synthetic IPs (a=2, b=2) . . . . . . . . . 33 19. Partition function for 100K synthetic IPs (a=3, b=3) . . . . . . . . . 33 20. Partition function for 100K synthetic IPs (a=0.5, b=0.5) . . . . . . . 33 21. Regression of beta function parameters for MAWI-20210110 at λ = 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 22. Regression of beta function parameters for MAWI-20210110 at λ = 16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 23. Regression of beta function parameters for MAWI-20210110 at λ = 24 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 24. Structure functions for cascade model data with a = 2 and b = 2 for different numbers of IPs . . . . . . . . . . . . . . . . . . 43 25. Partition functions for cascade model data with a = 2 and b = 2 for different numbers of IPs . . . . . . . . . . . . . . . . . . 44 ix LIST OF TABLES Table Page 1. List of traffic samples . . . . . . . . . . . . . . . . . . . . . . . 11 2. Number of subnets containing 1 IP address for a set of prefix lengths, underlines indicate the shortest prefix length where a subnet had a single active IP in it. τ(0) is included to illustrate the relationship between scaling factor and IP dispersion and provides an estimate of the fractal dimension of the observed set of IPs. . . . . . . . . . . . . . . . . . . . . . 27 x CHAPTER I INTRODUCTION Current work in network automation promises that the ultimate goal of a self-driving network is contingent on the ability to concurrently execute hundreds or thousands of network telemetry queries that collect key information about the network state in a timely manner. Collected information serves as the primary input in to a bevy of network management systems designed to detect, characterize, and counteract all manner of network related problems, in both the security and performance domains, at scale, and near real-time. This trend is made possible by strides in hybrid network telemetry architectures that back line rate hardware telemetry collection (implemented in NICs or programmable switches) with flexible and scalable userspace stream processors. A key limitation of dataplane telemetry hardware is that resources (namely memory, which is typically implemented as expensive SRAM in hardware), are very limited and hotly contested, and hardware runtime reconfigurability and runtime programmability are underwhelming at best compared to the possibilities in software. Consequently, running queries in the dataplane at scale and in real time poses a persistent challenge, especially given the dataplane’s preexisting duties as the backbone of the network. Prior efforts on scaling network telemetry queries have focused mainly on attempting to stretch a limited set of dataplane capacity and capabilities as far as possible to answer a set of telemetry queries to a satisfactory precision while ensuring that packets continue to get forwarded at line rate. We propose an alternative approach to scaling dataplane telemetry queries where, rather than trying to fit more queries on to the same hardware, we instead 1 opt to take advantage of pronounced spatial structures present in the network traffic to reduce the workload required to satisfy the queries being submitted. 1.1 Traffic of Interest: Definition and Examples The observation that the traffic required to answer a telemetry query forms a fraction of the overall traffic is mostly intuitive, with telemetry operators often drawing comparisons between telemetry queries and “finding needles in a haystack”. To put concrete numbers to this observation, in the case of one real world intrusion detection system (IDS), 99% of the traffic processed by the IDS was found to be benign, and 1% or less triggered an alert or was flagged by the IDS. For a given query or set of queries, we define the associated “traffic of interest” (ToI) to be a subset of the overall packet traffic that satisfies those queries’ desire for data (i.e. it is the minimum set of packets required for the query to arrive at the same result that it would have given using the full set of packets). In the context of the IDS example, the ToI would be the set of traffic that the IDS identifies as malicious. 1.1.1 Examples of Traffic of Interest. The size and content of the ToI is highly dependent on the nature of the query it satisfies. On one end of the spectrum, consider the ToI for something specialized like a DNS reflection attack detecting query. The query is really only interested in certain DNS packets, so its ToI would be a subset of all DNS packets. The number of DNS packets is pretty minuscule compared to the overall volume of packets, so in this case the ToI is indeed a needle and the haystack of non-DNS packets can be ignored. On the other end of the spectrum, there are a number of very nonspecific queries that would like to view huge portions of the traffic on a network: the simplest one of these would just be the null query, which returns every packet observed at the collection point. 2 More realistic examples include queries examining all outbound or inbound packets, queries examining all TCP traffic and so on. In these instances, the desired packets comprise such a large portion of the overall traffic that the proverbial needle is now effectively the haystack itself. For a final example of ToI, consider a query examining internet background radiation (i.e. packets destined for an inactive address assigned to an active network). As this query only needs to examine data headed towards IPs with no machine behind them, the query could, for instance, ignore any TCP connections that successfully completed the handshake. Detecting this is nontrivial, but it illustrates the flexibility inherent to our definition of ToI. 1.2 Utility of ToIs to Telemetry As most packets encountered by the dataplane are simply irrelevant to the queries they are running, focusing the hardware on the ToI saves precious dataplane resources. For example, the available state space for modern dataplane telemetry systems constrains the volume of flows or connections they can track, but filtering mechanisms taking advantage of multifractal clustering properties can substantially ease this constraint for certain classes of query. Finding the true set of ToI itself can often only be done once the results of a telemetry query are already known, but the notion of ToI as an ideal query input to strive for remains, and there are a variety of heuristics that can be used to carve subsets out of the traffic data that contain ToI, where the goal of these heuristics is to maximize the portion of traffic being fed into a query that is ToI. One such approach for filtering ToI from background traffic is to try and identify addresses that are generating ToI and exclusively examining their traffic. This approach can reduce the volume of traffic that needs to be examined by the dataplane substantially when the number of ToI-generating addresses is a small fraction of the overall traffic (like in the IDS 3 case presented above), and importantly, filtering by address or subnet is a nearly ubiquitous capability among hardware telemetry systems. 1.2.1 Background on Switch Based Telemetry Systems. Many existing network telemetry systems have converged on expression their queries as a sequence of dataflow operators (e.g., map, filter, reduce, distinct) applied over an incoming windowed stream of packets with some fixed window duration. Sonata [10] provides a set of queries that have become a sort of baseline to benchmark dataplane telemetry systems. A fairly typical example of such a query can be seen in Sonata’s implementation of the superspreader query: Hosts, identified by their sourceIP that send traffic to more than Th distinct destinations in a window of size W seconds are flagged as superspreaders. Executing such a query requires making three decisions: (1) which operator to execute during which period in time, (2) which subset of the traffic (identified by some key, such as sourceIP) to process and for which operators, and (3) how much memory to allocate to each operator for the window. In this scenario, imagine a piece of hardware that concurrently runs dataflow operators over sets of packets belonging to different segments of the IP space, and each operator required some amount of memory be allocated to it. The naive approach would be to give the operator running on each segment of the IP space an equal share of memory. This approach works if ToI generating IPs are distributed evenly between every prefix, but on the other hand the system breaks down once any individual prefix outgrows its allocated memory, even if the overall volume of data could easily fit in the memory. A more intelligent system would be aware that the prefixes containing ToI-generating IPs are often not distributed evenly, and would instead have some mechanism to track which prefixes generated ToI for each past query and allocate memory to operators accordingly. 4 1.2.2 The Case for Leveraging Address Distributions for the Dataplane. Preliminary empirical evidence points towards source IPs not being distributed uniformly across the IP space [11][4]. [11] consider traffic consisting of uni-directional traffic extracted from bidirectional traffic seen on different links or at different routers within different networks, and [4] examined traffic consisting of internet background radiation traffic collected from an access link to a university or provider network containing unused addresses within its allocated IPs. The work by Kohler et al.[11] was the first to examine the spatial structure of observed IP addresses in measured internet traffic. Analysis of hours of measured packet traces from different network locations indicated that each trace showed signs of multifractal scaling behavior in the distribution of source IP addresses. Monofractal scaling defines a scale invariant behavior wherein clusters at a given scale level (like /16) get divided into proportionally sized subclusters at finer scales (like /20), irrespective of where the clusters are situated in the IPv4 space. Multifractal scaling is a generalization of this wherein the scaling behavior can vary for different parts of the IPv4 space, and the parameters dictating this “local” scaling behavior are summarized by a function called the multifractal spectrum. Both give rise to a pronounced qualitative “cluster-within-cluster” structure that results in observed IP addresses being distributed very unevenly across the IPv4 space. Subsequently, Barford et al.[4] analyzed multi-day long datasets that consist of source IP addresses in packet traces collected by network telescopes, firewalls and IDSs distributed throughout the internet and reported characteristics of the observed IP address spaces that are consistent with multifractal scaling. The IP space clustering behavior of multifractal traffic may be of substantial benefit to ToI finding heuristics that filter over the IP space. 5 CHAPTER II CONTRIBUTION The studies discussed were first reported two decades ago, and there has been little progress made to establish a rigorous foundational basis for the empirical finding that the structure of observed IP addresses in measured internet traffic demonstrates multifractal scaling. Compared to the commonly-accepted and well-established empirical finding that the temporal structure of measured internet traffic demonstrates self-similar scaling, the reported multifractal findings have received little attention, are neither widely accepted or known, and lack a generally accepted or carefully validated explanation. This issue is summed up by Kohler et al.[11]: “Without a convincing description of how [multifractal] address structure arises, the results of the explorations [in this paper] must be considered preliminary.”. We argue there are additional reasons these early empirical reports have resulted in little to no follow-up work and the suggested multifractal behavior has been largely viewed as a curiosity. First, these findings are somewhat dated, to put it gently. There is a pressing need to use modern traffic and perform a rigorous analysis for a range of current datasets from a range of environments. Second, many existing approaches to multifractal analysis are subject to technical assumptions that are difficult to prove in practice, and coping with this requires exploiting the large numbers of observations that come with typical network measurement by applying statistical techniques to carry out a robust assessment of the macroscopic aspects of multifractal scaling behavior, even if these techniques are inadequate to examine finer grained details. Third, any convincing physical description of how the multifractal structure emerges should be backed by supporting measurements 6 that illustrate the validity of the mechanisms that underly the provided physical explanation. Ideally, such a description should translate directly into parametric generative models that can be used to obtain synthetic traffic that emulates the observed multifractal structure of the IP addresses in the measured traces. Fourth, we are currently lacking concrete use cases that demonstrate whether or not accounting for the observed spatial characteristics of real-world instances of traffic is important for downstream applications, and if so, for what type of such downstream applications. Lastly, in case the multifractal address structure in measured instances of traffic is shown to matter in practice, we need new ways to synthesize instances of traffic that can be shown to exhibit the full-fledged spatial- temporal characteristics that are consistent with real-world traffic and includes both the temporal and spatial aspects of measured internet traffic. We focus on addressing the first two items, present preliminary results for items three and four, and briefly discuss possibilities for approaching item four. In particular we make the following contributions: – We amass a repository of different instances of traffic datasets derived from a range of different measured internet traffic traces. The current instances differ in terms of the location where the traffic was collected, traffic volume, duration, user base, application mix, etc. We are constantly adding new instances of traffic as they become available to us. (§ III) – To infer the presence or absence of multifractal address structure in each of the existing real-world instances of traffic, we apply a statistical method known as “method of moments”. Compared to other, more geometrical methods for inferring multifractal structure, the method of moments is known to be more robust with respect to aspects of the data that may violate certain 7 technical conditions other methods require but are in general difficult if not impossible to verify in practice, computationally more tractable than other methods, and especially well-suited for datasets with large number of observations. We show that each of the traffic samples exhibits a spatial address structure that is consistent with multifractal scaling but the degree of multifractality depends on the type of datasets. (§ IV, § V) – Building on an earlier attempt at providing a physical explanation or description of how multifractal address structure may arise in real-world traffic, we offer some initial empirical evidence in favor of a generative mechanism in the form of an underlying multiplicative conservative cascade process and demonstrate its potential for producing synthetic traffic with a desired multifractal address structure. (§ VI) – Prior studies that report on empirical findings of multifractal address structure in measured internet traffic provide or discuss no concrete use cases that demonstrate the importance of adequately accounting for this newly observed characteristic. We review the few studies in the existing literature that report on such concrete uses and provide evidence that the observed nonuniformity in how observed addresses are distributed across the IPv4 space can be put to good use in practice. In particular, these use cases argue for more aggressive future efforts that are aimed at adequately accounting for the full-fledged spatial-temporal characteristics of real-world traffic in guiding telemetry systems to optimize their resource usage by examining less non-ToI traffic. (§ VII) 8 CHAPTER III DATASETS Previous work exploring the multifractal properties of network traffic is decades old [11][4], and the volume and nature of real internet traffic has undergone tectonic shifts since those experiments were run. Consequently, further examinations of multifractal properties benefit tremendously from samples of the “modern” internet. Furthermore, examining how these properties vary over time, in the vein of [18] is valuable, particularly examining how the behavior evolves over hours, days, months, and years. To this end, we assemble a collection of publicly available datasets (CAIDA[1] and MAWILab[8]), alongside a number of privately collected samples for university campus border switches. 3.1 Collected Traffic Traces A variety of traffic samples were evaluated Table 1, spanning a number of dates, times, and locations. For every dataset, only the first 100,000 observed source IPs are considered, which was found to not substantially impact the results while making head to head comparison of different datasets much easier. In Table 1, CAIDA refers to the CAIDA 2019 anonymized internet trace, which is recorded from an internet backbone [1], MAWI refers to the MAWILab traffic anomaly archive [8], which is recorded at the border between the WIDE project and its parent ISP. UCSB refers to captures of network traffic at the edge of UC Santa Barbara’s campus network. UO refers to captures of network traffic at the edge of the University of Oregon’s campus network. Data from both campuses are anonymized with CryptoPAn [19] prior to analysis. This prefix-preserving anonymization method exactly preserves the structure being analyzed such that running the analysis before and after anonymization yields an identical result. 9 These datasets represent network traffic from campus and network backbone environments, which covers two very common network settings. Data spanning seven years is included, and traces were captured on a variety of times of days, days, and months. This all serves to forward the argument that what we observe is not isolated to a specific time, place, or circumstance, but is rather a fundamental property of organic internet traffic. 10 Name Date Duration Type IP Count Packet Count CAIDA-dir-A 2019-01-17 5:00AM 900s pcap 2,381,306 566M CAIDA-dir-B 2019-01-17 5:00AM 900s pcap 3,863,987 1.23B MAWI-20150202 2015-02-02 2:00PM 900s pcap 5,571,625 99M MAWI-20150710 2015-07-10 2:00PM 900s pcap 4,415,599 191M MAWI-20151002 2015-10-08 2:00PM 900s pcap 4,818,370 135M MAWI-20180316 2018-03-16 2:00PM 900s pcap 4,567,614 69M MAWI-20180807 2018-08-07 2:00PM 900s pcap 4,635,311 73M MAWI-20181107 2018-11-07 2:00PM 900s pcap 4,677,191 80M MAWI-20190901 2019-09-01 2:00PM 900s pcap 4,689,835 87M MAWI-20210110 2021-01-10 2:00PM 900s pcap 265,794 52M MAWI-20210614 2021-06-14 2:00PM 900s pcap 180,532 74M MAWI-20211212 2021-12-12 2:00PM 900s pcap 149,572 55M UCSB-20220428 2022-04-28 12:15 900s pcap 194,498 1.02B UCSB-20220921 2022-09-21 19:25 900s pcap 112,164 1.05B UCSB-20221205 2022-12-05 20:15 900s pcap 186,575 1.1B UO-20181106 2018-11-16 00:00 1 day netflow 1,805,085 9B UO-20190829 2019-08-29 00:00 1 day netflow 1,497,311 12B UO-20200213 2020-02-13 00:00 1 day netflow 786,607 36B UO-20211106 2021-11-06 00:00 1 day netflow 1,288,775 26B UO-20220517 2022-05-17 00:00 1 day netflow 668,817 23B Table 1. List of traffic samples 11 CHAPTER IV METHODOLOGY Let µ denote the measure (defined on the IPv4 space) that counts the number of observed IP addresses in a given dataset. Determining if this measure shows multifractal properties is relatively straightforwards at a high level: compute the fractal dimension at a variety of scaling factors, and see if it changes between scaling factors. If if does change, then µ is multifractal, and if it doesn’t change, it is monofractal; if µ has fractal dimension 1 (same as the underlying IPv4 space), then it is nonfractal. This test is equivalent to computing the multifractal spectra and ensuring that it is not degenerate. 4.1 Technique to Detect Multifractal Properties In order to evaluate the multifractal characteristics of the set of observed IP addresses in a given trace, we use the Method of Moments approach [18] adapted for the discrete IP space, which consists of the following steps: – Compute multiresolution quantities by building a table of how large every subnet is for each prefix length (§ 4.1.1) – Compute the structure functions using the multiresolution quantities table (§ 4.1.2) – Compute the partition function by determining the slope for each of the structure functions(§ 4.1.3) – Nonlinearity of the partition function indicates the presence of a multifractal structure to the data (§ 4.1.4) 4.1.1 Multiresolution Quantities. For 1 ≤ λ ≤ 32, we partition the input IP space into subnets with prefix length λ. We then count the number 12 of observed addresses that fall in each subnet, and indicate this value with the notation µ(i, λ), where i ∈ [0..2λ − 1] indicates the index of the subnet being examined, and λ indicates the prefix length of that subnet. Note that the structure of subnets mirrors the binary tree construction outlined in [15]. ∑ 4.1.2 Structure Function. For a given “moment” q, Z(λ, q) = i µ(i, λ) q defines the structure function across all subnets with prefix length λ.We modify the computation slightly such that if µ(i, λ) = 0, µ(i, λ)q = 0 for any q (even if q ≤ 0, which would normally yield 1 or not be defined). Relaxing the computation in this way is a necessary to facilitate translation from a continuous space to a discrete space. An example structure function graph can be seen in Figure 1. In effect, the structure function Z(λ, q) is an estimate of the q-th moment of µ, and multifractality presumes that these moment estimates will exhibit some scaling behavior for large λ. Further analysis of the behavior of the structure function can be found in § 5.4.1. 4.1.3 Partition Function. For a set of outputs {Z(λ, q)|λ ∈ [s..e]} for a given range of moments q and for a range of prefix lengths s to e in the structure function, let τ(q) represent the slope of the linear regression of log (Z(λ, q)) Figure 1. UO-20220517 structure function 2 graph vs. log2(λ). For our evaluation, we set our prefix range to work between the 8 and 24 prefixes (s = 8, e = 24). We compute τ(q) for q ∈ Z : −4 ≤ q ≤ 4. An example partition function graph can be seen in Figure 2. Formally, if the structure 13 functions serve as estimates of the q-th moments of Z(λ, q), then multifractality presumes that E[µq] ≈ Z(λ, q) ≈ λτ(q) for large values of λ; that is, the partition function attempts to capture the presumed scaling behavior. Further analysis of the behavior of the partition function is provided in § 5.4.2. 4.1.4 Final Analysis. Following the multifractal formalism outlined by Riedi [15], the partition function τ(q) would be input in to a Legendre transform τ(q) = min(αq − f(α)) to derive the multifractal spectrum f(α). Thus, monofractal Figure 2. UO-20220517 partition function data is expected for a linear input as graph it results in a degenerate multifractal spectrum. On the other hand, multifractal data results from a nonlinear input as it produces an output with a nondegenerate (i.e. wide) support that indicates a varying fractal dimension, which is the defining characteristic of multifractal structures [16]. In order to observe this, we plot τ(q) against q and see if there is a substantial nonlinearity. Note that τ(q) is largely noninformative for nonfractal data, except for showing that the fractal dimension is identical to the Euclidean dimension (i.e., the data fill or cover the entire space). 14 CHAPTER V RESULTS In this chapter, we will first apply the method of moments methodology to uniformly random data in order to get a “baseline” for what nonfractal data looks like. We then apply the same methodology to a single dataset to demonstrate what the results look like for multifractal data. With a fresh understanding of how the analysis works per dataset, we examine aggregates of the results across all datasets. Finally some time is devoted to discussing the practical considerations of the method of moments as well as intuitive interpretations of the functions and results. A sample of results covering each data source can be seen in Figure 3. Each row corresponds to a single dataset, and one dataset from each network source was selected. The result of the structure function computation is visible in the left column, and the corresponding partition function computation can be seen in the right column. These are provided here primarily as a visual baseline to show the general behavior common between all the datasets, and to give a rough idea of the particular behavior of each data source. These graphs will be discussed in greater detail later in the document. 5.1 An Example of Nonfractal Data In order to make concrete what was described in § 4.1.4, we will spend a moment examining data that does not posses any multifractal characteristics. The formal definition of nonfractal that we use is that a set in [0, 1] is nonfractal if its fractal dimension is equal to its dimension as a subset of [0, 1]. For the linear IP space, the dimension is 1, so if the computed fractal dimension of a set is 1, then we claim the set is nonfractal. In more intuitive terms, the points of a 15 (a) CAIDA-dir-A (b) MAWI-20190901 (c) UCSB-20220428 (d) UO-20220517 Figure 3. Structure and partition function graphs for selected datasets nonfractal set fully fill or cover the space. Graphs of the structure function and partition function for a set of 100K IP addresses that were generated by casting 32 bit integers drawn from a uniformly distributed random source as IP addresses 16 Figure 4. Graph of the structure and partition function for 100K IP addresses generated from uniformly random integers (i.e. nonmultifractal data). Note that the partition function is computed between /0 and /16 instead of /8 and /24 can be seen in Figure 4. A set of values drawn from a uniform distribution by definition are extremely unlikely to posses any multifractal structure. Though this is not monofractal data, the intent is to provide an example of what data that doesn’t show the properties we are discussing looks like for the purpose of building intuition. In that capacity uniform random data suffices. The structure function graphs start at l = 1, but in the l = 0 case, the y value for each moment would be log (nq2 ) where n is the number of input IPs, which follows from the equation for the structure function, and the fact that we only have one subnet at prefix length 0. This gives a predictable starting point for each moment q in the structure function graph. Additionally, the y value at l = 32 will be log2(n) for every moment, as at l = 32, each subnet either has 0 or 1 IPs, so the computation will simply return the number of IPs. We call the horizontal line at y = log2(n) the convergence line, as the lines for every moment will eventually converge to it. Note that the lines for each moment in this structure function plot converge at roughly the same prefix length. As predicted, the partition function for random data is almost perfectly linear, as the data does not posses a varying 17 fractal dimension beyond what little incidental structure emerged from the random number generator. Figure 4 shows that the input data is uniformly random (i.e. nonfractal), and τ(0) ≈ 1, as was predicted for nonfractal data. The partition function here is computed between /0 and /16 rather than /8 and /24 like every other partition function in this document as there is a very distinct nonlinearity in the structure function around /16, and per § 4.1.3, the analysis is only valid over sections of the structure function where every line is linear. Furthermore, there is no “human interference” at small prefix lengths for the random data, so the motivation for excluding the /0 to /8 range does not hold. Note that shifting the range like this is not required for other traces as they are more or less linear between /8 and /24. 5.2 Multifractal Analysis of a Representative Sample In Figure 5, we see the structure and partition functions for the first 100K IPs of the MAWI 2021-01-10 trace, and the differences from the structure and partition graphs for the random data are stark. Notice that the graph of the structure function has a “knee” between l = 7 and l = 10 where all the lines for negative moments rapidly approach 0, and the lines for the positive moments don’t converge to the expected value of log2(100, 000) ≈ 16.61 until far later, with the Figure 5. Graph of the structure and partition function for the first 100K IPs of the MAWI-20210110 trace 18 line for q = 4 not converging until l = 32. These observations are represented more numerically in the partition function graph, where the negative moments all have slopes near 1 (recall that this only considers l ∈ [8, 24], so the “knee” is not examined), and the positive moments trend towards having more negative slopes. The most critical characteristic is the pronounced nonlinearity between the segment for q ∈ [−4..−1] and q ∈ [0..4], which is strongly indicative of a varying multifractal dimension, and thus multifractal structure. The partition graph being split in this manner is a common theme in all examined data. 5.3 Aggregate Analysis As we are using nonlinearity in the partition function to judge the presence of multifractal structure in the data, the simplest analytic approach is to ask “how bent is it” for every trace we are examining. Prior works answered this question by computing the R-Squared value for the partition function output, but we argue that this approach, while more statistically rigorous, is overly general as the nature of the nonlinearity is identical across all samples. Manual inspection of all the data demonstrated that the bend consistently occurred at or adjacent to q = 0, so comparing the slope of the graph for q ∈ [−4.. − 1] with the slope of the graph for q ∈ [0..4], offers a general approach for evaluating nonlinearity. In order to rigorously compare the pre and post bend slope, a linear regression is computed for each region and plotted. Examining Figure 6, it is clear that the slopes of the “flat part” from q = −4 to q = −1 are all very shallow, with the steepest slope being −0.08, and most of the slopes clustering around −0.03. In contrast, the shallowest slope in the “bent part” Figure 7 is −0.31. Our metric for “how bent is it” then becomes the slope of the bent part divided by the slope of the flat part. Nonmultifractal data should yield approximately 1 for this metric, as the slope for 19 Figure 6. CDF of slopes of partition Figure 7. CDF of slopes of partition function for q ∈ [−4,−1] function for q ∈ [0, 4] Figure 8. CDF of ratio of slope of “bent part” over slope of “flat part” 20 q ∈ [−4.. − 1] and for q ∈ [0..4] are expected to be the same. Upon examining the collected data, Figure 8 shows that none of the samples has a ratio anywhere near 1, rather the smallest ratio is 6, and the ratios seem roughly evenly distributed between 6 and 31, which indicates that all examined samples have a substantial nonlinearity, which indicates strong multifractal scaling. 5.4 Examination of Method of Moments Now that these operations have context in the form of results, we take a moment to go back and build up intuition about why each function behaves as it does. 5.4.1 Structure Function. This is a rough translation of the “box counting” method [14] for determining multifractal spectra in continuous input spaces to a discrete input space. Intuitively, one should think of the multiresolution quantities as being similar to finding the mass distribution of the set of input IPs, and the structure function computes a value that correlates with how clustered the mass distribution of the input is. To illustrate the behavior of Z(λ, q) consider some input set with enough IPs to fill one subnet with prefix length λ. Now consider how different distributions of those addresses throughout the IP space will impact the value of Z(λ, q). If q = 0, the function will return the number of subnets that have one or more IPs in them. If q > 0, the function will assume its maximum value when every IP is in the same single subnet, and its minimum value when the addresses are distributed evenly across all subnets. Conversely, if q < 0, the maximum value occurs when the addresses are spread evenly across all subnets, and the minimum value is occurs when every address is in the same subnet. The value of q changes the size of the subnets that get measured, in a basic sense. 21 5.4.2 Partition Function. Selection of the limited prefix range for the partition function is intended to cope with a critical restriction outlined in [7]: When finding the slope for a moment, the line formed by the plotted points must be straight, otherwise the slope found by least squares has no physical meaning. The moment lines for the /8 to /24 prefix range are consistently sufficiently linear that we are comfortable with accepting the result of the linear regression as matching the real slope of the segment being examined. Furthermore, we justify this choice by making the following claims: First, the distribution of subnets up to /8 (and potentially deeper) is dominated by human influence (namely the IANA), so taking measurements here is more akin to measuring the outcomes of negotiations and deals during the early days of the internet rather than more “organic” allocation patterns. Second, prefix lengths greater than 24 tend to run into finite limit effects, as there is a hard scaling limit at /32, and as we approach that limit the structure being analyzed often breaks down. 5.4.3 Interpretation of Moments. There is a dearth of interpretations of what the moments q actually correspond to in the literature. [16] describes q as a “variable [that] selects different resolutions, with higher values of q selecting a local scaling...of lower order”. While nice and general, it is not obvious how to take this description and interpret the shape of the structure and partition graphs, so we lend the intuition we have developed. Intuitively, it is probably most helpful to think about the structure function without the context of multifractals as being a function capable of being “tuned” for certain types/sizes of clusters of addresses by setting q. The structure function converges to the number of IPs in the input set once the type of cluster it is tuned for has been broken up by the 22 increasing prefix length λ (i.e. the prefix length becomes long enough that most of the IPs appear in different subnets). The q = 0 case is a raw estimate of the box-counting dimension of the set of observed IP addresses as a subset of the IPv4 space. This therefore gives information on to what extent the data (IP addresses) is filling out its physical support (the IP space). In measure-centric language, the set of input IPs can be normalized to some subset of [0, 1] (as was done in [11]), and τ(0) would estimate the fractal dimension of this set. As multifractals tend to not fill out their spaces (versus something like a Hilbert Curve, which does), we would expect τ(0) < 1 for multifractal data. The τ(0) values for each dataset can be seen in Table 2. Negative moments (q < 0) tune the structure function to ignore subnets with many IPs in them (e.g. for q = −2, two subnets each with 50 addresses will contribute eight times the mass as one subnet with 100 addresses), which interestingly makes their presence very apparent since there is a huge gap between the lines for q < 0 and the convergence line. Among the sample data, it was common for traffic to concentrate in a couple of /7 or /8 subnets, which contained a large share of the addresses (Figure 9 and Figure 10. Both are generated using the technique outlined in § 5.4.3.1). Structure functions with negative moments would functionally erase these large subnets from the total until they had been broken up sufficiently (i.e. λ = 10 or λ = 11), at which point the set of now smaller subnets would start contributing meaningfully to the sum. Positive moments (q > 0) conversely tune the structure function to “amplify” the presence of clusters at large prefix lengths by weighting larger subnets much more heavily. For example, for q = 2, a subnet with 100 IPs will have double the mass of two 50 IP subnets. Notice that this ratio is much smaller than the q = −2 case described earlier, which 23 likely contributes to the relatively rapid rate of change for the negative moments compared to the positive ones. Further analyzing the behavior of the positive moments, imagine a uniform random distribution of n IP addresses: the expected number of IPs per subnet with prefix length λ is nλ . If set an arbitrary limit and say the line for a moment2 is “pretty close” to converged once there are 2 IPs per subnet, then we would expect the values for the positive moments to converge when the expected IPs per subnet is 2. For n = 100000, this happens at λ ≈ 16, and if we further restrict our definition of “pretty close” and say the expected value must be 0.5 IPs per subnet, that happens at λ ≈ 18. Checking the structure function graph of random IPs Figure 4 shows that this prediction aligns with the observed outcome. Contraposing this, if the structure function for positive moments has not converged by the time the expected number of IPs per subnet is small (< 1), that means that there are clusters of IPs concentrated in certain subnets at long prefix lengths, and the distribution of IPs is not uniformly random. In this sense, positive moments tune the structure function to be sensitive to the presence of big clusters, and this effect is particularly visible at long prefix lengths. To connect this behavior back to multifractals, if we imagine a multifractal as a sort of stacking of somewhat self-similar mono-fractals with different fractal dimensions, the moment q selects which fractal dimension we want to be sensitive to. Lower values of q look for larger fractal dimension, meaning it is sensitive to structures at very large scales reappearing at very small scales. Conversely, higher values of q look for lower fractal dimensions, meaning it is sensitive to structures at a given scale reappearing at nearby scales. 24 Figure 9. Heatmap showing the portion Figure 10. Heatmap showing the portion of addresses in /8 subnets for the of addresses in /8 subnets for the MAWI- CAIDA-dir-A trace 20210110 trace 5.4.3.1 Heatmaps. The heatmaps seen in Figure 9 and Figure 10 are generated by dividing the input IP space into subnets with a given prefix length, counting the number of IPs in each subnet, and then mapping the subnets to a grid by taking the linear space of IPs and transforming it into a two dimensional Hilbert Curve of order 8. Colors are then assigned via one of two schemes: In the first scheme, subnets are assigned a color whose brightness linearly corresponds to the number of IPs in that subnet. In the second scheme, a subnet is colored black if there are any IPs at all in it, and white otherwise. 5.4.4 Influence of Sparsely Populated Subnets on Negative Moments. Across the structure function graphs for every real world dataset, a “knee” was observed somewhere between the /6 and /10 prefix, where the negative moments abruptly shot up towards 0. This is caused by the properties of Z(λ, q) when q < 0: As soon as one of the subnets being examined only has one IP, Z(λ, q) > 1, which trivially follows the definition of Z. On the structure function log plot, this effect manifests as the line crossing the x-axis. It is well known that the distribution of network traffic by prefix is long tailed, so the distribution of 25 active IPs by prefix likely has the same property, so it is reasonable to expect that at relatively early prefix lengths there will be a subnets with only one IP. Recall that due to the relaxation in the computation described in § 4.1.2, subnets with no IPs are not considered in the computation. Checking Table 2, one can actually match up the prefix length where each dataset has a subnet with one active IP with the prefix length where the knee is in the dataset. Weaknesses in the mapping of the box counting technique from continuous spaces to discrete spaces are likely to blame for this behavior. Some traces, like the CAIDA-dir-A trace show a sort of “stairstepping” in the structure function graph Figure 3 that isn’t a proper knee. Examining the heatmap for CAIDA-dir-A Figure 9 shows how the effect described in § 5.4.3 for negative moments could cause this: when the overwhelming majority of the addresses are concentrated in six /8 subnets, each time an increase in resolution divides that set of addresses it has a profound impact, but occasionally the division caused by a zoom in will have no impact if the IPs happen to coincide in the same subnet. This gives rise to the stair-stepping effect, as some increases in prefix resolution split the IPs, and others don’t. 26 Dataset /6 /7 /8 τ(0) CAIDA-dir-A 0 0 0 0.816 CAIDA-dir-B 5 11 30 0.499 MAWI-20150202 0 1 8 0.577 MAWI-20150710 0 0 7 0.580 MAWI-20151002 0 0 3 0.579 MAWI-20180316 0 2 7 0.583 MAWI-20180807 0 2 4 0.585 MAWI-20181107 0 1 1 0.585 MAWI-20190901 0 1 3 0.583 MAWI-20210110 0 0 2 0.538 MAWI-20210614 0 0 5 0.524 MAWI-20211212 0 0 1 0.514 UCSB-20220428 0 1 4 0.512 UCSB-20220921 0 0 1 0.503 UCSB-20221205 0 0 3 0.508 UO-20181106 0 1 1 0.552 UO-20190829 0 0 0 0.548 UO-20200213 0 0 0 0.544 UO-20211106 0 0 4 0.541 UO-20220517 0 2 11 0.530 Table 2. Number of subnets containing 1 IP address for a set of prefix lengths, underlines indicate the shortest prefix length where a subnet had a single active IP in it. τ(0) is included to illustrate the relationship between scaling factor and IP dispersion and provides an estimate of the fractal dimension of the observed set of IPs. 27 CHAPTER VI PHYSICAL EXPLANATION In this chapter, we dig in to the lower level statistical mechanics that describe how the multifractal structures are generated locally. We then take this analysis and postulate an mechanism by which real-world phenomena could give rise to the observed statistical mechanics. Finally we translate the observed mechanisms into a generative model that creates realistic multifractal IP distributions, and spend some time analyzing its output. 6.1 Preliminary Findings In order to peek at the statistical mechanics at work within the data, we ran each set of input IPs through the level weight procedure outlined in § 6.1.2 for prefix lengths of 8, 12, and 16; and generated a histogram, a selection of which can be found in Figure 11, Figure 13, and Figure 12. This procedure examines every subnet at the given prefix length, finds the probability that an IP in that subnet has a 1 as the next bit after the current prefix, and plots a histogram of those values. Note that these are recorded across the duration of the trace rather than the first 100K IPs, so larger traces like CAIDA will be smoothed out relative to the smaller traces. The /8 prefix is typically noisy due to a small number of possible samples (256), but shows something that very clearly resembles a normal curve or beta function, excluding the peaks at 0 and 1 which we choose to ignore for reasons covered in § 6.2.2. This observation drove the development of our generative cascade model. 6.1.1 Physical Interpretation. Following Kohler et al.[11], we concur that the multifractal structure is likely due to organizations receiving address allocations, partitioning them up, and redistributing the partitions to 28 (a) /8 (b) /12 (c) /16 Figure 11. Level weights for CAIDA-dir-A (a) /8 (b) /12 (c) /16 Figure 12. Level weights for UCSB-20220921 (a) /8 (b) /12 (c) /16 Figure 13. Level weights for MAWI-20210110 29 customers, which aligns rather closely with the cascade model. We further propose that there are “phases” to the allocation behavior as prefix length increases. The first phase from /0 to /8 − /10 is dominated by IANA allocations[9][2] The second phase from /8 to /15 or so is dominated primarily by internet service providers and secondarily by “legacy organization” that got their /8 allocations back before that was a substantial commitment and have been hanging on to them (the DOD being a prime example of this). For ISPs, a quick examination of the address blocks allocated to Comcast and Spectrum indicates that they split their allocations up into large /9s-/15s that they then allocate internally on a geographic basis, often by city. The third phase is the allocation of either single addresses or small blocks to customers, and the final phase is the customer distributing their address(es) among their local machines. Each of these phases has a tendency to create clusters: for the first phase addresses are divided up geographically to different RIRs (though this is pretty messy), which creates natural clusters along geographic borders. In the second phase, service providers do another layer of geographical partitioning, which creates clusters whose size and activity likely correlate to the population of the city they are assigned to. In the third phase, more clusters are created as geographic blocks are chopped up and handed out, and the fourth phase likely happens on such a small scale that it would be challenging to classify anything as a substantial cluster. 6.1.2 Level Weights. We use “level weights” to describe a list of probabilities generated using the following technique (as seen in Figure 11, Figure 13 and Figure 12): – Initialize a list lw with no entries – Divide the set of input IPs in to subnets with prefix length λ 30 – For each subnet, count how many of its member IPs have a 1 for bit λ+ 1 – Divide the above count by the total number of IPs in the set, and append this portion to lw These are graphed as a histogram where each bar corresponds to the portion of a subnets child IPs that have a 1 for the next bit (i.e. the probability that an IP in that subnet has a 1 as the next bit after the subnet prefix), and the height of the bar is the relative occurrence of that probability. 6.2 Generative Cascade Model In order to synthesize lists of IP addresses showing multifractal properties, we use the following procedure: Given a number n indicating how many IP addresses should be generated, a number l indicating the current prefix length of the working pool of addresses, and a prefix p indicating the current IP prefix, and beta distribution parameters a and b: – If p has all 32 bits filled out and n = 1, add it to the list of output addresses and return. If n > 1, indicate that there was a duplicate. – Sample the distribution β(a, b) to get a number x between 0 and 1. – Generate a new prefix p0 by setting bit l of p to 0. – Generate a new prefix p1 by setting bit l of p to 1. – Recursively call this operation with p = p0, n = floor(x ∗ n), l = l + 1 – Recursively call this operation with p = p1, n = n− floor(x ∗ n), l = l + 1 31 In plainer language, this procedure works by recursively partitioning a set of size n in to two parts, where one part has β(a, b) portion of the input set, and the other part has the rest. Members of the first set get 0’s the bit corresponding to the current recursion depth, and the other set gets 1’s. By adjusting the a and b parameters of the Beta Function, it is possible to adjust the distribution of IPs. In the experimental data, a and b tended to be equal, however setting a and b to be smaller tended to create more clustered sets of IPs, while setting them larger created smoother and more evenly distributed sets of IPs. 6.2.1 Cascade Model Results. The level weights computed for the cascade model 14 align quite well with those computed for the samples. The structure functions for the synthetic data Figure 15, Figure 16, Figure 17 have the key features found in the structure functions for the experimental data: the more clustered it is, the later the positive moments converge and the earlier the negative moments converge. There is one key problem: the knee is missing for Figure 15 and Figure 16 (whose a and b parameters are most similar to the observed data). As outlined in § 5.4.4, we believe that at short prefix lengths in real data, there is some not-so-organic clustering as a consequence of how the IP space is allocated, and absent this initial perturbation in the synthetic data, the knee evaporates away on the less clustered data. The partition functions for the synthetic data seen in Figure 18, Figure 19, and Figure 20 also demonstrates the nonlinearity characteristic of multifractal structure in the data, and furthermore the nonlinearity becomes more pronounced as the beta function parameters are reduced. This aligns completely with the expectations laid out above. 32 (a) /8 (b) /12 (c) /16 Figure 14. Level weights for cascade model synthetic data. a = 2, b = 2 Figure 15. Structure Figure 16. Structure Figure 17. Structure function for 100K synthetic function for 100K synthetic function for 100K synthetic IPs (a=2, b=2) IPs (a=3, b=3) IPs (a=0.5, b=0.5) Figure 18. Partition Figure 19. Partition Figure 20. Partition function for 100K synthetic function for 100K synthetic function for 100K synthetic IPs (a=2, b=2) IPs (a=3, b=3) IPs (a=0.5, b=0.5) 33 6.2.2 Cascade Model Limitations. A key deficiency of the model is that the beta function parameters do not vary with respect to bit depth, while in empirical testing seen in Figure 21, Figure 22, and Figure 23, there appears to be a trend towards higher coefficients at deeper bit depths. Implementing this as part of the model would be trivial, however the observed trend is likely heavily influenced by insufficient sample size at long prefix lengths. As the prefix length grows and the number of IPs per subnet shrinks, the samples taken of the beta distribution (as described in § 6.2.3) become quantized. When a subnet has four IPs, for example, getting the portion of IPs with a 1 in the next bit is going to yield 0, 1 , 1 , 3 or 4 2 4 1, regardless of the underlying distribution. This effect becomes particularly pronounced among subnets with one or two addresses, and at longer prefix lengths 0, 1 , and 1 occur with overwhelming frequency. This is likely to lead to increasingly 2 incorrect regression values as prefix length increases, so experimentation with much larger datasets that can fill out subnets with enough IPs at much deeper prefix lengths is likely needed to conclusively determine the beta coefficients at those prefix lengths. Additionally, techniques to dequantize data by injecting noise in to the samples may actually aid the regression system. Figure 21. Regression of Figure 22. Regression of Figure 23. Regression of beta function parameters beta function parameters beta function parameters for MAWI-20210110 at for MAWI-20210110 at for MAWI-20210110 at λ = 8 λ = 16 λ = 24 34 6.2.3 Beta Regression. To generate Figure 21, Figure 22 and Figure 23, we use the process for generating level weights to get a list of probabilities that, in each subnet with prefix length λ, the (λ + 1)th bit of a child IP from that subnet is 1. We treat these probabilities as samples from a beta distribution, and use regression techniques to compute the a and b parameters of the distribution. Due to properties of the level weights outlined in § 6.2.2, the samples have strong probabilities of assuming values of 0 and 1 (and to a lesser extent, simple fractions), but the peaks from these are placed almost optimally to interfere with the beta regression system and they represent heavily distorted samples, so during regression values of 0 and 1 are ignored. 35 CHAPTER VII USE CASES FOR MULTIFRACTAL ADDRESS STRUCTURE In this chapter we will start by looking at a method by which the clustering property of multifractal data can be taken advantage of to optimize resource usage of dataplane telemetry systems via a filtering scheme. Next we will examine some possibilities of using the multifractal spectrum as a “signature” for certain phenomenon or using it as a sort of alarm to indicate something about the nature of the network traffic changed. Finally we will look at the viability of deploying the method of moments technique to a real dataplane telemetry system. 7.1 Clustered Prefix Tracking Current efforts are finding success with using the “clusters-of-clusters” property that comes with multifractal behavior to potentially reduce workloads on IDS and network telemetry systems that depend on knowing things about the source of the packets they examine [3]. Soldo et al. [17] describe a technique for coming up with compact filter rules, though they do not explore how clustering behavior benefits these filtering rules. Chen et al. [5] found that the overwhelming majority of malicious traffic tended to be generated by addresses from a fixed subset of the IP space, and similarly Collins et al. [6] found that malicious hosts tended to cluster in the address space. A number of papers deal with iteratively refining their queries over the IP space [12][13][10][20], which carries an implicit assumption that the features they are refining towards tend to cluster in the IP space. Our work justifies this assumption. With the knowledge that sources generating traffic of interest tend to cluster, it becomes possible to handle those clusters in aggregate rather than individually. 36 Consider some task where a dataplane telemetry system is running a query to detect and monitor some set of addresses that are interesting (i.e. they are generating ToI). The naive approach is to monitor all traffic and test to see if every address is interesting. This is simple to implement and works well if the telemetry system has enough memory to handle the entire space of traffic, but it quickly falls apart if the traffic exceeds the resources of the telemetry system. More sophisticated approaches may do some kind of time domain multiplexing, where they cycle through monitoring subsets of the traffic such that the subset fits on the telemetry hardware. This incurs penalties to accruacy as each address will only be monitored for some portion of the time, and sophisticated statistical devices are likely required on the backend in order to make the partial data usable. An even more sophisticated model may do time domain multiplexing, but also individually track addresses that were found to be interesting, so it doesn’t suffer from the loss of observation time. Unfortunately, this requires a filter per address to capture interesting traffic, and these are typically implemented as extremely expensive TCAM in dataplane hardware, so this approach fails to scale beyond the availability of TCAM rows. Multifractal properties to the rescue! Since we know that ToI generating (i.e. interesting) addresses tend to cluster, what if it was possible to just identify the prefixes with only interesting addresses and track those? Luckily a procedure capable of doing this is quite straightforwards assuming the input has a query that is able to flag addresses as “interesting” or “uninsteresting” at runtime: – Given a subnet with prefix length l, divide the prefix in to 2i smaller subnets with prefix length l + i 37 – Initialize an empty list e that will contain the subnets with only interesting addresses – See if each smaller subnet s contains interesting addresses – If l + i is 32, see if the address in s is interersting, and if it is add it to e and continue to the next smaller subnet – If s consists of only interesting addresses, add s to e – If s consists of a mix of interesting and uninteresting addresses, recursively run these instructions on s, and concatenate each resulting list to e – Otherwise, if s contains only uninteresting addresses ignore it – Once every smaller subnet is processed, return e If i = 1, this will compute the shortest possible list of prefixes that captures all interesting addresses and nothing else. These prefixes would be the optimal set of things to monitor in order to only see attack traffic while using the minimum number of prefix filters. If i > 1, this will compute the shortest possible list of prefixes where prefix lengths are multiples of i that capture all interesting addresses and nothing else. In this sense i works as a knob that trades an increased number of prefix filters for fewer recursive iterations. Additionally, if monitoring some uninteresting traffic is permissible, limits can be placed on the recursion depth which will reduce the maximum number of filters needed. In a real deployment, this implementation would likely take the form of dataplane telemetry hardware that is running two tasks concurrently. The first task is to run the query in question on every prefix that has been flagged as containing only interesting addresses continuously. The second task is to periodically run the recursive process, starting 38 with 0.0.0.0/0 (multiplexing subnets over time if memory is constrained), and adding the resulting list of prefixes to the set of prefixes with only interesting addresses. In the event that a prefix containing only interesting sources suddenly gains an uninteresting source, that prefix can be “evicted”, and have the recursive refinement process run on it in order to reestablish a list of finer grained prefixes that only contain interesting traffic. This approach makes a very firm guarantee: if the system has enough memory to track all the sources generating ToI for a given query, plus some constrained overhead for the recursive scanning process, it will be capable of satisfying the query regardless of how much uninteresting traffic there is as long as it doesn’t run out of room for filters. Additionally, given the clustering properties of multifractal ToI, the number of prefixes that need to be tracked (and thus filters needed) will be minimal compared to the number of addresses being tracked, so for many types of ToI, this approach is completely viable on hardware systems available today, and there is a tunable parameter that can be used to trade off the number of filters needed and the amount of uninteresting traffic that ends up being mistakenly tracked, which makes this approach viable for a wide range of capabilities. 7.2 Spectral Signature Kohler et al.[11] describe measuring the multifractal spectra of a laboratory network before and during a pair of worm attacks in the summer of 2001, and demonstrate that spectra varied significantly both when the worm was introduced, and when a revised version of the worm changed the spreading behavior from targeting random IPs to attempting to spread by stochastically selecting IPs from similar prefixes. This indicates that there may be some value in actually tracking 39 the multifractal spectra of addresses on a network in real time: it provides a decent set of aggregates that appear mostly time-invariant[18] on any scale between seconds and days, but it is sensitive to changes in the structure of the traffic it is built from. A rapid shift in the spectra or deviation from historical norms may actually serve as a decent alarm bell indicating that something fundamental about who is using the network changed, for example maybe there is a DDoS attack or a worm, or maybe a misconfigured router is dumping packets on the network that don’t belong there. It is doubtful that the multifractal spectra is useful in its own right, but it correlates with something that every network operator cares deeply about: who is on the network? Further work in this direction might yield “spectral signatures” of different types of events, or even something as specific as a certain botnet. This idea is executed on by Barlow et al.[4], where they compute multifractal spectra of the internet background radiation, which could be construed as the signature of the background radiation, but much work remains in characterizing signatures and seeing if they are even distinct enough to draw meaning from, and further taxonomizing the signatures for different types of traffic (HTTP, UDP, etc.) to see if there is significance there. While our work does not actually compute the spectra, the partition function can be converted in to a multifractal spectra using the Legendre transform. 7.3 Implementation and Viability of Deployment to Dataplane We implement the method of moments technique as Rust program that computes the multiresolution quantities and structure function which emits the structure function values for a python program to run a linear regression on. The entire computation takes under a second for 10M IPs on a relatively old Intel Xeon E5-2650v4 while using a single thread, and the complexity is linear with respect to 40 the number of input IPs. The implementation can be trivially multithreaded using a thread safe hashmap as the backing structure. IP addresses are extracted from netflow and PCAP data using a pair of custom parsers written in Rust. The dataplane side of the computation just requires collecting a list of all source or destination IP addresses (depending on the analysis), which is one of the simplest possible operations to run on modern dataplanes. Examination of Table 1 shows that even relatively large and long datasets do not accumulate more than 10M IP addresses, and in the common windowed telemetry system, the number of IPs is very manageable. As computation of the structure and partition functions in software is relatively quick, averaging 10ms of computation time on the mentioned CPU for 100K IPs, the door to actually running the structure and partition function as part of a realtime dataplane telemetry system is wide open. Additionally, the multiresolution quantities and structure function steps are extremely amenable to being run on a stream processor as generating the base multiresolution quantity from a list of addresses is a single reduce, and each subsequent multiresolution quantity is a map and reduce on a dataset half as large. Computing Z(λ′, q) is a single map and sum per moment. The linear regression is a simple operation on a fixed set of values numbering less than 1000. We believe that these requirements are extremely modest for modern dataplane telemetry deployments, and the benefits outlined in § 7.1 absolutely justify the processing cost. 41 CHAPTER VIII DISCUSSION AND OUTLOOK In this chapter we will first discuss some of the limitations of the method of moments, particularly as the input dataset grows to be large. Then we will outline an initial approach for generalizing our cascade model to work in both the address space and time dimensions, as well as discuss future work in characterizing the physical phenomenon that creates the observed multifractal structure. 8.1 Limitations As the method of moments operates over a discrete space, there is a hard limit to how small the structures that can be examined are (i.e. the minimum structure size is 1 IP). Consequently, as the number of input IPs grows very large, there is a sort of “saturation”, where the input space fills up with enough addresses that it is no longer possible to tell where one cluster ends and another begins. We control for saturation by only considering the first 100K IPs in each dataset, which was selected because (i) it was a round number, (ii) it was the largest round number every dataset would be able to provide for, and (iii) it was found that the structure and partition graphs at 100K IPs for each dataset were qualitatively very similar to those for every IP in the same dataset. This is a nonissue in continuous domains as it is possible to increase resolution endlessly, but in the discrete space we operate in, the size of the input actually has an impact on the result, even if the underlying structure is identical. We used the cascade model to generate 1M, and 10M, and 42M IPs (selecting 42M as it is ≈ 1% of the IPv4 space), all with beta parameters a = 2 and b = 2. The structure functions can be seen in Figure 24, and the partition functions in Figure 25. The computed structure and partition functions for the 1M sample are similar to the 100K sample seen in Figure 15, but 42 those are very different from the 10M and 42M sample both of which show clear perturbances in their structure function and decay of multifractal properties in the progressively more linear partition function. We postulate that the number of inputs that causes “saturation” is highly dependent on the fractal dimension of the data, but leave exploring these properties as future work. 8.2 Discussion The work laid out here is limited in that we only set out to examine multifractal properties in the spatial dimension, but there is a large existing body of work establishing that internet traffic shows self similarity in the temporal axis as well. Future efforts will likely need to unify these understandings, particularly if they intend to build a full-fledged synthetic dataset generator, rather than something that just generates sets of IP addresses. One simple approach for this could likely consist of generating a set of addresses using our earlier described approach, allocating packet counts to each source in an independent and identically distributed (iid) manner (using some heavy-tailed distribution), and then using some scheme (e.g., on/off behavior) to distribute each addresses’ allocation of packets over time. A more elaborate but also more realistic approach would be to generate a genuine spatial-temporal traffic dynamics by avoiding the above iid assumption and replacing it by a packet allocation process that is derived from (a) 1M IPs (b) 10M IPs (c) 42M IPs Figure 24. Structure functions for cascade model data with a = 2 and b = 2 for different numbers of IPs 43 (a) 1M IPs (b) 10M IPs (c) 42M IPs Figure 25. Partition functions for cascade model data with a = 2 and b = 2 for different numbers of IPs the same kind of multifractal analysis that is described in this thesis but where the measure µ counts the number of encountered packets per observed IP address rather than the number of observed IP addresses. Similar to the first approach, this approach also requires a scheme to distribute each IP’s allocation of packets over time, but the allocated packet counts in this case are now tightly correlated with the multifractal scaling behavior that we would expect to observe for the measure that counts the number of observed packets per observed IP and that could also be synthetically generated by means of an appropriately defined cascade construction. Characterizing the physical explanation for the driving force behind the cluster behavior is an open problem. We pitched one possible interpretation here § 6.1.1, but all the hard labor of digging up statistics on who owns what allocations, where they have been allocated, and how many hosts are using these allocations is left as future work. In general, we believe that the multifractal construction of the internet is caused by features and practices inherent to subnets themselves, as entities are provided some allocation of address space, which they then allocate portions of to sub-entities or clients. This is handled recursively until all addresses are distributed. As all entities that assigned addresses are not equally “active”, this gives rise to a clustering behavior. The geographic interpretation discussed in § 6.1.1 is just one dimension that the clustering can manifest in, but 44 many others are possible and likely (e.g. a cloud provider may have a single subnet assigned to“popular” servers, so these servers would be clustered in this subnet). Furthermore, there is a nuanced discussion to be had about how the multifractal effects are observed in samples of internet traffic, but since there isn’t a big trace of all the traffic on the internet to look at, what we characterized here may not actually be a property of the internet itself but rather a behavior that rises out of how clients of the network environments that we analyzed contact the rest of the world. This opens a dizzying number of questions about how different usage patterns and user bases shape the fractal structure, and we leave answering these as future work. 45 CHAPTER IX CONCLUSION In this thesis we collected a repository of traffic of interest from a range of real-world environments and measured the multifractal properties of each dataset using the method of moments technique. We ran a bulk evaluation that found that every dataset was consistent with exhibiting multifractal scaling behavior that manifests in a pronounced cluster-within-cluster property. We then offered a possible explanation for how this clustering property may emerge, and demonstrated a parametric cascade model IP address generator that was able to generate sets of addresses that matched the multifractal scaling properties of the datasets. Finally we proposed several concrete use cases for multifractal properties and discuss the viability of computing it in a live network setting. 46 REFERENCES CITED [1] The CAIDA UCSD anonymized Internet traces dataset - 2019. https://www.caida.org/data/monitors/passive-equinix-nyc.xml, 2019. [2] IANA IPv4 address space registry, February 2022. [3] Albert Gran Alcoz, Martin Strohmeier, Vincent Lenders, and Laurent Vanbever. Aggregate-based congestion control for pulse-wave ddos defense. In Proceedings of the ACM SIGCOMM 2022 Conference, pages 693–706, 2022. [4] Paul Barford, Rob Nowak, Rebecca Willett, and Vinod Yegneswaran. Toward a model for source address of internet background radiation. In In Proceedings of Passive and Active Measurement Conference (PAM 2006, 2006. [5] Z. Chen, C. Ji, and P. Barford. Spatial-temporal characteristics of internet malicious sources. In IEEE INFOCOM 2008 - The 27th Conference on Computer Communications, pages 2306–2314, 2008. [6] Michael Patrick Collins, Timothy J. Shimeall, Sid Faber, Jeff Janies, Rhiannon Weaver, Markus De Shon, and Joseph (Jay) B. Kadane. Using uncleanliness to predict future botnet addresses. In ACM/SIGCOMM Internet Measurement Conference, 2007. [7] Carl JG Evertsz and Benoit B Mandelbrot. Multifractal measures. Chaos and fractals, 1992:921–953, 1992. [8] Romain Fontugne, Pierre Borgnat, Patrice Abry, and Kensuke Fukuda. MAWILab: Combining Diverse Anomaly Detectors for Automated Anomaly Labeling and Performance Benchmarking. In Proceedings of the ACM Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2010. [9] Elise P. Gerich. Guidelines for Management of IP Address Space. RFC 1466, May 1993. [10] Arpit Gupta, Rob Harrison, Marco Canini, Nick Feamster, Jennifer Rexford, and Walter Willinger. Sonata: Query-driven streaming network telemetry. In Proceedings of the conference of the ACM Special Interest Group on Data Communication (SIGCOMM), pages 357–371, 2018. 47 [11] Eddie Kohler, Jinyang Li, Vern Paxson, and Scott Shenker. Observed structure of addresses in ip traffic. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurment, IMW ’02, page 253–266, New York, NY, USA, 2002. Association for Computing Machinery. [12] Masoud Moshref, Minlan Yu, Ramesh Govindan, and Amin Vahdat. DREAM: Dynamic resource allocation for software-defined measurement. ACM SIGCOMM Computer Communication Review, 44(4):419–430, 2014. [13] Masoud Moshref, Minlan Yu, Ramesh Govindan, and Amin Vahdat. SCREAM: Sketch resource allocation for software-defined measurement. In Proceedings of the ACM Conference on emerging Networking EXperiments and Technologies (CoNEXT), page 14, 2015. [14] Heinz-Otto Peitgen, Hartmut Jürgens, and Dietmar Saupe. Chaos and fractals - new frontiers of science. 1992. [15] Rudolf Riedi. An introduction to multifractals. 01 1999. [16] Hadrien Salat, Roberto Murcio, and Elsa Arcaute. Multifractal methodology. Physica A: Statistical Mechanics and its Applications, 473:467–487, 2017. [17] Fabio Soldo, Karim Eldefrawy, Athina Markopoulou, Balachander Krishnamurthy, and Jacobus Merwe. Filtering sources of unwanted traffic. pages 199 – 208, 01 2008. [18] Megan Walter. On the multifractal structure of observed internet addresses, 6 2022. Available at https://www.cs.uoregon.edu/Reports/UG-202206-Walter.pdf. [19] Jun Xu, Jinliang Fan, Mostafa Ammar, and Sue B. Moon. On the design and performance of prefix-preserving ip traffic trace anonymization. In Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, IMW ’01, page 263–266, New York, NY, USA, 2001. Association for Computing Machinery. [20] Lihua Yuan, Chen-Nee Chuah, and Prasant Mohapatra. ProgME: towards programmable network measurement. IEEE/ACM Transactions on Networking, 19(1):115–128, 2011. 48