FROM ISOTOPES AND WHOLE ROCK GEOCHEMISTRY TO MACHINE LEARNING: DIVING INTO THE PLUMBING SYSTEM OF LARGE MAFIC ERUPTIONS USING A DIVERSE GEOCHEMICAL TOOLSET TO INVESTIGATE MAGMATIC PROCESSES

by

RACHEL LYNN HAMPTON

A DISSERTATION

Presented to the Department of Earth Sciences and the Division of Graduate Studies of the University of Oregon in partial fulfillment of the requirements for the degree of Doctor of Philosophy

December 2022

DISSERTATION APPROVAL PAGE

Student: Rachel Lynn Hampton

Title: From Isotopes and Whole Rock Geochemistry to Machine Learning: Diving into the Plumbing System of Large Mafic Eruptions Using a Diverse Geochemical Toolset to Investigate Magmatic Processes

This dissertation has been accepted and approved in partial fulfillment of the requirements for the Doctor of Philosophy degree in the Department of Earth Sciences by:

Dr. Paul Wallace, Chairperson
Dr. Leif Karlstrom, Advisor
Dr. Paul Wallace, Core Member
Dr. Meredith Townsend, Core Member
Dr. Amy Lobben, Institutional Representative

and

Krista Chronister, Vice Provost for Graduate Studies

Original approval signatures are on file with the University of Oregon Division of Graduate Studies.

Degree awarded December 2022

© 2022 Rachel Lynn Hampton

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs (United States) License.

DISSERTATION ABSTRACT

Rachel Lynn Hampton

Doctor of Philosophy

Earth Sciences

December 2022

Title: From Isotopes and Whole Rock Geochemistry to Machine Learning: Diving into the Plumbing System of Large Mafic Eruptions Using a Diverse Geochemical Toolset to Investigate Magmatic Processes

This dissertation brings together a variety of tools to investigate the processes that occur within the plumbing of mafic volcanic systems. In Chapter II we use a combined isotope, trace element, and thermal modeling approach to investigate the production of rhyolitic magmas at the active Krafla volcano in Iceland, which lies directly on the Mid-Atlantic Ridge.
There we found evidence that basalts differentiate to rhyolites through partial melting of hydrated basaltic crust followed by late-stage fractional crystallization, which produces highly evolved rhyolitic magmas. In Chapter III we turn our attention to a larger, extinct mafic system, the Columbia River Flood Basalts (CRB). In this chapter we compile a database of whole-rock geochemical data sampled from the CRB and use both supervised and unsupervised machine learning to quantify and interpret groupings and variation in the dataset. We evaluate the relationships between the known stratigraphic groups and then build a classification model that quantitatively recognizes the chemical variation that defines the existing stratigraphic groups. We find that the geochemical variation and relationships within the stratigraphy are indicative of common processes of recharge, assimilation, and fractional crystallization. In Chapter IV we apply this stratigraphic model to sort whole-rock geochemical samples from intrusive dikes of unknown stratigraphic affinity into the CRB stratigraphy. These samples, from the Wallowa Mountains and specifically the Maxwell Lake area, provide further insight into the plumbing system of the CRB. We combine field methods, machine learning, and comparison with other studies to investigate variation along strike within a dike complex. Across these three chapters we find new evidence for processes occurring in these mafic systems and demonstrate the efficacy of these machine learning techniques when applied to whole-rock geochemical data from volcanic systems. This dissertation includes previously published coauthored material.

ACKNOWLEDGMENTS

My sincere gratitude to all those who helped me to complete this dissertation and the research within. Thank you, first, to my advisor, Leif Karlstrom, who allowed me to follow my curiosity and explore geochemistry in a new way.
Thinking beyond the normal bounds of our scientific fields is not always easy, so I thank Dr. Karlstrom for the support, the marathon brainstorming sessions, and the encouragement to pursue my own scientific ideas. Thank you also to those who helped to create the datasets in this work, Dr. Scott Hughes, Dr. John Wolff, Dr. Stephen Reidel, and Dr. Ray Wells; your time and expertise were invaluable. I also want to express my sincere gratitude to all my field assistants, who helped me carry very heavy rocks over ludicrously steep terrain. Every blister you suffered to collect these samples was valued and contributed to the scientific results of this work. I especially want to thank Felicia Cummings, Edward Vinis, and Alicia Bixby for following me into the geologic unknown with infinite positivity. I also thank my committee and my mentors for encouraging me to make it to the finish line.

The investigations within were financially supported by several sources. We thank the NSF (grant 1822977) for support; RNF (grant 19-17-00241) for zircon extraction and other support; the University of Oregon and the National Geographic Society for travel support; the Stanford-USGS facility for U-Th analyses; the GSA Lipman Fund and GSA Student Research Grants for student analytical support; and the Department of Earth Sciences for its financial support.

And finally, to my friends, my fiancé, and my family: I couldn't have made it without you. Thank you for everything you have done for me.

For my parents, Alicia and Keith. Thank you for teaching me to never stop being curious about the wonders of this world and for unwittingly molding me into a geologist.

TABLE OF CONTENTS

I. INTRODUCTION

II. A MICROANALYTICAL OXYGEN ISOTOPIC AND U-TH GEOCHRONOLOGIC INVESTIGATION OF RHYOLITE PETROGENESIS AT THE KRAFLA CENTRAL VOLCANO, ICELAND
1. Introduction
1.1 Geology of Krafla
1.2 Previous Models for Rhyolite Petrogenesis at Krafla
2. Methods
2.1 Analytical Methods
2.2 Thermal and Chemical Modeling
2.3 Supplemental Methods Material
2.3.1 Major Phase and Groundmass Oxygen Isotope Analysis
2.3.2 Crystal Size Distributions
2.3.3 Thermal Modeling
2.3.4 Compositional Modeling
2.3.5 Sample Location Information
2.3.6 Zircon Data
3. Results
3.1 Zircon U-Th and δ18O Analyses
3.2 Trace Elements in Zircons
3.3 Zircon Crystal Size Distributions (CSD)
3.4 Oxygen Isotope Measurements of Pyroxene, Plagioclase, and Groundmass
3.5 Thermal Modeling Results
3.6 Magma Chamber Simulator Results
4. Discussion
4.2 Modeling Perspectives on Petrogenesis of the Rhyolite Domes
4.3 Generation of Other Low δ18O Magmas at Krafla
5. Conclusions
6. Bridge to Chapter 2
7. References Cited

III. CRYPTIC WHOLE-ROCK GEOCHEMICAL PATTERNS IN THE STRATIGRAPHY OF THE COLUMBIA RIVER FLOOD BASALTS REVEALED BY MACHINE LEARNING
1 Intro
1.1 The Columbia River Basalts
2 Methods
2.1 Columbia River Flood Basalts (CRB) Dataset
2.2 Geochemical Data Analysis
2.3 Workflow
2.3.1 Synthetic Data for Model Testing
2.3.2 Structural Characteristics
2.3.2.1 Principal Component Analysis (PCA)
2.3.3 Preprocessing
2.3.3.1 Removing Outliers
2.3.3.2 Box-Cox Power Transform
2.3.3.3 Ratio Combinatorics
2.3.3.4 Balanced Sample Classes
2.3.4 Supervised Classification via Multinomial Logistic Regression
2.3.5 Unsupervised Clustering using a Gaussian Mixture Model
2.3.5.1 Measuring Success
3 Results
3.1 Statistical Structure of CRB Chemistry
3.2 The Grande Ronde Formation
3.3 Robustness of CRB Stratigraphic Labels using Supervised Classification
3.4 Unsupervised Clustering to Elucidate Common Petrologic Pathways
3.4.1 Gaussian Mixture Model (GMM) Cluster Number Analysis
3.4.2 Grande Ronde Petrologic Affinities
4 Discussion
4.1 Implications of Supervised Classification and Unsupervised Clustering for the Stratigraphy of the CRB Lavas
4.1.1 Petrogenesis of the Grande Ronde Basalts
4.2 Implications for Time-Evolving Crustal Storage Zones
5 Conclusion
6 Bridge to Ch. 3
7 References Cited

IV. CHEMICAL AND STRUCTURAL VARIABILITY OF COLUMBIA RIVER FLOOD BASALT DIKES, WITH A FOCUS ON THE MAXWELL LAKE DIKE COMPLEX IN THE WALLOWA MOUNTAINS OF OREGON
1 Intro
1.1 Geochemistry of the Chief Joseph Dike Swarm
1.2 The Maxwell Lake Dike Complex
2 Methods
2.1 Geochemical Sampling and Analysis
2.2 Oxygen Isotope Analyses
2.3 Field Observations and Structural Analysis
2.4 Summary of Statistical Methods and Classification
3 Results
3.1 CJDS Supervised Classification
3.2 Classifying the Chemistry of the Lakes Basin Study Area
3.3 Classifying the Chemistry of the MLDC
3.4 Comparison Between Intrusive and Extrusive CRB Data
3.5 Oxygen Isotope Analyses from the MLDC and Lakes Basin Study Areas
3.6 Observations of Assimilation and Excavation
4 Discussion
4.1 Hydrothermal Alteration in the CRB
4.2 Geochemical Variation Observed in the CJDS
4.3 Dike Segment Variation Along Strike in the MLDC
4.4 Evidence for Assimilation in the Dikes and Implications for the Petrogenesis of the Main Phase CRB
5 Conclusion
6 References Cited

LIST OF FIGURES

Figure 1 Map of Krafla. Simplified map of the Krafla central volcano and caldera, highlighting the units sampled in this study (modified from Jónasson, 1994). The inset map shows the location of the Krafla central volcano (red star) in northeast Iceland along the Northern Volcanic Zone.

Figure 2 The Phase 2 Rhyolite Domes. A) Jörundur, located SE of the caldera margin (see Fig. 1 for dome locations), with basaltic lava flows from the fissure system in the foreground. Inset shows columnar jointing and blocky fracturing patterns formed during subglacial emplacement. Some localities along the base of the dome show glassy textures. B) Hlíðarfjall, located SW of the caldera margin. C) Gæsafjallarani, with insets showing the sample location of KRF-21 and blocky columnar jointing. All samples from these domes yielded zircons.

Figure 3 Selected Units Sampled. A) Obsidian ridge (Hrafntinnuhryggur). Two main textures are present across the ridge: B) a perlitic texture and C) a glassy texture. D) Ridge composed of the Halarauður ignimbrite, erupted during a major caldera collapse event.
At both sample locations the tuff has a high volume of lithic fragments and sparse crystals. E) Víti crater, the site of Krafla's youngest rhyolitic eruption.

Figure 4 Previous Models for Rhyolite Petrogenesis at Krafla. Schematic representations of previous hypotheses for rhyolite production at Krafla. A) Model dominated by segregation of partial melt (Jónasson, 1994). Rhyolitic, low-δ18O partial melt forms around the margins of large basaltic magma bodies. This partial melt segregates through cracks and melt channels to form rhyolitic magma bodies. B) Model dominated by simultaneous assimilation and fractional crystallization (AFC) (Nicholson et al., 1991; Charreteur et al., 2013). Partial melting of the low-δ18O, hydrothermally altered wallrock is induced by the intrusion and crystallization of basaltic magma, and the melt is assimilated into the fractionating magma chamber; alternatively, bulk crust may be assimilated by stoping.

Figure 5 Zircon Cathodoluminescence Images. Zircons from the post-caldera domes imaged by cathodoluminescence (CL) show two morphologies: 1) an equant, sector-zoned morphology and 2) an elongated, splintered morphology. An example of each type is shown for each dome.

Figure 6 δ18O vs. time for the zircon grains extracted from the three rhyolite domes. Error bars are 2σ. The data show no pattern in δ18O over time.

Figure 7 Zircon populations selected for δ18O analysis at the University of Alberta.

Figure 8 Zircon Isochron. Zircon isochron diagram showing the U-Th model ages of the zircon populations in the three domes. Error ellipses show 2σ error. Best-fit lines show the estimated isochron model age for each dome.
Jörundur 238U/232Th = 0.3068 ± 0.01, Hlíðarfjall 238U/232Th = 0.3059 ± 0.01, Gæsafjallarani 238U/232Th = 0.3051 ± 0.01. n = number of zircons analyzed. MSWD = mean square weighted deviation. Red isochron labels are reported in thousands of years (ka).

Figure 9 δ18O of Dome Zircons. δ18O (‰) of the zircon populations from the three rhyolite domes (for each dome, each vertical array consists of multiple spot analyses from an individual grain). The expected equilibrium zircon δ18O, based on a host melt value of +3.5‰ and Δ18O(melt-zircon) = 2‰ (i.e., +1.5‰, assuming a magmatic temperature of 900°C; e.g., Bindeman, 2008), is shown, as are the average values for the respective zircon populations.

Figure 10 Zircon Trace Elements. A) Hf/Yb versus Th/U trace element ratios measured by SHRIMP-RG on individual zircon grains. Shapes represent the dome from which the zircon was extracted (see legend at bottom of figure). Colors indicate the measured δ18O (red = +1.5 to +2‰, blue = +1 to +1.5‰, green = +0.5 to +1‰). B) Zircon Th vs. U concentrations (no oxygen isotope information is shown in B and C). C) Zircon Yb vs. Hf concentrations. B) and C) are accompanied by a fractional crystallization (FC) model using the Rayleigh equation Cl/C0 = (1 − F)^(D − 1), where Cl and C0 are the concentrations of an element in the residual and initial melt, F is the fraction of material crystallized, and D is the bulk partition coefficient for each element. Partition coefficients for FC are from Melnik and Bindeman (2018), assuming temperatures of ~850-900°C: Kd(Hf) = 1200, Kd(Yb) = 11, Kd(U) = 46, Kd(Th) = 14. Percentages indicate the amount of crystallization that has occurred at each composition.

Figure 11 Zircon Crystal Size Distribution. Natural logarithm of the population density for zircons of different lengths (longest axis, in μm) from each dome. There is a linear trend between elongation and size (not shown here), implying that there is no bias towards sampling large zircons. The gradients (population density/μm) of the downward-sloping parts of the curves are used to estimate residence times.

Figure 12 δ18O of All Units and All Phases. δ18O of groundmass, plagioclase, and pyroxene (measured by laser fluorination) in all analyzed units at Krafla. Average MORB and mantle zircon values are from Valley (1994).

Figure 13 Thermal Model Time-Temperature Paths and Partial Melt Volumes. A) Temperature profile of the crust at time t = 0, after 500 kyr of thermal priming has created an elevated geotherm. B) The final temperature profile after 50 kyr of intrusions. C) Time-temperature paths at different depths within the thermal model run, taken at the radius just outside the intrusion (x = 0.5 km). The zircon saturation temperature and the wet/dry basaltic solidus are also shown. D) Maximum melt fraction reached in the modeled crustal domain and the volume of partial melt created in the crust, calculated in cylindrical coordinates assuming radial symmetry of the 2D domain. The location of partial melt generated in the system can be evaluated from melt contours that store the volume of melt for each cell and each time step in the model. Near the magma chamber melt fraction increases, while farther from the chamber it decreases. Zircon saturation temperatures are calculated from the whole-rock compositions of the rhyolite domes according to the model of Boehnke et al. (2013).

Figure 14 Magma Chamber Simulator (MCS) Results.
Results of assimilation and fractional crystallization (AFC) runs of the Magma Chamber Simulator, showing the SiO2 content of the final magma, the magma volume remaining after AFC (% of original liquid volume), and the amount of partial melt added to the magma (% of total mass). For each run, basaltic magma (Mývatn Fires composition of Thorarinsson, 1979) at T = 1200°C is intruded into wallrock of specified composition (detailed on the x-axis, with model pressures in bars given in parentheses). Calculations for dry (0.1 wt.% H2O) basaltic magma were run at and 3000 bars using wallrock of identical composition. The dry andesite (0.1 wt.% H2O) wallrock composition is equivalent to the erupted andesite unit that we sampled inside the southern caldera margin. The basaltic wallrock composition with 1 wt.% H2O was taken from Spulber and Rutherford's (1983) experiments on partial melting of Icelandic crust. The amphibolite composition is from Wolf and Wylie (1995) and has 5 wt.% H2O.

Figure 15 Magma Chamber Simulator Analysis. A) Wallrock partial melt compositions produced in selected MCS simulations for different wallrock compositions and pressures. Symbols mark 5°C cooling increments and the partial melt compositions produced by melting of the protolith indicated in the legend. Initial partial melts have the most evolved compositions, becoming less evolved at higher degrees of melting. None of the simulated partial melt compositions overlap with the observed compositions of the rhyolites, and most of the partial melt is dacitic rather than rhyolitic. B) Final magmatic δ18O values after AFC, calculated from MCS results with an isotopic mixing model assuming different average δ18O values for the crustal assimilant (-10‰, -5‰, and 0‰). Pressures of MCS simulations in bars are given in parentheses.
C) The compositions of the partial melt created during MCS simulations in each scenario. None of the partial melt compositions overlap with the observed compositions of the Krafla rhyolites (red square).

Figure 16 New Model for Petrogenesis of the Rhyolite Domes. Schematic representation of the multi-stage petrogenetic model proposed for the three rhyolite domes. Low-δ18O partial melt from a compositionally diverse and isotopically depleted crust is assimilated after intrusion of mafic magma heats the surrounding crust. This imparts the low-δ18O signature and creates an intermediate to dacitic magma. Further differentiation to rhyolite occurs by fractional crystallization after the magma is transferred to cooler crust, where it cools below the zircon saturation temperature.

Figure 17 δ18O vs. SiO2. δ18O of glass or groundmass (this study) or bulk rock (Nicholson et al., 1991; Cooper et al., 2016) vs. SiO2 for Krafla products.

Figure 1 The extent of Columbia River Basalt (CRB) lava flow coverage on the surface (dashed red line) and documented exposures of dike segments throughout the high lava plains (area of interest in Chapter 3).

Figure 2 From left to right: CRB stratigraphic Formation (left column, younging upward), magnetic polarity, Member classification, estimated volume of each member, the number of samples in the database, and the number of sources from which the data are drawn.

Figure 3 Combined dip test versus coefficient of variation for 10,000 synthetic analyses. Three synthetic datasets are shown to demonstrate the structure of the data in each colored regime. Colors refer to HCV scores of synthetic data.

Figure 4 Workflow for the supervised machine learning analysis in this work, describing the multinomial logistic regression process from preprocessing through training, classification, and visualization.

Figure 5 A) Example supervised classification for an individual sample from the CRB lava test dataset. The red bar represents the maximum probability, showing 97% confidence that the sample is assigned to the Wapshilla Ridge member, with negligible secondary probability (blue bar pointing to Grouse Creek). B) Histogram of samples categorized in the Grande Ronde Formation (left) and the associated maximum probability of those categorizations (right). The dashed black line is the mean of the maximum secondary probability per category.

Figure 6 Workflow for the unsupervised machine learning analysis in this work, illustrating preprocessing, Gaussian Mixture Modeling, model assessment, and interpretation based on compositions and petrologic modeling.

Figure 7 A) Element distributions in the full CRB database plotted on the regime diagram of Figure 3. B) The same analysis but with the interquartile range instead of the coefficient of variation (outliers have less of an effect). C and D) The same analyses as A and B but restricted to data from the Grande Ronde Formation.

Figure 8 Principal Component Analysis of the full CRB lava dataset. A) Explained variance per principal component dimension. B, C, and D) The first 6 principal components (87% of the variation). For the 3rd, 4th, 5th, and 6th principal components only the element labels are shown for simplicity. The 1st and 2nd principal components include ratios.

Figure 9 Statistics for two element distributions, MgO and Ba. A and B) Count in each member versus kurtosis, with color indicating the difference from the expected Gaussian line in column 1, and the Gaussian limits of -3 to 3 shown by the dashed line. C and D) Same analysis but with skewness instead, and bounds between -1 and 1, which indicate near-Gaussian distributions. E and F) IQR vs. 2× standard deviation, with the expected Gaussian line shown in grey and the difference from that expectation shown as colors. G and H) The same analysis as E and F, but with outliers removed from the data.

Figure 10 PCA on the Grande Ronde Formation subset with outliers removed. A) The explained variation per feature dimension added. B and C) The eigenvectors for the first principal component in this analysis, in biplot form and as eigenvectors alone, showing a contrast between elements that are incompatible (Na2O, K2O, P2O5, Ba, Rb, Sr, Zr, Y, Nb, Ga) and compatible (MgO, CaO, Al2O3, Ni, Cr, Sc, and Cu) in basalts. D) The biplot for the 3rd and 4th principal components.

Figure 11 Formation-level MLR supervised learning on the full CRB dataset. These normalized confusion matrices are read across rows: each row is the actual label, and the values in the boxes show the percentage of each category classified into each predicted stratigraphic label. A perfect output would only have values on the diagonal. Right) Formation-level test on the full CRB dataset without outliers.

Figure 12 Member-level MLR supervised classification on the full CRB dataset, preprocessed by ratio combinatorics and a Box-Cox power transform. Normalized confusion matrices are read across rows.
Each row is the actual label, and the values in the boxes show the percentage of each category classified into each predicted stratigraphic label. A perfect output would only have values on the diagonal. The formations are highlighted, with x and y axis labels (Member names) following Table 1.

Figure 13 MLR supervised classification of Grande Ronde data. Left) The member-level test on the Grande Ronde data subset with outliers included. Right) The member-level test on Grande Ronde data only, with outliers removed. Member names follow Table 1.

Figure 14 Member-level MLR supervised learning on the full CRB dataset, preprocessed with magnetic polarity included as a feature. Normalized confusion matrices are read across rows. Each row is the actual label, and the values in the boxes show the percentage of each category classified into each predicted stratigraphic label. X and y axis labels follow Table 1.

Figure 15 A) Assessment of GMM clustering success via AIC and BIC scores for the entire CRB dataset and its 47 members. This plot elbows at 3 clusters and again at 11 clusters. B) AIC and BIC scores for the entire CRB dataset with outliers removed. The plot elbows most substantially at 5 clusters.

Figure 16 A) Assessment of GMM clustering success via AIC and BIC scores for Grande Ronde data and its 24 associated members. The plot elbows at 5 clusters. B) AIC and BIC scores for the Grande Ronde subset with outliers removed. The plot elbows at 3 clusters and, more weakly, again at 5 clusters.

Figure 17 GMM output sorted by the Formation of the original samples, as a function of the user-specified number of clusters. The dashed lines show the actual number of members within each labeled formation. The far right is 47 clusters, the number of Member labels in the CRB database.

Figure 18 GMM cluster composition, assuming 3 clusters. A) The average composition of each cluster normalized by the average composition of the Imnaha Formation. B) Bivariate plot of normalized Ba vs. Rb compositional data, colored by cluster association.

Figure 19 Three-cluster breakdown for the Grande Ronde, showing each sample's stratigraphic member label (y-axis) vs. its GMM cluster label (x-axis). This is compared to the volume of the different Grande Ronde members on the right (Reidel et al., 2013), with comparisons of cluster 2 in red and cluster 0 in blue.

Figure 20 Unsupervised cluster output for the CRB database against the actual formation ID of each sample. Sample numbers in each category are plotted in the histograms. Formation ID labels on the y axis follow Table 1. Cluster labels on the x axis are arbitrary.

Figure 21 Unsupervised labels vs. the member ID in the original dataset. Sample numbers in each category are plotted in the histograms. Actual member identification labels on the y axis follow Table 1, and cluster labels on the x axis are arbitrary. Colors on the y axis denote formations.
Figure 22 Compositions of partial melt created during various AFC experiments of the geochemical Magma Chamber Simulator (Bohrson et al., 2014). The Idaho Batholith is run under two scenarios, one including recharge and one without recharge (Gaschnig et al., 2011).
Figure 23 A) The calculated percent change in each element during the time steps in the model run that included each process, for the scenarios involving intrusion of the Imnaha parent into the Idaho batholith wallrock, including just AFC and FC processes. B) Percent change including instances of recharge by an assumed Imnaha-composition parent magma.
Figure 24 Summary of the percent change signals for each process (FC, AFC, and RFC) in Fig. 23. Arrows show the elements most affected by each process in the MCS simulations from Fig. 23 and therefore act as a guide to which elements would be most likely to record these processes in the melt chemistry.
Figure 1 Map of dike exposures (yellow) in the Wallowa Mountains and surrounding areas, as well as geochemical sample localities (red diamonds).
Figure 2 Map of geochemical sample locations in the Wallowa Mountains, colored by the researcher who collected the sample (J. Biasi & Karlstrom, 2021; I. N. Bindeman et al., 2020; Cahoon et al., 2020; Morriss et al., 2020; Petcovic & Grunder, 2003).
Figure 3 Close-up view of the Lostine River Valley, Lakes Basin, Hurricane Ck., and Wallowa Lake areas with geochemical sample locations colored by collector. Samples collected by Heather Petcovic, Ilya Bindeman, and Leif Karlstrom were taken from along-strike sampling at the Maxwell Lake Dike Complex.
Figure 4 A) Dike segment exposed along the road to the top of Big Lookout Mountain. The sample site for CRBD1907 is shown. B) Dike from the Lakes Basin area with a large zone of apparent excavation along the margin. The corner was filled with partial melt. C) The furthest north exposure in the MLDC presented in this study, at the locality of 21-MaxR-09. The 2-3 m wide partial melt zone is shown in the welded margin of the dike.
Figure 5 Dike geochemical samples classified by Formation using supervised machine learning (Multinomial Logistic Regression).
Figure 6 Probability histogram plots have several components: the left-hand side (blue) shows the number of samples per member category on the y-axis, and the right-hand side shows the average maximum probability for the samples classified into each group (salmon bars), with an accompanying black dashed line showing the mean probability of other possible category assignments (the higher this line, the greater the uncertainty). A) Formation-wise classification outcome over the entire CRB stratigraphy for the dike segment geochemical samples. The majority of the dikes belong to the Grande Ronde Formation. B) Member-wise classification model over the entire CRB stratigraphy and 197 dike samples. The majority of dikes classify as Sentinel Bluffs, Wapshilla, and Teepee Butte.
Figure 7 Test data model applied to just the Grande Ronde samples, and a model that classifies only over the Grande Ronde stratigraphy. A) Supervised classification model for samples classified in the Grande Ronde Formation to divide them into members. The model in panel A includes outliers. B) Supervised classification model over the Grande Ronde members with outliers removed from the extrusive dataset.
Figure 8 Geochemical sample locations colored by Formation in the Lakes Basin and Maxwell Lake study areas.
Figure 9 Lakes Basin geochemical samples colored by Formation.
Figure 10 A) Confusion matrix for a model to classify MLDC geochemical samples that only compared the samples to known reversed-polarity Grande Ronde extrusive lava fields (Member ID labels for reversed Grande Ronde members: 6: Buckhorn Springs; 7: Hunter and Birch Creek; 8: Teepee Butte; 9: Rogersburg; 10: Skeleton Creek; 11: Center Creek; 12: Kendrik Grade; 19: Mount Horrible; 20: Wapshilla Ridge; 21: Grouse Creek; 22: Meyer Ridge). B) Outcome of that classification model on the samples of the MLDC, showing the number in each category on the left and the average maximum probabilities for those classifications on the right.
Figure 11 Circular barplot example of a classification with more uncertainty for sample CRB-58, collected by Dr. Ilya Bindeman (I. N. Bindeman et al., 2020) in the MLDC.
Figure 12 Circular barplots of probabilities from the Supervised Classification for the Maxwell A (MLR-01-72, collected by Heather Petcovic) and Jackson A (20-MaxR-15h) dike segments in the MLDC.
Figure 13 Bivariate distribution of Maxwell samples against closest extrusive affinity.
Figure 14 MLDC map of geochemical samples colored by Grande Ronde Member.
Figure 15 Bivariate comparison between intrusive dike segment geochemical data and extrusive lava geochemical data. Lavas are colored by Formation ID (0: Picture Gorge; 1: Steens; 2: Imnaha; 3: Grande Ronde; 4: Wanapum; 5: Saddle Mountains).
Figure 16 A) Dike δ18O results vs Sc/Zr for dike samples with both chemical analyses and oxygen isotope data. B) Summary of oxygen isotope data from the Maxwell and Lakes Basin areas collected during this study, organized by latitude.
Figure 17 Map of oxygen isotope analyses for the MLDC. Diamonds indicate localities where oxygen isotope information was sampled, including the transect from Bindeman et al. (2020), with colored diamonds indicating new sample localities. The traces of the dike margins are shown as solid black lines, with the geochemical sample sites shown in the background in green.
Figure 18 Oxygen isotope analyses for the Lakes Basin. Geochemical samples are shown by the circles underneath the isotopic measurements.
Figure 19 Map of partial melt zone variability along the margin of the Jackson A dike segment.
Figure 20 Composition of the Jackson A dike segment (samples 20-MaxR-15g, h, j, and n collected during this study, and KD_2 and KD2 analyzed in previous studies (J. Biasi & Karlstrom, 2021; Karlstrom et al., 2019)), accompanied by a field photo showing the Jackson A segment (photo oriented looking south) and the location of the transect and oxygen isotope analyses by Bindeman et al. (2020). The partial melt fins on the margins of the dike are visible.
Figure 21 A) Margin of a dike segment from the Wallowa Mountains displaying an irregular contact with the host rock indicative of partial melting. B) Irregular contact with a zone of partial melting from the MLDC, taken on the ridge above the dike segment originally sampled by Petcovic and Grunder (2003). C) Interaction between the tip of a dike segment in the Wallowa Mountains and the surrounding wallrock, as smaller tendrils of the dike continue to intrude into the wallrock beyond the main outcrop of the segment. D) Quenched dike margin in the marble facies of the metasedimentary wallrock in Hurricane Ck. Less partial melt occurs in this area.
Figure 22 Summary of along-strike variation in all data at the MLDC. This also shows a comparison to other data types, including paleomagnetic reset distance and thermochronology reset distance (J. Biasi & Karlstrom, 2021; Goughnour et al., 2021). Oxygen isotope, geochemical, and structural data were all collected during this study. The bottom panel separately displays the elevation profile and the dike segment length as a function of along-strike distance. The variations display considerable segmentation, with each dike segment taking on individual characteristics.
Figure 23 Several examples of assimilated blocks partially melting within the dikes. A) MLDC dike segment with an assimilated tonalite block surrounded by a partial melt halo. B) Partial melting and mechanical erosion of blocks into the dike. C) Example of assimilated blocks of marble in the Lakes Basin.

LIST OF TABLES

CHAPTER II
1. Table A. Input parameters for Heat 2D first step priming run
2. Table B. Input parameters for Heat 2D thermal model
3. Table C. Sample Location Information
4. Table C-2. Zircon Data
5. Table 1. Results of laser fluorination δ18O analyses
CHAPTER III
6. Table 1. Member and Formation ID Key
CHAPTER IV
7. Table 1. Probability outcomes for samples from MLDC
8. Table 2. Oxygen Isotope Analyses
9. Table 3. Along Strike Data Variation

I. INTRODUCTION

This dissertation brings together a variety of geochemical analysis and interpretation tools to investigate the processes that occur within the plumbing of mafic volcanic systems, and in particular flood basalt provinces. In Chapter II we use a combined isotope, trace element, and thermal modeling approach to investigate the production of rhyolitic magmas at the active Krafla Volcano in Iceland, which lies directly on the Mid-Atlantic Ridge. We present laser fluorination oxygen isotope analyses of plagioclase, pyroxene, and groundmass from eight rhyolites and six selected basalts, as well as in situ oxygen isotope analyses and U-Th geochronology of zircons from three rhyolitic domes erupted around the caldera margins.
Zircon U-Th geochronology for the rhyolite domes yields ages of 88.7 ± 9.9 ka for Jörundur, 83.3 ± 9.2 ka for Hlíðarfjall, and 85.5 ± 9.4 ka for Gæsafjallarani, some 20-30 kyr after the eruption of the zoned rhyolite-to-basalt Halarauður ignimbrite during a major collapse of the Krafla caldera. We suggest that the domes represent a renewed episode of silicic magma production in the pre-heated crust. Oxygen isotope analyses of single and bulk plagioclase and pyroxene identify some instances of isotopic disequilibrium with groundmass (~3.5‰), reflecting assimilation of diverse low-δ18O crustal material. However, zircon is largely in equilibrium with groundmass analyses, suggesting it crystallized directly from low-δ18O magma. Zircon trace elements (Hf, Yb, Th, U) for all three domes show trends indicative of fractional crystallization. We suggest that petrogenesis of rhyolitic magma at Krafla requires at least two steps: the δ18O of basaltic parental magmas is first lowered through assimilation of hydrothermally altered material (generated in the high-temperature region of the crust surrounding the magma chamber) to produce low-δ18O mafic to intermediate magmas, which then ascend from magma generation zones into colder crust, where they undergo further fractional crystallization at shallower depths. Our models suggest that prior hydrothermal alteration of the mafic crust greatly increases the volume of partial melt that can be produced and assimilated, and we thus suggest that long-lived hydrothermal systems may play an important role in encouraging the production of larger volumes of rhyolitic magmas in basalt-dominated environments. Chapter II is co-authored with Ilya N. Bindeman, Richard A. Stern, Matthew A. Coble, and Shane M. Rooyakkers and was published in the Journal of Volcanology and Geothermal Research in 2021.
In Chapter III we turn our attention to a larger, extinct mafic system, the Columbia River Flood Basalts (CRB), to unravel the magmatic processes occurring there without the extensive crystalline cargo present in the Icelandic system. In this chapter we compile a database of whole-rock geochemical data sampled from the CRB and use both supervised and unsupervised machine learning to quantify and interpret groupings and variation in the dataset. We evaluate the relationships between the known stratigraphic groups using both descriptive statistical analysis and unsupervised machine learning. Through this work we find that the geochemical variation and relationships within the stratigraphy are indicative of common processes of recharge, assimilation, and fractional crystallization. We then build on this assessment of data relationships and groupings to develop a classification model that quantitatively recognizes the chemical variation defining the existing stratigraphic groups. This tool is not only effective for further evaluating the stratigraphy of the Columbia River Flood Basalts, but also serves as an important case study for validating the application of these methods to geochemical data. The effectiveness of the classification tool provides evidence that these automated methods have a promising future in the fields of geochemistry and petrology. Chapter III is being prepared as a manuscript for submission to the Geological Society of America Bulletin, co-authored with Leif Karlstrom. Chapter IV takes advantage of the unique exposures of the shallow crustal plumbing system in the Wallowa Mountains of northeastern Oregon to gain direct insight into the plumbing system of the Columbia River Flood Basalts studied in Chapter III. In this work we use both previously collected data from dike exposures in the Wallowa Mountains and new detailed sampling from dikes in and around the Wallowa Mountains.
To place these data points in the stratigraphy and understand their larger context, we apply the stratigraphic classification model from Chapter III to assign unknown samples of intrusive dike whole-rock geochemistry to the CRB stratigraphy. These samples from the Wallowa Mountains, and specifically from the Maxwell Lake area, provide further insight into the plumbing system of the CRB, using a combination of field methods, machine learning, and comparison with other studies to investigate along-strike variation within a dike complex. Together, the chapters of this dissertation represent a thorough investigation of the magmatic processes that drive mafic flood basalt eruptions, using a wide range of tools from isotope geochemistry and field work to geochemical data analysis and machine learning.

II. A MICROANALYTICAL OXYGEN ISOTOPIC AND U-TH GEOCHRONOLOGIC INVESTIGATION OF RHYOLITE PETROGENESIS AT THE KRAFLA CENTRAL VOLCANO, ICELAND

From Hampton R.H., Bindeman I.N., Stern R.A., Coble M.A., Rooyakkers S.M. (2021) A microanalytical oxygen isotopic and U-Th geochronologic investigation and modeling of rhyolite petrogenesis at the Krafla Central Volcano, Iceland. Journal of Volcanology and Geothermal Research, 414.

1. Introduction

Seated at the intersection of a subaerial mid-ocean ridge and an upwelling mantle plume, Iceland is a natural laboratory for studying silicic magma generation in basalt-dominated extensional environments (Bindeman et al., 2012; Carley et al., 2020, 2014; Jónasson, 2007, 1994; Martin and Sigmarsson, 2010; Pope et al., 2013; Schattel et al., 2014). The Icelandic crust is dominantly basaltic but also hosts up to 15% silicic intrusive and extrusive rocks (Maas et al., 1992; Marsh et al., 1991; Martin et al., 2008).
This comparatively large proportion of silicic magmatism and volcanism in a basalt-dominated setting has important implications: Iceland's silicic magmas present eruption hazards that can affect both Iceland and mainland Europe, provide a long-lived heat source for geothermal power, and may provide a modern analog for the generation of continental crust in early Earth-like environments (Bindeman et al., 2012; Maas et al., 1992; Marsh et al., 1991; Martin et al., 2008; Reimink et al., 2014). Despite these implications and decades of scientific interest, the petrogenesis of silicic magmas in Iceland and other predominantly basaltic settings remains highly debated. Many rocks in Iceland, including many of the erupted rhyolites and much of the crust that has been explored via drilling, are characterized by low-δ18O isotopic compositions. Crustal δ18O values in the Icelandic rift zones are on average ~1-3‰ lower than normal mantle values (Gautason and Muehlenbachs, 1998) and reach much lower values in areas of extensive hydrothermal alteration; altered rocks from Krafla boreholes have bulk δ18O as low as -12‰ (average upper crustal value = -7.7 ± 2.4‰; Hattori and Muehlenbachs, 1982). This strong isotopic contrast between the Icelandic crust and mantle-derived magmas provides a useful tool to constrain the role of magma-crust interaction in Iceland, and is particularly useful in probing the origins of the silicic magmas, which mainly occur around the central volcanoes where large hydrothermal systems are located (Hattori and Muehlenbachs, 1982; Nicholson et al., 1991; Pope et al., 2013).
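The leverage this isotopic contrast provides can be illustrated with a simple two-end-member oxygen mass balance. The sketch below makes three assumptions: the magma and crustal end-members have equal oxygen concentrations, isotopic fractionation between melt and crystals is ignored, and the result is therefore linear in δ values. It is not the thermodynamically constrained modeling used later in this chapter. The example asks what fraction of crustal oxygen would lower a +5.5‰ mantle-derived basalt to +3.1‰ (the value reported for the IDDP-1 rhyolite). It does this for crustal end-members at the average (-7.7‰) and minimum (-12‰) Krafla borehole values quoted above. The function names are illustrative.

```python
def crustal_oxygen_fraction(delta_magma, delta_crust, delta_target):
    """Fraction of oxygen that must come from the crust for a two-end-member
    mix to reach delta_target (all values in permil vs VSMOW).

    Assumes equal oxygen concentrations in both end-members and no
    isotopic fractionation, so the mixture is linear in delta values.
    """
    if delta_magma == delta_crust:
        raise ValueError("end-members must differ in d18O")
    return (delta_magma - delta_target) / (delta_magma - delta_crust)


def mixed_delta(delta_magma, delta_crust, f_crust):
    """d18O of the mixture for a given crustal oxygen fraction f_crust."""
    return (1.0 - f_crust) * delta_magma + f_crust * delta_crust


# Basalt at +5.5 permil lowered to +3.1 permil by average (-7.7) or
# strongly altered (-12.0) crust.
for crust in (-7.7, -12.0):
    f = crustal_oxygen_fraction(5.5, crust, 3.1)
    print(f"crust {crust:+.1f} permil: {100.0 * f:.1f}% crustal oxygen")
# crust -7.7 permil: 18.2% crustal oxygen
# crust -12.0 permil: 13.7% crustal oxygen
```

Under these assumptions, lower-δ18O crust requires less assimilation to produce the same shift in the magma.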
While it has been hypothesized that moderately low-δ18O magmas originate from an anomalously low-δ18O component of the Icelandic mantle or plume (Maclennan et al., 2002; Winpenny and Maclennan, 2014), the values observed in volcanic rocks at Krafla are too low and variable to reflect such an anomaly alone, and must reflect some form of interaction between crustal rocks and meteoric water (Bindeman et al., 2012, 2008; Eichelberger, 2020; Eichelberger et al., 2020). Zircons, though not always present in Icelandic rhyolites, provide an additional tool for understanding silicic magma petrogenesis and, in particular, the origins of the low-δ18O magmas in Iceland (Banik et al., 2018; Bindeman et al., 2012; Bindeman and Melnik, 2016; Carley et al., 2020, 2014, 2011; Gurenko et al., 2015; Reimink et al., 2014). Because of its high closure temperature, zircon holds a robust record of the δ18O signature imprinted on it during crystallization (Carley et al., 2020; Gurenko et al., 2015; Valley et al., 1994; Watson and Harrison, 1983). Crystal size distribution and trace element analyses further increase the utility of detailed zircon investigation. In this study, we utilize detailed zircon analyses, including oxygen isotope measurements, U-Th dating, trace element analysis, and crystal size distribution analysis, to gain insight into the petrogenesis of rhyolite at Krafla volcano in the Northern Volcanic Zone of Iceland. We combine this zircon work with laser-fluorination δ18O analysis of plagioclase, clinopyroxene, and host groundmass from Krafla volcanic products ranging in composition from basalt to rhyolite. These data are then paired with thermal and chemical modeling to explore the relative importance of fractional crystallization versus partial melting and assimilation in the generation of Krafla rhyolites.
We conclude by proposing a new conceptual model for the petrogenesis of the rhyolites at Krafla, consistent with thermal, chemical, and isotopic constraints, that reconciles the previous controversy surrounding their origin and may also be applicable to other Icelandic systems.

1.1 Geology of Krafla

Figure 1 Map of Krafla. Simplified map of the Krafla central volcano and caldera, highlighting the units sampled in this study (modified from Jónasson, 1994). The outset map shows the location of the Krafla central volcano (red star) in northeast Iceland along the Northern Volcanic Zone.

The Krafla volcanic system consists of a central volcano hosting a ~8 × 10 km elliptical caldera, which is bisected by a 100 km-long fissure system (Sæmundsson, 1991; Jónasson, 1994). This system is ideal for investigating the production of rhyolite in a basaltic environment for several reasons. First, Krafla has a ca. 240 kyr record of concurrent silicic and mafic volcanism (based on 40Ar/39Ar dating of a dacite recognized as its oldest silicic product; Sæmundsson and Pringle, 2000), providing an opportunity to study a variety of rhyolites and their relationship with temporally and spatially related basaltic magmas (Jónasson, 1994; Nicholson et al., 1991; Sæmundsson, 1991). Second, Krafla hosts a large hydrothermal system where meteoric water, isotopically depleted to values between -10 and -13‰ due to the high latitude and cold climate of Iceland, has infiltrated the heated crust (Gautason and Muehlenbachs, 1998; Hattori and Muehlenbachs, 1982; Pope et al., 2014, 2013; Zakharov et al., 2019). This has resulted in extensive hydrothermal alteration of the crust to isotopically depleted (low-δ18O) values (-3 to -12‰) (Eichelberger et al., 2020; Hattori and Muehlenbachs, 1982; Pope et al., 2013) that contrast significantly with the mantle-derived basalts (+5.5‰) that feed the magmatic system.
This contrast provides excellent leverage to study the interactions between the intruding magmas and the surrounding mafic crust, which are otherwise compositionally similar (Bindeman, 2008; Hattori and Muehlenbachs, 1982; Pope et al., 2014; Sveinbjornsdóttir et al., 2015).

Figure 2 The Phase 2 Rhyolite Domes. A) Jörundur, located SE of the caldera margin (see Fig. 1 for dome locations), with basaltic lava flows from the fissure system in the foreground. Outset shows columnar jointing and blocky fracturing patterns formed during subglacial emplacement. Some localities along the base of the dome show glassy textures. B) Hlíðarfjall, located SW of the caldera margin. C) Gæsafjallarani, with outsets showing the sample location of KRF-21 and blocky columnar jointing. All samples from these domes yielded zircons.

Jónasson (1994) divided rhyolitic activity at Krafla into three phases (Figs. 1, 2, and 3). Phase 1 involved emplacement of a small, poorly exposed rhyolite dome near the southern margin of the Hágöng plateau at 190 ka. It was followed by eruption of the mixed basalt-rhyolite Halarauður ignimbrite at ca. 110-115 ka (40Ar/39Ar dates from Sæmundsson and Pringle, 2000; Figs. 1 and 3D). The Halarauður eruption is the largest known eruption of Krafla (total 7 ± 6 km³ of magma) and was linked with major caldera collapse (Calderone et al., 1990; Rooyakkers et al., 2020). Phase 2 occurred during the last glacial period and involved the emplacement of three subglacial rhyolite domes around the caldera margin. These domes, with a cumulative volume of ~0.7 km³, are believed to have exploited caldera ring fractures. From the southeast clockwise to the northwest they are Jörundur, Hlíðarfjall, and Gæsafjallarani (Figs. 1 and 2). Previous 40Ar/39Ar geochronology provides age constraints for Jörundur and Gæsafjallarani of 85-90 ka and 83-85 ka, respectively (Sæmundsson and Pringle, 2000). Phase 3 began at ca. 24 ka with the subglacial rhyolite fissure eruption (volume <0.05 km³) that formed the obsidian ridge Hrafntinnuhryggur in the southeast section of the caldera (Jónasson, 1994; Tuffen and Castro, 2009) (Figs. 1 and 3A, B, C). This was followed by eruption of the 9 ka Hveragil Tephra in the central region of the caldera (not part of this study). In 1724 AD, a small phreatomagmatic eruption at the onset of the Mývatn Fires rifting episode formed Víti crater (Fig. 3E). The Víti eruption ejected scarce juvenile rhyolitic pumice and basaltic scoria, as well as xenolith blocks of granophyre and felsite (Grönvold, 1984; Jónasson, 1994; Thorarinsson, 1979) (Fig. 3E). Episodic outpouring of large basaltic lava flows continued from fissures to the west until 1729 (Grönvold, 1984). The most recent eruptions at Krafla occurred during the 1975-1984 Krafla Fires rifting episode and were exclusively basaltic. Numerous other effusive basaltic eruptions at Krafla occurred throughout the Holocene (Sæmundsson, 1991; Thorarinsson, 1979).

Figure 3 Selected Units Sampled. A) Obsidian ridge (Hrafntinnuhryggur). Two main textures are present across the ridge: B) a perlitic texture and C) a glassy texture. D) Ridge composed of the Halarauður ignimbrite, erupted during a major caldera collapse event. At both sample locations the tuff has a high volume of lithic fragments and sparse crystals. E) Víti crater, the site of Krafla's youngest rhyolitic eruption.

In 2009, IDDP-1, the first well of the Iceland Deep Drilling Project, drilled into the Krafla hydrothermal system and unintentionally intersected near-liquidus (~930°C) rhyolite magma at 2.1 km depth (Elders et al., 2014, 2011; Zierenberg et al., 2013) (Fig. 1). The transition during drilling from the ~300°C hydrothermal regime to ~930°C occurred over ~30 m (Eichelberger, 2020).
Triple oxygen isotope data suggest that this low-δ18O (+3.1‰) rhyolite formed predominantly by assimilation of partial melts from hydrothermally altered and isotopically depleted basaltic crust into intruding basaltic magma (Zakharov et al., 2019), with the altering waters dating back to the last glacial period (Pope et al., 2014; Zakharov et al., 2019). This magma provides a rare glimpse into the active upper-crustal petrogenetic factory (Eichelberger et al., 2020).

1.2 Previous Models for Rhyolite Petrogenesis at Krafla

Previous petrologic and geochemical studies at Krafla have resulted in two endmember models to explain the petrogenesis of its high-silica, low-δ18O rhyolites (Fig. 4):
1) A classic assimilation and fractional crystallization (AFC) model, in which the latent heat released during fractional crystallization (FC) of a basaltic magma chamber induces assimilation of partially melted (or bulk) basaltic crust surrounding the chamber (Fig. 4B) (Charreteur et al., 2013; Nicholson et al., 1991).
2) A model involving predominantly crustal melting. Here, partial melt forms in the heated zone around an intruded basaltic magma chamber. It then segregates to form its own magma body without interacting with the basaltic magma, that is, without mixing with it or being assimilated into it (Fig. 4A) (Jónasson, 1994; Zierenberg et al., 2013).

Figure 4 Previous Models for Rhyolite Petrogenesis at Krafla. Schematic representations of previous hypotheses for rhyolite production at Krafla. A) Model dominated by segregation of partial melt (Jónasson, 1994). Rhyolitic, low-δ18O partial melt forms around the margins of large basaltic magma bodies. This partial melt segregates through cracks and melt channels to form rhyolitic magma bodies. B) Model dominated by fractional crystallization and simultaneous assimilation (AFC) (Nicholson et al., 1991; Charreteur et al., 2013). Partial melting of the low-δ18O hydrothermally altered wallrock is induced by the intrusion and crystallization of basaltic magma, and the melt is assimilated into the fractionating magma chamber; alternatively, bulk crust may be assimilated by stoping.

Whole-rock chemical trends that follow fractional crystallization patterns, supported by mass balance calculations, have been used as the central evidence for identifying FC processes at Krafla and other silicic systems in Iceland (Carmichael, 1964; Furman et al., 1992; Hards et al., 2000; Kokfelt et al., 2009; Macdonald et al., 1990; Nicholson et al., 1991). However, Nicholson et al. (1991) noted that the isotopic compositions (especially O and U-series isotopes) of Krafla's silicic rocks required some contribution from low-δ18O and older (at Th-U equiline) crust. They thus devised an AFC model for the generation of Krafla rhyolites: fractional crystallization was believed to be primarily responsible for differentiating the magmas, while the latent heat of crystallization drove the assimilation of partial melt from hydrothermally altered wallrock to produce their low-δ18O isotopic signature. The second model for differentiation at Krafla suggests that low-degree (5-15%), high-silica partial melt is formed in the heated wallrock zone around upper-crustal basaltic magma chambers and segregates away from the melting zone through cracks and melt channels to form rhyolitic magma bodies (Jónasson, 1994; Fig. 4A). This model was based primarily on similarities between the major element chemistry of the rhyolites and experimental melts of altered basaltic rocks at low pressure and low PH2O, as well as on their low-δ18O compositions and the scarcity of intermediate magmas erupted at Krafla (Jónasson, 2007, 1994). This paper contributes new data and aims to resolve these conflicting models for the generation of rhyolites at Krafla and other basalt-dominated settings.
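The trace element behavior behind the two endmember models can be illustrated with two textbook equations. The first is Rayleigh fractional crystallization, C_L = C_0 F^(D-1), where F is the fraction of melt remaining. The second is batch (equilibrium) partial melting, C_L = C_0 / (D + F(1 - D)), where F is the degree of melting. The sketch below is generic. It is not the energy-constrained Magma Chamber Simulator calculation used in this study. The starting concentration (100 ppm) and bulk partition coefficient (D = 0.1) are hypothetical values chosen only for illustration.

```python
def rayleigh_fc(c0, bulk_d, f_melt_remaining):
    """Melt concentration after Rayleigh fractional crystallization:
    C_L = C_0 * F**(D - 1), where F is the fraction of melt remaining."""
    if not 0.0 < f_melt_remaining <= 1.0:
        raise ValueError("melt fraction must be in (0, 1]")
    return c0 * f_melt_remaining ** (bulk_d - 1.0)


def batch_melt(c0, bulk_d, f_melted):
    """Melt concentration for batch (equilibrium) partial melting:
    C_L = C_0 / (D + F * (1 - D)), where F is the degree of melting."""
    if not 0.0 < f_melted <= 1.0:
        raise ValueError("degree of melting must be in (0, 1]")
    return c0 / (bulk_d + f_melted * (1.0 - bulk_d))


C0, D = 100.0, 0.1   # hypothetical source concentration (ppm) and bulk D
print(f"FC, 10% melt left: {rayleigh_fc(C0, D, 0.10):.1f} ppm")   # 794.3 ppm
print(f"Batch, 5% melting: {batch_melt(C0, D, 0.05):.1f} ppm")    # 689.7 ppm
print(f"Batch, 15% melting: {batch_melt(C0, D, 0.15):.1f} ppm")   # 425.5 ppm
```

With these hypothetical inputs, 90% fractional crystallization and 5-15% batch melting produce enrichments of similar magnitude for an incompatible element.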
The objective of this study is to combine existing petrologic knowledge of Krafla with detailed isotopic analysis of zircon crystal populations and thermochemical modeling. Using heat, mass, and isotope balances, we aim to quantify the relative importance of FC, assimilation, and partial melting of altered crust in producing Krafla rhyolites.

2. Methods

2.1 Analytical Methods

Samples were collected from the following units (Figs. 1-3): 1) the ~190 ka pre-caldera rhyolite dome at the southern edge of Hágöng, 2) the ~110-115 ka Halarauður ignimbrite (units H2 and H3 of Rooyakkers et al., 2020), 3) the three post-caldera rhyolite domes, Jörundur, Gæsafjallarani, and Hlíðarfjall, 4) the ~24 ka obsidian ridge Hrafntinnuhryggur, and 5) the 1724 AD Víti crater pumice and felsite xenoliths. Samples of quenched glass from the rhyolitic magma encountered in the IDDP-1 well in 2009 were also acquired. Several basaltic lavas from recent fissure eruptions were also sampled around the caldera (Fig. 1).

Zircon separation by standard and HF-extraction techniques was performed on all sampled units but yielded zircons only from the three rhyolitic ring-fracture domes. Zircons extracted from these samples were mounted in epoxy resin, polished, and imaged by cathodoluminescence (CL) (Fig. 5, Supplement Fig. A). The mounted zircons were then analyzed for oxygen isotopes by SIMS at the University of Alberta using a 15 μm-diameter Cs beam (Fig. 5). Following O isotope analyses, the analytical pits (~1 µm deep) were polished away. The same crystal spots were then targeted with an oxygen primary beam for 238U-230Th-232Th isotopes and select trace element concentrations (U, Th, Hf, Yb) using the SHRIMP-RG SIMS at Stanford University, following methods similar to Coble et al. (2018) and Mucek et al. (2017). This approach enabled direct comparison of individual δ18O values, ages, and elemental ratios for the same analytical spots.
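The U-Th ages referred to here are model ages from zircon-melt two-point isochrons. On a (238U/232Th) vs (230Th/232Th) activity-ratio diagram, a melt in secular equilibrium plots on the equiline. The slope of the line through the melt point and a closed-system zircon equals 1 - exp(-λ230 t), so t = -ln(1 - slope)/λ230. The minimal sketch below rests on three assumptions: that relationship, a 230Th half-life of ~75.6 kyr, and hypothetical activity ratios. It is not the reduction procedure used for the SHRIMP-RG data, and the function names are illustrative.

```python
import math

HALF_LIFE_TH230_YR = 75_584.0                 # assumed 230Th half-life, years
LAMBDA_230 = math.log(2.0) / HALF_LIFE_TH230_YR


def two_point_isochron_age(zircon_u_th, zircon_th_th, melt_u_th):
    """Model age in years from a zircon-melt two-point isochron.

    zircon_u_th  : zircon (238U/232Th) activity ratio
    zircon_th_th : zircon (230Th/232Th) activity ratio
    melt_u_th    : melt (238U/232Th) activity ratio; secular equilibrium is
                   assumed, so the melt plots at (melt_u_th, melt_u_th).
    """
    if zircon_u_th == melt_u_th:
        raise ValueError("zircon and melt must differ in 238U/232Th")
    slope = (zircon_th_th - melt_u_th) / (zircon_u_th - melt_u_th)
    if not 0.0 <= slope < 1.0:
        raise ValueError("slope outside [0, 1): no finite model age")
    return -math.log(1.0 - slope) / LAMBDA_230


def zircon_th_th_at(age_yr, zircon_u_th, melt_u_th):
    """Forward model: zircon (230Th/232Th) after age_yr of closed-system decay,
    starting from the (230Th/232Th) of a melt in secular equilibrium."""
    remaining = math.exp(-LAMBDA_230 * age_yr)
    return zircon_u_th * (1.0 - remaining) + melt_u_th * remaining


# Hypothetical activity ratios used only to exercise the functions.
melt, zircon = 0.90, 5.0
th_th = zircon_th_th_at(88_700.0, zircon, melt)
age = two_point_isochron_age(zircon, th_th, melt)
print(f"model age: {age / 1000:.1f} ka")   # recovers the 88.7 ka input
```

The forward function is included so that the age calculation can be checked by a round trip: a synthetic zircon of known age should return that age.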
Isochron ages were calculated using whole rock U/Th ratios measured by solution ICP-MS (Rooyakkers et al., in review; Rooyakkers, 2020) to estimate the host rock U-Th activity ratio: Jörundur 238U/232Th = 0.3068 ± 0.01, Hlíðarfjall 238U/232Th = 0.3059 ± 0.01, and Gæsafjallarani 238U/232Th = 0.3051 ± 0.01. The model assumes secular equilibrium (i.e., the 238U/232Th activity ratio is equal to the 230Th/232Th activity ratio). In the absence of isotope-dilution U and Th ratios for the glass, these ICP-MS measurements are inferred to be the most accurate proxies available for the age model.

Figure 5 Zircon Cathodoluminescence Images. Zircons from the post-caldera domes imaged by cathodoluminescence (CL) show two morphologies: 1) an equant, sector-zoned morphology and 2) an elongated, splintered morphology. An example of each type is shown for each dome.

Zircon crystal size distribution (CSD) analysis was performed following Simakin and Bindeman (2008 and references therein), applying theoretical considerations derived by Marsh (2007, 1988). As shown by Simakin and Bindeman (2008), the zircon extraction process preserves the smallest size fraction (>10-20 µm) and does not introduce a size bias into the distribution. Phenocryst (where present) and groundmass δ18O analyses were conducted by CO2 laser fluorination at the University of Oregon on a MAT 253 isotope ratio mass spectrometer (Bindeman, 2008). Select units were also analyzed for whole rock major and trace element concentrations by X-ray fluorescence (XRF) at Pomona College.

2.2 Thermal and Chemical Modeling

Chemical and thermal modeling was applied to constrain the likely thermal structure of the crust and the corresponding processes involved in producing the low-δ18O Krafla rhyolites. We first used the Heat 2D thermal model (Annen, 2009; Annen and Sparks, 2002) to establish the thermal conditions expected in the upper crust at Krafla and to constrain where partial melting could occur.
The resulting thermal constraints were then combined with the chemical and mass balance model of the Magma Chamber Simulator (MCS) to understand how oxygen isotope ratios and whole rock chemistry might vary during the partial melting and assimilation of different crustal compositions (Bohrson et al., 2014; Bohrson and Spera, 2001; Ghiorso and Sack, 1995). Initial and boundary conditions for these models were derived from a variety of data sources gathered at Krafla (Supplement Table 1). Initial temperatures, whole rock chemistries, and isotopic compositions of the intruding basalts were taken from published chemical and isotopic data for Krafla, which suggest whole-rock δ18O values of ~+5.0‰ (only slightly lower than normal mantle values) and temperatures between 1000 and 1200°C for the basaltic magmas feeding the main fissure system (Cooper et al., 2016; Nicholson, 1990; Nicholson and Latin, 1992; Sigmarsson and Steinthórsson, 2007; Thorarinsson, 1979). Modeled wallrock compositions and δ18O values were derived from drillhole data collected from Krafla geothermal wells (Elders et al., 2014; Hattori and Muehlenbachs, 1982; Pope et al., 2014) as well as other published compositions (Kuo, 2017; Spulber and Rutherford, 1982; Wolf and Wyllie, 1995). Our new zircon geochronological data provide a further constraint on the timescales involved in generating the dome magmas. These input parameters are summarized in Table A of the supplementary material. Next, the initial conditions were modified from a simple geothermal gradient of 30°C/km (representative of crust outside the active rift zones of Iceland; Martin and Sigmarsson, 2007) by a 500 kyr period of crustal "priming", in which 50 m-thick basaltic intrusions were injected every 5-10 thousand years.
The assumed 500 kyr duration of priming is broadly consistent with the lifespan of the Krafla system (at least 300 ka; Sæmundsson, 1991) and accounts for the establishment and thermal maturation of the incipient magmatic system. The priming produces a steep shallow-crustal geotherm (Fig. 11a), which was then used as the initial temperature condition for the subsequent runs modeling melt production. The geothermal gradient produced by this priming run is consistent with well data from Krafla (Axelsson et al., 2014). Following the initial thermal priming, sills were emplaced every 500 years during the subsequent 50 kyr at random depths between 3.5 and 5 km. The 500-year interval between intrusions is consistent with the frequency of major rifting events on the Krafla fissure swarm (one event every 300-1000 years; Hjartardóttir et al., 2016), while the assumed depths derive from seismic evidence for a basaltic magma chamber, or a system of interconnected dikes and sills, between 3 and 7 km depth (Einarsson, 1978; Einarsson and Brandsdóttir, 1980; Kennedy et al., 2018). The sills and crust in this simulation were both composed of dry (0.1 wt.% H2O) basalt; the effects of hydrating the crust are explored with the Magma Chamber Simulator (MCS). Once the capacity for and location of partial melt production were constrained by the thermal model, MCS models were used to investigate how the resulting thermal conditions affect the compositions of wallrock partial melts produced in the vicinity of basaltic intrusions. The MCS uses algorithms constrained by experimental thermodynamic datasets to predict magma chamber chemical evolution during fractional crystallization plus crustal assimilation and/or magmatic recharge (Bohrson et al., 2014; Bohrson and Spera, 2001). Mývatn Fires basalt (δ18O = ~+5.0‰, this work) was used as the starting magma composition.
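The simple isotopic mixing step applied to the rhyolitic MCS outputs can be sketched as a two-component oxygen mass balance. This is a minimal sketch under stated assumptions: magma and assimilant have equal oxygen concentrations unless given otherwise, any δ18O shift caused by crystallization itself is ignored, and the function names are hypothetical. The +3.5‰ target is the host-melt value adopted in this chapter for the dome rhyolites, and the starting value is the ~+5.0‰ Mývatn Fires basalt.

```python
def mix_d18o(d_magma, d_assimilant, f_assim, o_magma=1.0, o_assim=1.0):
    """δ18O of a magma-assimilant mixture by oxygen mass balance.

    f_assim is the mass fraction of assimilant; o_magma and o_assim are oxygen
    concentrations in any consistent unit (equal by default).
    """
    w_assim = f_assim * o_assim
    w_magma = (1.0 - f_assim) * o_magma
    return (w_assim * d_assimilant + w_magma * d_magma) / (w_assim + w_magma)


def assimilant_fraction(d_magma, d_assimilant, d_target):
    """Mass fraction of assimilant needed to reach d_target (equal oxygen concentrations)."""
    return (d_magma - d_target) / (d_magma - d_assimilant)


# Mývatn Fires basalt (~+5.0 permil) shifted to a +3.5 permil melt, for the three
# average crustal values considered in the modeling (0, -5 and -10 permil).
for d_crust in (0.0, -5.0, -10.0):
    f = assimilant_fraction(5.0, d_crust, 3.5)
    print(f"crust {d_crust:+.0f} permil -> {100 * f:.0f}% assimilant, check {mix_d18o(5.0, d_crust, f):.2f}")
```

Under these assumptions the required assimilant fractions are 30%, 15%, and 10%, which shows why the assumed δ18O of the crust controls how much assimilation a given δ18O depletion implies.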
Models were run for simultaneous fractional crystallization and assimilation of partial melt using a range of wallrock compositions. Where rhyolitic magmas were produced, the resulting compositions were used in a simple isotopic mixing model to estimate the δ18O of the final magmas for different average δ18O values of the assimilated crust (0‰, -5‰, and -10‰). The goal of this combined modeling approach was to find the parameters and conditions that allow production of magmas most closely matching the Krafla rhyolites, and to quantify the roles of fractional crystallization, partial melt production, and assimilation in the system.

2.3 Supplemental Methods Material

2.3.1 Major Phase and Groundmass Oxygen Isotope Analysis

In situ zircon SIMS analysis was paired with laser fluorination oxygen isotope analyses of groundmass and major mineral phases (pyroxene and plagioclase) from the same samples (KRF10 from Jörundur, KRF20 from Hlíðarfjall, and KRF21 from Gæsafjallarani; see Figure 1). Similar laser fluorination analyses were conducted on an array of other samples from units lacking zircon. Samples for laser fluorination were prepared in the Isotope Lab at the University of Oregon, where small fragments of glass or crystals weighing less than 3 mg were loaded into an airlock chamber and individually lasered to liberate oxygen, and δ18O was then measured with a mass spectrometer. Samples were reacted with BrF5 prior to analysis on the MAT 253 isotope ratio mass spectrometer, and a garnet standard was used. Standard analyses were within 1 standard deviation of the accepted δ18O value and thus within reproducible limits. For analysis of individual zircon grains, KIM5 and TEMORA zircons were used as standards to correct for small (<1‰) instrumental mass fractionation; the reproducibility of δ18O on the standards was ±0.18‰ (1 s.d.).
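Standard-based correction of SIMS δ18O data is commonly done with a multiplicative instrumental mass fractionation (IMF) factor derived from a reference zircon. The sketch below is a generic illustration, not the University of Alberta reduction procedure: the function names are hypothetical, the accepted KIM5 value of +5.09‰ is the commonly cited one and is treated here as an assumption, and the "measured" standard values are invented for illustration.

```python
from statistics import mean, stdev


def imf_factor(measured_std, accepted_std):
    """alpha_IMF = (1000 + mean measured delta) / (1000 + accepted delta) for a reference zircon."""
    return (1000.0 + mean(measured_std)) / (1000.0 + accepted_std)


def correct(delta_measured, alpha):
    """Apply the IMF factor to an unknown: delta_true = (1000 + delta_meas) / alpha - 1000."""
    return (1000.0 + delta_measured) / alpha - 1000.0


# Hypothetical session: repeated analyses of the KIM5 standard (accepted value assumed +5.09 permil).
kim5_runs = [5.61, 5.48, 5.70, 5.55, 5.66]
alpha = imf_factor(kim5_runs, 5.09)
print(f"alpha = {alpha:.5f}, standard reproducibility = ±{stdev(kim5_runs):.2f} permil (1 s.d.)")
print(f"unknown measured at +1.80 permil -> corrected {correct(1.80, alpha):+.2f} permil")
```

The standard deviation of repeated standard analyses is what a reproducibility figure such as ±0.18‰ (1 s.d.) summarizes; in practice the factor is recomputed for each session or block of analyses.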
Zircons were analyzed on a Cameca IMS1280 ion microprobe at the University of Alberta.

Figure 6 δ18O vs. Age. δ18O (‰) vs. time (Age, ka) for the zircon grains extracted from the three rhyolite domes. Error bars are 2σ. The data show no pattern in δ18O over time.

2.3.2 Crystal Size Distributions

Images taken under the microscope were converted to black and white and processed in the ImageJ/Fiji software, which calculated the length, Feret diameter, and area of each zircon crystal. To create a normalized representation of the size distribution, the probability density was calculated for each dome (Fig. 9); this density accounts for the number of zircons in each size bin, the total number of zircons counted for the study, and the volume from which the crystals were extracted. Plotting the natural logarithm of population density against size bin yields the crystal size distribution shown in Fig. 9.

Figure 7 Zircon populations selected for δ18O analysis at the University of Alberta.

2.3.3 Thermal Modeling

Table A. Input parameters for the first-step Heat 2D priming run. The output thermal environment from this run is used as the initial temperature condition for the second step, which models the development of the rhyolite magmas at Krafla.

| Thermal Priming Run (no partial melt created) | |
|---|---|
| Magma Flux (km3/year) | 0.00062 |
| Duration of Simulation (Myr) | 0.5 |
| Background Geotherm (°C/km) | 30 |
| Magma Temp (°C) | 1350 |
| Magma Input Composition | Basalt (dry) |
| Country Rock Composition | Basalt (dry) |
| Injection Depth (km) | Randomly emplaced between 2-10 km |

Table B. Input parameters used in Heat 2D to model the development of rhyolites in the Krafla system.

| Krafla System Model Run Parameters | |
|---|---|
| Horizontal Extent (km) | 5 |
| Horizontal Length of Sills (m) | 1000 |
| Sill Thickness (m) | 100 |
| Magma Input Volume (km3) | 0.31 |
| Injection Timescales (yrs.) | Every 500 years |
| Magma Flux (km3/year) | 0.00124 |
| Time of Simulation (yrs.) | 50000 |
| Background Geotherm (°C/km) | 30 |
| Temperature of Intruding Magma (°C) | 1350 |
| Magma Density (kg/m3) | 2700 |
| Country Rock Density (kg/m3) | 2650 |
| Country Rock Composition | Basalt (dry) |
| Magma Composition | Basalt (dry) |
| Injection Depth (km) | Randomly emplaced between 4-5 km |
| Initial Temp Conditions | Output from thermal priming run |

The Heat 2D numerical model and associated code were used to model the intrusion of magmatic sills into mafic country rock. This simple model describes the evolution of temperature due to the intrusion of sills, and any melting that the temperature change induces in the surrounding country rock, using two governing equations: 1) the heat balance (heat conservation) equation and 2) a temperature-melt fraction relationship (Annen, 2009). The model discretizes space into a finite difference grid and solves these equations in cylindrical coordinates for the temperature at each node (Annen, 2009), so that temperature can be tracked as time progresses and as intrusions change the spatial distribution of heat. The simplest form of the heat equation, which relates changes in temperature to changes in time and space, is modified to include a term for the latent heat of crystallization, i.e., the heat released by the magma as it cools and crystallizes (Annen, 2009). The heat conservation equation used here assumes that the thermal conductivity k of the magma and the crust is constant (1.5 W/(m·K)), the latent heat is constant (3.5 × 10^5 J/kg), the heat capacity is constant (1200 J/(kg·K)), and the density is constant. The initial and boundary conditions used are outlined in Supplementary Table A.

Calculation of partial melt volumes takes place within each discretized cell of the crust within the model.
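The heat balance with latent heat and the per-cell melt-fraction bookkeeping described above can be illustrated with a much simpler model than Heat 2D. The sketch below is a 1D vertical column solved with an explicit finite-difference scheme in an enthalpy formulation (latent heat is carried in the enthalpy, and temperature is recovered from it); it is not the 2D axisymmetric Heat 2D code of Annen (2009), which uses experimentally calibrated melt-fraction curves. Conductivity, heat capacity, country-rock density, the 30°C/km geotherm, the 1350°C magma temperature and the 100 m sill thickness follow the text; the latent heat (3.5 × 10^5 J/kg), the linear melt-fraction law, and the 1000°C solidus and 1200°C liquidus are assumptions, and `run_single_sill` is a hypothetical name.

```python
import numpy as np

# Values taken from the text: conductivity, heat capacity, country-rock density.
K = 1.5            # thermal conductivity, W/(m K)
CP = 1200.0        # heat capacity, J/(kg K)
RHO = 2650.0       # density, kg/m^3
# Assumptions for illustration only (dry basalt):
LATENT = 3.5e5     # latent heat of crystallization, J/kg
T_SOL, T_LIQ = 1000.0, 1200.0  # solidus and liquidus, deg C
YEAR = 3.15576e7   # seconds per year


def melt_fraction(temp):
    """Melt fraction rising linearly from 0 at the solidus to 1 at the liquidus."""
    return np.clip((temp - T_SOL) / (T_LIQ - T_SOL), 0.0, 1.0)


def enthalpy(temp):
    """Specific enthalpy (J/kg): sensible heat plus latent heat of the melt present."""
    return CP * temp + LATENT * melt_fraction(temp)


def temperature(h):
    """Invert enthalpy to temperature for the piecewise-linear melting law."""
    h_sol = CP * T_SOL
    h_liq = CP * T_LIQ + LATENT
    slope = LATENT / (T_LIQ - T_SOL)
    t_mush = (h + slope * T_SOL) / (CP + slope)
    return np.where(h <= h_sol, h / CP, np.where(h >= h_liq, (h - LATENT) / CP, t_mush))


def run_single_sill(top_m=4000.0, thickness_m=100.0, magma_temp=1350.0,
                    dz=10.0, years=2000.0, depth_m=10_000.0):
    """Cool one sill emplaced into a 30 deg C/km geotherm; track the maximum melt fraction per cell."""
    z = np.arange(0.0, depth_m + dz, dz)
    temp = 30.0 * z / 1000.0                       # background geotherm, 0 deg C at the surface
    sill = (z >= top_m) & (z < top_m + thickness_m)
    temp[sill] = magma_temp                        # instantaneous emplacement
    h = enthalpy(temp)
    kappa = K / (RHO * CP)                         # largest effective diffusivity sets the time step
    dt = 0.4 * dz**2 / kappa                       # explicit stability limit (factor < 0.5)
    max_phi = melt_fraction(temp)
    for _ in range(int(years * YEAR / dt)):
        lap = np.zeros_like(temp)                  # top and bottom temperatures stay fixed
        lap[1:-1] = (temp[2:] - 2.0 * temp[1:-1] + temp[:-2]) / dz**2
        h = h + dt * K / RHO * lap                 # rho dH/dt = k d2T/dz2
        temp = temperature(h)
        max_phi = np.maximum(max_phi, melt_fraction(temp))
    return z, temp, sill, max_phi


z, temp, sill, max_phi = run_single_sill()
print(f"peak sill temperature after 2 kyr: {temp[sill].max():.0f} C")
print(f"wallrock cells that ever exceeded the solidus: {int(np.sum((max_phi > 0) & ~sill))}")
```

Starting from the cold 30°C/km geotherm, a single 100 m sill at 4 km is not expected to push much, if any, wallrock past the assumed solidus, which illustrates why the crust is thermally primed before the melt-producing runs. As in the text, `max_phi` records the maximum melt fraction each cell reaches, not the melt fraction of the whole volume.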
As heat diffuses through the model, the temperature of each cell is stored at each time step. Using the relationship between temperature and melt fraction, based on the solidus and liquidus temperatures of the given composition, a melt fraction is calculated for each cell. From these melt fractions, a total volume of partial melt is calculated for the whole model, and the melt fraction in each cell is stored. The melt fraction reported in Fig. 11 is the maximum melt fraction reached in these cells; it does not describe the overall melt fraction of the crustal volume. Because the partial melt information is stored along with the temperature of each cell, the locations of partial melt generation within the system can be evaluated, and partial melt volumes can be calculated for the 2D axisymmetric model. Because dry basalt has a higher solidus than hydrated basalt, we used it as the wallrock composition in these thermal models in order to determine the minimum volume of partial melt (and hence the most evolved bulk partial melt composition) that could be produced during crustal melting. As the melt fraction (and hence the total volume) of partial melt increases, the partial melt composition trends towards the composition of the protolith, so the compositional trajectory moves the partial melt further away from the observed compositions of the Krafla rhyolites as melting progresses.

2.3.4 Compositional Modeling

Experiments focused on pure AFC processes. Intruding basaltic magma is modeled using the composition of the isotopically primitive Mývatn Fires basalt, which is also close to normal in δ18O (+5‰). The wallrock composition was then changed for each experimental run (Fig. 13).
AFC simulations end once the wallrock temperature has equilibrated with the magmatic temperature; subsequent differentiation can occur beyond this point, but exclusively through cooling and fractional crystallization (Bohrson et al., 2014). Partial melt is assimilated only once the melt fraction reaches a threshold of 0.1.

2.3.5 Sample Location Information

Table C. Sample Location Information

| Sample Number | Latitude (°N) | Longitude (°W) | Description |
|---|---|---|---|
| KRF1 | | | Halarauður ignimbrite - tuff with lithic fragments |
| KRF2 | 65.73872 | 16.68864 | Halarauður ignimbrite at top of plateau |
| KRF3 | 65.73936 | 16.69499 | Pillowy/lithic fragments and quasi-perlitic texture in vertically oriented structure, classified as a silicic hyaloclastite |
| KRF4 | | | Rhyolitic hyaloclastite near Halarauður ignimbrite - landslide deposit |
| KRF5 | 65.73652 | 16.71398 | Andesite with small (<5%) plagioclase on hyaloclastite ridge |
| KRF6 | 65.73726 | 16.73757 | Vesiculated basalt, moss covered, yellowish alteration |
| KRF7 | 65.70197 | 16.70171 | Subglacial hyaloclastite, from ridge near rhyolite (Jörundur) |
| KRF8 | 65.69853 | 16.68199 | Hyaloclastite ridge with clasts of Mývatn Fires basalt |
| KRF9 | | | Same unit as KRF10, collected in the saddle below; slight variation in glassiness on the dome, but all appears to be the same unit |
| KRF10 | 65.68808 | 16.65125 | Rhyolite Dome 1 (Jörundur), collected on main dome; dark/medium gray, 20% feldspars |
| KRF11 | | | Mývatn Fires basalt, very rich in plagioclase feldspars (sample location very close to KRF11) |
| KRF12 | 65.688442 | 16.69248 | Ultra-thin ropey basalt flow and ash; mapped the same as KRF11 but has no plagioclase |
| KRF13 | 65.68867 | 16.70559 | Unnamed andesite; hyaloclastite on slope, but tuff is dark grey/purple with 10% plagioclase |
| KRF14 | 65.69638 | 16.71885 | Obsidian Ridge (Hrafntinnuhryggur) with hydrated perlitic texture; shows extreme banding |
| KRF15 | | | Hand sample from obsidian ridge (Hrafntinnuhryggur) of transition from hyaloclastite to quenched obsidian |
| KRF16 | 65.70501 | 16.73195 | Krafla hyaloclastite on backside of ridge near crater; looks more silicic, with clasts of obsidian and yellow weathering |
| KRF17 | 65.56188 | 16.94479 | Basaltic flow from shore of Mývatn lake, olivine crystals |
| KRF18 | | | Hverfjall scoria and tephra |
| KRF19 | 65.67614 | 16.8669 | SW rhyolite dome in Krafla (Hlíðarfjall), western slope; very weathered/altered, platy fracture, plagioclase feldspars |
| KRF20 | | | Rhyolite, same unit as KRF19 but better for zircon extraction (Hlíðarfjall) |
| KRF21 | 65.74017 | 16.94957 | (Gæsafjallarani) Dome of rhyolite; crumbly, shiny, blocky |
| KRF22 | 65.74014 | 16.95091 | Basalt next to Gæsafjallarani rhyolite dome |
| KRF23 | 65.75401 | 17.01851 | Basalt next to Gæsafjallarani from edge of flow near contact with glacial sediment deposit |

2.3.6 Zircon Data

Table C. Zircon Data

| Grain and Spot | δ18O (‰) | 2σ (+/-) | Yb (ppm) | Hf (ppm) | Th (ppm) | U (ppm) | Th/U | Hf/Yb | (238U)/(232Th) | % error | (230Th)/(232Th) | % error | Model Age (ka) | Average Error (ka) | Error + (kyr) | Error - (kyr) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jörundur |
| KRF10-1.1 | 1.16 | 0.24 |
| KRF10-12.1 | 1.14 | 0.17 | 1475 | 8580 | 394 | 482 | 0.82 | 5.82 | 2.99 | 4.35 | 2.03 | 3.05 | 82.41 | 13.18 | 13.97 | 12.39 |
| KRF10-12.2 | 1.12 | 0.26 |
| KRF10-14.1 | 1.65 | 0.31 | 422 | 7654 | 19 | 43 | 0.44 | 18.14 | 5.82 | 0.64 | 3.7 | 12.59 | 90.69 | 24.68 | 27.44 | 21.91 |
| KRF10-19.1 | 1.07 | 0.23 | 1102 | 7929 | 102 | 180 | 0.57 | 7.2 | 4.48 | 4.38 | 2.57 | 6.76 | 66.88 | 11.84 | 12.48 | 11.2 |
| KRF10-19.2 | 1.14 | 0.2 |
| KRF10-21.1 | 1.21 | 0.24 |
| KRF10-21.2 | 1.15 | 0.19 |
| KRF10-23.1 | 0.69 | 0.23 | 454 | 8536 | 21 | 54 | 0.39 | 18.8 | 6.44 | 5.76 | 3.22 | 15.91 | 58.14 | 18.43 | 19.98 | 16.89 |
| KRF10-28.1 | 1.01 | 0.23 | 665 | 8697 | 55 | 112 | 0.5 | 13.07 | 5.06 | 2.55 | 2.86 | 7.62 | 68.33 | 11.78 | 12.42 | 11.15 |
| KRF10-28.2 | 1.23 | 0.17 |
| KRF10-29.1 | 1.17 | 0.2 | 890 | 8025 | 73 | 126 | 0.58 | 9.02 | 4.39 | 1.44 | 2.86 | 6.48 | 88.25 | 14.39 | 15.33 | 13.44 |
| KRF10-29.2 | 1.01 | 0.25 |
| KRF10-3.1 | 1.07 | 0.21 |
| KRF10-3.2 | 1 | 0.22 | 996 | 9218 | 113 | 211 | 0.54 | 9.26 | 4.76 | 1.81 | 2.57 | 5.93 | 60.46 | 8.5 | 8.83 | 8.17 |
| KRF10-33.1 | 1.21 | 0.22 |
| KRF10-36.1 | 1.42 | 0.2 |
| KRF10-38.1 | 1.71 | 0.17 | 697 | 7820 | 48 | 97 | 0.49 | 11.22 | 5.22 | 1.39 | 2.67 | 8.26 | 56.24 | 9.96 | 10.41 | 9.51 |
| KRF10-38.2 | 1.81 | 0.21 |
| KRF10-4.1 | 1.35 | 0.21 | 255 | 8296 | 16 | 42 | 0.37 | 32.58 | 6.85 | 1.57 | 3.29 | 23.42 | 54.98 | 24.09 | 26.73 | 21.46 |
| KRF10-4.2 | 1.3 | 0.18 | 277 | 8532 | 16 | 47 | 0.34 | 30.75 | 7.47 | 0.74 | 3.79 | 13.44 | 62.47 | 15.39 | 16.47 | 14.31 |
| KRF10-40.1 | 1.36 | 0.22 | 464 | 8634 | 26 | 60 | 0.44 | 18.61 | 5.85 | 1.66 | 2.85 | 13.86 | 53.56 | 14.74 | 15.74 | 13.75 |
| KRF10-40.2 | 1.33 | 0.19 |
| KRF10-43.1 | 1.19 | 0.22 | 372 | 8765 | 25 | 61 | 0.41 | 23.56 | 6.31 | 0.78 | 3.16 | 12.13 | 57.86 | 13.55 | 14.38 | 12.71 |
| KRF10-43.2 | 1.31 | 0.21 |
| KRF10-47.2 | 1.34 | 0.16 | 281 | 8421 | 17 | 44 | 0.38 | 29.92 | 6.81 | 1.28 | 2.63 | 18.28 | 36.9 | 12.77 | 13.51 | 12.02 |
| KRF10-48.1 | 1.23 | 0.23 |
| KRF10-48.2 | 1.12 | 0.16 |
| KRF10-5.1 | 1.06 | 0.17 |
| KRF10-5.2 | 1.17 | 0.23 |
| KRF10-51.1 | 1.39 | 0.21 |
| KRF10-53.1 | 1.21 | 0.17 |
| KRF10-53.2 | 1.09 | 0.17 |
| KRF10-54.1 | 1.36 | 0.25 |
| KRF10-54.2 | 0.98 | 0.22 |
| KRF10-55.1 | 1.25 | 0.18 |
| KRF10-55.2 | 1.19 | 0.15 |
| KRF10-57.1 | | | 597 | 8371 | 99 | 148 | 0.66 | 14.01 | 3.86 | 1.32 | 2.56 | 6.09 | 88.21 | 14.68 | 15.67 | 13.7 |
| KRF10-57.2 | 1.21 | 0.18 |
| KRF10-60.1 | 1.3 | 0.2 |
| KRF10-62.1 | 0.95 | 0.24 | 1933 | 8172 | 321 | 453 | 0.71 | 4.23 | 3.63 | 0.78 | 2.51 | 5.01 | 94.61 | 14.07 | 14.97 | 13.17 |
| KRF10-67.1 | 1.54 | 0.22 |
| KRF10-70.1 | 1.26 | 0.22 |
| KRF10-76.1 | 1.67 | 0.22 | 1163 | 7943 | 89 | 149 | 0.59 | 6.83 | 4.31 | 0.94 | 2.65 | 6.37 | 77.3 | 12.17 | 12.84 | 11.49 |
| KRF10-76.2 | 1.63 | 0.24 |
| KRF10-8.1 | 0.74 | 0.19 |
| KRF10-8.2 | 1.15 | 0.2 |
| KRF10-82.1 | 1.27 | 0.21 | 489 | 8267 | 54 | 92 | 0.58 | 16.89 | 4.41 | 0.57 | 2.38 | 9.05 | 58.43 | 12.25 | 12.93 | 11.56 |
| KRF10-82.2 | 1.26 | 0.24 |
| KRF10-85.1 | 1.28 | 0.24 |
| KRF10-85.2 | 1.3 | 0.19 |
| KRF10-87.1 | 1.16 | 0.18 | 393 | 7683 | 37 | 76 | 0.48 | 19.53 | 5.33 | 0.32 | 2.99 | 19.82 | 68.14 | 28.36 | 32.01 | 24.72 |
| KRF10-87.2 | 1.22 | 0.15 |
| KRF-92.1 | 1.42 | 0.23 | 451 | 8614 | 20 | 45 | 0.46 | 19.09 | 5.63 | 0.89 | 2.65 | 15.79 | 49.34 | 15.69 | 16.81 | 14.56 |
| Hlíðarfjall |
| KRF20-103.1 | 1.84 | 0.24 |
| KRF20-103.2 | 1.58 | 0.19 |
| KRF20-106.1 | 1.54 | 0.2 | 794 | 7777 | 53 | 111 | 0.48 | 9.79 | 5.34 | 1.45 | 2.93 | 8 | 65.38 | 10.96 | 11.51 | 10.41 |
| KRF20-106.2 | 1.4 | 0.21 |
| KRF20-110.1 | 1.61 | 0.19 |
| KRF20-110.2 | 1.6 | 0.3 | 1226 | 8269 | 120 | 212 | 0.56 | 6.74 | 4.53 | 2.17 | 2.75 | 9.74 | 76.45 | 17.15 | 18.49 | 15.81 |
| KRF20-114.1 | 1.18 | 0.2 | 275 | 10492 | 15 | 49 | 0.32 | 38.14 | 8.15 | 3.63 | 3.41 | 16.91 | 45.6 | 13.59 | 14.44 | 12.75 |
| KRF20-114.2 | 1.25 | 0.24 | 294 | 9366 | 23 | 68 | 0.34 | 31.9 | 7.68 | 1.1 | 3.5 | 11.75 | 52.06 | 10.9 | 11.44 | 10.36 |
| KRF20-117.1 | 1.46 | 0.21 | 475 | 8098 | 25 | 60 | 0.42 | 17.04 | 6.13 | 2.32 | 3.31 | 11.26 | 66.52 | 14.9 | 15.91 | 13.89 |
| KRF20-117.2 | 1.46 | 0.25 |
| KRF20-119.1 | 1.08 | 0.21 | 200 | 5331 | 20 | 46 | 0.42 | 26.69 | 6.08 | 2.13 | 3.43 | 8.93 | 71.8 | 13.03 | 13.8 | 12.25 |
| KRF20-18.1 | 1.84 | 0.22 | 740 | 7510 | 31 | 67 | 0.46 | 10.15 |
| KRF20-2.1 | 1.2 | 0.19 |
| KRF20-2.2 | 1.23 | 0.21 | 341 | 9244 | 50 | 102 | 0.5 | 27.12 | 4.85 | 7.24 | 2.63 | 9.77 | 61.2 | 14.86 | 15.86 | 13.85 |
| KRF20-24.1 | | 0.17 |
| KRF20-24.2 | | 0.22 |
| KRF20-29.1 | | 0.15 |
| KRF20-33.1 | 1.5 | 0.2 | 432 | 9281 | 37 | 89 | 0.42 | 21.46 | 6.1 | 1.42 | 3.35 | 8.49 | 68.65 | 11.64 | 12.26 | 11.02 |
| KRF20-33.2 | 1.4 | 0.18 |
| KRF20-34.1 | 1.91 | 0.21 |
| KRF20-35.1 | 1.25 | 0.21 |
| KRF20-35.2 | 1.21 | 0.22 |
| KRF20-37.1 | | 0.2 |
| KRF20-37.2 | | 0.16 |
| KRF20-4.1 | 1.27 | 0.19 | 545 | 6681 | 33 | 65 | 0.51 | 12.27 | 5.04 | 4.57 | 2.48 | 12.75 | 50.98 | 14.16 | 15.07 | 13.24 |
| KRF20-46.1 | 1.78 | 0.17 | 1418 | 7539 | 127 | 192 | 0.66 | 5.32 |
| KRF20-49.1 | 1.47 | 0.19 |
| KRF20-49.2 | 1.66 | 0.24 | 279 | 7577 | 31 | 69 | 0.45 | 27.2 | 5.68 | 2.15 | 3.02 | 10.22 | 62.82 | 13.06 | 13.84 | 12.28 |
| KRF20-5.1 | 1.28 | 0.18 |
| KRF20-53.1 | 1.31 | 0.23 | 558 | 7532 | 74 | 119 | 0.62 | 13.5 |
| KRF20-53.2 | 1.49 | 0.17 |
| KRF20-55.1 | 1.26 | 0.15 | 1153 | 8765 | 137 | 246 | 0.56 | 7.6 | 4.64 | 2.02 | 2.83 | 4.75 | 77.54 | 9.03 | 9.4 | 8.66 |
| KRF20-55.2 | 1.35 | 0.21 |
| KRF20-56.1 | 1.41 | 0.22 |
| KRF20-56.2 | 1.48 | 0.17 |
| KRF20-73.1 | 1.07 | 0.18 | 1425 | 8490 | 163 | 233 | 0.7 | 5.96 | 3.66 | 1 | 2.43 | 7.31 | 86.22 | 16.55 | 17.79 | 15.3 |
| KRF20-73.2 | 1.38 | 0.23 |
| KRF20-78.1 | | 0.28 |
| KRF20-8.1 | 1.32 | 0.2 |
| KRF20-82.1 | 1.18 | 0.19 |
| KRF20-82.2 | 1.27 | 0.15 | 236 | 9054 | 14 | 48 | 0.3 | 38.41 | 8.71 | 0.56 | 5.01 | 11.35 | 80.83 | 17 | 18.31 | 15.68 |
| KRF20-87.1 | 1.65 | 0.19 | 271 | 7753 | 19 | 51 | 0.38 | 28.6 | 6.82 | 1.31 | 2.54 | 17.66 | 34.3 | 11.52 | 12.13 | 10.92 |
| KRF20-87.2 | 1.36 | 0.23 |
| KRF20-95.1 | 1.94 | 0.24 | 149 | 7684 | 5 | 18 | 0.27 | 51.74 | 8.4 | 8.82 | 4.76 | 30.36 | 78.24 | 47.73 | 57.85 | 37.62 |
| KRF20-95.2 | 1.79 | 0.21 |
| KRF20-96.1 | 1.7 | 0.17 |
| KRF20-96.2 | 1.79 | 0.15 | 1623 | 7628 | 150 | 202 | 0.74 | 4.7 | 3.44 | 1.68 | 2.15 | 5.2 | 71.47 | 10.46 | 10.96 | 9.96 |
| KRF20-97.1 | 1.27 | 0.18 |
| KRF20-97.2 | 1.14 | 0.19 |
| KRF20-99.1 | 1.68 | 0.17 |
| KRF20-99.2 | 1.56 | 0.24 |
| Gæsafjallarani |
| KRF21-102.1 | 1.44 | 0.21 |
| KRF21-104.1 | 1.5 | 0.15 |
| KRF21-106.1 | 0.83 | 0.16 | 239 | 8876 | 11 | 35 | 0.3 | 37.12 | 8.42 | 4.63 | 4.14 | 16.87 | 60.61 | 18.51 | 20.07 | 16.95 |
| KRF21-106.2 | 0.59 | 0.23 | 647 | 8503 | 35 | 75 | 0.47 | 13.14 | 5.41 | 1.6 | 2.67 | 10.16 | 52.98 | 11.16 | 11.73 | 10.59 |
| KRF21-107.1 | 0.86 | 0.15 |
| KRF21-107.2 | 0.86 | 0.17 |
| KRF21-108.1 | 1.43 | 0.19 |
| KRF21-108.2 | 1.33 | 0.25 |
| KRF21-111.1 | 1.98 | 0.22 | 673 | 8251 | 35 | 78 | 0.45 | 12.26 |
| KRF21-111.2 | 1.77 | 0.16 |
| KRF21-12.1 | 1.35 | 0.2 |
| KRF21-12.2 | 1.45 | 0.26 |
| KRF21-13.1 | 1.16 | 0.19 |
| KRF21-16.1 | 0.87 | 0.2 |
| KRF21-16.2 | 0.79 | 0.23 |
| KRF21-23.1 | 1.41 | 0.24 | 657 | 7739 | 38 | 80 | 0.48 | 11.78 | 5.47 | 1.2 | 3.24 | 8.73 | 76.89 | 14.32 | 15.25 | 13.38 |
| KRF21-23.2 | 1.74 | 0.18 |
| KRF21-26.1 | 1.3 | 0.19 | 561 | 8568 | 45 | 89 | 0.5 | 15.26 | 5.18 | 0.21 | 3 | 8.68 | 72.16 | 13.45 | 14.27 | 12.62 |
| KRF21-26.2 | 1.08 | 0.19 |
| KRF21-36.1 | 0.92 | 0.21 |
| KRF21-39.1 | 0.83 | 0.24 | 896 | 8654 | 89 | 133 | 0.67 | 9.66 | 3.82 | 1.31 | 2.57 | 5.84 | 90.55 | 14.5 | 15.46 | 13.54 |
| KRF21-39.2 | 1.22 | 0.17 |
| KRF21-42.1 | 1.39 | 0.24 | 559 | 7707 | 85 | 131 | 0.65 | 13.78 | 3.88 | 2.22 | 2.7 | 12.26 | 99.03 | 32.45 | 37.2 | 27.69 |
| KRF21-42.2 | 1.48 | 0.2 |
| KRF21-43.1 | 1.24 | 0.24 |
| KRF21-43.2 | 1.51 | 0.18 |
| KRF21-49.1 | 1.18 | 0.21 |
| KRF21-49.2 | 1.07 | 0.27 |
| KRF21-5.1 | 1.59 | 0.25 | 273 | 7444 | 14 | 52 | 0.27 | 27.28 | 9.3 | 1.26 | 3.9 | 16.27 | 47.44 | 12.95 | 13.72 | 12.18 |
| KRF21-51.1 | 0.96 | 0.22 |
| KRF21-51.2 | 1.09 | 0.24 | 220 | 8614 | 12 | 41 | 0.31 | 39.21 | 8.48 | 2.69 | 3.44 | 17.15 | 43.66 | 12.97 | 13.74 | 12.2 |
| KRF21-52.1 | 1.1 | 0.29 | 501 | 7338 | 21 | 44 | 0.48 | 14.66 | 5.33 | 0.28 | 2.88 | 13.15 | 63.47 | 17.28 | 18.64 | 15.92 |
| KRF21-52.2 | 1.22 | 0.19 |
| KRF21-60.1 | 1.13 | 0.17 | 172 | 7364 | 8 | 23 | 0.35 | 42.7 | 6.97 | 5.38 | 3.24 | 26.78 | 52.12 | 26.27 | 29.4 | 23.14 |
| KRF21-60.2 | 1.29 | 0.16 |
| KRF21-67.1 | 1.29 | 0.26 |
| KRF21-67.2 | 1.17 | 0.18 |
| KRF21-69.1 | 1.12 | 0.23 | 430 | 8171 | 48 | 60 | 0.8 | 19.01 |
| KRF21-69.2 | 1.31 | 0.2 | 333 | 8819 | 18 | 44 | 0.4 | 26.48 | 6.23 | 4.26 | 3.78 | 14.18 | 83.6 | 25.25 | 28.15 | 22.36 |
| KRF21-7.1 | 0.97 | 0.21 | 797 | 8652 | 78 | 139 | 0.56 | 10.85 | 4.5 | 1.59 | 2.86 | 10.17 | 84.49 | 20.3 | 22.18 | 18.43 |
| KRF21-7.2 | 1.42 | 0.19 |
| KRF21-79.1 | 1.47 | 0.19 |
| KRF21-79.2 | 1.32 | 0.18 | 589 | 7854 | 34 | 70 | 0.48 | 13.34 | 5.3 | 0.74 | 3.13 | 9.5 | 75.82 | 15.42 | 16.51 | 14.34 |
| KRF21-79.3 | 1.25 | 0.22 |
| KRF21-81.1 | 1.54 | 0.2 | 497 | 9289 | 35 | 76 | 0.46 | 18.67 | 5.54 | 0.95 | 3 | 9.75 | 64.36 | 12.9 | 13.66 | 12.14 |
| KRF21-81.2 | 1.6 | 0.21 |
| KRF21-91.1 | 1.13 | 0.21 |
| KRF21-91.2 | 1.16 | 0.21 |
| KRF21-94.1 | 1.72 | 0.24 |
| KRF21-95.1 | 1.61 | 0.19 | 1034 | 7229 | 69 | 120 | 0.57 | 6.99 | 4.51 | 2.68 | 2.62 | 6.95 | 69.33 | 11.67 | 12.3 | 11.05 |
| KRF21-98.1 | 1.26 | 0.25 |
| KRF21-98.2 | 1.41 | 0.19 |
| KRF21-99.1 | 1.46 | 0.19 | 812 | 7579 | 47 | 89 | 0.53 | 9.34 | 4.89 | 1.12 | 2.59 | 8.83 | 58.73 | 11.33 | 11.92 | 10.74 |
| KRF21-99.2 | 1.61 | 0.24 |

3. Results

3.1 Zircon U-Th and δ18O Analyses

Figure 8 Zircon Isochron. Zircon isochron diagram showing the U-Th model ages of the zircon populations in the three domes. Error ellipses show 2σ error. Best-fit lines show the estimated isochron model age for each dome.
Jörundur 238U/232Th = 0.3068 ± 0.01, Hlíðarfjall 238U/232Th = 0.3059 ± 0.01, Gæsafjallarani 238U/232Th = 0.3051 ± 0.01. n = number of zircons analyzed. MSWD = mean square weighted deviation. Red isochron labels are reported in thousands of years (ka).

SHRIMP-RG 238U-230Th-232Th dating of individual zircon grains from the three post-caldera rhyolite domes yielded crystallization ages of 88.7 ± 9.9 ka for Jörundur, 83.3 ± 9.2 ka for Hlíðarfjall, and 85.5 ± 9.4 ka for Gæsafjallarani (Fig. 8). Each dome has a unimodal age distribution, and the ages of the domes overlap within error. Our ages overlap with the 40Ar/39Ar eruption ages of 83 to 90 ka from Sæmundsson and Pringle (2000) but are younger than the U-Th dating of Gæsafjallarani by Carley et al. (2020), which yielded an age estimate of ~120 ka.

Figure 9 δ18O of Dome Zircons. δ18O (‰) of the zircon populations from the three rhyolite domes (for each dome, each vertical array consists of multiple spot analyses of an individual grain). The expected equilibrium zircon δ18O based on a host melt value of +3.5‰ and Δ18O(melt-zircon) = 2‰ is shown (assuming a magmatic temperature of 900°C; e.g., Bindeman, 2008), as well as the average values for the respective zircon populations.

Individual zircon grains yielded overlapping, low δ18O values between +0.5 and +2.0‰ (Fig. 9) for all three domes. For all zircons analyzed, intragrain variability is less than the analytical uncertainty (± 0.25‰, 2σ), but the overall population heterogeneity (1.8‰) exceeds the expected analytical uncertainty (± 0.5‰, 2σ) (Fig. 6). Zircons extracted from Jörundur range from +0.69 to +1.71‰ with an average of +1.24‰, Hlíðarfjall zircons range from +1.07 to +1.79‰ with an average of +1.45‰, and Gæsafjallarani zircons exhibit the largest range, from +0.59 to +1.98‰, with an average of +1.27‰ (Fig. 9). Under cathodoluminescence imaging (CL; Fig. 5, Supplement Fig. 7), two distinct morphological types are identified in the zircon populations.
Each dome has a population of equant grains and a second population of elongated grains. A subpopulation of the equant grains exhibits sector zoning (Fig. 5). These morphologies do not vary systematically in δ18O. Overall, we observe no trend in δ18O versus U-Th age within any of the zircon populations (Supplement, Fig. 7), other than consistently low δ18O values across the crystallization event (Fig. 9). We also observe no systematic variation between the inner and outer portions of individual zircon grains.

3.2 Trace Elements in Zircons

Figure 10 Zircon Trace Elements. A) Hf/Yb versus Th/U trace element ratios measured by SHRIMP-RG on individual zircon grains. Shapes represent the dome from which the zircon was extracted (see legend at bottom of figure). Colors indicate the measured δ18O (Red = +1.5 to +2‰, Blue = +1 to +1.5‰, and Green = +0.5 to +1‰). B) Zircon Th vs. U concentrations (no oxygen isotope information is shown in B and C). C) Zircon Yb vs. Hf concentrations. B) and C) are accompanied by a fractional crystallization (FC) model using the equation $C_l/C_0 = (1-F)^{D-1}$, where F is the fraction of material crystallized and D is the bulk partition coefficient for each element. Partition coefficients for FC are from Melnik and Bindeman (2018), assuming temperatures of ~850-900°C: KdHf = 1200, KdYb = 11, KdU = 46, KdTh = 14. Percentages indicate the amount of crystallization that has occurred at each composition.

Zircon U, Th, Hf, and Yb concentrations correlate strongly with the behavior predicted for fractional crystallization given known partition coefficients for zircon; Hf/Yb shows a sharp initial decrease and an overall downward trend as Th/U increases (Fig. 10A), consistent with the extreme compatibility of Hf and the greater compatibility of U relative to Th in zircon.
To assess whether fractional crystallization is indeed responsible for this trend, we modeled magma differentiation as recorded by zircon trace elements, using the following partition coefficients: KdHf = 1200, KdYb = 11, KdU = 46, KdTh = 14. These values were estimated from experimental data (~850-900°C; Melnik and Bindeman, 2018) and from values reported in the GERM partition coefficient database for zircons in high-silica rhyolites (GERM, 2015). The Yb partition coefficient also accounts for partitioning of Yb into other phases, such as apatite (an accessory phase in all Krafla rhyolites; Rooyakkers, 2020). We assume that the other elements considered here partition predominantly into zircon (Clayborne et al., 2018; Melnik and Bindeman, 2018). These calculations specifically model how much zircon crystallization is needed to produce the observed trace element trend in the zircon grains; the result is therefore a minimum estimate of the total fractional crystallization these rhyolites may have undergone since zircon saturation. Most of our data are consistent with ~1-2% zircon fractionation. Some grains have compositions consistent with up to 6% fractional crystallization, and rare grains with 10-20% (Figs. 10B and C). However, changes in the partition coefficient values can significantly affect the amount of crystallization predicted by the model; for example, lowering the partition coefficients for Yb and Th by an order of magnitude yields an estimate of ~10-15% fractional crystallization. Regardless of the Kd values selected, the overall trend predicted by the FC model provides an excellent fit to the observed trace element trends, supporting a central role for FC in magmatic differentiation after zircon saturation.
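The Rayleigh fractional-crystallization relation used for Figs. 10B and C, $C_l/C_0 = (1-F)^{D-1}$, can be sketched as follows. As in the text, each zircon Kd is applied as the bulk D (zircon-only fractionation), and the zircon composition is taken as Kd × C_l, i.e., the zircon crystallizing from the residual melt at each step. The starting melt concentrations and function names are hypothetical placeholders. The sketch reproduces only the direction of the trends (Hf/Yb falling, Th/U rising); the amount of crystallization implied is highly sensitive to the Kd values and to how they are applied, as the text itself notes.

```python
# Partition coefficients quoted in the text (~850-900 °C; Melnik and Bindeman, 2018).
KD = {"Hf": 1200.0, "Yb": 11.0, "U": 46.0, "Th": 14.0}

# Hypothetical starting melt concentrations (ppm), for illustration only.
MELT0 = {"Hf": 8.0, "Yb": 5.0, "U": 4.0, "Th": 10.0}


def rayleigh_melt(c0, d, f_cryst):
    """Rayleigh fractionation: C_l = C_0 * (1 - F)**(D - 1), F = fraction crystallized."""
    return c0 * (1.0 - f_cryst) ** (d - 1.0)


def zircon_at(f_cryst, melt0=MELT0, kd=KD):
    """Zircon composition (Kd * C_l) in equilibrium with the residual melt after fraction F crystallized."""
    return {el: kd[el] * rayleigh_melt(melt0[el], kd[el], f_cryst) for el in kd}


for f in (0.0, 0.001, 0.002, 0.005):
    zrc = zircon_at(f)
    print(f"F = {f:5.3f}: Hf/Yb = {zrc['Hf'] / zrc['Yb']:8.3g}, Th/U = {zrc['Th'] / zrc['U']:.3f}")
```

Because Hf is far more compatible than Yb (D of 1200 vs 11), Hf is stripped from the melt much faster, so zircon Hf/Yb drops steeply; U being more compatible than Th makes zircon Th/U rise, which is the covariation seen in Fig. 10A.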
3.3 Zircon Crystal Size Distributions (CSD)

Figure 11 Zircon Crystal Size Distribution. Natural logarithm of the population density for zircons of different lengths (longest axis, in μm) from each dome. There is a linear trend between elongation and size (not shown here), implying that there is no bias towards sampling large zircons. The gradients (population density/µm) of the downward-sloping parts of the curves are used to estimate residence times.

Zircon CSDs are concave-down (Fig. 11), with maxima around 10-200 µm. We analyze the right-hand sides of the curves (the portions with negative slopes), as they reflect the conditions of zircon crystallization, while the left-hand sides (positive slopes) show trends previously attributed to crystal coarsening via temperature cycling (Simakin and Bindeman, 2008). CSDs for the three dome samples are comparable but do not overlap (Fig. 11). The sample size for KRF10 (Jörundur) is too small (n = 42) for the largest size bins to be statistically significant, but it shows a steep slope and a small mode relative to the distributions from the other domes (Fig. 11). The populations imaged for Hlíðarfjall and Gæsafjallarani are large enough to be statistically significant. The Hlíðarfjall distribution is skewed towards larger sizes than those of the other domes and shows a shallower slope, indicative of a longer crystallization time if growth rates were similar in each dome (Fig. 11). Gæsafjallarani has a distribution that trends towards smaller sizes, with a steeper slope (Fig. 11).

To obtain the most accurate residence-time estimates from our CSD data, zircon growth was modeled using the code of Bindeman and Melnik (2016), which parameterizes zircon growth as a function of cooling rate for intrusions of various sizes.
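The residence-time estimates in this section rely on the steady-state CSD relation of Marsh (1988), in which the slope of ln(population density) against crystal size equals −1/(Gτ). A minimal sketch, with hypothetical function names, assuming the population density is the number of crystals per size-bin width per unit sample volume and that the caller chooses which bins belong to the negative-slope limb. Entering slopes of −125 and −143 mm⁻¹ (−1250 and −1430 cm⁻¹) with G = 10⁻¹⁴ cm/s gives ≈2,500 and ≈2,200 years.

```python
import numpy as np

SECONDS_PER_YEAR = 3.15576e7


def population_density(lengths_um, bin_edges_um, volume_cm3):
    """Crystals per size-bin width (um) per unit volume, with bin mid-points."""
    counts, edges = np.histogram(lengths_um, bins=bin_edges_um)
    mids = 0.5 * (edges[:-1] + edges[1:])
    return mids, counts / (np.diff(edges) * volume_cm3)


def negative_limb_slope(mids_um, density, min_size_um):
    """Least-squares slope of ln(n) vs size (per um) over non-empty bins at or above min_size_um."""
    keep = (mids_um >= min_size_um) & (density > 0)
    slope, _intercept = np.polyfit(mids_um[keep], np.log(density[keep]), 1)
    return slope


def residence_time_years(slope_per_um, growth_cm_per_s):
    """Marsh (1988): slope = -1 / (G * tau); the slope is converted from per um to per cm."""
    if slope_per_um >= 0:
        raise ValueError("use the negative-slope (right-hand) limb of the CSD")
    slope_per_cm = slope_per_um * 1.0e4
    return -1.0 / (growth_cm_per_s * slope_per_cm) / SECONDS_PER_YEAR


for slope_mm in (-125.0, -143.0):
    tau = residence_time_years(slope_mm / 1000.0, 1.0e-14)
    print(f"slope {slope_mm:.0f} per mm -> ~{tau:,.0f} yr")
```

Because τ scales as 1/G, raising the growth rate tenfold to 10⁻¹³ cm/s shortens each estimate tenfold, which is why the inferred residence times are only as good as the growth-rate model behind them.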
Based on input temperatures and the observed zircon sizes, the model suggests that zircons mostly grew at a rate of G = 10⁻¹⁴ cm/s, given the size of the inferred cooling magma bodies (Bindeman and Melnik, 2016; Melnik and Bindeman, 2018). Using this growth rate, we estimate the residence times of these zircons in their host melt from the relationship (Cashman and Marsh, 1988; Marsh, 1988):

$$\text{CSD slope} = -\frac{1}{G\,\tau}$$

where G is the growth rate (cm/s) and τ is the residence time (s). Measured slopes on the right-hand sides of the CSDs are between -125 and -143 mm⁻¹ (Fig. 11), yielding crystallization times of ~2,500 and ~2,200 years for Hlíðarfjall and Gæsafjallarani, respectively. If growth rates are increased by an order of magnitude to 10⁻¹³ cm/s (rapid growth possibly suggested by the presence of elongated crystals), corresponding to exceptionally fast cooling in thin magma sheets, these times are reduced to ~250 and ~200 years.

3.4 Oxygen Isotope Measurements of Pyroxene, Plagioclase, and Groundmass

Table 4 Results of Laser Fluorination δ18O Analyses. * indicates a single-crystal analysis; all other analyses are of multiple crystals. Predicted equilibrium δ18O at 900°C is shown for a host glass value of 3.5‰ (Bindeman, 2008).

Laser fluorination δ18O analyses for Jörundur yielded values of +3.55‰ for groundmass, +2.67‰ and +1.99‰ for plagioclase, and +1.12‰, +1.59‰, and +2.22‰ for pyroxene (Fig. 12; Table 1). Plagioclase in oxygen isotope equilibrium with the melt would be expected to have a δ18O of +2.5‰ (Bindeman, 2008) (Fig. 9), within error of the +2.67‰ crystal, whereas the other plagioclase crystal (δ18O = +1.99‰) lies below the expected equilibrium value (Fig. 12; Table 1). By contrast, only one bulk analysis of pyroxenes has a δ18O value close to the predicted equilibrium value of +1.5‰ (Fig. 12; Table 1).

Figure 12 δ18O of All Units and All Phases. δ18O of groundmass, plagioclase, and pyroxene (measured by laser fluorination) in all analyzed units at Krafla.
Average MORB basalt and mantle zircon values are from Valley (1994).

Hlíðarfjall has a groundmass δ18O of +3.44‰ (Fig. 12; Table 1), within error of the groundmass value for Jörundur; equilibrium δ18OPlag and δ18OPx are therefore expected to be +2.5 to +2.7‰ and +1.5 to +1.7‰, respectively, at ~900°C. However, a single bulk pyroxene analysis gave a δ18O value of +4.75‰, significantly above equilibrium. The Gæsafjallarani groundmass sample exhibits notably more perlitic and hydrated textures than the samples from the other two domes. Its δ18O (+2.21‰) is lower than that of the other two domes, but we infer that this difference most likely reflects syn- or post-emplacement alteration rather than a true magmatic difference in isotopic composition (Fig. 12). Three measurements of single plagioclase crystals gave slightly variable δ18O values of +2.49‰, +1.92‰ and +1.91‰, and a bulk analysis of several pyroxene grains returned a value of +1.48‰ (Fig. 12; Table 1). The pre-ignimbrite dome (sample KRF-1.2) gave a groundmass δ18O value of +2.68‰, but it also exhibits hydrated perlitic textures indicative of syn- or post-emplacement processes that may have slightly lowered the original magmatic value. Plagioclase extracted from this sample returned a δ18O of +2.70‰, and two pyroxene analyses gave values of +1.56‰ and +1.66‰ (Fig. 12; Table 1). Groundmass δ18O values for the Halarauður ignimbrite are +2.38‰ in the intermediate unit H2 of Rooyakkers et al. (2020) and +1.96‰ in the basaltic zone (unit H3). In H2, pyroxenes cluster at +3.31‰, +3.52‰, and +3.78‰ (Fig. 12), values that would require an equilibrium melt δ18O of +4.5 to +5‰ and are thus out of equilibrium with the H2 melt. In H3, five individual plagioclase measurements returned a wide range of values from -5.26‰ to +2.13‰ (equilibrium ~+1‰) (Fig. 12). The phase 3 Hrafntinnuhryggur and IDDP-1 rhyolites both gave host glass δ18O values of +3.10‰.
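The equilibrium comparisons in this section amount to subtracting a mineral-melt fractionation from the host-melt δ18O and comparing the result with each measured value. The Python sketch below illustrates that bookkeeping. The fractionations (Δ melt-plagioclase ≈ 1.0‰ and Δ melt-pyroxene ≈ 2.0‰ at ~900°C) are inferred from the equilibrium values quoted here (+2.5‰ and +1.5‰ for a +3.5‰ melt) rather than taken directly from Bindeman (2008), and the ±0.2‰ tolerance is an assumed analytical window.

```python
"""Screen measured mineral d18O against predicted equilibrium values.

Assumptions: melt-mineral fractionations at ~900 C inferred from the
equilibrium values quoted in the text (+2.5 permil plagioclase and +1.5 permil
pyroxene for a +3.5 permil melt); the tolerance is an assumed analytical window.
"""

DELTA_MELT_MINERAL = {"plagioclase": 1.0, "pyroxene": 2.0}  # permil, inferred
TOLERANCE = 0.2  # permil, assumed


def predicted_mineral(delta_melt, mineral):
    """Equilibrium mineral d18O = melt d18O - Delta(melt - mineral)."""
    return delta_melt - DELTA_MELT_MINERAL[mineral]


def classify(measured, delta_melt, mineral, tol=TOLERANCE):
    """Label a measurement relative to the predicted equilibrium value."""
    offset = measured - predicted_mineral(delta_melt, mineral)
    if abs(offset) <= tol:
        return "equilibrium"
    return "above equilibrium" if offset > 0 else "below equilibrium"


if __name__ == "__main__":
    host_melt = 3.5  # host glass value used for the equilibrium predictions
    jorundur = [("plagioclase", 2.67), ("plagioclase", 1.99),
                ("pyroxene", 1.12), ("pyroxene", 1.59), ("pyroxene", 2.22)]
    for mineral, value in jorundur:
        print(f"{mineral:12s} {value:+.2f}  {classify(value, host_melt, mineral)}")
```

Applied to the Jörundur values, this flags one plagioclase and one pyroxene analysis as near equilibrium, matching the text; a different host-melt δ18O or tolerance would shift the classification.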
The granophyre from the IDDP-1 well had a bulk value of +2.80‰ (Fig. 12). Plagioclase and pyroxene in pumice from Víti crater gave δ18O values of +2.72‰ and +1.62‰, respectively (Fig. 12). Glass δ18O values in felsite xenoliths also erupted from Víti were low, at -1.84‰ and -1.08‰ (Fig. 12). Other phenocrysts from the felsite span a large range, from an extremely low value of -5.35‰ to a near-mantle value of +4.75‰ (Fig. 12).

3.5 Thermal Modeling Results

Figure 13 shows the time-temperature paths for different depths in the Heat2D thermal model. Near the sill intrusions at 4-6 km depth, temperatures are elevated above the dry basaltic solidus (~1100°C), allowing small degrees of partial melting (Fig. 13C). In these zones along the margin of the magma chamber, the thermal model predicts melt fractions of 0.2-0.4 (Fig. 13D). The δ18O of this partial melt would reflect that of the crust from which it was derived and would therefore be low, between 0 and -12‰. Away from the main area of heat and sill intrusions, temperatures decrease but the overall geothermal gradient remains elevated. Crust 1000 m above the intrusion (3 km depth) is heated to a maximum temperature of 700°C, while crust further away (1-3 km above the intrusion) is barely affected by the heat of the intrusion (Fig. 13).

Figure 13 Thermal Model Time-Temperature Paths and Partial Melt Volumes. A) Temperature profile of the crust at time t = 0, after 500 kyr of thermal priming has created an elevated geotherm. B) The final temperature profile after 50 kyr of intrusions. C) Time-temperature paths at different depths within the thermal model, at the radius just outside the intrusion (x = 0.5 km). The zircon saturation temperature and the wet/dry basaltic solidus are also shown.
D) Maximum melt fraction reached in the modeled crustal domain, and the volume of partial melt created in the crust, calculated using cylindrical coordinates and assuming radial symmetry of the 2D space. The location of partial melt generated in the system can be evaluated from melt contours that store the volume of melt for each cell and each time step in the model. Near the magma chamber melt fraction increases, whereas further from the chamber it decreases. Zircon saturation temperatures are calculated from the whole-rock compositions of the rhyolite domes using the model of Boehnke et al. (2013).

3.6 Magma Chamber Simulator Results

The first experiment tested with the MCS is the simple case of basaltic magma at T = 1200°C intruding into dry basaltic crust at ~10 km depth (3000 bars) (Fig. 14). In this case, minimal partial melting occurs, with the model estimating less than 1% (by total mass of the magma chamber) assimilation of partial melt (Fig. 14). Without generation and assimilation of high-silica partial melt, and remaining at a depth where the magma is too hot to crystallize substantially, the magma undergoes minimal differentiation or change in oxygen isotope composition. The result is a near-normal-δ18O basaltic magma (Fig. 14). The next scenario is crust with 1 wt.% H2O (composition from Spulber and Rutherford, 1982). In simulations using this composition at both 2000 and 3000 bars, around 10% (by mass of the total magma chamber) partial melt is produced and assimilated (Fig. 14). The partial melt composition is predominantly dacitic (Fig. 15). Coupled with subsequent FC, the intruding basaltic magma undergoing assimilation can differentiate to a small volume of dacite, with 20% of the original magma chamber mass remaining as melt (Fig. 15).
Depending on the average δ18O value of the crust that is partially melted, the resulting dacite can have a δ18O as low as +3.7‰ (assimilant of -10‰) or as high as +5.0‰ (assimilant of 0‰) (Fig. 15B). Although the crust at Krafla is predominantly basaltic, drilling has revealed occasional intermediate to silicic intrusive and extrusive rocks at depth (Hattori and Muehlenbachs, 1982; Pope et al., 2013; Weisenberger et al., 2015). Hence, to test the effect of a more evolved assimilant, we modeled assimilation of a dry andesitic wallrock composition at 3000 bars and compared it to the dry basalt case (Fig. 14). The more evolved composition results in more partial melting (up to 17% by mass of the magma chamber). The final magma composition after the completed AFC process is a basaltic andesite (54% SiO2) with a δ18O of +2.8‰ to +4.5‰, depending on the chosen value for the assimilant (Figs. 14 and 15).

Figure 14 Magma Chamber Simulator (MCS) Results. Results of assimilation and fractional crystallization (AFC) runs with the Magma Chamber Simulator, showing the SiO2 content of the final magma, the magma volume remaining after AFC (% of original liquid volume), and the amount of partial melt added to the magma (% of total mass). For each run, basaltic magma (Mývatn Fires composition of Thorarinsson, 1979) with T = 1200°C is intruded into wallrock of specified composition (detailed on the x-axis, with model pressures in bars given in parentheses). Calculations for dry (0.1 wt% H2O) basaltic magma were run at and 3000 bars using wallrock of identical composition. The dry andesite (0.1 wt.% H2O) wallrock composition is equivalent to the erupted andesite unit that we sampled inside the southern caldera margin. The basaltic wallrock composition with 1 wt.% H2O was taken from Spulber and Rutherford's (1983) experiments on partial melting of Icelandic crust. The amphibolite composition is from Wolf and Wyllie (1995) and has 5 wt.% H2O.

Figure 15 Magma Chamber Simulator Analysis.
A) Wallrock partial melt compositions produced in selected MCS simulations for different wallrock compositions and pressures. Symbols indicate 5°C cooling increments and the partial melt compositions produced by melting of the protolith indicated in the legend. Initial partial melts have the most evolved compositions, moving towards less evolved compositions at higher degrees of melting. None of the simulated partial melt compositions overlap with the observed compositions of the rhyolites, and most of the partial melt is dacitic rather than rhyolitic. B) Final magmatic δ18O values after AFC, calculated from MCS results with an isotopic mixing model assuming different average δ18O values for the crustal assimilant (-10‰, -5‰ and 0‰). Pressures of MCS simulations in bars are given in parentheses. C) The compositions of the partial melt created during MCS simulations in each scenario. None of the partial melt compositions overlap with observed compositions of the Krafla rhyolites (red square).

The final scenario modeled was intrusion of basaltic magma into basaltic crust that has experienced extensive hydrothermal alteration. This scenario was modeled at several pressures, starting with a shallow simulation at 1000 bars in which the average composition of altered oceanic crust (newly formed MORB that has interacted with seawater) was used as the wallrock composition (Kuo, 2017). At higher pressures, between 2000 and 4000 bars, we used an amphibolite wallrock composition with 5 wt.% H2O (composition from Wolf and Wyllie, 1995). Between 15% and 43% (by mass) partial melt is created and assimilated in these scenarios. With additional fractional crystallization, this scenario produces a final magma of dacitic composition (Fig. 14). With an assimilant δ18O of -10‰, final magmatic δ18O values fall below those observed for the Krafla rhyolite domes as the degree of partial melting increases (Fig. 15B).
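The isotopic mixing step used to convert assimilation amounts into final magmatic δ18O can be illustrated with a simple two-component mass balance. The Python sketch below is a simplified stand-in for the MCS-based calculation, not a reproduction of it: it assumes equal oxygen concentrations in magma and assimilant and ignores isotopic fractionation during crystallization. Because it omits the 18O enrichment of residual melt as low-δ18O minerals crystallize, it will generally require less assimilant to reach a given δ18O than a full AFC calculation. The function names are illustrative.

```python
"""Two-component oxygen isotope mass balance (simplified mixing sketch).

Assumptions: equal oxygen concentrations in magma and assimilant, and no
isotopic fractionation during crystallization.
"""


def mix_delta(delta_magma, delta_assimilant, x):
    """d18O of a mixture containing mass fraction x of assimilant."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("assimilant fraction x must be in [0, 1]")
    return (1.0 - x) * delta_magma + x * delta_assimilant


def assimilant_fraction(delta_magma, delta_assimilant, delta_target):
    """Mass fraction of assimilant needed to reach delta_target."""
    if delta_magma == delta_assimilant:
        raise ValueError("end-members must differ in d18O")
    return (delta_magma - delta_target) / (delta_magma - delta_assimilant)


if __name__ == "__main__":
    for assimilant in (-10.0, -5.0, 0.0):  # average crustal d18O cases of Fig. 15B
        mixes = [round(mix_delta(5.0, assimilant, x), 2) for x in (0.1, 0.2, 0.4)]
        print(f"assimilant {assimilant:+.0f} permil:", mixes)
```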
In the case with the largest amount of partial melt formation (amphibolite wallrock at 4000 bars, 45% by mass partial melting and assimilation; Fig. 14), the final δ18O of the resulting magma is lower than the values observed for the domes, regardless of the assumed average crustal δ18O (Fig. 15). When a wallrock δ18O of -5‰ is assumed, the final magmatic δ18O produced by AFC at 1000 to 2000 bars (hydrated basalt, AOC, or amphibolite wallrock compositions) provides a good fit to the measured δ18O values of the domes (Fig. 15B). Wallrock with a δ18O of 0‰ also provides a good fit in the 3000-bar amphibolite wallrock model (Fig. 15B). In all models, the initial partial melts are dacitic to rhyolitic, but further partial melting at higher temperature produces less evolved dacitic to intermediate compositions (Fig. 15A). Importantly, none of the modeled partial melt compositions overlap with the observed chemistry of the Krafla rhyolites (Figs. 15A and C). Furthermore, none of our AFC simulations produced a high-silica, low-δ18O rhyolite as the final product during isobaric differentiation (Fig. 14). Instead, once the wallrock has thermally equilibrated with the magma, the final AFC magmas are mainly low-δ18O andesites to dacites.

4. Discussion

4.1. Magmatic Evolution of the Rhyolite Domes

The ca. 80 ka rhyolite domes studied here were erupted ca. 20-30 kyr after the Halarauður caldera collapse event. Whole-rock compositions of Halarauður products transition up-section from mildly basalt-contaminated rhyolite to hybrid intermediate compositions and finally to homogeneous basalt, implying that the rhyolite tapped in this event was completely evacuated (Rooyakkers et al., 2020). We thus infer that the domes represent a renewed episode of rhyolite production after caldera collapse rather than residual rhyolitic magma left over after the Halarauður eruption.

Our zircon analyses reveal close similarities among the three rhyolite domes.
Trace element compositions of zircons from all three domes overlap and show similar trends that are consistent with modeling of zircon fractional crystallization (Fig. 10), while zircon δ18O values have similar averages and ranges across the three domes (Fig. 9); values measured for Hlíðarfjall (average δ18O = +1.45‰, range = 1.06-1.94‰) are well within 2σ error (±0.25‰) of those measured in Jörundur and Gæsafjallarani, although the higher average may suggest a slight difference from the other domes (Fig. 9). Similarly, overlapping zircon U-Th crystallization ages for all three domes imply a close link between them. Based on these chemical and temporal overlaps, we suggest that the zircon populations from all three domes likely crystallized during the same cooling event. This scenario is consistent with other similarities in their whole-rock and mineral compositions, which imply that the three domes ultimately derive from a common magma body, as defined by compositional trends (Rooyakkers, 2020). A slight difference in average zircon age between Hlíðarfjall and the other domes may indicate slightly longer crustal residence for Hlíðarfjall, consistent with our CSD data and its slightly more evolved whole-rock and phenocryst compositions (Rooyakkers, 2020). Overall, we conclude that the three domes are likely related to the same original low-δ18O magma batch, but the late-stage evolution of the Hlíðarfjall magma appears to have followed a slightly different differentiation path from the other two domes. The average δ18O values of the zircon populations from the domes suggest crystallization from a host melt δ18O of ~+3.0 ± 0.5‰, assuming isotopic equilibrium; the measured glass values of the three domes are consistent with this equilibrium value.
These +0.5 to +1.5‰ zircon δ18O values are very low by global standards, 4-4.5‰ lighter than normal mantle zircon values of +5.3 ± 0.3‰ (Valley et al., 1994), and require either crystallization from low-δ18O host melts or assimilation directly from hydrothermally altered wallrock. Although each zircon population shows minor variability in δ18O, greater than the analytical uncertainty (±0.2‰, 1 s.d.), the δ18O values do not vary systematically along the inferred fractional crystallization trend (Fig. 10). This suggests that the zircons grew from an already low-δ18O magma that was not changing systematically in δ18O during crystallization, and was therefore not incorporating new low-δ18O material at the time of zircon growth. We thus infer that zircon growth occurred during the later stages of rhyolite differentiation, after significant assimilation of altered low-δ18O crust had ceased. We attribute the slight heterogeneity in zircon δ18O to minor isotopic heterogeneity within a poorly mixed magma body; ~1-1.5‰ variation in melt δ18O is required to explain the zircon variability. Overall, the zircon populations of the Krafla rhyolite domes are more isotopically homogeneous than populations analyzed from other Icelandic rhyolites (e.g., Askja, Torfajökull, Hekla; Bindeman et al., 2012), but show comparably low δ18O values (Carley et al., 2020, 2014; Reimink et al., 2014). Based on predicted equilibrium values (Fig. 12), it appears that some pyroxene and plagioclase grains in the dome rhyolites, like the zircons, crystallized in equilibrium with their host melts. Occasional instances of marked disequilibrium, such as a bulk analysis of pyroxenes from Hlíðarfjall that gave a close to mantle-like δ18O value, or plagioclase with strongly negative δ18O values, provide evidence for incorporation of both normal-δ18O cargo and low-δ18O cargo derived from hydrothermally altered rocks (Fig. 12; Table 1).
Together, these observations suggest that more than one process acted to differentiate these magmas.

4.2. Modeling Perspectives on Petrogenesis of the Rhyolite Domes

Our combination of thermal and chemical modelling suggests that segregation of partial melt from around the margins of basaltic magma bodies (sensu Jónasson, 1994; Fig. 4a) is not a viable mechanism to generate the rhyolitic dome magmas. We recognize two challenges to this process. The first is producing sufficient volumes of high-silica partial melt, because most of the partial melt in the thermal model was created within only 1-10 m of the boundary of the basaltic intrusion (Fig. 13). While our MCS models show that hydrothermal alteration can promote more partial melting in the crust, the thermal model provides a minimum estimate of the volume of partial melt created in the crust surrounding a magma body and therefore represents the most evolved partial melt composition that can be generated. Greater degrees of partial melting in altered or hydrated crust yield less-evolved partial melt compositions, which are thus even less likely to produce the observed composition of the Krafla rhyolites. Therefore, to assess whether sufficient volumes of high-silica partial melt with a composition similar to that of the Krafla rhyolites can be produced purely by partial melting, we evaluate the partial melt volumes produced in the dry crust of the thermal model. The collective erupted volume of the three domes is ~0.7 km3 (Jónasson, 1994). Assuming that this erupted volume represents only a fraction of the total magma body volume, for example around one third of the magma produced below the surface, we suggest that a total volume of >2 km3 of rhyolite would be required to feed eruption of the domes.
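The partial melt volumes quoted from the thermal model are integrated over an axisymmetric grid (cylindrical coordinates with radial symmetry; Fig. 13D). A minimal Python sketch of such an integration is shown below, assuming a regular grid of cell-averaged melt fractions indexed by depth and radius, so that each cell represents a ring of volume 2πr·Δr·Δz. The grid, melt fractions, and function names are hypothetical and are not taken from the Heat2D code. A second helper reproduces the volume scaling used in this paragraph, in which the ~0.7 km3 erupted volume is assumed to be about one third of the magma body.

```python
"""Partial melt volume on an axisymmetric (r, z) grid, plus a volume budget.

Hypothetical inputs: a regular grid of cell-averaged melt fractions indexed
[z, r], with column j spanning r = j*dr to (j+1)*dr.
"""
import numpy as np


def melt_volume_km3(melt_fraction, dr_km, dz_km):
    """Sum melt fraction * ring volume, with ring volume = 2*pi*r_center*dr*dz."""
    phi = np.asarray(melt_fraction, dtype=float)
    if phi.ndim != 2:
        raise ValueError("melt_fraction must be a 2D array indexed [z, r]")
    r_center = (np.arange(phi.shape[1]) + 0.5) * dr_km
    ring_volume = 2.0 * np.pi * r_center * dr_km * dz_km  # one value per radial column
    return float(np.sum(phi * ring_volume))  # broadcast across depth rows


def required_volume_km3(erupted_km3, erupted_fraction):
    """Total magma volume if only `erupted_fraction` of it was erupted."""
    return erupted_km3 / erupted_fraction


if __name__ == "__main__":
    dr = dz = 0.01  # km
    phi = np.zeros((200, 100))  # 2 km deep, 1 km radius (hypothetical grid)
    phi[100:150, 50:52] = 0.3   # thin melt-bearing shell beside an intrusion
    print(f"melt volume: {melt_volume_km3(phi, dr, dz):.4f} km3")
    print(f"required body: {required_volume_km3(0.7, 1 / 3):.1f} km3")
```

Using the cell-center radius gives the exact annulus volume π(r_out² - r_in²)Δz for each cell, so a uniform melt fraction of 1 recovers the full cylinder volume.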
Over the full 50 kyr thermal model run, ~1.3 km3 of high-silica partial melt is produced; over the 20-30 kyr between the Halarauður and dome eruptions, this reduces to 0.8 km3 (Fig. 13). Unless almost the entire volume of partial melt was segregated over this 20-30 kyr period and completely drained on eruption, our models suggest that insufficient partial melting occurs to produce the dome eruptions. The second challenge is producing the compositions of the dome magmas purely by partial melting. The MCS predicts that the partial melt itself, while low in δ18O, is mostly dacitic in composition and does not overlap with the rhyolitic composition of the surface lavas (Fig. 15A). The closest match to the measured whole-rock compositions comes from the initial (very low melt fraction and small volume) partial melting of the average altered oceanic crust composition or the andesitic wallrock composition, but even these partial melt compositions do not overlap with the compositions of the dome rhyolites or other Krafla rhyolites in any major element dimension (Figs. 15A and C). If silicic partial melt were segregated from these zones, further differentiation would be required to reach the observed rhyolitic compositions. Furthermore, at high temperatures the δ18O of any partial melt is roughly equal to that of its protolith (Hoefs, 2015), so partial melting of the Krafla crust will produce a melt that inherits the low values of the crustal material, between 0 and -12‰ for altered rocks at Krafla (Hattori and Muehlenbachs, 1982). These values are far lower than the measured δ18O of any of the domes or other Krafla rhyolites, which have groundmass values between +3 and +3.5‰ (Table 1). The isotopic compositions of the domes are more consistent with an AFC scenario.
For example, producing the observed rhyolitic groundmass δ18O value of ~+3 to +3.5‰ by assimilating partial melt with a low δ18O of -5.0‰ into a normal-δ18O magma of +5.0‰ requires 40% assimilation by total mass of the magma chamber. If the magma chamber has a total volume of at least 2 km3, this 40% by mass assimilation corresponds to a volume of >0.8 km3 of partial melt. This volume of partial melt is well within the amount predicted by the thermal model in the time constrained by the geochronology (Jónasson, 1994; Zierenberg et al., 2013) (Fig. 13B). If more assimilation occurs, the resulting δ18O value may be too low, approaching the whole-rock values of -5 to -11‰ measured in altered basalts from Krafla drillhole samples (Hattori and Muehlenbachs, 1982). The models that best match the observed δ18O values for the domes are: andesitic crust at 3000 bars (δ18O = -5‰), hydrated basaltic crust at 2000 bars (δ18O = -10‰), amphibolite crust at 2000 or 3000 bars (δ18O = -5‰ and 0‰, respectively), and altered oceanic crust at 1000 bars (δ18O = -5‰) (Fig. 15B). Given the suggested average δ18O value of -7‰ for the altered crust at Krafla (Hattori and Muehlenbachs, 1982), we suggest that models involving a wallrock δ18O of -5‰ (hydrated basaltic crust at 2000 bars, or altered oceanic crust at 1000 bars) represent the most realistic petrogenetic scenarios. Although shallow (1000-2000 bars) AFC models involving a wallrock δ18O of -5‰ match the isotopic compositions well and are consistent with thermal modelling constraints, the bulk compositions of the final silicic magmas do not reach the evolved rhyolitic compositions observed, and therefore require further differentiation outside the thermal regime where assimilation of partial melt occurs (Figs. 14 and 15).
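The zircon saturation temperatures used in this discussion (~870-890°C for the dome rhyolites) were calculated from whole-rock compositions following Boehnke et al. (2013). The Python sketch below shows that calculation under stated assumptions: it uses the Boehnke et al. (2013) calibration ln D_Zr = 10108/T - 1.16(M - 1) - 1.48 (T in K), with D_Zr taken as the stoichiometric Zr content of pure zircon (~497,644 ppm, a common convention) divided by melt Zr, and M = (Na + K + 2Ca)/(Al·Si) in cation fractions normalized over the listed oxides. The oxide list, function names, and example composition are hypothetical and do not represent a Krafla analysis.

```python
"""Zircon saturation temperature after Boehnke et al. (2013).

    ln D_Zr = 10108 / T(K) - 1.16 (M - 1) - 1.48
    D_Zr    = Zr in zircon (ppm) / Zr in melt (ppm)
    M       = (Na + K + 2 Ca) / (Al * Si), cation fractions

The oxide list and example composition are illustrative assumptions.
"""
import math

# Oxide: (molar mass g/mol, cations per formula unit)
OXIDES = {
    "SiO2": (60.08, 1), "TiO2": (79.87, 1), "Al2O3": (101.96, 2),
    "FeO": (71.84, 1), "MnO": (70.94, 1), "MgO": (40.30, 1),
    "CaO": (56.08, 1), "Na2O": (61.98, 2), "K2O": (94.20, 2),
    "P2O5": (141.94, 2),
}
ZR_IN_ZIRCON_PPM = 497_644.0  # approximate stoichiometric Zr content of ZrSiO4


def cation_fractions(oxides_wt):
    """Convert oxide wt% to cation fractions normalized over the listed oxides."""
    moles = {ox: wt / OXIDES[ox][0] * OXIDES[ox][1] for ox, wt in oxides_wt.items()}
    total = sum(moles.values())
    return {ox: n / total for ox, n in moles.items()}


def m_factor(oxides_wt):
    """Compositional parameter M = (Na + K + 2Ca) / (Al * Si)."""
    x = cation_fractions(oxides_wt)
    alkalis_ca = x.get("Na2O", 0.0) + x.get("K2O", 0.0) + 2.0 * x.get("CaO", 0.0)
    return alkalis_ca / (x["Al2O3"] * x["SiO2"])


def zircon_saturation_c(zr_ppm, m):
    """Zircon saturation temperature (deg C) for melt Zr (ppm) and M."""
    ln_d = math.log(ZR_IN_ZIRCON_PPM / zr_ppm)
    t_kelvin = 10108.0 / (ln_d + 1.16 * (m - 1.0) + 1.48)
    return t_kelvin - 273.15


if __name__ == "__main__":
    # Hypothetical high-silica rhyolite, NOT a Krafla analysis
    rhyolite = {"SiO2": 74.0, "TiO2": 0.25, "Al2O3": 12.5, "FeO": 2.5, "MnO": 0.08,
                "MgO": 0.1, "CaO": 1.5, "Na2O": 4.5, "K2O": 2.8, "P2O5": 0.02}
    m = m_factor(rhyolite)
    print(f"M = {m:.2f}, T_sat = {zircon_saturation_c(250.0, m):.0f} C")
```

Higher melt Zr or lower M both raise the saturation temperature in this formulation, so the computed threshold depends directly on the whole-rock analysis supplied.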
The presence of zircons in the dome rhyolites provides an additional thermal constraint: zircon crystallizes only when the magma temperature drops below the zircon saturation temperature (Boehnke et al., 2013; Loewen and Bindeman, 2015; Watson and Harrison, 1983), calculated at ~870-890°C for the dome rhyolites (Boehnke et al., 2013; Watson and Harrison, 1983). In the simulated thermal system, this temperature threshold occurs >500 m from the main intrusion (Fig. 13C), suggesting that the magma must move out of this elevated-temperature regime to cool sufficiently below zircon saturation (Figs. 13B and C). We thus suggest that more than one thermal environment is required to produce the rhyolitic dome magmas. The simple zoning, or complete lack of zoning, in the zircons and the absence of textural evidence for zircon resorption suggest that the magma was not reheated above the zircon saturation temperature after the onset of zircon growth; hence we infer that the dome zircons record the last stages of magmatic differentiation. One possible mechanism for achieving this substantial cooling is removal of the zircon-undersaturated magma from the main area of heat along the fissure system to colder (and possibly shallower) parts of the crust, where it can cool below zircon saturation. Such a scenario is consistent with the lack of systematic shifts in zircon δ18O along the trace element fractional crystallization trends, which implies a lack of crustal assimilation during zircon fractionation and may reflect a cooler thermal environment. In summary, our modelling results suggest that a two-step process is required to produce the Krafla rhyolite dome magmas (Fig. 16). First, partial melt of altered low-δ18O crust forms in a relatively narrow zone around the margins of basaltic intrusions and is assimilated into the fractionating basaltic magma body (AFC).
Second, the resulting low-δ18O dacitic melt is extracted and emplaced in cooler (and likely shallower) crust, where it undergoes additional fractional crystallization and cools below the zircon saturation temperature (Fig. 13). Similar two-step models have been suggested for other systems in Iceland (Martin and Sigmarsson, 2007) and at Yellowstone, where the low-δ18O signature of some rhyolites is imparted by incorporation of hydrothermally altered material (Troch et al., 2019, 2018) but zircon crystallization occurs after removal from the zone of assimilation (Loewen and Bindeman, 2015). Our modeling results highlight the importance of hydrothermally altered crust in the differentiation of Krafla rhyolites. In the scenarios with dry crust, minimal partial melt is created, so differentiation must be driven by fractional crystallization (Fig. 14), which does not produce low-δ18O compositions. In contrast, hydrated and hydrothermally altered crust generates sufficient partial melt to produce the observed isotopic characteristics (Fig. 15). These numerical results are consistent with experimental data showing that partial melt produced from altered mafic crust is predominantly dacitic, and that hydrothermally altered crust facilitates considerably more partial melt production at a given temperature (e.g., Koepke et al., 2007, 2005, 2004; Spulber and Rutherford, 1982; Wolf and Wyllie, 1995). Although our MCS models assume assimilation of partial melt rather than bulk crust, they cannot distinguish between bulk crustal assimilation and assimilation of partial melt. Partial melting does not substantially alter the δ18O signature of a material, hence both assimilation scenarios would lower the δ18O of the magma in the same way. However, because of their high silica contents, assimilated low-degree partial melts can help drive differentiation toward higher silica compositions while simultaneously lowering the δ18O.
In contrast, bulk assimilation can substantially change the magma composition only if unerupted high-silica material is assimilated. If the assimilant is basaltic, bulk assimilation relies more (or entirely) on fractional crystallization for differentiation. Such a scenario, while not specifically investigated in our models, provides a possible mechanism to produce low-δ18O basalts that have incorporated hydrothermally altered material but not evolved beyond mafic compositions. Such processes may explain the occurrence of low-δ18O quartz-tholeiitic basalts with isotopically diverse crystal cargoes (Bindeman et al., 2008). To test this hypothesis, a detailed investigation of the trace element and whole-rock chemistry of low-δ18O basalts in Iceland could be carried out to help reveal input from a more felsic partial melt or bulk crustal material.

Figure 16 New Model for Petrogenesis of the Rhyolite Domes. Schematic representation of the multi-stage petrogenetic model proposed for the three rhyolite domes. Low-δ18O partial melt from a compositionally diverse and isotopically depleted crust is assimilated after intrusion of mafic magma heats the surrounding crust. This imparts the low-δ18O signature and creates an intermediate to dacitic magma. Further differentiation to rhyolite occurs by fractional crystallization after removal of the magma to cooler crust, where it cools below the zircon saturation temperature.

4.3. Generation of Other Low-δ18O Magmas at Krafla

Figure 17 δ18O vs. SiO2. δ18O of glass or groundmass (this study) or bulk rock (Nicholson et al., 1991; Cooper et al., 2016) vs. SiO2 for Krafla products.

Figure 17 displays groundmass oxygen isotope values versus SiO2 content for selected Krafla products ranging in age from pre-caldera (>100 ka) to post-caldera (Cooper et al., 2016; Nicholson et al., 1991). The basaltic end of this compositional spectrum has the greatest range in δ18O values (~3‰ range, from +2.0 to +5.1‰) (Fig. 17), whereas higher-silica compositions cluster within a smaller range that overlaps with the lowest δ18O values of the basaltic products (~2.5‰ range, from +1.0 to +3.5‰) (Fig. 17). Critically, all erupted lavas with SiO2 greater than 53 wt.% have groundmass δ18O below +3.55‰ (Fig. 17). This suggests that lowering of magmatic δ18O from mantle values of ~+5‰ occurs while the bulk magma composition is still basaltic (<53 wt.% SiO2). Some significant δ18O heterogeneity does appear in magmas with SiO2 >53 wt.%, reflected by crystals found in the rhyolites (e.g., occasional high-δ18O pyroxenes). This may indicate at least some simultaneous assimilation and fractional crystallization through the intermediate stage of magma evolution, but most of the differentiation to rhyolitic compositions appears to happen after the oxygen isotope signature has been imparted (Fig. 17). In this model, both normal- and low-δ18O basalts (+2.0 to +5.0‰) can be erupted from the same system, depending on the degree of crustal assimilation they experience. This scenario is consistent with our two-step model for the petrogenesis of the dome rhyolites, which involves an early stage of differentiation with assimilation of low-δ18O material followed by further FC without assimilation, and suggests that a similar multi-step process may be common to many or most of the evolved magmas at Krafla. The near-normal mantle values (+5 to +5.5‰) shown by some Krafla basalts (our analyses; Cooper et al., 2016; Nicholson et al., 1991) (Figs. 12 and 17), and the normal mantle-like olivine found in almost every low-δ18O basaltic lava in Iceland (Bindeman et al., 2008), imply that the low-δ18O signature of many Icelandic magmas is more likely imparted primarily by incorporation of hydrothermally altered crust into relatively normal-δ18O magmas than inherited from an anomalously low-δ18O plume signature, though there may well be some isotopic variability in the plume (e.g., Winpenny and Maclennan, 2014). The other rhyolites at Krafla, including Hrafntinnuhryggur and the melt intercepted in the IDDP-1 well, have groundmass δ18O values comparable to those measured in the domes. As our modeling results show, these low-δ18O rhyolites, like the domes, cannot form by segregation of pure partial melts alone; assimilation and/or further FC are necessary to produce these compositions. The Víti and Halarauður eruptive products also have groundmass values consistent with the domes, but host crystal cargoes with diverse δ18O. For example, some crystals in the Víti felsite and pumice appear almost mantle-like (e.g., pyroxene IC-82; +4.75‰), while others have low values that appear to reflect assimilation of hydrothermally altered material (e.g., altered plagioclase IC-83; -5.35‰) (Fig. 12). Halarauður crystals show a comparable range, from extremely low values (plagioclase as low as -5.26‰) to values equivalent to or higher than those of the domes (Fig. 12). While some of this cargo may reflect incorporation of lithic material (abundant in both H2 and H3) rather than primary magmatic variation, the data provide clear evidence for isotopic heterogeneity in the pre-caldera crust and/or magmatic system, suggesting a large range not only in the composition of the wallrock material but also in the degree of hydrothermal alteration. Our data from these two eruptions thus provide direct evidence for assimilation of an isotopically depleted and compositionally diverse crust at Krafla.

5. Conclusions

Our combination of detailed isotopic analyses and thermal and chemical modeling provides important new insights into the generation of low-δ18O rhyolitic magmas at Krafla volcano and represents a useful approach for exploring the petrogenesis of other enigmatic magmas. We show that the petrogenesis of Krafla rhyolites involves at least two steps: 1) assimilation of hydrothermally altered material during fractional crystallization (AFC) to produce a low-δ18O intermediate to dacitic magma, followed by 2) further fractional crystallization with little to no concurrent assimilation to reach rhyolitic compositions. Our models highlight the importance of hydrothermal alteration in the production of rhyolites at Krafla, and possibly in other areas of predominantly mafic crust. At Krafla, sustained basaltic intrusions accompanied by periodic glaciations during the early- to mid-Pleistocene thermally primed and hydrated the crust, producing a vigorous hydrothermal system (Gautason and Muehlenbachs, 1998; Hattori and Muehlenbachs, 1982; Jónasson, 1994; Pope et al., 2013; Sæmundsson, 1991; Zakharov et al., 2019). This provided a fertile environment for rhyolite petrogenesis, allowing more extensive crustal melting and assimilation to drive magmatic differentiation. To further understand this phenomenon, other systems with characteristics similar to Krafla's can be tested using detailed oxygen isotope analyses and modeling. These tools offer a means of investigating the interaction between intruding magmas and the crust in enigmatic systems and have a critical role to play in interrogating volcanic plumbing systems more broadly.

6. Bridge to Chapter 2

The methods presented in this chapter allowed us to dive into the plumbing system at Krafla, Iceland, and better understand the processes that formed both basaltic and rhyolitic magmas.
In large part, we were able to do this because of the wide diversity of magmas in the system, the abundant crystal cargo, and the sampling access to the deeper system provided by the hydrothermal research that has been done there. By adding modeling, we were able to use an extensive toolbox to constrain the processes of this system. More often, however, geologic features such as crystal content are not available or accessible. In these systems we must rely on whole rock geochemistry as the dominant evidence to inform hypotheses about processes in the plumbing system. But these data can be difficult to parse, both because they are multidimensional and therefore complex, and because the datasets become much larger than the one considered here. In the following chapter, we use machine learning methods to explore whole rock geochemical data in a mafic system with minimal magmatic variation compared to the system in Iceland. In particular, we use data from another flood basalt province, the Columbia River Flood Basalts, and apply these machine learning methods to dive into a larger plumbing system than the one explored in this first chapter.

7. References Cited

Annen, C., 2009. From plutons to magma chambers: Thermal constraints on the accumulation of eruptible silicic magma in the upper crust. Earth Planet. Sci. Lett. 284, 409–416. https://doi.org/10.1016/j.epsl.2009.05.006
Annen, C., Sparks, R.S.J., 2002. Effects of repetitive emplacement of basaltic intrusions on thermal evolution and melt generation in the crust. Earth Planet. Sci. Lett. 203, 937–955.
Axelsson, G., Egilson, T., Gylfadóttir, S.S., 2014. Modelling of temperature conditions near the bottom of well IDDP-1 in Krafla, Northeast Iceland. Geothermics 49. https://doi.org/10.1016/j.geothermics.2013.05.003
Banik, T.J., Miller, C.F., Fisher, C.M., Coble, M.A., Vervoort, J.D., 2018. Magmatic-tectonic control on the generation of silicic magmas in Iceland: Constraints from Hafnarfjall-Skarðsheiði volcano. Lithos 318–319, 326–339. https://doi.org/10.1016/j.lithos.2018.08.022
Bindeman, I., 2008. Oxygen isotopes in mantle and crustal magmas as revealed by single crystal analysis. Rev. Mineral. Geochemistry 69, 1–34. https://doi.org/10.2138/rmg.2008.69.11
Bindeman, I., Gurenko, A., Carley, T., Miller, C., Martin, E., Sigmarsson, O., 2012. Silicic magma petrogenesis in Iceland by remelting of hydrothermally altered crust based on oxygen isotope diversity and disequilibria between zircon and magma with implications for MORB. Terra Nov. 24, 227–232. https://doi.org/10.1111/j.1365-3121.2012.01058.x
Bindeman, I., Gurenko, A., Sigmarsson, O., Chaussidon, M., 2008. Oxygen isotope heterogeneity and disequilibria of olivine crystals in large volume Holocene basalts from Iceland: Evidence for magmatic digestion and erosion of Pleistocene hyaloclastites. Geochim. Cosmochim. Acta 72, 4397–4420. https://doi.org/10.1016/j.gca.2008.06.010
Bindeman, I.N., Melnik, O.E., 2016. Zircon survival, rebirth and recycling during crustal melting, magma crystallization, and mixing based on numerical modelling. J. Petrol. 57, 437–460. https://doi.org/10.1093/petrology/egw013
Boehnke, P., Watson, E.B., Trail, D., Harrison, T.M., Schmitt, A.K., 2013. Zircon saturation re-revisited. Chem. Geol. 351, 324–334. https://doi.org/10.1016/j.chemgeo.2013.05.028
Bohrson, W.A., Spera, F.J., 2001. Energy-constrained open-system magmatic processes II: Application of energy-constrained assimilation fractional crystallization (EC-AFC) model to magmatic systems. J. Petrol. 42, 1019–1041.
Bohrson, W.A., Spera, F.J., Ghiorso, M.S., Brown, G.A., Creamer, J.B., Mayfield, A., 2014. Thermodynamic model for energy-constrained open-system evolution of crustal magma bodies undergoing simultaneous recharge, assimilation and crystallization: The magma chamber simulator. J. Petrol. https://doi.org/10.1093/petrology/egu036
Carley, T.L., Miller, C.F., Fisher, C.M., Hanchar, J.M., Vervoort, J.D., Schmitt, A.K., Economos, R.C., Jordan, B.T., Padilla, A.J., Banik, T.J., 2020. Petrogenesis of silicic magmas in Iceland through space and time: The isotopic record preserved in zircon and whole rocks. J. Geol. 128, 1–28. https://doi.org/10.1086/706261
Carley, T.L., Miller, C.F., Wooden, J.L., Bindeman, I.N., Barth, A.P., 2011. Zircon from historic eruptions in Iceland: Reconstructing storage and evolution of silicic magmas. Mineral. Petrol. 102, 135–161. https://doi.org/10.1007/s00710-011-0169-3
Carley, T.L., Miller, C.F., Wooden, J.L., Padilla, A.J., Schmitt, A.K., Economos, R.C., Bindeman, I.N., Jordan, B.T., 2014. Iceland is not a magmatic analog for the Hadean: Evidence from the zircon record. Earth Planet. Sci. Lett. 405, 85–97. https://doi.org/10.1016/j.epsl.2014.08.015
Carmichael, I., 1964. The petrology of Thingmuli, a Tertiary volcano in eastern Iceland. J. Petrol. 5, 435–460.
Cashman, K.V., Marsh, B.D., 1988. Crystal size distribution (CSD) in rocks and the kinetics and dynamics of crystallization - II. Makaopuhi lava lake. Contrib. to Mineral. Petrol. 99, 292–305. https://doi.org/10.1007/BF00371933
Charreteur, G., Tegner, C., Haase, K., 2013. Multiple ways of producing intermediate and silicic rocks within Thingmúli and other Icelandic volcanoes. Contrib. to Mineral. Petrol. 166, 471–490. https://doi.org/10.1007/s00410-013-0886-1
Coble, M.A., Vazquez, J.A., Barth, A.P., Wooden, J., Burns, D., Kylander-Clark, A., Jackson, S., Vennari, C.E., 2018. Trace element characterisation of MAD-559 zircon reference material for ion microprobe analysis. Geostand. Geoanalytical Res. https://doi.org/10.1111/ggr.12238
Cooper, K.M., Sims, K.W.W., Eiler, J.M., Banerjee, N., 2016. Timescales of storage and recycling of crystal mush at Krafla Volcano, Iceland. Contrib. to Mineral. Petrol. 171, 1–19. https://doi.org/10.1007/s00410-016-1267-3
Eichelberger, J., 2020. Distribution and transport of thermal energy within magma–hydrothermal systems. Geosci. 10, 1–26. https://doi.org/10.3390/geosciences10060212
Eichelberger, J., Kiryukhin, A., Mollo, S., Tsuchiya, N., Villeneuve, M., 2020. Exploring and modeling the magma–hydrothermal regime. Geosci. 10, 1–6. https://doi.org/10.3390/geosciences10060234
Einarsson, P., 1978. S-wave shadows in the Krafla Caldera in NE-Iceland, evidence for a magma chamber in the crust. Bull. Volcanol. 41, 187–195. https://doi.org/10.1007/BF02597222
Einarsson, P., Brandsdóttir, B., 1980. Seismological evidence for lateral magma intrusion during the July 1978 deflation of the Krafla volcano in NE-Iceland. J. Geophys. 47, 160–165. https://doi.org/10.2172/890964
Elders, W.A., Fridleifsson, G.Ó., Zierenberg, R.A., Pope, E.C., Mortensen, A.K., Gudmundsson, Á., Lowenstern, J.B., Marks, N.E., Owens, L., Bird, D.K., Reed, M., Olsen, N.J., Schiffman, P., 2011. Origin of a rhyolite that intruded a geothermal well while drilling at the Krafla volcano, Iceland. Geology 39, 231–234. https://doi.org/10.1130/G31393.1
Elders, W.A., Fridleifsson, G.Ó., Albertsson, A., 2014. Drilling into magma and the implications of the Iceland Deep Drilling Project (IDDP) for high-temperature geothermal systems worldwide. Geothermics 49, 111–118. https://doi.org/10.1016/j.geothermics.2013.05.001
Furman, T., Frey, F.A., Meyer, P.S., 1992. Petrogenesis of evolved basalts and rhyolites at Austurhorn, southeastern Iceland: the role of fractional crystallization. J. Petrol. 33, 1405–1445. https://doi.org/10.1093/petrology/33.6.1405
Gautason, B., Muehlenbachs, K., 1998. Oxygen isotopic fluxes associated with high-temperature processes in the rift zones of Iceland. Chem. Geol. 145, 275–286.
GERM, 2015. Geochemical earth reference model partition coefficient (Kd) database [WWW Document]. Earth Ref. URL http://earthref.org/KDD/
Ghiorso, M.S., Sack, R., 1995. Chemical mass transfer in magmatic processes IV. Contrib. to Mineral. Petrol. 119, 197–212. https://doi.org/10.1007/BF00307281
Grönvold, K., 1984. Myvatn Fires 1724-1729; Chemical composition of the lava. Nord. Volcanol. Institute, Reykjavik 1, 1–31.
Gurenko, A.A., Bindeman, I.N., Sigurdsson, I.A., 2015. To the origin of Icelandic rhyolites: insights from partially melted leucocratic xenoliths. Contrib. to Mineral. Petrol. 169, 1–21. https://doi.org/10.1007/s00410-015-1145-4
Hards, V., Kempton, P., Thompson, R., Greenwood, P., 2000. The magmatic evolution of the Snæfell volcanic centre; an example of volcanism during incipient rifting in Iceland. J. Volcanol. Geotherm. Res. 99, 97–121.
Hattori, K., Muehlenbachs, K., 1982. Oxygen isotope ratios of the Icelandic crust. J. Geophys. Res. 87, 6559–6565.
Hjartardóttir, A.R., Einarsson, P., Magnúsdóttir, S., Björnsdóttir, T., Brandsdóttir, B., 2016. Fracture systems of the Northern Volcanic Rift Zone, Iceland: An onshore part of the Mid-Atlantic plate boundary. Geol. Soc. Spec. Publ. 420, 297–314. https://doi.org/10.1144/SP420.1
Hoefs, J., 2015. Stable Isotope Geochemistry, 6th ed. Springer International Publishing, Switzerland.
Jónasson, K., 2007. Silicic volcanism in Iceland: Composition and distribution within the active volcanic zones. J. Geodyn. 43, 101–117. https://doi.org/10.1016/j.jog.2006.09.004
Jónasson, K., 1994. Rhyolite volcanism in the Krafla central volcano, north-east Iceland. Bull. Volcanol. 56, 516–528. https://doi.org/10.1007/BF00302832
Kennedy, B.M., Holohan, E.P., Stix, J., Gravley, D.M., Davidson, J.R.J., Cole, J.W., 2018. Magma plumbing beneath collapse caldera volcanic systems. Earth-Science Rev. 177, 404–424. https://doi.org/10.1016/j.earscirev.2017.12.002
Koepke, J., Berndt, J., Feig, S., Holtz, F., 2007. The formation of SiO2-rich melts within the deep oceanic crust by hydrous partial melting of gabbros. Contrib. to Mineral. Petrol. 67–84. https://doi.org/10.1007/s00410-006-0135-y
Koepke, J., Feig, S.T., Snow, J., 2005. Hydrous partial melting within the lower oceanic crust. Terra Nov. 17, 286–291. https://doi.org/10.1111/j.1365-3121.2005.00613.x
Koepke, J., Feig, S.T., Snow, J., Freise, M., 2004. Petrogenesis of oceanic plagiogranites by partial melting of gabbros: An experimental study. Contrib. to Mineral. Petrol. 146, 414–432. https://doi.org/10.1007/s00410-003-0511-9
Kokfelt, T., Hoernle, K., Lundstrom, C., Hauff, F., Bogaard, C., 2009. Time-scales for magmatic differentiation at the Snaefellsjökull central volcano, western Iceland: Constraints from U–Th–Pa–Ra disequilibria in post-glacial lavas. Geochim. Cosmochim. Acta 73, 1120–1144. https://doi.org/10.1016/j.gca.2008.11.021
Kuo, T., 2017. The composition of altered oceanic crust: Implications for mantle evolution (Dissertation). University of Melbourne.
Loewen, M.W., Bindeman, I.N., 2015. Oxygen isotope and trace element evidence for three-stage petrogenesis of the youngest episode (260–79 ka) of Yellowstone rhyolitic volcanism. Contrib. to Mineral. Petrol. 170, 1–25. https://doi.org/10.1007/s00410-015-1189-5
Maas, R., Kinny, P.D., Williams, I.S., Froude, D.O., Compston, W., 1992. The Earth's oldest known crust: A geochronological and geochemical study of 3900-4200 Ma old detrital zircons from Mt. Narryer and Jack Hills, Western Australia. Geochim. Cosmochim. Acta 56, 1281–1300. https://doi.org/10.1016/0016-7037(92)90062-N
Macdonald, R., McGarvie, D., Pinkerton, H., Smith, R., Palacz, A., 1990. Petrogenetic evolution of the Torfajökull volcanic complex, Iceland I. Relationship between the magma types. J. Petrol. 31, 429–459.
Maclennan, J., Jull, M., McKenzie, D., Slater, L., Grönvold, K., 2002. The link between volcanism and deglaciation in Iceland. Geochemistry, Geophys. Geosystems 3, 1–25. https://doi.org/10.1029/2001GC000282
Marsh, B.D., 1988. Crystal size distribution (CSD) in rocks and the kinetics and dynamics of crystallization - I. Theory. Contrib. to Mineral. Petrol. 99, 277–291. https://doi.org/10.1007/BF00371933
Marsh, B.D., Gunnarsson, B., Congdon, R., Carmody, R., 1991. Hawaiian basalt and Icelandic rhyolite: Indicators of differentiation and partial melting. Geol. Rundschau 80, 481–510. https://doi.org/10.1007/BF01829378
Martin, E., Martin, H., Sigmarsson, O., 2008. Could Iceland be a modern analogue for the Earth's early continental crust? Terra Nov. 20, 463–468. https://doi.org/10.1111/j.1365-3121.2008.00839.x
Martin, E., Sigmarsson, O., 2010. Thirteen million years of silicic magma production in Iceland: Links between petrogenesis and tectonic settings. Lithos 116, 129–144. https://doi.org/10.1016/j.lithos.2010.01.005
Martin, E., Sigmarsson, O., 2007. Crustal thermal state and origin of silicic magma in Iceland: The case of Torfajökull, Ljósufjöll and Snæfellsjökull volcanoes. Contrib. to Mineral. Petrol. 153, 593–605. https://doi.org/10.1007/s00410-006-0165-5
Melnik, O., Bindeman, I., 2018. Modeling of trace elemental zoning patterns in accessory minerals with emphasis on the origin of micrometer-scale oscillatory zoning in zircon. Am. Mineral. https://doi.org/10.2138/am-2018-6182
Mucek, A.E., Danis, M., Silva, S.L. De, Schmitt, A.K., Pratomo, I., Coble, M.A., 2017. Post-supereruption recovery at Toba Caldera. Nat. Commun. 1–9. https://doi.org/10.1038/ncomms15248
Nicholson, H., 1990. The magmatic evolution of Krafla, NE Iceland (Dissertation). University of Edinburgh.
Nicholson, H., Condomines, M., Fitton, J.G., Fallick, A., Gronvold, K., Rogers, G., 1991. Geochemical and isotopic evidence for crustal assimilation beneath Krafla, Iceland. J. Petrol. 32, 1005–1020.
Nicholson, H., Latin, D., 1992. Olivine tholeiites from Krafla, Iceland: Evidence for variations in melt fraction within a plume. J. Petrol. 33, 1105–1124. https://doi.org/10.1093/petrology/33.5.1105
Pope, E.C., Bird, D.K., Arnórsson, S., 2014. Stable isotopes of hydrothermal minerals as tracers for geothermal fluids in Iceland. Geothermics 49, 99–110. https://doi.org/10.1016/j.geothermics.2013.05.005
Pope, E.C., Bird, D.K., Arnórsson, S., 2013. Evolution of low-18O Icelandic crust. Earth Planet. Sci. Lett. 374, 47–59. https://doi.org/10.1016/j.epsl.2013.04.043
Reimink, J.R., Chacko, T., Stern, R.A., Heaman, L.M., 2014. Earth's earliest evolved crust generated in an Iceland-like setting. Nat. Geosci. 7, 529–533. https://doi.org/10.1038/ngeo2170
Rooyakkers, S.M., 2020. New insights on rhyolitic and mixed rhyolitic-basaltic magmatism and volcanism at Krafla Central Volcano, Iceland (Dissertation). McGill University.
Rooyakkers, S.M., Stix, J., Berlo, K., Barker, S.J., 2020. Emplacement of unusual rhyolitic to basaltic ignimbrites during collapse of a basalt-dominated caldera: The Halarauður eruption, Krafla (Iceland). GSA Bull. 132, 1881–1902. https://doi.org/10.1130/b35450.1
Rooyakkers, S.M., Stix, J., Berlo, K., Petrelli, M., Hampton, R., in review. The origin of rhyolitic magmatism at Krafla Central Volcano (Iceland). J. Petrol.
Sæmundsson, K., 1991. Jarðfræði Kröflukerfisins (The geology of the Krafla volcanic system, in Icelandic). Náttúra Mývatns 3–60.
Sæmundsson, K., Pringle, M.S., 2000. Um aldur berglaga Í Kröflukerfinu (On the age of rock strata in the Krafla system). Geosci. Soc. Icel. Spring Meet. Abstr. Reykjavík, Iceland, Geosci. Soc. Icel. 26–27.
Schattel, N., Portnyagin, M., Golowin, R., Hoernle, K., Bindeman, I., 2014. Contrasting conditions of rift and off-rift silicic magma origin on Iceland. Geophys. Res. Lett. 41, 5813–5820. https://doi.org/10.1002/2014GL060780
Sigmarsson, O., Steinthórsson, S., 2007. Origin of Icelandic basalts: A review of their petrology and geochemistry. J. Geodyn. 43, 87–100. https://doi.org/10.1016/j.jog.2006.09.016
Simakin, A.G., Bindeman, I.N., 2008. Evolution of crystal sizes in the series of dissolution and precipitation events in open magma systems. J. Volcanol. Geotherm. Res. 177, 997–1010. https://doi.org/10.1016/j.jvolgeores.2008.07.012
Spulber, D., Rutherford, M.J., 1982. The origin of rhyolite and plagiogranite in oceanic crust: An experimental study. J. Petrol. 24, 1–25.
Sveinbjornsdóttir, Á.E., Ármannsson, H., Óskarsson, F., Ólafsson, M., 2015. A conceptual hydrological model of the thermal areas within the northern neovolcanic zone, Iceland using stable water isotopes. Proc. World Geotherm. Congr. 19–25.
Thorarinsson, S., 1979. The postglacial history of the Mývatn area. Nord. Soc. Oikos 32, 16–28.
Troch, J., Ellis, B.S., Harris, C., Ulmer, P., Bouvier, A.-S., Bachmann, O., 2019. Experimental melting of hydrothermally altered rocks: constraints for the generation of low-δ18O rhyolites in the central Snake River Plain. J. Petrol. 60, 1881–1902. https://doi.org/10.1093/petrology/egz056
Troch, J., Ellis, B.S., Schmitt, A.K., Bouvier, A.S., Bachmann, O., 2018. The dark side of zircon: textural, age, oxygen isotopic and trace element evidence of fluid saturation in the subvolcanic reservoir of the Island Park-Mount Jackson Rhyolite, Yellowstone (USA). Contrib. to Mineral. Petrol. 173, 1–17. https://doi.org/10.1007/s00410-018-1481-2
Tuffen, H., Castro, J.M., 2009. The emplacement of an obsidian dyke through thin ice: Hrafntinnuhryggur, Krafla Iceland. J. Volcanol. Geotherm. Res. 185, 352–366. https://doi.org/10.1016/j.jvolgeores.2008.10.021
Valley, J.W., Chiarenzelli, J.R., McLelland, J.M., 1994. Oxygen isotope geochemistry of zircon. Earth Planet. Sci. Lett. 126, 187–206.
Watson, E.B., Harrison, T.M., 1983. Zircon saturation revisited: temperature and composition effects in a variety of crustal magma types. Earth Planet. Sci. Lett. 64, 295–304.
Weisenberger, T.B., Axelsson, G., Arnaldsson, A., Blischke, A., Óskarsson, F., Ármannsson, H., Blanck, H., Helgadóttir, H.M., Berthet, J.C., Árnason, K., Ágústsson, K., Gylfadóttir, S.S., Guðmundsdóttir, V., 2015. Revision of the Conceptual Model of the Krafla Geothermal System. Landsvirkjun.
Winpenny, B., Maclennan, A.J., 2014. Short length scale oxygen isotope heterogeneity in the Icelandic mantle: Evidence from plagioclase compositional zones. J. Petrol. 55, 2537–2566. https://doi.org/10.1093/petrology/egu066
Wolf, B., Wyllie, J., 1995. Liquid segregation parameters from amphibolite dehydration melting experiments. J. Geophys. Res. 100, 15611–15621.
Zakharov, D.O., Bindeman, I., Tanaka, R., Friðleifsson, G., Reed, M., Hampton, R., 2019. Triple oxygen isotope systematics as a tracer of fluids in the crust: A study from modern geothermal systems of Iceland. Chem. Geol. 530, 1–13. https://doi.org/10.1016/j.jenvman.2020.110644
Zierenberg, R.A., Schiffman, P., Barfod, G.H., Lesher, C.E., Marks, N.E., Lowenstern, J.B., Mortensen, A.K., Pope, E.C., Bird, D.K., Reed, M.H., Fridleifsson, G.Ó., Elders, W.A., 2013. Composition and origin of rhyolite melt intersected by drilling in the Krafla geothermal field, Iceland. Contrib. to Mineral. Petrol. 165. https://doi.org/10.1007/s00410-012-0811-z

III. CRYPTIC WHOLE-ROCK GEOCHEMICAL PATTERNS IN THE STRATIGRAPHY OF THE COLUMBIA RIVER FLOOD BASALTS REVEALED BY MACHINE LEARNING

1 Intro

Flood basalts are an exceptional endmember of mafic volcanism in both their volume and duration (Black et al., 2021; Bond & Wignall, 2014; Bryan & Ferrari, 2013; Ernst, 2014; Wignall, 2001). Without direct observations of flood basalt eruptions in action, however, the detailed progression and mechanics of flood basalt volcanism remain poorly constrained.
Along with physical mapping and paleomagnetic characterization, geochemical data have long been a basis for understanding the stratigraphy and interpreting the magmatic processes driving these Large Igneous Provinces (LIPs) and their associated flood basalt eruptions (Ernst, 2014). Whole rock geochemistry, isotopic data, and mineralogic data provide insights into the structure of crustal magma transport and the chemical evolution of magmas from mantle melting to surface eruption, if their petrologic signatures can be established (Bryan & Ferrari, 2013; Ernst, 2014). However, geochemical variations in LIPs, particularly with respect to crustal magma transport processes, are notoriously challenging to identify, because flood basalts are generally chemically homogeneous and often crystal poor (Marsh, 1987). This presents a significant challenge to interpreting the variation and implied petrologic evolution of flood basalt eruptions through time. The small variations that can be observed have been widely leveraged to establish eruptive stratigraphy for LIPs globally (Pearce et al., 2021). Large geochemical datasets collected to understand LIPs (Ernst, 2014), along with datasets available for basaltic systems globally (e.g., Ueki et al., 2018), represent an opportunity to test modern tools for pattern recognition and classification of high-dimensional data in a petrologic setting. In this work we build a machine learning (ML) framework that uses both supervised and unsupervised techniques to integrate and classify extrusive lava flow data from the Columbia River Flood Basalts (CRB), with the ultimate goal of building a model that can (1) assess the accuracy of established stratigraphy and classify unknown samples within this stratigraphy, and (2) identify geochemical patterns independent of eruptive order that fingerprint the underlying petrologic processes associated with magma ascent.
To accomplish this, we develop a workflow for manipulating the feature space associated with major and trace element compositions to identify characteristic patterns of variation and covariation between elements that lead to robust classification (in the case of supervised learning) or dimensionality reduction and clustering (in the case of unsupervised learning). Although the goals of our machine learning exercise are quite general, we focus on the CRB, the youngest, best-preserved, and most comprehensively characterized flood basalt province globally, to demonstrate the utility of these methods (Camp, 2013; Camp, Ross, et al., 2017; Darold & Humphreys, 2013; Hooper, 2000; Kasbohm & Schoene, 2018; Moore et al., 2018, 2020; Reidel et al., 2013; Reidel & Tolan, 1989; R. Wells et al., 2009; Wolff et al., 2008; Wolff & Ramos, 2013) (Fig.1). Erupted lavas have been classified into 6 formations and 47 members on the basis of geochemistry, mapped location, and magnetic polarity (Barry et al., 2013; Cahoon et al., 2020; Camp et al., 2013; Conrey et al., 2013; Hooper, 2000; Kasbohm & Schoene, 2018; Mcdougall, 1976; Moore et al., 2018; Reidel et al., 2013; Reidel, 1982, 2015; Wright et al., 1973) (Fig.2). The progression of eruptive volumes and tempo defines a rapid waxing, a voluminous “main phase” defined by massive groups of flows (up to 40,000 km³), and prolonged waning characteristic of the LIP life cycle globally (Black et al., 2021; Hooper, 2000; Reidel et al., 2013) (Fig.2). We have assembled a large database of published and unpublished CRB geochemistry, amounting to 9,446 samples of CRB lavas from 25 sources, to train our model. For decades researchers have collected and characterized samples from CRB lavas across the Pacific Northwest, amassing a published and unpublished collection of thousands of samples. However, this large body of work had not yet been compiled into a single dataset with samples from across the stratigraphy.
Compiling these data therefore became a central part of the research in this study. Through text digitization, personal communication, and published samples, we created a database from over 25 different sources that contains whole rock major and trace element chemical information from thousands of samples collected across the CRB stratigraphy. Additional information about stratigraphic characterization and polarity has also been added to this database where available. After applying the machine learning analysis to this compiled database, we demonstrate 99% accuracy for supervised classification of unknown major and trace element analyses at the formation level and ~90% accuracy at the member level, with quantified classification uncertainty. Supervised learning approaches thus consistently identify geochemical variations between formations and members and generally confirm the robustness and utility of member-level stratigraphy in the CRB (Fig.2), although we also identify and quantify variable stratigraphic ambiguity.

Figure 1 The extent of Columbia River Basalt (CRB) lava flow coverage on the surface (dashed red line) and documented dike segments exposed throughout the High Lava Plains (area of interest in Chapter 3).

Categorizing unknown data using supervised classification techniques assumes a known CRB stratigraphy. However, as modern eruptions demonstrate (Lerner et al., 2021; Neal et al., 2019), the petrologic evolution from mantle melting through crustal magma transport and eruption need not conform to eruptive sequence. Unsupervised learning methods applied to CRB lavas demonstrate that, while formation-level differences are readily distinguishable, members are blurred when stratigraphic position is not assumed.
In particular, unsupervised clustering indicates a far smaller number of categories (perhaps as few as 3 clusters) within the main-phase Grande Ronde Basalt Formation than the 24 groupings identified in the current eruptive member stratigraphy (Fig.2).

Figure 2 From left to right: CRB stratigraphic formation (left column, younging upward), magnetic polarity, member classification, estimated volume of each member, the number of samples in the database, and the number of sources from which the data are drawn.

These clusters likely reflect variable Recharge-Assimilation-Fractional Crystallization (RAFC) processes, and we conclude our study by interpreting the unsupervised clusters in terms of common petrologic pathways. We focus primarily on the Grande Ronde Formation and suggest that the data are well characterized by three primary signals: the formation of clinopyroxene crystals via fractional crystallization, the assimilation of partially melted crust most likely derived from the Idaho Batholith, and recharge of primitive Imnaha-like compositions. Petrologically, these signals provide insight into the processes that give rise to distinct stratigraphic variation in the Grande Ronde, the relative timing of these processes, and their relationship to erupted volumes through time (Fig.2). Unsupervised learning points to common processes underlying temporally separated members of the Grande Ronde Basalt, with time-varying contributions of assimilation versus recharge or crystallization reflected in major and trace element variation. These inferred phases appear to reflect episodes of large-volume eruption followed by episodes of recharge of more primitive magma. Episodes of assimilation and recharge, which could reflect periods of enhanced or expanded crustal storage versus vertical transport, imply an interconnected system that evolved significantly through the main phase of the CRB.
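The idea of asking the data how many groups it supports, rather than assuming the member stratigraphy, can be illustrated with a minimal sketch. It assumes z-scored compositions, k-means clustering, and the mean silhouette width as the criterion for choosing the number of clusters; the text does not specify which clustering algorithm or selection criterion was used here, so these are stand-ins. The functions (`standardize`, `kmeans`, `silhouette`, `choose_k`) are hypothetical helpers, and the three compositional groups are synthetic numbers, not CRB analyses.

```python
import math
import random


def standardize(rows):
    """Z-score each column so oxides (wt.%) and trace elements (ppm) share one scale."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]


def kmeans(points, k, rng, n_iter=100):
    """Plain k-means with k-means++ seeding; returns (labels, inertia)."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # favour new seeds that lie far from the centroids chosen so far
        weights = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(list(rng.choices(points, weights=weights)[0]))
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        new = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # keep the previous centroid if a cluster empties out
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[i])
        if new == centroids:
            break
        centroids = new
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette width: close to 1 for compact, well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # singleton clusters score 0 by convention
            continue
        # the zero self-distance is in the sum, hence the len(own) - 1 divisor
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


def choose_k(rows, k_values=range(2, 6), n_init=10, seed=0):
    """Return the cluster count with the highest mean silhouette, plus all scores."""
    rng = random.Random(seed)
    points = standardize(rows)
    scores = {}
    for k in k_values:
        # several random restarts; keep the lowest-inertia partition
        labels, _ = min((kmeans(points, k, rng) for _ in range(n_init)), key=lambda r: r[1])
        scores[k] = silhouette(points, labels)
    return max(scores, key=scores.get), scores


# Synthetic, hypothetical data: three tight groups in (MgO wt.%, TiO2 wt.%, Zr ppm).
# These values are invented for illustration and are NOT CRB analyses.
rng = random.Random(42)
centers = [(4.0, 1.8, 150.0), (5.5, 2.2, 180.0), (3.0, 3.0, 200.0)]
spread = (0.1, 0.05, 3.0)
samples = [[rng.gauss(c, s) for c, s in zip(center, spread)]
           for center in centers for _ in range(30)]

best, scores = choose_k(samples)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

On these deliberately well-separated synthetic groups the silhouette criterion recovers three clusters. Real flood basalt compositions overlap far more, so in practice the silhouette curve is flatter and the choice of cluster number carries real uncertainty.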
Synthesis of large, multidimensional petrologic data with machine learning allows us to better understand signatures of geochemical source and path effects in CRB magmas, and thus helps establish a foundation for future physics-based models of the CRB eruption sequence.

1.1 The Columbia River Basalts

Though the CRB is the smallest of the large igneous provinces known on Earth, it is still estimated to have erupted around 210,000 km³ of basalt onto eastern Oregon, Washington, and parts of Nevada and Idaho, reaching all the way from the inland Columbia Basin to the ocean, hundreds of kilometers away (Reidel et al., 2013; R. Wells et al., 2009). Though previously believed to have erupted over the course of several million years, new estimates from ID-TIMS (Isotope Dilution Thermal Ionization Mass Spectrometry) dating of interbedded silicic material now show that 95% of the volume of material (known as the “main phase”) was erupted between 16.7 and 15.9 Ma (Kasbohm & Schoene, 2018). This rapid, large-volume eruption of basalt has been linked to an episode of climatic forcing that affected the entire globe (Kasbohm & Schoene, 2018). Both the recent timing and the location of these basalts have allowed for excellent exposure of the stratigraphy across the Pacific Northwest, making this the best-studied and most extensively sampled LIP in the world (Reidel, 2015; Reidel et al., 2013). This wealth of data in many forms makes the CRB an excellent target for machine learning data analysis and classification. Research to classify the stratigraphy of the CRB flows began in the early 1960s (Gibson, 1969; Snavely, 1962; Trimble, 1963; Waters, 1961). During this decade, flows were classified based on field and petrographic evidence. This entailed constraining the extent of flows and matching units based on similar petrographic characteristics (i.e. the presence or absence of olivine and plagioclase) (Waters, 1961).
Using these methods, the stratigraphy was divided into two large groupings and further subdivided into the Yakima Basalt group, but that was the extent of stratigraphic correlation (Mcdougall, 1976; Reidel & Tolan, 1989; Swanson et al., 1989; Wright et al., 1973). Based on field observations alone, it was difficult to correlate flows over the vast distances they cover, and also difficult to distinguish units with very similar petrographic characteristics (i.e. aphyric units). The advent of XRF (X-Ray Fluorescence) technology changed that. This technology allowed samples to be analyzed for their chemical composition with relative ease and at low cost (e.g. Hooper, 2000). Once CRB flows began to be analyzed by this method, patterns of similarity were recognized among groupings of units, and flows could suddenly be recognized and correlated over vast spatial distances (Hooper, 2000). This geochemical analysis was the basis for further subdividing the stratigraphy (Hooper, 2000). However, not every flow could be readily distinguished with geochemistry alone, so around the same time as the rise of XRF, researchers began to collect paleomagnetic data (Audunsson & Levi, 1997; Camp et al., 2013; Dominguez & Van der Voo, 2014; Jarboe et al., 2010; Wilson & Watkins, 1967). These data added a binary feature (paleomagnetic polarity is either Normal or Reversed) related to emplacement time, and therefore allowed the geochemical analyses to be placed in a time frame (Dominguez & Van der Voo, 2014). Though the paleomagnetic data did not match the Ar/Ar ages (Barry et al., 2013), they did allow the geochemical data to be sorted into a stratigraphy that included a time progression, especially once paired with updated geochronology data (Kasbohm & Schoene, 2018; Reidel et al., 2013).
This work to refine the stratigraphic boundaries continues today. Researchers use paleomagnetic, geochemical, and field evidence to distinguish the various groups of lavas in the stratigraphy (e.g., Davis et al., 2017; Webb et al., 2019; Cahoon, 2020; Cahoon et al., 2020; Wells et al., 2020). This classification work has divided the stratigraphy into seven main formations, grouped by geochemical affinity, paleomagnetic data, radiometric dating, and field observations. Six of these formations, from oldest to youngest, are the Steens Basalt, Imnaha Basalt, Grande Ronde Basalt, Prineville Basalt, Wanapum Basalt, and Saddle Mountains Basalt (Kasbohm & Schoene, 2018; Moore et al., 2018; Reidel et al., 2013). The seventh formation, the Picture Gorge Basalt, is known to have erupted around the same time as the Grande Ronde, though new geochronology suggests a complicated emplacement history (Cahoon et al., 2020). The features that most clearly separate the units are:
1) the presence or absence of plagioclase feldspar;
2) major and trace element discrimination (Reidel et al., 2002);
3) paleomagnetic polarity (Audunsson & Levi, 1997; Dominguez & Van der Voo, 2014; Wilson & Watkins, 1967);
4) in some cases, isotopic signature. This can be a helpful designation (e.g., between Steens and Imnaha) and has also been used to inform hypotheses about source and assimilation (Camp et al., 2013; Mcdougall, 1976; Moore et al., 2018; Takahashi et al., 1998; Wolff et al., 2008).
Major and trace element discrimination has been the primary way to distinguish lavas. The formations differ in both chemistry and timing. Each formation has subsequently been divided into members and individual flows (Hooper, 2000; Reidel et al., 2013; Reidel & Tolan, 1989), although many of these are not formal subdivisions.
This stratigraphic sequence helps to define chemical evolution and eruption rates, using volumes calculated from areal extent and thickness estimates (Reidel et al., 2013). It also provides a basis for understanding magma transport between mantle source and surface through time. A primary tool for understanding petrologic processes is visual inspection of geochemical biplots. In these plots, pairs of elements or element ratios are compared and assessed in terms of end-member models (White, 2013). However, phenocryst populations are absent and bulk rock compositions are broadly homogeneous. Under these conditions, bivariate classification of geochemical data alone can struggle to constrain flow stratigraphy. Discriminant analysis of Grande Ronde member-level chemistry on a much smaller data subset achieved only 66% reproducibility (Reidel, 1982). That analysis used scores based on simple linear combinations of the geochemical predictor variables. Reidel (2002) identified four elements most useful for bivariate classification (TiO2, MgO, P2O5, and Zr) and suggested that the trace elements Ba and Cr are also useful. This work on geochemical distinctions in bivariate space provides a foundation for applying machine learning to the CRB. There is a large existing database of samples. There are also open questions about the robustness of the stratigraphy and a clear need for a classifier tool for future sample collection (e.g., Steiner & Wolff, 2020). Finally, an objective framework is needed for posing process-oriented questions about the eruptive tempo of the CRB. Such questions concern its relation to crustal magma transport, which is imprinted on erupted compositions (Fig. 2).
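Discriminant scores built from linear combinations of a few elements can be sketched with linear discriminant analysis in scikit-learn. The example below is a minimal sketch, not a reconstruction of Reidel's (1982) analysis. It uses synthetic data for the four elements named above, and the member names, mean compositions, and scatter values are hypothetical placeholders rather than CRB measurements.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical member-level means for TiO2, MgO, P2O5 (wt%) and Zr (ppm).
# These are illustrative placeholders, not measured CRB compositions.
member_means = {
    "member_A": [2.0, 4.5, 0.35, 170.0],
    "member_B": [2.2, 4.0, 0.40, 185.0],
    "member_C": [1.8, 5.0, 0.30, 160.0],
}
scatter = np.array([0.15, 0.4, 0.04, 12.0])  # assumed 1-sigma scatter per feature

# 100 synthetic samples per hypothetical member, stacked in dictionary order
X = np.vstack([rng.normal(mean, scatter, size=(100, 4)) for mean in member_means.values()])
y = np.repeat(list(member_means), 100)

# LDA forms discriminant scores as linear combinations of the four predictors
lda = LinearDiscriminantAnalysis()
accuracy = cross_val_score(lda, X, y, cv=5).mean()
scores = lda.fit(X, y).transform(X)  # at most (n_classes - 1) = 2 score axes

print(f"5-fold accuracy: {accuracy:.2f}; score array shape: {scores.shape}")
```

Cross-validated accuracy plays the role of "reproducibility" here. The real figure for CRB members depends entirely on how much their compositions overlap.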
The debate over the magmatic processes recorded in the CRB stratigraphy has generated a large body of research. No formation has attracted more interest and debate than the massive and highly evolved Grande Ronde Basalt Formation and the large (40,000 km³) Wapshilla Ridge Member that erupted at its peak. This formation erupted around 16.5 Ma (Kasbohm & Schoene, 2018) over ~400-500 kyr (Davis et al., 2017). It accounts for ~72% (150,000 km³) of the erupted volume of the entire CRB (Reidel et al., 2013). Its chemical and textural homogeneity make it challenging to characterize and to assign a clear petrologic origin (Hales et al., 2005; Reidel, 1982; Reidel & Tolan, 1989; Takahashi et al., 1998; Wolff et al., 2008; Wolff & Ramos, 2013). Main phase eruptions from other large igneous provinces are primitive basalts. The Grande Ronde Formation, by contrast, is a more evolved basaltic andesite (Hooper, 2000; Reidel, 1982; Reidel & Tolan, 1989). Most of its flows also lack significant crystal cargo, and it shows remarkable chemical homogeneity over a massive volume of lava (Hooper, 2000; Reidel, 1982; Reidel & Tolan, 1989). Here we focus on detailed analysis of the Grande Ronde Formation in the context of the broader CRB. We use the machine learning toolbox to evaluate crustal magmatic processes and their relation to stratigraphic position and erupted volume.

2 Methods

To characterize the geochemical data from the CRB system, we apply a variety of techniques. Together they provide a full quantitative assessment of the variation and groupings in the data and an interpretation of the processes that may have caused them. We begin by describing the dataset we compiled and the many sources it draws on (Section 2.1).
Before interpreting those data, however, we analyze a synthetic dataset (Section 2.3.1) and create a regime diagram (Fig. 3). These let us compare the structure of different datasets and of different features within individual datasets. Sections 2.3.2 and 2.3.3 then cover data preparation. Section 2.3.2 characterizes the structure of the distributions that define the dataset, and Section 2.3.3 covers preprocessing and transforming the dataset for classification and clustering. Sections 2.3.4 and 2.3.5 describe our procedures for supervised (Multinomial Logistic Regression) and unsupervised (Gaussian Mixture Model) machine learning on the data. Figures 4 and 6 summarize both procedures as workflows, and the text gives further detail. These seemingly disparate methods are seldom used together. Here we combine them to help interpret the many attributes that define the multidimensional space of this newly compiled dataset.

2.1 Columbia River Flood Basalts (CRB) Dataset

In this section we describe the full process of compiling a database of CRB lavas. The database of CRB whole rock compositions compiled for this study draws on more than 25 research sources spanning 30 years (Fig. 2). The samples in this database have been placed into a stratigraphy based primarily on whole rock chemistry. Additional location and polarity information helps define large-scale (formation-level) and smaller-scale (member-level) classes (Hooper, 2000; Reidel et al., 2013). This series of stratigraphic categories implies an eruptive sequence. Combined with geochronology, it helps constrain the timing of eruption (Kasbohm & Schoene, 2018). We seek to create a model that recognizes the chemistry of each formation- and member-level category, so that unknown samples can be classified by chemistry and polarity.
The database includes samples with at least eight trace elements and the full suite of 10 major elements. We include only data analyzed after 1985, to minimize bias introduced by older equipment. Data without member-level labels are still included and are assigned only a formation ID.

Multi-Member Datasets

Several existing published datasets are the starting point for any CRB geochemical investigation. Peter Hooper published a seminal dataset that established formation- and member-level identification for most of the CRB lavas (Hooper, 2000). Hooper used the distribution of each element within the collected samples to visually distinguish groupings in bivariate space (Hooper, 2000). This classification separated out many of the members within each formation. Hooper's students and coworkers then further updated the stratigraphy. They separated out additional members and even flows within members, and linked them to polarity (e.g., Wright et al., 1973; Reidel, 1982, 2015; Camp et al., 2013; Conrey et al., 2013; Reidel et al., 2013), although some of these subdivisions are still debated. In addition to these published datasets, we incorporated unpublished data from Dr. Steve Reidel and from the Washington State Geoanalytical Lab, led by John Wolff, where thousands of CRB samples have been analyzed over decades. The lab provided us with 3,600 samples, each with full major and trace element, polarity, and stratigraphic information. Personal communication with Dr. Reidel ensured that we had full stratigraphic coverage of lava samples.
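The inclusion rules above can be expressed as a pandas filter. The sketch below assumes a hypothetical table layout, which is not taken from the study: one row per sample, oxide and element columns, and `analysis_year`, `formation`, and `member` columns. The ten major oxides and the trace-element pool listed are likewise assumptions.

```python
import pandas as pd

# Assumed column names: ten major-element oxides and a pool of candidate trace elements
MAJORS = ["SiO2", "TiO2", "Al2O3", "FeOT", "MnO", "MgO", "CaO", "Na2O", "K2O", "P2O5"]
TRACES = ["Ni", "Cr", "Sc", "V", "Ba", "Rb", "Sr", "Zr", "Y", "Nb", "Cu", "Zn"]

def filter_samples(df, min_traces=8, after_year=1985):
    """Keep samples with all majors, >= min_traces trace elements, analyzed after after_year."""
    has_all_majors = df[MAJORS].notna().all(axis=1)
    trace_cols = [c for c in TRACES if c in df.columns]  # tolerate absent trace columns
    enough_traces = df[trace_cols].notna().sum(axis=1) >= min_traces
    recent = df["analysis_year"] > after_year
    kept = df[has_all_majors & enough_traces & recent].copy()
    # Samples without a member label keep their formation ID; member is marked unassigned
    kept["member"] = kept["member"].fillna("unassigned")
    return kept

# Toy records with placeholder values
majors = dict.fromkeys(MAJORS, 1.0)
eight_traces = dict.fromkeys(TRACES[:8], 100.0)
df = pd.DataFrame([
    {**majors, **eight_traces, "analysis_year": 1995, "formation": "Grande Ronde", "member": "Sentinel Bluffs"},
    {**majors, **eight_traces, "analysis_year": 1979, "formation": "Grande Ronde", "member": None},  # too old
    {**majors, **dict.fromkeys(TRACES[:5], 100.0), "analysis_year": 2010, "formation": "Wanapum", "member": None},  # too few traces
    {**majors, **eight_traces, "analysis_year": 2005, "formation": "Imnaha", "member": None},
])
kept = filter_samples(df)
print(kept[["formation", "member"]])
```

We read "after 1985" as strictly later than 1985. A real compilation would also need to harmonize units and oxide conventions (e.g., total iron) across sources before such a filter is meaningful.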
We also added data from USGS and OWRD (Oregon Water Resources Department) groundwater well sites in the region. These data are available, respectively, at the USGS website for the Columbia River Basalt Stratigraphy in the Pacific Northwest (https://or.water.usgs.gov/projs_dir/crbg/index.html) and at the Oregon Water Resources Department's GRID website (http://apps2.wrd.state.or.us/apps/gw/well_log/Default.aspx). The USGS led a project to drill a series of groundwater wells in the Pacific Northwest. As the holes were drilled, cuttings and core allowed researchers to collect geochemical information from the lavas that made up the well stratigraphies (see https://or.water.usgs.gov/projs_dir/crbg/sources.html for details on data sourcing for this groundwater study). Dr. Marvin Beeson and Dr. Terry Tolan interpreted and classified much of these data, using geochemical indicators to place the well samples in the stratigraphy. This was one of the pioneering efforts to use geochemical data to assign samples to the stratigraphic nomenclature. This database of hundreds of wells from across the Pacific Northwest contains whole rock major and select trace element data, with stratigraphic placement, for over 500 CRB lava samples. We extracted the data for each well from the USGS groundwater site database, compiled them into a single dataset, and added them to the overall CRB lava dataset. The USGS also used these data, in collaboration with others, to propose an official stratigraphic nomenclature for the CRB system (available at: https://or.water.usgs.gov/projs_dir/crbg/nomenclature.png). Combined, these multi-member datasets provided a foundation on which to train a model. However, some members have been studied in greater detail because of their exposure or community interest. These members have more associated data, which also needed to be added to the database.
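Merging many per-source tables into one dataset is mostly bookkeeping. The minimal sketch below stacks hypothetical source tables with pandas and tags every row with its provenance, so that later results can be traced back to a data source. The source names, columns, and rows are illustrative assumptions, not the actual USGS or laboratory file layouts.

```python
import pandas as pd

def combine_sources(sources):
    """Stack per-source tables into one, tagging each row with its provenance."""
    tagged = [frame.assign(source=name) for name, frame in sources.items()]
    # sort=False keeps column order; columns missing from a source become NaN
    return pd.concat(tagged, ignore_index=True, sort=False)

# Hypothetical per-source tables (placeholder rows)
wells = pd.DataFrame({"sample_id": ["W1", "W2"], "formation": ["Wanapum", "Grande Ronde"]})
lab = pd.DataFrame({"sample_id": ["L1"], "formation": ["Saddle Mountains"],
                    "member": ["Elephant Mountain"]})

combined = combine_sources({"usgs_wells": wells, "geoanalytical_lab": lab})
print(combined["source"].value_counts())
```

Keeping a `source` column costs little. It also makes it possible to check later whether a cluster or misclassification is tied to a particular laboratory or dataset rather than to geology.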
We detail these individual formation- or member-level datasets next.

Early CRB - Steens Basalt

To capture the range in composition of the Steens Basalt, we drew data from several sources. The larger datasets used to set up the stratigraphy contained small amounts of undifferentiated Steens data (Hooper, 2000; Reidel et al., 2013; Wolff et al., 2008), but Steens Formation data were otherwise minimal. Most data for the Steens Basalt come from the work of Dr. Nikki Moore, who characterized the chemical composition of the exposed lava flows in detail (Moore et al., 2018, 2020). In that work she defined stratigraphic boundaries within the Steens Formation, distinguishing upper and lower Steens by their distinct chemical compositions (Moore et al., 2018, 2020). We used these chemical identifiers to divide the Steens data into upper and lower members. Other studies of these spectacular exposures contributed further data to the Steens compilation. Camp et al. (2013) measured and sampled several stratigraphic sections around Steens Mountain to characterize the flows and surrounding geology. Some of these sections included CRB lava samples with stratigraphic boundaries defined by geochemical distinctions (Camp et al., 2013). We digitized the basaltic data from this publication and added them to the larger dataset (Camp et al., 2013).

Picture Gorge

The Picture Gorge Formation has long been an enigmatic part of the CRB system. It lies away from the other CRB flows and has distinctive characteristics such as the presence of olivine. Even so, it remains part of the established CRB stratigraphy and is therefore an important part of our training model. Dr. John Wolff and the Washington State Geoanalytical Lab provided some Picture Gorge data. Most of our data from this formation, however, come from the work of Dr. Emily Cahoon (Cahoon, 2020; Cahoon et al., 2020). Dr. Cahoon characterized the geochemistry of this formation in her PhD work and followed up with a publication on the same topic (Cahoon, 2020; Cahoon et al., 2020). The Picture Gorge data collected during these studies have been added to the dataset.

Main Phase – Imnaha, Grande Ronde, and Wanapum Basalts

The main phase lavas of the CRB include the Imnaha, Grande Ronde, and Wanapum Formations. Their impressive volume, short duration (~750 kyr), and exposure have made them a target for study at the individual member or flow level (Black et al., 2021; Davis et al., 2017; Kasbohm & Schoene, 2018; Reidel et al., 2013). The multi-member datasets provide most of our information on members within the main phase of the CRB. However, several small-scale datasets supplemented our geochemical picture of the main phase stratigraphy. For example, Conrey et al. (2013) detailed the geochemistry of three large-volume flows of the main phase CRB. This work characterized the Meyer Ridge Member and the Sentinel Bluffs Member, both from the latter half of the Grande Ronde Formation stratigraphy (Conrey et al., 2013). It also included sampling and analyses of the Priest Rapids Member of the Wanapum Formation (Conrey et al., 2013). The analyses from this study are included in the compiled database. We also added a small sample batch from Drs. Cruz and Streck, who collected samples of the Birch Creek and Hunter Creek members of the CRB in the Littlefield area (Cruz & Streck, 2022; Webb et al., 2019). These basalts were collected during a study of the Dinner Creek Tuff, which lies within the lava stratigraphy in the area (Cruz & Streck, 2022; Webb et al., 2019).
The Sentinel Bluffs Member is one of the most controversial members of the CRB, in terms of both its geochemical variation and its stratigraphic assignment. More than 10,000 km³ of lava is assigned to this member, making it the third-largest member of the Grande Ronde Formation (Reidel, 2005; Reidel et al., 2013; Reidel & Tolan, 1989). However, the variation within the five or six flows identified in the member has caused considerable debate within the community (Reidel, 1982, 2005; Reidel & Tolan, 1989; Sawlan, 2017, 2019; Baker et al., 2019). Sawlan (2017, 2019) used geochemical variation in Mg and other elements to identify a signature of alteration within this member. This suggests that the lavas preserve information not about their source and magmatic evolution, but about post-emplacement processes that can affect geochemistry. The claim is controversial, given the reported lack of supporting evidence in previous thin section and petrologic studies (Baker et al., 2019). The debate is unsettled, but it has increased sampling of this member. As a result, we were able to accumulate 766 samples of this member alone, giving ample coverage of the flows and stratigraphy of this unit. Combining these individual studies with the large multi-member datasets gave us the necessary geochemical coverage of the main phase stratigraphy east of the Cascades. However, several main phase flows were not confined to the plains. Instead, they flowed all the way to the ocean, several hundred kilometers away (Reidel et al., 2013; R. Wells et al., 2009). Considerable work west of the Cascade Mountains, through the Columbia River Gorge, has also increased data coverage for the main phase members (R. Wells et al., 2009). Dr. Ray Wells of the USGS has been an invaluable source of information on geochemical sampling in the area. We used data from the Wells et al. (2009) field trip guide, which compiled data on the CRB group along a route from the Cascade Arc to the Pacific Ocean (R. Wells et al., 2009). Mapping collections supplied additional data. Wells and colleagues published 308 samples of main phase CRB collected during mapping of the Greater Portland Metropolitan Area and surrounding region (R. E. Wells et al., 2020). Another 60 samples came from three quadrangle maps compiled primarily by Russ Evarts:
- the Geologic Map of the Washougal Quadrangle, Clark County, Washington, and Multnomah County, Oregon (R. C. Evarts et al., 2013);
- the Geologic Map of the Woodland Quadrangle, Clark and Cowlitz Counties, Washington (R. C. Evarts, 2004);
- the Geologic Map of the Saint Helens Quadrangle, Columbia County, Oregon, and Clark and Cowlitz Counties, Washington (B. R. C. Evarts, 2004).
The samples from the Gorge area provide an important constraint on the maximum geochemical variation possible in these far-reaching flows. We also received an as-yet unpublished dataset of Gorge samples, which helps constrain the full range of variation in the flows of the Columbia River Gorge. These Gorge samples are in addition to the USGS and OWRD data, which include hundreds of well sites in the Portland and Willamette Basins.
Waning Phase – Saddle Mountains

The Saddle Mountains Formation represents the final, waning phase of the CRB eruption (Kasbohm & Schoene, 2018; Reidel et al., 2013). Among the CRB formations, it spans the longest timeframe (more than 11 million years) and is the smallest in volume (~2,400 km³ total). Because it sits at the top of the stratigraphy, it is also the best exposed. It is the most isotopically distinct and the most geochemically variable of all the CRB formations (Barry et al., 2013; Kasbohm & Schoene, 2018; Reidel et al., 2013). Given these features, the Saddle Mountains Formation and its individual members have been well studied and characterized throughout the history of CRB research. The compiled dataset contains 1,450 Saddle Mountains geochemical samples. These come primarily from John Wolff and the Washington State Geoanalytical Lab, Peter Hooper (2000), and Steve Reidel (2013). Data from the OWRD and USGS well databases supplemented these large composite datasets. According to the labels assigned by the experts who studied and mapped these samples, 11 members are represented. Together, these sources gave us excellent coverage of the stratigraphy within the Saddle Mountains Formation.

2.2 Geochemical Data Analysis

Geochemical data analysis in general aims to:
1. Understand the variation in geochemical data and relate that variation to physical processes
2. Find groupings or patterns among samples in a geological context

Here we apply these aims to the CRB using machine learning tools. In this section we describe the quantitative assessment we carried out to characterize this dataset, from preprocessing to clustering and classification. Figures 4 and 6 present the full workflows for the unsupervised and supervised analyses. We have divided these methods by their utility for three tasks: analyzing variation through structural geochemical data analysis, finding groupings, and categorization.
To address the two goals outlined in Section 1, we develop a machine learning workflow that can assimilate the variety of data types previous workers have used to understand and classify variation within the CRB. We focus largely on geochemistry, augmented by paleomagnetic and volcanologic data and by petrologic modeling. The workflow detailed in the following sections has three parts: descriptive statistical analysis, training and testing a supervised Multinomial Logistic Regression model, and applying unsupervised Gaussian Mixture Model analysis to the CRB geochemical database. Machine learning is the science of data-driven prediction and classification through computation (Geron, 2017). Machine learning approaches are increasingly applied in fields ranging from healthcare to advertising to science. A well-known example is image recognition, where an algorithm learns to recognize patterns within images (Fradkov, 2020; Lee & Seung, 2000). Applications of ML to large datasets generally fall into two categories, supervised and unsupervised learning (Geron, 2017). Here we focus entirely on ML for classification, using both supervised and unsupervised methods.

One of the central challenges in applying machine learning to geochemistry is data density (Petrelli & Perugini, 2016). Geochemical data are labor-intensive to collect and are often analyzed in small batches for which machine learning or "big data" methods are unnecessary. When enough data are available, as for the CRB, these techniques can more efficiently identify patterns that may be linked to process and can form categories for classification. The CRB dataset is far smaller than what is normally considered big data, which involves millions of instances. It is nonetheless large for a geochemical dataset: it has almost 10,000 samples, whereas most geochemical datasets have fewer than 100.
With a dataset of this size, the machine learning methods proposed here improve the efficiency, efficacy, and quantitative rigor with which this complex compilation of CRB lava data can be analyzed. We focus on geochemical data. These generally include whole rock major and trace element data, isotopic analyses, and mineral phase assemblages, and often have 20-30 or more dimensions (each dimension represents a geochemical feature such as an element or isotope). The analysis of high-dimensional compositional data has been a problem in petrology for almost 100 years (e.g., Clarke, 1920). Automated techniques for this kind of analysis, however, have gained popularity only within the last two decades. Other fields that use geochemical data, such as hydrology and sedimentology, widely apply these tools to their large datasets (Chen et al., 2015; Meng et al., 2011; Perol et al., 2018; Shrestha et al., 2008). Among petrologists, however, their use has remained relatively limited (Petrelli & Perugini, 2016). This is partly due to the small sample size of many petrologic datasets. As datasets and databases grow, so does the need for automated methods that can handle larger, more complex datasets. In igneous petrology and geochemistry, samples are often collected with spatial reference information and other descriptive characteristics in mind. Data that already carry labels based on this geologic knowledge are suitable for supervised machine learning. Examples include stratigraphic labels, geologic unit labels, and age distinctions. In this type of analysis, a model is trained to recognize that specific data signals are related to specific categories (Geron, 2017). Here, we train on the whole rock major and trace element data that define each stratigraphic group within the CRB at the formation and member levels.
We first randomly split the dataset within each stratigraphic category into a training subset and a test subset. The training samples are fed into the algorithm with their known labels, teaching the model to recognize the chemistry of each category of interest. The model then predicts classifications for the remaining test samples. Comparing these predictions with each test sample's known label shows how well the model performs on unseen data. We repeat the random splitting (< 50 times), and each fit runs ~1200 iterations to find the best fit. The highest-accuracy model is then used to predict labels for samples without category identifications. This geochemical information is foundational, but it can reveal only chemical similarities. For the CRB, however, other geologic information exists to help discriminate between categories. In addition to geochemical data, we therefore provide a preliminary analysis that uses other data, such as paleomagnetic polarity, as features for supervised learning. This can improve the model when enough paleomagnetic or other geologic data are present.

We also want to understand the petrologic pathways reflected in observed lava compositions. This problem plays to the strengths of unsupervised learning, a set of algorithms that seek to separate a dataset into categories without prior labels. Unsupervised learning is often associated with dimensionality reduction or blind source separation (Carniel & Guzman, 2012; Geron, 2017; Petrelli & Perugini, 2016). Unsupervised clustering receives no prior information on categories. Groupings are formed from the differences between samples alone, and these groupings may cross stratigraphic boundaries.
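The supervised loop described above can be sketched in scikit-learn. The sketch makes several assumptions, none of them stated in the original. It uses synthetic data from `make_classification` in place of the CRB table. It appends a hypothetical random 0/1 polarity column to show how a binary feature is added; that column is uninformative by construction. It uses 20 stratified splits and maps the "~1200 iterations" onto `max_iter`. With its default lbfgs solver, `LogisticRegression` fits a multinomial (softmax) model for multiclass targets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the labeled table: 20 "element" features, 4 categories
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Hypothetical binary polarity feature (1 = Normal, 0 = Reversed), random here
rng = np.random.default_rng(0)
polarity = rng.integers(0, 2, size=(X.shape[0], 1))
X = np.hstack([X, polarity])

N_SPLITS = 20  # assumed number of repeated random splits
best_model, best_acc = None, -1.0
for seed in range(N_SPLITS):
    # stratify=y splits each category proportionally into train and test subsets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1200))
    model.fit(X_train, y_train)
    acc = model.score(X_test, y_test)  # accuracy on held-out samples
    if acc > best_acc:
        best_model, best_acc = model, acc

print(f"best held-out accuracy over {N_SPLITS} splits: {best_acc:.2f}")
unlabeled = X[:5]  # placeholder for samples lacking category IDs
print(best_model.predict(unlabeled))
```

Selecting the single best split tends to overstate expected accuracy. Reporting the mean and spread of the held-out scores alongside the best one gives a more honest picture of generalization.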
A general challenge with unsupervised learning is the interpretation and robustness of categories (Geron, 2017). We use standard approaches to define clusters reproducibly. We also exploit the petrologic framing of our problem, augmenting the ML with modeling of RAFC (recharge, assimilation, and fractional crystallization) processes in major and trace element concentrations. This modeling identifies probable drivers of the variation observed in the dataset. We can compare both outcomes with auxiliary information about the plumbing system, connecting variation directly to process. In the following section we detail the combined workflow of automated statistical analysis, supervised classification, and unsupervised clustering of geochemical data.

2.3 Workflow

The following section details our workflow, implemented in the free and open-source Python programming language. It begins with a generic analysis of the structure of the high-dimensional geochemical dataset, followed by supervised and unsupervised algorithms for classifying CRB lavas.

2.3.1 Synthetic Data for Model Testing

The "black box" of machine learning can introduce layers of uncertainty that quickly move away from simple interpretation. We therefore focus first on simple ML tools. We rigorously test them on synthetic datasets that mimic the natural datasets of interest (here, CRB major and trace element geochemistry). To create synthetic data, we use a random sample generator from the Python library scikit-learn (Pedregosa et al., 2011). It produces high-dimensional synthetic point datasets with user-specified characteristics (such as moments of the underlying distribution). We mimic geochemical data by creating isotropic ("blob-like") and anisotropic (trend-like) distributions of synthetic samples. Multiple dimensions (>10) were used to simulate the major and trace element compositional data. Geochemical datasets have grown substantially but are not "big data" in the modern sense.
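Isotropic and anisotropic synthetic clusters can be produced in a few lines. In this minimal sketch, `make_blobs` generates isotropic clusters in 12 dimensions. A random linear transform then turns them into trend-like, anisotropic clusters, a common scikit-learn idiom. The transform, the dimension count, and the within-cluster correlation summary are our illustrative assumptions, not necessarily the generator settings used in this study.

```python
import numpy as np
from sklearn.datasets import make_blobs

N_FEATURES = 12  # >10 dimensions, standing in for major + trace elements

# Isotropic ("blob-like") clusters: equal, uncorrelated spread in every dimension
X_iso, labels = make_blobs(n_samples=1000, n_features=N_FEATURES, centers=3,
                           cluster_std=1.0, random_state=42)

# Anisotropic (trend-like) clusters: a random linear map stretches and couples features
rng = np.random.default_rng(42)
X_aniso = X_iso @ rng.normal(size=(N_FEATURES, N_FEATURES))

def within_cluster_corr(X, labels, k=0):
    """Mean |off-diagonal correlation| among features inside cluster k."""
    r = np.corrcoef(X[labels == k], rowvar=False)
    return np.abs(r[~np.eye(r.shape[0], dtype=bool)]).mean()

print(f"isotropic:   {within_cluster_corr(X_iso, labels):.2f}")
print(f"anisotropic: {within_cluster_corr(X_aniso, labels):.2f}")
```

Inside an isotropic cluster, features are nearly uncorrelated. After the transform, features co-vary along trends, much like elements that rise and fall together during fractional crystallization.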
Many datasets that are "big" in the technical sense comprise millions or billions of instances. Examples include user databases for social networks such as Twitter and advertising databases of customer information such as Amazon's (Norinder & Norinder, 2022). Earth science also has many such datasets, in seismology, planetary geology, and remote sensing. These can similarly contain millions or billions of sample instances and associated variables to evaluate (Arrowsmith et al., 2022; Azari et al., 2021; Madhukar, 2019). The CRB lava dataset has almost 10,000 samples and multiple levels of stratigraphic categorization at the formation and member levels. It has more than 20 dimensions of elemental concentration and polarity, which we increase to ~200 dimensions through combinatorics. Together, these properties make it infeasible to analyze and categorize samples through manual bivariate inspection. Methods developed for big data analysis can therefore help to assess these geochemical data quantitatively.

Most geochemical datasets have tens to hundreds of data points, each of which requires exceptional effort to collect and analyze. A few geochemical data repositories have grown to include tens of thousands of samples (e.g., GEOROC or PetDB). These databases, however, require significant filtering before they can be used in a research context. We therefore created synthetic data with 1000 samples and 3-5 cluster categories to test the methods used herein on a dataset of similar size to the CRB lava dataset. This is not a "big" dataset in the sense of the examples above, which have millions of samples. Still, at this size, detailed analysis of geochemical samples becomes a problem that these objective methods can make considerably more efficient and effective.
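The text does not specify which combinations raise the ~20 dimensions to ~200. One plausible reading is sketched below: all pairwise element ratios. Twenty features yield 20 + C(20, 2) = 20 + 190 = 210 columns, consistent with "~200". The column names and values are placeholders, and the choice of simple ratios (rather than, e.g., log-ratios) is an assumption.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def add_pairwise_ratios(df):
    """Append a/b for every unordered pair of columns (a before b in column order)."""
    ratios = {f"{a}/{b}": df[a] / df[b] for a, b in combinations(df.columns, 2)}
    return pd.concat([df, pd.DataFrame(ratios, index=df.index)], axis=1)

rng = np.random.default_rng(1)
cols = [f"el{i}" for i in range(20)]  # 20 placeholder element columns
df = pd.DataFrame(rng.uniform(0.1, 10.0, size=(5, 20)), columns=cols)

expanded = add_pairwise_ratios(df)
print(expanded.shape)  # → (5, 210)
```

Ratios cancel dilution effects that scale all concentrations together, which is one reason element ratios are a staple of petrologic interpretation.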
A synthetic dataset lets us both ground-truth the ML methods and test quantitatively how changes in the structure of geochemical data affect their performance. This gives us insight into how to preprocess and manipulate the data most effectively for each ML algorithm. For this step we draw on foundational petrologic concepts, such as element ratios, to create datasets from which the simplest methods can extract the relevant information. We detail the results of these preprocessing steps in the following section.

2.3.2 Structural Characteristics

Finding patterns in multidimensional datasets begins with understanding the underlying statistical structure of the data. This structure determines which algorithms will be most effective and what outcomes to expect. Characterization of geochemical data can be divided into element-wise and category-wise statistical analysis. Element-wise analysis (or feature-wise analysis, for a general multidimensional dataset) shows how each element in the dataset behaves during clustering. Let f(x) be the probability density function of an element's concentration x in a dataset. We characterize the structure of f using its moments about a concentration value c, defined formally as

$$\mu_m = \int_{-\infty}^{\infty} (x - c)^m \, f(x) \, dx, \qquad (1)$$

with $\mu_m$ the mth moment (Peck et al., 2005). The first few moments of a distribution are often sufficient to distinguish its characteristics. The first moment $\mu_1$, taken about c = 0, is the mean concentration of the element (Peck et al., 2005). The higher moments are taken about the mean. We describe the second moment $\mu_2$ (the variance, or dispersion) using the coefficient of variation (Ospina & Marmolejo-Ramos, 2019). We use the third moment $\mu_3$ (normalized as the skewness) to assess the symmetry of the distribution. We use the fourth moment $\mu_4$ (normalized as the kurtosis) to assess the "peakedness" of the distribution (Peck et al., 2005).
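The element-wise statistics in Eq. (1) can be computed directly from a table of concentrations. Here σ is the standard deviation. The sketch uses the usual normalized forms: coefficient of variation σ/μ₁, skewness μ₃/σ³, and excess kurtosis μ₄/σ⁴ − 3, with μ₃ and μ₄ taken about the mean. The element columns and values are illustrative placeholders. Using SciPy's default (population) estimators is our assumption, not necessarily the study's choice.

```python
import numpy as np
import pandas as pd
from scipy import stats

def elementwise_moments(df):
    """Per-element mean, coefficient of variation, skewness and excess kurtosis."""
    values = df.to_numpy(dtype=float)
    mean = values.mean(axis=0)                       # mu_1 about c = 0
    cv = values.std(axis=0, ddof=1) / mean           # sqrt(mu_2) relative to the mean
    skewness = stats.skew(values, axis=0)            # mu_3 / sigma^3
    excess_kurt = stats.kurtosis(values, axis=0)     # mu_4 / sigma^4 - 3 (Fisher definition)
    return pd.DataFrame({"mean": mean, "cv": cv, "skewness": skewness,
                         "excess_kurtosis": excess_kurt}, index=df.columns)

df = pd.DataFrame({
    "MgO": [4.0, 4.2, 4.1, 3.9, 4.3],   # symmetric about 4.1 wt%
    "Zr": [150, 160, 155, 400, 152],    # one high outlier (ppm)
})
moments = elementwise_moments(df)
print(moments.round(3))
```

The symmetric MgO column has zero skewness, while the single high Zr value produces strong positive skew and a much larger coefficient of variation. This is the kind of contrast these statistics are meant to flag before clustering.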
In addition to statistical moments and descriptive statistics, we use the dip test and the correlation coefficient to characterize the data structure (Fig. 3). The dip test quantifies the unimodality of the data (Hartigan & Hartigan, 1985). It assigns each feature a probability (p-value) for the null hypothesis that its distribution is unimodal, computed by reference to a uniform distribution (Hartigan & Hartigan, 1985). We also use the correlation coefficient to describe whether the distribution tends towards anisotropy or isotropy (Peck et al., 2005). The overall shape of the data (isotropic or anisotropic) is critical for choosing an unsupervised algorithm. For example, the commonly used “k-means” algorithm is only effective for data that are mostly isotropic in their categories and overall structure (Pedregosa et al., 2011).

Figure 3 shows a generalized regime diagram for these three statistics and their impact on clusterability. To create this diagram, we generated 10,000 synthetic datasets with various data structures using the scikit-learn (sklearn) function make_classification (Pedregosa et al., 2011). Each dataset has 20 dimensions of scatter data, similar in shape and structure to geochemical data. We then ran a Gaussian Mixture Model (GMM) on each dataset with the known number of clusters (three). For each run we calculated the homogeneity, completeness, and V-measure (HCV) score, which measures how well the cluster assignments match the known labels (Pedregosa et al., 2011; Rosenberg & Hirschberg, 2007). Across all 10,000 analyses, plotting the average structural statistics of each dataset, colored by HCV score, reveals a pattern that strongly links the structure of the data to the outcome of clustering.
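The sketch below shows what one trial of this synthetic experiment might look like. It assumes scikit-learn's make_classification, GaussianMixture, and homogeneity_completeness_v_measure. Several settings are hypothetical choices, not the parameters used for Figure 3:

- the generator settings (n_informative, class_sep);
- the shift that makes every feature positive, so that a coefficient of variation is meaningful.

The dip test is omitted because NumPy, SciPy, and scikit-learn do not provide one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import homogeneity_completeness_v_measure
from sklearn.mixture import GaussianMixture


def synthetic_trial(seed, class_sep=1.0, n_samples=1000, n_features=20, n_clusters=3):
    """Generate one labeled synthetic dataset, cluster it with a GMM,
    and return structural statistics together with HCV scores."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=10,        # hypothetical setting
        n_redundant=0,
        n_classes=n_clusters,
        n_clusters_per_class=1,
        class_sep=class_sep,     # larger values separate the classes more
        random_state=seed,
    )
    # Hypothetical shift so every feature is positive, like concentrations
    X = X - X.min(axis=0) + 1.0

    cv = 100.0 * X.std(axis=0) / X.mean(axis=0)   # per-feature CV (%)
    corr = np.corrcoef(X, rowvar=False)
    off_diagonal = ~np.eye(n_features, dtype=bool)
    mean_abs_corr = np.abs(corr[off_diagonal]).mean()

    # Cluster with the known number of components, without using the labels
    labels = GaussianMixture(n_components=n_clusters, random_state=seed).fit_predict(X)
    # Compare the clusters with the known labels
    h, c, v = homogeneity_completeness_v_measure(y, labels)
    return {"mean_cv": cv.mean(), "mean_abs_corr": mean_abs_corr,
            "homogeneity": h, "completeness": c, "v_measure": v}


trials = [synthetic_trial(seed, class_sep=sep) for seed, sep in enumerate([0.5, 1.0, 2.0])]
```

A diagram of the same general form as Figure 3 could then be built in three steps:

1. Sweep class_sep and the other generator settings over many seeds.
2. Add a dip-test statistic from a separate implementation.
3. Plot the statistics against each other, colored by v_measure.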
We can therefore estimate the clusterability of the data from its structure, which helps us predict how well the data can be clustered.

Figure 3 Combined dip test versus coefficient of variation for 10,000 synthetic analyses. Three synthetic datasets are shown to demonstrate the structure of the data in each colored regime. Colors refer to HCV scores of synthetic data.

One benefit of first using synthetic data to test the machine learning methods is that we can run thousands of analyses while changing the structure of the data. The synthetic analysis shows that clusterability depends on how much the data vary (Fig. 3):

- Datasets with minimal variability (i.e., strong overlap between samples and between groupings) are more difficult to cluster than datasets with greater variability and greater separation between clusters.
- Too much variability, however, results in an overabundance of noise and more overlap, which decreases the ability of the algorithm to recognize clusters.
- With enough variation to separate samples but without too much noise, clusterability increases and the algorithm recognizes groupings more easily.

This hypothesized regime diagram therefore lets us assess dataset clusterability. It also lets us test the effect of different preprocessing transformations (Fig. 3):

- The most effective preprocessing steps increase variation in low-variation data and constrain variation in datasets with high noise.
- Preprocessing steps that decrease the dip test value by separating groupings prior to clustering also increase the chances of a geologically relevant and successful clustering outcome.

In addition to element-wise analysis of the entire dataset, analysis of each category's distribution (in the case of the CRB, each Formation or Member) can be an effective way to evaluate the applicability of different ML algorithms.
In this work we use Gaussian Mixture Modeling as a template for unsupervised learning (Geron, 2017). We therefore need to assess at the category level whether the classes themselves have Gaussian distributions. This is also critical for supervised machine learning, as many supervised techniques assume Gaussian distributions (Pedregosa et al., 2011). To do this we again use moment statistics (Peck et al., 2005). These measures quantify the shape of the category-wise data, and they also show whether our distributions are strongly influenced by outliers. Outliers, defined as data that lie more than three standard deviations from the mean, can make it difficult to cluster unknown samples or classify them into that category. If the analyses are reproducible, outliers may also have petrologic implications (Peck et al., 2005).

Some geochemical data are ill suited to classification or clustering. Data that are strongly influenced by outliers, or that overlap strongly in all features, are poorly suited to clustering. The coefficient of variation (x-axis of Fig. 3) can help assess whether our dataset falls into either of these categories. Data with significant overlap between all samples plot closer to 0. Data with large amounts of variation, which could indicate noise, plot above 100 (Fig. 3). The dip test p-value measures how consistent a distribution is with unimodality (Hartigan & Hartigan, 1985). It therefore identifies strongly unimodal data, with overlapping categories and minimal separation between them. This p-value forms the y-axis of Figure 3: elements that plot near 1 are considered mostly unimodal, and elements that plot closer to 0 are considered more likely to be multimodal. The correlation coefficient (colorbar, Fig. 3) and the descriptive category-wise statistics (mean, standard deviation, interquartile range, kurtosis, and skew) help us interpret the shape of each category. They also indicate whether certain methods are appropriate for the dataset being analyzed (Peck et al., 2005).

2.3.2.1 Principal Component Analysis (PCA)

Element-wise analysis is useful, but geochemical analyses produce suites of elements, which we further expand using ratios to generate high-dimensional datasets. We therefore complement it with another method for quantifying the relative variability between elements. Principal Component Analysis (PCA) is a matrix decomposition technique that facilitates dimensionality reduction. It rotates the coordinate axes so that the data can be approximated by the few axes that capture the most variance (Aitchison, 1983; Aitchison et al., 1993; Filzmoser et al., 2009; Praus, 2005). PCA decomposition allows us to recognize the features responsible for that variation and provides tools to describe it (de Caritat & Grunsky, 2013; Iwamori et al., 2017; Praus, 2005; Ueki & Iwamori, 2017). From a linear algebra perspective, this decomposition asks whether there is another basis (set of coordinate axes), an orthogonal linear transformation of the original basis, with the following ordering (Aitchison, 1983):

- the greatest variance of the data lies on the first of the new basis vectors (the first ‘principal component’);
- the second greatest variance lies on the second;
- and so on.

The new set of basis vectors completely describes the data, but often only a subset of principal components is necessary to approximate the dataset. In this way PCA functions as a simple dimensionality reduction technique (Aitchison, 1983; Aitchison et al., 1993).
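The covariance-eigendecomposition route to PCA used in this section can be sketched in a few lines of NumPy. This is a minimal illustration on random toy data, not the CRB workflow. In practice the input would be the preprocessed feature matrix, and the function name is hypothetical.

```python
import numpy as np


def pca_by_eigendecomposition(X):
    """PCA of an (n_samples, n_features) array via its covariance matrix.

    Returns:
    - eigenvalues: variance along each principal component, in decreasing order;
    - eigenvectors: the principal axes, one per column;
    - scores: the data projected onto those axes;
    - explained: the fraction of the total variance on each axis.
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # feature-by-feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # coordinates in the new basis
    explained = eigvals / eigvals.sum()
    return eigvals, eigvecs, scores, explained


rng = np.random.default_rng(0)
# Toy data: 5 correlated features generated from 2 latent factors plus small noise
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(300, 5))
eigvals, eigvecs, scores, explained = pca_by_eigendecomposition(X)
print(np.round(np.cumsum(explained), 3))
```

Keeping only the first few columns of scores is the dimensionality reduction step; for the CRB this text uses the first five.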
The linear algebra behind PCA is relatively transparent. However, there is no guarantee that the linear combinations of the original data coordinates within principal components have any physical significance (Lee & Seung, 2000; Vesselinov et al., 2018). This is a significant drawback to the use of PCA in practice and is one reason why we do not rely on this method for full interpretation of process-based clusters. We implement PCA in three steps (Iwamori et al., 2017; Ueki & Iwamori, 2017):

1. Define a covariance matrix for the dataset.
2. Compute the eigenvalues and corresponding eigenvectors of this matrix.
3. Sort the eigenvectors by decreasing eigenvalue.

One can then assess the percentage of variance explained by each principal component. We show here that for the CRB it is sufficient to examine only the first five principal components to identify signatures of the petrologic processes that drive variation in the geochemical data.

2.3.3 Preprocessing

Before we can implement clustering and classification, the data must be preprocessed. This removes the influence of different units (i.e., wt.% vs. ppm) and of the varying magnitudes of variation within the features. Data can be transformed or preprocessed in a variety of ways. By transforming the data, we seek to enhance variation independently of the original units. These preprocessing steps range from simple normalization to complicated non-linear transformations (Dangeti, 2017; Geron, 2017). Preprocessing can also include steps such as PCA to establish the most useful element features for describing the variation (Praus, 2005). Preprocessing plays a major role in the implementation of ML algorithms on most complex datasets. An optimist might consider such preprocessing a rigorous means of elucidating structure in the data; a pessimist might consider it a type of tinkering. Regardless, there are many flavors of data transforms that are commonly used for geochemical data (e.g.,
Aitchison, 1983). We tested over 50 different combinations of preprocessing steps. Our goal was to find the best way to:

- remove outliers;
- balance data categories;
- remove the effect of different units;
- exaggerate useful variation;
- combine features into ratios to enlarge the dataset.

At the same time, we sought the highest accuracy from the supervised classification model and the highest silhouette score from the unsupervised algorithm.

2.3.3.1 Removing Outliers

Outliers can have a dramatic effect on machine learning algorithms (Geron, 2017; Peck et al., 2005). Some algorithms are designed to take outliers into account, but most assume that outliers have been removed from the dataset (Geron, 2017). Statistically, we therefore need to remove outliers from the data to best parameterize the actual variation present in each stratigraphic class. Geochemical outliers can carry important information about the variation within a group. Primarily, however, they mark places where the lava has evolved or been altered far beyond what we would expect from source-to-cooling variation in the chemistry. Examples include alteration (Sawlan, 2017, 2019) or syn-eruptive differentiation (Kumar, 2014). During both supervised and unsupervised learning, we remove outliers on a category-wise basis. We calculate the mean (μ1) and standard deviation (√μ2) of each category (or stratigraphic class) and use them to assign each sample a z-score (Pedregosa et al., 2011). Samples more than three standard deviations from the mean are removed from the dataset.

2.3.3.2 Box-Cox Power Transform

We use the Box-Cox power transform as an effective and simple transformation in our workflow (Box & Cox, 1964; Howarth & Earle, 1979; Ueki et al., 2018):

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt] \log y, & \lambda = 0 \end{cases} \tag{2}$$

where y is a (positive) data value and λ is the transformation exponent. In 1964 Box and Cox established a family of functions for transforming data and creating roughly normal distributions out of skewed multidimensional data (Box & Cox, 1964).
They established a way to locate the best-fit exponent for transforming the data into a more normal shape (Box & Cox, 1964). With outliers removed, this power transform normalizes the raw geochemical data. This makes the features comparable and enhances the effects of features that vary significantly relative to the other elements.

2.3.3.3 Ratio Combinatorics

Geochemists regularly create ratios during bivariate analysis of geochemical data to make patterns and groupings in the data more obvious. Ratios exaggerate the relative variation between different elements (especially between compatible and incompatible elements), which helps make that variation identifiable. Statistically, ratio combination features exaggerate the variation in the dataset, which helps us train an accurate classification model. They also provide additional features that may be more useful for finding groups. The number of ratio combinations is determined by simple combinatorics:

$$\#\ \text{of ratio combinations} = \frac{n!}{(n - r)!} \tag{3}$$

where n is the number of features (elements) in the original dataset and r is the number of features in each combination (two in this case, since we are creating pairs) (Peck et al., 2005). Geochemical data often involve more complicated ratio sums and combinations as well, so r can be expanded depending on the problem of interest and the ratios being created. In this case, however, the features reflect all possible monomial ratios of each element without repeating combinations. In this way we also expand our dataset, from a simple 10 major and 5 to 10 trace element dataset to a feature set of 200 or more features that encompass all possible ratio combinations. This transformation makes the workflow more suitable for methods designed for large data problems. It also connects the workflow with the standard tools of bivariate diagram analysis in geochemistry.
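The three preprocessing steps above can be sketched with pandas and SciPy. This is a hedged illustration, not the dissertation's code, and it rests on several assumptions:

- The function names and toy data are hypothetical.
- The z-score uses the population standard deviation √μ2 within each category.
- The steps are chained in an assumed order: ratios are formed from the positive concentrations before Box-Cox, because Box-Cox output can be zero or negative.

Note that n!/(n − r)! in Eq. 3 counts ordered pairs, so both a/b and b/a are generated. Treating reciprocal ratios as duplicates would halve the count.

```python
import itertools
import math

import numpy as np
import pandas as pd
from scipy import stats


def remove_outliers_by_category(df, features, category_col, z_max=3.0):
    """Drop samples lying more than z_max standard deviations from their
    category's mean in any feature (category-wise z-score filter)."""
    groups = df[category_col]
    dev = df[features] - df.groupby(category_col)[features].transform("mean")
    # Population standard deviation, sqrt(mu_2), within each category
    sd = np.sqrt((dev ** 2).groupby(groups).transform("mean"))
    # A feature that is constant within a category gives 0/0 = NaN; treat as z = 0
    z = (dev / sd).fillna(0.0)
    return df[(z.abs() <= z_max).all(axis=1)]


def ratio_features(df, features):
    """Every ordered ratio a/b with a != b, i.e. n!/(n - 2)! new columns (Eq. 3)."""
    return pd.DataFrame(
        {f"{a}/{b}": df[a] / df[b] for a, b in itertools.permutations(features, 2)},
        index=df.index,
    )


def box_cox_columns(df):
    """Box-Cox transform each column with its own maximum-likelihood exponent.
    scipy.stats.boxcox requires strictly positive values."""
    out = pd.DataFrame(index=df.index)
    lambdas = {}
    for col in df.columns:
        out[col], lambdas[col] = stats.boxcox(df[col].to_numpy(dtype=float))
    return out, lambdas


# Toy example: hypothetical members with log-normally distributed concentrations
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "member": np.repeat(["A", "B"], 50),
    "Ba": rng.lognormal(6.0, 0.3, 100),
    "Ni": rng.lognormal(3.5, 0.4, 100),
    "Cr": rng.lognormal(4.0, 0.5, 100),
})
elements = ["Ba", "Ni", "Cr"]
clean = remove_outliers_by_category(demo, elements, "member")
ratios = ratio_features(clean, elements)
assert ratios.shape[1] == math.perm(len(elements), 2)  # 3!/(3 - 2)! = 6
transformed, lambdas = box_cox_columns(ratios)
```

Each ratio column receives its own fitted exponent in lambdas, mirroring the per-feature best-fit exponent described for Eq. 2.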
2.3.3.4 Balanced Sample Classes

One of the primary challenges in clustering geochemical data is uneven sampling among categories. This is likely due to constraints on sampling in the field, exposure biases, and differential interest in some geologic units over others (Petrelli & Perugini, 2016). There are many ways of dealing with category imbalance in machine learning analyses. The most straightforward is simply to gather more data. However, this can be costly, inefficient, or even impossible in geologic settings where outcrops are no longer exposed.

Random oversampling and random undersampling are among the simplest statistical methods of balancing datasets. In both, a dataset is populated with random samples drawn from each category:

- In oversampling, the number of samples in each class is raised to match the largest sample count of any category. In a category with few samples, this may eliminate variation that is an important part of the geochemistry of geologic units (e.g., if only one sample is drawn over and over again, the standard deviation becomes zero).
- In undersampling, the number of samples is set by the smallest category count. Categories are then defined by just a few samples, which may oversimplify the variation in the dataset and again remove important information.

So, while an important tool, random resampling will always remove important information about variation within the categories and is thus not always advisable. We fit supervised classification models to a variety of balanced datasets (e.g., oversampled, undersampled, and Gaussian upsampled). We then used outside test data with known labels to test the models. The overall accuracy on the test data split from the training data goes up, but the overall accuracy on unknown samples decreases dramatically (< 50%).
We find that balancing categories in the CRB dataset causes overfitting within the supervised and unsupervised models. As in many geochemical settings, sampling bias in the CRB is a result of exposure and sampling. As Figure 2 shows, more studies have focused on the upper parts of the stratigraphy, but the most samples have been gathered from the highest-volume (and highest-surface-area; Reidel et al., 2013) units. The dataset therefore reflects a higher probability that any given sample is from one of the larger-volume members. The small bias this introduces into our dataset is an asset to our probabilistic categorization model, which relies on known information from geochemical sampling. We therefore do not balance categories in this workflow. We do, however, ensure a minimum number of samples in each of the 46 members to reasonably represent their variability: the smallest sample number per member in our dataset is 10.

2.3.4 Supervised Classification via Multinomial Logistic Regression

In general, the workflow for a supervised machine learning approach follows these steps (Fig. 4):

1. Quality control and preprocessing of the data (e.g., removing outliers).
2. Selecting a model.
3. Training the model.
4. Checking the model's performance.

Because the method is supervised, outputs depend on the inputs and on the specifications that determine what data the model is trained on. Success in supervised learning thus depends entirely on the quality of the training data. The aim of supervised learning for the CRB is to train a model that recognizes the chemistries that define each stratigraphic category in the existing stratigraphy. This established stratigraphy serves as our “known” or “actual” ground truth label. A supervised model that recognizes the chemical patterns differentiating the members or formations can then be trained and tested using the database of lava chemistry (Steps 2-5, Fig. 4).
We then compare the supervised categories and unsupervised cluster labels for each sample (Step 6, Fig. 4). To train and test a supervised model, we follow Ueki and Iwamori (2018) and use a relatively simple approach for training and classification: multinomial logistic regression (MLR). Many more complex algorithms for training and testing a classification model exist. We choose this algorithm specifically for its simplicity and transparency of evaluation. Given the relatively small size of geochemical datasets, this method is also extremely efficient.

Figure 4 Workflow for supervised machine learning analysis in this work, describing the process involved in multinomial logistic regression from preprocessing through training, classification, and visualization.

Logistic regression studies the relationship between a categorical dependent variable and a set of explanatory or independent variables. The end result of an MLR classification is the probability that a sample belongs to each class of the categorical target variable (Carniel & Guzman, 2012; Itano et al., 2020; Pedregosa et al., 2011; Ueki et al., 2018) (Step 7, Fig. 4). In our CRB application:

- the target variable is the lithostratigraphic unit (Formation or Member);
- the features of the dataset are the elemental concentrations and other observations;
- the instances are individual samples.

The MLR algorithm that makes up Steps 2-5 (Fig. 4) is as follows (Geron, 2017; Itano et al., 2020; Peck et al., 2005; Ueki et al., 2018):

1) A linear model for the distribution of target classes is defined. In the logit formula (the quantile function associated with the standard logistic distribution), the regression equation is expressed as a linear model Z of features X and corresponding weights W:

$$Z = W_0 + W_1 X_1 + W_2 X_2 + \dots + W_p X_p \tag{4}$$

2) Choose values for the weights (W_0, W_1, …, W_p).
In practice this begins with an educated guess at a set of weights to test first. The set of weights applied to the features gives Z.

3) Z is then used as input to a sigmoid function. This function assigns a probability between 0 and 1 that a sample with features X belongs to a particular class. In the multinomial case, the softmax function generalizes the sigmoid to several classes.

4) Arbitrary weights cannot be trusted, so we define a cost function. A gradient descent algorithm then finds the weights that minimize the cost, which is equivalent to maximizing the probability assigned to the true classes. There are many kinds of cost functions (Pedregosa et al., 2011). We use the cross-entropy cost function, which yields the maximum likelihood estimate of the weights. The cross-entropy cost function is defined as (Pedregosa et al., 2011)

$$\operatorname{cost}\big(h(x), y\big) = \begin{cases} -\log\big(h(x)\big), & y = 1 \\ -\log\big(1 - h(x)\big), & y = 0 \end{cases} \tag{5}$$

where h(x) is the predicted probability and y is the true label (1 if the sample belongs to the class, 0 otherwise).

5) Gradient descent on the cost function starts from a random set of weights and updates them iteratively. The model parameters are fixed once it finds the set of weights that best predicts the classes, defined as the set that minimizes the cost function. For the multinomial regression, we use 50 iterations of random starting weights, each with 1200 gradient descent iterations. This finds a model that best represents the known data categories. To train the model, we use a random train-test split that sets aside 30% of the data as test data. On our dataset of full CRB samples this resulted in ~700-800 test samples, depending on the elements used.

6) For each iteration of the training model described above, test data are classified using the model. The classifications are then assessed in terms of accuracy, precision, and recall (Pedregosa et al., 2011). Accuracy quantifies how well the predicted classes match the real classes for the test data. It is the fraction of test samples whose predicted label exactly matches the real label.
Precision is the ratio of true positives to the sum of true positives and false positives. It indicates what fraction of the samples assigned to a class actually belong to it (Geron, 2017). Recall is the ratio of true positives to the sum of true positives and false negatives. It indicates what fraction of the samples that belong to a class the model correctly identified (Geron, 2017). Together these three measures quantify the success of our classification model as classification parameters and preprocessing steps are varied. Once the highest-accuracy model is chosen from the 50 iterations of starting weights, the model is considered trained.

We follow these steps to predict classifications for unknown samples and datasets. For each population analyzed, each sample is given a suite of probabilities that quantify how likely it is that the sample fits into each class. We then visualize these raw data in three principal ways (Step 7, Fig. 4):

1. A pseudo-confusion matrix, normalized so that it reads as the proportions of each category on the y-axis classified into each category on the x-axis.
2. Histograms that describe the population output and the average maximum probability for the samples in each class.
3. A circular bar plot for each sample that shows the full suite of class probabilities, plotting probabilities > 10%.

Figure 5 shows an example from the classification of test lava data. Its individual circular bar plot shows an affinity for the Wapshilla Ridge member with 97% probability. Its histogram shows the outcomes for the whole population of Grande Ronde samples analyzed in the dataset.

Figure 5 A) Example supervised classification for an individual sample from the CRB lava test dataset.
The red bar represents the maximum probability, showing 97% confidence that the sample is related to the Wapshilla Ridge member, with negligible secondary probability (blue bar pointing to Grouse Creek). B) Histogram of samples categorized in the Grande Ronde Formation (left) and associated maximum probability of those categorizations (histogram on the right). Dashed black line is the mean of maximum secondary probability per category.

Table 1. Member and Formation ID Key
Formation         Formation ID  Member               Member ID
Steens            1             Lower Steens         0
                                Upper Steens         1
Imnaha            2             Log Creek            2
                                Fall Creek           3
                                American Bar         4
                                Rock Creek           5
Grande Ronde      3             Buckhorn Springs     6
                                Birch and Hunter Ck  7
                                Teepee Butte         8
                                Rogersburg           9
                                Skeleton Creek       10
                                Center Creek         11
                                Kendrik Grade        12
                                Brady Gulch          13
                                Downey Gulch         14
                                Frye Point           15
                                China Creek          16
                                Hoskin Gulch         17
                                Cold Springs Ridge   18
                                Mount Horrible       19
                                Wapshilla Ridge      20
                                Grouse Creek         21
                                Meyer Ridge          22
                                Slack Canyon         23
                                Ortley               24
                                Armstrong Canyon     25
                                Indian Ridge         26
                                Umtanum              27
                                Field Springs        28
                                Winter Water         29
                                Sentinel Bluffs      30
Picture Gorge     0             Picture Gorge        31
Wanapum           1             Eckler Mountain      32
                                Frenchman Springs    33
                                Shumaker Creek       34
                                Roza                 35
                                Priest Rapids        36
Saddle Mountains  5             Umatilla             37
                                Wilber Creek         38
                                Asotin               39
                                Weissenfels Ridge    40
                                Esquatzel            41
                                Pomona               42
                                Weippe               43
                                Elephant Mountain    44
                                Buford               45
                                Ice Harbor           46
                                Lower Monumental     47

2.3.5 Unsupervised Clustering using a Gaussian Mixture Model

Figure 6 Workflow for unsupervised machine learning analysis in this work, illustrating preprocessing, Gaussian Mixture Modeling, model assessment, and interpretation based on compositions and petrologic modeling.

A generic workflow for unsupervised machine learning involves four steps:

1. Preprocessing the data.
2. Assessing the correct number of components or clusters.
3. Assessing the outcome and repeating until the best fit model parameters can be found.
4. Interpreting the chosen clustering output.
There are more than 100 known clustering algorithms, although not all perform well on geochemical data (Iwamori et al., 2017; Lacassie et al., 2006; Pedregosa et al., 2011; Templ et al., 2008; Ueki & Iwamori, 2017). The statistical analysis described above (Section 2.3.2) guides the appropriate choice of algorithm. For example:

- The well-known k-means algorithm is excellent at identifying spherical or isotropic clusters, but it fails to predict cluster boundaries when clusters are anisotropic or ellipsoidal (Ellefsen et al., 2014; Geron, 2017; Pedregosa et al., 2011).
- Methods such as DBSCAN are excellent for datasets with a lot of noise, but they are very dependent on user input and parameter choices and can thus be non-unique (Geron, 2017; Pedregosa et al., 2011).
- Still other algorithms cannot handle classes or clusters of different sizes.

Choosing the appropriate algorithm is therefore a key ingredient in the success of unsupervised machine learning applications.

We choose the Gaussian Mixture Model (GMM), a clustering algorithm that assumes the data are a collection of Gaussian distributions and does not require the clusters to have a specific shape (Ellefsen et al., 2014; Pedregosa et al., 2011). This algorithm generalizes k-means to include the covariance structure along each dimension of the data. GMMs take a probabilistic approach to drawing high-dimensional, Gaussian-shaped cluster boundaries, which make up Steps 2-6 (Fig. 6) (Ellefsen et al., 2014; Geron, 2017). Depending on where a point falls within the distributions, it is assigned a probability for each category. To assign category boundaries along each dimension, the algorithm estimates the parameters that describe the clusters by maximizing the likelihood of the whole dataset (Ellefsen et al., 2014). The clustering algorithm can therefore do more than assign rigid boundaries: it gives each sample a probability that it belongs to a specific class.
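The sketch below illustrates the soft assignment described above using scikit-learn's GaussianMixture. It runs on hypothetical two-dimensional toy clusters, sheared to be anisotropic, rather than on CRB data. With full covariance matrices, each component can take its own orientation and elongation. predict_proba then returns each sample's membership probability for every component.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Three toy clusters, sheared by a linear map so they become anisotropic
X, y_true = make_blobs(n_samples=600, centers=3, random_state=42)
X = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])

# covariance_type="full": each component has its own covariance matrix
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)

proba = gmm.predict_proba(X)        # shape (600, 3): probability of each component
hard_labels = proba.argmax(axis=1)  # most probable component for each sample
print(np.round(proba[:3], 3))
```

Each row of proba sums to 1. Keeping the full row, rather than only hard_labels, is what allows a sample to carry partial affinity to more than one cluster.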
To fit the model, the algorithm begins by randomly seeding the components. It then alternates between two steps (Geron, 2017; Pedregosa et al., 2011):

- the expectation step calculates the probability that each sample was generated by each of those components;
- the maximization step updates the component parameters to maximize the likelihood given those probabilities.

These two steps are repeated until the likelihood across all samples no longer improves. This iterative process is called expectation maximization (Dempster et al., 1976). By using these well-established statistical criteria, we can find a “best fit” without providing the model with ground truth labels for comparison. In this way we withhold information about the stratigraphic boundaries from the model and simply seek existing patterns and similarities between samples.

To run the GMM algorithm, the number of desired clusters (or “components”) must be specified for the dataset (Step 3, Fig. 6). In the case of the CRB, painstaking effort has gone into mapping and categorizing the CRB lavas into a Formation- and Member-level stratigraphy (Cahoon et al., 2020; Camp et al., 2013; Camp, Reidel, et al., 2017; Hooper, 2000; Moore et al., 2018; Reidel, 1982; Reidel et al., 2013; Reidel & Tolan, 1989). This stratigraphy is primarily defined by the following criteria (Reidel et al., 2002):

- whole rock chemistry;
- polarity;
- the presence or absence of certain phases, such as feldspar;
- other geologic indicators, such as field mapping and textures.

In our unsupervised analysis of the CRB lavas, however, we focus entirely on the whole rock chemistry. This allows us to test relationships between the stratigraphic members. It also lets us focus on identifying processes that affected the chemistry and may have repeated over the course of the stratigraphy. To objectively determine the “best” number of clusters, we use two metrics to assess how well a given number of clusters fits the actual breaks in the data (Step 3, Fig. 6).
The BIC (Bayesian Information Criterion) and AIC (Akaike Information Criterion) both favor the most efficient model, as indicated by the lowest AIC and BIC scores. Such a model fits the data well (high likelihood) while using the least complicated components (Hastie et al., 2001). Together they allow us to assess which number of clusters provides an appropriate model of the similarities in the data without overfitting (Hastie et al., 2001).

2.3.5.1 Measuring success

Measuring the success of unsupervised learning can be very difficult when there are no absolute ground truth labels for the categories in the data. We use AIC and BIC to assess the success of various numbers of clusters in the GMM algorithm (Hastie et al., 2001) (Step 3, Fig. 6). Both should be minimized so that the simplest model with the best outcome is implemented. We take as the minimum the lowest point beyond which adding another component changes the score less than the previous addition did; this is the basis of the elbow plot method. Once the number of components has been chosen, we assess the clustering success using two metrics: the V-measure and the silhouette score (Geron, 2017; Hastie et al., 2001; Pedregosa et al., 2011; Rosenberg & Hirschberg, 2007) (Step 5, Fig. 6). The silhouette score is a classic metric for unsupervised learning because it does not require comparison to a ground truth label. “The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample” (Pedregosa et al., 2011). The Silhouette Coefficient for a sample is (b - a) / max(a, b) (Pedregosa et al., 2011), where:

- a is the mean distance between the sample and the other samples in its own cluster;
- b is the mean distance between the sample and the samples in the nearest cluster that it is not part of.

We use this as an objective measure of how much the clusters overlap.
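The component-number search and the silhouette definition above can be sketched with scikit-learn. The sketch makes three simplifications:

- the toy data are hypothetical;
- it takes the plain minimum of BIC rather than the elbow criterion described above;
- silhouette_by_hand is a hypothetical helper that re-implements (b − a)/max(a, b), so it can be checked against sklearn.metrics.silhouette_score.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_score
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=400, centers=4, n_features=5, random_state=1)

# Fit GMMs with 1-8 components and score each with BIC and AIC (lower is better)
ks = list(range(1, 9))
models = [GaussianMixture(n_components=k, random_state=0).fit(X) for k in ks]
bic = np.array([m.bic(X) for m in models])
aic = np.array([m.aic(X) for m in models])
best = int(np.argmin(bic))
labels = models[best].predict(X)


def silhouette_by_hand(X, labels):
    """Mean over samples of (b - a) / max(a, b)."""
    if len(np.unique(labels)) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    D = pairwise_distances(X)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                    # exclude the sample itself
        if not same.any():
            continue                       # singleton cluster: s = 0, as in scikit-learn
        a = D[i, same].mean()              # mean intra-cluster distance
        b = min(D[i, labels == k].mean()   # mean distance to the nearest other cluster
                for k in np.unique(labels) if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


print(ks[best], silhouette_by_hand(X, labels), silhouette_score(X, labels))
```

The two silhouette values should agree. Replacing np.argmin(bic) with an elbow rule applied to bic and aic would follow the selection procedure described above more closely.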
Silhouette scores are interpreted as follows:

- values near 1 indicate that samples are close to the other members of their own cluster and far from other clusters;
- values around 0 indicate strong overlap between clusters;
- negative values (down to -1) suggest that samples have been assigned to the wrong cluster.

V-measure scores (or HCV scores), on the other hand, do require a ground truth label. The V-measure combines two measures by taking their harmonic mean (Rosenberg & Hirschberg, 2007):

- homogeneity: each cluster contains only samples from a single class;
- completeness: all samples with the same label are clustered together.

A perfect V-measure of 1 requires both homogeneity and completeness to be maximized, meaning that all samples are categorized where we expect them to be (Geron, 2017; Rosenberg & Hirschberg, 2007). We use this both in creating the synthetic statistical regime diagram and in assessing our supervised algorithm (Step 5, Fig. 6).

3 Results

3.1 Statistical Structure of CRB Chemistry

To quantify the structure of the CRB dataset in each feature dimension and within each category (in this case, stratigraphic labels), we identify two kinds of elements: those with strong differences between groupings and those that are similar among most samples. This is critical to understanding the ultimate outcome of the machine learning analysis. It shows which data dimensions (elements or ratios) will be most helpful for the clustering algorithm and which may be best for illustrating differences or variation in the data. In the CRB dataset with all formations and members of the stratigraphy included, there is a clear separation between elements that appear similar for most samples and those with strong differences among samples.
P2O5, TiO2, K2O, Cr, Ba, Rb, and Ga appear to be ideally placed on the regime diagram, combining high coefficients of variation within each dimension with very low dip test scores that reject the null hypothesis of a unimodal distribution (Fig. 7A). Ni and Cu show very high coefficients of variation, and Ni additionally has a low dip test score (Fig. 7A). SiO2, Al2O3, CaO, MnO, Y, and Sc all have lower coefficients of variation, indicative of overlap between the data samples, but these dimensions also have low dip test values, indicative of clear separations between groups of samples within those elements (Fig. 7A). Within the Sr, Nb, Zr, V, MgO, and Na2O dimensions, we cannot reject the hypothesis of a unimodal distribution based on the dip test score (Fig. 7A). In these same dimensions, the coefficient of variation is also low, indicating overlap between many of the samples (Fig. 7A).

Figure 7 A) Element distributions in the full CRB database plotted on the regime diagram of Figure 3. B) The same analysis but with the interquartile range instead of the coefficient of variation (so that outliers have less of an effect). C and D) The same analyses as A and B but restricted to data from the Grande Ronde Formation.

Position on the regime diagram provides important clues about the underlying petrologic processes. Our first notable statistical result is the relatively strong influence of Ni, Cr, and V (Fig. 7A). These elements are highly compatible in olivine (in the case of Ni) or clinopyroxene (White, 2013). Crystallization of clinopyroxene particularly affects the concentrations of Ni, Cr, and V together as a group (White, 2013). The other group of elements that appears to be responsible for most of the variation in the statistical output consists of more incompatible elements (White, 2007).
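The two spread metrics behind the regime diagram can be sketched as below. This is a minimal illustration, assuming the coefficient of variation is the sample standard deviation divided by the mean and the interquartile range uses NumPy's default linear percentile interpolation; the dip test is not reproduced. The toy vectors show why an IQR-based axis is less sensitive to a single extreme value.

```python
import numpy as np


def coefficient_of_variation(x):
    """Sample standard deviation divided by the mean (dimensionless spread)."""
    x = np.asarray(x, dtype=float)
    return np.std(x, ddof=1) / np.mean(x)


def interquartile_range(x):
    """Distance between the 75th and 25th percentiles (robust spread)."""
    q1, q3 = np.percentile(np.asarray(x, dtype=float), [25, 75])
    return q3 - q1


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 100.0]  # one extreme value

for name, x in [("clean", clean), ("with_outlier", with_outlier)]:
    print(name, coefficient_of_variation(x), interquartile_range(x))
```

The single extreme value nearly quadruples the coefficient of variation but leaves the interquartile range unchanged.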
K2O, Rb, Ba, and TiO2 all have low dip test scores and high variation, suggesting that we could identify different groupings with these elements and that they can help describe the variation seen between members (Fig. 7A, B). Many of these highlighted elements have previously been used to distinguish between members or even between drill core samples (e.g., Hooper and Hawkesworth, 1993; Reidel et al., 2002; Tolan et al., 2009). This is immediately suggestive of interaction with a source rich in incompatible elements, beyond the evolution usually observed in flood basalt provinces (Reidel & Tolan, 1989; Wolff & Ramos, 2013). Putting these feature dimensions into ratio form exaggerates this variation, which is why ratios are useful in a petrologic context for investigating it. Ratios that pair the most incompatible elements with the most compatible ones appear to describe the greatest variation. Three ratios, Ba/Ni, Ba/Cr, and Ba/MnO, define most of the variation in the data (Fig. 8A, B). In addition to these three high-variance ratios, V/MnO and Sr/MnO also describe a large amount of the variation (Fig. 8B). Together these ratios appear to describe nearly 80% of the variance in the data (Fig. 8A). These ratios indicate a hierarchy among the high-variation elements, with Ba carrying a significant portion of the variation, followed by Cr and then V and Sr (Fig. 8B). MnO appears to act nearly as a trace element in these data, with high variation that helps to describe these ratios (Fig. 8B). While principal components need not be geologically meaningful, because of the orthogonality constraint in the calculation of the eigenvectors, this type of PCA can greatly inform our understanding of the high-variation elements that describe the data variation. The results of the ratio analysis on the entire CRB are shown in the PCA biplot in Figure 8.

Figure 8 Principal Component Analysis of the full CRB lava dataset.
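The ratio-then-PCA idea can be sketched with scikit-learn. This is an illustration only: the five element names and the lognormal concentrations are synthetic stand-ins, the ratios are log-transformed and standardized (an assumption made for this sketch), and `pairwise_ratios` is a hypothetical helper.

```python
import itertools

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pairwise_ratios(X, names):
    """Every element-over-element ratio for column pairs (i < j)."""
    cols, labels = [], []
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        cols.append(X[:, i] / X[:, j])
        labels.append(f"{names[i]}/{names[j]}")
    return np.column_stack(cols), labels


# Synthetic, strictly positive "concentrations" for 200 samples
rng = np.random.default_rng(0)
names = ["Ba", "Ni", "Cr", "MnO", "Sr"]
X = rng.lognormal(mean=3.0, sigma=0.5, size=(200, len(names)))

R, ratio_names = pairwise_ratios(X, names)     # 10 ratio features
Z = StandardScaler().fit_transform(np.log(R))  # log, then zero mean / unit variance
pca = PCA().fit(Z)

print(pca.explained_variance_ratio_.round(3))
# Ratio features with the largest-magnitude loadings on the first component
top = np.argsort(np.abs(pca.components_[0]))[::-1][:3]
print([ratio_names[k] for k in top])
```

Because every log-ratio of five elements is a difference of five log-concentrations, the ten ratio features span at most four independent directions, so the first four components carry essentially all the variance. Redundancy of this kind is worth keeping in mind when reading explained-variance plots built from ratio combinatorics.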
A) Explained variance per principal component dimension. B, C, and D) The first 6 principal components (87% of the variation). On the 3rd, 4th, 5th, and 6th principal components, only the element labels are shown for simplicity. The 1st and 2nd principal components include ratios.

In addition to analyzing each element dimension of the full sample set, we can also assess the distribution within member groups in the stratigraphy. To assess the structure of the distribution within each member and the importance of outliers, we primarily want to know to what degree each dataset feature can be treated as Gaussian. We therefore also compare the standard deviation and interquartile range (IQR) of each distribution with the relationship expected for a Gaussian (Peck et al., 2005), where $\mu_2$ is the second central moment (the variance) of the distribution:

$$2\sqrt{\mu_2} = 1.4826 \times \mathrm{IQR} \qquad (6)$$

We use these combined metrics to assess the shape and structure of the data. Most members in the CRB dataset have roughly symmetric, Gaussian distributions of samples (Fig. 9A, B, C, D, E, F, G, H). However, a group of ~10 members has positive excess kurtosis values, indicating heavy-tailed distributions (Fig. 9A, B, C, D). While this is a significant number of members, most of the skew and kurtosis comes from small groups of samples with chemistry outside the general distribution. For example, the largest kurtosis is found in the Sentinel Bluffs Member of the CRB, which has a distinct outlier group within its chemistry (Fig. 9A, B, C, D). This indicates outliers within our dataset, which could be either a signal of a group of lavas with an unusual and specific process associated with their petrogenesis or a signal of alteration (Fig. 9A, B). Certainly, in members such as the Sentinel Bluffs, alteration has been suggested as an important source of variability within the population (Sawlan, 2017, 2019), but other researchers favor the position that the lava geochemistry datasets reflect mostly non-alteration-related variation (Baker et al., 2019).
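The Gaussian shape checks can be sketched with SciPy. This is a minimal illustration on synthetic samples, assuming excess (Fisher) kurtosis, which is 0 for a Gaussian, and the population standard deviation; for a normal distribution the quartiles sit at ±0.6745σ, so 2σ ≈ 1.4826 × IQR. The helper `shape_summary` is hypothetical. A Gaussian sample should give a ratio close to 1, while a heavy-tailed Laplace sample should not.

```python
import numpy as np
from scipy import stats


def shape_summary(x):
    """Skew, excess kurtosis, and a Gaussian standard-deviation/IQR consistency ratio."""
    x = np.asarray(x, dtype=float)
    two_sigma = 2.0 * np.std(x)  # 2 * sqrt(mu_2), population standard deviation
    iqr = stats.iqr(x)           # 75th minus 25th percentile
    return {
        "skew": stats.skew(x),
        "excess_kurtosis": stats.kurtosis(x),          # Fisher definition: 0 for a Gaussian
        "gaussian_ratio": two_sigma / (1.4826 * iqr),  # close to 1 for a Gaussian
    }


rng = np.random.default_rng(0)
gauss = shape_summary(rng.normal(0.0, 1.0, 100_000))
heavy = shape_summary(rng.laplace(0.0, 1.0, 100_000))  # heavy-tailed comparison
print(gauss)
print(heavy)
```

Applied per member and per element, ratios well above 1 flag distributions whose standard deviation is inflated by tails or outliers relative to their central spread.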
When carrying out machine learning analysis of this dataset, we therefore perform each analysis twice, with and without outliers, as a first pass at understanding the overall variation in the dataset and to ensure that we understand the full effect of the small group of outliers (<0.5% of the data) that drives the kurtosis (Fig. 9A, B). Kurtosis and skew are each plotted against the number of samples within each grouping, and no systematic pattern is observed (Fig. 9A, B, C, D). This indicates that the imbalanced sample classes in our data do not introduce a bias (Fig. 9A, B, C, D).

Figure 9 Statistics for two element distributions, MgO and Ba. A and B) Count in each member versus kurtosis, with the color indicating the difference from the expected Gaussian line in column 1 and the Gaussian limits of -3 to 3 shown by the dashed line. C and D) The same analysis but with skew instead, with bounds between -1 and 1 indicating near-Gaussian distributions. E and F) IQR vs 2× standard deviation, with the expected Gaussian line shown in grey and the difference from that expectation shown as colors. G and H) The same analysis as E and F, but with outliers removed from the data.

3.2 The Grande Ronde Formation

In what follows, we focus some ML efforts exclusively on the Grande Ronde because of its cryptic homogeneity and its importance to the CRB life cycle as a whole (Fig. 7C, D). Though this formation accounts for most of the volume of the full CRB, the elements that capture variation within it are slightly different from those of the full CRB dataset (Fig. 7C, D). Variation in the data appears to be dominated primarily by the behavior of Ba, Ni, Cr, and Cu, which all have very high variation (Fig. 7C, D). The dip test scores for Ni, Cr, and Cu are mid-range, indicating that some separation in the data may be visible, but it is not possible to reject the null hypothesis that these are unimodal distributions (Fig. 7C, D).
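One way to prepare paired "with" and "without outliers" datasets is sketched below. It assumes the criterion mentioned in the Discussion, flagging a sample when any feature lies more than three standard deviations from the dataset mean; whether the study applied the rule per feature in exactly this way is an assumption, and `flag_outliers` is a hypothetical helper. The data are synthetic, with one injected anomaly.

```python
import numpy as np


def flag_outliers(X, n_sd=3.0):
    """Boolean mask: True where any feature is more than n_sd standard deviations from its mean."""
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return np.any(np.abs(z) > n_sd, axis=1)


rng = np.random.default_rng(1)
X = rng.normal(10.0, 1.0, size=(500, 4))  # synthetic "compositions"
X[7, 2] = 40.0                            # one injected anomalous sample

mask = flag_outliers(X)
X_all, X_clean = X, X[~mask]  # run each analysis on both versions
print(mask.sum(), X_clean.shape)
```

Because the mean and standard deviation are themselves pulled by extreme values, a robust variant (median and a MAD-based scale) may flag tails more consistently; that choice is not specified in the text.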
This could indicate more trend-like behavior of the data in those dimensions. The lowest dip test scores come from MgO, CaO, K2O, Ba, Rb, and Ga, while TiO2, Sc, and Y also appear to play a role in separating data groups (Fig. 7C, D). Overall, the variation in the Grande Ronde Fm. is much smaller (lower CV scores) than in the full CRB (Fig. 7), which further motivates applying sensitive machine learning techniques to this subset of the dataset. In the distributions of member geochemical data (Fig. 9), we see that several of the members with the greatest kurtosis and non-Gaussian skew in the CRB are in the Grande Ronde Fm. These include the Wapshilla Ridge (member #20), the Grouse Creek (member #21), and the Sentinel Bluffs (member #30), which are some of the largest-volume eruptions in the CRB (Fig. 9, Table 1). This may indicate that such large volumes, which covered extensive surface areas, experienced more variable alteration than smaller-volume members and therefore produced larger outlier groups (Fig. 9). This is potentially consistent with hypotheses based on other geochemical indicators (Sawlan, 2017, 2019), but the small number of samples in this group relative to the overall collection of data shows that it is not a pervasive trend within the stratigraphy.

Figure 10 PCA on the Grande Ronde Formation subset with outliers removed. A) The explained variance per feature dimension added. B and C) The eigenvectors for the first principal component in this analysis, in biplot form and as eigenvectors alone, showing a contrast between elements that are incompatible (Na2O, K2O, P2O5, Ba, Rb, Sr, Zr, Y, Nb, Ga) and compatible (MgO, CaO, Al2O3, Ni, Cr, Sc, and Cu) in basalts. D) The biplot for the 3rd and 4th principal components.
When we carry out PCA, we see that the same elements of variation are identified within the Grande Ronde basalts as in the descriptive statistics and regime diagram (Figure 10). In both cases, with and without outliers, two main groups of elements account for most of the variation in these data (Fig. 10A, B). The first group is defined by variation in Cr, Ni, MgO, CaO, Sc, and Cu (Fig. 10B, C). These are all considered compatible in clinopyroxene (White, 2013). The other group accounts for very similar magnitudes of variation in the data but in direct opposition to the first group. This second group is defined mainly by SiO2, K2O, Na2O, Rb, and Ba, all considered highly incompatible elements in basaltic melts (Fig. 10B). These two groups of elements appear to split the data into two rough clusters and define most of the variation therein. This hints, for both supervised and unsupervised machine learning, that we should be able to find groupings within the Grande Ronde and use high-variation elements to help us understand the processes that produced this pattern of variation.

3.3 Robustness of CRB Stratigraphic Labels using Supervised Classification

Our goal is to train a multinomial logistic regression (MLR) supervised classification model to test published stratigraphic units at both the formation and member levels on the basis of geochemistry (or geochemistry + magnetic polarity) (Fig. 2). Once trained, this model will be useful for classifying unknown samples; here we simply carry out the process of training and testing the model on data with known stratigraphic labels. This allows us first to test the robustness of the known stratigraphic labels in the context of MLR. We test the stratigraphy on the basis of chemistry alone as well as on geochemistry + polarity, using polarity as an additional geologic constraint.
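A minimal training-and-testing sketch of this setup, using scikit-learn, is shown below. It assumes strictly positive features (required by the Box-Cox transform named in the Figure 12 caption), a stratified 75/25 train/test split, and three synthetic classes with hypothetical names; it is not the study's actual pipeline or data. With the default `lbfgs` solver, `LogisticRegression` fits a multinomial model when there are more than two classes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(42)
n_per_class, n_features = 100, 6
class_names = ["unit_A", "unit_B", "unit_C"]  # hypothetical stratigraphic labels

# Each synthetic class has its own lognormal "composition" centre (strictly positive)
centres = rng.uniform(5.0, 50.0, size=(len(class_names), n_features))
X = np.vstack([
    rng.lognormal(mean=np.log(c), sigma=0.1, size=(n_per_class, n_features))
    for c in centres
])
y = np.repeat(class_names, n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Box-Cox (followed by standardisation, the PowerTransformer default) feeding a logistic model
model = make_pipeline(PowerTransformer(method="box-cox"), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Fitting the transform inside the pipeline means its parameters are learned from the training split only, so the held-out accuracy is not inflated by information from the test samples.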
Additionally, we test the effect of outliers on the classification through different applications of preprocessing techniques. To visualize this, we plot each sample's model-predicted stratigraphic category against the original label given to it in the dataset, as defined by Hooper (2000), Reidel et al. (2013), and Tolan et al. (2009). This ‘confusion matrix’ represents the relative proportion of each known label classified into each predicted group (Fig. 11). It is normalized to show the proportion of each sample class (y-axis) that was categorized into each predicted stratigraphic label (x-axis) (Fig. 11).

Figure 11 Formation level MLR supervised learning on the full CRB dataset. These normalized confusion matrices are read across rows. Each row is the actual label, and values in the boxes show the percentage of each category that is classified into that predicted stratigraphic label. A perfect output would only have values on the diagonal. Right) Formation level test on the full CRB dataset without outliers.

At the formation level, the best training score on the preprocessed features resulted in an overall accuracy of 99%, with Picture Gorge the lowest-accuracy group (82%) and Grande Ronde, Wanapum, and Saddle Mtns. all having 100% accuracy on the test data (Fig. 11). Recall and precision were both 0.99. When outliers were removed, overall accuracy remained 99%; however, individual class accuracies varied slightly (Fig. 11). Notably, there appears to be a group of Steens samples that look very similar to the Grande Ronde, resulting in a drop in accuracy for the Steens class (Fig. 11). Bootstrapping (random oversampling) of the data results in higher overall accuracy but appears to overfit the data and emphasize the smaller formations in a disproportionate and misleading way.
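The row normalization described here corresponds to scikit-learn's `confusion_matrix(..., normalize="true")`, in which each row (true label) sums to 1 and the diagonal gives each class's recall, i.e., its per-class accuracy. The toy labels below are hypothetical abbreviations, not study results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["GR", "WA", "SM"]  # hypothetical abbreviations for three formations
y_true = ["GR", "GR", "GR", "GR", "WA", "WA", "SM", "SM"]
y_pred = ["GR", "GR", "GR", "WA", "WA", "WA", "SM", "SM"]

# Rows = actual label, columns = predicted label, each row divided by its total
cm = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
per_class_accuracy = dict(zip(labels, np.diag(cm)))
overall_accuracy = np.mean(np.array(y_true) == np.array(y_pred))
print(cm)
print(per_class_accuracy, overall_accuracy)
```

Reading across the first row, 25% of the "GR" samples fall under "WA", the kind of off-diagonal signal discussed for neighboring Grande Ronde members.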
This extremely high accuracy suggests a strong stratigraphic framework overall, with a few instances where members appear more closely related within their formations than hard stratigraphic boundaries would suggest. This is particularly true in the Grande Ronde Formation, which motivates our detailed analysis of that formation. With the full CRB dataset, we also carry out the training and testing procedure at the member level, with the same threshold of 85% accuracy for use on unknown samples (Fig. 12). The highest overall accuracy achieved at the CRB member level is 90%, which suggests that the model can recognize the stratigraphy well at both the member and formation levels (Fig. 12). The most easily recognizable formation (i.e., the formation with the highest accuracy and highest average accuracy per group) is the Saddle Mountains Fm. (Fig. 12). This is also the formation with the largest variation, based on the observation that many Saddle Mtns. members have large interquartile ranges and kurtosis indicative of heavy-tailed distributions, but skew indicative of reasonably symmetric distributions (i.e., no outlier groups) (Fig. 9). The Wanapum Formation is comparable to the Saddle Mtns. in the accuracy of each class (Fig. 12). The boundary between the Upper and Lower Steens Formation (Jarboe et al., 2010; Moore et al., 2018) appears to be challenging for the algorithm to recognize, indicating a more gradual transition in chemistry (Fig. 12). The most disagreement and the lowest accuracy for individual groups come from the Grande Ronde Fm. (Fig. 12). Within this formation, some samples tend to group with their neighbors and show some overlap between the known stratigraphic categories (Fig. 12).
For example, the three members prior to the Wapshilla Ridge member (Mount Horrible, Cold Springs, and Hoskin Gulch) all have samples whose maximum probabilities place them in the Wapshilla Ridge category rather than in their own respective categories, which signals overlap in the multidimensional chemistry of samples originally assigned to different stratigraphic classes (Fig. 12). This is not a unique occurrence within the Grande Ronde; when the formation is modeled individually, the highest accuracy achieved for Grande Ronde Fm. member-level classification is 87% (Fig. 13). This overall accuracy, lower than for other formations in the CRB, suggests that there are commonalities among the chemistries of several Grande Ronde members and warrants further investigation through unsupervised clustering to understand the relationships between the classes (Fig. 13).

Figure 12 Member level MLR supervised classification on the full CRB dataset, preprocessed by ratio combinatorics and a Box-Cox power transform. Normalized confusion matrices are read across rows. Each row is the actual label, and values in the boxes show the percentage of each category that is classified into that predicted stratigraphic label. A perfect output would only have values on the diagonal. The formations are highlighted with x and y axis labels (member names) following Table 1.

Figure 13 MLR supervised classification of Grande Ronde data. Left) The member level test on the Grande Ronde data subset with outliers included. Right) The member level test on just the Grande Ronde data with the outliers removed. Member names follow Table 1.

To improve the identification of each stratigraphic label in the supervised training model, we can add additional geologic information, such as polarity, observational data, or isotopic information (Fig. 14).
While many unknown samples do not have polarity measurements associated with them, polarity can be one of the best ways to improve model training. We encode polarity as 0 or 1, with an additional value of 2 indicating transitional polarity. When applied to the supervised learning model, this improves Grande Ronde sub-formation (member) level classification by ~5% (Fig. 14). This allows us to better test unknown samples, but only if they have associated polarity data.

Figure 14 Member level MLR supervised learning on the full CRB dataset, preprocessed with magnetic polarity included as a feature. Normalized confusion matrices are read across rows. Each row is the actual label, and values in the boxes show the percentage of each category that is classified into that predicted stratigraphic label. X and y axis labels follow Table 1.

3.4 Unsupervised Clustering to Elucidate Common Petrologic Pathways

The results of the supervised classification invite further investigation and exploration of the groupings within the main phase of the CRB. Our first question is: if we apply an unsupervised classification to the CRB at the Formation or Member level, do we recover the known stratigraphy? We will find that we do not; we therefore focus on the Grande Ronde (GR) Formation as the source of the most disagreement with the known data classes. We undertake an unsupervised clustering workflow, with a specific focus on the Grande Ronde data, to understand the commonalities and groupings that may exist within the known stratigraphy and whether these groupings may lead to insight about petrologic processes occurring during Grande Ronde emplacement.

3.4.1 Gaussian Mixture Model (GMM) Cluster Number Analysis

The first step in any clustering analysis is assessing the correct number of clusters or groupings that define the dataset.
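Appending polarity to the geochemical features is a one-line change; the sketch below follows the 0 / 1 / 2 coding described in the text, with made-up feature values. It also shows a one-hot alternative, which is not described in the text, that avoids implying in a linear model that transitional polarity (2) lies "beyond" the other two codes.

```python
import numpy as np

# Made-up geochemical features for three samples (e.g., three element concentrations)
geochem = np.array([
    [52.1, 1.9, 310.0],
    [53.4, 2.2, 450.0],
    [51.0, 1.7, 280.0],
])
polarity = np.array([0, 1, 2])  # 0 or 1 for the two polarities, 2 for transitional

# Option 1: polarity as a single numeric column, as described in the text
X_coded = np.column_stack([geochem, polarity])

# Option 2 (alternative, not from the text): one indicator column per polarity state
X_onehot = np.column_stack([geochem, np.eye(3)[polarity]])
print(X_coded.shape, X_onehot.shape)
```

If a Box-Cox step is kept, it would need to be applied to the geochemical columns only, because Box-Cox requires strictly positive inputs and the polarity codes include 0.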
Here we use the simple method of choosing the smallest number of clusters that achieves low AIC and BIC scores, such that adding another cluster would reduce the scores less than removing one would increase them. This is informally referred to as the elbow method for picking a cluster number from an AIC and BIC score analysis, as the correct number of clusters is presumed to sit at the “elbow” of the plot, where the change in slope is largest (Dangeti, 2017; Hastie et al., 2001). When this analysis is carried out at the formation level for the CRB, to find the smallest number of clusters that gives the lowest score, 6 to 9 clusters appear to be appropriate (Fig. 15A, B). This is consistent with the actual number of formations in the dataset (6) and provides an important indication that formation-level chemistry is robust in both supervised and unsupervised machine learning. Beyond the formation level, however, we can also assess how the dataset breaks down when the formations are forced into smaller categories by an increased number of clusters. Primarily, once we increase the number of clusters beyond the number of formations, the Saddle Mountains Formation is decomposed into several groups, while the main phase, which overlaps significantly, is labeled as a single group until the Grande Ronde is broken down into several groups (Fig. 16A). This is consistent with previous interpretations and observations that the majority of the variation within the entire CRB is captured by the Saddle Mountains Formation and not by the main phase (Camp, Reidel, et al., 2017; Conrey et al., 2013; Hooper, 2000; Reidel et al., 2013; Wolff & Ramos, 2013). This breakdown by formation is illustrated in Figure 17.
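The AIC/BIC scan and elbow pick can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `GaussianMixture` with its default full covariance and locating the elbow as the largest discrete second difference of the score curve; that numerical elbow rule and the helper names are assumptions rather than the study's stated criterion.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def information_criteria(X, k_values, random_state=0):
    """Fit one GMM per candidate cluster number and return (AIC, BIC) arrays."""
    aic, bic = [], []
    for k in k_values:
        gmm = GaussianMixture(n_components=k, random_state=random_state).fit(X)
        aic.append(gmm.aic(X))
        bic.append(gmm.bic(X))
    return np.array(aic), np.array(bic)


def elbow(k_values, scores):
    """k at which the score curve bends most sharply (largest second difference)."""
    s = np.asarray(scores, dtype=float)
    d2 = s[:-2] - 2.0 * s[1:-1] + s[2:]
    return list(k_values)[int(np.argmax(d2)) + 1]


# Synthetic data with three groups in two dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.5, (100, 2)) for m in (0.0, 4.0, 8.0)])
k_values = range(1, 7)
aic, bic = information_criteria(X, k_values)
print(bic.round(1), elbow(k_values, bic))
```

For a score list such as [100, 60, 20, 18, 17, 16.5] over k = 1 to 6, the scores fall steeply until k = 3 and then flatten, so this rule picks 3.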
At first, with a growing number of clusters, only the Saddle Mountains Fm. is broken into different clusters, while the main phase remains grouped together (Fig. 17). While this is an important pattern for understanding the variation, we ultimately want to know what happens to the cluster configuration as we approach the number of categories that exist in the known data labels (47).

Figure 15 A) Assessment of GMM clustering success via AIC and BIC scores for the entire CRB dataset and its 47 members. This plot elbows at 3 clusters and again at 11 clusters. B) AIC and BIC scores for the entire CRB dataset with outliers removed. The plot elbows most substantially at 5 clusters.

Figure 16 A) Assessment of GMM clustering success via AIC and BIC scores for the Grande Ronde data and its 24 associated members. The plot elbows at 5 clusters. B) AIC and BIC scores for the subset of just the Grande Ronde data with the outliers removed. The plot elbows at 3 and slightly again at 5 clusters.

As the model approaches n = 47 clusters, several formations are broken down into a number of smaller groupings consistent with the number of known member groupings in each formation. This is an unexpected result and shows that in the Imnaha, Steens, Wanapum, and Saddle Mountains, the number of smaller sub-formation groupings in the unsupervised outcome is similar or identical to the number of member groupings in the stratigraphy (Fig. 16A, B). However, the number of eventual clusters within the Grande Ronde is only 11, compared to the 24 members in the defined stratigraphy. Figure 17 shows the known number of members for each formation (dashed lines) and the clustering breakdown of each formation as the overall CRB is divided further and further. As we approach 47 total clusters, each formation is broken down into the correct number of clusters (Fig. 17). The only formation for which this is not the case is the Grande Ronde (Fig. 17).
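Counting how many clusters each formation is split into can be sketched with the standard library. The rule that a cluster only counts when it holds more than 10 of a formation's samples mirrors the threshold mentioned in the Discussion but is otherwise an assumption; the labels below are toy values and `clusters_per_group` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def clusters_per_group(groups, clusters, min_samples=10):
    """Number of clusters holding more than min_samples samples of each group."""
    counts = defaultdict(Counter)
    for group, cluster in zip(groups, clusters):
        counts[group][cluster] += 1
    return {
        group: sum(1 for n in per_cluster.values() if n > min_samples)
        for group, per_cluster in counts.items()
    }


# Toy example: 30 samples from each of two formations
groups = ["GR"] * 30 + ["SM"] * 30
clusters = [0] * 25 + [1] * 5 + [2] * 12 + [3] * 12 + [4] * 6
print(clusters_per_group(groups, clusters))  # {'GR': 1, 'SM': 2}
```

Repeating this count for each user-specified number of GMM components, and comparing it with the number of members per formation, reproduces the kind of comparison drawn against the dashed lines in Figure 17.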
The Grande Ronde is known to have 24 different member classes; however, with 47 total clusters, only 11 groupings are broken out within the Grande Ronde (Fig. 17). Similar to the results of the supervised training, we see that the Grande Ronde tends to group together more than the stratigraphic boundaries might indicate (Figs. 16A, B, 17). Because the Grande Ronde appears to exhibit the most non-uniqueness at the member level within the CRB, we focus our efforts on understanding variation within the Grande Ronde specifically. There are 24 members of the Grande Ronde Formation defined in the stratigraphy (Fig. 1). However, AIC and BIC analysis of the GMM results indicates that three clusters are appropriate for breaking up the data (Fig. 15). Further breakdown into more clusters does not offer much improvement in either score and appears simply to break one group into smaller groups while leaving the rest of the stratigraphy alone (Fig. 16). We can therefore assess the compositions of a three-cluster configuration in the Grande Ronde and its petrogenetic implications.

Figure 17 GMM output sorted by the Formation of the original samples, as a function of the user-specified number of clusters. The dashed lines show the actual number of members within each labeled formation. The far right is 47 clusters, the number of Member labels in the CRB database.

3.4.2 Grande Ronde Petrologic Affinities

The average composition of each cluster in the best Grande Ronde configuration, with n = 3 component clusters for the GMM, is plotted in Figure 18. We compare these compositions through normalization to the average Imnaha composition (Fig. 18) and the average MORB composition. When outliers are removed from the Grande Ronde dataset, three clusters are identified as the most efficient and descriptive clustering configuration (Fig. 16A). Most of the data in the main phase Grande Ronde sit in one cluster, as is evident in the PCA biplot of the Grande Ronde data that includes the clusters (Fig. 10B, D).
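The normalization described here can be sketched as dividing each cluster's mean composition by a reference composition. The reference vector and data are toy numbers standing in for the average Imnaha composition and Grande Ronde samples, and `normalized_cluster_means` is a hypothetical helper. Values above 1 indicate enrichment relative to the reference and values below 1 indicate depletion.

```python
import numpy as np


def normalized_cluster_means(X, clusters, reference):
    """Mean composition of each cluster divided element-wise by a reference composition."""
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    ids = np.unique(clusters)
    means = np.vstack([X[clusters == k].mean(axis=0) for k in ids])
    return ids, means / np.asarray(reference, dtype=float)


# Toy data: four samples, two elements, two clusters
X = [[2.0, 10.0], [4.0, 30.0], [6.0, 5.0], [8.0, 15.0]]
clusters = [0, 0, 1, 1]
reference = [3.0, 20.0]  # stand-in for an average parental composition

ids, ratios = normalized_cluster_means(X, clusters, reference)
print(ids, ratios)
```

In practice the cluster labels would come from a fitted `GaussianMixture`'s `predict` method; plotting the ratios on a logarithmic axis makes equal enrichments and depletions appear symmetric.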
In this plot, we see a group of outliers described by principal component 2 (PC2), but the rest of the Grande Ronde data are squarely defined by principal component 1 (PC1) in a single trend through the origin. However, though there is one main cluster, when we highlight that variation (Fig. 10B) we can see that the main cluster can be divided into 2-3 groupings. This is consistent with the unsupervised analysis, which suggested that 3 components appropriately describe the data. Compared to the average Imnaha composition, which has been hypothesized as a parent basalt composition for the Grande Ronde (Wolff & Ramos, 2013), we see a few striking departures in the cluster compositions. We immediately note that these patterns are hosted in the same elements that the statistical regime diagram highlighted as critical to understanding the variation and discerning groupings (Fig. 7A) and that are highlighted by the PCA biplots (Figs. 8, 10B). This is significant because it shows agreement in the assessment of the data among statistical, unsupervised, and supervised machine learning tactics, despite the distinct inputs and algorithms used by each method. All clusters in the Grande Ronde are significantly enriched in K2O, Ba, and Rb relative to the Imnaha and strongly depleted in Ni, Cr, Nb, and Cu (Fig. 18A). Two of the clusters are also slightly depleted in MgO (Fig. 18A). Relative to an average MORB composition, the Grande Ronde is strongly enriched in K2O, Ba, and Rb and slightly enriched in Sr and Nb (Fig. 18A). Cluster 0 has the composition closest to the Imnaha basalt, while clusters 1 and 2 deviate in their trace element compositions by orders of magnitude from the Imnaha basalt composition (Fig. 18A). Such deviations hint at two petrologic processes.
Depletions in Cr and Ni are often caused by fractional crystallization of mafic phases such as clinopyroxene and olivine, while enrichments in incompatible elements in mafic magmas often indicate interaction between the basalt and another, highly evolved melt (White, 2013).

Figure 18 GMM cluster composition, assuming 3 clusters. A) The average composition of each cluster normalized by the average composition of the Imnaha Formation. B) Bivariate plot of normalized Ba vs Rb compositional data, colored by cluster association.

Further detailed inspection also reveals important patterns in the cluster compositions relative to one another, not just relative to the Imnaha basalt. Cluster 0 is defined by relative enrichments in MgO, CaO, Ni, Cr, and Cu and depletions in more incompatible elements (Fig. 18A, B). Cluster 1, the most populated group, is distinguished from the other clusters mainly by the largest depletion in Cr and the largest enrichments in Ba, Rb, and Zr (Fig. 18A, B). However, cluster 2, very similar in composition to cluster 1, shows less depletion in Cr while showing all of the enrichment in trace elements and more incompatible elements (e.g., K2O) (Fig. 18A). We can then place these clusters back into their stratigraphic context (Fig. 19). This allows us to relate these variations in geochemistry to the timing, volume, and general dynamics of this massive eruption. The following section details the preliminary interpretation of these three clusters and the overall results of the machine learning analysis.

Figure 19 Three-cluster breakdown of each Grande Ronde sample, colored by stratigraphic member label on the y-axis vs the GMM cluster label on the x-axis. This is compared to the volume of the different Grande Ronde members on the right (Reidel et al., 2013), with comparisons of cluster 2 in red and of cluster 0 in blue.
4 Discussion

The primary goals of this study are to apply a machine learning workflow to Columbia River Basalt lava geochemistry, building models to (1) objectively assess the established stratigraphy and classify unknown samples, and (2) fingerprint the underlying petrologic processes associated with magma ascent by investigating geochemical patterns independent of eruptive order. To accomplish this, we built both a Multinomial Logistic Regression model (supervised ML) and a Gaussian Mixture Model (unsupervised ML), validated with synthetic data that mimic the structure of a newly assembled CRB dataset spanning major and trace element analyses. The following discussion has two main components. First, we discuss the implications of the supervised and unsupervised output for the robustness of the CRB stratigraphy, and the prospects for classifying future samples, including intrusive rocks from widely exposed dikes. Second, we discuss the implications of our unsupervised results for petrologic evolution and magma transport during the CRB, particularly the Grande Ronde Formation. To interpret clusters in terms of petrologic processes, we bring in an additional tool, the equilibrium thermodynamic Magma Chamber Simulator (Bohrson et al., 2014; Bohrson & Spera, 2001), to study how RAFC processes under idealized scenarios may affect magma chemistry. Finally, we compare the interpreted clusters according to their stratigraphic position and discuss implications for time-evolving magma transport within the main phase of the CRB.

4.1 Implications of supervised classification and unsupervised clustering for the Stratigraphy of the CRB lavas

Within the supervised confusion matrices, there is >95% accuracy in category identification in the Wanapum and Saddle Mountains as well as >90% accuracy in identification of the Steens and Imnaha Formations (Fig. 11). However, within the Grande Ronde a larger pattern of disagreement occurs, with an accuracy around 87% (Figs. 12, 13, 14). Several members are misclassified as neighboring members, while others are categorized across the stratigraphy (Figs. 12, 13, 14). The first pattern of disagreement potentially indicates chemical similarities between stratigraphic neighbors, such as Mt. Horrible and the Wapshilla Ridge member (Fig. 12). These classify with more than 30% disagreement from the original stratigraphic labels (Fig. 12). A plausible implication is that these members are genetically related. For calculating the volumes associated with different members, this could increase volume estimates where members have overlapping chemistry. For example, if the Mt. Horrible and Wapshilla Ridge members were categorized as a single member, the volume would increase by around 20%, from 40,250 km³ to an immense volume of >48,000 km³ (Reidel, 2015). The relative failings of our MLR model on the Grande Ronde (keeping in mind an overall member-level accuracy of 90% throughout the CRB), and the groupings that instead arose within this part of the stratigraphy, suggest more variation in erupted lavas than is simply described by the existing stratigraphic framework. That predicted labels cross the stratigraphy to group members together suggests common processes occurring at multiple times within the stratigraphy (Fig. 12). Where this occurs, we can investigate which processes might be inducing the signal based on elemental variation and comparison to known variations. The supervised classification tool also offers a probabilistic framework for quantifying matches. As shown by Figure 5, classification probabilities that are large (but not the largest) indicate other members with similar major and trace element compositions. This provides a starting point for future critical evaluation of the stratigraphy. Here we use it simply as a guide for confidence in classification, and we suggest that it has great utility for the assignment of future samples into the CRB stratigraphy.
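The "large but not the largest" probabilities can be pulled out of the `predict_proba` output of any fitted scikit-learn classifier. The sketch below is a hedged illustration with a hand-written probability matrix; the member names serve only as example labels, the numbers are invented rather than model results, and `top_two` is a hypothetical helper.

```python
import numpy as np


def top_two(proba, class_names):
    """For each sample, the most and second-most probable classes with their probabilities."""
    proba = np.asarray(proba, dtype=float)
    order = np.argsort(proba, axis=1)[:, ::-1]  # class indices, most probable first
    return [
        (class_names[row[0]], proba[i, row[0]], class_names[row[1]], proba[i, row[1]])
        for i, row in enumerate(order)
    ]


class_names = ["Mt. Horrible", "Wapshilla Ridge", "Grouse Creek"]  # example labels only
proba = [
    [0.55, 0.40, 0.05],  # invented: a close second choice
    [0.10, 0.20, 0.70],
]
for match in top_two(proba, class_names):
    print(match)
```

With a fitted model, `model.predict_proba(X_new)` and `model.classes_` supply the two inputs; a small gap between the first and second probabilities marks the samples whose assignment is least certain.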
We additionally provide tools for analyzing the probability assignments across the entire population (histogram plots) and for each individual sample (circular barplots) (Fig. 5). We can also compare the unsupervised cluster assignments to the original labels and to the supervised outcomes (Figs. 20, 21).

Figure 20 Unsupervised cluster output for the CRB database against the actual formation ID of each sample. Sample numbers in each category are plotted in the histograms. Formation ID labels on the y axis follow Table 1. Cluster labels on the x axis are arbitrary.

During the AIC and BIC assessments of the Gaussian Mixture Model clustering, we found that many of the samples cluster with their correct stratigraphic grouping, especially at the formation level (Fig. 20). However, the number of geochemical clusters (with more than 10 samples) within the Grande Ronde appears to be smaller than the number of stratigraphic labels (Figs. 15, 16, 17, 20). This reflects geochemical commonalities among members of the Grande Ronde Formation (Figs. 10, 18, 20), which our statistical results attribute principally to variation in specific elements such as Cr, MgO, Rb, and Ba (Fig. 10B). The only samples that do not follow the overall trend in these elements, plotting more than three standard deviations from the average basalt composition, are a group of outliers with highly enriched trace element concentrations. This small number of samples (<0.5% of the dataset) deviates considerably from the rest of the data. We therefore consider the possibility that the variation in these samples was imparted during post-eruption cooling or alteration processes that the rest of the erupted magma did not experience, a hypothesis strengthened by the concentration of that variation in fluid-mobile and incompatible elements (e.g., very high Rb concentrations).
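A simple way to express the three-standard-deviation screen described above is to compute per-element z-scores and flag any sample that exceeds the threshold in at least one element. The sketch below assumes log10-transformed concentrations and a per-element (rather than multivariate) criterion; neither choice is specified in the text, `flag_outliers` is a hypothetical helper, and the data are invented.

```python
import numpy as np

def flag_outliers(X, n_sigma=3.0):
    """Boolean mask of samples lying more than n_sigma standard deviations
    from the dataset mean in at least one element.

    X is an (n_samples, n_elements) array of positive concentrations. The
    log10 transform is an assumption, used because trace element
    concentrations are typically right-skewed.
    """
    logX = np.log10(np.asarray(X, dtype=float))
    z = (logX - logX.mean(axis=0)) / logX.std(axis=0, ddof=1)
    return np.any(np.abs(z) > n_sigma, axis=1)

# Hypothetical data: 999 ordinary basalts (columns: Rb, Ba, Nb in ppm)
# plus one sample with anomalously high Rb appended as the last row.
rng = np.random.default_rng(42)
X = rng.lognormal(mean=np.log([30.0, 300.0, 5.0]), sigma=0.1, size=(999, 3))
X = np.vstack([X, [600.0, 300.0, 5.0]])
mask = flag_outliers(X)
```

Because the outliers themselves inflate the mean and standard deviation, a robust variant (median and median absolute deviation) is a reasonable alternative when the anomalous group is larger.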
Figure 21 Unsupervised labels vs. the member ID in the original dataset. Sample numbers in each category are plotted in the histograms. Member identification labels on the y axis follow Table 1; cluster labels on the x axis are arbitrary. Colors on the y axis indicate formations.

Within the Grande Ronde, many stratigraphic members share common geochemical characteristics and patterns of variation and are therefore difficult to separate into groups based solely on geochemistry (Figs. 12, 21). Stratigraphic boundaries are based on geochemistry as well as mapping, paleomagnetic polarity measurements, and lithologic observations. The geochemical grouping revealed by our GMM analysis does not necessarily suggest that the Grande Ronde stratigraphic boundaries are incorrect, but it likely implies that common petrogenetic processes acted on multiple stratigraphic members that were probably separated in time (Fig. 21). That the 24 members of the Grande Ronde were grouped into 11 clusters in an unsupervised, member-level analysis of the entire CRB (Figs. 20, 21), while other formations retain roughly the correct number of clusters, provides strong initial evidence for this interpretation. Combining the MLR classification with unsupervised GMM clustering, we can make stronger statements still: (1) the stratigraphy can be robustly identified at the formation level, and at the member level everywhere except the Grande Ronde; and (2) within the Grande Ronde, a small number of clusters (~3) appear to describe the similarities and differences between stratigraphic members (Figs. 18A, 19). Inspection of the individual cluster compositions revealed strong differences among clusters in the behavior of high-variation elements, and a departure from the composition of the formation stratigraphically beneath the Grande Ronde (the Imnaha). Comparing the cluster compositions with each other, however, is even more illuminating of process than the general comparison of all Grande Ronde lava clusters to the Imnaha Formation (Fig. 18).
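The number of GMM components can be chosen by fitting models with an increasing number of components and keeping the one that minimizes the Bayesian (or Akaike) information criterion. The sketch below assumes scikit-learn's `GaussianMixture`, whose `bic` and `aic` methods return these criteria, full covariance matrices, and synthetic data with three well-separated groups; `select_n_components` is a hypothetical helper. It then discards clusters with fewer than 10 samples, following the robustness threshold used in this chapter.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic stand-in: three well-separated groups in five "element" dimensions.
centers = np.array([[0.0] * 5, [10.0] * 5, [-10.0, 10.0, -10.0, 10.0, -10.0]])
X, _ = make_blobs(n_samples=450, centers=centers, cluster_std=1.0, random_state=1)

def select_n_components(X, k_max=6, criterion="bic", seed=0):
    """Fit GMMs with 1..k_max components; return the k minimizing BIC (or AIC)."""
    scores = {}
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(
            n_components=k, covariance_type="full", n_init=3, random_state=seed
        ).fit(X)
        scores[k] = gmm.bic(X) if criterion == "bic" else gmm.aic(X)
    return min(scores, key=scores.get), scores

best_k, bic_scores = select_n_components(X)
labels = GaussianMixture(n_components=best_k, n_init=3, random_state=0).fit_predict(X)

# Clusters with fewer than 10 samples are treated as not robust.
sizes = np.bincount(labels, minlength=best_k)
robust_clusters = np.flatnonzero(sizes >= 10)
```

BIC penalizes extra parameters more heavily than AIC for datasets of this size, so it tends to favor fewer clusters; comparing both is a cheap check on how sensitive the cluster count is to that choice.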
The first cluster, most strongly identified in the Meyer Ridge and Sentinel Bluffs members, is the closest in chemistry to the Imnaha basalt and is considerably more enriched in compatible components such as Cr and MgO than the other two clusters (Figs. 18, 19). The other two clusters appear closely related, sharing enrichments in the same suite of incompatible elements (Fig. 18). The central difference between the most populated cluster and its relative, however, is that in most of its samples this incompatible element enrichment is accompanied by a depletion in compatible elements (Fig. 18). The less populated cluster, with stronger incompatible element enrichment, becomes the dominant cluster at several points in the CRB stratigraphy, all but one of which correspond to local maxima in member volume (Figs. 18, 19). We attempt to explain this variation between clusters in the following section.

If the variation within the data were entirely the result of distinct storage zones or mantle melt source variability, we would expect this analysis to reveal a number of clusters close to the number of members defined in the stratigraphy, with geochemical variations identifiable by both the unsupervised and supervised algorithms. At the formation level, this appears to be the case. Each formation is identifiable by the supervised and unsupervised algorithms, in agreement with the known labels; at the formation level, variations are most likely related to variations in mantle melt supply and source (Black et al., 2021). For the Grande Ronde, however, the aggregate ML analysis suggests variations within the crustal transport network under a regime of high and prolonged mantle influx.
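One way to make the cluster-to-cluster comparison concrete is to divide each cluster's mean composition by a reference composition (here a stand-in for the average Imnaha basalt) and rank elements by how much their log-normalized values differ among clusters. The functions `normalized_centroids` and `most_divergent_elements` are hypothetical helpers, and all concentrations below are invented for illustration.

```python
import numpy as np

def normalized_centroids(X, labels, reference):
    """Mean composition of each cluster divided by a reference composition."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in cluster_ids])
    return cluster_ids, centroids / np.asarray(reference, dtype=float)

def most_divergent_elements(ratios, element_names, top=2):
    """Rank elements by the spread of their log10 normalized values across clusters."""
    log_r = np.log10(ratios)
    spread = log_r.max(axis=0) - log_r.min(axis=0)
    order = np.argsort(spread)[::-1][:top]
    return [element_names[i] for i in order]

# Hypothetical numbers for illustration only (MgO in wt%, others in ppm).
elements = ["MgO", "Cr", "Rb", "Ba", "Zr"]
reference = [6.0, 150.0, 10.0, 300.0, 150.0]  # stand-in for an average Imnaha composition
X = [[6.0, 140.0, 11.0, 320.0, 160.0],
     [6.0, 140.0, 11.0, 320.0, 160.0],
     [3.5, 20.0, 35.0, 700.0, 190.0],
     [3.5, 20.0, 35.0, 700.0, 190.0],
     [4.5, 60.0, 40.0, 900.0, 200.0]]
labels = [0, 0, 1, 1, 2]
ids, ratios = normalized_centroids(X, labels, reference)
print(most_divergent_elements(ratios, elements))  # ['Cr', 'Rb']
```

Working with log ratios puts depletions and enrichments of the same factor on an equal footing, which matters when comparing a compatible element such as Cr with incompatible elements such as Rb and Ba.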
To interpret the petrologic implications of these relative enrichments and depletions in the Grande Ronde from the perspective of variable crustal differentiation, we use petrologic modeling to explain qualitative differences in the data.

4.1.1 Petrogenesis of the Grande Ronde Basalts

We now address why the number of unsupervised clusters may differ from the number of established stratigraphic groupings in the Grande Ronde Formation. Our unsupervised clustering is built solely on geochemical data, so each cluster represents a group of samples with similar geochemistry. This can help us find geochemical relationships between samples and stratigraphic classes. To understand which processes may be separating the clusters, we first identify which features (elements or ratios of elements) are responsible for the cluster distinctions, and then use geochemical modeling to determine which processes might affect those specific elements. We use a petrologic modeling tool, the Magma Chamber Simulator, to explore the variation produced by idealized differentiation scenarios within a magma body (Bohrson et al., 2014; Bohrson & Spera, 2001), and compare the modeled variation to that observed in the data.

It is first important to establish that several existing hypotheses explain Grande Ronde compositions. One hypothesis posits that assimilation and fractional crystallization (AFC) processes are minor, less than 10% (Camp, Reidel, et al., 2017). This requires that the source composition be close to the composition of the magmas we observe at the surface. To accomplish this, the mantle source must be a pyroxenite rather than a peridotite (Takahashi et al., 1998; Wright et al., 1973). This clinopyroxene-rich source is suggested to have come from plume-triggered delamination, in which mafic lower crust detached and melted entirely because of its relatively low melting temperature.
In this scenario, delamination causes a large volume of melt to be generated quickly and transported to the surface without time to crystallize fractionally (Camp & Hanan, 2008). This is consistent with the aphyric character of the Grande Ronde Basalt, whereas AFC-heavy models would require long storage times for huge volumes and, if storage occurred in a central magma chamber, a mechanism to quickly remove any crystals that formed during that time. Some have also suggested that assimilation of even 10% granite would cause the magma to cool rapidly, increase its viscosity substantially, and produce a much higher crystal content (Camp, Reidel, et al., 2017).

Figure 22 Compositions of partial melt created during various AFC experiments with the geochemical Magma Chamber Simulator (Bohrson et al., 2014). The Idaho Batholith is run under two scenarios, one with recharge and one without (Gaschnig et al., 2011).

A more recent hypothesis instead posits that the Grande Ronde requires a significant amount of assimilation. Wolff et al. (2008) suggest, based on isotopic evidence, that the assimilant was the Idaho batholith, and that incorporating 10–60% crustal partial melt, through efficient partial melting of crust and mixing with Imnaha basalt, can produce magmas with Grande Ronde compositions. They also suggest that this provides a framework for magma transport through the dike system. Wolff and Ramos (2013) conducted petrologic modeling to support this hypothesis, starting with a parent magma similar in composition to the average Imnaha basalt, and showed that basaltic andesite can arise through AFC processes at different depths. We follow this approach, experimenting with different wallrock types, tracking every element in our dataset, and varying the relative contributions of RAFC processes. This allows us to compare the modeled variation qualitatively, element by element, with the variation we see among the clusters.
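The simplest form of the Wolff et al. (2008) scenario is two-component mass-balance mixing, in which the concentration of each element in the hybrid melt is C_mix = (1 - f) * C_basalt + f * C_crust for a crustal partial melt mass fraction f. The sketch below evaluates this across part of the 10–60% range. The helper `mix` and the end-member concentrations are hypothetical, and simple mixing ignores the heat budget and crystallization that the energy-constrained MCS accounts for.

```python
import numpy as np

def mix(c_basalt, c_crustal_melt, f):
    """Concentrations of a two-component mixture, one row per value of f.

    f is the mass fraction of crustal partial melt (0-1). Both end members
    must use the same units (wt% or ppm), and mixing is linear by mass.
    """
    f = np.atleast_1d(np.asarray(f, dtype=float))[:, None]
    return (1.0 - f) * np.asarray(c_basalt, dtype=float) + f * np.asarray(c_crustal_melt, dtype=float)

# Hypothetical end members (Rb, Ba, Cr in ppm), for illustration only.
basalt = [10.0, 250.0, 200.0]
crustal_melt = [150.0, 1200.0, 5.0]
fractions = [0.1, 0.3, 0.6]
mixtures = mix(basalt, crustal_melt, fractions)
```

Even this crude calculation shows why Rb and Ba respond strongly to crustal input while Cr is mainly diluted: the mixture moves linearly toward whichever end member dominates each element.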
Figure 23 A) Calculated percent change in each element during the time steps of the model run that included each process, for the scenarios involving intrusion of the Imnaha parent into Idaho batholith wallrock, including only AFC and FC processes. B) Percent change including instances of recharge by a parent magma assumed to have Imnaha composition.

Figure 24 Summary of the percent change signals for each process (FC, AFC, and RFC) in Fig. 23. Arrows show the elements most affected by each process in the MCS simulations from Fig. 23 and therefore act as a guide to which elements are most likely to record these processes in the melt chemistry.

The Magma Chamber Simulator (MCS) is a geochemical calculator that helps predict how different processes affect the composition of the melt in a magmatic system (Bohrson et al., 2014). The MCS is built on the MELTS thermodynamic calculator, which is calibrated on a large database of experiments to predict the crystallizing phases in a magma at different pressures and temperatures (Bohrson et al., 2014; Ghiorso & Sack, 1995). The MCS extends MELTS by combining Gibbs free energy minimization with an enthalpy balance to capture the effects of assimilation and recharge on magma composition (Bohrson et al., 2014; Bohrson & Spera, 2001). These energy-constrained phase equilibria calculations allow us to quantify how different processes shape the geochemical character of a system and produce the lava compositions we observe at the surface. MCS simulations are conducted as follows: we model intrusion of a parental basalt composition, close to that of the average Imnaha basalt, into batholithic crust (Figs. 22, 23, 24). Across the different runs, this magma is intruded at pressures from 2000 to 10,000 bars (model runs in Fig.
23 were run at a pressure of 5000 bars) into a specified country rock composition, with a wallrock-to-basalt mass ratio of 2:1. In each scenario, the magma is instantaneously emplaced into the crust and cools in 20°C increments starting at 1400°C. Heat is immediately transferred to the wallrock, which begins to melt partially once it reaches its solidus temperature. In each simulation, the wallrock partial melt is assimilated (homogenized) into the resident melt once the wallrock reaches a melt fraction of 0.05. The simulation ends when the magma and wallrock reach the same temperature or when the magma reaches 850°C, whichever comes first. In runs that incorporate recharge, once the melt cools to a specified temperature (here 1100°C), magma with the composition of the Imnaha parent is added to the melt in a proportion of 50% by mass. While assimilation and fractional crystallization occur gradually over many time steps, recharge is modeled as a single event that imparts an immediate but short-lived signal (unlike in a real system, where recharge may be more sustained). Once added, the recharging magma is considered immediately well mixed, and the system resumes AFC from a more primitive composition than in the previous steps. We tested four wallrock compositions at depths ranging from the lower to the shallow crust and report the results for the mid to shallow crust: an average middle continental crust (granodiorite) (Rudnick & Gao, 2003), an average upper continental crust (granite) (Rudnick & Gao, 2003), the unmelted composition of tonalite from the Wallowa Mtns. (Petcovic & Dufek, 2005), and the average composition of the Atlantic pluton within the Idaho Batholith (Gaschnig et al., 2011) (Fig. 22).
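The cooling schedule and recharge step described above can be sketched as simple bookkeeping: a sequence of 20°C temperature steps and an instantaneous mass-balance mix when recharge is added. This is not the MCS, which solves phase equilibria and an enthalpy balance at every step. We assume here that "50% by mass" means a recharge mass equal to half of the resident melt mass at the recharge step; the helper `recharge_mix` and all masses and concentrations are hypothetical.

```python
import numpy as np

def recharge_mix(melt_mass, c_melt, recharge_mass, c_recharge):
    """Mass-balance concentrations after instantaneous, complete mixing of recharge."""
    total = melt_mass + recharge_mass
    c_new = (melt_mass * np.asarray(c_melt, dtype=float)
             + recharge_mass * np.asarray(c_recharge, dtype=float)) / total
    return total, c_new

# Cooling steps: 20 degC decrements from 1400 degC, keeping every step at or
# above the 850 degC stopping temperature (the alternative stop, thermal
# equilibrium with the wallrock, is not modeled here).
temperatures = np.arange(1400, 849, -20)
recharge_step = int(np.flatnonzero(temperatures == 1100)[0])

# Hypothetical state at the recharge step: 0.7 mass units of melt with
# Cr = 50 ppm and Rb = 30 ppm, plus recharge equal to 50% of that melt mass.
total, c_after = recharge_mix(0.7, [50.0, 30.0], 0.5 * 0.7, [200.0, 10.0])
```

Because the recharge magma carries more Cr and less Rb than the evolved melt, the mix raises Cr and lowers Rb, the same direction as the recharge signal in the MCS runs (Figs. 23, 24).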
Primarily these experiments test AFC processes, but we also study the potential influence of recharging mafic magma similar in composition to the Imnaha-like parent (Bohrson et al., 2014; Bohrson & Spera, 2001). We report the average composition of the partial melt produced in the assimilation simulations for each wallrock at 5000 bars (~17 km depth) (Figs. 22, 23), as this run had the largest contribution from all three signals, including recharge and assimilation. Some runs had minimal assimilation or were dominated by the fractional crystallization signal, but the run at 5000 bars shows all three processes occurring (Fig. 23). We calculate the percent change in each element in the melt during the processes of fractional crystallization (FC), assimilation and FC (AFC), and recharge and AFC (RAFC) (Bohrson et al., 2014; Bohrson & Spera, 2001). The outcome of the percent change calculations is shown in Figure 23, and it allows us to further constrain the possible processes occurring within the magmatic system.

The effect of the different wallrock assimilant compositions is particularly evident when we plot the composition of the partial melt produced in each wallrock scenario (Fig. 22). All scenarios show very similar elemental behavior, differing in the relative proportions of the trace element input. In all cases the partial melt is enriched in incompatible elements (Fig. 22). However, several distinctions separate the different experiments. The Idaho Batholith partial melt is depleted in Cr, V, and Ni relative to the other partial melts (Figs. 22, 23). It also has Ba and Rb concentrations orders of magnitude higher than the Wallowa partial melt or the partial melts generated from generalized crustal compositions (Fig. 22). Partial melting of the Wallowa tonalite produces a marked enrichment in Zr, but less partial melting overall and less enrichment in other incompatible elements (Fig. 22).
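The text does not spell out how the percent change was accumulated over the time steps assigned to each process. One reasonable reading, sketched below, compounds the step-to-step concentration ratios over all steps tagged with a given process, so that a single contiguous interval reduces to 100 * (C_end / C_start - 1). The function `percent_change_by_process`, the per-step labels, and the example concentrations are assumptions for illustration.

```python
import numpy as np

def percent_change_by_process(conc, process):
    """Compounded percent change of each element over the steps tagged with each process.

    conc: (n_steps, n_elements) melt concentrations along one model run.
    process: n_steps labels such as "FC", "AFC", or "RAFC"; label i describes
        the change from step i-1 to step i, so process[0] is not used.
    """
    conc = np.asarray(conc, dtype=float)
    step_ratio = conc[1:] / conc[:-1]  # concentration ratio across each step
    labels = np.asarray(process)[1:]
    return {
        str(p): 100.0 * (np.prod(step_ratio[labels == p], axis=0) - 1.0)
        for p in np.unique(labels)
    }

# Hypothetical Cr and Rb concentrations (ppm) along a short run.
conc = [[200.0, 10.0], [150.0, 11.0], [100.0, 12.0], [130.0, 11.0], [90.0, 12.5]]
process = ["FC", "FC", "FC", "RAFC", "AFC"]
changes = percent_change_by_process(conc, process)
```

Compounding rather than summing step percentages keeps the result independent of how finely the run is discretized, which matters when processes span different numbers of 20°C steps.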
Percent change calculations for each process show that, in every simulation, fractional crystallization causes a strong depletion in Cr and weaker depletions in MgO and CaO (Figs. 23, 24). At the same time, it increases the concentrations of many of the more incompatible trace elements, though not to the same degree as the addition of assimilant (Figs. 23, 24). This is consistent with crystallization of clinopyroxene as the primary phase by mass, with secondary crystallization of plagioclase. Olivine and orthopyroxene also crystallize as secondary phases during these runs. Assimilation of each wallrock type increases the concentrations of Rb, Ba, Zr, Y, Nb, La, Ce, Nd, and Lu (Figs. 23, 24). However, the strongest trace element increase comes from assimilation of partial melt derived from the Idaho Batholith composition (Fig. 23). In this wallrock scenario, Rb and Ba increase particularly strongly in the melt (Fig. 23). Notably, this appears to mirror the signal we see in the actual Grande Ronde data and in the cluster compositions (Figs. 18, 19). Assimilation of the Wallowa batholith also increases trace element concentrations; however, it most strongly affects Zr and has the smallest effect on Ba and Rb, which our analysis identifies as the most important sources of variation within our dataset (Figs. 23, 24). This suggests that the likely assimilant is compositionally closer to the Idaho Batholith than to the Wallowa batholith. The recharge process has a very distinct effect on the melt composition compared with FC or AFC and thus creates a different signal within the data (Figs. 23, 24). Many incompatible trace element concentrations decrease, while more compatible elements such as Cr and MgO increase in the magma (Figs. 23, 24).
However, this is RFC and not recharge alone, so the Cr signal, for example, combines a decrease from fractional crystallization with an increase from addition of the recharging magma (Figs. 23, 24). If RFC were the primary process affecting our melt, we would not expect Cr to dominate the variation. Rather, we would expect strong trends toward a more primitive magma, for example decreasing incompatible trace element concentrations and increasing MgO.

Based on the above comparisons between Grande Ronde major and trace element data and the MCS simulations, our analysis suggests several important characteristics of Grande Ronde petrogenesis. First, we identify three compositional clusters in the Grande Ronde: one represents assimilation and fractional crystallization, one primarily represents assimilation of partial melt, and one is dominated by a primitive recharge signal. Second, based on the trace element enrichments in the assimilation-dominant cluster and the compositions of the partial melts created in the MCS, the primary assimilant interacting with Grande Ronde magmas appears to have been the Idaho batholith (Figs. 23, 24).

4.2 Implications for time-evolving crustal storage zones

With the compositions of the different clusters identified, we can relate the appearance of each cluster to the relative timing of these RAFC processes, both in the CRB as a whole and in the Grande Ronde, as represented in the stratigraphy. Our inferences are qualitative but set the stage for future work that integrates petrologic constraints with thermomechanical considerations of crustal magma transport. Considering the entire CRB, we first compare the formation and member labels of each chemical analysis in the database with its assigned GMM label, using a number of clusters equal to the number of members (Fig. 21).
The rationale for this choice is to assess explicitly how the compositions of clusters compare to the compositions of stratigraphic categories. At the formation level (Fig. 20), some clusters associated with the Steens Formation are identified in all later formations but are less clearly present in the Saddle Mountains. This suggests that compositional signatures of this initial mantle melt are found throughout the stratigraphy but may diminish in the waning phases of the LIP. The Grande Ronde Formation appears to be a bridge, carrying some affinity to the more primitive compositions of the Imnaha and Steens basalts and some relationship to the later, evolved members of the Saddle Mountains. The greatest expansion in cluster number occurs at the onset of the Grande Ronde, which suggests that this chemical affinity is linked to a sudden increase in storage zones (assimilant compositions) or in the diversity of mantle source compositions. This may further suggest that the main-phase formations (Picture Gorge, Steens, Imnaha, Grande Ronde, and Wanapum) share a primitive parental source, while the heterogeneity observed in the Saddle Mountains results from residual fluxing of magma through the expanded storage zones left from the Grande Ronde (Figs. 20, 21). At the member level (Fig. 21), this interpretation remains essentially unchanged: 9 different cluster labels have affinity with Steens, with samples in these clusters appearing throughout the Grande Ronde, Wanapum, and Picture Gorge members. Several members are split among many clusters, including Birch Creek (7 clusters), Wapshilla Ridge (11), Grouse Creek (5), and Sentinel Bluffs (9). This suggests contributions to these members from distinct storage zones that were perhaps mixed syn-eruptively (Mittal et al., 2021; Mittal & Richards, 2021).
Picture Gorge is split among 13 different clusters, which attests to the unclear affinity of this formation, whose timing is still not well established (Cahoon, 2020; Cahoon et al., 2020). Saddle Mountains members are more distinctly separated by cluster label, as expected from the known variability among these later basalts (Hooper, 2000; Reidel et al., 2002, 2013). Several members have all of their constituent samples within a single GMM cluster: Buford, Weippe, Esquatzel, Asotin, Roza, Field Springs, and Indian Ridge. This suggests that these members might be confidently linked to a single storage origin or petrologic pathway. Conversely, no GMM cluster is uniquely associated with a single member, except for clusters containing few samples. Because we consider any cluster with <10 samples not robust, this absence of member-specific clusters indicates the general trend towards merging within the Grande Ronde. Overall, our interpretation at both the formation and member level is of a sudden expansion of crustal differentiation pathways at the onset of the Grande Ronde, which likely indicates a fundamental transition in the mode of crustal transport associated with increased flux and thermal/rheologic priming of the crust (Black et al., 2021; Karlstrom et al., 2017).

We then apply the GMM solely to the Grande Ronde Formation, for which Figures 18 and 19 suggest that a very small number of clusters (3) characterizes most of the geochemical variation. The cluster with the largest population of samples (section 4.4.2 results, Fig. 19) appears to be characterized by consistent AFC processes. This signal is present in most members of the Grande Ronde. Conversely, the other two clusters appear at more distinct times across the stratigraphy (Fig. 19). Within this pattern, samples from individual members generally group together, with the notable exceptions of the Wapshilla Ridge and Sentinel Bluffs members, both of which have strong multi-cluster identities (Fig. 19).
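The member-level statements above (how many clusters each member spans, which members fall entirely within one cluster, which cluster dominates each member, and which clusters are too small to be robust) all follow from a cross-tabulation of member labels against GMM labels. The standard-library sketch below computes these summaries; `cluster_membership` is a hypothetical helper, and the member names and cluster assignments are placeholders rather than CRB results.

```python
from collections import Counter, defaultdict

def cluster_membership(members, clusters, min_cluster_size=10):
    """Summarize how stratigraphic members distribute across GMM clusters.

    members, clusters: parallel sequences with one entry per analysis.
    """
    table = defaultdict(Counter)  # member -> Counter of cluster labels
    for m, c in zip(members, clusters):
        table[m][c] += 1
    cluster_sizes = Counter(clusters)
    return {
        "n_clusters_per_member": {m: len(cnt) for m, cnt in table.items()},
        "single_cluster_members": sorted(m for m, cnt in table.items() if len(cnt) == 1),
        "dominant_cluster": {m: cnt.most_common(1)[0][0] for m, cnt in table.items()},
        "robust_clusters": sorted(c for c, n in cluster_sizes.items() if n >= min_cluster_size),
    }

# Placeholder members A-C (listed oldest to youngest) and cluster assignments.
members = ["A"] * 12 + ["B"] * 15 + ["C"] * 14
clusters = [1] * 12 + [0] * 8 + [2] * 7 + [0] * 6 + [2] * 5 + [3] * 3
summary = cluster_membership(members, clusters)
```

Since dictionaries preserve insertion order, passing the analyses in stratigraphic order makes `dominant_cluster` read directly as a time series of the dominant cluster through the section.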
The most primitive component, which we interpret as a recharge signal, appears to correlate with stratigraphic intervals that follow large-volume eruptions (Fig. 19). This primitive composition appears in two main episodes: first at the time of the Meyer Ridge Member (after the outpouring of the Wapshilla Ridge Member and its stratigraphic neighbors) and again around the time of the Sentinel Bluffs Member (Fig. 19). While recharge must operate continuously to keep the magma at liquidus temperatures, at these times the recharge signal appears to dominate the geochemical variation. The influence of a more primitive source inferred here for the waning Grande Ronde (Sentinel Bluffs Member) is consistent with the results of other geochemical studies of CRB variation. For example, Yu et al. (2015) observe that the end of the Grande Ronde Formation and the beginning of the Wanapum Formation are characterized by increased MgO, Ni, and Nb/La and simultaneously decreased K2O. They interpret this signal as an increase in "primitiveness" over time and infer that there may have been pulses in which mafic recharge interacted less with crustal rocks (e.g., less storage) (Yu et al., 2015). The third cluster emphasizes assimilation: it is enriched in incompatible elements and less depleted in Cr and other compatible elements than the others. This cluster appears at several points in the Grande Ronde stratigraphy, all but one of which correspond to local maxima in member volume (Figs. 18, 19). This correlation is consistent with an increase in the contact area between magma and wallrock, which raises the efficiency of assimilation, or with thermal priming of the crust (e.g., Karlstrom et al., 2010), as suggested by Wolff and Ramos (2013) to be a component of generating bulk Grande Ronde compositions.
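The observation that the assimilation cluster dominates at local maxima in member volume can be checked by locating interior local maxima in the stratigraphically ordered volume series and testing which cluster dominates there. The sketch below uses a strict comparison with both neighbors; `local_maxima` is a hypothetical helper, the volumes and dominant-cluster sequence are placeholders rather than CRB values, and cluster 2 stands for the assimilation cluster only by analogy with the labels used in the text.

```python
import numpy as np

def local_maxima(values):
    """Indices of interior points strictly larger than both neighbours."""
    v = np.asarray(values, dtype=float)
    return np.flatnonzero((v[1:-1] > v[:-2]) & (v[1:-1] > v[2:])) + 1

# Placeholder member volumes (km^3) and dominant GMM cluster per member,
# ordered from oldest to youngest.
volumes = np.array([5e3, 2e4, 8e3, 9e3, 3e4, 1e4, 4e3])
dominant = np.array([1, 2, 1, 1, 2, 0, 0])

peaks = local_maxima(volumes)
hits = dominant[peaks] == 2  # does the assimilation cluster dominate at each peak?
```

With real data, the fraction of volume peaks that coincide with the assimilation cluster could be compared against random reshufflings of the cluster sequence to judge whether the correspondence exceeds chance.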
In contrast to the recharge signal of cluster 0, cluster 2 suggests large but time-variable storage zones in the crust during distinct periods of high-volume Grande Ronde eruptions. The cluster analysis, in conjunction with the supervised labeling and petrologic modeling, revealed compositional similarities among groupings in the data, while the comparison to the original labels shows a time progression of chemical similarity through the Grande Ronde stratigraphy. By relating this progression to eruptive volume and the physical characteristics of the system, we connected our machine learning analysis to geologic processes and physical dynamics.

5 Conclusion

The rigorous testing and application of machine learning methods to whole-rock geochemistry of the CRB provided several insights. The first is that these unsupervised and supervised machine learning methods, designed for datasets larger than those geochemistry usually explores, were effective at detecting important and subtle patterns in the dataset. This provides a general foundation for future work with these methods and their application to new and unknown samples, particularly in the CRB system. This includes future publication of the classification code in an open-source repository such as GitHub, so that other researchers can apply this tool to their own CRB samples or adapt the code for a new system. We also used these machine learning methods to generate hypotheses about the processes that may have produced the whole-rock geochemical variation in this system. Through the clustering analysis and MCS interpretation, we linked this variation to relative timing within the CRB stratigraphy to parse out the signals of different processes. We found that both assimilation and recharge likely played an important role in the evolution of these magmas, especially in the Grande Ronde Formation.
While we did not fully explore the hypothesis that this variation was imparted by changing mantle sources, our comparison between the original labels and the clustering analysis of geochemical similarity provides a preliminary contribution to the debate between source and path effects. This comparison showed a trend suggesting that samples share common geochemical characteristics with their neighboring formations in the stratigraphy, and that the main phase is related to a common primitive component. The chemical connection among all main-phase flows, together with shared geochemistry between the Grande Ronde Formation and the more evolved and heterogeneous Saddle Mountains Formation, suggests that these magmas were part of a connected system rather than distinct and separate batches of melt. This may add weight to the hypothesis that the variation in geochemistry is dominated by the effects of the crustal transport and storage system rather than by dynamically changing mantle sources. The descriptive statistics, clustering analysis, and supervised classification analysis all point to the same groups of elements varying in the data, which strongly suggests that these machine learning methods have captured the important signals of variation in the data. Our modeling and further interpretation of these statistical results allowed us to interpret that variation in the context of magmatic system processes. We hope that the rigorous testing and successful application of these methods in this study provide another building block for widespread use of these methods in the petrology and geochemistry communities.

6 Bridge to Ch. 3

As geochemical datasets grow, effective multi-dimensional automated techniques will become increasingly useful.
The results of this study show the excellent performance of both supervised and unsupervised machine learning workflows on a large, high-quality whole-rock major and trace element database. The methods clearly identified groupings and samples with stratigraphic similarities. They allowed us to investigate variation that is not obvious from visual inspection of bivariate plots, and they separated the signals of process within the data and related them to the stratigraphic labels assembled by researchers over the last several decades. However, to understand the CRB beyond the whole-rock lava data, we next use these methods to investigate directly the intrusive exposures of the plumbing system that fed the CRB main-phase eruptions. The classification model will be applied to assign unknown samples to the nearest stratigraphic label. This will allow comparison between the intrusive and extrusive systems and a more direct investigation of processes that may have been occurring within the crust to vary magma composition. In the following chapter, we apply the methods demonstrated in this research to exposures of the Chief Joseph Dike Swarm in and around the Wallowa Mountains (Morriss et al., 2020). Specifically, we investigate the geochemical variation of the Maxwell Lake Dike Complex, a feeder complex for the Wapshilla Ridge Member. This gives us a more detailed picture of the processes that may have affected these magmas as they traveled from the mantle to the surface. In particular, we explore the patterns, detected in the CRB extrusive lava dataset, that suggest recharge and assimilation played an important role in the main-phase eruptions.

7 References Cited

Aitchison, J. (1983). Principal Component Analysis of Compositional Data. Biometrika, 70(1), 57–65.
Aitchison, J., Barcelo-Vidal, C., & Pawlowsky-Glahn, V. (1993). Some Comments on Compositional Data Analysis in Archaeometry, In Particular the Fallacies in Tangri and Wright's Dismissal of Logratio Analysis. 1–20.
Arrowsmith, S. J., Trugman, D. T., MacCarthy, J., Bergen, K. J., Lumley, D., & Magnani, M. B. (2022). Big Data Seismology. Reviews of Geophysics, 60(2), 1–55. https://doi.org/10.1029/2021RG000769
Audunsson, H., & Levi, S. (1997). Geomagnetic fluctuations during a polarity transition. Journal of Geophysical Research, 102(96).
Azari, A., Biersteker, J. B., Dewey, R. M., Doran, G., Forsberg, E. J., Harris, C. D. K., Kerner, H. R., Skinner, K. A., Smith, A. W., Amini, R., Cambioni, S., Poian, V. Da, Garton, T. M., Himes, M. D., Millholland, S., & Ruhunusiri, S. (2021). Integrating Machine Learning for Planetary Science: Perspectives for the Next Decade. Bulletin of the AAS, 53(4). https://doi.org/10.3847/25c2cfeb.aa328727
Baker, L. L., Camp, V. E., Reidel, S. P., Martin, B. S., Ross, M. E., & Tolan, T. L. (2019). Alteration, mass analysis, and magmatic compositions of the Sentinel Bluffs Member, Columbia River flood basalt province: COMMENT. Geosphere, 15(4), 1436–1447. https://doi.org/10.1130/GES02047.1
Barry, T. L., Kelley, S. P., Camp, V. E., Self, S., Jarboe, N. A., & Duncan, R. A. (2013). Eruption chronology of the Columbia River Basalt Group. Geological Society of America Special Paper 497, 45–66. https://doi.org/10.1130/2013.2497(02)
Black, B. A., Karlstrom, L., & Mather, T. A. (2021). The life cycle of large igneous provinces. Nature Reviews Earth and Environment, 2(12), 840–857. https://doi.org/10.1038/s43017-021-00221-4
Bohrson, W. A., & Spera, F. J. (2001). Energy-constrained open-system magmatic processes II: Application of energy-constrained assimilation fractional crystallization (EC-AFC) model to magmatic systems. Journal of Petrology, 42(5), 1019–1041.
Bohrson, W. A., Spera, F. J., Ghiorso, M. S., Brown, G. A., Creamer, J. B., & Mayfield, A. (2014). Thermodynamic model for energy-constrained open-system evolution of crustal magma bodies undergoing simultaneous recharge, assimilation and crystallization: The magma chamber simulator. Journal of Petrology. https://doi.org/10.1093/petrology/egu036
Bond, D. P. G., & Wignall, P. B. (2014). Large igneous provinces and mass extinctions: An update. Geological Society of America Special Paper 505, 29–55. https://doi.org/10.1130/2014.2505(02)
Box, G. E. P., & Cox, D. R. (1964). An Analysis of Transformations. Journal of the Royal Statistical Society: Series B (Methodological), 26(2), 211–252. https://doi.org/10.1111/j.2517-6161.1964.tb00553.x
Bryan, S. E., & Ferrari, L. (2013). Large igneous provinces and silicic large igneous provinces: Progress in our understanding over the last 25 years. Bulletin of the Geological Society of America, 125(7–8), 1053–1078. https://doi.org/10.1130/B30820.1
Cahoon, E. B. (2020). Distribution, Geochronology, and Petrogenesis of the Picture Gorge Basalt with Special Focus on Petrological Relationships to the Main Columbia River Basalt Group.
Cahoon, E. B., Streck, M. J., Koppers, A. A. P., & Miggins, D. P. (2020). Reshuffling the Columbia River Basalt chronology: Picture Gorge Basalt, the earliest- and longest-erupting formation. Geology, 48(4), 348–352. https://doi.org/10.1130/G47122.1
Camp, V. E. (2013). Origin of Columbia River Basalt: Passive rise of shallow mantle, or active upwelling of a deep-mantle plume? Geological Society of America Special Paper 497, 181–199. https://doi.org/10.1130/2013.2497(07)
Camp, V. E., & Hanan, B. B. (2008). A plume-triggered delamination origin for the Columbia River Basalt Group. Geosphere, 4(3), 480. https://doi.org/10.1130/GES00175.1
Camp, V. E., Reidel, S. P., Ross, M. E., Brown, R. J., & Self, S. (2017). Field-Trip Guide to the Vents, Dikes, Stratigraphy, and Structure of the Columbia River Basalt Group, Eastern Oregon and Southeastern Washington. U.S. Geological Survey Scientific Investigations Report 2017-5022-N, 88 p.
Camp, V. E., Ross, M. E., Duncan, R. A., Jarboe, N. A., Coe, R. S., Hanan, B. B., & Johnson, K. (2013). The Steens basalt: Earliest lavas of the Columbia River basalt group. The Columbia River Flood Basalt Province: Geological Society of America Special Paper 497, 87–116. https://doi.org/10.1130/2013.2497(04)
Camp, V. E., Ross, M. E., Duncan, R. A., & Kimbrough, D. L. (2017). Uplift, rupture, and rollback of the Farallon slab reflected in volcanic perturbations along the Yellowstone adakite hot spot track. Journal of Geophysical Research: Solid Earth, 1–20. https://doi.org/10.1002/2016JB013849
Carniel, R., & Guzman, S. R. (2012). Machine Learning in Volcanology: A Review. IntechOpen, Updates in, 1–27.
Chen, S., Grunsky, E. C., Hattori, K., & Liu, Y. (2015). Principal Component Analysis of Geochemical Data from the REE-rich Maw Zone, Athabasca Basin, Canada. Geological Survey of Canada, Open File, 1–21. https://doi.org/10.4095/295615
Clarke, F. W. (1920). The Data of Geochemistry. U.S. Geological Survey.
Conrey, R., Beard, C., & Wolff, J. (2013). Columbia River Basalt flow stratigraphy in the Palouse Basin Department of Ecology test wells.
Cruz, M., & Streck, M. J. (2022). The Castle Rock and Ironside Mountain calderas, eastern Oregon, USA: Adjacent venting sites of two Dinner Creek Tuff units, the most widespread tuffs associated with Columbia River flood basalt volcanism. GSA Bulletin, February, 1–21. https://doi.org/10.1130/b36070.1
Dangeti, P. (2017). Statistics for Machine Learning: Techniques for exploring supervised, unsupervised, and reinforcement learning models with Python and R.
Darold, A., & Humphreys, E. (2013).
Upper mantle seismic structure beneath the Pacific Northwest: A plume-triggered delamination origin for the Columbia River flood basalt eruptions. Earth and Planetary Science Letters, 365, 232–242. https://doi.org/10.1016/j.epsl.2013.01.024 Davis, K. N., Wolff, J. A., Rowe, M. C., & Neill, O. K. (2017). Sulfur release from main-phase Columbia River Basalt eruptions. 45(11), 1043–1046. https://doi.org/10.1130/G39371.1 de Caritat, P., & Grunsky, E. C. (2013). Defining element associations and inferring geological processes from total element concentrations in Australian catchment outlet sediments: Multivariate analysis of continental-scale geochemical data. Applied Geochemistry. https://doi.org/10.1016/j.apgeochem.2013.02.005 Dempster, A. P., Laird, N. M., & Rubin, D. B. (1976). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 46(1), 139–144. https://doi.org/10.1115/1.3424485 132 Dominguez, A. R., & Van der Voo, R. (2014). Secular variation of the middle and late Miocene geomagnetic field recorded by the Columbia River Basalt Group in Oregon, Idaho and Washington, USA. Geophysical Journal International, 197(3), 1299–1320. https://doi.org/10.1093/gji/ggt487 Ellefsen, K. J., Smith, D. B., & Horton, J. D. (2014). A modified procedure for mixture-model clustering of regional geochemical data. Applied Geochemistry, 51, 315–326. https://doi.org/10.1016/j.apgeochem.2014.10.011 Ernst, R. E. (2014). Large Igneous Provinces. Cambridge University Press. https://doi.org/10.1017/CBO9781139025300 Evarts, B. R. C. (2004). Geologic Map of the Saint Helens Quadrangle, Columbia County, Oregon, and Clark and Cowlitz Counties, Washington. USGS Scientific Investigations, Map 2834, 1–24. Evarts, R. C. (2004). Geologic Map of the Woodland Quadrangle, Clark and Cowlitz Counties, Washington. Scientific Investigations Map, 2827. http://pubs.er.usgs.gov/publication/sim2827 Evarts, R. C., O’Connor, J. E., & Tolan, T. L. (2013). 
Geologic Map of the Washougal Quadrangle, Clark County, Washington, and Multnomah County, Oregon. U.S. Geological Survey Scientific Investigations Map 3017, 3257, 32. Filzmoser, P., Hron, K., & Reimann, C. (2009). Principal component analysis for compositional data with outliers. February, 621–632. https://doi.org/10.1002/env Fradkov, A. L. (2020). Early history of machine learning. IFAC-PapersOnLine, 53(2), 1385– 1390. https://doi.org/10.1016/j.ifacol.2020.12.1888 Gaschnig, R. M., Vervoort, J. D., Lewis, R. S., & Tikoff, B. (2011). Isotopic evolution of the idaho batholith and Challis intrusive province, Northern US Cordillera. Journal of Petrology, 52(12), 2397–2429. https://doi.org/10.1093/petrology/egr050 Geron, A. (2017). Hands-On Machine Learning With Scikit-Learn And TensorFlow: Concepts, Tools, And Techniques To Build Intelligent Systems. Ghiorso, M. S., & Sack, R. (1995). Chemical mass transfer in magmatic processes IV. Contributions to Mineralogy and Petrology, 119, 197–212. https://doi.org/10.1007/BF00307281 Gibson, I. . (1969). A comparative account of the fl ood basalt volcanism of the Columbia Plateau and eastern Iceland. Bulletin of Volcanology, 33, 420–437. Hales, T. C., Abt, D. L., Humphreys, E. D., & Roering, J. J. (2005). A lithospheric instability 133 origin for Columbia River flood basalts and Wallowa Mountains uplift in northeast Oregon. Nature, 438(7069), 842–845. https://doi.org/10.1038/nature04313 Hartigan, J. A., & Hartigan, P. M. (1985). The Dip Test of Unimodality. The Annals of Statistics, 13(1), 70–84. Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning. In Springer (Vol. 26, Issue 4). Hooper, P. R. (2000). Chemical discrimination of Columbia River Basalt Flows. Geochemistry, Geophysics, Geosystems, 1. https://doi.org/10.1029/2000GC000040 Hooper, P. R., & Hawkesworth, C. J. (1993). Isotopic and Geochemical Constraints on the Origin and Evolution of the Columbia River Basalt. 
Journal of Petrology, 34(6), 1203– 1246. Howarth, R. J., & Earle, S. A. M. (1979). Application of a generalized power transformation to geochemical data. Journal of the International Association for Mathematical Geology, 11(1), 45–62. https://doi.org/10.1007/BF01043245 Itano, K., Ueki, K., Iizuka, T., & Kuwatani, T. (2020). Geochemical discrimination of monazite source rock based on machine learning techniques and multinomial logistic regression analysis. Geosciences (Switzerland), 10(2). https://doi.org/10.3390/geosciences10020063 Iwamori, H., Yoshida, K., Nakamura, H., Kuwatani, T., Harnada, M., Haraguchi, S., & Ueki, K. (2017). Classification of geochemical data based on multivariate statistical analyses: Complementary roles of cluster, principal component, and independent component analyses. Geochemistry, Geophysics, Geosystems, 18, 994–1012. https://doi.org/10.1002/2016GC006663.Received Jarboe, N. A., Coe, R. S., Renne, P. R., & Glen, J. M. G. (2010). The age of the Steens reversal and the Columbia River Basalt Group. Chemical Geology, 274(3–4), 158–168. https://doi.org/10.1016/j.chemgeo.2010.04.001 Karlstrom, L., Dufek, J., & Manga, M. (2010). Magma chamber stability in arc and continental crust. Journal of Volcanology and Geothermal Research, 190(3–4), 249–270. https://doi.org/10.1016/j.jvolgeores.2009.10.003 Karlstrom, L., Paterson, S. R., & Jellinek, A. M. (2017). A reverse energy cascade for crustal magma transport. Nature Geoscience, 10(July). https://doi.org/10.1038/NGEO2982 Kasbohm, J., & Schoene, B. (2018). Rapid eruption of the Columbia River flood basalt and correlation with the mid-Miocene climate optimum. 1–8. Kumar, S. (2014). Modelling of Magmatic and Allied Processes. June 2014, 0–22. https://doi.org/10.1007/978-3-319-06471-0 134 Lacassie, J. P., Solar, J. R., Roser, B., & Hervé, F. (2006). Visualization of volcanic rock geochemical data and classification with artificial neural networks. Mathematical Geology, 38(6), 697–710. 
https://doi.org/10.1007/s11004-006-9042-z Lee, D. D., & Seung, H. S. (2000). Learning the parts of objects by non-negative matrix factorization. 401(October 1999), 788–791. Lerner, A. H., Wallace, P. J., Shea, T., Mourey, A. J., Kelly, P. J., Nadeau, P. A., Elias, T., Kern, C., Clor, L. E., Gansecki, C., Lee, R. L., Moore, L. R., & Werner, C. A. (2021). The petrologic and degassing behavior of sulfur and other magmatic volatiles from the 2018 eruption of Kīlauea, Hawaiʻi: melt concentrations, magma storage depths, and magma recycling. In Bulletin of Volcanology (Vol. 83, Issue 6). Springer Berlin Heidelberg. https://doi.org/10.1007/s00445-021-01459-y Madhukar, M. (2019). Big Data for Remote Sensing: Visualization, Analysis and Interpretation: Digital Earth and Smart Earth. In Springer. https://doi.org/10.1007/978-3-319-89923-7_5 Marsh, J. S. (1987). Basalt geochemistry and tectonic discrimination within continental flood basalt provinces. Journal of Volcanology and Geothermal Research, 32(1–3), 35–49. https://doi.org/10.1016/0377-0273(87)90035-7 Mcdougall, I. (1976). Geochemistry and origin of basalt of the Columbia River Group, Oregon and Washington. Bulletin of the Geological Society of America, 87(5), 777–792. https://doi.org/10.1130/0016-7606(1976)87<777:GAOOBO>2.0.CO;2 Meng, H. D., Song, Y. C., Song, F. Y., & Shen, H. T. (2011). Research and application of cluster and association analysis in geochemical data processing. Computational Geosciences, 15(1), 87–98. https://doi.org/10.1007/s10596-010-9199-x Mittal, T., & Richards, M. A. (2021). The Magmatic Architecture of Continental Flood Basalts: 2. A New Conceptual Model. Journal of Geophysical Research: Solid Earth, 126(12). https://doi.org/10.1029/2021JB021807 Mittal, T., Richards, M. A., & Fendley, I. M. (2021). The Magmatic Architecture of Continental Flood Basalts I: Observations From the Deccan Traps. Journal of Geophysical Research: Solid Earth, 126(12), 1–54. 
https://doi.org/10.1029/2021JB021808 Moore, N. E., Grunder, A. L., & Bohrson, W. A. (2018). The three-stage petrochemical evolution of the Steens Basalt ( southeast Oregon , USA ) compared to large igneous provinces and layered mafic intrusions. 14(6), 1–28. https://doi.org/10.1130/GES01665.1/4346436/ges01665.pdf Moore, N. E., Grunder, A. L., Bohrson, W. A., Carlson, R. W., & Bindeman, I. N. (2020). Changing Mantle Sources and the Effects of Crustal Passage on the Steens Basalt, SE Oregon: Chemical and Isotopic Constraints. Geochemistry, Geophysics, Geosystems, 21(8), 135 1–33. https://doi.org/10.1029/2020GC008910 Morriss, M. C., Karlstrom, L., Nasholds, M., & Wolff, J. (2020). The Chief Joseph Dike Swarm of the Columbia River Flood Basalts, and the Legacy Dataset of William H. Taubeneck. Geosphere, 16(4), 1082–1106. Neal, C. A., Brantley, S. R., Antolik, L., Babb, J. L., & Etc. (2019). The 2018 rift eruption and summit collapse of Kīlauea Volcano. Science, 363(January), 367–374. Norinder, U., & Norinder, P. (2022). Predicting Amazon customer reviews with deep confidence using deep learning and conformal prediction. Journal of Management Analytics, 9(1), 1– 16. https://doi.org/10.1080/23270012.2022.2031324 Ospina, R., & Marmolejo-Ramos, F. (2019). Performance of Some Estimators of Relative Variability. Frontiers in Applied Mathematics and Statistics, 5(August), 1–20. https://doi.org/10.3389/fams.2019.00043 Pearce, J. A., Ernst, R. E., Peate, D. W., & Rogers, C. (2021). LIP printing: Use of immobile element proxies to characterize Large Igneous Provinces in the geologic record. Lithos, 392–393, 106068. https://doi.org/10.1016/j.lithos.2021.106068 Peck, R., Olsen, C., & Devore, J. (2005). Introduction to Statistics and Data Analsis. In Thomson. Pedregosa et al. (2011). Scikit-learn: Machine Learning in Python. JMLR, 12, 2825–2830. Perol, T., Gharbi, M., & Denolle, M. (2018). Convolutional neural network for earthquake detection and location. 2016(March 2015), 2–9. 
Petcovic, H. L., & Dufek, J. D. (2005). Modeling magma flow and cooling in dikes: Implications for emplacement of Columbia River flood basalts. Journal of Geophysical Research: Solid Earth, 110(10), 1–15. https://doi.org/10.1029/2004JB003432
Petrelli, M., & Perugini, D. (2016). Solving petrological problems through machine learning: the study case of tectonic discrimination using geochemical and isotopic data. Contributions to Mineralogy and Petrology, 171(10), 1–15. https://doi.org/10.1007/s00410-016-1292-2
Praus, P. (2005). SVD-based principal component analysis of geochemical data. Central European Journal of Chemistry, 3(4), 731–741. https://doi.org/10.2478/BF02475200
Reidel, S. P. (1982). Stratigraphy of the Grande Ronde Basalt, Columbia River Basalt Group, From the Lower Salmon River and Northern Hells Canyon Area, Idaho, Oregon, and Washington. Idaho Bureau of Mines and Geology Bulletin, 26, 77–101.
Reidel, S. P. (2005). A lava flow without a source: The Cohassett flow and its compositional components, Sentinel Bluffs Member, Columbia River Basalt Group. Journal of Geology, 113(1), 1–21. https://doi.org/10.1086/425966
Reidel, S. P. (2015). The Columbia River Basalt Group: A Flood Basalt Province in the Pacific Northwest, USA. Geosciences Canada, 42, 151–168.
Reidel, S. P., Camp, V. E., Tolan, T. L., & Martin, B. S. (2013). The Columbia River flood basalt province: Stratigraphy, areal extent, volume, and physical volcanology. Special Paper of the Geological Society of America, 497, 1–43. https://doi.org/10.1130/2013.2497(01)
Reidel, S. P., Johnson, V. G., & Spane, F. A. (2002). Natural Gas Storage in Basalt Aquifers of the Columbia Basin, Pacific Northwest USA: A Guide to Site Characterization. Pacific Northwest National Laboratory, August, 277.
Reidel, S. P., & Tolan, T. L. (1989). The Grande Ronde Basalt, Columbia River Basalt Group: Stratigraphic Descriptions and Correlations in Washington, Oregon, and Idaho. Special Paper of the Geological Society of America, 239, 21–53. https://doi.org/10.1130/SPE239-p21
Rosenberg, A., & Hirschberg, J. (2007). V-Measure: A conditional entropy-based external cluster evaluation measure. EMNLP-CoNLL 2007 - Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, June, 410–420.
Rudnick, R. L., & Gao, S. (2003). Composition of the Continental Crust. Treatise on Geochemistry, 3, 1–64.
Sawlan, M. G. (2017). Alteration, mass analysis, and magmatic compositions of the Sentinel Bluffs Member, Columbia River flood basalt province. Geosphere, 14(1), 286–303. https://doi.org/10.1130/GES01188.1
Sawlan, M. G. (2019). Alteration, mass analysis, and magmatic compositions of the Sentinel Bluffs Member, Columbia River flood basalt province: REPLY. Geosphere, 15(4), 1448–1458. https://doi.org/10.1130/GES02047.1
Shrestha, S., Kazama, F., & Nakamura, T. (2008). Use of principal component analysis, factor analysis and discriminant analysis to evaluate spatial and temporal variations in water quality of the Mekong River. https://doi.org/10.2166/hydro.2008.008
Snavely, P. D. (1962). Tertiary Geologic History of Western Oregon and Washington. AAPG Bulletin, 46. https://doi.org/10.1306/BC7437FF-16BE-11D7-8645000102C1865D
Steiner, A., & Wolff, J. (2020). Supervised Machine Learning for Columbia River Basalt Group Classification. College of Arts and Sciences, Pullman, Abstract.
Swanson, D. A., Vance, J. A., Clayton, G., & Evarts, R. C. (1989). Cenozoic volcanism in the Cascade Range and Columbia Plateau, southern Washington and northernmost Oregon: Seattle, Washington to Portland, Oregon, July 3-8, 1989. American Geophysical Union, Field Trip, 1–60.
Takahashi, E., Nakajima, K., & Wright, T. L. (1998). Origin of the Columbia River basalts: Melting model of a heterogeneous plume head. Earth and Planetary Science Letters, 162(1–4), 63–80. https://doi.org/10.1016/S0012-821X(98)00157-5
Templ, M., Filzmoser, P., & Reimann, C. (2008). Cluster analysis applied to regional geochemical data: Problems and possibilities. Applied Geochemistry. https://doi.org/10.1016/j.apgeochem.2008.03.004
Tolan, T. L., Martin, B. S., Reidel, S. P., Anderson, J. L., Lindsey, K. A., & Burt, W. (2009). An Introduction to the stratigraphy, structural geology, and hydrogeology of the Columbia River Flood Basalt Province: A primer for the GSA Columbia River Basalt Group field trips. The Geological Society of America, Field Guide.
Trimble, D. E. (1963). Geology of Portland, Oregon and Adjacent Areas. Geological Society of America Bulletin, 1119, 247.
Ueki, K., Hino, H., & Kuwatani, T. (2018). Geochemical discrimination and characteristics of magmatic tectonic settings: A machine-learning-based approach. Geochemistry, Geophysics, Geosystems, 19(4), 1327–1347. https://doi.org/10.1029/2017GC007401
Ueki, K., & Iwamori, H. (2017). Geochemical differentiation processes for arc magma of the Sengan volcanic cluster, Northeastern Japan, constrained from principal component analysis. Lithos, 290–291, 60–75. https://doi.org/10.1016/j.lithos.2017.08.001
Vesselinov, V. V., Alexandrov, B. S., & O’Malley, D. (2018). Contaminant source identification using semi-supervised machine learning. Journal of Contaminant Hydrology, 212(November 2017), 134–142. https://doi.org/10.1016/j.jconhyd.2017.11.002
Waters, A. C. (1961). Stratigraphic and lithologic variations in the Columbia River Basalt. American Journal of Science, 259(8), 583–611.
Webb, B. M., Streck, M. J., McIntosh, W. C., & Ferns, M. L. (2019). The Littlefield Rhyolite and associated mafic lavas: Bimodal volcanism of the Columbia River magmatic province, with constraints on age and storage sites of Grande Ronde Basalt magmas. Geosphere, 15(1), 60–84. https://doi.org/10.1130/GES01695.1
Wells, R. E., Haugerud, R. A., Niem, A., Niem, W. A., Evarts, R. C., O’Connor, J. E., Ma, L., Madin, I. P., Sherrod, D. R., Beeson, M. H., Wheeler, K. L., Hanson, W. B., & Sawlan, M. G. (2020). Geologic Map of the Greater Portland Metropolitan Area and Surrounding Regions, Oregon and Washington. USGS Scientific Investigations, Map 3443, 3–59. https://doi.org/10.1130/abs/2019cd-329168
Wells, R., Niem, A., Evarts, R., & Hagstrum, J. (2009). The Columbia River Basalt Group: From the gorge to the sea. The Geological Society of America, 15, 737–774. https://doi.org/10.1130/2009.fl
White, W. (2013). Trace Elements in Igneous Processes. In Geochemistry (pp. 259–313).
Wignall, P. B. (2001). Large igneous provinces and mass extinctions.
Wilson, R. L., & Watkins, N. D. (1967). Correlation of Petrology and Natural Magnetic Polarity in Columbia Plateau Basalts. Geophysical Journal of the Royal Astronomical Society, 12(4), 405–424. https://doi.org/10.1111/j.1365-246X.1967.tb03150.x
Wolff, J. A., & Ramos, F. C. (2013). Source materials for the main phase of the Columbia River Basalt Group: Geochemical evidence and implications for magma storage and transport. Special Paper of the Geological Society of America, 497, 273–291. https://doi.org/10.1130/2013.2497(11)
Wolff, J. A., Ramos, F. C., Hart, G. L., Patterson, J. D., & Brandon, A. D. (2008). Columbia River flood basalts from a centralized crustal magmatic system. 704, 177–180. https://doi.org/10.1038/ngeo124
Wright, T. L., Grolier, M. J., & Swanson, D. A. (1973). Chemical variation related to the stratigraphy of the Columbia river basalt. Bulletin of the Geological Society of America, 84(2), 371–386. https://doi.org/10.1130/0016-7606(1973)84<371:CVRTTS>2.0.CO;2
Yu, X., Lee, C., Chen, L., & Zeng, G. (2015). Magmatic recharge in continental flood basalts: Insights from the Chifeng igneous province in Inner Mongolia. Geochemistry, Geophysics, Geosystems, 2082–2096. https://doi.org/10.1002/2015GC005805

IV.
CHEMICAL AND STRUCTURAL VARIABILITY OF COLUMBIA RIVER FLOOD BASALT DIKES, WITH A FOCUS ON THE MAXWELL LAKE DIKE COMPLEX IN THE WALLOWA MOUNTAINS OF OREGON

1 Introduction

Opportunities to investigate volcanic plumbing systems, especially those that feed the largest eruptions, are rare (Airoldi et al., 2016; Ernst & Buchan, 1997; Hastie et al., 2014; Mittal et al., 2021; Mittal & Richards, 2021; Muirhead et al., 2014). The Columbia River Flood Basalts (CRB) are the youngest large flood basalt province in the world and are exceptionally well exposed in the Pacific Northwest of the United States (Cahoon et al., 2020; Camp et al., 2017; Hooper, 2000; Moore et al., 2018; Reidel, 1982, 2015; Reidel et al., 2002, 2013; Tolan et al., 2009; Wells et al., 2009; Wolff et al., 2008; Wolff & Ramos, 2013). The stratigraphy of the CRB lava flows has been divided into six formations (Steens, Imnaha, Grande Ronde, Wanapum, Saddle Mountains, and Picture Gorge). These are further subdivided into 47 members on the basis of geochemical discrimination, magnetic polarity, and mapping (Cahoon et al., 2020; Camp et al., 2013; Conrey et al., 2013; Hooper, 2000; Kimmel, 1982; Moore et al., 2018; Reidel et al., 2002, 2013; Swanson et al., 1989; Tolan et al., 2009; Wilson & Watkins, 1967; Wright et al., 1973). Several aspects of the system have been refined over more than 60 years of data collection and analysis, primarily of the extrusive lavas. These include constraints on the timing, volume, and locations of eruptive centers, and hypotheses on the genesis of the system that produced these voluminous magmas (Camp et al., 2017; Gibson, 1969; Reidel et al., 2013; Snavely, 1962; Waters, 1961; Wilson & Watkins, 1967; Wolff & Ramos, 2013; Wright et al., 1973). In this study we add to the body of knowledge about the CRB by investigating the geochemical variation in the exposed shallow (1-2 km) plumbing system, manifested as a swarm of volcanic dikes (Fig. 1).
Post-emplacement uplift of 2-3 km in northeastern Oregon and southeastern Washington (Hales et al., 2005; Petcovic & Grunder, 2003) has exposed a large swath of the shallow crustal plumbing system. The dikes exposed in this area have been collectively named the Chief Joseph Dike Swarm (CJDS; Morriss et al., 2020; Taubeneck, 1964). They constitute the largest of several groups of dike exposures in the CRB, which also include the Steens dikes (Fig. 1). The swarm is centered on the Wallowa Mountains of northeastern Oregon and extends into southeastern Washington and western Idaho. It offers a nearly unrivaled opportunity to directly observe a large igneous province (LIP) plumbing system and to investigate the mechanics of LIP dikes (Fig. 1) (Ernst, 2014; Morriss et al., 2020; Taubeneck, 1964). Outcrops of dike segments in and around the Wallowa Mountains represent a cross-section through the upper 1-2 km of crust beneath the CRB lavas. Near the locus of many main-phase eruptive vents, the lavas themselves add as much as 1 km of overlying rock (Fig. 1) (Morriss et al., 2020). These dikes are nearly unique on Earth in their extent and preservation. Even so, they have received relatively little study compared to their extrusive counterparts (Karlstrom et al., 2019; Petcovic & Dufek, 2005; Petcovic & Grunder, 2003). Previous research resides largely in unpublished literature, abstracts, and theses, although there is currently a resurgence of interest and new data generation.

Figure 1 Map of dike exposures (yellow) in the Wallowa Mountains and surrounding areas, with geochemical sample localities (red diamonds).

We take a two-pronged approach to investigating the intrusive rocks that constitute the exposed plumbing system. Building on the study of Morriss et al. (2020), the first goal of this study is to quantitatively assess the chemical variation in 198 analyses from exposed dikes spanning more than 20,000 km² of the CJDS in Oregon and Idaho (Fig. 1).
This will allow us to build a classification model that places these dikes into a stratigraphic framework. The dataset combines data previously collected by Dr. William Taubeneck and Dr. Scott Hughes with new samples and analyses collected as part of this study. It establishes a basis for connecting dikes to eruptive units at the member and formation level, and for establishing intrusive-extrusive chemical connections across the CRB (Fig. 1). The second goal of this study is to analyze a feeder dike complex in detail. We examine the Maxwell Lake Dike Complex (MLDC), exposed in the Lostine River Valley of the Wallowa Mountains. We use field observations and isotope and geochemical data to characterize the magma transport dynamics of this major CRB feeder dike complex. First, we investigate the whole rock major and trace element geochemistry of 198 dike segments exposed in and around the Wallowa Mountains, 108 of which also have associated location data (Fig. 1). Unfortunately, the geochemical data collected by Dr. Taubeneck could not be matched to their sample locations (Morriss et al., 2020). The full database of dike samples includes 660 visited dike-segment localities, most of which lack the complete geochemical analyses needed for classification. For the dikes that do have complete geochemical analyses, we connect the dike compositions to their extrusive counterparts using supervised machine learning classification. The model is trained on a large dataset of CRB lavas (see Chapter III). We also analyze the variation across the entire dataset of intrusive samples from the CJDS. Our aim is to interpret signals in the geochemistry that may have been imparted by recharge, assimilation, or fractional crystallization (RAFC) processes in the magma system (see previous chapter). We find a loose spatial pattern. Dikes of Imnaha composition are more common south of the Wallowa Mountains, and Grande Ronde-, Wanapum-, and Saddle Mountains-type dikes occur at higher density farther north.
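The supervised model is trained on the CRB lava dataset in the previous chapter, and its details are not restated here. The Python sketch below is a hedged illustration of the general idea only: it assigns an unknown dike analysis to the nearest stratigraphic label. It makes two assumptions:

- each composition first passes through a centered log-ratio (CLR) transform, following the log-ratio approach of Aitchison (1983);
- a simple nearest-centroid rule stands in for the actual trained classifier.

The function names, the `unit_A`/`unit_B` labels, and the three-part toy compositions are hypothetical and are not CRB data.

```python
import math
from statistics import mean


def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log.

    Assumes every part is strictly positive (zeros must be replaced first).
    """
    logs = [math.log(x) for x in composition]
    centre = mean(logs)  # log of the geometric mean of the parts
    return [v - centre for v in logs]


def fit_centroids(compositions, labels):
    """Mean CLR vector for each stratigraphic label in a training (lava) set."""
    groups = {}
    for comp, label in zip(compositions, labels):
        groups.setdefault(label, []).append(clr(comp))
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in groups.items()
    }


def classify(composition, centroids):
    """Return the label whose centroid is closest (Euclidean) in CLR space."""
    z = clr(composition)
    return min(centroids, key=lambda label: math.dist(z, centroids[label]))


# Hypothetical training lavas: three positive parts per sample, toy values.
lavas = [
    [2.0, 0.30, 150.0],
    [2.1, 0.35, 160.0],
    [3.2, 0.60, 220.0],
    [3.0, 0.55, 210.0],
]
labels = ["unit_A", "unit_A", "unit_B", "unit_B"]
centroids = fit_centroids(lavas, labels)

# Hypothetical unknown dike analysis with the same three parts.
print(classify([2.05, 0.32, 155.0], centroids))  # unit_A
```

The nearest-centroid rule is used here only because it is short and transparent; any supervised classifier could take its place after the same transform. Because the CLR transform subtracts each sample's mean log, mixing units (for example, wt% oxides and ppm trace elements) shifts every transformed sample by the same constant vector. Distances between samples are therefore unchanged.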
We compare the extrusive samples with intrusive compositions from the whole dataset of CJDS segments, categorized and clustered into their respective extrusive formations and members. The intrusive samples from the dike segments overlap the extrusive samples in multidimensional geochemical space. Apart from a small group of samples, the variation in both datasets appears to reflect processes within the magmatic plumbing system. It does not appear to reflect differences in mantle source or post-eruption differentiation and alteration. We interpret these processes on the basis of machine learning analyses of geochemical data collected from the lavas.

We follow the analysis of the full dike swarm geochemistry with an analysis that characterizes a single dike complex within the swarm. The MLDC is a well exposed series of en echelon dike segments. Some of these segments show compositional affinity with the massive Wapshilla Ridge Member of the Grande Ronde Basalt, CRB, which is thought to total 40,000 km³ (Davis et al., 2017; Karlstrom et al., 2019; Petcovic & Grunder, 2003; Reidel & Tolan, 1989). The Wapshilla Ridge Member is a series of voluminous basalt flows that erupted at the peak of the main-phase CRB eruptions (Black et al., 2021; Davis et al., 2017; Reidel, 1982; Reidel & Tolan, 1989). Feeder dike complexes are typical of mafic eruptions. Small basaltic fissure eruptions have been widely studied (Keating et al., 2008), but feeder complexes for ancient flood basalts are rarely exposed and rarely studied. Smaller-volume modern analogs may provide insight into dike mechanics. However, it is unclear whether different dynamics are needed to emplace larger volumes of magma into the crust (Chapter I; Ágústsdóttir et al., 2019; I. Bindeman et al., 2008; Keating et al., 2008; Sigmundsson et al., 2020).
This research at the MLDC adds to a building picture of transport dynamics that has involved several different approaches to studying the dike dynamics, from geochronology (Karlstrom et al., 2019) to paleomagnetic observations (J. A. Biasi, 2021), to detailed petrology and isotopic work (Biasi & Karlstrom, 2021; I. N. Bindeman et al., 2020; Petcovic & Dufek, 2005; Petcovic & Grunder, 2003). We build on this work by analyzing the geochemistry of the segments and mapping the structure of several of the dike exposures to better understand how these large volumes of magma were emplaced and how they interacted with the shallow crust. We utilize field and structural observations, mapping, and geochemical analysis to characterize the MLDC. These two goals leverage both large and small spatial scales to understand the processes of dike emplacement, magma transport, and evolution of the intrusive CRB system. In this work we address several outstanding observational gaps that will serve to characterize the MLDC in the context of feeder dike systems generally and intrusions of the CRB. These include 1) the variation in the geochemistry of the dikes over the scale of the Wallowa Mountains and larger CJDS, 2) structural and chemical variation in between related dike segments of the MLDC and relating that to paleomagnetic and thermochronologic studies 143 (J. Biasi & Karlstrom, 2021; I. N. Bindeman et al., 2020; Goughnour et al., 2021; Karlstrom et al., 2019), and 3) petrologic evidence of differentiation processes (such as Assimilation, Fractional Crystallization, or Recharge) (DePaolo, 1981; Ghiorso & Sack, 1995; Pearce et al., 2021) that may have been occurring in the plumbing system. The role of assimilation in these dike transport structures is of particular interest in the context of CRB petrogenesis, to contribute to the ongoing debate about the geometry of the plumbing system that fed the main phase CRB. 1.1 Geochemistry of the Chief Joseph Dike Swarm Dr. 
William Taubeneck spent a research career at Oregon State University mapping CRB dike swarms with his collaborators (e.g., Taubeneck, 1964, 1970, 1990, 1997) (Fig.1). Scott Hughes, a professor from the University of Idaho accompanied Dr. Taubeneck during several field seasons and collected many samples from the dikes in and around the Wallowa Mountains for geochemical analysis (Fig.1, 2). Their campaigns to rugged and remote areas included sampling dikes deep in the Wallowa Mountains and in the southern plutons of Pedro Mountain, Big and Little Lookout Mountain, and further east into Idaho (Fig.2) (Morriss et al., 2020; Taubeneck, 1970). The campaign to supplement this data undertaken by the study presented here included a horse packing trip to the Lakes Basin (Fig. 3) with the highest dike density in the swarm (Morriss et al., 2020), several excursions to the Maxwell Lake area (subject of several other studies (J. Biasi & Karlstrom, 2021; Karlstrom et al., 2019; Petcovic & Grunder, 2003), and other sampling excursions to more accessible areas of the Wallowa Mountains (Fig.3). This preliminary dataset of dike geochemistry was further augmented by the work of Emily Cahoon and colleagues, who also worked in dikes related to the Picture Gorge Basalt Fm. outside of the Wallowa Mountains (Cahoon, 2020; Cahoon et al., 2020) (Fig.2). Their work was not restricted to extrusive samples, like much of the prior CRB literature and we include 38 of their dikes in our geochemical analysis (Cahoon et al., 2020). Together the intrusive and extrusive data of the CRB give us a nearly unparalleled amount of data from a flood basalt to explore variation, categorize unknown samples into the known stratigraphy, and identify the processes that occurred within this system through multi-dimensional machine learning analysis. We display this data colored by the collector in Figures 2 and 3. 144 Figure 2 Map of geochemical sample locations in the Wallowa Mtn. 
colored by the researcher who collected each sample (J. Biasi & Karlstrom, 2021; I. N. Bindeman et al., 2020; Cahoon et al., 2020; Morriss et al., 2020; Petcovic & Grunder, 2003).

The Chief Joseph Dike Swarm comprises 4,279 mapped dike segments with an average width of ~8 m and an average segment length of ~100-1000 m. It crops out over a total area of 35,000 km², largely mapped by Dr. William Taubeneck and compiled by Morriss et al. (2020) (Figs. 1, 2). The dominant strike of the segments is N/NW. However, a variety of orientations cross-cut this dominant trend, and individual segments can vary substantially in strike (Morriss et al., 2020). Segments are exposed at paleodepths as great as ~2 km, but most exposures in the Wallowa Mountains lie at a paleodepth of ~1 km (Hales et al., 2005; Morriss et al., 2020; Schoettle-Greene et al., 2022). In the dataset recorded by Dr. William H. Taubeneck, only ~3% of the dike segments had significant, visible partial melt along their margins. This suggests that most dikes were not long-lived and did not feed surface flows (Morriss et al., 2020). We use these foundational structural constraints to better understand the geochemical variation of the CJDS. We then use a specific dike complex to understand the variation between individual segments.

Figure 3 Close-up view of the Lostine River Valley, Lakes Basin, Hurricane Ck., and Wallowa Lake areas with geochemical sample locations colored by collector. Samples collected by Heather Petcovic, Ilya Bindeman, and Leif Karlstrom were taken from the along-strike sampling at the Maxwell Lake Dike Complex.

Here we compare intrusive compositions from dike exposures in the Chief Joseph Dike Swarm with a comprehensive CRB lava flow dataset compiled during this dissertation (Chapter 2) (Fig. 2). That dataset records the extrusive lava compositions, volumes, and timing.
We want to understand two things: whether the extrusive compositions can tell us about the plumbing system, and whether signals of magmatic process in the dikes match (or do not match) the petrologic signals in the extrusive deposits. This comparison is possible only once the dike chemistries have been sorted into the stratigraphic framework. Once each dike sample has a stratigraphic label, we can compare element-wise statistics of the intrusive and extrusive groups. This comparison clarifies the relationship between the two groups and lets us interpret the results geologically.

1.2 The Maxwell Lake Dike Complex

We present preliminary mapping and geochemical analysis of the Maxwell Lake Dike Complex (MLDC) in the Lostine River Valley. The MLDC is a well-exposed series of dike segments that has been interpreted to represent the feeder system for the CRB (J. Biasi & Karlstrom, 2021; Karlstrom et al., 2019; Petcovic & Grunder, 2003). In this study, we characterize this complex in detail to assess coupled magma flow and hydrothermal processes that may have occurred during emplacement. Over several years and field campaigns, we studied the complex with a variety of observational techniques, including oxygen isotopes, whole-rock geochemical data, mapping, and field observations. This is a large, ongoing collaborative project that aims to address questions about dike emplacement dynamics (J. Biasi & Karlstrom, 2021; I. N. Bindeman et al., 2020; Goughnour et al., 2021; Karlstrom et al., 2019).

The area was first mapped by Dr. William Taubeneck and later revisited by Heather Petcovic and colleagues. They conducted detailed petrographic and modeling studies of one segment in particular, which Petcovic and Grunder (2003) named the “Maxwell Lake Dike” and which was later referred to as “Maxwell A” (Goughnour et al., 2021).
We use the updated naming convention, which reflects the discovery of the possibly related dike segments that make up the complex, and call this the Maxwell A dike segment (Fig. 3). On the basis of bivariate whole-rock geochemical analysis, this dike segment was associated with the Wapshilla Ridge Flow (Petcovic & Dufek, 2005; Petcovic & Grunder, 2003). At 40,000 km³, this massive flow represents the upper-volume end member of magmatic fluxes through the crust and is therefore of particular interest (Davis et al., 2017; Reidel & Tolan, 1989).

Previous studies of the Maxwell A segment revealed substantial interaction with the surrounding crust through partial melting of the Wallowa tonalite (Petcovic & Grunder, 2003). The successive stages of partial melting are frozen into the halos around the dike segments. These halos offered a means to estimate the active flow duration and wallrock heating history of the dike (Petcovic & Grunder, 2003). By combining these measurements with thermal models, Petcovic and Dufek (2005) constrained the active lifespan of the Maxwell A dike segment to ~3-4 years. However, it is unclear how pervasive these partial melt halos are in the dike swarm, or whether similar timescales could be identified along strike (Morriss et al., 2020).

More recent studies used thermochronology, oxygen isotopes, and paleomagnetic resetting, each interpreted with thermal modeling, to estimate the lifespans of other dike segments in the complex (J. Biasi & Karlstrom, 2021; I. N. Bindeman et al., 2020; Karlstrom et al., 2019) (Fig. 4). Several of these studies focused on what is known as the “Jackson A” dike segment (Goughnour et al., 2021). This segment lies across the basin from, but along strike with, the Maxwell A dike segment of Petcovic and Grunder (2003) (Fig. 3). This very thick, well-preserved exposure provided a location to test the agreement between the different thermochronologic estimates.
Paleomagnetic studies of three dike segments in this complex revealed an active lifetime of 1-4 years for both the Jackson A and Maxwell A dike segments (J. A. Biasi, 2021; J. Biasi & Karlstrom, 2021). This is slightly longer than the lifespan of a dike segment just north of Maxwell A, which was estimated at months to as much as a year (J. A. Biasi, 2021; J. Biasi & Karlstrom, 2021). Zircon U-Th thermochronology yielded lifespans similar to these paleomagnetic estimates. Karlstrom et al. (2019) used inverse thermal modeling of the U-Th age-resetting pattern, sampled away from the dike, to estimate an active lifespan of 1-10 years for the Jackson A segment. In recent master's thesis work, Goughnour (2021) used new thermochronometers to constrain the active lifespan of this dike segment further, to 2-8 years. Heating from this dike segment created a wider zone of resetting in the wallrock than observed around other dike segments (Goughnour et al., 2021). We compare this variation with geochemical indicators to further understand the differences between dike segments. The different lifespan models agree relatively well, giving a robust estimate of ~2-4 years for the active lifetime of the Jackson A segment. The preliminary variation in the data also suggests a complex thermal (and hence magma flow) history for segments of the MLDC more broadly.

These studies have produced a much clearer picture of the lifespans of the different dike segments. Other aspects of the variation within the dike complex, however, remain poorly constrained. The explored length of the MLDC is at least 2.5 km. Along this stretch, the ~12 dike segments we studied exhibit significant diversity in inferred wallrock interaction and longevity. The dike segments are all roughly along strike with one another, but they are not continuous; they are separated by what appear to be large en echelon gaps (Fig. 3).
Notably, the general strike of these segments is NNE, which departs from the NNW regional trend of the dike swarm as a whole (Morriss et al., 2020). We detail the variation between these segments, particularly in the geochemical data, which had not been documented at multiple segments in this area before this study. In Section 2 we describe our methods for field mapping, sample collection, chemical analyses, and machine learning classification and analysis, which we use to characterize the MLDC feeder system.

2 Methods

2.1 Geochemical sampling and analysis

Geochemical sampling and analysis of Wallowa-area dikes, and of the Maxwell A and Jackson A dike segments, form the backbone of this study. Dike samples were collected over four summer field campaigns. These covered the Lakes Basin and Maxwell Lake study areas and the surrounding plutonic exposures to the north, east, and south of the Wallowa Mountains (Figs. 1, 2, 3). All samples collected for this study were taken from the central 1 m of each dike segment investigated (Fig. 4). We targeted the freshest material and removed all weathered surfaces in the field, so that only the least altered basaltic material was analyzed. Figure 4 shows three examples of dike segment exposures sampled in this study. Some segments showed textures such as filled vesicles or vugs, visible alteration minerals (e.g., chlorite), or post-emplacement calcite- or quartz-filled cracks. These textures were avoided during geochemical sampling. Sample locations are shown in Figures 2 and 3.

Figure 4 A) Dike segment exposed along the road to the top of Big Lookout Mountain; the sample site for CRBD1907 is shown. B) Dike from the Lakes Basin area with a large zone of apparent excavation along the margin; the corner was filled with partial melt. C) The northernmost exposure in the MLDC presented in this study, at the locality of 21-MaxR-09.
The 2-3 m wide partial melt zone is shown in the welded margin of the dike.

Phenocrysts in these intrusive dike segments ranged from 1 to 30 mm in size, so we collected adequate masses (up to 2 kg) to ensure that samples were homogenized for geochemical analysis. Splits of the samples were retained for oxygen isotope analyses. The remainder was sent to the Washington State University GeoAnalytical Laboratory in Pullman, WA. There, samples were powdered to less than 50 microns, fused in a furnace at 1000°C, and run on the ThermoARL X-ray fluorescence (XRF) spectrometer to analyze 10 major and 19 trace elements. For full details of this laboratory's method, see Johnson et al. (1999).

In addition to the samples collected and analyzed in this study, the intrusive dataset includes samples from other researchers:
- Dr. William Taubeneck's intrusive dataset, presented in Morriss et al. (2020), included several geochemical samples of dike material, but location data for those samples are not available. We classify some of these samples despite the lack of precise locations, to better understand the geochemical trends in the larger dike swarm.
- Dr. Scott Hughes provided geochemical data from dikes in and around the Wallowa Mountains (Figs. 2, 3). These samples have associated location data, so they can be classified with the supervised machine learning methods discussed herein (Figs. 2, 3). He has graciously shared these data to help us understand the geochemistry of the dike swarm as a whole.
- Emily Cahoon's study of the Picture Gorge basalts contributed published dike geochemical data (Fig. 3) (Cahoon, 2020; Cahoon et al., 2020). This includes 38 dike segments from the area to the west of the Wallowa Mtn.
in central Oregon, where the Picture Gorge Formation basalts are primarily located.

The database also includes geochemical data from 12 dike segments collected by Joe Biasi (J. A. Biasi, 2021; J. Biasi & Karlstrom, 2021) (Figs. 2, 3). These samples are also an important test of the classification generally. Biasi used least-squares regression to assign these dikes to the stratigraphic members with the closest chemistry, using a less complete CRB database than ours (J. Biasi & Karlstrom, 2021).

2.2 Oxygen Isotope Analyses

Dike samples retained for oxygen isotope analysis were lightly crushed to liberate glass, feldspar, and pyroxene. Separating the glass from the microcrystalline feldspar was difficult but possible through careful picking. The freshest possible “bulk” material, consisting mostly of glass, was then loaded into the laser fluorination line at the University of Oregon Stable Isotope Laboratory. Within the high-vacuum system, a laser heats the sample in the presence of a fluorinating reagent (BrF₅), liberating its oxygen (I. Bindeman, 2008). The O₂ is then converted to CO₂, which is measured on the mass spectrometer to obtain the ratio of ¹⁸O to ¹⁶O (I. Bindeman, 2008).

We also sampled the wallrock in addition to the dike segments themselves. In several cases the dike-wallrock contacts are well enough exposed that we could replicate the approach of Bindeman et al. (2020). This approach produces transects of oxygen isotope values in selected minerals from the tonalitic wallrock. Small samples (< 0.5 kg) were taken at selected intervals away from the dike, and their distances were recorded. These samples were crushed to liberate quartz, feldspar, and biotite. The freshest and most intact grains (under the microscope) were picked for analysis in the laser fluorination line (I. Bindeman, 2008; I. N. Bindeman et al., 2020).
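The measured ¹⁸O/¹⁶O ratio is conventionally reported in delta notation, as the per-mil deviation from a reference ratio. The minimal sketch below shows that conversion. It assumes the commonly used VSMOW ratio of 2005.20 × 10⁻⁶ and omits the calibration against laboratory standards that a real laser fluorination workflow applies. The function name is illustrative only.

```python
# Minimal sketch: convert a measured 18O/16O ratio to delta notation (per mil).
# Assumes the conventional VSMOW reference ratio. Real data are additionally
# normalized against laboratory standards, which this sketch omits.

R_VSMOW = 2005.20e-6  # 18O/16O of Vienna Standard Mean Ocean Water


def delta18O(ratio_sample: float, ratio_standard: float = R_VSMOW) -> float:
    """Return delta-18O in per mil relative to the given standard ratio."""
    return (ratio_sample / ratio_standard - 1.0) * 1000.0


# Hypothetical sample 5.7 per mil heavier than VSMOW, close to values commonly
# cited for mantle-derived basalt.
sample_ratio = R_VSMOW * 1.0057
print(f"d18O = {delta18O(sample_ratio):.2f} per mil")
```

A depleted (low-δ¹⁸O) wallrock mineral would simply return a smaller or negative value from the same function.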
2.3 Field Observations and Structural Analysis

A major goal of this work was to build on previous studies of the MLDC by thoroughly mapping, sampling, and characterizing the entire dike system. Before this study and the work of our collaborators, only two segments in the Maxwell Lake area had been studied in detail. Most of the other segments in the area had not yet been identified (J. Biasi & Karlstrom, 2021; I. N. Bindeman et al., 2020; Karlstrom et al., 2019; Petcovic & Grunder, 2003) (Fig. 3). For each segment, we collected qualitative observations and quantitative measurements of the dike segment structure. At each dike segment we:
- measured strike, dip, and width;
- observed interactions between the dike and the surrounding wallrock, as well as mineralogy and cooling habit; and
- took detailed geochemical samples.

The detailed geochemical sampling included two transects: one across the dike at Jackson Lake in the MLDC area, and one along strike at Glacier Pass (Fig. 3). Our collaborators also collected transects of paleomagnetic and thermochronologic samples in the tonalitic wallrock near the margins of several dike segments. These transects measure the reset distance induced by heating from the dike. For more detail on these methods and the modeling needed to quantify a “reset distance,” see J. Biasi & Karlstrom (2021), I. N. Bindeman et al. (2020), Goughnour et al. (2021), and Karlstrom et al. (2019). We collected oxygen isotope samples alongside the reset-distance samples. They therefore provide a further constraint on the variation between segments in the MLDC and Lakes Basin areas (Fig. 3).

2.4 Summary of Statistical Methods and Classification

The geochemical dataset of intrusive samples includes 198 samples from multiple parts of the CRB stratigraphy. Their compositions overlap, especially on the canonical bivariate geochemical plots (total alkali-silica, Harker, trace element ratio plots, etc.).
In the previous chapter, we used a newly compiled, large database of CRB lavas to build a model that recognizes the geochemical patterns of the different stratigraphic members of the CRB (Chapter 2). This method effectively delineates geochemical groups and their variation. It therefore provides a foundation for classifying unknown samples into the detailed erupted stratigraphy of the CRB, and here we apply it to unknown intrusive samples associated with the CRB system. We use the multinomial logistic regression methods detailed in Chapter 2 to build a model that sorts the unknown intrusive samples into the stratigraphy (Dangeti, 2017; Itano et al., 2020; Pedregosa et al., 2011; Ueki et al., 2018). At the CJDS scale, with 198 dike samples, we classify dike segments at the Formation and Member levels (Fig. 2).

For each unknown sample, the model produces a probability distribution over every possible category, computed across all compositional dimensions. From these distributions we derive several results (Chapter 2):
- the maximum-likelihood (Dempster et al., 1976) classification of each dike sample from the Wallowa Mtns. area;
- the confidence of each classification; and
- any other likely labels.

In Sections 3.1 and 3.3 we present these results in two forms. A confusion matrix shows the proportion of test samples classified into each category, and histograms show the probabilities and assignments. Together these allow us to map the quantitative stratigraphic assignment of each sample. In this study, samples with a maximum probability under 85% are considered uncertain and are subjected to further geochemical scrutiny before classification. At the scale of the MLDC, member-level classification is improved by an additional paleomagnetic constraint. All segments in the MLDC have reversed polarity (J. Biasi & Karlstrom, 2021), so our model can restrict matches to units that have reversed polarity.
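The bookkeeping described above has two parts: restricting the training data to reversed-polarity units, and flagging classifications whose maximum probability falls below a threshold. It can be sketched in a few lines. This is a minimal illustration, not the Chapter 2 code. The record layout, function names, and toy training rows are hypothetical. The 85% threshold and the 10% cutoff for alternative labels are adjustable parameters; Section 3.3 applies 75% and 90% cutoffs instead.

```python
# Minimal sketch of polarity-restricted training data and probability triage.
# Record layout and function names are hypothetical, for illustration only.


def restrict_to_polarity(training_rows, polarity="R"):
    """Keep only training lavas whose stratigraphic unit has the given polarity."""
    return [row for row in training_rows if row["polarity"] == polarity]


def triage(class_probs, threshold=0.85, alternative_cutoff=0.10):
    """Summarize one sample's class-probability distribution.

    class_probs maps category label -> probability. Returns the
    maximum-probability label, its probability, whether it falls below the
    uncertainty threshold, and any other labels at or above the cutoff.
    """
    ranked = sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_p = ranked[0]
    alternatives = [label for label, p in ranked[1:] if p >= alternative_cutoff]
    return {
        "label": best_label,
        "max_probability": best_p,
        "uncertain": best_p < threshold,
        "alternatives": alternatives,
    }


# Toy training rows (compositions omitted). Wapshilla Ridge and Meyer Ridge are
# among the reversed-polarity categories of Table 1; Sentinel Bluffs is normal.
training = [
    {"member": "Wapshilla Ridge", "polarity": "R"},
    {"member": "Meyer Ridge", "polarity": "R"},
    {"member": "Sentinel Bluffs", "polarity": "N"},
]
print([row["member"] for row in restrict_to_polarity(training)])

# Probabilities reported for sample 20-LK-TB1 (remaining probability omitted).
print(triage({"Wapshilla Ridge": 0.82, "Grouse Creek": 0.12}))
```

With the 85% threshold, the 20-LK-TB1 example is flagged as uncertain, and Grouse Creek is returned as the likely alternative label.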
We train the supervised multinomial logistic regression on these reversed-polarity lava compositions. We then find the closest possible match for each sample within that narrowed suite of training data. This improves the accuracy of the model and yields more reliable results at this detailed scale. The supplemental files of this dissertation (S1, S2, S3) contain the full dataset of dike geochemical samples. They also give the probability distributions for each class and each sample using 70% and 100% of the training data.

3 Results

Within the broader population of CRB dike segments, most dikes most closely resemble Grande Ronde magmas. Formation assignments vary more on the outskirts of the Wallowa Mountains and deeper in the plumbing system (e.g., Hurricane Ck.) (Figs. 3, 5). At the member level, most of these Grande Ronde dikes closely resemble the magmas that erupted as the Wapshilla Ridge, Winter Water, and Sentinel Bluffs members. A secondary large population of dikes is related to the Wanapum Formation (Figs. 5, 6). We also identify three localities in the Lakes Basin where Grande Ronde dikes cross-cut Imnaha-type dikes, and additional areas with strong depletions in oxygen isotopes. In Section 3.1 we detail these results, illustrating the variation within the exposed CJDS in the Wallowa region.

At the scale of the full CJDS, individual dike segments are often idealized as continuous vertical sheets of magma. In this idealization, they move through the crust as hydraulic (opening-mode), fluid-filled fractures. Departures from this idealized model have, however, been documented (Hudak et al., 2022; Keating et al., 2008; Sigmundsson et al., 2015, 2020; Townsend et al., 2017).
Our mapping of the MLDC, however, reveals great structural and chemical complexity among segments within 100 m of each other along strike. This is a very small distance relative to the size of the CRB flood basalt province. Structurally, we find that the en echelon arrangement of the MLDC likely reflects near-surface rotation of the stress field (Keating et al., 2008). We do not think such segmentation is common in the CRB. It may be related to the fact that the MLDC is not aligned with the regional average orientation of the CJDS.

Chemically, we find three neighboring stratigraphic members of the CRB in the MLDC: Wapshilla Ridge-type, Grouse Creek-type, and Meyer Ridge-type magmas. These magmas occupy an interwoven structure of en echelon dike segments, all aligned in strike over at least 2.5 km. We find no evidence that any MLDC dike segment was reoccupied by multiple magmas. This suggests one of two possibilities. Either the different compositions are part of a mixing trend, or the time between the members was shorter than the cooling timescale of the dikes (which has been constrained to the order of years). We explore these two hypotheses to try to establish the source of the variation in this dike complex.

Figure 5 Dike geochemical samples classified by Formation using supervised machine learning (multinomial logistic regression).

3.1 CJDS Supervised Classification

Our models, trained and tested on the CRB lava dataset, have high accuracy (Chapter 2). With them we can classify samples from dikes in the Wallowa Mountains and associate each with its geochemically closest extrusive stratigraphic relative. Where polarity is known, we can also relate that assignment to polarity. The probability-based classification of each individual sample can also be placed in the context of the whole population of classified dikes (Fig. 5). In this study we applied several preprocessing methods before carrying out supervised classification (see Section 2).
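The probability-based classification described above rests on multinomial (softmax) logistic regression. The sketch below illustrates that technique only. It fits an unregularized softmax regression by plain gradient descent on synthetic two-feature data with three hypothetical classes. It is a much simpler stand-in for the scikit-learn models and preprocessing (ratio combinatorics, power transformation) of Chapter 2. All data, names, and hyperparameters here are assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax: convert class scores to probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, x):
    """Linear score for each class k: W[k] . x + b[k]."""
    return [sum(w * xj for w, xj in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]


def fit_softmax_regression(X, y, n_classes, lr=0.1, epochs=500):
    """Fit multinomial logistic regression by full-batch gradient descent on
    the mean cross-entropy loss (no regularization)."""
    n, n_features = len(X), len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * n_features for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for xi, yi in zip(X, y):
            p = softmax(class_scores(W, b, xi))
            for k in range(n_classes):
                err = p[k] - (1.0 if k == yi else 0.0)  # d(loss)/d(score_k)
                gb[k] += err
                for j in range(n_features):
                    gW[k][j] += err * xi[j]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for j in range(n_features):
                W[k][j] -= lr * gW[k][j] / n
    return W, b


def predict_proba(W, b, x):
    """Probability distribution over classes for one sample."""
    return softmax(class_scores(W, b, x))


# Synthetic training data: three hypothetical classes in a 2-feature space.
rng = random.Random(0)
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
X, y = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(20):
        X.append([rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)])
        y.append(label)

W, b = fit_softmax_regression(X, y, n_classes=3)
probs = predict_proba(W, b, [3.0, 0.1])  # an "unknown" near class 1
print([round(p, 3) for p in probs])
```

The classifier returns a full distribution rather than a single label. That distribution is what makes it possible to report a maximum probability and competing labels for each dike.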
Figure 5 shows the spatial outcomes of classifying the dikes in this population. Most dikes in the Wallowa Mountains, and in the dataset generally, are classified as Grande Ronde-related dikes. This is evident in a histogram of population assignments for the full dike dataset at the member level within the full CRB stratigraphy (Fig. 6). When we use only ratio combinatorics and power transformation, most dikes are Sentinel Bluffs- or Wapshilla Ridge-related, and another large population is classified into the Wanapum Formation (Figs. 5, 6a). When we use bootstrapping to even out the number of samples in each category, most dikes are classified into the Wapshilla Ridge Member (Fig. 6b).

This bootstrapped model appears to overfit the training data. For categories with few samples, the same instances are resampled over and over, creating artificially tight distributions for those groups. Test data suggest that the bootstrapped models perform poorly on unknowns that do not closely match the resampled training points. Yet geochemical groupings always contain variation that the model must account for. We therefore rely mostly on the non-bootstrapped outcome (Fig. 6a) to analyze the categorical probabilities for each sample. It better captures the variation within each member's geochemical distribution and so better represents the natural variability in geochemical sampling.

Figure 7 shows the probability histograms for a further member-level classification of the dikes assigned to the Grande Ronde Formation. Here we reduce the training set to Grande Ronde members only and classify the dikes with that smaller training set (Fig. 7). We also assess the impact of outliers on the outcome of supervised classification. Outliers are defined as samples that lie more than three standard deviations from the mean compositions of their respective formations (Peck et al., 2005).
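The outlier definition above can be made concrete with a short screen. It flags any sample that lies more than three standard deviations from its formation's mean in at least one element. This is a sketch under stated assumptions: the any-element rule, the sample (n − 1) standard deviation, the record layout, and the synthetic SiO2 values are illustrative choices. They are not details taken from Chapter 2 or Peck et al. (2005).

```python
import statistics
from collections import defaultdict


def flag_outliers(samples, elements, k=3.0):
    """Return ids of samples lying more than k standard deviations from the
    mean of their own formation in at least one of the listed elements."""
    by_formation = defaultdict(list)
    for s in samples:
        by_formation[s["formation"]].append(s)

    flagged = set()
    for group in by_formation.values():
        if len(group) < 2:  # a standard deviation needs at least two samples
            continue
        for element in elements:
            values = [s[element] for s in group]
            mean = statistics.mean(values)
            sd = statistics.stdev(values)  # sample (n - 1) standard deviation
            if sd == 0:
                continue
            for s in group:
                if abs(s[element] - mean) > k * sd:
                    flagged.add(s["id"])
    return flagged


# Synthetic example: 19 tightly clustered SiO2 values plus one anomalous analysis.
lavas = [
    {"id": f"s{i}", "formation": "Formation A", "SiO2": 50.0 + 0.1 * ((i % 5) - 2)}
    for i in range(19)
]
lavas.append({"id": "anomalous", "formation": "Formation A", "SiO2": 60.0})
print(flag_outliers(lavas, ["SiO2"]))
```

Because the outlier itself inflates the standard deviation, a single anomalous analysis exceeds the 3σ limit only when its group is reasonably large. In very small groups it can escape detection.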
When we include the small group of outliers (<0.5% of the entire CRB database) in the classification of the samples labeled as Grande Ronde Formation, most dikes in the swarm are related to the Sentinel Bluffs Member (Fig. 7a). We then remove the outliers from the Grande Ronde classification model and focus on the main compositions of Grande Ronde magma. With this model, the dikes appear related to both of the largest-volume eruptions, the Wapshilla Ridge and Sentinel Bluffs members (Reidel et al., 2013) (Fig. 7b). The outlier group is relatively small and may represent geochemical variation unrelated to magmatic processes. We therefore prefer the model without outliers (Fig. 7b).

Figure 6 Probability histogram plots. The left-hand side (blue) shows the number of samples per member category on the y-axis. The right-hand side shows the average maximum probability for the samples classified into each group (salmon bars). The accompanying black dashed line shows the mean probability of the other possible category assignments; the higher this line, the greater the uncertainty. A) Formation-wise classification outcome over the entire CRB stratigraphy for the dike segment geochemical samples. The majority of the dikes belong to the Grande Ronde Formation. B) Member-wise classification over the entire CRB stratigraphy for 197 dike samples. The majority of dikes classify as Sentinel Bluffs, Wapshilla Ridge, and Teepee Butte.

Figure 7 Test data model applied only to the Grande Ronde samples, using a model that classifies over the Grande Ronde stratigraphy alone. A) Supervised classification of samples assigned to the Grande Ronde Formation into members; the model in panel A includes outliers. B) Supervised classification over the Grande Ronde members with outliers removed from the extrusive dataset.
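The overfitting concern raised for the bootstrapped model in Figure 6 can be illustrated directly. The sketch below assumes that balancing means resampling every category with replacement up to the size of the largest one; the exact Chapter 2 scheme may differ. The class sizes are hypothetical. The sketch shows that a small category's balanced training set is built from only a handful of distinct samples, each repeated many times. This is why that category's learned distribution becomes artificially tight.

```python
import random
from collections import Counter


def bootstrap_balance(labels, rng):
    """Return row indices resampled with replacement so that every class
    contributes as many rows as the largest class."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    target = max(len(idx) for idx in by_class.values())
    balanced = []
    for idx in by_class.values():
        balanced.extend(rng.choices(idx, k=target))  # sampling with replacement
    return balanced


# Hypothetical, strongly imbalanced training labels.
labels = ["large member"] * 200 + ["small member"] * 5
rows = bootstrap_balance(labels, random.Random(42))

counts = Counter(labels[i] for i in rows)
distinct_small = len({i for i in rows if labels[i] == "small member"})
print(counts)
print(f"'small member' rows: {counts['small member']}, distinct samples: {distinct_small}")
```

Both categories end up with equal row counts. The small one, however, still contains at most five unique compositions, so a model trained on it sees little of that member's natural variability.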
At the Formation level, dikes are classified by a model with high overall accuracy (95%). At the Member level, dikes classified into the well-known categories, such as Wapshilla Ridge and Sentinel Bluffs, all have maximum probabilities over 95%. However, several dikes classified into other members of the Grande Ronde have low confidence in their classifications (Fig. 7b). Fewer than 20 of the dike segments classified as Grande Ronde, and then classified at the member level, had a maximum probability below 75%. Samples without a high maximum probability for a single class are considered uncertain in their classification. There are several possible explanations for these compositions:
- they may be difficult to place among multiple chemically similar members;
- they may be related to unerupted magma without a close stratigraphic relative; or
- they may indicate cooling or alteration differences between the intrusive and extrusive samples (Sawlan, 2017) (Fig. 7).

3.2 Classifying the Chemistry of the Lakes Basin Study Area

In Hurricane Ck. and the Lakes Basin, we recorded several samples with Imnaha compositional affinities in addition to Grande Ronde-composition dikes (Fig. 8). The dike segments sampled in Hurricane Ck. are among the deepest paleodepth samples in the Wallowa Mountains, up to ~3 km below the paleosurface (Hales et al., 2005; Schoettle-Greene et al., 2022) (Fig. 8). They contain plagioclase phenocrysts up to 5 mm long and have abundant phenocrysts of clinopyroxene and plagioclase overall. Where they intrude the marble facies of the surrounding metasedimentary rock, the Hurricane Ck. dikes have induced contact skarn-like metamorphism. Tremolite crystals > 1 cm long grow abundantly along the dike margins. These dike segments were classified as Imnaha-type dikes.
Sample DRH1801 from Hurricane Creek is classified as part of the Rock Creek Member of the Imnaha Formation with 99% probability. The chemical affinity of the other Hurricane Creek sample (DRH1802) is less clear. It has a 78% probability of belonging to the American Bar Member and an 18% probability of belonging to the Rock Creek Member. These segments in Hurricane Ck. appear spatially and structurally related and therefore represent a dike system in which we observe variation along strike.

Figure 8 Geochemical sample locations colored by Formation in the Lakes Basin and Maxwell Lake study areas.

In the Lakes Basin study area, we sampled a transect that spans ~1.25 km along strike from Glacier Pass to the heart of the Lakes Basin. The 20-LBR-08 transect appears less segmented and structurally complex than the MLDC. This suggests that it is closer to the idealized model of a single continuous sheet than to a set of en echelon segments (Fig. 9). The dike segments are generally oriented N/NW. This is more in line with the overall dominant orientation of the dike swarm, the regional tectonic fabric, and the extensional stresses (Morriss et al., 2020). The transect sampled six dike segments along strike, starting with 08a at Glacier Pass and ending with 08h on the valley floor, 300 m lower in elevation (Fig. 9). Some samples along this transect are more evolved than others, and the model classifies these as segments of Wapshilla Ridge (Grande Ronde) dikes (Fig. 9). Along this transect the dike is 10-15 m wide, with minimal partial melt on the margins (Fig. 9). We did, however, observe a strong correlation between samples classified as Grande Ronde and segments with large (> 1 m³) blocks of assimilated tonalitic wallrock within 10 m of the sample.
In these areas, where a block of assimilant was present, the composition appears more Grande Ronde-like; along the rest of the segment it classifies as part of the Imnaha Formation. The samples with Imnaha affinity (20-LBR-08b, d, e, and g) were all classified as part of the American Bar Member of the Imnaha Formation, with probabilities over 95%.

Figure 9 Lakes Basin geochemical samples colored by Formation.

This along-strike variation is not the only notable interaction between dike segments of different chemistry in the Lakes Basin area. Samples 20-LBR-06a and 06b are from two cross-cutting dike segments (Fig. 9). Sample 20-LBR-06a is oriented ~10-20° from north, whereas 06b is oriented nearly E-W, almost perpendicular to it. The two segments differ in several ways (Fig. 9):
- The segment of 20-LBR-06a is significantly thicker (~15 m) and more crystal-rich (mostly clinopyroxene and plagioclase).
- The segment of 20-LBR-06b is much thinner (~8 m) and crystal-poor, containing only microlites, with partial melt along its edges.

The thinner segment (06b) was classified as a Grande Ronde-type dike, with a 94% probability of belonging to the Wapshilla Ridge Member. It cross-cuts the thicker segment (06a), which was classified as an Imnaha dike with a 99% probability of belonging to the American Bar Member (Fig. 9). Two other dike segments of interest in the Wallowa Mountain region were also classified as Imnaha-type dikes: sample CRBD1907 from Big Lookout Mtn. and sample 20-LBR-13 from the Lakes Basin. The CRBD1907 dike segment is exposed along the road to the top of Big Lookout Mountain, south of the main Wallowa Mountain pluton (Figs. 1, 2, 5). This dike was sampled and described (but not published) during the work of Heather Petcovic (Petcovic, 2004).
It is unusual for three reasons:
- it is an extremely thick dike (> 20 m);
- it has large zones of dehydration partial melting extending from its margins, marked particularly by biotite reactions; and
- it is well exposed in an area with significant cover.

Most dikes of this thickness (> 15 m) lack partial melt zones on their margins. They appear mostly to have excavated the wallrock rather than partially melting it. This dike is therefore a target for future thermochronologic work in the southern plutons, for comparison with the main Wallowa pluton. It was classified as part of the Rock Ck. Member of the Imnaha Formation with 99.9% probability.

In the Lakes Basin area of the Wallowa Mountains, sample 20-LBR-13 lies along the Two Pan trail from the Lostine River Valley to the Lakes Basin Wilderness area (Fig. 9). The segment has a complex structure and appears to bend by 30° in outcrop. This could represent a cross-cutting structure, but we found no field evidence for a cross-cutting relationship. This dike, ~9 m thick with significant crystal cargo (plagioclase and clinopyroxene), also deserves attention in future work. The sample was classified as part of the Imnaha Formation with a probability of 95%. Its member affinity, however, is uncertain in the model classification. The model gives a 60% probability that the sample belongs to the American Bar Member, but a further 42% probability of classification into a different member. We therefore consider this member assignment uncertain; further investigation is needed to decipher its chemical affinity.

3.3 Classifying the Chemistry of the MLDC

Having classified each dike in our database, we can now investigate the Maxwell Lake Dike Complex in more detail, particularly its geochemically segmented variation. Every dike segment in this area was classified as a Grande Ronde dike. This includes 5 samples to the south of the Maxwell Lake area.
All samples in that area are most likely Grande Ronde dike segments and may represent a continuation of the MLDC trend (Fig. 8). A more focused analysis can then classify each dike at the sub-formation (member) level. This approach requires a separate classification model with an additional paleomagnetic constraint: because all samples in the MLDC have reversed polarity (J. Biasi & Karlstrom, 2021), we train the model on a subset of the training data restricted to the reversed-polarity members of the Grande Ronde Fm. (Fig. 10). This reversed Grande Ronde member-level model reaches an overall accuracy of 91% (Fig. 10a). Every sample classified with this model, except one (CRB-60), was assigned to a category with a maximum probability >75%, and the majority classified with >90% probability (Fig. 9, 10b). Four samples had maximum probabilities below 90% and are thus considered potentially uncertain (Fig. 10b). Sample 20-LK-TB1 has an 82% probability of belonging to the Wapshilla Ridge Member and a 12% probability of belonging to the Grouse Creek Member (Fig. 10b). Sample 21-MaxR-08 has a 78% probability of belonging to the Wapshilla Ridge Member and a 14% probability of belonging to the Grouse Creek Member. Sample CRB-58, collected by Dr. Ilya Bindeman, has an 88% probability of belonging to the Meyer Ridge category, similar to the other samples from the same Jackson A segment, but a small probability (8%) that it is more similar to the Mt. Horrible Member of the Grande Ronde. While these three samples fall below 90%, they are above 75%, so we cautiously assign them their maximum-probability classes and plot them on the map for spatial analysis. Sample CRB-60, also collected by Dr. Ilya Bindeman, had a maximum probability of only 66%, for the Mt. Horrible Member.
For CRB-60, most of the remaining probability is assigned to the Skeleton Creek Member (26%), with three further members at 2-3% each (Table 1), and we therefore consider this sample uncertain in classification by the model. All other samples in the MLDC dataset are classified with probabilities of 94% or higher into either the Wapshilla Ridge Member (the Maxwell A dike segment) or the Meyer Ridge Member (the Jackson A segment transect of samples) (Fig. 12, 13).

Table 1. Probability Outcomes for Samples from MLDC. Column key: BS = Buckhorn Springs; HB = Hunter/Birch Ck.; TB = Teepee Butte; RB = Rogersburg; SC = Skeleton Ck.; CC = Center Ck.; KG = Kendrik Grade; MH = Mt Horrible; WP = Wapshilla Ridge; GC = Grouse Ck.; MR = Meyer Ridge; Max = maximum probability.

Sample ID    BS     HB     TB     RB     SC     CC     KG     MH     WP     GC     MR     Max
21-MaxR-02   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  1.00
20-LK-TC1    0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  1.00
20-MaxR-10   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  1.00
20-MaxR-12   0.001  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.999  0.000  0.000  1.00
20-MaxR-08   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  1.00
20-MaxR-14   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  1.00
21-MaxR-01   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  1.00
20-MaxR-07   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.999  0.001  0.000  1.00
20-MaxR-06   0.000  0.000  0.000  0.000  0.000  0.000  0.001  0.000  0.999  0.000  0.000  1.00
20-MaxR-05   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  1.00
LKML21-3     0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.999  0.001  0.000  1.00
20-MaxR-09   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  1.00
LKML21-1     0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  1.00
21-MaxR-05   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.998  0.002  0.000  1.00
21-MaxR-06   0.000  0.002  0.000  0.000  0.000  0.000  0.000  0.000  0.997  0.001  0.000  1.00
20-MaxR-11   0.000  0.000  0.000  0.000  0.005  0.000  0.001  0.000  0.994  0.000  0.000  0.99
20-MaxR-13   0.000  0.000  0.000  0.000  0.018  0.000  0.000  0.000  0.979  0.002  0.000  0.98
21-MaxR-08   0.000  0.011  0.000  0.000  0.067  0.000  0.000  0.000  0.782  0.138  0.001  0.78
20-LK-TB1    0.000  0.026  0.000  0.009  0.007  0.002  0.000  0.017  0.816  0.122  0.001  0.82
MLR-01-72    0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  1.00
MLT-01-65    0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  1.00
20-MaxR-01   0.000  0.000  0.003  0.000  0.000  0.001  0.000  0.001  0.000  0.000  0.994  0.99
20-MaxR-04   0.000  0.000  0.007  0.002  0.000  0.000  0.000  0.008  0.000  0.001  0.981  0.98
20-MaxR-15g  0.000  0.001  0.002  0.001  0.000  0.002  0.000  0.008  0.000  0.000  0.986  0.99
21-MaxR-04   0.000  0.000  0.001  0.000  0.000  0.000  0.000  0.002  0.000  0.000  0.996  1.00
21-MaxR-09   0.000  0.001  0.012  0.001  0.000  0.000  0.000  0.011  0.000  0.000  0.975  0.97
20-MaxR-15n  0.000  0.001  0.003  0.004  0.000  0.001  0.000  0.018  0.000  0.001  0.971  0.97
CRB-58       0.000  0.003  0.005  0.015  0.009  0.010  0.000  0.082  0.000  0.001  0.876  0.88
LKML21-2     0.000  0.000  0.004  0.001  0.000  0.001  0.000  0.002  0.000  0.000  0.992  0.99
20-LK-TE1    0.000  0.001  0.001  0.001  0.000  0.002  0.000  0.013  0.000  0.001  0.981  0.98
20-MaxR-15h  0.000  0.001  0.001  0.001  0.000  0.001  0.000  0.005  0.000  0.000  0.991  0.99
CRB-60       0.000  0.021  0.023  0.027  0.255  0.002  0.000  0.660  0.000  0.000  0.011  0.66
21-MaxR-03   0.000  0.004  0.012  0.007  0.000  0.004  0.000  0.029  0.000  0.001  0.943  0.94
20-MaxR-15j  0.000  0.001  0.003  0.000  0.000  0.002  0.000  0.001  0.000  0.000  0.993  0.99
20-LK-TD1    0.000  0.004  0.000  0.000  0.000  0.001  0.000  0.001  0.000  0.000  0.993  0.99
21-MaxR-07   0.000  0.000  0.001  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.998  1.00

Figure 10 A) Confusion matrix for the model used to classify MLDC geochemical samples, which compares the samples only to known reversed-polarity Grande Ronde extrusive lava fields (Member ID labels for reversed Grande Ronde members: 6: Buckhorn Springs; 7: Hunter and Birch Creek; 8: Teepee Butte; 9: Rogersburg; 10: Skeleton Creek; 11: Center Creek; 12: Kendrik Grade; 19: Mount Horrible; 20: Wapshilla Ridge; 21: Grouse Creek; 22: Meyer Ridge). B) Outcome of that classification model for the MLDC samples, showing the number in each category (left) and the average maximum probability for those classifications (right). Figure 11 Circular barplot example of a classification with more uncertainty for sample CRB-58, collected by Dr. Ilya Bindeman (I. N. Bindeman et al., 2020) in the MLDC. Figure 12 Circular barplots of probabilities from the supervised classification for the Maxwell A (MLR-01-72, collected by Heather Petcovic) and Jackson A (20-MaxR-15h) dike segments in the MLDC. Figure 13 Bivariate distribution of Maxwell samples against closest extrusive affinity. The geochemistry of the MLDC is heterogeneous and has a complex spatial pattern (Fig. 13, 14). However, samples within each individual dike segment have tightly clustered compositions and thus receive the same classification, suggesting that neither sampling bias nor compositional variation within segments is causing the variation seen between dike segments (Fig. 13). This includes the Jackson A dike segment, where 6 samples taken from different points across the dike all classify as the more primitive Meyer Ridge composition. One of these samples, taken from the middle of the dike, is shown in Figure 12a with 99% probability. Further along strike, several samples from across the Maxwell A dike segment (taken by multiple researchers) all have a definitive Wapshilla Ridge chemistry (Fig. 12b, 13; Petcovic & Dufek, 2005; Petcovic & Grunder, 2003). Sample MLR-01-72, from the Maxwell A segment, was classified as Wapshilla Ridge with 99.9% confidence, consistent with Petcovic and Grunder (2003) and Petcovic and Dufek (2005) (Fig. 12b).
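The three confidence tiers used in this section (maximum probability above 90%, between 75% and 90%, and below 75%) can be written as a simple rule. The Python sketch below is illustrative only and is not the code behind the classification model; the function name `triage`, the tier labels, and the 10% cutoff for flagging competing members are assumptions. The probabilities are copied from three rows of Table 1.

```python
# Illustrative sketch (not the dissertation's code): sort member-level
# classification outcomes into the confidence tiers described in Section 3.3.
# The tier labels and the 10% "competing member" cutoff are assumptions.

MEMBERS = ["Buckhorn Springs", "Hunter/Birch Ck.", "Teepee Butte", "Rogersburg",
           "Skeleton Ck.", "Center Ck.", "Kendrik Grade", "Mt Horrible",
           "Wapshilla Ridge", "Grouse Ck.", "Meyer Ridge"]

# Three rows copied from Table 1 (probabilities listed in MEMBERS order).
TABLE1_ROWS = {
    "20-MaxR-15h": [0.000, 0.001, 0.001, 0.001, 0.000, 0.001, 0.000, 0.005, 0.000, 0.000, 0.991],
    "CRB-58":      [0.000, 0.003, 0.005, 0.015, 0.009, 0.010, 0.000, 0.082, 0.000, 0.001, 0.876],
    "CRB-60":      [0.000, 0.021, 0.023, 0.027, 0.255, 0.002, 0.000, 0.660, 0.000, 0.000, 0.011],
}


def triage(probs, confident=0.90, cautious=0.75, competing=0.10):
    """Return (most likely member, max probability, tier, competing members)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    p_max = probs[best]
    if p_max > confident:
        tier = "confident"
    elif p_max > cautious:
        tier = "cautious"      # assigned its maximum class, but with caution
    else:
        tier = "uncertain"     # left unassigned at the member level
    # Other members that still carry more than the `competing` share.
    rivals = [MEMBERS[i] for i, p in enumerate(probs) if i != best and p > competing]
    return MEMBERS[best], p_max, tier, rivals


for sample, probs in TABLE1_ROWS.items():
    print(sample, triage(probs))
```

With these rows, 20-MaxR-15h and CRB-58 both resolve to Meyer Ridge (in the confident and cautious tiers, respectively), while CRB-60 is flagged as uncertain, with Skeleton Ck. as the only competing member above 10%.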
The different compositions do not occupy the same dike segment structures and therefore form an alternating en echelon pattern (Fig. 14). Figure 14 summarizes these maximum-probability classifications and shows the spatial pattern of chemistry in the dike segments of the MLDC. Figure 14 MLDC map of geochemical samples colored by Grande Ronde Member.
3.4 Comparison Between Intrusive and Extrusive CRB Data
The excellent exposures of the CRB system offer an opportunity to compare the intrusive magmas with the extrusive lavas. Figure 8 shows two bivariate spaces that include the lava data (colored) and the compositions of the dikes sampled for the database (black stars). While the dikes do not span the entire array of extrusive compositions (i.e., some formations have fewer dike exposures than others), there is strong overlap between the intrusive and extrusive compositions in all dimensions (Fig. 15). The dikes cover the majority of the extrusive data, and the rest of the main extrusive trend falls within two standard deviations of the dike compositions (Fig. 15). In particular, the Grande Ronde, Imnaha and Wanapum formations are well represented in this dataset of intrusive samples, which is biased toward sampling in and around the Wallowa Mountains (Fig. 15). A small outlier group in the Grande Ronde Formation data (<0.5% of the dataset, <40 samples; Fig. 15), especially noticeable in Rb content, has no associated dike compositions and lies more than three standard deviations from the dike compositions. This could indicate that this small group of samples experienced post-emplacement processes, or that the dikes for that group have yet to be sampled. The Saddle Mountains Formation is the least sampled in the dike database.
However, dike data from north of the Wallowa Mountains show compositions similar to those of Saddle Mountains flows, with high Ba content (J. Biasi & Karlstrom, 2021) (Fig. 3). For all formations, including the Saddle Mountains and Grande Ronde, the dikes appear to cluster more tightly than the extrusive geochemistry but show good coverage of the full range of variability (Fig. 15). We discuss the implications of this comparison in Section 4.1. Figure 15 Bivariate comparison between intrusive dike segment geochemical data and extrusive lava geochemical data. Lavas are colored by Formation ID (0: Picture Gorge; 1: Steens; 2: Imnaha; 3: Grande Ronde; 4: Wanapum; 5: Saddle Mountains).
3.5 Oxygen Isotope Analyses from the MLDC and Lakes Basin Study Areas
Prior work on oxygen isotopes from the Jackson A dike segment and surrounding wallrock found 1-2‰ depletions in δ18O in the dike relative to the flows, consistent with ~100 m of hydrothermal activity around the dike over the 4-7 years it was modeled to be active (I. N. Bindeman et al., 2020). In this study we extend oxygen isotope analyses to dike segments elsewhere in the plumbing system. This wider view allowed us to investigate the extent to which hydrothermal systems developed in the crust around CRB dikes more generally. The oxygen isotope analyses of samples from the Lakes Basin and MLDC areas are presented in Figure 16, with all phases shown and organized by latitude. Spatial visualizations of these results are shown in Figures 17 and 18, which also allow us to investigate whether the Jackson A dike segment is an anomaly. Several basalt dike samples in both the Lakes Basin area and the MLDC have strongly depleted δ18O values, including duplicated analyses as low as -0.2‰ from MLDC sample 20-MaxR-13 (Fig. 16a). In the Lakes Basin area, two similarly depleted basalt samples have δ18O below 0‰ (Fig. 16b, 18).
These depleted values indicate that in some areas the hydrothermal alteration was even more pervasive than at the Jackson A segment, with a longer duration and potentially more fluid and higher temperatures. The rest of the basalt dike samples, however, are within error of the normal mantle value of 5.5‰, indicating minimal hydrothermal alteration (Fig. 16) (I. Bindeman, 2008; I. Bindeman et al., 2008). Partial melt analyses from the contact zone between the dikes and the tonalite wallrock also vary significantly, from +3‰ to +9‰, suggesting that the hydrothermal alteration was localized and does not vary systematically along strike (Fig. 16, 17, 18).

Table 2. Oxygen isotope analyses from dikes and wallrock in the Wallowa Mountains with associated locations.
Columns: Sample; Latitude; Longitude; Sc/Zr; Stratigraphic Member Class; Fm; Dike δ18O (‰); Partial Melt (‰); Biotite (‰); Xenolith (‰)

Max-13  45.253374  -117.406805  0.19495743  WP  GR  -0.2
45.253374  -117.406805  2.6
45.253374  -117.406805  3.1
45.253374  -117.406805  3.6
Max-11  45.24679  -117.414  0.1805485  WP  GR  5.4
45.24679  -117.414  4.1
Max-02, -03  45.257833  -117.4034  6.2
45.257833  -117.4034  6
45.257833  -117.4034  6.4
45.257833  -117.4034  6.3
LBR-04  45.19215  -117.31  -0.2
LBR-06a  45.19171  -117.309  0.18128079  WP  GR  4.5
45.19171  -117.30895  0.18128079  WP  GR  4.885
45.19171  -117.309  5.8
LBR-06b  45.19171  -117.309  0.19594937  IM  IM  5.2
45.19171  -117.30895  0.19594937  IM  IM  5.18
45.19171  -117.30895  8.247
LK-TB  45.253672  -117.405998  0.20626963  WP  GR  5.4
45.253672  -117.405998  5.8
45.253672  -117.405998  6.1
LK-TA  45.252214  -117.407807  7.7
45.252214  -117.407807  6.1
Max-13  45.253374  -117.406805  0.19495743  WP  GR  -0.082
45.253374  -117.406805  3.06
Max-13  45.253374  -117.406805  3.649
LBR-04  45.19215  -117.31  1.594
45.19215  -117.31  5.02
LBR-05  45.19172  -117.31108  5.231
LBR-03  45.19377  -117.30847  4.182
45.19377  -117.30847  5.136
45.19377  -117.30847  4.966
LBR-12  45.1825  -117.308  0.15027076  WP  GR  5.275
45.1825  -117.30779  8.748
45.1825  -117.308  7.6
45.1825  -117.308  7.5
LBR-08  45.16662  -117.28588  3.995
45.16747  -117.286  5.9
45.16662  -117.28588  6.666
45.17439  -117.29156  3.746

Figure 16 A) Dike δ18O results vs. Sc/Zr for dike samples with both chemical analyses and oxygen isotope data. B) Summary of oxygen isotope data from the Maxwell and Lakes Basin areas collected during this study, organized by latitude. Figure 17 Map of oxygen isotope analyses for the MLDC. Diamonds indicate localities where oxygen isotopes were sampled, including the transect from Bindeman et al. (2020); colored diamonds indicate new sample localities. The traces of the dike margins are shown as solid black lines, and the geochemical sample sites are shown in the background in green. Figure 18 Oxygen isotope analyses for the Lakes Basin. Geochemical samples are shown by the circles underneath the isotopic measurements.
3.6 Observations of Assimilation and Excavation
In addition to lab analyses and machine learning classifications, we carried out field observations to describe these dikes and their interactions. In particular, we noted signs of assimilation in the interaction between the intruding dikes and the surrounding tonalite or metasedimentary wallrock (Fig. 4). In some instances, the dikes partially melted the surrounding granite, leaving behind resistant fins that record dehydration partial melting reactions, which progressively consume biotite, hornblende, and orthoclase from the tonalitic wallrock and leave behind a melt residue of glass and partially consumed quartz and plagioclase (Petcovic & Grunder, 2003). This type of assimilation can be explained by thermally activated partial melting and melt migration (Hampton et al., 2020; Petcovic & Dufek, 2005; Petcovic & Grunder, 2003).
This type of partial melting was documented on several segments in the MLDC, most notably on the Maxwell A and Jackson A segments. Figure 19 shows a map of the partial melt extent at the Jackson A dike segment, while Figure 20 shows the results of the chemical analyses from the dike. Figure 19 Map of partial melt zone variability along the margin of the Jackson A dike segment. In other cases, however, the assimilation was more likely mechanical in origin. We noted areas where large blocks of granite were entrapped in the dike and frozen in the act of partially melting (Fig. 4, 21a). We also noted areas around the dikes with clear signs of excavation, where wallrock had been cut out by the force of lateral and vertical movement in the dikes (Fig. 4, 21b, c), or, more subtly, where small ribbons of basalt slowly excavated wallrock from the edge of the dike. Despite these observations, wallrock assimilation and entrainment do not seem to modify the chemical compositions of the dikes. For example, the geochemical transect across the Jackson A dike segment shows extremely consistent chemistry without large variations (Fig. 20), even though there are large partial melt zones on the margins (~2 m) and visual evidence, frozen in the dike, of crystal cargo (mainly quartz) and partial melt from the wallrock entering the dike and progressively melting. These observations are not the focus of this study, but they provide significant evidence for ongoing assimilation even during transport in the plumbing system, and they argue for efficient mixing that homogenizes the dike chemistry across its full width. Figure 20 Composition of Jackson A dike segment (samples 20-MaxR-15g, h, j, n collected during this study and KD_2 and KD2 analyzed in previous studies (J.
Biasi & Karlstrom, 2021; Karlstrom et al., 2019)) accompanied by a field photo showing the Jackson A segment (photo oriented looking south) and the location of the transect and oxygen isotope analyses by Bindeman et al. (2020). The partial melt fins on the margins of the dike are visible. Figure 21 A) Margin of a dike segment from the Wallowa Mountains displaying an irregular contact with the host rock, indicative of partial melting. B) Irregular contact with a zone of partial melting in the MLDC, taken on the ridge above the dike segment originally sampled by Petcovic and Grunder (2003). C) Interaction between the tip of a dike segment in the Wallowa Mountains and the surrounding wallrock, as smaller tendrils of the dike continue to intrude into the wallrock beyond the main outcrop of the segment. D) Quenched dike margin in the marble facies of the metasedimentary wallrock in Hurricane Ck.; less partial melt occurs in this area.
4 Discussion
By classifying the dataset of 198 dike compositions and comparing the results with detailed field work, we were able to interrogate the dynamics of magma transport and emplacement in a flood basalt province over multiple spatial scales. At a broad scale, we can use our data and analytical approaches to address questions about the geometry of the plumbing system as it pertains to chemical variability. Our detailed analysis of dike segments for a single dike/area gives more specific insight into dike segment variation and dynamics, both structural and chemical. The chemical analysis and mapping also contribute to the discussion of dike longevity by providing constraints for models. In the discussion below, we use these results at both scales to understand the geochemical variation in the system (Section 4.2, 4.3) and relate it to the processes we observe frozen in the rock record of the dikes (Section 4.3, 4.4).
With this, we make a preliminary interpretation of the role that assimilation processes may have played within the plumbing system of the CRB (Section 4.4).
4.1 Hydrothermal Alteration in the CRB
Before analyzing the variation in the geochemical data, it is necessary to understand how representative these intrusive data are of the CRB system as a whole. This involves determining both whether the dike compositions can be related to the extrusive compositions on the basis of chemistry and whether we find evidence for pervasive hydrothermal alteration in the system. We combine the comparison between intrusive and extrusive chemistries with oxygen isotope evidence to constrain the role alteration may have played and its effect on the data. We first analyze the oxygen isotope data. Bindeman et al. (2020) argued that modeling results showed evidence for a hydrothermal system in the crust surrounding the dike at the paleodepth of Maxwell Lake (~1-2 km). The authors based this argument on the depletion of δ18O relative to background values (around +8.0‰ for the tonalitic crust) within the 50 cm of wallrock closest to the dike (I. N. Bindeman et al., 2020). While hydrothermal circulation was an important process at the Jackson A dike, this segment could represent an unusual instance of magmatic focusing given its long lifespan. We therefore used oxygen isotopes to test whether all the dike segments created hydrothermal systems and whether the segments themselves experienced pervasive hydrothermal alteration. Except for two segments sampled in the Lakes Basin and Maxwell Lake areas, the dikes have δ18O of +4.8 to +5.3‰, which is close to normal mantle values of +5.0 to +5.5‰ (Fig. 16, 17, 18) (I. Bindeman, 2008). Both low-δ18O segments were analyzed in duplicate, and the duplicate analyses returned similarly low values (Fig. 16a).
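The comparison with mantle values above amounts to asking whether each dike δ18O value, allowing for analytical uncertainty, overlaps the +5.0 to +5.5‰ mantle range. The sketch below is a hypothetical screening rule, not the procedure used in this study: the function name `screen_d18o` and the ±0.2‰ uncertainty are assumptions, and the example values are only those quoted in the text (5.3‰ from the normal dike range and the -0.2‰ duplicate from 20-MaxR-13).

```python
# Hypothetical screen for hydrothermal depletion in dike δ18O (per mil).
# The mantle range follows the text (+5.0 to +5.5 per mil); the +/-0.2 per mil
# analytical uncertainty is an ASSUMED value for illustration only.

MANTLE_RANGE = (5.0, 5.5)
ASSUMED_UNCERTAINTY = 0.2


def screen_d18o(value, mantle=MANTLE_RANGE, sigma=ASSUMED_UNCERTAINTY):
    """Classify a dike δ18O value relative to the mantle range."""
    low, high = mantle
    if value + sigma < low:
        return "depleted"       # below the mantle range even at the top of its error bar
    if value - sigma > high:
        return "enriched"       # above the mantle range even at the bottom of its error bar
    return "mantle-like"        # overlaps the mantle range within uncertainty


# Values quoted in the text.
for d18o in (5.3, -0.2):
    print(d18o, screen_d18o(d18o))
```

A value such as -0.2‰ is flagged as depleted under any reasonable choice of uncertainty, whereas values near the lower edge of the dike range (around +4.8‰) depend on the uncertainty assumed, which is one reason duplicate analyses matter for the low-δ18O segments.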
The two low-δ18O segments appear to have experienced hydrothermal alteration; however, more sampling and detailed analyses of these segments would be needed to confirm that these values do not reflect anomalous sampling. Across all segments sampled for wallrock δ18O, the wallrock samples were depleted relative to background values, with the least depleted in the MLDC being the Maxwell A segment (Fig. 17). Together, these isotopic analyses suggest that the crust experienced pervasive hydrothermal alteration but that the dikes themselves are generally not depleted, though some samples may show a small (<0.5‰) depletion. The results also show considerable isotopic variation within the system, even between segments separated by 100 m (Fig. 16b, 17, 18). We then turn to the comparison between the intrusive and extrusive compositions (Fig. 15). The distributions overlap, with the dikes occupying a smaller geochemical field than the extrusive lavas in most dimensions (Fig. 15). If extensive post-emplacement processes (e.g., hydrothermal alteration, protracted cooling, crystal settling) affected the lavas, then we might expect the lava compositions to differ significantly from the intrusive compositions. An alternative hypothesis is that the intrusive and extrusive samples share alteration through the hydrothermal system that we know penetrated to at least 1 km depth, based on δ18O measurements in this and previous studies (I. N. Bindeman et al., 2020). If most of the dike segments were unaltered while the lavas were altered, we would expect the variation in the extrusive system to greatly exceed the distributions we see in the dike trace elements. On the other hand, if magmatic processes are primarily responsible for the variation in the data, then we would expect strong overlap between the intrusive and extrusive populations and distributions that remain tightly clustered by formation.
The relatively sparse dike sampling compared to the full swarm prevents us from ruling out the hypothesis that the extrusive lava compositional field extends beyond the dike field because of pervasive cryptic alteration (Sawlan, 2019) (Fig. 15). Previous authors have suggested that the true composition of CRB lavas should occupy only a tightly clustered compositional space (Sawlan, 2017, 2019), though this idea remains controversial (Baker et al., 2019). If the dikes truly occupy a more condensed field than the lavas, then this may be evidence that true magma compositions are less variable. A preliminary inspection shows overlap and the same trends in the intrusive and extrusive datasets, but these encouraging similarities do not change the fact that the lavas show larger geochemical variation than the dikes in this database (Fig. 15). More dike sampling is needed before we can claim that the dikes cover the full range of the database, but the preliminary replication of extrusive variability within the dikes suggests that the same processes affected the shallow crustal plumbing system and the extrusive lava flows. One would expect overlap with the extrusive field to expand as dike sampling increases; if the fields still did not match after complete sampling, that would indicate prolonged post-emplacement cooling or alteration. Preliminarily, these results suggest that we can cautiously connect the intrusive and extrusive systems, and that the intrusive exposures might be used to inform the variation that occurred in the plumbing system to create the variation observed in the surface lavas.
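The "within two standard deviations" and "more than three standard deviations" comparisons between lavas and dikes can be made explicit by expressing each lava value as a distance from the dike mean in units of the dike standard deviation. The sketch below treats one element at a time; the function names are hypothetical and the Rb concentrations are invented placeholders, not CRB data. A real comparison would be repeated per formation and per element, or replaced by a multivariate distance.

```python
# Sketch: how far do lava compositions sit from the dike distribution?
# Distances are in units of the dike sample standard deviation, for one element.
# All numbers below are HYPOTHETICAL placeholders, not measured CRB data.
from statistics import mean, stdev


def sigma_distance(lava_values, dike_values):
    """Return |lava - dike mean| / dike standard deviation for each lava value."""
    mu = mean(dike_values)
    sd = stdev(dike_values)
    return [abs(v - mu) / sd for v in lava_values]


def coverage(lava_values, dike_values, k=2.0):
    """Fraction of lava values lying within k dike standard deviations."""
    d = sigma_distance(lava_values, dike_values)
    return sum(x <= k for x in d) / len(d)


def outliers(lava_values, dike_values, k=3.0):
    """Lava values lying more than k dike standard deviations from the dike mean."""
    d = sigma_distance(lava_values, dike_values)
    return [v for v, x in zip(lava_values, d) if x > k]


dike_rb = [20.0, 22.0, 24.0, 26.0, 28.0]   # hypothetical dike Rb (ppm)
lava_rb = [21.0, 25.0, 29.0, 31.0, 45.0]   # hypothetical lava Rb (ppm)

print(coverage(lava_rb, dike_rb, k=2.0))
print(outliers(lava_rb, dike_rb, k=3.0))
```

With these placeholder values, three of the five lava values fall within two dike standard deviations and one (45 ppm) lies more than three standard deviations away, the same kind of pattern described for the small Grande Ronde outlier group.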
4.2 Geochemical Variation Observed in the CJDS
The classification of the dikes in the full dataset at the formation level results in two central observations: the first is the overlap between the dike and extrusive compositions, which we address in Section 4.1, and the second is a possible spatial pattern in the classifications (Fig. 5). The petrogenetic link between Wallowa Mountains dikes and the main-phase extrusive compositions, together with the discovery of surface vents that fed the main phase in the Zumwalt Prairie (Davis et al., 2017), strongly suggests that this area is a locus of main-phase CRB volcanism. This is reflected in the classifications of the dikes, which are predominantly categorized into the Imnaha, Grande Ronde and Wanapum formations of the CRB, with an abundance of dikes categorized into the Grande Ronde Formation (Fig. 5, 6). At the member level, the dikes are categorized into large-volume extrusive members of the main phase. Dikes classified in the Grande Ronde Formation mostly fit into either the Sentinel Bluffs or Wapshilla Ridge Members, with a secondary population of dikes in the Wanapum Formation and a third in the American Bar and Fall Creek Members of the Imnaha Formation (Fig. 5, 6, 7). These classification results suggest that the largest-volume members (Reidel et al., 2013) are associated with the largest number of dikes. If this is not a consequence of sampling bias, this result argues that the Wapshilla Ridge Member, for instance, was sourced from many fissures over a very large area, perhaps nearly simultaneously. This would represent a far larger fissure eruption than any historical analog (Keating et al., 2008). We also analyze whether there are spatial or structural patterns associated with particular compositions in the classification. Figure 5 shows the dike compositions and their sample locations. In the plutons exposed south of the Wallowa Mtns. (Big Lookout Mtn.
Pedro Mtn., etc.) there is a large compositional range in the dike geochemical classes (Taubeneck, 1964, 1970) (Fig. 5). Many of the dike segments appear to be related to the Imnaha composition, while another population appears to be composed of both Grande Ronde and Wanapum compositions (Fig. 5). There are two instances of very primitive Steens-type magmas and three instances of more evolved Saddle Mtns.-type magmas dispersed in the Wallowa Mountains area (Fig. 5). In particular, the two instances of Steens-type magma in the Wallowa Mountains may indicate unerupted primitive magma, as they occur in an area without nearby extrusive Steens lavas (Camp et al., 2013; Moore et al., 2018; Reidel et al., 2013). These compositions show the full range of variation, from primitive to evolved, within the plumbing system (Fig. 13). Within the Lakes Basin, we observe two types of chemical variation that may hint at larger trends among the dike swarms and could motivate future work in the area. The first is the cross-cutting relationship between samples 20-LBR-06a and 06b (Fig. 9). This cross-cutting relationship between dikes of different formations suggests that the dike that intruded first had to cool enough to be cross cut without major interaction between the two dike segments along their margins. In the field, we find that most cross-cutting relationships have clean boundaries, indicating a period of cooling of the older dike before the younger dike intruded. If all of the segments were part of a feeder system for the surface flows, this may imply a pulsed magma flux rather than a continuous outpouring. To investigate this further, we would want to link our dike chemistries to the orientations of the various dike segments. As geochronology has improved, estimates of eruption rates in the CRB have increased, suggesting a system that erupted with occasional episodes of high magma flux (Kasbohm & Schoene, 2018).
However, possible hiatuses have been difficult to distinguish because of overlap in U-Pb dates between the members and flows (Kasbohm & Schoene, 2018). Our observations are more consistent with pulses of magma than with a constant, low-rate magma flux over millions of years (Barry et al., 2013; Black et al., 2021), but work is needed to confirm these results at the scale of the entire dike swarm. Our field evidence alone cannot resolve this debate, so the eruption rates of these particular magmas remain a question for further thermochronology and geochronology work. At a detailed level, the transect from Glacier Pass to the Lakes Basin shows an example of a dike structure whose chemical classification is less clear (Fig. 9). This dike is predominantly categorized as American Bar Member (Imnaha Formation) based on its chemistry (Section 3.2; Fig. 9). The dike shows no selvage zones or other structural evidence for repeated injection of magma, so there is little evidence of multiple magma types occupying this structure. However, we offer a preliminary interpretation of this chemical signature. This dike is very wide (15 m or more) and contains several large (>1 m³) assimilated blocks of tonalite within the basalt, as well as occasional blocks of marble and other metasediments transported laterally in the shallow crust. These assimilated blocks show signs of partial melting and, in some cases, nearly complete melting into the basalt. Each locality with a less certain probability and a Grande Ronde category had a large assimilated block in its vicinity, even though our sampling attempted to avoid the influence of the blocks (Fig. 9). This change in composition could therefore represent a snapshot of what occurs in this Imnaha-like basalt when a large block of granite is assimilated.
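One simple way to explore whether assimilating a tonalite block could shift an Imnaha-like basalt toward a different chemical signature is a two-component mass-balance (bulk mixing) calculation, C_mix = f·C_a + (1 - f)·C_m, where f is the mass fraction of assimilant, C_a its concentration, and C_m the concentration in the magma. This is a swapped-in illustration rather than a model used in this study: it ignores crystallization and partial melting (so it is not an AFC model), the function names are hypothetical, and the concentrations are placeholders.

```python
# Two-component bulk mixing sketch (no crystallization; not an AFC model).
# Concentrations are HYPOTHETICAL placeholders, not measured CRB or tonalite data.


def mix(c_magma, c_assimilant, f):
    """Concentration after mixing a mass fraction f of assimilant into magma."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    return f * c_assimilant + (1.0 - f) * c_magma


def fraction_needed(c_magma, c_assimilant, c_target):
    """Mass fraction of assimilant needed to reach c_target by simple mixing."""
    if c_assimilant == c_magma:
        raise ValueError("end-members must differ")
    return (c_target - c_magma) / (c_assimilant - c_magma)


# Hypothetical element concentrations (ppm) for a basalt and an assimilated block.
basalt, block = 100.0, 20.0
print(mix(basalt, block, 0.25))              # 25% assimilant by mass
print(fraction_needed(basalt, block, 80.0))  # fraction needed to reach 80 ppm
```

Applied element by element, the inverse form gives the assimilant fraction required to reach an observed composition; a fraction that is implausibly large relative to the block sizes seen in outcrop would argue against simple bulk mixing as the whole explanation.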
Assimilation of these blocks is perhaps one way that the magma can evolve and become well mixed. Dikes have previously been proposed as a geometry in which mixing could result in homogenization (Jellinek & Kerr, 1999; Snyder et al., 1997), consistent with our transects and duplicate sampling at individual dike segments, which show tightly clustered compositions. These potential assimilation processes are investigated further in Section 4.4.
4.3 Dike Segment Variation Along Strike in the MLDC
Table 3. Along-strike variation in the MLDC in oxygen isotope, thermochronology, paleomagnetic, structural and geochemical data.
Distance from Southern Ridge (m): 0, 260, 760, 990, 1125, 1275, 1500, 1750, 1800, 2300, 2350, 2500
Elevation (m): 2593, 2487, 2389, 2319, 2304, 2295, 2290, 2316, 2340, 2133, 2145, 2121
Dike δ18O (‰): 5.4, 5.3, -0.2, 5.8, 5.5
Lowest Wallrock δ18O (‰): 4.1, 2.6, 6.1, 6
Paleomagnetic Reset Distance (m)*: 3.7, 5.8, 0.5
Thermochron (Zr) Reset Distance (m)*: 5, 4.8, 2.5
Dike Thickness (m): 4.9, 1.4, 10.8, 5.5, 2, 4, 8, 10, 9.5, 4, 8.5, 8.6
Strike (°): 20, 25, 15, 22, 20, 20, 25, 22, 15, 46, 35, 30
Dip (°): 85, 88, 65, 85, 81, 80, 65, 81, 80
Partial Melt Avg. Thickness (m): 0.2, 0, 2.5, 1, 0.1, 0.5, 3, 1, 0.5, 0.1, 2.5, 2.2
Geochemical Class: WP, WP, MR, WP, WP, MR, WP, MR, WP, WP, MR, MR
The CJDS dataset used in this study provides evidence for geochemical variation throughout the dike swarm. To investigate the processes that may have been occurring within these dikes, however, we return to a group of individual segments in the Maxwell Lake area (Fig. 14). Two of the segments in the MLDC have previously been studied in detail using multiple methods to estimate their active lifespans (J. Biasi & Karlstrom, 2021; I. N. Bindeman et al., 2020; Goughnour et al., 2021; Karlstrom et al., 2019; Petcovic & Dufek, 2005; Petcovic & Grunder, 2003).
The Maxwell A and Jackson A segments record different active lifespans across several thermochronometers (Table 2), but they were considered petrogenetically linked because of their similar N/NE strike and similarly thick margins of partial melt (J. Biasi & Karlstrom, 2021; Karlstrom et al., 2019). However, the geochemistry from this study points instead to variation between individual dike segments. We now use the MLDC as a case study for variation between dike segments and offer a preliminary interpretation of that variation. This allows us to bring all the previously published data together through geochemical connections and to evaluate the structural and chemical segmentation we see in this dike complex. This along-strike variation is presented in Figure 23.

Figure 22 Summary of along-strike variation in all data at the MLDC. The figure also compares these data with other data types, including paleomagnetic reset distance and thermochronology reset distance (J. Biasi & Karlstrom, 2021; Goughnour et al., 2021). Oxygen isotope, geochemical, and structural data were all collected during this study. The bottom panel separately displays the elevation profile and the dike segment length as a function of along-strike distance. The data display considerable segmentation, with each dike segment taking on individual characteristics.

Structurally, the MLDC is composed of at least 12 dike segments (Fig. 14, 23). These discontinuous, segmented features contrast with traditional models of dike mechanics and with other observations of dikes as more continuous structures (Gonnermann & Taisne, 2015; Rivalta et al., 2015; Townsend et al., 2017). The orientation is also off the main trend of the CJDS, with a strike to the NNE versus the dominant NNW strike (Morriss et al., 2020) (Fig. 14, 23).
While this orientation, oblique to the main CRB trend, may explain why the dikes segmented into so many en echelon features in this area (Gonnermann & Taisne, 2015), the variation along strike within this dike complex, and within each segment, is notable and suggests that additional processes were at play. The strike and dip of the dike plane vary within the system, including within individual segments. However, these preliminary measurements give us an idea of the overall orientation and segmentation of the dike complex. What is more, this along-strike variation is mirrored in every type of data collected (Fig. 23). Figure 23 (and the data in Table 3) shows the along-strike variation for several of the variables collected in this area. From paleomagnetic data (J. Biasi & Karlstrom, 2021) and thermochronology (Goughnour et al., 2021) to chemical classifications, oxygen isotopes, dike thickness, and amount of partial melting, the segments of the MLDC vary widely (Fig. 23). The thermochronology and paleomagnetic data both provide an estimate of the distance that heat traveled away from each dike segment (J. Biasi & Karlstrom, 2021; Goughnour et al., 2021; Karlstrom et al., 2019). In a continuous dike sheet model, we might expect a gradual change in dike longevity and heating of the surrounding crust along strike. It has also been suggested that along-segment flow localization from thermoviscous effects may be common (e.g., Bruce & Huppert, 1990). However, the data show dramatic shifts in active lifespan between the dike segments themselves (Fig. 23). Additional data have been gathered to extend these transects to other segments, but at present the reset distance does not vary systematically with along-dike distance (Fig. 23). Rather, it appears to reflect an individual heating pattern for each segment (Fig. 23).
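One simple way to ask whether a segment property varies systematically along strike is a rank correlation against along-strike distance. The sketch below applies a Spearman correlation (our choice of test for illustration, not an analysis reported in this study) to the two rows of Table 3 that have a value for every segment: dike thickness and average partial melt thickness. The reset-distance rows have only three values each and are too sparse for this test. A |rho| near 1 would indicate a monotonic along-strike trend, whereas a value near 0 is consistent with segment-by-segment variation.

```python
import math

# Complete rows from Table 3 (MLDC along-strike data).
distance_m = [0, 260, 760, 990, 1125, 1275, 1500, 1750, 1800, 2300, 2350, 2500]
dike_thickness_m = [4.9, 1.4, 10.8, 5.5, 2, 4, 8, 10, 9.5, 4, 8.5, 8.6]
partial_melt_m = [0.2, 0, 2.5, 1, 0.1, 0.5, 3, 1, 0.5, 0.1, 2.5, 2.2]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


for name, series in [("dike thickness", dike_thickness_m),
                     ("partial melt", partial_melt_m)]:
    print(f"{name}: rho = {spearman(distance_m, series):+.2f}")
```

With only 12 segments and several tied values, the result is a descriptive check rather than a significance test.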
Such segment-specific heating could reflect flow localization within the complex, or even at the segment scale, but more of the structural and thermochronology data must be analyzed before localization can be interpreted. The individuality of the segments is also reflected in their chemistry and oxygen isotopes (Fig. 23). The chemistry of the dikes forms a nearly bimodal pattern, with a few samples potentially bridging the gap (Fig. 13). Unlike the Lakes Basin transect, where nearby assimilated blocks may explain the geochemical change, no comparable explanation is apparent in this area. The individual dike segments here appear to be well mixed and relatively homogeneous, as shown by the transect across the Jackson Lake Dike (Fig. 19, 20). At the Jackson A segment, samples taken along a transect across the dike revealed tightly clustered chemistry (Fig. 19, 20). This cluster is generally more enriched in elements compatible in mafic phases, such as Cr, MgO, and CaO, and more depleted in incompatible trace elements (e.g., Rb, Ba) than the other chemical cluster in the area, which includes the Maxwell A dike segment (Fig. 13, 19, 20). Despite partial melting at the dike margins, there is no apparent change in chemistry across the dike, suggesting that the segments themselves (~5-10 m wide) are well mixed (Fig. 13, 19). Oxygen isotope values of the MLDC dike segments vary primarily between slightly depleted values of 5.3‰ and slightly enriched values of 5.7‰. The 5.3‰ sample from the Jackson A dike may be slightly depleted by interaction with hydrothermal fluids, but this is a very small isotopic effect, and further study would be needed to evaluate whether it represents an excursion from the other dike segments. One sample from the MLDC has an extremely depleted value of -0.2‰, which likely reflects hydrothermal alteration (Fig. 17, 23).
This small overall variation, and the alteration observed in one segment, vary on a segment-by-segment basis rather than systematically along strike as a continuous dike would predict (Fig. 23). For example, the Jackson A segment was studied in detail by Bindeman et al. (2020) and found to have near-normal mantle oxygen isotope values in the basalt. However, the most isotopically depleted basaltic samples lie in an adjacent dike segment in the creek below Jackson Lake (Fig. 17, 23). These dike segments with different compositions may represent different durations (and potentially different episodes) of magmatic intrusion and may therefore record different time histories of oxygen isotope depletion rather than a simple spatial trend. Regardless, the primary pattern reflects strong variation within a small area of hydrothermal circulation in the MLDC. Together these data suggest two broad hypotheses for the MLDC. The first is that the segment-scale chemical variation reflects a fissure eruption of the Wapshilla Ridge and its stratigraphic neighbors, with little if any separation in time between emplacement of the different compositions. In this scenario, the Wapshilla Ridge and the other chemical population found in the area are variants of one another along a mixing trend, produced either by evolution through RAFC processes along strike or by the magmas feeding the dikes from below. The alternative hypothesis is that this area was first occupied by a dike with a chemistry similar to the Wapshilla Ridge flow. Because this dike structure is off the main trend of the CJDS, it may have segmented into en echelon structures to align with a rotated near-surface stress field (D. D. Pollard et al., 1982; D. Pollard & Muller, 1976; D. Pollard & Segall, 1987; Townsend et al., 2017). After the eruption of this member, the area was reoccupied, after a short gap in time, by ascending magmas of the Meyer Ridge Member.
In this scenario the Meyer Ridge magma used the same transport system as the Wapshilla Ridge, also segmented, and reoccupied the MLDC with a more primitive magma, but only after a period of cessation and cooling. In other words, the null hypothesis to explain the two chemical compositions that dominate the classified geochemistry of the MLDC is a mixing trend, perhaps involving distinct reservoirs tapped by nearly simultaneous eruptions. The alternative hypothesis is that there are two distinct dikes that overlap spatially. We consider several lines of evidence to interpret the geochemical clusters: a comparison of the intrusive data with the extrusive members, the field relationships between the two compositions, and their structural characteristics. First, we consider the intrusive and extrusive data for the MLDC. We use two bivariate plots (Fig. 13): CaO vs K2O, which track clinopyroxene fractional crystallization and overall evolution, respectively, and Ba vs Rb, the most variable trace elements in the extrusive dataset and elements that should behave similarly. On these plots the Wallowa area dikes visibly have an affinity with the Meyer Ridge, Grouse Creek, and Wapshilla Ridge members (White, 2013). The compositions overlap strongly with the Wapshilla Ridge and Meyer Ridge compositions, forming a nearly bimodal distribution with a few samples that appear to cross the trend (Fig. 13). This is particularly intriguing because the Wapshilla Ridge represents the largest outpouring of magma in the CRB (Davis et al., 2017; Reidel et al., 2013; Reidel & Tolan, 1989), and the Meyer Ridge may represent an episode of recharge (Chapter 2; Yu et al., 2015). Recharge may help explain why the Meyer Ridge magma is so much less evolved than the Wapshilla Ridge magma in bivariate space (Fig. 13).
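The null hypothesis of a two-component mixing trend makes a testable prediction: concentrations mix linearly by mass (C_mix = f·C_A + (1 − f)·C_B), so if a dike sample is a simple mixture of two end-members, every element should imply the same fraction f of end-member A. The sketch below computes that per-element fraction. It is an illustration added here, not the method used in this study, and the end-member and sample compositions are hypothetical placeholders, not Wapshilla Ridge or Meyer Ridge averages. Elements whose implied fractions scatter widely, or fall outside 0 to 1, argue against simple binary mixing.

```python
import statistics


def mixing_fractions(sample, end_a, end_b):
    """Fraction f of end-member A implied by each element, from C = f*A + (1 - f)*B."""
    fractions = {}
    for element, c in sample.items():
        spread = end_a[element] - end_b[element]
        if spread == 0:
            continue  # this element cannot discriminate between the end-members
        fractions[element] = (c - end_b[element]) / spread
    return fractions


def mixing_summary(sample, end_a, end_b):
    """Mean implied fraction and its standard deviation across elements."""
    f = list(mixing_fractions(sample, end_a, end_b).values())
    return statistics.mean(f), statistics.stdev(f)


# Hypothetical end-members (oxides in wt%, trace elements in ppm); placeholders only.
evolved = {"CaO": 8.5, "K2O": 1.2, "Ba": 600.0, "Rb": 30.0}
primitive = {"CaO": 10.5, "K2O": 0.6, "Ba": 300.0, "Rb": 12.0}

# One sample constructed as a 40:60 mixture, and one hypothetical non-mixture.
mixed = {el: 0.4 * evolved[el] + 0.6 * primitive[el] for el in evolved}
unmixed = {"CaO": 10.2, "K2O": 1.1, "Ba": 320.0, "Rb": 25.0}

for name, s in (("mixed", mixed), ("unmixed", unmixed)):
    mean_f, sd_f = mixing_summary(s, evolved, primitive)
    print(f"{name}: mean f = {mean_f:.2f}, spread = {sd_f:.2f}")
```

In real data, analytical uncertainty and any fractional crystallization after mixing would add scatter, so a threshold on the spread would need to be set from analytical precision.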
It is possible that the variation in dike compositions reflects reactivation of the Wapshilla Ridge structure in this area during the eruption of the Meyer Ridge flow. These chemical classifications are consistent with field observations, including one locality where the two chemistries have a cross-cutting relationship: the Meyer Ridge chemistry (later in the stratigraphy) cross-cuts the Wapshilla Ridge chemistry. Next, we consider the structural field relationships, which may constrain the relative timing of these compositions. At the locality of samples 21-MaxR-07 and -08, at the northern extent of the mapped area in Figure 14, two dike segments visibly cross-cut each other. The older dike segment (21-MaxR-08) was thinner (~5-6 m), had narrower margins of partial melt (<0.5 m), was more jointed, and had finer crystals (primarily microlites). The younger dike (21-MaxR-07) was 8-10 m thick, had a partial melt zone ~1-2 m thick on both margins, and had abundant plagioclase and clinopyroxene phenocrysts. A clean cross-cutting relationship requires that the first dike was emplaced and cooled before the second was emplaced. Our field observations cannot resolve whether this was a true hiatus or whether the eruption migrated to other fissures, but they do suggest a local break in activity at the MLDC dike segments between emplacement of the two magmas. After supervised classification of the samples from the cross-cutting dikes, the compositions at this locality indicate a Meyer Ridge-type dike segment cutting a Wapshilla Ridge Member-type composition (Fig. 14). In a mixing trend between a more evolved and a more primitive magma, as is the case at the MLDC, we would expect the younger composition to be more evolved because it has had more time to differentiate.
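The expectation that a longer-differentiated magma is more evolved follows from fractional crystallization: as crystallization proceeds, incompatible elements become enriched and compatible elements depleted in the remaining melt. A minimal sketch using the Rayleigh fractionation law, C_L = C_0·F^(D−1), where F is the fraction of melt remaining and D is the bulk solid/melt partition coefficient, is shown below. The starting concentrations and D values are hypothetical, chosen only to contrast an incompatible (Rb-like) element with a compatible (Cr-like) one.

```python
def rayleigh_melt_concentration(c0, melt_fraction, bulk_d):
    """Residual-melt concentration during Rayleigh fractional crystallization.

    c0: initial concentration; melt_fraction: F, fraction of melt remaining
    (0 < F <= 1); bulk_d: bulk solid/melt partition coefficient D.
    Returns C_L = c0 * F**(D - 1).
    """
    if not 0 < melt_fraction <= 1:
        raise ValueError("melt_fraction must be in (0, 1]")
    return c0 * melt_fraction ** (bulk_d - 1)


# Hypothetical starting concentrations (ppm) and D values, for illustration only.
for F in (1.0, 0.8, 0.6):
    rb = rayleigh_melt_concentration(10.0, F, 0.05)   # incompatible: rises as F falls
    cr = rayleigh_melt_concentration(300.0, F, 5.0)   # compatible: falls as F falls
    print(f"F={F:.1f}  Rb={rb:6.1f} ppm  Cr={cr:6.1f} ppm")
```

In a purely differentiating system, the younger magma would have a lower F and should therefore plot toward higher incompatible and lower compatible element concentrations.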
However, the field observations suggest that, at the MLDC, the more primitive composition is younger (Fig. 14). This implies a trend in the system toward more primitive magma compositions, which is generally thought to indicate recharge (Yu et al., 2015). It suggests that more parental mafic material was intruded into the system after emplacement of the more incompatible-element-enriched Wapshilla Ridge-type dikes. This hypothesis is consistent with the preliminary interpretation in Chapter 2 of this dissertation that the Meyer Ridge extrusive member reflects mafic recharge of the system following eruption of the massive and highly evolved Wapshilla Ridge Member, and this field evidence appears to support that conclusion. Alternatively, this return to more primitive compositions could indicate that the eruption was tapping two spatially distinct reservoirs. Resolving this timing will require extending sampling farther along strike, investigating any interaction points between the two compositions, and applying geochronology. However, the field evidence we already have allows us to begin to reject the null hypothesis that these dikes represent a mixing trend toward more evolved compositions; they more likely represent two dike compositions that record a recharge episode of the fissure system that fed the Wapshilla Ridge Member.

4.4 Evidence for Assimilation in the Dikes and Implications for the Petrogenesis of the Main Phase CRB

The observations of dike segments in the CJDS broadly, and in the MLDC in particular, raise an important question about the role the transport system plays in assimilation and in the overall evolution of these magmas (Fig. 4, 20).
For decades there has been debate over the magmatic system geometry needed to evolve the CRB from the more primitive basalts of the Steens and Imnaha formations to the generally crystal-poor, voluminous basaltic andesite of the Grande Ronde (Davis et al., 2017; Hales et al., 2005; Takahashi et al., 1998; Wolff et al., 2008; Wolff & Ramos, 2013). Changing the chemistry this drastically requires considerable input of highly evolved, incompatible-element-rich material, the feasibility of which has been a source of thermodynamic debate (Wolff et al., 2008; Wolff & Ramos, 2013). Some have argued that such amounts of assimilation are not feasible and that the composition therefore mainly reflects changes in the source, with minimal path or storage effects. Throughout the Wallowa Mountains, we observed evidence for excavation and assimilation of Wallowa tonalite by partial melting (Fig. 4, 20, 23). Before interpreting this further, it is critical to note that these observations apply only to the upper plumbing system, where the surrounding crust is likely coldest and therefore least likely to produce large amounts of partial melt. To gain insight into the deeper plumbing system, we can compare this work with research at other flood basalt provinces where deeper parts of the plumbing system are exposed (Ernst, 2014; Muirhead et al., 2014; Pearce et al., 2021). In these provinces magmas are transported hundreds of kilometers or more through dikes, and considerable assimilation has been documented where fertile crust was available to be partially melted (Airoldi et al., 2016; Ernst, 2014; France et al., 2010; Heinonen et al., 2021; Muirhead et al., 2014; Pearce et al., 2021). Study of the shallow CRB plumbing system nonetheless provides a snapshot of the processes occurring in dikes exposed at shallower paleodepths.
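The trade-off between assimilation and crystallization discussed here is commonly quantified with the assimilation-fractional crystallization (AFC) equation of DePaolo (1981), which appears in this chapter's reference list. The sketch below implements that equation for the magma concentration C_m, with f the ratio of current to initial magma mass, r the ratio of assimilation rate to crystallization rate, D the bulk partition coefficient, and z = (r + D − 1)/(r − 1). It illustrates the approach only; it is not the modeling used in this study (the storage-zone calculations cited here used the Magma Chamber Simulator), and the input values are hypothetical.

```python
def afc_concentration(c0, ca, f, r, bulk_d):
    """Magma concentration after assimilation + fractional crystallization (DePaolo, 1981).

    c0: initial magma concentration; ca: assimilant concentration;
    f: ratio of current to initial magma mass, M/M0;
    r: ratio of assimilation rate to crystallization rate (r != 1);
    bulk_d: bulk solid/melt partition coefficient D.
    """
    if r == 1:
        raise ValueError("the equation is singular at r = 1")
    z = (r + bulk_d - 1) / (r - 1)
    if z == 0:
        raise ValueError("the equation is singular at z = 0")
    g = f ** (-z)
    # C_m = C_0 f^-z + (r / (r - 1)) * (C_a / z) * (1 - f^-z)
    return c0 * g + (r / (r - 1)) * (ca / z) * (1 - g)


# Hypothetical Rb-like example: basalt with 10 ppm, tonalitic assimilant with 60 ppm.
for r in (0.0, 0.3, 0.6):
    c = afc_concentration(c0=10.0, ca=60.0, f=0.8, r=r, bulk_d=0.05)
    print(f"r = {r:.1f}: C_m = {c:.1f} ppm")
```

In the r = 0 limit the expression reduces to Rayleigh fractional crystallization, and increasing r raises the concentration of an element that is enriched in the assimilant.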
In some instances, we observed whole large blocks of granite assimilated into the dikes, while in other cases thin ribbons of tonalite appear to flux partial melt into the dike (Fig. 20, 23). Additional evidence for lateral transport comes from marble blocks assimilated but unmelted within the 20-LBR-08 transect, several kilometers from the nearest marble outcrop (Fig. 20, 23). These observations leave little doubt that assimilation occurs within the CRB transport system. However, we investigate whether there could be enough assimilation to account for the overall change in magma chemistry, assuming that the variation is not induced by a large change in the source. For the dike segments in the MLDC we can carry out a simple volumetric calculation to assess whether dike geometry could account for 10-60% assimilation (as suggested by the isotopic work of Wolff & Ramos, 2013). First, based on our observations of partial melt at the Jackson A dike segment, we estimate that the dikes may have excavated as much as 3 m of wall rock over their active lifetime (Karlstrom et al., 2019) (Fig. 19, 20). If we then generously assume a continuous sheet of magma 10 km long and 20 km deep, the total volume of partial melt is 0.003 km × 10 km × 20 km, or about 0.6 km³. If roughly 10% by mass of the total Wapshilla Ridge volume of 40,000 km³ (Reidel et al., 2013) must come from assimilation, we need 4,000 km³ of partial melt. At ~0.6 km³ of partial melt per dike, roughly 6,700 dikes would be needed to generate this volume. This is unlikely given the ~4,000 dikes currently exposed in the CRB, which represent all members and not just the Wapshilla Ridge (Morriss et al., 2020). However, the length scale is likely the least constrained value in this calculation.
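The volumetric estimate is simple arithmetic, and the short sketch below reproduces it so that the inputs (3 m of excavation, a 20 km deep sheet, a 40,000 km³ Wapshilla Ridge volume, and 10% assimilation, treating volume as a proxy for mass) can be varied. The function and constant names are our own. With these inputs it gives about 0.6 km³ of melt and roughly 6,700 dikes for a 10 km long sheet, and about 6 km³ and roughly 670 dikes for a 100 km long sheet.

```python
import math


def melt_volume_km3(excavation_km, length_km, depth_km):
    """Volume of wall rock melted by one dike modeled as a continuous sheet."""
    return excavation_km * length_km * depth_km


def dikes_needed(erupted_volume_km3, assimilated_fraction, melt_per_dike_km3):
    """Number of such dikes required to supply the assimilated volume (rounded up)."""
    required_km3 = assimilated_fraction * erupted_volume_km3
    return math.ceil(required_km3 / melt_per_dike_km3)


WAPSHILLA_VOLUME_KM3 = 40_000  # Reidel et al. (2013), as used in the text
ASSIMILATED_FRACTION = 0.10    # lower end of the 10-60% range; volume as a proxy for mass

for length_km in (10, 100):
    v = melt_volume_km3(0.003, length_km, 20)  # 3 m of excavation, 20 km deep sheet
    n = dikes_needed(WAPSHILLA_VOLUME_KM3, ASSIMILATED_FRACTION, v)
    print(f"L = {length_km:>3} km: {v:.1f} km^3 per dike, ~{n} dikes")
```

Raising the assimilated fraction toward the upper end of the 10-60% range scales the required number of dikes proportionally.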
If the length scale increases to 100 km per dike, each dike generates ~6 km³ of partial melt, and ~670 dikes of that length would be needed, which is not out of the realm of possibility, though it is very high.

Figure 23 Several examples of assimilated blocks partially melting within the dikes. A) MLDC dike segment with an assimilated tonalite block surrounded by a partial melt halo. B) Partial melting and mechanical erosion of blocks into the dike. C) Assimilated blocks of marble in the Lakes Basin.

While this latter number is more plausible given the total number of dikes in the CJDS, assimilation within the dikes is unlikely to explain the evolved Wapshilla Ridge chemistry on its own. This suggests that shallow crustal dikes are not where the majority of assimilation occurs, although they likely play a role in mixing and further evolving magma on its way to the surface from a deeper storage zone. Based on these calculations, a large storage zone is needed to increase the surface area of the system and create a large enough volume of partial melt to evolve the magma. Such a storage zone would likely have left behind large amounts of cumulates, which may be reflected in higher-than-average seismic velocities beneath regions south of the Wallowa Mountains (Davenport et al., 2017) and may have contributed to unusual regional geodynamics (Castellanos et al., 2020). Magma Chamber Simulator calculations from the work of Wolff and Ramos (2013) and Chapter 2 of this dissertation suggest that such a storage zone was likely interacting with the Idaho Batholith and may therefore have been much deeper than the assimilation exposed in the dikes at the surface.

5 Conclusion

In this study we explored exposures of the plumbing system of the Columbia River flood basalt province with geochemical, machine learning, thermochronology, paleomagnetic, and field tools.
Although eruptions of this size have never occurred in human history, by investigating the frozen rock record we gained insight into the magma dynamics of this swarm as a whole and of its individual segments and dikes. The complexity demonstrated in this study strongly motivates future work on this dike swarm and its segments, and on other dike swarms around the world. It also motivates further application of supervised machine learning and variation analysis to better understand the geochemical affinities of rocks collected in the field. This multidimensional classifier could not only assign unknown samples to the CRB stratigraphy but could also be applied at other systems where geochemical categories can be trained and classification can be improved through probabilistic assignment. These methods use the full suite of geochemical elements to provide the most robust multidimensional signal possible, and they allow us to place quantitative constraints on the categorization of samples through assigned probabilities. We therefore hope that this study motivates future use of these machine learning methods at volcanic systems around the world.

6 References Cited

Ágústsdóttir, T., Winder, T., Woods, J., White, R. S., Greenfield, T., & Brandsdóttir, B. (2019). Intense Seismicity During the 2014–2015 Bárðarbunga-Holuhraun Rifting Event, Iceland, Reveals the Nature of Dike-Induced Earthquakes and Caldera Collapse Mechanisms. Journal of Geophysical Research: Solid Earth, 124(8), 8331–8357. https://doi.org/10.1029/2018JB016010
Airoldi, G. M., Muirhead, J. D., Long, S. M., Zanella, E., & White, J. D. L. (2016). Flow dynamics in mid-Jurassic dikes and sills of the Ferrar large igneous province and implications for long-distance magma transport. Tectonophysics, 683, 182–199. https://doi.org/10.1016/j.tecto.2016.06.029
Baker, L. L., Camp, V. E., Reidel, S. P., Martin, B. S., Ross, M. E., & Tolan, T. L. (2019). Alteration, mass analysis, and magmatic compositions of the Sentinel Bluffs Member, Columbia River flood basalt province: COMMENT. Geosphere, 15(4), 1436–1447. https://doi.org/10.1130/GES02047.1
Barry, T. L., Kelley, S. P., Camp, V. E., Self, S., Jarboe, N. A., & Duncan, R. A. (2013). Eruption chronology of the Columbia River Basalt Group. Geological Society of America Special Papers, 2497(02), 45–66. https://doi.org/10.1130/2013.2497(02)
Biasi, J. A. (2021). Paleomagnetism and Geochemistry of Basalts in the North American Cordillera, Davis Strait, and Antarctica [Thesis].
Biasi, J., & Karlstrom, L. (2021). Timescales of magma transport in the Columbia River flood basalts, determined by paleomagnetic data. Earth and Planetary Science Letters, 576, 117169. https://doi.org/10.1016/j.epsl.2021.117169
Bindeman, I. (2008). Oxygen isotopes in mantle and crustal magmas as revealed by single crystal analysis. Reviews in Mineralogy & Geochemistry, 69, 1–34. https://doi.org/10.2138/rmg.2008.69.11
Bindeman, I., Gurenko, A., Sigmarsson, O., & Chaussidon, M. (2008). Oxygen isotope heterogeneity and disequilibria of olivine crystals in large volume Holocene basalts from Iceland: Evidence for magmatic digestion and erosion of Pleistocene hyaloclastites. Geochimica et Cosmochimica Acta, 72(17), 4397–4420. https://doi.org/10.1016/j.gca.2008.06.010
Bindeman, I. N., Greber, N. D., Melnik, O. E., Artyomova, A. S., Utkin, I. S., Karlstrom, L., & Colón, D. P. (2020). Pervasive Hydrothermal Events Associated with Large Igneous Provinces Documented by the Columbia River Basaltic Province. Scientific Reports, 10(1), 1–9. https://doi.org/10.1038/s41598-020-67226-9
Black, B. A., Karlstrom, L., & Mather, T. A. (2021). The life cycle of large igneous provinces. Nature Reviews Earth and Environment, 2(12), 840–857. https://doi.org/10.1038/s43017-021-00221-4
Bruce, P. M., & Huppert, H. E. (1990). Solidification and melting along dykes by the laminar flow of basaltic magma. In Magma transport and storage (pp. 87–101).
Cahoon, E. B. (2020). Distribution, Geochronology, and Petrogenesis of the Picture Gorge Basalt with Special Focus on Petrological Relationships to the Main Columbia River Basalt Group.
Cahoon, E. B., Streck, M. J., Koppers, A. A. P., & Miggins, D. P. (2020). Reshuffling the Columbia river basalt chronology-picture gorge basalt, the earliest- and longest-erupting formation. Geology, 48(4), 348–352. https://doi.org/10.1130/G47122.1
Camp, V. E., Ross, M. E., Duncan, R. A., Jarboe, N. A., Coe, R. S., Hanan, B. B., & Johnson, K. (2013). The Steens basalt: Earliest lavas of the Columbia River basalt group. The Columbia River Flood Basalt Province: Geological Society of America Special Paper 497, 2497(04), 87–116. https://doi.org/10.1130/2013.2497(04)
Camp, V. E., Ross, M. E., Duncan, R. A., & Kimbrough, D. L. (2017). Uplift, rupture, and rollback of the Farallon slab reflected in volcanic perturbations along the Yellowstone adakite hot spot track. Journal of Geophysical Research: Solid Earth, 1–20. https://doi.org/10.1002/2016JB013849
Castellanos, J. C., Perry-Houts, J., Clayton, R. W., Kim, Y. H., Stanciu, A. C., Niday, B., & Humphreys, E. (2020). Seismic anisotropy reveals crustal flow driven by mantle vertical loading in the pacific NW. Science Advances, 6(28), 1–10. https://doi.org/10.1126/sciadv.abb0476
Conrey, R., Beard, C., & Wolff, J. (2013). Columbia River Basalt flow stratigraphy in the palouse Basin Department of Ecology test wells.
Dangeti, P. (2017). Statistics for Machine Learning: Techniques for exploring supervised, unsupervised, and reinforcement learning models with Python and R.
Davenport, K. K., Hole, J. A., Tikoff, B., Russo, R. M., & Harder, S. H. (2017). A strong contrast in crustal architecture from accreted terranes to craton, constrained by controlled-source seismic data in Idaho and eastern Oregon. Lithosphere, 9(2), 325–340. https://doi.org/10.1130/L553.1
Davis, K. N., Wolff, J. A., Rowe, M. C., & Neill, O. K. (2017). Sulfur release from main-phase Columbia River Basalt eruptions. 45(11), 1043–1046. https://doi.org/10.1130/G39371.1
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1976). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 46(1), 139–144. https://doi.org/10.1115/1.3424485
DePaolo, D. J. (1981). Trace element and isotopic effects of combined wallrock assimilation and fractional crystallization. Earth and Planetary Science Letters, 53(2), 189–202. https://doi.org/10.1016/0012-821X(81)90153-9
Ernst, R. E. (2014). Large Igneous Provinces. Cambridge University Press. https://doi.org/10.1017/CBO9781139025300
Ernst, R. E., & Buchan, K. L. (1997). Giant radiating dyke swarms: Their use in identifying pre-Mesozoic large igneous provinces and mantle plumes. Geophysical Monograph Series, 100, 297–333. https://doi.org/10.1029/GM100p0297
France, L., Koepke, J., Ildefonse, B., Cichy, S. B., & Deschamps, F. (2010). Hydrous partial melting in the sheeted dike complex at fast spreading ridges: Experimental and natural observations. Contributions to Mineralogy and Petrology, 160(5), 683–704. https://doi.org/10.1007/s00410-010-0502-6
Ghiorso, M. S., & Sack, R. (1995). Chemical mass transfer in magmatic processes IV. Contributions to Mineralogy and Petrology, 119, 197–212. https://doi.org/10.1007/BF00307281
Gibson, I. (1969). A comparative account of the flood basalt volcanism of the Columbia Plateau and eastern Iceland. Bulletin of Volcanology, 33, 420–437.
Gonnermann, H., & Taisne, B. (2015). Magma Transport in Dikes. In Encyclopedia of Volcanoes (pp. 215–224).
Goughnour, R., Murray, K. E., Karlstrom, L., Cox, S., & O'Sullivan, P. (2021). Quantifying the duration of magma flow through Columbia River Flood Basalt dikes using (U-Th)/He, fission-track, and 40Ar/39Ar thermochronology. 17th International Conference on Thermochronology.
Hales, T. C., Abt, D. L., Humphreys, E. D., & Roering, J. J. (2005). A lithospheric instability origin for Columbia River flood basalts and Wallowa Mountains uplift in northeast Oregon. Nature, 438(7069), 842–845. https://doi.org/10.1038/nature04313
Hampton, R. L., Bindeman, I. N., Stern, R., Coble, M. A., & Rooyakkers, S. (2020). A microanalytical oxygen isotopic and U-Th geochronologic investigation of rhyolite petrogenesis at the Krafla Central Volcano, Iceland. Journal of Volcanology and Geothermal Research, 414(107229), 1–15. https://doi.org/10.1130/abs/2020am-355088
Hastie, W. W., Watkeys, M. K., & Aubourg, C. (2014). Magma flow in dyke swarms of the Karoo LIP: Implications for the mantle plume hypothesis. Gondwana Research, 25(2), 736–755. https://doi.org/10.1016/j.gr.2013.08.010
Heinonen, J. S., Luttinen, A. V., Spera, F. J., Vuori, S. K., & Bohrson, W. A. (2021). Serial interaction of primitive magmas with felsic and mafic crust recorded by gabbroic dikes from the Antarctic extension of the Karoo large igneous province. Contributions to Mineralogy and Petrology, 176(4), 1–21. https://doi.org/10.1007/s00410-021-01777-6
Hooper, P. R. (2000). Chemical discrimination of Columbia River Basalt Flows. Geochemistry, Geophysics, Geosystems, 1. https://doi.org/10.1029/2000GC000040
Hudak, M. R., Feineman, M. D., LaFemina, P. C., Geirsson, H., & Agostini, S. (2022). Conduit formation and crustal microxenolith entrainment in a basaltic fissure eruption: Observations from Thríhnúkagígur Volcano, Iceland. Volcanica, 5(2), 249–270. https://doi.org/10.30909/vol.05.02.249270
Itano, K., Ueki, K., Iizuka, T., & Kuwatani, T. (2020). Geochemical discrimination of monazite source rock based on machine learning techniques and multinomial logistic regression analysis. Geosciences (Switzerland), 10(2). https://doi.org/10.3390/geosciences10020063
Jellinek, A. M., & Kerr, R. C. (1999). Mixing and compositional stratification produced by natural convection: 2. Applications to the differentiation of basaltic and silicic magma chambers and komatiite lava flows. Journal of Geophysical Research, 104(B4), 7203–7218.
Johnson, D. M., Hooper, P. R., & Conrey, R. M. (1999). XRF Analysis of Rocks and Minerals for Major and Trace Elements on a Single Low Dilution Li-tetraborate Fused Bead. Advances in X-Ray Analysis, 41, 843–867.
Karlstrom, L., Murray, K. E., & Reiners, P. W. (2019). Bayesian markov-chain monte carlo inversion of low-temperature thermochronology around two 8–10 m wide columbia river flood basalt dikes. Frontiers in Earth Science, 7(April). https://doi.org/10.3389/feart.2019.00090
Kasbohm, J., & Schoene, B. (2018). Rapid eruption of the Columbia River flood basalt and correlation with the mid-Miocene climate optimum. 1–8.
Keating, G. N., Valentine, G. A., Krier, D. J., & Perry, F. V. (2008). Shallow plumbing systems for small-volume basaltic volcanoes. Bulletin of Volcanology, 70(5), 563–582. https://doi.org/10.1007/s00445-007-0154-1
Kimmel, P. G. (1982). Stratigraphy, Age, and Tectonic Setting of the Miocene-Pliocene Lacustrine Sediments of the Western Snake River Plain, Oregon and Idaho. Cenozoic Geology of Idaho: Idaho Bureau of Mines and Geology Bulletin, 26, 559–578.
Mittal, T., & Richards, M. A. (2021). The Magmatic Architecture of Continental Flood Basalts: 2. A New Conceptual Model. Journal of Geophysical Research: Solid Earth, 126(12). https://doi.org/10.1029/2021JB021807
Mittal, T., Richards, M. A., & Fendley, I. M. (2021). The Magmatic Architecture of Continental Flood Basalts I: Observations From the Deccan Traps. Journal of Geophysical Research: Solid Earth, 126(12), 1–54. https://doi.org/10.1029/2021JB021808
Moore, N. E., Grunder, A. L., & Bohrson, W. A. (2018). The three-stage petrochemical evolution of the Steens Basalt (southeast Oregon, USA) compared to large igneous provinces and layered mafic intrusions. 14(6), 1–28. https://doi.org/10.1130/GES01665.1
Morriss, M. C., Karlstrom, L., Nasholds, M., & Wolff, J. (2020). The Chief Joseph Dike Swarm of the Columbia River Flood Basalts, and the Legacy Dataset of William H. Taubeneck. Geosphere, 16(4), 1082–1106.
Muirhead, J. D., Airoldi, G., White, J. D. L., & Rowland, J. V. (2014). Cracking the lid: Sill-fed dikes are the likely feeders of flood basalt eruptions. Earth and Planetary Science Letters, 406, 187–197. https://doi.org/10.1016/j.epsl.2014.08.036
Pearce, J. A., Ernst, R. E., Peate, D. W., & Rogers, C. (2021). LIP printing: Use of immobile element proxies to characterize Large Igneous Provinces in the geologic record. Lithos, 392–393, 106068. https://doi.org/10.1016/j.lithos.2021.106068
Peck, R., Olsen, C., & Devore, J. (2005). Introduction to Statistics and Data Analysis. In Thomson.
Pedregosa et al. (2011). Scikit-learn: Machine Learning in Python. JMLR, 12, 2825–2830.
Petcovic, H. L. (2004). Dissertation: Feeder dikes to the Columbia River flood basalts: Underpinnings of a large igneous province. Oregon State University.
Petcovic, H. L., & Dufek, J. D. (2005). Modeling magma flow and cooling in dikes: Implications for emplacement of Columbia River flood basalts. Journal of Geophysical Research: Solid Earth, 110(10), 1–15. https://doi.org/10.1029/2004JB003432
Petcovic, H. L., & Grunder, A. L. (2003). Textural and thermal history of partial melting in tonalitic wallrock at the margin of a basalt dike, Wallowa Mountains, Oregon. Journal of Petrology, 44(12), 2287–2312. https://doi.org/10.1093/petrology/egg078
Pollard, D. D., Segall, P., & Delaney, P. T. (1982). Formation and interpretation of dilatant echelon cracks. Geological Society of America Bulletin, 93(12), 1291–1303. https://doi.org/10.1130/0016-7606(1982)93<1291:FAIODE>2.0.CO;2
Pollard, D., & Muller, O. (1976). The effect of gradients in regional stress and magma pressure on the form of sheet intrusions in cross section. Journal of Geophysical Research, 81(5), 975–984.
Pollard, D., & Segall, P. (1987). Theoretical displacements and stresses near fractures in rock: with applications to faults, joints, veins, dikes, and solution surfaces. In Fracture Mechanics of Rock (pp. 277–349).
Reidel, S. P. (1982). Stratigraphy of the Grande Ronde Basalt, Columbia River Basalt Group, From the Lower Salmon River and Northern Hells Canyon Area, Idaho, Oregon, and Washington. Idaho Bureau of Mines and Geology Bulletin, 26, 77–101.
Reidel, S. P. (2015). The Columbia River Basalt Group: A Flood Basalt Province in the Pacific Northwest, USA. Geosciences Canada, 42, 151–168.
Reidel, S. P., Camp, V. E., Tolan, T. L., & Martin, B. S. (2013). The Columbia River flood basalt province: Stratigraphy, areal extent, volume, and physical volcanology. Special Paper of the Geological Society of America, 497, 1–43. https://doi.org/10.1130/2013.2497(01)
Reidel, S. P., Johnson, V. G., & Spane, F. A. (2002). Natural Gas Storage in Basalt Aquifers of the Columbia Basin, Pacific Northwest USA: A Guide to Site Characterization. Pacific Northwest National Laboratory, August, 277.
Reidel, S. P., & Tolan, T. L. (1989). The Grande Ronde Basalt, Columbia River Basalt Group - Stratigraphic Descriptions and Correlations in Washington, Oregon, and Idaho. Special Paper of the Geological Society of America, 239, 21–53. https://doi.org/10.1130/SPE239-p21
Rivalta, E., Taisne, B., Bunger, A. P., & Katz, R. F. (2015). A review of mechanical models of dike propagation: Schools of thought, results and future directions. Tectonophysics, 638(C), 1–42. https://doi.org/10.1016/j.tecto.2014.10.003
Sawlan, M. G. (2017). Alteration, mass analysis, and magmatic compositions of the Sentinel Bluffs Member, Columbia River flood basalt province. Geosphere, 14(1), 286–303. https://doi.org/10.1130/GES01188.1
Sawlan, M. G. (2019). Alteration, mass analysis, and magmatic compositions of the Sentinel Bluffs Member, Columbia River flood basalt province: REPLY. Geosphere, 15(4), 1448–1458. https://doi.org/10.1130/GES02047.1
Schoettle-Greene, P., Duvall, A. R., & Crowley, P. D. (2022). Multiphase Topographic and Thermal Histories of the Wallowa and Elkhorn Mountains, Blue Mountains Province, Oregon, USA. Tectonics, 41(3), 1–22. https://doi.org/10.1029/2021TC006704
Sigmundsson, F., Hooper, A., Hreinsdóttir, S., Vogfjörd, K. S., Ófeigsson, B. G., Heimisson, E. R., Dumont, S., Parks, M., Spaans, K., Gudmundsson, G. B., Drouin, V., Árnadóttir, T., Jónsdóttir, K., Gudmundsson, M. T., Högnadóttir, T., Fridriksdóttir, H. M., Hensch, M., Einarsson, P., Magnússon, E., … Eibl, E. P. S. (2015). Segmented lateral dyke growth in a rifting event at Bárðarbunga volcanic system, Iceland. Nature, 517(7533), 191–195. https://doi.org/10.1038/nature14111
Sigmundsson, F., Pinel, V., Grapenthin, R., Hooper, A., Halldórsson, S. A., Einarsson, P., Ófeigsson, B. G., Heimisson, E. R., Jónsdóttir, K., Gudmundsson, M. T., Vogfjörd, K., Parks, M., Li, S., Drouin, V., Geirsson, H., Dumont, S., Fridriksdottir, H. M., Gudmundsson, G. B., Wright, T. J., & Yamasaki, T. (2020). Unexpected large eruptions from buoyant magma bodies within viscoelastic crust. Nature Communications, 11(1), 1–11. https://doi.org/10.1038/s41467-020-16054-6
Snavely, P. D. (1962). Tertiary Geologic History of Western Oregon and Washington. AAPG Bulletin, 46. https://doi.org/10.1306/BC7437FF-16BE-11D7-8645000102C1865D
Snyder, D., Crambes, C., Tait, S., & Wiebe, R. A. (1997). Magma mingling in dikes and sills. Journal of Geology, 105, 75–86.
Swanson, D. A., Vance, J. A., Clayton, G., & Evarts, R. C. (1989). Cenozoic volcanism in the Cascade Range and Columbia Plateau, southern Washington and northernmost Oregon: Seattle, Washington to Portland, Oregon, July 3-8, 1989. American Geophysical Union, Field Trip, 1–60.
Takahashi, E., Nakajima, K., & Wright, T. L. (1998). Origin of the Columbia River basalts: Melting model of a heterogeneous plume head. Earth and Planetary Science Letters, 162(1–4), 63–80. https://doi.org/10.1016/S0012-821X(98)00157-5
Taubeneck, W. H. (1964). Cornucopia Stock, Wallowa Mountains, Northeastern Oregon: Field Relationships. Geological Society of America Bulletin, 75(11), 1093–1116.
Taubeneck, W. H. (1970). Dikes of the Columbia River Basalt in Northeastern Oregon, Western Idaho, and Southeastern Washington, in Gilmour, E. H. and Stradling, D., eds., Proceedings of the 2nd Columbia River Basalt Symposium, Cheney, Washington. Eastern Washington Press, 2, 73–96.
Taubeneck, W. H. (1990). Significant discoveries during 1989 involving dikes of Columbia River Basalt in pre-Tertiary rocks in eastern Oregon (OR) and western Idaho (ID). Geological Society of America, 88.
Taubeneck, W. H. (1997). Preferential occurrence of eruptive axes of dikes of the Columbia River Basalt Group in unmetamorphosed Mesozoic granitic intrusives, with 40Ar/39Ar dates for four dikes and two of the earliest flows of basalt, northeast Oregon and western Idaho. Geological Society of America, 48.
Tolan, T. L., Martin, B. S., Reidel, S. P., Anderson, J. L., Lindsey, K. A., & Burt, W. (2009). An Introduction to the stratigraphy, structural geology, and hydrogeology of the Columbia River Flood Basalt Province: A primer for the GSA Columbia River Basalt Group field trips. The Geological Society of America, Field Guide.
Townsend, M. R., Pollard, D. D., & Smith, R. P. (2017). Mechanical models for dikes: A third school of thought. Tectonophysics, 703–704, 98–118. https://doi.org/10.1016/j.tecto.2017.03.008
Ueki, K., Hino, H., & Kuwatani, T. (2018). Geochemical discrimination and characteristics of magmatic tectonic settings: A machine-learning-based approach. Geochemistry, Geophysics, Geosystems, 19(4), 1327–1347. https://doi.org/10.1029/2017GC007401
Waters, A. C. (1961). Stratigraphic and lithologic variations in the Columbia River Basalt. American Journal of Science, 259(8), 583–611.
Wells, R., Niem, A., Evarts, R., & Hagstrum, J. (2009). The Columbia River Basalt Group: From the gorge to the sea. The Geological Society of America, 15, 737–774. https://doi.org/10.1130/2009.fl
White, W. (2013). Trace Elements in Igneous Processes. In Geochemistry (pp. 259–313).
Wilson, R. L., & Watkins, N. D. (1967). Correlation of Petrology and Natural Magnetic Polarity in Columbia Plateau Basalts. Geophysical Journal of the Royal Astronomical Society, 12(4), 405–424. https://doi.org/10.1111/j.1365-246X.1967.tb03150.x
Wolff, J. A., & Ramos, F. C. (2013). Source materials for the main phase of the Columbia River Basalt Group: Geochemical evidence and implications for magma storage and transport. Special Paper of the Geological Society of America, 497, 273–291. https://doi.org/10.1130/2013.2497(11)
Wolff, J. A., Ramos, F. C., Hart, G. L., Patterson, J. D., & Brandon, A. D. (2008). Columbia River flood basalts from a centralized crustal magmatic system. Nature Geoscience, 1, 177–180. https://doi.org/10.1038/ngeo124
Wright, T. L., Grolier, M. J., & Swanson, D. A. (1973). Chemical variation related to the stratigraphy of the Columbia River basalt. Bulletin of the Geological Society of America, 84(2), 371–386. https://doi.org/10.1130/0016-7606(1973)84<371:CVRTTS>2.0.CO;2
Yu, X., Lee, C., Chen, L., & Zeng, G. (2015). Magmatic recharge in continental flood basalts: Insights from the Chifeng igneous province in Inner Mongolia. Geochemistry, Geophysics, Geosystems, 16, 2082–2096. https://doi.org/10.1002/2015GC005805