Conformational Dynamics of DNA and Protein-DNA Complexes at Single-Stranded-Double-Stranded DNA Junctions

by
Jack Maurer

A dissertation accepted and approved in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Chemistry

Dissertation Committee:
Jeffery Cina, Chair
Andrew H. Marcus, Advisor
Peter von Hippel, Co-Advisor
Julia Widom, Core Member
Brian Smith, Institutional Representative

University of Oregon
Fall 2023

© 2023 Jack Maurer

DISSERTATION ABSTRACT

Jack Maurer
Doctor of Philosophy in Chemistry
Title: Conformational Dynamics of DNA and Protein-DNA Complexes at Single-Stranded-Double-Stranded DNA Junctions

Most biological systems, particularly protein-DNA complexes, leverage a dynamic evolution of their structure to perform a myriad of functions within the context of the cell. Decades of detailed biophysical research have established that the intricacies of such systems stem heavily from their dynamic evolution, displacing the previous notion of a purely static ‘structure-function’ relationship. This dissertation introduces a new polarization-sensitive methodology for studying the dynamic evolution of local conformation in single molecules of dsDNA containing an (iCy3)2 dimer. The methodology developed during this dissertation is applied to DNA under a variety of experimental conditions as well as to protein-DNA complexes. A massively parallel computational pipeline was developed in the course of this work to aid the optimization of kinetic network models, which forms the basis for all current analyses of single-molecule data in the Marcus and von Hippel lab. The primary discovery of this work is the persistence of four relevant conformational macrostates in DNA-only systems and five relevant conformational macrostates in the protein-DNA systems examined.
The thermodynamic and mechanical stability of these systems is analyzed in detail, and structural mechanisms are proposed that link the observed dynamics to hypothesized local conformations adopted during the dynamic evolution of these ubiquitous biological systems.

This dissertation contains previously published co-authored material.

CURRICULUM VITAE

NAME OF THE AUTHOR: Jack Maurer

GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:
University of Oregon, Eugene, OR
University of Denver, Denver, CO

DEGREES AWARDED:
Doctor of Philosophy, Chemistry, 2023, University of Oregon
Bachelor of Science, Physics, 2016, University of Denver
Bachelor of Science, Chemistry, 2016, University of Denver

SPECIAL AREAS OF INTEREST:
Single-Molecule Biophysics
Optical Spectroscopy and Metrology
Protein-DNA Interactions

PROFESSIONAL EXPERIENCE:
Physical Chemistry Undergraduate Teaching Assistant, Spring 2022, 2023
Undergraduate General Chemistry Teaching Assistant, UO, 2016-2017
Technical Sales Associate, Thyssenkrupp, 2016-2017

GRANTS, AWARDS AND HONORS:
Emmanuel Fellowship, University of Oregon, 2023
Raymund Fellowship, University of Oregon, 2017-2023
Mao Fellowship, University of Oregon, 2017-2023
Dean’s First Year Merit Award, University of Oregon, 2017
Outstanding Senior in Physics and Astronomy, University of Denver, 2016
Recipient of Partners in Natural Sciences Grant, University of Denver, 2014, 2015
Dean’s List, University of Denver, 2012-2016

PUBLICATIONS:
Marcus, A. H., Heussman, D., Maurer, J., Albrecht, C. S., Herbert, P., & von Hippel, P. H. (2023). Studies of Local DNA Backbone Conformation and Conformational Disorder Using Site-Specific Exciton-Coupled Dimer Probe Spectroscopy. Annual Review of Physical Chemistry. Vol 74.
Maurer, J., Albrecht, C.S., Heussman, D., Herbert, P., von Hippel, P.H. & Marcus, A.H. (2023). Polarization-sweep single-molecule fluorescence microscopy of exciton-coupled (Cy3)2 dimer-labeled DNA fork constructs. In Press JCPB.
Maurer, J., Albrecht, C.S., von Hippel, P.H. & Marcus, A.H. (2023). ‘DNA Breathing’ free energy landscape determination via kinetic network analysis of polarization-sweep single-molecule fluorescence microscopy data. In Preparation.
Maurer, J., Albrecht, C.S., P., von Hippel, P.H. & Marcus, A.H. (2023). Multi-state kinetic network modeling of time resolved single-molecule data using a generalized master equation approach. In Preparation.
Herbert, P., Maurer, J., Heussman, D., von Hippel, P.H. & Marcus, A.H. (2023). Investigating local DNA conformational trapping dynamics during DNA polymerase holoenzyme assembly and exonuclease proof-reading activity. In Preparation.
Albrecht, C.S., Israels, B., Maurer, J., P., von Hippel, P.H. & Marcus, A.H. (2024). Sub-millisecond Single-Molecule FRET Studies of Single-Stranded DNA Conformation Fluctuations Mediated by Single-Stranded DNA Binding Proteins. Work in Progress.

ACKNOWLEDGEMENTS

I am both grateful and lucky to have been a member of the Marcus and von Hippel group for the last six years. I would like to thank all current and former lab members for their advice, insights, and general inspiration over the course of my thesis work. It truly has been a rich and rewarding environment in which to learn and grow as a scientist. I would like to thank my primary advisor, Andy Marcus, for his dedicated support of my research efforts, his willingness to let me explore and investigate new scientific directions, and the pertinent advice he has offered me, which has largely shaped my scientific outlook and thought process. I would also like to thank my co-advisor Peter von Hippel for his constant optimism about life, excitement over the possibilities held by science, pointed research advice, and generosity in supporting both my wife and me during our time in Eugene. Additionally, a special thank you to Larry Scatena for all his help and guidance.
I’ve been fortunate to have fostered many long-lasting relationships during my time at the University of Oregon. Many of these relationships were forged over a mutual love of food, the outdoors, science, and more often than I would have anticipated, overly complicated board games. To those friends who I met during my time here, thank you for your company, support and all the treasured memories I will hold onto for years to come. Lastly, thank you to my wife and my family. Your willingness to stand by me on this journey has meant everything to me. Your patience and tolerance for my often overly exuberant personality hasn’t gone unnoticed. I believe it is an enormous privilege to pursue a PhD, one which so many around the world can never afford, so thank you for getting me here and continuing to support me along the way. I wouldn’t have made it this far without you.

This dissertation is dedicated to my wife and family.

TABLE OF CONTENTS

Chapter 1: INTRODUCTION
  1. OVERVIEW
  2. BACKGROUND INFORMATION
  3. SINGLE MOLECULE APPROACHES
  4. MACROMOLECULAR STRUCTURE AND DYNAMICS
Chapter 2: POLARIZATION SWEEP SINGLE-MOLECULE FLUORESCENCE MICROSCOPY
  1. OVERVIEW
  2. INTRODUCTION
  3. MATERIALS AND METHODS
    Section 3.1 (iCy3)2 dimer-labeled single-stranded (ss) – double-stranded (ds) DNA fork constructs
    Section 3.2 Sample preparation
    Section 3.3 Polarization-sweep single-molecule fluorescence (PS-SMF) microscopy experimental setup
    Section 3.4 Theoretical description of Jones vector electric field components
    Section 3.5 Derivation of the polarized signal intensity for PS-SMF experiments
    Section 3.6 Characterization of conformational macrostates using probability distribution functions (PDFs)
    Section 3.7 Characterization of conformational dynamics using multi-point time-correlation functions (TCFs)
    Section 3.8 Consideration of fluorescence correlation spectroscopy (FCS) as a reporter on conformational dynamics of (iCy3)2 dimer-labeled single-stranded (ss) – double-stranded (ds) DNA fork constructs
  4. RESULTS AND DISCUSSION
    Section 4.1 Analysis of PS-SMF spectroscopic signals
    Section 4.2 Studies of local DNA breathing of +1, -1 and -2 (iCy3)2 dimer-labeled ss-dsDNA fork constructs
    Section 4.3 Salt concentration-dependent breathing of +1 and -2 (iCy3)2 dimer-labeled ss-dsDNA fork constructs
  5. CONCLUSION
Chapter 3: SIMULATING A KINETIC NETWORK TO OBTAIN AN EXPERIMENTALLY DERIVED FREE ENERGY LANDSCAPE
  1. OVERVIEW
  2. INTRODUCTION
  3. COMPUTATIONAL METHODS
    Section 3.1 Simulating a kinetic network using Markov chains
    Section 3.2 Estimating the number of macrostates and minimally necessary model complexity
    Section 3.3 Optimization methods for application of a kinetic network model to experimental single-molecule data
    Section 3.4 Details of calculating the chi-squared across three surfaces
  4. CONCLUSION
Chapter 4: ANALYSIS OF PS-SMF DATA FROM SS-DSDNA JUNCTION USING A KINETIC NETWORK MODEL
  1. OVERVIEW
  2. INTRODUCTION
  3. MATERIALS AND METHODS
  4. RESULTS AND DISCUSSION
    4.1 Positional dependence of DNA replication fork free energy landscapes
    4.2 Salt dependence of DNA replication fork free energy landscapes
  5. CONCLUSION
    5.1 Proposed structural model of conformational macrostates at replication fork junctions
Chapter 5: APPLICATION OF PS-SMF TO PROTEIN-DNA COMPLEXES
  1. OVERVIEW
  2. INTRODUCTION
  3. MATERIALS AND METHODS
  4. RESULTS AND DISCUSSION
    Section 4.1 Duplex region studies of DNA p/t junctions
    Section 4.2 Protein-DNA complexes studies at and near DNA p/t junctions
  5. CONCLUSION
Chapter 6: CONCLUDING SUMMARY
Appendix: INVESTIGATIONS INTO BROADBAND INTERFEROMETRIC MEASUREMENTS PERFORMED ON SINGLE MOLECULES

LIST OF FIGURES

Figure 1.1 Depiction of the full T4 bacteriophage replisome and its various protein complexes.
Figure 1.2 Depiction of various ss-dsDNA junction topologies.
Figure 1.3 Structure of dsDNA.
Figure 2.1 Labeling chemistry, nomenclature and structural parameters of the internal (iCy3)2 dimer probes.
Figure 2.2 The PS-SMF microscopy experimental layout.
Figure 2.3 Schematic of the interferometer and integrated phase-tagged photon counting (PTPC) electronics.
Figure 2.4 Control measurements of phase histograms.
Figure 2.5 Model demonstration of variable integration time on the emergence of multimodal behavior.
Figure 2.6 Experimental demonstration of variable integration time on the emergence of multimodal behavior.
Figure 2.7 PS-SMF probability distribution functions (PDFs) for the +1 (iCy3)2 dimer-labeled ss-dsDNA fork constructs.
Figure 2.8 PS-SMF probability distribution functions (PDFs) for the +1 iCy3 monomer-labeled ss-dsDNA fork construct.
Figure 2.9 PS-SMF time correlation functions (TCFs) for an anisotropic Cy3-film.
Figure 2.10 PS-SMF time correlation functions (TCFs) for an isotropic rhodamine sample.
Figure 2.11 Example PS-SMF signal trajectories.
Figure 2.12 PDFs of the flux, visibility and phase for the +1 (iCy3)2 dimer-labeled ss-dsDNA fork construct.
Figure 2.13 Probability distribution functions (PDFs) of the visibility for the +1 (iCy3)2 dimer-labeled ss-dsDNA construct, and the +1 iCy3 monomer-labeled ss-dsDNA construct.
Figure 2.14 Joint PDFs of the flux and visibility signals.
Figure 2.15 Two-point time correlation functions (TCFs) of the visibility and (B, D) the flux for the +1 iCy3 monomer-labeled ss-dsDNA construct and the +1 (iCy3)2 dimer-labeled ss-dsDNA construct.
Figure 2.16 Probability distribution functions, two-point time-correlation functions and three-point TCFs of the PS-SMF visibility for the +1, -1 and -2 (iCy3)2 dimer-labeled ss-dsDNA fork constructs.
Figure 2.17 Salt concentration-dependent PDFs, two-point TCFs and three-point TCFs of the signal visibility for the +1 (iCy3)2 dimer-labeled ss-dsDNA construct.
Figure 2.18 Salt concentration-dependent PDFs, two-point TCFs and three-point TCFs of the signal visibility for the -2 (iCy3)2 dimer-labeled ss-dsDNA construct.
Figure 2.19 Schematic diagram illustrating hypothesized mechanism of salt-induced instability of (iCy3)2 dimer-labeled ss-dsDNA fork constructs.
Figure 3.1 An explicit form of the rate matrix.
Figure 3.2 Comparison of the results from sampling two separate distributions utilizing an alternate sampling method.
Figure 3.3 A diagrammatic depiction of the genetic algorithm workflow.
Figure 3.4 Example of a ‘good’ fit to the data from a model outcome.
Figure 3.5 Weight surfaces applied to the C2 and C3 in ‘good’ fit case.
Figure 3.6 ‘Chi-Stats’ overview for reporting on statistical measures of ‘true’ error in the case of a ‘good’ fit.
Figure 3.7 ‘Chi-Stats’ overview for reporting on statistical measures of ‘weighted’ error in the case of a ‘good’ fit.
Figure 3.8 Example of a ‘bad’ fit to the data from a model outcome.
Figure 3.9 Weight surfaces applied to the C2 and C3 in ‘bad’ fit case.
Figure 3.10 ‘Chi-Stats’ overview for reporting on statistical measures of ‘true’ error in the case of a ‘bad’ fit.
Figure 3.11 ‘Chi-Stats’ overview for reporting on statistical measures of ‘weighted’ error in the case of a ‘bad’ fit.
Figure 4.1 Optimized fits from our kinetic network analysis for the position dependence of DNA ‘breathing’ under physiological salt conditions.
Figure 4.2 Depiction of the various modes and extents of DNA ‘breathing’ one might expect.
Figure 4.3 Free energy landscapes and associated kinetic networks for each of the positions examined under physiological salt conditions.
Figure 4.4 Summary diagrams for the thermodynamic and mechanical stability of the +1, -1 and -2 positions under physiological salt conditions.
Figure 4.5 Optimized fits from our kinetic network analysis for the salt dependence of DNA ‘breathing’ at the +1 position.
Figure 4.6 Free energy landscapes and associated kinetic networks for each of the salt conditions examined at the +1 position.
Figure 4.7 Optimized fits from our kinetic network analysis for the salt dependence of DNA ‘breathing’ at the -2 position.
Figure 4.8 Free energy landscapes and associated kinetic networks for each of the salt conditions examined at the -2 position.
Figure 4.9 Summary diagrams for the thermodynamic and mechanical stability of the +1 and -2 positions under a series of salt conditions.
Figure 4.10 Proposed structural model based on the results of our kinetic network analysis of replication fork junctions under numerous solvent conditions.
Figure 5.1 Schematic depiction of the clamp-clamp loader reaction cycle.
Figure 5.2 Proposed kinetic scheme obtained of the DNA holoenzyme.
Figure 5.3 Linear absorbance of duplex labeled p/t junction constructs.
Figure 5.4 Panels A-E: Experimental single-molecule data and optimized fits for the labeled p/t junction constructs.
Figure 5.5 Linear absorbance and CD spectra of +15, +3, and +1 constructs upon the addition of gp45 clamp and activated gp44/62 clamp loader.
Figure 5.6 Experimental data and optimized fits for the +1 dimer and clamp-clamp loader complex.
Figure 5.7 CD spectra of gp43 binding to +4 p/t construct upon addition of increasing Mg2+ concentration.
Figure 5.8 Experimental data and optimized fits for the +4 dimer and DNA polymerase complex.
Figure A.1 Schematic of the experimental apparatus used for ultrafast linear and nonlinear single molecule measurements.
Figure A.2 Two representative linear interferometric experiments on single molecules of dsDNA containing an (iCy3)2 dimer.
Figure A.3 A series of linear Fourier transforms are shown as a function of integration time.
Figure A.4 Two dimensional spectra obtained on a single dsDNA molecule containing a monomer of Cy3.
Figure A.5 Real part of the rephasing spectrum obtained on an ensemble of dsDNA molecules containing a monomer of Cy3.
Figure A.6 Rephasing 2D spectra obtained from single Cy3 monomers incorporated into dsDNA.
Figure A.7 Rephasing 2D spectra obtained from single Cy3 dimers incorporated into dsDNA.
Figure A.8 Scaling of Nonrephasing and Rephasing signals for variable photon number.
Figure A.9 Intensity scaling of the linear and nonlinear signal magnitudes from a Cy3 monomer in dsDNA.
Figure A.10 Intensity scaling of the linear and nonlinear signal magnitudes from a Cy3 dimer in dsDNA.
Figure A.11 Experimental apparatus for broadband polarization-sweep measurements.
Figure A.12 Simulated linear signal components rotating versus non-rotating polarization.
Figure A.13 Simulated signal contrast between two oppositely handed conformers in the case of rotating versus non-rotating polarization.
Figure A.14 A histogram of phase-tags obtained on an isotropic sample of rhodamine using a broadband polarization sweep experimental setup.
Figure A.15 Non-rotating polarization time-correlation function analysis of the signal visibility for rhodamine and a single molecule containing a Cy3 dimer.
Figure A.16 Non-rotating polarization time-correlation function analysis of the signal flux for a single molecule containing a Cy3 dimer.
Figure A.17 Two-point TCFs for control data sets with cross-polarized polarization.
Figure A.18 Two-point TCFs for a +1 and +15 dimer with cross-polarized polarization.

LIST OF TABLES

Table 1 Base sequences and nomenclature for the ss-dsDNA fork constructs used in DNA only replication fork studies.
Table 2 Multi-exponential fit parameters for the flux and visibility 2-point TCFs of the iCy3 monomer- and (iCy3)2 dimer-labeled ss-dsDNA constructs.
Table 3 Multi-exponential fit parameters of the visibility 2-point TCFs for the (iCy3)2 dimer-labeled ss-dsDNA fork constructs.
Table 4 Number of models, per model category, for N=4,5,6 in the course of model generation.
Table 5 Number of models, per model category, for N=4,5,6, when filters are applied during model generation.
Table 6 Optimized kinetic network model parameters for the free energy minima and activation barriers for all positions and conditions studied at replication fork junctions.
Table 7 Optimized kinetic network model parameters for the kinetic rate constants for all positions and conditions studied at replication fork junctions.
Table 8 Base sequences of p/t-DNA constructs used in protein-DNA studies.
Table 9 Optimized kinetic network model parameters for the free energy minima and activation barriers for all protein-DNA conditions studied.
Table 10 Optimized kinetic network model parameters for the kinetic rate constants for all protein-DNA conditions studied.

CHAPTER 1: INTRODUCTION

1. OVERVIEW

The primary focus of this thesis is the study of thermal fluctuations in dsDNA, termed ‘DNA breathing’, both with and without the assembly of protein complexes. The first Chapter of this thesis serves as an introduction to the detailed discussion of research results described in later Chapters.
The aim of this introductory Chapter is to bolster the reader’s understanding of the scientific history that led to the studies described here, the key scientific issues addressed by the development of single-molecule techniques during this thesis, and more broadly the field of macromolecular structure and dynamics, which comprises many biologically complementary but technologically orthogonal approaches. The contents of later Chapters are devoted to the various investigations undertaken in the labs of Dr. Andrew H. Marcus and Dr. Peter H. von Hippel. Chapter 2 has been submitted to JCPB and covers the development and implementation of polarization-sweep single-molecule fluorescence spectroscopy, done in collaboration with Claire Albrecht, Patrick Herbert, Dylan Heussman, Anabel Chang, Peter von Hippel and Andrew H. Marcus. Chapter 3 is the topic of a forthcoming publication dedicated to the development of a computational framework capable of simulating kinetic network models in a massively parallel fashion. This work was done in collaboration with Claire Albrecht, Peter von Hippel and Andrew H. Marcus. Chapter 4 details the analysis of polarization-sweep single-molecule fluorescence spectroscopy experiments using the kinetic network analysis described in Chapter 3. This work was done in collaboration with Claire Albrecht, Peter von Hippel and Andrew H. Marcus. Chapter 5 covers the analysis of protein-dependent single-molecule studies carried out by Patrick Herbert. This work was a collaboration between Patrick Herbert, Peter von Hippel and Andrew H. Marcus. The Appendix reports preliminary investigations into the feasibility and utility of ultrafast single-molecule measurements. This work was done in collaboration with Amr Tamimi, Anabel Chang, Lulu Enkhbaatar, Peter von Hippel and Andrew H. Marcus.

2. BACKGROUND INFORMATION

The thermodynamic stability of many biological macromolecules is held in a delicate balance between multiple conformational states under physiological conditions. This inherent lack of highly stable conformations is frequently thought of as a fundamental feature of nucleic acids, proteins, and fully assembled protein complexes, where dynamical rearrangements of their three-dimensional structure are essential for overall function. Thermal fluctuations in the structure of DNA are of great importance for proteins that assemble and function on a DNA scaffold [1]. The ability to scientifically investigate the role of DNA ‘breathing’ fluctuations began nearly 60 years ago with foundational studies of hydrogen-tritium exchange in nucleic acids obtained from calf thymus samples by von Hippel and coworkers [2], [3]. This work established an experimental basis for the hypothesized presence of local ‘breathing’ events within duplex DNA well below its typical melting temperature of ~65°C. These local ‘breathing’ events were characterized over a range of pH values and salt concentrations. The trends in pH and salt concentration demonstrated clear relationships between the rates of hydrogen-tritium exchange and the local solvent environment, establishing a basis for future investigations into the mechanisms of DNA ‘breathing’. The results offered an elegant picture of the inherent propensity of dsDNA to transiently form and break hydrogen bonds between complementary base pairs throughout the double-stranded region, events that were thought to be important for providing regulatory proteins access to the interior of dsDNA.
Subsequent work by Alberts and Nossal demonstrated that although the T4 bacteriophage single-stranded DNA binding (ssb) protein (called gene product-32, or gp32) binds preferentially to ssDNA, gp32 binding to dsDNA did not occur, suggesting a kinetic block of the fluctuations needed in dsDNA to produce the thermally activated conformational states necessary for the binding of gp32 protein [4], [5]. Follow-up studies, again by von Hippel and coworkers, focused on the use of formaldehyde as a chemical adduct to demonstrate that base-pairs can not only break their complementary Watson-Crick (W-C) hydrogen bonds, but also ‘flip out’ bases into the surrounding solvent [6], [7]. These findings collectively led to many new mechanistic insights regarding the importance of conformational fluctuations in a variety of fundamental biological processes involving dsDNA [1]. Many of these studies focused on the use of protein systems purified and reconstituted from the T4 bacteriophage replication complex (or replisome), a model system for the study of DNA replication [1]. The T4 replisome is an ideal model system for studying DNA replication due to the conservation of its individual protein components in higher organisms, thereby providing a template with potential implications for human health and disease [8]. A cartoon depiction of the fully assembled T4 bacteriophage replisome is shown in Figure 1.1. A few key features can be gleaned from this depiction. First, there are many proteins involved in the form and function of the replisome. These include the gp41 helicase, which unwinds double-stranded DNA; the gp59 helicase loading protein, which stimulates the binding of gp41; the gp61 primase, which synthesizes RNA primers on the lagging strand; and the single-stranded DNA binding protein gp32, which coats ssDNA to prevent degradation.
The replisome proteins also include two gp43 DNA polymerases, which synthesize new complementary DNA on the leading and lagging strands, the gp45 sliding clamp, which contributes to polymerase processivity and fidelity, and the gp44/62 clamp-loader. Lagging strand synthesis is completed by RNase H, an exonuclease that removes RNA primers, and gp30 DNA ligase, which ligates the Okazaki fragments together into a unified strand [9]. The second feature to note in this depiction is the presence of unique DNA topologies at the various junctions formed in the T4 replisome. This is highlighted by Figure 1.2. Notably, the assembly of the protein complexes involved in the T4 replisome is known to be independent of nucleic acid sequence, suggesting that local structure may be the determining factor in the cooperative assembly mechanisms at play [1].

Figure 1.1 Depiction of the full T4 bacteriophage replisome and its various protein complexes.

Figure 1.2 Depiction of various ss-dsDNA junction topologies, highlighting that local structural differences are thought to play a role in the cooperative assembly and function of each distinct complex in the process of replication.

The presence of distinctly different DNA topologies, each with its own purpose within the replisome, and the notable absence of base sequence specificity in complex assembly raise the question: how do thermal fluctuations of local structure at ss-dsDNA junctions facilitate the binding and assembly of replication protein complexes? To address this question, several biophysical considerations need to be made. The pertinent considerations at the level of protein-DNA systems are the length and time scales of relevant conformational fluctuations. These length and time scales ultimately constrain the design and implementation of an experimental protocol which can achieve sensitivity to the structural and dynamical information required.
There are many approaches, spanning orders of magnitude in length and time scale, which have been successfully applied to the study of conformational changes in biological macromolecules. The next section focuses on introducing the most widely employed technique for obtaining primarily dynamical but also structural information from macromolecular systems: single molecule experiments.

3. SINGLE MOLECULE APPROACHES

Measurements of single molecules are an effective route to decoding the heterogeneous nature of countless biological systems. In the absence of synchronization, nearly every dynamical system viewed from the ensemble perspective exists as a weighted average across the numerous observable states occupied by the individual subunits in time. It is recognized that many biological systems rely on time-dependent changes in their structural state to either achieve a multitude of functions or more tightly regulate a particular function in the context of the cell [10]. The most natural approach to studying the time evolution of any system made up of many subunits is to isolate and study one single subunit. This intuitive idea was brought into full form with the onset of single molecule imaging in the early 1990s with the first ever optical detection of single molecules [11]. Not long after the initial detection and proof of principle experiments, useful applications of such technological innovations emerged. Most ubiquitous in the field of biophysical science is single molecule Förster resonance energy transfer (smFRET), which was pioneered in the mid 1990s [12]. When the right experimental conditions are met, FRET provides a useful probe of distance between two points in a protein, DNA, RNA or other biological macromolecule.
The changes in the FRET efficiency as a function of time then provide a dynamical trajectory of the molecular rearrangements occurring on a length scale of typically ~1-10 nm and a timescale of 1 𝜇s to 10 s, depending on the specific FRET donor-acceptor chromophore pair employed [13]–[15]. Another commonly employed biophysical single molecule strategy is colocalization spectroscopy, which relies on multiplexed imaging of spectrally distinct chromophores to measure the presence of, and interaction between, multiple subunits [16], [17]. Such multiplexed approaches provide a wealth of simultaneous information on the local constituents of hundreds to thousands of individual sub-systems, but often have a time resolution capped at 1 ms due to the use of wide-field imaging cameras. Interestingly, the use of highly multiplexed imaging of single molecules has emerged in recent years as an effective route to achieving sub-diffraction limited optical images of intricate structures within a multitude of systems at the cellular and tissue scale [18]–[20]. Lastly, force spectroscopy is used to measure the energetics of protein folding domains, nucleic acid base-pair energies and numerous other biophysical properties of polymers pertinent to molecular packing and overall rigidity [21]–[23]. In the case of force spectroscopy, time resolution is typically unimportant, while sensitivity to small changes in chain length as a function of applied force is often strictly desired. Taken together, these single molecule techniques can address a wide variety of questions within biophysical science. However, certain questions remain difficult to address with the currently available repertoire of single molecule measurement approaches. Notably, the initially stated research question of this thesis, ‘how do thermal fluctuations of local structure at ss-dsDNA junctions facilitate the binding and assembly of replication protein complexes?’, remains an elusive target for such techniques.
This is primarily due to the necessity for a highly local and site-specific measure of conformational changes occurring at ss-dsDNA junctions. To probe the fluctuations of ss-dsDNA junctions, both inside and outside the junction, one could consider monitoring these fluctuations from either the perspective of the base-pairs or the perspective of the sugar-phosphate backbone, as the two primary constituents of DNA structure. In the work described here, the focus is on the sugar-phosphate backbone, as previously successful experiments using iCy3-dimer probes have demonstrated positional and temperature dependence of the iCy3-dimer structure as a reporter of the local conformations adopted by the nucleobases and sugar-phosphate backbones immediately adjacent to the probes [24]–[27]. To successfully probe these same conformational changes at the single molecule limit, what is needed is an experiment that can measure small changes in the relative distance and orientation of the two Cy3 probes rigidly inserted into the sugar-phosphate backbones with temporal resolution that meets the timescale of relevant fluctuations. Ideally, this same experiment would delineate changes in signal between single linkages of the sugar-phosphate backbones, as a single linkage defines the transition from single-stranded to double-stranded across the ss-dsDNA junction. While issues of distance and orientation dependence at the limit of single molecules have been routinely handled by FRET in a variety of systems [14], [28]–[31], the length scales involved in the internal rearrangement of the sugar-phosphate backbones are simply too small (< 1 nm) to be within the range of sensitivity of most FRET experiments (~5 nm). Figure 1.3 illustrates the length scales of both the internal duplex of dsDNA and the distance between nearest neighbor base-pairs, emphasizing the need for an observable which can achieve sensitivity in the range of ~1 nm internal to the duplex.
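The mismatch between sub-nanometer backbone rearrangements and the ~5 nm working range of FRET can be illustrated with the standard Förster expression for the transfer efficiency, E = 1/(1 + (R/R0)^6). The following is a minimal sketch only: the Förster radius of 5 nm is an assumed, illustrative value taken from the approximate range quoted above, the function names are hypothetical, and the point-dipole expression itself is not expected to hold quantitatively at sub-nanometer separations.

```python
import math


def fret_efficiency(r_nm, r0_nm=5.0):
    """Forster transfer efficiency E = 1 / (1 + (R/R0)^6).

    r0_nm = 5.0 is an assumed, illustrative Forster radius.
    """
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def fret_sensitivity(r_nm, r0_nm=5.0):
    """Magnitude of dE/dR (per nm), obtained by differentiating E(R)."""
    x = (r_nm / r0_nm) ** 6
    return 6.0 * x / (r_nm * (1.0 + x) ** 2)


if __name__ == "__main__":
    # Compare sub-nm separations with separations near the Forster radius.
    for r in (0.5, 1.0, 2.5, 5.0, 7.5):
        print(f"R = {r:4.1f} nm  E = {fret_efficiency(r):.6f}  "
              f"|dE/dR| = {fret_sensitivity(r):.2e} per nm")
```

Under these assumptions the efficiency is essentially saturated near unity for separations of 1 nm or less, so a change of a few ångströms produces almost no change in E, whereas the response is steepest for separations close to R0.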
It is also important to consider the ability to maintain signal and sensitivity upon binding of proteins and formation of protein complexes at key sites along the sugar-phosphate backbones, as the aim of many, if not all, oligonucleotide single molecule experiments is to better elucidate the biological mechanisms governing protein-DNA interactions.

Figure 1.3 Structure of dsDNA, showing the separation of adjacent bases, diameter of the double helix and definitions for the major and minor grooves of the double helix [32].

The culmination of considering these constraints has led to the development of a polarization-sensitive single molecule experiment, called ‘polarization-sweep single molecule fluorescence microscopy’, which can obtain a measure of the internal conformation of iCy3-dimers rigidly inserted into the sugar-phosphate backbones of dsDNA, with demonstrated sensitivity to changes of site-specific labeling on the scale of a single linkage of sugar-phosphate backbone, sensitivity to salt concentration in the surrounding solvent, and ultimately sensitivity to the binding and functional activity of proteins that form protein-DNA complexes.

4. MACROMOLECULAR STRUCTURE AND DYNAMICS

This section is dedicated to a brief overview of the various biophysical methods available to the field of molecular and structural biology which have played and continue to play an important role in the determination of the structure and dynamics of macromolecules. The interested reader is encouraged to further investigate the cited references of this section, as the field of biophysics has grown rapidly over the past few decades with numerous complementary but technologically orthogonal techniques coming into existence that can offer a wealth of information about the systems under study. In general, there are a handful of techniques that are well suited for structure determination and a handful of techniques that are well suited for capturing dynamics.
It is uncommon for a single approach to offer detailed structural information and fast enough time resolution to be considered a measurement of dynamics. Traditionally, questions of structure have been handled independently of investigations into dynamics. However, the next section briefly highlights progress made into the combination of these two critical components in the study of biological macromolecules. The field of structural biology is longstanding and thoroughly developed, encompassing X-ray crystallography [33]–[35], NMR [36]–[38], small angle X-ray scattering [39], cryo-electron microscopy [15], [40], [41] and more recently computer simulations [42]. This suite of techniques offers tradeoffs between spatial resolution, addressable system scale and occasionally time resolution, with very few achieving all three to a high degree. Recent advances in cryo-EM have opened the door to stopped-flow experiments of large biomolecules, with time resolution down to a limit of 5 ms [43]. Also notable is the use of single-particle picking techniques in cryo-EM [15], [44] as well as single-particle X-ray diffraction methods [35], both aimed at unpacking the inherent heterogeneity of biological macromolecules while maintaining a high degree of structural information. Recent developments in structural NMR have allowed the identification and structural assignment of minority protein conformations in solution which play important roles in regulating their function and mediating interactions with therapeutics [36], [45]. Another notable achievement is the use of all-optical approaches, such as two-dimensional fluorescence spectroscopy (2DFS), for the determination of local structure with a high degree of specificity, in solution, that also affords quantification of the inherent heterogeneity in the systems under study [24], [25], [46]–[48].
The list of techniques outlined here is by no means exhaustive; mass spectrometry, EPR and DEER spectroscopy have all been utilized for protein structure determination as well. Ultimately, all these approaches offer a glimpse into the structures which macromolecules freely adopt under physiological conditions, with some approaches coming much closer to reporting on physiological conditions than others. In the opposing, but intimately related, realm of measuring macromolecular dynamics, the major techniques are time-resolved optical experiments [49], [50], single molecule techniques [8], [13], [14], [51]–[56], computer simulations [57], [58] and also to some extent the previously mentioned developments in NMR, cryo-EM and mass spectrometry. In the context of this dissertation introduction, the consideration of ‘dynamics’ has been limited to real-time observation and tracking of conformational changes occurring in single molecules and identification of non-majority conformers in ensemble experiments. Importantly, many ultrafast optical techniques exist for probing the fast changes in either the structure or chemical environment of small molecules but are often held to an upper limit of timescales on the order of nanoseconds. Taken together, these techniques offer a broad range of time resolution depending on the question of interest. At the time of this discussion, single molecule dynamics on the order of milliseconds are routinely accessed, while dynamics on the order of microseconds and even nanoseconds have been accessed in some cases [14], [59]. Computer simulations have made remarkable progress in producing time-resolved conformational changes from both all-atom and coarse-grained approaches, which have greatly aided the pursuits of experimentalists by offering insights beyond the current capabilities of experiment [57], [60]–[62].
However, in nearly all cases the longest trajectories available to such simulated approaches are on the order of microseconds, making direct comparison with slow dynamics observed in experiment difficult. For the purely experimental approaches taken in the field of single molecule spectroscopy, currently only smFRET offers a direct measure of ‘structure’ via the distance dependence of its transfer efficiency. All other structural information is often inferred from the dynamical information measured, as is the case with kinetic binding assays and colocalization studies. In rare instances, the structural resolving power of cryo-EM has been combined with single molecule studies to successfully bridge the gap between the intrinsic dynamics and structural diversity of complex biophysical systems [15], [17], [63]–[65]. This lack of structural sensitivity in predominantly dynamical experiments highlights the need for better real-time estimates of conformation in the field of biophysical science. Given the enormous advantages of studying conformational changes at the limit of single molecules, particular attention should be paid to innovative approaches which either go beyond or complement the capabilities of FRET. The remaining Chapters of this thesis are dedicated to a detailed discussion of the design, development, optimization, and implemented use cases of the aforementioned ‘polarization-sweep single molecule fluorescence microscopy’ method, which achieves microsecond-resolved estimates of conformation in the sugar-phosphate backbone of ss-dsDNA molecules, both alone and in complexation with proteins. The concluding Chapter of this thesis details investigations into broadband single molecule experiments, with a focus on the technical feasibility and the inherent advantages and disadvantages of such experiments.

CHAPTER 2 : POLARIZATION SWEEP SINGLE-MOLECULE FLUORESCENCE MICROSCOPY

1.
OVERVIEW

This Chapter contains work from the article “Studies of DNA ‘breathing’ by polarization-sweep single-molecule fluorescence microscopy of exciton-coupled (iCy3)2 dimer-labeled DNA fork constructs” authored by Jack Maurer, Claire Albrecht, Patrick Herbert, Dylan Heussman, Andrew H. Marcus and Peter von Hippel. This work was funded by the National Institutes of Health (NIGMS Grant GM-215981 to P.H.v.H. and A.H.M.). Andrew H. Marcus was the principal investigator for this work. The contents of the article have been expanded upon here to serve as a comprehensive overview and introduction to the methods which will be used throughout the remaining Chapters. This Chapter will more formally introduce the motivation for the ‘polarization-sweep’ experiment, the experimental setup itself, the experimental signal derivation and the statistical analysis approaches employed for all single-molecule data discussed throughout later Chapters.

2. INTRODUCTION

The assembly of the protein-DNA complexes that drive the various processes of genome expression involves coordinated interactions between multiple macromolecular species. For example, during the initial stages of DNA replication, proteins recognize and bind to base-sequence-specific sites at the replicative origin, forming a multi-subunit protein-DNA complex that, in turn, recruits additional protein factors to establish the ‘trombone’ shaped framework of the functional DNA replication-elongation complex. This DNA framework is subject to thermally induced fluctuations that permit local regions of the initially double-stranded (ds) DNA to transiently expose single-stranded (ss) DNA templates to the aqueous surroundings, and thus provide access to the protein components of the DNA replication complex.
Although the notion of ‘DNA breathing’ is consistent with free energy landscape models of protein-DNA interactions [66], little is known about the distributions of the Boltzmann-weighted conformational macrostates at and near replication fork junctions, or the associated activation barriers that control the assembly and operation of replisomal protein-DNA complexes. In this work, we present a novel, polarization-selective single-molecule fluorescence method to study the local conformations and conformational dynamics of model DNA fork constructs, which are labeled with pairs of cyanine dyes [(iCy3)2] that are rigidly inserted into the sugar-phosphate backbones at various positions relative to the ss-dsDNA fork junction (see Fig. 2.1A and 2.1B) [67]. The closely spaced iCy3 monomers (labeled A and B) are electrostatically coupled, so that the (iCy3)2 dimer supports delocalized excitons [symmetric (+) and anti-symmetric (−)] with orthogonally polarized electric dipole transition moments (EDTMs), $\boldsymbol{\mu}_{\pm} = \frac{1}{\sqrt{2}}\left(\boldsymbol{\mu}_{A} \pm \boldsymbol{\mu}_{B}\right)$. The magnitudes of the EDTMs depend sensitively on the local conformation of the (iCy3)2 dimer probe, which is characterized by the ‘tilt’ angle, 𝜃𝐴𝐵, and the ‘twist’ angle, 𝜙𝐴𝐵 (see Fig. 2.1C) [24], [25], [68].

Figure 2.1 Labeling chemistry and nomenclature of the internal (iCy3)2 dimer probes positioned within the sugar-phosphate backbones of model ss-dsDNA constructs. (A) The Lewis structure of the iCy3 chromophore is shown with its 3’- and 5’-linkages to the sugar-phosphate backbone of a local segment of ssDNA. The double-headed green arrow indicates the orientation of the electric dipole transition moment (EDTM). (B) An (iCy3)2 dimer-labeled DNA construct contains the dimer probe near the single-stranded (ss) – double-stranded (ds) DNA junction. The conformation of the (iCy3)2 dimer probe reflects the local secondary structure of the sugar-phosphate backbones at the probe insertion site position.
The sugar-phosphate backbones of the conjugate DNA strands are shown in black and blue, the bases in gray, and the iCy3 chromophores in green. (C) The structural parameters that define the local conformation of the (iCy3)2 dimer probe are the inter-chromophore separation vector 𝑅𝐴𝐵, the tilt angle 𝜃𝐴𝐵, and the twist angle 𝜙𝐴𝐵. The electrostatic coupling between the iCy3 chromophores gives rise to the anti-symmetric (−) and symmetric (+) excitons, which are indicated by the red and blue arrows, respectively, and whose magnitudes and transition energies depend on the structural parameters. (D) The insertion site position of the (iCy3)2 dimer probe is indicated relative to the pseudo-fork junction using positive integers in the direction towards the double-stranded region, and negative integers in the direction towards the single-stranded region. (E) A hypothetical free energy surface (FES) describing local fluctuations of the (iCy3)2 dimer-labeled sugar-phosphate backbones near the ss-dsDNA fork junction as a function of the twist angle parameter, 𝜙𝐴𝐵 [24].

In Fig. 2.2, we show a schematic layout of our experimental approach, which we call polarization-sweep single-molecule fluorescence (PS-SMF) microscopy. A single (iCy3)2 dimer-labeled DNA construct is resonantly excited using a continuous-wave (cw) laser whose plane polarization is rotated at the frequency Ω⁄2𝜋 = 1 MHz. The symmetric (+) and anti-symmetric (−) excitons of the (iCy3)2 dimer probe, which are oriented perpendicular to one another, are alternately excited as the laser polarization direction is rotated across the corresponding EDTMs, and the ensuing modulated fluorescence is detected using the phase-tagged photon-counting (PTPC) method [69]. The resulting PS-SMF signal is directly related to the local conformation of the (iCy3)2 dimer-labeled ss-dsDNA construct, whose fluctuations can be monitored on microsecond and longer time scales.

Figure 2.2 The PS-SMF microscopy experimental layout.
(A) An (iCy3)2 dimer-labeled ss-dsDNA fork construct, which can form a protein-DNA complex, is attached to the microscope slide surface using biotin / neutravidin linkages. (B) A total internal reflection fluorescence (TIRF) microscope is used to illuminate the sample with a continuous-wave (cw) 532 nm laser. The plane-polarized electric field vector of the laser is continuously rotated at the frequency Ω⁄2𝜋 = 1 MHz by sweeping the phase of an interferometer (Fig. 2.3). Individual signal photons (fluorescence) from a single molecule are detected using an avalanche photodiode (APD). Each detection event is registered within a time window Δ𝑡 = 1 𝜇s, and its phase is assigned to one of 80 incremented values (with bin width Δ𝜑 = 360°/80 = 4.5°) using the method of phase-tagged photon counting (PTPC) [69]. (C) A single (iCy3)2 dimer-labeled ss-dsDNA construct has symmetric and anti-symmetric EDTMs (labeled 𝜇±, shown as blue and red vectors, respectively), which define the major and minor axes of a ‘polarization ellipse’ in the transverse cross-sectional area of the incident laser beam. The orthogonally polarized symmetric and anti-symmetric excitons are alternately excited as the laser electric field vector (shown in green) rotates in the clockwise direction (with phase 𝜑⁄2 = Ω𝑡⁄2).

PS-SMF experiments provide both structural and kinetic information about the microscopic configurations of the sugar-phosphate backbones and bases immediately surrounding the (iCy3)2 dimer probe. PS-SMF trajectories contain information about the equilibrium distribution of conformational macrostates and their associated rates of interconversion, which are directly related to parameters that characterize the site-specific local free energy landscape (FEL) [26].
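The way in which a rotating laser polarization alternately addresses the two orthogonal excitons can be made concrete with a small numerical sketch. The model below is a deliberate simplification and not the signal derivation used in this work: it assumes that the two monomer EDTMs have equal magnitude, lie in the sample plane with zero tilt, and are separated by an angle identified with the twist angle; it takes the excitation rate to be proportional to the squared projection of each exciton EDTM onto the laser field; and it uses the conventional fringe visibility (max − min)/(max + min) as the modulation depth. All function names are hypothetical.

```python
import math

N_BINS = 80                      # number of phase bins quoted for PTPC
BIN_WIDTH_DEG = 360.0 / N_BINS   # 4.5 degrees of signal phase per bin


def dot(a, b):
    """Dot product of two 2D vectors."""
    return a[0] * b[0] + a[1] * b[1]


def exciton_edtms(twist_deg, mu=1.0):
    """Return (mu_plus, mu_minus) with mu_pm = (mu_A +/- mu_B) / sqrt(2).

    Assumption: monomer EDTMs of equal magnitude mu, both in the sample
    plane, separated by the angle twist_deg.
    """
    half = math.radians(twist_deg) / 2.0
    mu_a = (mu * math.cos(half), mu * math.sin(half))
    mu_b = (mu * math.cos(half), -mu * math.sin(half))
    s = 1.0 / math.sqrt(2.0)
    mu_plus = (s * (mu_a[0] + mu_b[0]), s * (mu_a[1] + mu_b[1]))
    mu_minus = (s * (mu_a[0] - mu_b[0]), s * (mu_a[1] - mu_b[1]))
    return mu_plus, mu_minus


def binned_excitation(twist_deg):
    """Relative excitation rate at the lower edge of each signal-phase bin."""
    mu_plus, mu_minus = exciton_edtms(twist_deg)
    rates = []
    for k in range(N_BINS):
        phase = math.radians(k * BIN_WIDTH_DEG)
        # The polarization direction advances at half the signal phase.
        field = (math.cos(phase / 2.0), math.sin(phase / 2.0))
        rates.append(dot(mu_plus, field) ** 2 + dot(mu_minus, field) ** 2)
    return rates


def visibility(rates):
    """Fringe visibility (max - min) / (max + min) of the binned rates."""
    return (max(rates) - min(rates)) / (max(rates) + min(rates))


if __name__ == "__main__":
    for twist in (30.0, 60.0, 90.0, 120.0):
        print(twist, round(visibility(binned_excitation(twist)), 4))
```

Within this simplified picture the squared magnitudes of the two excitons are proportional to 1 ± cos(twist), so the modulation depth reduces to |cos(twist)| and vanishes when the monomer EDTMs are perpendicular. The sketch is intended only to show why the modulated signal reports on the dimer conformation, and the mapping between conformation and signal used in this Chapter should be taken from the derivation given in the text.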
From the photon count data stream, we determine the following three ensemble-average functions of the PS-SMF signal: (i) the two-point time-correlation function (TCF), 𝐶̅(2)(𝜏), which describes the average correlation between two successive measurements separated by the interval 𝜏 and is sampled on tens-of-microsecond time scales; (ii) the three-point TCF, 𝐶̅(3)(𝜏1, 𝜏2), which describes the average correlation between three successive measurements separated by the intervals 𝜏1 and 𝜏2 and is sampled on millisecond time scales; and (iii) the probability distribution function (PDF) of the PS-SMF signal visibility, 𝑃(𝑣), which is averaged over the 10 ms time scale required to achieve a signal-to-noise (S/N) ratio of ~10. Although beyond the scope of this Chapter, the above functions can be analyzed using a kinetic network model to characterize the equilibrium and kinetic properties of multistep conformational transition pathways at equilibrium. While the two-point TCF, 𝐶̅(2), contains information about the characteristic relaxation times of the system, the three-point TCF, 𝐶̅(3), contains additional information about the exchange times between pathway intermediates [14], [56], [70], [71]. The use of a kinetic network model will be discussed in Chapters 3 and 4, while the role of time correlation functions and probability distribution functions will be covered in sections 3.6 and 3.7 of this Chapter.

Recently, Heussman et al. used ensemble absorbance, circular dichroism (CD) and two-dimensional fluorescence spectroscopy (2DFS) to study the average local conformations and conformational disorder of iCy3 monomer- and (iCy3)2 dimer-probes at various positions within ss-dsDNA fork constructs [25]. From these studies it was concluded that the average conformation of the (iCy3)2 dimer-probe changes systematically as the labelling site is varied across the ss-dsDNA junction from the +2 to the -2 positions (see Fig. 2.1D for site-labelling nomenclature).
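As a rough illustration of the two- and three-point TCFs introduced above, the following sketch evaluates both on a discretely sampled signal trajectory. It assumes the common mean-subtracted, unnormalized forms ⟨δ𝑣(𝑡) δ𝑣(𝑡+𝜏)⟩ and ⟨δ𝑣(𝑡) δ𝑣(𝑡+𝜏1) δ𝑣(𝑡+𝜏1+𝜏2)⟩, which may differ in normalization from the definitions given in sections 3.6 and 3.7. The two-state ‘telegraph’ trajectory is synthetic and all names are hypothetical.

```python
import random


def fluctuations(v):
    """Subtract the trajectory mean, giving dv = v - <v>."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def two_point_tcf(v, lag):
    """Average of dv(t) * dv(t + lag) over all available start times t."""
    dv = fluctuations(v)
    n = len(dv) - lag
    return sum(dv[i] * dv[i + lag] for i in range(n)) / n


def three_point_tcf(v, lag1, lag2):
    """Average of dv(t) * dv(t + lag1) * dv(t + lag1 + lag2) over t."""
    dv = fluctuations(v)
    n = len(dv) - lag1 - lag2
    return sum(dv[i] * dv[i + lag1] * dv[i + lag1 + lag2]
               for i in range(n)) / n


def telegraph(n_steps, p_switch, rng):
    """Synthetic two-state trajectory that switches with probability p_switch."""
    state, out = 0.0, []
    for _ in range(n_steps):
        if rng.random() < p_switch:
            state = 1.0 - state
        out.append(state)
    return out


if __name__ == "__main__":
    trajectory = telegraph(20000, 0.05, random.Random(0))
    for lag in (0, 1, 5, 20, 100):
        print(lag, two_point_tcf(trajectory, lag))
    print(three_point_tcf(trajectory, 5, 5))
```

At zero lag the two-point function reduces to the variance of the signal, and for a simple two-state process it decays with a single relaxation time. The three-point function is included because, as noted above, it carries information about exchange between intermediates that the two-point function alone cannot resolve.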
For positive integer positions the mean local conformation was shown to be right-handed (with mean twist angle, 𝜙̅𝐴𝐵 < 90°), and for negative integer positions the local conformation was left-handed (𝜙̅𝐴𝐵 > 90°). Moreover, these experiments indicated that the distributions of conformational states at these key positions are relatively narrow, suggesting that regions of the junction extending towards and into the single-stranded segments exhibit moderate levels of structural order.

A possible explanation for the results of Heussman et al. [25] is that the (iCy3)2 dimer probe, which is covalently linked to the sugar-phosphate backbones and bases immediately surrounding the probe, can adopt only a small number of local conformations whose relative stabilities depend on position relative to the ss-dsDNA fork junction. In Fig. 2.1E, this concept is illustrated using a hypothetical free energy landscape, which depicts four possible local conformations of the (iCy3)2 dimer probe at a positive integer position, with the right-handed Watson-Crick (WC) B-form conformation being most favored. In this picture the free energy differences between the B-form ground state and the other (non-canonical) local conformations are sufficiently high to ensure that the Boltzmann-weighted distribution of available macrostates is dominated by the B-form structure. At the same time, the moderate free energies of activation allow for infrequent transitions between non-canonical local structures, which are present in trace amounts. As the probe-labelling position is varied across the ss-dsDNA junction towards the single-stranded region, one might expect the free energy surface to exhibit local minima with approximately the same coordinate values, but with relative stabilities and barrier heights shifted to reflect the presence of non-canonical structures observed in ensemble measurements [24]–[26].
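The statement that a Boltzmann-weighted distribution is dominated by the B-form structure can be illustrated with a short calculation of equilibrium populations, 𝑝𝑖 ∝ exp(−Δ𝐺𝑖/𝑅𝑇). The free energies used below are purely hypothetical placeholders chosen for illustration and are not values measured or fitted in this work; the state labels and function name are likewise assumptions.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol K)


def boltzmann_populations(free_energies, temperature_k=296.15):
    """Equilibrium populations p_i proportional to exp(-G_i / RT).

    free_energies: dict mapping state label -> free energy in kcal/mol.
    temperature_k: defaults to ~23 C (room temperature).
    """
    rt = GAS_CONSTANT * temperature_k
    weights = {s: math.exp(-g / rt) for s, g in free_energies.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical free energies relative to the B-form ground state.
    duplex_like = {"B-form": 0.0, "state 2": 1.5, "state 3": 2.0, "state 4": 2.5}
    for state, p in boltzmann_populations(duplex_like).items():
        print(f"{state}: {p:.3f}")
```

With these placeholder values the B-form state carries most of the population while the remaining three states are present at the few-percent level. Lowering the free energies of the non-canonical states relative to B-form, as might be expected for labeling positions closer to the single-stranded region, redistributes population away from the B-form state without changing the number of minima.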
The PS-SMF method presented here is an extension of one previously developed by Phelps et al. [13], which simultaneously monitors the linear dichroism (LD) and Förster resonance energy transfer (FRET) signals from an internally labeled iCy3 / iCy5 donor-acceptor pair. The primary advantage of the PS-SMF method is that it can sensitively isolate signals due to the relative internal motions of an exciton-coupled (iCy3)2 dimer probe from those arising from collective rotation and/or excited state population fluctuations (i.e., ‘blinking’) [13]. As such, PS-SMF experiments provide a direct means to observe the dynamic interconversions between local conformational macrostates of (iCy3)2 dimer-labeled ss-dsDNA constructs. The PS-SMF method is thus well suited for studies of the kinetic pathways and equilibrium distributions of the conformational fluctuations of the (iCy3)2 dimer-labeled sugar-phosphate backbones and bases, which are central to understanding, at the molecular level, the different forms of local DNA ‘breathing’ at specified positions relative to ss-dsDNA junctions.

In the following Chapters, we show that PS-SMF experiments can be used to study position- and salt-dependent ‘breathing’ fluctuations of (iCy3)2 dimer-labeled ss-dsDNA fork constructs. Among the significant findings of this work is that the local conformations of the (iCy3)2 dimer-labeled ss-dsDNA constructs undergo transient fluctuations between four conformational macrostates, as suggested by the free energy landscape depicted schematically in Fig. 2.1E. Furthermore, at probe labeling positions at and near the ss-dsDNA junction, extending towards the single-stranded region, the local B-form conformation becomes unstable relative to conformations that exist in the duplex at only trace occupancy levels.
Moreover, altering the salt concentration from physiological levels (6 mM MgCl2, 100 mM NaCl) to either higher or lower levels also destabilizes the B-form conformation, reminiscent of the concept of cold denaturation [72]. While certain aspects of salt concentration-dependent DNA stability and protein-DNA interactions are well understood [73]–[75], little information is currently available about the microscopic details of how salt concentration affects conformational fluctuations near ss-dsDNA fork junctions. The results of our experiments suggest a possible structural framework for understanding the roles of transient DNA conformational fluctuations at and near ss-dsDNA fork junctions in driving the processes of protein-DNA complex assembly and function.

3. MATERIALS AND METHODS

Section 3.1. (iCy3)2 dimer-labeled single-stranded (ss) – double-stranded (ds) DNA fork constructs.

The model ss-dsDNA fork constructs that we used in this work have either an iCy3 monomer or an (iCy3)2 dimer incorporated into the DNA framework, as shown in Figs. 2.1A and 2.1B. The specific nucleic acid base sequences are listed in Table 1. We purchased nanomolar quantities of single-stranded oligonucleotides from Integrated DNA Technologies (IDT, Coralville, IA). The initially dehydrated samples were rehydrated at 100 nM concentrations in aqueous buffer (100 mM NaCl, 6 mM MgCl2 and 10 mM Tris, pH 8.0). Complementary oligonucleotides were combined in a 1:1.5 molar concentration ratio between the biotinylated and non-biotinylated strands, respectively, before they were heated to 94°C for 4 min and allowed to cool slowly to room temperature (~23°C). The annealed iCy3 monomer and (iCy3)2 dimer-labeled ss-dsDNA constructs contained both single-stranded and double-stranded regions with the probe labeling positions indicated by the nomenclature described in Fig. 2.1.
The iCy3 monomer-labeled ss-dsDNA constructs contained a thymine base (T) in the complementary strand at the position directly opposite to the probe chromophore. Solution samples were stored at 4°C between experiments conducted over consecutive days, or frozen at –4°C between experiments conducted over more extended periods.

Table 1. Base sequences and nomenclature for the iCy3 monomer and (iCy3)2 dimer-labeled ss-dsDNA fork constructs used in these studies. The horizontal lines indicate regions of complementary base pairing.

DNA construct        Nucleotide base sequence
+1 iCy3 monomer      3'-CAG CGG TCG GAG CGT CG(iCy3) GTT TTT TTT TTT TTT-5'
                     5'-biotin/GTC GCC AGC CTC GCA GC T CTT-3'
+1 (iCy3)2 dimer     3'-CAG CGG TCG GAG CGT CG(iCy3) GTT TTT TTT TTT TTT-5'
                     5'-biotin/GTC GCC AGC CTC GCA GC(iCy3) CTT-3'
-1 (iCy3)2 dimer     3'-CAG CGG TCG GAG CGT CGG (iCy3)TT TTT TTT TTT TTT-5'
                     5'-biotin/GTC GCC AGC CTC GCA GCC (iCy3)TT-3'
-2 (iCy3)2 dimer     3'-CAG CGG TCG GAG CGT CGG T(iCy3)T TTT TTT TTT TTT-5'
                     5'-biotin/GTC GCC AGC CTC GCA GCC T(iCy3)T-3'

Section 3.2. Sample preparation.

Solutions of annealed ss-dsDNA samples were diluted 1000-fold (~100 pM concentration) before they were introduced into a custom-built microscope sample chamber, as described in prior work [69]. Sample chambers were constructed from microscope slides, which were chemically modified using the procedure of Chandradoss et al. [76]. The inner surfaces of the sample chambers were coated with a layer of poly(ethylene glycol) (PEG), which was sparsely labeled with biotin. Neutravidin was used as a linker to bind the biotin-labeled PEG to the biotin molecules covalently attached to the 5' ends of the dsDNA regions of the ss-dsDNA constructs, as shown in Table 1. To reduce the rate of photobleaching in our measurements, an oxygen scavenging solution consisting of glucose oxidase, catalase and glucose was introduced into the sample cell before the PS-SMF data were recorded.
To suppress the ‘photo-blinking’ effects of excited-state intersystem crossing, the triplet state quencher Trolox was used in the solution buffer [77].

Section 3.3. Polarization-sweep single-molecule fluorescence (PS-SMF) microscopy experimental setup.

In Fig. 2.3, we show a schematic of the instrumental setup for PS-SMF experiments on model (iCy3)2 dimer-labeled ss-dsDNA fork constructs. The setup is an extension of an approach previously developed to simultaneously monitor single-molecule Förster resonance energy transfer (FRET) and fluorescence-detected linear dichroism (FLD) signals from iCy3/iCy5 (hetero-) dimer-labeled DNA constructs [13]. As we discuss in further detail below, the signals from PS-SMF measurements provide direct information about the distributions of conformational macrostates and conformational dynamics of the (iCy3)2 dimer probe, which is site-specifically positioned within the ss-dsDNA fork junction.

The (iCy3)2 dimer-labeled ss-dsDNA constructs were chemically attached to the surface of a glass microscope slide using biotin/neutravidin linkages, as described in section 3.2 (see Fig. 2.2A). The sample chamber was placed on a computer-controlled translation stage with the probe-labeled DNA constructs illuminated in a total internal reflection fluorescence (TIRF) geometry by a plane-polarized continuous-wave laser (with center wavelength $\lambda_L$ = 532 nm, see Fig. 2.2B). The laser beam was focused to a 50 μm diameter spot at the sample, and the incident power was adjusted to ~15 mW. Fluorescence from the sample was collected using a high numerical aperture objective lens (N.A. = 1.4), and the image from a single molecule was isolated using a pinhole and a spectral filter (532 nm long-pass) and detected using an avalanche photodiode (APD). The mean total fluorescence photon detection rate (or mean signal flux, $\bar{f}$) was typically ~8,000 – 10,000 counts per second (cps).
The signal ‘background rate’ was determined to be ~500 cps by scanning the stage position away from the center of a molecule. Thus, the signal-to-background ratio of our experiments was ~20. The polarization-dependent signal from each single molecule sample was recorded for a total duration of 30 – 60 s, as described further below.

A Mach-Zehnder interferometer (MZI), which was equipped with acousto-optic phase modulators in each arm, was used to prepare the plane-polarization state of the laser. In Fig. 2.3, we show a schematic of the laser interferometer and the integrated detection electronics used for our PS-SMF experiments. A half-wave plate is used to rotate the plane polarization vector of a continuous wave (cw) 532 nm laser to 45° from vertical. The laser is directed through a polarizing beam-splitter (PBS) to produce balanced vertical and horizontal plane-polarized beams, which separately traverse the two paths of the MZI before they are recombined at the PBS exit port. One acousto-optic Bragg cell (AOBC) is placed within each path of the MZI. Each AOBC is driven continuously at a fixed radio frequency so that a relative phase ‘sweep’ is imparted to the MZI output beam, $\varphi = \Omega t$, and the plane polarization (Jones) vector of the electric field is rotated at the constant frequency $\Omega/2\pi$ = 1 MHz. The output beam passes through a quarter-wave plate, which is rotated to 45° from vertical, before it is directed to the TIRF microscope to excite the single-molecule sample, as shown in Fig. 2.2. A pick-off window is used to direct a minor component of the beam through a polarizer, where it is detected using an avalanche photodiode (APD) to create a 1 MHz analog reference signal. The reference is utilized in a negative feedback loop to actively minimize the path difference of the MZI using a piezo-controlled mirror, as implemented in a previous experiment [13]. The digital electronics illustrated in Fig.
2.3 are used to implement the phase-tagged photon counting (PTPC) method described in Ref. [69]. In this approach, we use a field-programmable gate array (FPGA), which is a manually reconfigurable integrated circuit that contains hardware-enabled signal-processing algorithms. We used the FPGA to discretize the phase of a given modulation cycle into a set of m ‘phase bins,’ which are numbered and incrementally advanced using an 80 MHz digital counter. The 1 MHz analog reference waveform is first converted into a ‘logical square wave’ (i.e., a periodic TTL waveform) using a custom-built rising-edge comparator circuit. The logical square wave is then used to trigger an 80 MHz phase counter (16-bit width), and to reset the counter at the 1 MHz phase-sweep frequency. The counter reset automatically synchronizes the phase-sweep cycle in the presence of external room vibrations that introduce noise to the MZI reference phase. The average number of phase bins during a modulation cycle is m = 80 MHz / 1 MHz = 80 bins, and the phase bin interval is $\Delta\varphi$ = 360° / 80 = 4.5° per bin. Thus, during each 1 MHz phase-sweep cycle the counter advances the phase bin value $\varphi_j = j\Delta\varphi$ (with j = 0, 1, …, 79) at the 80 MHz clock rate. We also used a second 1 MHz counter (with 64-bit width) to assign a time bin value $t_k = k\Delta t$ (with $\Delta t$ = 1 μs and k = 0, 1, …) to each detection event. Thus, each individual detection event is assigned a 16-bit phase bin value and a 64-bit time bin value, and these data are streamed to computer memory in real time.

Figure 2.3 Schematic of the interferometer and integrated phase-tagged photon counting (PTPC) electronics [69] used to perform PS-SMF experiments. PBS: polarizing beam-splitter; AOBC: acousto-optic Bragg cell; APD: avalanche photodiode; FPGA: field-programmable gate array.

Section 3.4 Theoretical description of Jones vector electric field components

In this section, we derive Eq. (7), which appears at the end of this section.
We decompose the polarization state of the electric field in the Jones vector basis of horizontal (H) and vertical (V) plane polarization components. Before entering the MZI (see Fig. 2.3), the plane polarization of the laser is rotated to 45° from vertical using a half-wave plate so that the initial electric field may be written in the $\begin{bmatrix} H \\ V \end{bmatrix}$ basis:

$$\mathbf{E}_{initial}(\omega) = A(\omega)\, e^{i\omega_L t} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{1}$$

In Eq. (1), $A(\omega)$ is a narrow spectral envelope and $\omega_L$ is the laser center frequency ($= 2\pi c/\lambda_L$, where $\lambda_L$ = 532 nm). At the entrance port of the MZI, the PBS separates the H and V polarization components into separate paths. Within each MZI path, an AOBC imparts a time-varying phase shift to its respective beam according to $\varphi_H = \Omega_H t$ and $\varphi_V = \Omega_V t$. Thus, the electric field components within the H and V paths can be written:

$$\mathbf{E}_H(\omega) = A(\omega)\, e^{i\omega_L t} \begin{bmatrix} 1 \\ 0 \end{bmatrix} e^{i\varphi_H} \tag{2a}$$

$$\mathbf{E}_V(\omega) = A(\omega)\, e^{i\omega_L t} \begin{bmatrix} 0 \\ 1 \end{bmatrix} e^{i\varphi_V} \tag{2b}$$

At the exit port of the MZI, the H and V beams are recombined at a second PBS to produce the total field:

$$\mathbf{E}_{total}(\omega) = \mathbf{E}_H(\omega) + \mathbf{E}_V(\omega) = \frac{A(\omega)}{\sqrt{2}}\, e^{i\omega_L t} \begin{bmatrix} e^{i\varphi_H} \\ e^{i\varphi_V} \end{bmatrix} \tag{3}$$

Equation (3) can be simplified by defining the relative phase shift $\varphi = \varphi_V - \varphi_H = \Omega_{VH} t$ between the H and V paths (with $\Omega_{VH} = \Omega$ = 1 MHz), and by multiplying by the overall phase shift $e^{-i(\varphi_V + \varphi_H)/2}$:

$$\mathbf{E}_{total}(\omega) = \frac{A(\omega)}{\sqrt{2}}\, e^{i\omega_L t} \begin{bmatrix} e^{-i\varphi/2} \\ e^{+i\varphi/2} \end{bmatrix} \tag{4}$$

The beam is next directed through a quarter-wave plate that is rotated 45° from vertical. The ‘Jones matrix’ $\mathbf{M}$ for the quarter-wave plate is given by:

$$\mathbf{M} = \frac{e^{i\pi/4}}{\sqrt{2}} \begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix} \tag{5}$$

Thus, after passing through the quarter-wave plate and applying the linear transformation $\mathbf{M}$ to the recombined phase-modulated field components, the final electric field can be written compactly as:

$$\mathbf{E}(\omega,\varphi) = \mathbf{M}\,\mathbf{E}_{total} = \frac{A(\omega)}{2}\, e^{i\omega_L t}\, e^{i\pi/4} \begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix} \begin{bmatrix} e^{-i\varphi/2} \\ e^{+i\varphi/2} \end{bmatrix} = A(\omega)\, e^{i\omega_L t} \begin{bmatrix} +\cos(\varphi/2) \\ -\sin(\varphi/2) \end{bmatrix} \tag{6}$$

The second equality of Eq. (6) is the electric field incident on the sample, which is equivalent to Eq. (7) below:

$$\mathbf{E}(\omega,\varphi) = A(\omega)\, e^{i\omega_L t} \left[ \cos(\varphi/2)\,\hat{\mathbf{i}} - \sin(\varphi/2)\,\hat{\mathbf{j}} \right] \tag{7}$$

Section 3.5 Derivation of the polarized signal intensity for PS-SMF experiments

In Eq. (7), $A(\omega)$ is the (narrow) spectral envelope, $\omega_L$ ($= 2\pi c/\lambda_L$) is the laser center frequency, and $\hat{\mathbf{i}}$ and $\hat{\mathbf{j}}$ are unit vectors that point along the x and y axes of the molecular frame, respectively, as shown in Fig. 2.1C. The total electric dipole transition moment (EDTM) of the coupled AB dimer is $\boldsymbol{\mu}_{tot} = \boldsymbol{\mu}_+ + \boldsymbol{\mu}_-$, where $\boldsymbol{\mu}_\pm = \frac{1}{\sqrt{2}}(\boldsymbol{\mu}_A \pm \boldsymbol{\mu}_B)$ are the EDTMs of the symmetric (+) and anti-symmetric (−) excitons. We note that the magnitudes of the $\mu_\pm$ EDTMs depend sensitively on the local conformation of the AB dimer (see Fig. 2.1C and discussions in [24], [25], [27]). The electric field vector, $\mathbf{E}(\omega,\varphi)$, is given by Eq. (7), and the symmetric (+) and anti-symmetric (−) EDTMs of the exciton-coupled dimer, $\boldsymbol{\mu}_\pm(\omega)$, are given by

$$\boldsymbol{\mu}_+(\omega) = \sum_{\alpha} \mu_+^{(\alpha)}(\omega) \begin{bmatrix} +\cos(\psi/2) \\ +\sin(\psi/2) \end{bmatrix} \tag{8a}$$

$$\boldsymbol{\mu}_-(\omega) = \sum_{\alpha'} \mu_-^{(\alpha')}(\omega) \begin{bmatrix} -\sin(\psi/2) \\ +\cos(\psi/2) \end{bmatrix} \tag{8b}$$

In Eqs. (8a) and (8b), the angle $\psi/2$ specifies the orientation of the dimer in the x-y plane, which we assume to be constant or slowly varying on the time scales of the dimer conformational fluctuations of interest. The terms $\mu_\pm^{(\alpha)}(\omega)$ are the spectrally dependent magnitudes of the EDTMs, which depend on the dimer conformation, as discussed previously, and the index $\alpha$ enumerates the singly-excited dimer states in order of increasing energy [24], [25], [27]. Furthermore, the directions of $\boldsymbol{\mu}_\pm$ have orthogonal relative orientation, and thus define a ‘polarization ellipse’ in the cross-sectional area in which the laser beam projects onto the molecular frame:

$$\boldsymbol{\mu}_+(\omega) = \sum_{\alpha} \mu_+^{(\alpha)}(\omega) \left[ \cos(\psi/2)\,\hat{\mathbf{i}} + \sin(\psi/2)\,\hat{\mathbf{j}} \right] \tag{9a}$$

$$\boldsymbol{\mu}_-(\omega) = \sum_{\alpha} \mu_-^{(\alpha)}(\omega) \left[ -\sin(\psi/2)\,\hat{\mathbf{i}} + \cos(\psi/2)\,\hat{\mathbf{j}} \right] \tag{9b}$$

The symmetric and anti-symmetric EDTMs ($\boldsymbol{\mu}_+$ and $\boldsymbol{\mu}_-$, respectively) are depicted graphically in Fig. 2.1C.
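As a numerical cross-check of the Jones-vector algebra in Eqs. (4) to (9), the sketch below applies the quarter-wave-plate matrix of Eq. (5) to the recombined field of Eq. (4) using NumPy. The function names and the unit amplitude are illustrative assumptions, and the optical carrier factor is dropped. With the matrix exactly as written in Eq. (5), the output is a real, unit-length, linearly polarized vector whose angle advances at half the rate of $\varphi$. The angle evaluates to $\varphi/2 - \pi/4$, which has the form of Eq. (7) once the constant $\pi/4$ is absorbed into the origin of $\varphi$.

```python
import numpy as np

def field_after_qwp(phi):
    """Apply the quarter-wave plate of Eq. (5) to the MZI output of Eq. (4).

    Assumption: amplitude A = 1 and the carrier exp(i*omega_L*t) is omitted.
    """
    e_total = np.array([np.exp(-0.5j * phi), np.exp(0.5j * phi)]) / np.sqrt(2)
    m_qwp = np.exp(0.25j * np.pi) / np.sqrt(2) * np.array([[1, -1j], [-1j, 1]])
    return m_qwp @ e_total

def exciton_directions(psi):
    """Unit vectors of the symmetric and anti-symmetric EDTMs, Eqs. (9a), (9b)."""
    mu_plus = np.array([np.cos(psi / 2), np.sin(psi / 2)])
    mu_minus = np.array([-np.sin(psi / 2), np.cos(psi / 2)])
    return mu_plus, mu_minus

def rotating_linear_vector(phi):
    """Real vector [cos(theta), -sin(theta)] with theta = phi/2 - pi/4."""
    theta = phi / 2 - np.pi / 4
    return np.array([np.cos(theta), -np.sin(theta)])

for phi in np.linspace(0.0, 2.0 * np.pi, 9):
    e = field_after_qwp(phi)
    # The field stays linearly polarized (no imaginary part) with unit length
    print(f"phi={phi:5.2f}  E=({e[0].real:+.3f}, {e[1].real:+.3f})  "
          f"max|Im|={np.abs(e.imag).max():.1e}")
```

Because the two exciton directions returned by `exciton_directions` are orthogonal for every $\psi$, the rotating field projects onto them with signals that are 180° out of phase, which is the property used in the derivation that follows.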
At any instant, the ‘polarized’ single-molecule fluorescence intensity is given by the spectral overlap and square modulus of the vector projections between the laser electric field and the total EDTM.

$$I_P(\varphi) = \sum_{\alpha} \int d\omega \left| \mathbf{E}(\omega,\varphi) \cdot \boldsymbol{\mu}_{tot}^{(\alpha)}(\omega) \right|^2 \tag{10}$$

Equation (10) can be written as the sum of symmetric (+) and anti-symmetric (−) exciton contributions by substituting $\boldsymbol{\mu}_{tot} = \boldsymbol{\mu}_+ + \boldsymbol{\mu}_-$:

$$I_P(\varphi) = \sum_{\alpha} \int d\omega \left| \mathbf{E}(\omega,\varphi) \cdot \boldsymbol{\mu}_+^{(\alpha)}(\omega) \right|^2 + \sum_{\alpha'} \int d\omega \left| \mathbf{E}(\omega,\varphi) \cdot \boldsymbol{\mu}_-^{(\alpha')}(\omega) \right|^2 \tag{11}$$

We identify the first term on the right-hand side of Eq. (11) as the symmetric exciton contribution, $I_+(\varphi)$, and the second term as the anti-symmetric exciton contribution, $I_-(\varphi)$. The separation given by Eq. (11) is a consequence of $\boldsymbol{\mu}_+$ and $\boldsymbol{\mu}_-$ having orthogonal orientations, which additionally leads to the two signal contributions having a relative phase of 180°. This relative phase is a consequence of the way the rotating electric field is prepared with the use of a quarter-wave plate, as described by Eq. (7).

We first calculate the signal contribution from the symmetric excitons. Substitution of Eq. (6) and Eq. (8a) into the first term on the right-hand side of Eq. (11) leads to:

$$I_+(\varphi) = \sum_{\alpha} \int d\omega \left| \mathbf{E}(\omega,\varphi) \cdot \boldsymbol{\mu}_+^{(\alpha)}(\omega) \right|^2 \tag{12a}$$

$$= |\mu_+|^2 \left[ \cos^2(\psi/2)\cos^2(\varphi/2) + \sin^2(\psi/2)\sin^2(\varphi/2) - 2\cos(\psi/2)\sin(\psi/2)\cos(\varphi/2)\sin(\varphi/2) \right] \tag{12b}$$

where we have defined the spectral overlap function $|\mu_+|^2 = |A(\omega)|^2 \sum_{\alpha} \int d\omega\, |\mu_+^{(\alpha)}(\omega)|^2$. Equation (12b) can be simplified using double angle formulas to yield

$$I_+(\varphi) = \frac{|\mu_+|^2}{2} \left[ 1 + \cos(\varphi + \psi) \right] \tag{13}$$

Following a similar procedure for the anti-symmetric exciton contribution to the polarized signal [by substitution of Eq. (6) and Eq. (8b) into the second term on the right-hand side of Eq. (11)] leads to:

$$I_-(\varphi) = \frac{|\mu_-|^2}{2} \left[ 1 - \cos(\varphi + \psi) \right] \tag{14}$$

where $|\mu_-|^2 = |A(\omega)|^2 \sum_{\alpha} \int d\omega\, |\mu_-^{(\alpha)}(\omega)|^2$. Combining Eqs.
(13) and (14) gives the expression for the polarized single-molecule fluorescence intensity:

$$I_P(\varphi) = I_+(\varphi) + I_-(\varphi) = \frac{|\mu_+|^2 + |\mu_-|^2}{2} \left\{ 1 + \left[ \frac{|\mu_+|^2 - |\mu_-|^2}{|\mu_+|^2 + |\mu_-|^2} \right] \cos(\varphi + \psi) \right\} = f \left\{ 1 + v \cos(\varphi + \psi) \right\} \tag{15}$$

In Eq. (15), $f = \frac{1}{2}\left[ |\mu_+|^2 + |\mu_-|^2 \right]$ is the mean signal rate and $v = \left[ |\mu_+|^2 - |\mu_-|^2 \right] / \left[ |\mu_+|^2 + |\mu_-|^2 \right]$ is the signal visibility. From Eq. (15), we see that the signal is a sum of two terms. The first term is the ‘instantaneous’ signal flux $f$, which is assumed to be constant during a particular measurement period, and the second ($\varphi$-dependent) term varies rapidly at the phase modulation frequency ($\varphi = \Omega t$, $\Omega$ = 1 MHz). The modulated signal component is proportional to the visibility $v$, which is directly related to the local conformation of the exciton-coupled (iCy3)2 dimer probe. The signal phase $\psi$ represents the orientation of the dimer in the lab frame. Thus, to monitor the internal conformational fluctuations of the (iCy3)2 dimer probe, it is necessary to isolate the modulated signal component $v$ from the signal flux $f$ subject to a well-defined phase $\psi$.

Under conditions of high signal intensity, a linear detector, such as a photomultiplier tube (PMT), might be used in place of the APD illustrated in Fig. 2.3. The continuously varying signal described by Eq. (15) could then be measured using a conventional lock-in amplifier [69]. However, under the low-signal-flux conditions of the current work, Eq. (15) describes the probability that the APD detects a single photon at time t. Thus, at any instant, we consider each detection event to be a discrete sample of the signal phase probability distribution function, sampled by the Dirac delta function $\delta(\varphi - \varphi_i)$, where the $\varphi_i$ are random variables. In our single-molecule experiments, the mean signal flux (integrated over many seconds) was typically $\bar{f} \cong$ 8 – 10 kHz, so that on average only one detection event occurred per 100 – 125 modulation cycles.
Therefore, to sample the low flux signal as efficiently as possible, we implemented the phase-tagged photon counting (PTPC) method [69]. In this approach, the electronic waveform used to drive the relative phase of the interferometer was synchronized to an 80 MHz digital counter (see Fig. 2.3). We thus collected a stream of sparsely sampled single-photon detection events in which we assigned to the nth detection event a ‘time stamp,’ $t_n$, with resolution $\Delta t$ = 1 μs, and a ‘phase stamp,’ $\varphi_n$, with resolution $\Delta\varphi$ = 360° / 80 = 4.5°. In post-acquisition analysis, we calculated the discrete signal intensity from the set of $N$ detected photons that were measured during an (adjustable) sampling window $T_w$ centered at a particular time $t$.

$$I_P^d(\varphi, t) = \frac{1}{T_w} \int_{t-(T_w/2)}^{t+(T_w/2)} \sum_{n=1}^{N} \delta(\varphi - \varphi_n)\, \delta(t' - t_n)\, dt' \tag{16}$$

We isolated the modulated component of Eq. (16) by calculating the first term of the complex-valued Fourier series expansion with respect to $\varphi$ [69].

$$Z_P(t) = \frac{1}{2\pi} \int_0^{2\pi} I_P^d(\varphi, t)\, e^{-i\varphi}\, d\varphi \tag{17}$$

Substitution of Eq. (16) into Eq. (17) shows that the modulated signal component is the sum of $N$ single photon phase factors:

$$Z_P(t) = \frac{1}{T_w} \sum_{n=1}^{N(t)} e^{-i\varphi_n} = f(t)\, v(t)\, e^{-i\psi} \tag{18}$$

The second equality of Eq. (18) relates the visibility to the modulated signal component according to $v(t) = |Z_P(t)| / f(t)$, where $f(t) = N(t)/T_w$ is the instantaneous signal flux and $N(t) = \int_{t-(T_w/2)}^{t+(T_w/2)} \sum_{n=1}^{N} \delta(t' - t_n)\, dt'$ is the number of photons measured at time $t$ during the sampling window $T_w$. We note that the instantaneous flux can be obtained by averaging the signal [Eq. (11)] over the phase variable: $f = \frac{1}{2\pi} \int_0^{2\pi} I_P(\varphi)\, d\varphi = N/T_w$.

To demonstrate that the polarization state of the source laser is well described by Eq. (7), and that the phase ($\varphi$)-dependent signal behaves according to Eq. (11), we carried out control measurements on i) an isotropic solution of Rhodamine B in ethanol, and ii) an anisotropically-stretched poly vinyl alcohol (PVA) film containing Cy3 (see Fig. 2.4).
As expected, the fluorescence signal of the anisotropic sample exhibited a pronounced $\varphi$-dependent modulation, while no significant $\varphi$-dependent modulation was observed from the isotropic sample.

Figure 2.4 Control measurements of phase histograms from (A) an isotropic solution of Rhodamine B in ethanol, and (B – D) an anisotropically-stretched film of Cy3 supported in poly vinyl alcohol (PVA). For each of the panels, B – D, the uniaxial stretch direction and spot location of the PVA film was varied. Histograms were constructed from an average of 10 separate time series, each of ~45 s duration. For visualization, the histograms were constructed by re-binning the native 16,000 bin histograms into 79 bins, so that the resolution shown is coarse-grained to ~0.08 rad per bin. The horizontal dashed lines indicate the maximum (top), mean (middle) and minimum (bottom) number of counts per bin. The signal was detected using the phase-tagged photon counting (PTPC) method [69].

Section 3.6. Characterization of conformational macrostates using probability distribution functions (PDFs)

We monitored the local conformational fluctuations of the (iCy3)2 dimer-labeled ss-dsDNA constructs as reflected by measurements of the PS-SMF signal visibility, $v(t)$. We analyzed these data by constructing ensemble averaged probability distribution functions (PDFs) and multi-point time-correlation functions (TCFs). In the current Chapter, we provide an overview of the statistical analysis approach that yields these data surfaces. By subsequently applying a kinetic network model analysis like the approach of Israels et al. [14], we may obtain from these data the equilibrium distributions of macrostates and the kinetic rate constants necessary to parameterize the free energy landscapes of these systems. A quantitative analysis of the free energy landscapes is the focus of Chapter 4. This section serves to introduce the core components needed for a kinetic network analysis.
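To make concrete how a visibility value is obtained from phase-tagged photons, the following sketch draws photon phases from the distribution implied by Eq. (15), quantizes them into the 80 phase bins of 4.5° described in Section 3.3, and evaluates the sum of single-photon phase factors in Eq. (18). The function names, the rejection sampler, and the chosen values of $v$ and $\psi$ are illustrative assumptions, not the analysis code used for the experiments. One normalization caveat applies: for a phase density proportional to $1 + v\cos(\varphi + \psi)$, the mean of $e^{-i\varphi}$ has magnitude $v/2$, so the sketch multiplies by 2. Whether that factor is absorbed into the definition of $Z_P$ in Eq. (18) is treated here as an assumption.

```python
import numpy as np

N_BINS = 80                      # 80 MHz counter / 1 MHz phase sweep
DPHI = 2.0 * np.pi / N_BINS      # 4.5 degrees per phase bin, in radians

def sample_photon_phases(n_photons, v, psi, rng):
    """Rejection-sample phases from p(phi) ~ 1 + v*cos(phi + psi), then bin them."""
    phases = np.empty(0)
    while phases.size < n_photons:
        phi = rng.uniform(0.0, 2.0 * np.pi, size=2 * n_photons)
        accept = rng.uniform(0.0, 1.0 + v, size=phi.size) < 1.0 + v * np.cos(phi + psi)
        phases = np.concatenate([phases, phi[accept]])
    bins = np.floor(phases[:n_photons] / DPHI).astype(int)  # phase stamp j = 0..79
    return (bins + 0.5) * DPHI                               # bin-center phase

def estimate_visibility(phases):
    """Return (visibility, phase) from the averaged single-photon phase factors.

    The factor of 2 follows from the normalization caveat in the lead-in.
    """
    z = np.mean(np.exp(-1j * phases))
    return 2.0 * np.abs(z), np.angle(z)

rng = np.random.default_rng(0)
phases = sample_photon_phases(200_000, v=0.4, psi=0.7, rng=rng)
v_est, psi_est = estimate_visibility(phases)
print(f"estimated visibility = {v_est:.3f}, estimated phase = {psi_est:.3f} rad")
```

A visibility trajectory would repeat `estimate_visibility` on the photons falling in each sampling window $T_w$. With only a few photons per window the magnitude of the averaged phase factor is biased upward, which is one reason the choice of $T_w$ matters.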
In order to perform a kinetic network analysis and uncover the free energy landscape that describes the observed thermodynamics, the relative population of conformational macrostates must be captured [14]. Here, we define macrostates (also referred to simply as ‘states’) as groups of configurations, or microstates, that produce similar macroscopic measurements of the observable of interest (i.e., the signal visibility). The microscopic configurations within a macrostate occupy the same free energy minimum and are well separated from the free energy minima of other macrostates by transition energy barriers. We acknowledge that our macrostate assignments obscure interconversion between microstates of the system, which likely occurs on the nanosecond timescale and faster, but we believe this analysis can capture essential mechanistic details despite the inherent simplifications. The experimentally derived histogram, which serves as the basis for the PDF in our analysis, is a sum over the instantaneous value of the system observable at a particular choice of integration time for all single molecule trajectories in a data set. We define the individual peaks in our experimental histogram using a sum of Gaussian distributions, given by Eq. (19).

$$A_j(v) = \frac{1}{\sigma_j \sqrt{2\pi}} \exp\left( -\frac{(v - v_j)^2}{2\sigma_j^2} \right) \tag{19}$$

The choice of a Gaussian to describe the probability of observing a particular value of the visibility within each conformational macrostate is of course the lowest order treatment for such a PDF. Alternative distributions could be used instead of a pure Gaussian treatment, such that the effects of anharmonicity in the free energy landscape could be described. But such a treatment would require a greater level of detail than is presently available via the retrievable information in our single-molecule experiments. We set the center of each Gaussian defined by Eq. (19) based on a two-step process.
First, a standard Gaussian decomposition is used to establish reasonable boundaries for the allowed values of each peak’s center. Second, the established boundaries are used in a nonlinear optimization to fit the experimentally derived data surfaces to a model, of which the set of visibilities defining each macrostate is an output, as discussed in Chapters 3 and 4. We define the probability distribution of conformational macrostates as a sum over Boltzmann weighted peaks of the experimental observable after optimization of the Gaussian peak centers and integrated area.

$$p_j(v) = \exp\left( -\frac{G_j(v)}{k_B T} \right) \Big/ Z \tag{20}$$

The histogram reports on the probability distribution within each underlying macrostate. The Boltzmann relation transforms the probabilities of each state to their corresponding free energy distributions, where $G_j(v)$ is the free energy as a function of the experimental observable, $v$, $k_B$ is Boltzmann’s constant, $T$ is the temperature and $Z = \sum_{j=1}^{M} \exp\left( -\frac{G_j(v)}{k_B T} \right)$ is the partition function. Rearranging this expression to solve for the free energy gives the free energy minima as a function of our experimental observable (Eq. 21).

$$\frac{G_j(v)}{k_B T} = -\ln\left[ p_j(v) \right] = -\ln\left( \frac{p_j^{eq}}{\sigma_j \sqrt{2\pi}} \right) + \frac{1}{2\sigma_j^2} \left( v - \langle v \rangle_j \right)^2 \tag{21}$$

Eq. (21) will serve as the central relationship in later Chapters to translate the outcomes of our analysis pipeline into thermodynamic parameters of interest. For the purpose of a pure signal discussion, we define the histogram of our observable, $A$, which could be the visibility, FRET or some other quantity, as $P(A) = \sum_{j=1}^{N} p_j(A)$, where $p_j(A) = \alpha_j \exp\left( -\frac{(A - \langle A_j \rangle)^2}{2\sigma_j^2} \right)$, $\alpha_j = p_j^{eq} / (\sigma_j \sqrt{2\pi})$, $\langle A_j \rangle$ is the mean observable value, and $\sigma_j$ is the standard deviation of the $j$th distribution, as defined by Eq. (19). The area of each of these distributions corresponds to $p_j^{eq}$, such that $p_j^{eq} = \int_{-\infty}^{\infty} p_j(A)\, dA = \alpha_j \sigma_j \sqrt{2\pi}$, and the sum over the $p_j^{eq}$ is unity, i.e., $\sum_{j=1}^{N} p_j^{eq} = 1$.
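The relations between the Gaussian peak parameters, the equilibrium populations and the free energy in Eqs. (19) to (21) can be illustrated numerically. The sketch below builds a model histogram from a sum of weighted Gaussians, confirms that each peak integrates to its $p_j^{eq}$, and evaluates $G_j/k_BT$ from Eq. (21). The four peak centers, widths and populations are invented for illustration and are not fitted values from this work.

```python
import numpy as np

# Hypothetical macrostate parameters (illustrative only, not fitted values)
p_eq = np.array([0.10, 0.25, 0.45, 0.20])      # equilibrium populations, sum to 1
centers = np.array([0.10, 0.25, 0.40, 0.55])   # mean visibility of each macrostate
sigmas = np.array([0.03, 0.04, 0.04, 0.05])    # standard deviation of each peak

def peak(v, j):
    """Weighted Gaussian p_j(v) = alpha_j * exp(-(v - <v>_j)^2 / (2 sigma_j^2))."""
    alpha = p_eq[j] / (sigmas[j] * np.sqrt(2.0 * np.pi))
    return alpha * np.exp(-((v - centers[j]) ** 2) / (2.0 * sigmas[j] ** 2))

def free_energy_kT(v, j):
    """G_j(v) / (k_B T) evaluated from the right-hand side of Eq. (21)."""
    offset = -np.log(p_eq[j] / (sigmas[j] * np.sqrt(2.0 * np.pi)))
    return offset + (v - centers[j]) ** 2 / (2.0 * sigmas[j] ** 2)

v = np.linspace(-1.0, 1.0, 20001)
dv = v[1] - v[0]
histogram = sum(peak(v, j) for j in range(p_eq.size))          # model P(v)
areas = np.array([np.sum(peak(v, j)) * dv for j in range(p_eq.size)])
g_min = np.array([free_energy_kT(centers[j], j) for j in range(p_eq.size)])

print("peak areas        :", np.round(areas, 4))
print("G_min / k_B T     :", np.round(g_min, 3))
print("total probability :", round(float(np.sum(histogram) * dv), 4))
```

In this form the minimum of each free energy well depends on both the population and the width of the peak, so two macrostates with equal populations but different widths sit at different values of $G_j/k_BT$ at their centers.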
We require that the $p^{eq}$ values resulting from the histogram fits be consistent with the $p^{eq}$ values we obtain from the TCF simulations, as discussed below.

The final discussion point concerning experimental histograms is the choice of an integration time. The integration time sets the bin size used to sum the individual photon phase factors ($T_w$ in Eq. (18)) to obtain an estimate of the polarized signal intensity. Under typical conditions of sampling from a fixed, singularly valued distribution describing some system observable, the standard error associated with the sampled distribution goes as $1/\sqrt{n}$, where $n$ is the number of independent sampling events performed on the underlying distribution. This standard situation suggests that sampling for as long as possible is optimal, to limit the standard error of the sampling to some minimum value. However, in the case of a dynamic, multimodal distribution, as is the case for the single-molecule observables under consideration here, an optimal choice of integration time comes down to a compromise between the lower standard error associated with longer integration times and the loss of important features in the histogram to the effects of time averaging multiple distinct values together, a consequence of the central limit theorem. These effects, and others pertinent to the TCFs discussed later, have been nicely described by Berg and coworkers [78]. An illustrative example of various choices in integration time is shown below in Fig. 2.5.

Figure 2.5 Model demonstration of variable integration time on the emergence of multimodal behavior and suppression of noise in the experimental histogram of a single molecule observable. Each curve displayed is the same underlying experimental data, integrated using various bin sizes. The earliest choice of 10 μs in this example yields highly discrete peaks at the limits of the domain due to single photon statistics, giving a poor estimate of the macrostates in the system.
At the level of 10 ms, a clear picture of the underlying PDF governing the macrostates emerges.

In general, the choice of integration time depends on the data set and system under consideration. The optimal choice will always be the fastest integration time that affords a robust estimate of the signal (SNR ~10). But given the variability in signal intensity, or photon flux, the choice must be made on a case-by-case basis. A panel of experimentally derived histograms for a single PS-SMF example trace at various integration times is shown in Fig. 2.6 as the corresponding example of non-idealized binning time effects.

Figure 2.6 Experimental demonstration of variable integration time on the emergence of multimodal behavior and suppression of noise in the experimental histogram of a single molecule observable. Each panel is the same experimental data, integrated using various bin sizes. Both the trajectory of the visibility and the resulting PDF are shown.

Section 3.7 Characterization of conformational dynamics using multi-point time-correlation functions (TCFs)

Time correlation functions are an invaluable tool for isolating meaningful fluctuations from stochastic noise in a time series. Here, we use them to extract the rates of interconversion between macrostates and characterize the kinetic information contained in our PS-SMF experiments using two-point and three-point TCFs [13], [14], [70]. For all later discussions of PS-SMF data, we define the time-dependent fluctuation of the visibility as $\delta v(t) = v(t) - \langle v \rangle$, where $\langle v \rangle$ is the mean value determined from a time series of measurements performed on a single molecule. For the discussion that follows, a general experimental observable $A$ will be used, which could be the visibility, FRET or any other equilibrium observable obtained from single molecule studies. The simplest TCF is the two-point time correlation function (Eq.
22), which is the average product of two successive measurements separated by the interval $\tau$:

$$\bar{C}^{(2)}(\tau) = \langle \delta A(\tau)\, \delta A(0) \rangle \tag{22}$$

In Eq. (22), the angle brackets indicate a running average over all possible initial measurement times. The function $\bar{C}^{(2)}$ contains information about the number of quasi-stable macrostates of the system, the mean value of each macrostate, and the characteristic time scales of interconversion between macrostates [13], [56], [70]. If the signal rate is continuous, we can use the Wiener-Khinchin theorem to calculate the two-point TCF from the experimental data (Eq. 23),

$$\bar{C}^{(2)}(\tau) = \frac{1}{N_{binned}^2}\, \mathcal{F}^{-1}\left\{ \left| \mathcal{F}\left[ \delta A(\tau) \right] \right|^2 \right\} \tag{23}$$

where $N_{binned}$ is the number of counts per integration period and $\mathcal{F}$ and $\mathcal{F}^{-1}$ are the fast Fourier transform and inverse fast Fourier transform, respectively. Otherwise, if the sampling rate is fast with respect to the frequency of signal events, producing a sparse time series, we use the average product of the discrete observable values (Eq. 24),

$$\bar{C}^{(2)}(\tau) = \frac{1}{N_{2P}(\tau)} \sum_{i=0}^{N_{tot}-1} \left[ \delta A(t_i)\, \delta A(t_i + \tau) \right] \tag{24}$$

where $N_{2P}$ is the number of two-point pair-wise products where both bins contain valid data, $N_{tot}$ is the total number of possible pairs spaced apart by the delay $\tau$, and the $t_i$ are the starting indices. We note that in the limit that the signal becomes continuous, the results of Eq. 23 are identical to those of Eq. 24. The analysis presented here is recommended for systems that contain dynamics with well separated timescales in the two-point TCF, such as the data presented in section 4. As discussed in detail in Chapter 3, section 3.1, the number of well separated decay components in the two-point TCF can be used to determine the minimum number of macrostates required to model the data with a kinetic network model.

While the two-point TCF reports on direct state-to-state transitions, we also use higher-order TCFs to study the effects of multi-step kinetic pathways that influence the observed dynamics in our experiments.
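Before turning to the higher-order functions, the two estimators of Eqs. (23) and (24) can be compared in a short sketch. The direct estimator below skips bins that hold no valid data, and the Fourier estimator applies to a gap-free series. Marking empty bins with NaN, normalizing by the number of overlapping pairs at each delay, zero-padding the transform, and the synthetic telegraph signal are all assumptions made for illustration. In particular, the normalization differs from the prefactor written in Eq. (23).

```python
import numpy as np

def two_point_tcf(a, max_lag):
    """Direct two-point TCF in the spirit of Eq. (24); NaN marks an empty bin."""
    a = np.asarray(a, dtype=float)
    valid = ~np.isnan(a)
    da = np.where(valid, a - np.nanmean(a), 0.0)   # fluctuation, 0 in empty bins
    w = valid.astype(float)
    tcf = np.full(max_lag + 1, np.nan)
    for lag in range(max_lag + 1):
        n = a.size - lag
        n_pairs = np.dot(w[:n], w[lag:])           # N_2P(tau): both bins valid
        if n_pairs > 0:
            tcf[lag] = np.dot(da[:n], da[lag:]) / n_pairs
    return tcf

def two_point_tcf_fft(a, max_lag):
    """Wiener-Khinchin estimate for a gap-free series, zero-padded to avoid wrap-around."""
    da = np.asarray(a, dtype=float) - np.mean(a)
    n = da.size
    spectrum = np.fft.rfft(da, 2 * n)
    acf = np.fft.irfft(np.abs(spectrum) ** 2, 2 * n)[: max_lag + 1]
    return acf / (n - np.arange(max_lag + 1))      # pairs available at each lag

rng = np.random.default_rng(1)
# Hypothetical two-state telegraph signal with added Gaussian noise
state = np.cumsum(rng.random(5000) < 0.02) % 2
a = 0.2 + 0.3 * state + 0.05 * rng.normal(size=state.size)

tcf_direct = two_point_tcf(a, 50)
tcf_fft = two_point_tcf_fft(a, 50)

a_sparse = a.copy()
a_sparse[rng.random(a.size) < 0.3] = np.nan        # drop 30% of bins at random
tcf_sparse = two_point_tcf(a_sparse, 50)
print(np.round(tcf_direct[:5], 5), np.round(tcf_sparse[:5], 5))
```

For the gap-free series the two estimators agree to numerical precision, which mirrors the statement that Eqs. (23) and (24) coincide in the continuous limit. Only the direct form remains usable once bins are missing.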
In previous studies [14], [56], [70], our group applied four-point TCFs to calculate the average product of the signal fluctuation across four points separated by three time intervals, $\tau_1 = t_2 - t_1$, $\tau_2 = t_3 - t_2$ and $\tau_3 = t_4 - t_3$ (Eq. 25).

$$\bar{C}^{(4)}(\tau_1, \tau_2, \tau_3) = \langle \delta A(\tau_1 + \tau_2 + \tau_3)\, \delta A(\tau_1 + \tau_2)\, \delta A(\tau_1)\, \delta A(0) \rangle \tag{25}$$

As before, the Wiener-Khinchin theorem can be used to calculate higher order TCFs for experiments with high enough acquisition rates to be considered continuous. However, for the studies discussed here, data sets are sparse at the chosen binning times, so we must use the discrete product form with $N_{4P}$ as the number of four-point products where all bins contain valid data (Eq. 26).

$$\bar{C}^{(4)}(\tau_1, \tau_2, \tau_3) = \frac{1}{N_{4P}(\tau_1, \tau_2, \tau_3)} \sum_{i=0}^{N_{tot}-1} \left[ \delta A(t_i)\, \delta A(t_i + \tau_1)\, \delta A(t_i + \tau_1 + \tau_2)\, \delta A(t_i + \tau_1 + \tau_2 + \tau_3) \right] \tag{26}$$

Our applications of four-point time correlation functions have typically allowed the $\tau_2$ interval to be zero, reducing Eq. 25 to:

$$\bar{C}^{(4)}(\tau_1, \tau_2 = 0, \tau_3) = \langle \delta A(\tau_1 + \tau_3)\, \delta A(\tau_1)^2\, \delta A(0) \rangle \tag{27}$$

Recently, Berg and coworkers showed that when performing these TCF calculations, the effect of noise must be considered [78]. Here, we define our signal fluctuation as $\delta \bar{A}(t) = A(t) + \epsilon(t) + b - \langle A(t) \rangle - \langle \epsilon(t) \rangle - \langle b \rangle \approx \delta A(t) + \epsilon(t)$, where $A(t)$ is our observable of interest, $\epsilon(t)$ is time dependent noise (e.g., Gaussian distributed dark counts, errors in phase and time tagging electronics, etc.), $b$ is some constant background and $\langle \dots \rangle$ indicates the time average. We note that the time dependent noise averages to zero overall, $\langle \epsilon \rangle = 0$, that it is uncorrelated at different times, $\langle \epsilon(t)\, \epsilon(t') \rangle = 0$ where $t \neq t'$, and that it is uncorrelated with the signal, $\langle A(t)\, \epsilon(t') \rangle = 0$ for all $t$, $t'$. However, the time dependent noise is correlated with itself at the same instant in time, i.e., $\langle \epsilon(t)\, \epsilon(t) \rangle = \sigma_\epsilon^2 \neq 0$. Plugging $\delta \bar{A}(t)$ into Eq. 27 and simplifying produces Eq.
28:

$$\bar{C}^{(4)}(\tau_1, \tau_2 = 0, \tau_3) = \langle \delta A(\tau_1 + \tau_3)\, \delta A(\tau_1)^2\, \delta A(0) \rangle + \sigma_\epsilon^2 \langle \delta A(\tau_1 + \tau_3)\, \delta A(0) \rangle + \sigma_\epsilon^2\, \delta(\tau_1)\, \sigma_\epsilon^2\, \delta(\tau_3) \tag{28}$$

where $\sigma_\epsilon^2 = \langle \epsilon(t)^2 \rangle$, and $\delta(t)$ is the Kronecker delta function. The first term in Eq. 28 is the signal correlation we are interested in. The third term is a noise term that is only nonzero when $\tau_1$ and $\tau_3$ are zero, at the first point of the TCF. But the second term is a contribution to the higher-order correlation that is proportional to the variance of the time dependent noise in the system. Thus, when using a four-point time correlation function with one delay set to zero, there is a non-negligible component of correlated time dependent noise.

We resolved this problem by switching our analysis to calculating three-point TCFs (Eq. 29), instead of four-point TCFs with one delay at zero (Eq. 27). The three-point TCF is the time-averaged product of three consecutive measurements separated by the intervals $\tau_1$ and $\tau_2$.

$$\bar{C}^{(3)}(\tau_1, \tau_2) = \langle \delta A(\tau_1 + \tau_2)\, \delta A(\tau_1)\, \delta A(0) \rangle = \frac{1}{N_{3P}(\tau_1, \tau_2)} \sum_{i=0}^{N_{tot}-1} \left[ \delta A(t_i)\, \delta A(t_i + \tau_1)\, \delta A(t_i + \tau_1 + \tau_2) \right] \tag{29}$$

This function is sensitive to the roles of intermediates, like the four-point TCF, whose presence can facilitate or hinder successive transitions between macrostates. However, unlike even-moment TCFs, the three-point TCF, $\bar{C}^{(3)}$, contains no underlying noise-related background term that could potentially obscure the experimentally derived surface. $\bar{C}^{(3)}(\tau_1, \tau_2)$ does not result in a correlation of the time dependent noise when $\delta \bar{A}(t)$ is plugged into Eq. 29. Instead, we are left with the three-point correlation of interest and the noise term, which vanishes after the $\tau_1 = \tau_2 = 0$ point, thus avoiding the issues with $\bar{C}^{(4)}(\tau_1, \tau_2 = 0, \tau_3)$ [71]. This is given below by Eq. 30. Going forward we proceed with the three-point TCF to study higher order correlations in single-molecule time series.
$$\bar{C}^{(3)}(\tau_1, \tau_2) = \langle \delta A(\tau_1 + \tau_2)\, \delta A(\tau_1)\, \delta A(0) \rangle + \sigma_\epsilon^2 \delta(\tau_1)\, \sigma_\epsilon^2 \delta(\tau_2) \tag{30}$$

A notable feature of the three-point correlation function is its inherently higher degree of noise compared with the two-point correlation function. This is due to the reduced number of available products of the signal observable at three unique points in time, compared to two unique points in time. Because the SNR of the three-point TCF is the limiting factor, decisions about minimal integration times in experimental data sets should be based on whether the three-point TCF yields an analyzable surface for kinetic network modeling.

In the analysis presented in Chapter 2, section 4, we constructed the PDF of the time-dependent PS-SMF visibility, $P(v)$, using the sampling window $T_w = 10$ ms. We chose this value for $T_w$ because it is roughly equal to the time required for the signal-to-noise ratio (S/N) to reach a value of ~10. In addition, we calculated two-point and three-point TCFs, $\bar{C}^{(2)}$ and $\bar{C}^{(3)}$, respectively, using the sampling window $T_w = 250$ μs. Typically, ~150 – 250 individual single-molecule data sets were combined to construct each of the experimentally derived statistical functions.

While TCFs are a powerful tool for extracting dynamics from experimental time series, some conditions need to be satisfied. First, the main source of information should be deviations of the observable of interest from its mean, i.e., fluctuations, $\delta A$. The analysis could be applied to the raw observable values, but we expect the sensitivity to decrease, because the dynamic range of the correlation is greatly reduced when the fluctuations are small compared to the average. Additionally, these fluctuations need to arise in an experiment where the system and surroundings are in equilibrium.
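The discrete product form of the three-point TCF (Eq. 29) can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis pipeline used in this work: the function names are hypothetical, lags are expressed as integer numbers of bins (τ/Tw), and bins without valid data are assumed to be marked with `None`.

```python
from statistics import fmean

def fluctuations(series):
    """Return dA(t) = A(t) - <A>, keeping None for bins without valid data."""
    valid = [a for a in series if a is not None]
    mean = fmean(valid) if valid else 0.0
    return [None if a is None else a - mean for a in series]

def three_point_tcf(series, lag1, lag2):
    """Discrete three-point TCF at integer bin lags lag1 and lag2 (assumed convention).

    Only products in which all three bins hold valid data are counted;
    their number plays the role of N_3P in Eq. 29.
    Returns (tcf_value, n_products), or (None, 0) if no valid product exists.
    """
    dA = fluctuations(series)
    total, n_products = 0.0, 0
    for i in range(len(dA) - lag1 - lag2):
        a, b, c = dA[i], dA[i + lag1], dA[i + lag1 + lag2]
        if a is None or b is None or c is None:
            continue  # skip products that touch an empty bin
        total += a * b * c
        n_products += 1
    if n_products == 0:
        return None, 0
    return total / n_products, n_products
```

Returning the number of valid products alongside the value makes it easy to see how quickly the statistics thin out at long lags, which is the practical origin of the higher noise level of the three-point TCF noted above.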
Stopped-flow kinetic experiments and force-pulling single-molecule measurements would not typically meet these requirements (in particular, equilibrium between system and surroundings), except during periods of constant solvent environment or net zero applied force, respectively [15], [23].

Another consideration is the balance between the fluorescence lifetime and quantum yield of the molecular probe and the measurement speed of the instrument. The instrument needs a sampling rate fast enough to capture the dynamics of interest, but a point of diminishing returns is reached when the sampling rate far exceeds the photon flux from the fluorescent probe, such that the desired integration periods are mostly empty or poorly resolved. Brighter dye molecules provide more frequent sampling of the state of the system and, when combined with fast measurement speed, ultimately yield better signal-to-noise for fast dynamics. The relative importance of measurement speed and fluorescence flux needs to be balanced based on experimental constraints and the timescales of the relevant dynamics. Ultimately, time correlation functions can be an excellent way to study the dynamics of single-molecule data from an experiment designed with an informative fluctuating observable, a sufficiently bright fluorescent probe and correspondingly fast measurement capabilities.

Section 3.8. Consideration of fluorescence correlation spectroscopy (FCS) as a reporter on conformational dynamics of (iCy3)2 dimer-labeled single-stranded (ss) – double-stranded (ds) DNA fork constructs

A commonly employed approach to investigating macromolecules and their dynamics is to study correlations in the pure fluorescence intensity from a single molecule, known as fluorescence correlation spectroscopy. While FCS is successful at determining the concentration, diffusive properties, hydrodynamic radius and, in some limited cases, molecular interactions, separating generic photophysics from true molecularly induced changes in the intensity remains difficult [79].
This section discusses key features of the visibility signal, 𝑣(𝑡), versus the pure signal rate, or intensity, $f = \frac{1}{2\pi}\int_0^{2\pi} I_P(\varphi)\,d\varphi = N/T_w$, and their respective abilities to act as reporters on the internal conformational changes of ss-dsDNA junctions. It is natural to think that fluctuations in the signal flux might also report on the internal conformations of dsDNA, given that the spectral overlap terms determining the overall fluorescence intensity of a single Cy3 dimer are also a function of the dimer’s geometry.

Figure 2.7 PS-SMF probability distribution functions (PDFs) for the +1 (iCy3)2 dimer-labeled ss-dsDNA fork construct. Histograms were constructed from the raw photon data streams for the signal flux (left column), signal visibility (middle column) and signal phase (right column). The mean signal flux is 𝑓̅ = 7,500 s⁻¹. Comparisons are shown varying the integration window (A) 𝑇𝑤 = 10 ms, (B) 𝑇𝑤 = 100 ms and (C) 𝑇𝑤 = 500 ms. The corresponding S/N = √𝑁̅ = √(𝑓̅𝑇𝑤) = 8.7, 27 and 61, respectively.

Figure 2.8 PS-SMF probability distribution functions (PDFs) for the +1 iCy3 monomer-labeled ss-dsDNA fork construct. Panels are the same as described in Fig. 2.7.

As discussed in sections 3.6 and 3.7, our approach to analyzing and interpreting dynamics in macromolecular assemblies relies on the ability to construct a PDF and calculate TCFs. The need for a well-behaved PDF should not be understated, given the often complicated deconvolution process, which can influence the centers and number of macrostate assignments within a data set. Figures 2.7 and 2.8 demonstrate the effects of variable integration time, which is an important factor in the SNR and dynamical resolution of the PDF, for both a Cy3 dimer and a Cy3 monomer and for three separate signal quantities: the flux, the visibility, and the phase. For both the Cy3 dimer and Cy3 monomer cases, the flux and visibility appear reasonably well behaved at the lower integration time limit of 10 ms.
However, as the integration time is increased, the visibility converges to a distinct set of values with well-defined averages, while the flux begins to take on multi-modal characteristics that neither converge to a well-defined average nor appreciably decrease in standard deviation. The width of the phase distribution in all cases tracks the standard deviation of the visibility, as it is an internal measure of the visibility’s certainty. From a pure PDF perspective, if one were to use the signal flux or intensity to establish a discrete set of observable values corresponding to distinct conformational macrostates within the system, then both the number of states assigned and their central averages would be a function of integration time. This poses an immense challenge for the configuration of any analysis protocol, given the strong variability of the model outcome with integration time. The visibility signal demonstrably does not suffer from this issue and is therefore far better suited to analysis approaches that attempt to leverage the discretization of conformational macrostates within biophysical systems. This is owed in part to the visibility being normalized by photon number, such that it is insensitive to fluctuations in intensity arising from purely photophysical phenomena, as well as to the variability in average intensity that occurs within and across data sets due to instrumental alignment, laser stability and optical clarity of the sample.

An additional important consideration in the use of any single-molecule observable is whether systematic sources of random or correlated noise persist on the timescales used to investigate single molecules. To address this possibility, control experiments can be performed on a variety of samples that capture the potential dependence of any signal observable on the lab environment or instrumentation.
TCFs are a useful tool for determining the timescales and relative magnitudes of spurious correlations in the experimental apparatus, and for comparing different signal quantities. A set of TCFs examining the correlations in both the visibility signal and the signal flux is shown in Figs. 2.9 and 2.10. Figure panels 2.9 A-D and 2.10 A-D provide the experimental basis for the inclusion of a control time scale in our single-molecule polarization-sweep measurements. Two control samples were measured: a stretched film of Cy3 embedded in a soft polymer matrix, and a solution of rhodamine in methanol. The stretched film is an anisotropic sample, displaying a significant collective orientation of individual Cy3 transition dipole moments within the film, as illustrated in Fig. 2.4. The completely isotropic rhodamine sample provides a check on the electronics and phase-tagging method, as any non-zero visibility or correlations within the visibility ought to be absent in this case. The photon streams from both samples were recorded separately, and time correlation functions were calculated from those data streams. Plots A and B of Figs. 2.9 and 2.10 show immediately that only the Cy3 film displays any correlation in the signal visibility, with an exponential decay timescale on the order of 2-4 seconds. Additionally, this correlation arises only in the limit of long binning time, suggesting that the spurious correlations in the signal visibility occur only on slow timescales and do not affect the faster dynamics observed in dsDNA studies. As stated in the main text, we attribute this slow correlation timescale to instrumental drift due to room vibrations. In the case of rhodamine, no visibility correlations are present at either the short or long binning times examined, suggesting that the phase-tagging electronics alone make no contribution to the visibility or its correlations.
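The comparison of short and long binning times in these control measurements can be illustrated with a minimal Python sketch. It is not the code used to produce Figs. 2.9 and 2.10: the helper names `rebin` and `two_point_tcf` are hypothetical, the input is assumed to be a gap-free list of photon counts per bin (i.e., the flux), and lags are given in units of the bin width. The same correlation function can be applied to any evenly binned observable.

```python
def rebin(counts, factor):
    """Sum photon counts over `factor` consecutive bins (an incomplete final group is dropped)."""
    n_groups = len(counts) // factor
    return [sum(counts[k * factor:(k + 1) * factor]) for k in range(n_groups)]

def two_point_tcf(series, max_lag):
    """Return [C(0), ..., C(max_lag)] with C(lag) = <dA(t) dA(t + lag)>.

    Assumes max_lag < len(series) and that every bin holds valid data.
    """
    mean = sum(series) / len(series)
    dA = [a - mean for a in series]
    tcf = []
    for lag in range(max_lag + 1):
        products = [dA[i] * dA[i + lag] for i in range(len(dA) - lag)]
        tcf.append(sum(products) / len(products))
    return tcf

# Toy data: a fluctuation that alternates every bin is fully visible at the
# short binning time but is averaged away once two bins are merged.
counts = [2, 0, 2, 0, 2, 0, 2, 0]
short_binning = two_point_tcf(counts, 2)
long_binning = two_point_tcf(rebin(counts, 2), 2)
```

The toy input is chosen so that fluctuations faster than the longer bin width are averaged away, which is why fast and slow contributions to a TCF have to be examined at separate binning times.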
Figure 2.9 PS-SMF time correlation functions (TCFs) for an anisotropic Cy3 film. Both the visibility and flux TCFs are calculated and shown in panels A-D. Integration times of 100 μs and 10 ms are examined for both the visibility and the flux.

Figure 2.10 PS-SMF time correlation functions (TCFs) for an isotropic rhodamine sample. Both the visibility and flux TCFs are calculated and shown in panels A-D. Integration times of 100 μs and 10 ms are examined for both the visibility and the flux.

Two additional points should be made regarding these control studies. The first is that the flux TCFs for both the Cy3 film and rhodamine contain multiple timescales that do not arise in the visibility signal and cannot be easily deconvolved from the relevant timescales observed in single-molecule data. This presents a strong argument against attempting the kinetic network analysis outlined in this work using only the time-dependent flux of the molecule. Second, the overall magnitude of the correlations seen in the visibility signal from the Cy3 film is on the order of 10⁻⁵, whereas the correlations seen from single-molecule samples are typically on the order of 10⁻⁴ – 10⁻³, demonstrating that the contribution from instrument drift is at least an order of magnitude smaller than the correlations arising from conformational fluctuations of single molecules.

Taken together, both the PDFs and TCFs of the signal flux are plagued by a host of issues when using the PS-SMF methodology, despite the potential relationship between the signal flux and the internal geometry of dsDNA reported on by a single Cy3 dimer. Despite the longstanding use of FCS to study molecular interactions, we believe that the visibility signal in our PS-SMF experiments is a more faithful reporter of the conformational fluctuations in dsDNA and, unlike the signal flux, can be used without significant complications in the kinetic network analysis discussed in Chapter 3.

4. RESULTS AND DISCUSSION

Section 4.1. Analysis of PS-SMF spectroscopic signals.

For each of the iCy3 monomer and (iCy3)2 dimer-labeled ss-dsDNA constructs that we studied (see Table 1), we recorded PS-SMF data streams for a typical duration of ~30 – 50 s. In Fig. 2.11, we present examples of PS-SMF signal trajectories for the +1 (iCy3)2 dimer-labeled (Figs. 2.11 A – C) and the +1 iCy3 monomer-labeled (Figs. 2.11 D – F) ss-dsDNA fork constructs using the integration window 𝑇𝑤 = 100 ms. In these examples, the mean flux is 𝑓̅ ≈ 7,500 s⁻¹ for the (iCy3)2 dimer and 𝑓̅ ≈ 6,000 s⁻¹ for the iCy3 monomer. We note that the apparent ‘noise’ in the signal trajectories is due to the measurement uncertainty associated with the finite number of photons detected at these low flux levels during the integration window [69]. Thus, the mean signal-to-noise ratio is S/N = √(𝑓̅𝑇𝑤) ≈ √750 ≈ 27.4 for the trajectory of the (iCy3)2 dimer-labeled construct, and S/N ≈ 24.5 for the trajectory of the iCy3 monomer-labeled construct.

The three signal components are the instantaneous flux, 𝑓(𝑡) = 𝑁(𝑡)⁄𝑇𝑤 (Figs. 2.11 A and D), the phase, 𝜓(𝑡) (Figs. 2.11 B and E) and the visibility, 𝑣(𝑡) = |𝑍𝑃(𝑡)|⁄𝑓(𝑡) (Figs. 2.11 C and F), which are described by Eqs. (16) – (18), respectively. The flux, 𝑓(𝑡), is equivalent to the fluorescence intensity, a quantity that is independent of the laser polarization since it is averaged over the polarization phase variable, as discussed in the previous section. We note that fluctuations of the flux reflect changes in the (iCy3)2 dimer probe’s local environment, which influence excited state deactivation pathways. The phase, 𝜓(𝑡), is shown ‘unwrapped’ with its mean value set (arbitrarily) to 𝜓̅ = 0. The phase represents the orientation of the major axis of the (iCy3)2 dimer ‘polarization ellipse’ in the laboratory frame (see Fig. 2.2 C), which undergoes intermittent jumps about its mean value (𝜓̅ = 0) over the duration of the trajectory.
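The shot-noise-limited S/N values quoted above follow directly from S/N = √(𝑓̅𝑇𝑤). The short Python sketch below reproduces them; the function names are illustrative only, and `window_for_snr` is simply the same relation inverted to estimate the integration window needed for a target S/N, not a procedure taken from the experiments.

```python
import math

def shot_noise_snr(flux_hz, window_s):
    """S/N = sqrt(N_mean), with N_mean = flux * window the mean photons per window."""
    return math.sqrt(flux_hz * window_s)

def window_for_snr(flux_hz, target_snr):
    """Integration window (in seconds) at which the shot-noise S/N reaches target_snr."""
    return target_snr ** 2 / flux_hz

# Mean fluxes quoted in the text, with the 100 ms integration window.
for label, flux in (("(iCy3)2 dimer", 7500.0), ("iCy3 monomer", 6000.0)):
    print(label, round(shot_noise_snr(flux, 0.100), 1))  # 27.4 and 24.5
```

For example, under the same shot-noise assumption, `window_for_snr(7500.0, 10.0)` gives roughly 13 ms, in line with the statement that a window of about 10 ms is needed to approach S/N ~10 at these flux levels.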
We note that most of the phase jumps appear to traverse a full 360° revolution, which we attribute to mechanical room vibrations.

Figure 2.11 Example PS-SMF signal trajectories using the integration window 𝑻𝒘 = 100 ms for the (A – C) +1 (iCy3)2 dimer- and (D – F) +1 iCy3 monomer-labeled ss-dsDNA fork constructs. Experiments were performed using 100 mM NaCl and 6 mM MgCl2. For the +1 (iCy3)2 dimer-labeled construct the photon data stream was recorded with a mean signal flux 𝒇̅ ≈ 7,500 s⁻¹, corresponding to S/N ≈ 27.4. For the +1 iCy3 monomer-labeled construct the mean signal flux was 𝒇̅ ≈ 6,000 s⁻¹, corresponding to S/N ≈ 24.5. (A, D) The instantaneous signal flux, 𝒇(𝒕) = 𝑵(𝒕)⁄𝑻𝒘, is shown in blue. (B, E) The instantaneous signal phase, 𝝍, is shown in red, ‘unwrapped’ with its mean value set equal to zero, 𝝍̅ = 0. (C, F) The signal visibility, 𝒗(𝒕), is plotted in green. Horizontal dashed lines are shown as a guide to the eye to indicate the presence of multiple discrete conformational states for the (iCy3)2 dimer-labeled ss-dsDNA construct (panel C), while only a single discrete state is observed for the iCy3 monomer-labeled ss-dsDNA construct (panel F).

From Fig. 2.11 C, we see that the visibility trajectory for the +1 (iCy3)2 dimer-labeled ss-dsDNA construct undergoes discontinuous transitions between a small number of discrete values within the range 0 < 𝑣 < 0.4 (dashed lines are guides to the eye). In this case, the visibility is a direct measure of the internal conformational changes of the (iCy3)2 dimer probe, as discussed in Sect. 3.4 [see Eq. (11)]. In contrast, the visibility trajectory for the +1 iCy3 monomer-labeled ss-dsDNA construct shown in Fig. 2.11 F exhibits relatively uniform fluctuations about a single discrete value, 𝑣 ~0.08. Here, the visibility measures only the mean projection of the iCy3 monomer’s ‘polarization ellipse’ within the transverse plane of the polarized laser beam (Fig. 2.2 C), which remains relatively constant over the duration of the trajectory. For example, were the EDTM of the iCy3 monomer oriented orthogonally to the laser polarization, the visibility would be zero. In principle, the visibility can assume values within the continuous range -1 < 𝑣 < +1, with sign inversion corresponding to a 180° phase shift. Nevertheless, the PS-SMF experiment cannot distinguish between negative and positive values of the visibility, so in practice we measure the absolute value |𝑣|.

From the PS-SMF data streams, we constructed probability distribution functions (PDFs) for different values of the integration time window, 𝑇𝑤. In Fig. 2.12, we compare PDFs of the flux, 𝑓 (left column), visibility, 𝑣 (middle), and phase, 𝜓 (right), for the +1 (iCy3)2 dimer-labeled ss-dsDNA construct. Additional PDFs for the +1 (iCy3)2 dimer- and the +1 iCy3 monomer-labeled ss-dsDNA constructs using 𝑇𝑤 = 500 ms are shown in Figs. 2.7 and 2.8, respectively.

Figure 2.12 PDFs of the flux (left column), visibility (middle) and phase (right) for the +1 (iCy3)2 dimer-labeled ss-dsDNA fork construct. PDFs were constructed from the raw photon data streams (see, e.g., Fig. 2.10). The mean signal flux is 𝒇̅ = 8,000 s⁻¹. Comparisons are shown varying the integration window (A) 𝑻𝒘 = 10 ms and (B) 𝑻𝒘 = 100 ms. The corresponding S/N = √𝑵̅ = √(𝒇̅𝑻𝒘) = 8.9 and 28, respectively.

For all the samples that we studied, the visibility PDFs are asymmetrically distributed and bounded within the range 0 < 𝑣 < 0.5, with mean value 𝑣̅ ≈ 0.1. The widths of the visibility PDFs narrow with increasing 𝑇𝑤, revealing the presence of distinct features. The effect of increasing 𝑇𝑤 is to average over random noise associated with the sparsely detected (photon counting) signal.
However, choosing too large an integration period can lead to additional, undesirable narrowing of the PDFs by averaging over the state-to-state interconversion events that we wish to monitor and study [78]. It is therefore important to choose a value of 𝑇𝑤 that is large enough to ensure that S/N ≳ 10, but small enough to retain the time resolution necessary to monitor the relevant kinetics. Since the mean flux in our experiments is typically 𝑓̅ = 8,000 s⁻¹, a suitable integration period is 𝑇𝑤 = 10 ms, which corresponds to S/N = √(𝑓̅𝑇𝑤) ≅ 8.9. However, we find 𝑇𝑤 = 100 ms to be useful for display purposes. Like the visibility PDFs, the phase PDFs narrow with increasing 𝑇𝑤, indicating a singular value of the phase that defines the visibility over the integration time window (Fig. 2.12, right column). However, the flux PDFs do not converge to well-behaved distributions, but rather exhibit increasingly complex structure with increasing 𝑇𝑤 (Fig. 2.12, left column). While the instantaneous flux depends on multiple environmental factors that influence chromophore absorbance and fluorescence efficiency (e.g., probe orientation and excited electronic state dynamics), the normalized visibility depends primarily on the relative strengths and orientations of the probe EDTMs.

In Fig. 2.13, we compare the visibility PDF of the +1 (iCy3)2 dimer-labeled construct to that of the +1 iCy3 monomer-labeled construct. For the +1 monomer, the PDF exhibits a single ‘low visibility’ feature in the range 0 ≲ 𝑣 ≲ 0.1 (see Figs. 2.13 B and D), which results from the projection of the monomer EDTM onto the plane of the rotating laser polarization. This contrasts with the +1 dimer-labeled construct, which exhibits, in addition to a dominant low visibility feature, a sparsely populated ‘high visibility’ feature in the range 0.1 ≲ 𝑣 ≲ 0.4 (see Figs. 2.13 A and C).
The PDF for the dimer appears to resolve into multiple underlying features when the integration window is increased to 𝑇𝑤 = 500 ms (see Figs. 2.7 and 2.8). The broadly distributed features that we observe for the (iCy3)2 dimer-labeled ss-dsDNA constructs indicate the presence of both stable and thermally activated local conformations of the dimer probes. Transitions between low and high visibility macrostates are evident in the trajectories for the (iCy3)2 dimer-labeled ss-dsDNA construct, as shown in Fig. 2.11 C, but not for the iCy3 monomer-labeled ss-dsDNA construct (Fig. 2.11 F). The above observations suggest that the (iCy3)2 dimer probes interconvert between a small number of local conformations, which likely depend on the relative stabilities and dynamics of the bases and sugar-phosphate backbones immediately adjacent to the (iCy3)2 dimer probes. Furthermore, these results agree with recent ensemble studies of (iCy3)2 dimer-labeled ss-dsDNA constructs, which concluded that only a small number of local conformational macrostates of the (iCy3)2 dimer are populated at room temperature and physiological salt conditions due to steric restrictions imposed by the surrounding nucleic acid bases and sugar-phosphate backbones [24].

Figure 2.13 Probability distribution functions (PDFs) of the visibility for (A, C) the +1 (iCy3)2 dimer-labeled ss-dsDNA construct, and (B, D) the +1 iCy3 monomer-labeled ss-dsDNA construct. (A, B) 𝑻𝒘 = 10 ms. (C, D) 𝑻𝒘 = 100 ms.

To determine whether a unique correspondence exists between the instantaneous flux and visibility, we examined the bivariate PDFs for the +1 (iCy3)2 dimer- and the +1 iCy3 monomer-labeled ss-dsDNA constructs. In Fig. 2.14, we plot the bivariate distributions for these constructs as two-dimensional contour diagrams. The bivariate PDF for the +1 (iCy3)2 dimer-labeled ss-dsDNA construct (Fig. 2.14 A) shows that low visibility conformations (with 0 ≲ 𝑣 ≲ 0.1) correspond to a broad range of flux values, while high visibility states (with 0.1 ≲ 𝑣 ≲ 0.4) correspond to relatively low flux values. Similarly, the low visibility states of the iCy3 monomer-labeled ss-dsDNA construct (Fig. 2.14 B) correspond to a broad range of flux values. Thus, there does not appear to be a unique correspondence between flux and visibility values, which suggests that the visibility signal alone, and not the flux, contains information that can be interpreted in terms of the local conformational fluctuations of the iCy3 monomer and (iCy3)2 dimer probes. This conclusion is also supported by the earlier discussion in section 3.8, which compared the visibility and flux on the basis of the PDFs and TCFs of different control samples.

Figure 2.14 Joint PDFs of the flux and visibility signals for (A) the +1 (iCy3)2 dimer-labeled ss-dsDNA construct and (B) the +1 iCy3 monomer-labeled ss-dsDNA construct using 𝑇𝑤 = 100 ms.

We next examined how the PS-SMF signals report on the dynamics of the site-specifically labeled positions within ss-dsDNA fork constructs. In Fig. 2.15, we plot separately the two-point TCFs of the visibility and the flux for the +1 iCy3 monomer- and the +1 (iCy3)2 dimer-labeled ss-dsDNA constructs. We present these data using 𝑇𝑤 = 250 μs alongside model fits to multi-exponential functions, ∑𝑖 𝛼𝑖 exp(−𝜏⁄𝑡𝑖), where the number of decay components is 4 for the dimer and 3 for the monomer. The values that we obtained for the fitting parameters are given in Table 2. Although the time dependencies of the visibility and flux TCFs are qualitatively similar, we found that the decay components of the flux TCFs are generally faster than those of the visibility for all the ss-dsDNA constructs that we studied.
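The multi-exponential model, ∑𝑖 𝛼𝑖 exp(−𝜏⁄𝑡𝑖), can be written out explicitly as a short Python sketch. The function names are hypothetical, and the example parameters are the visibility TCF values listed for the +1 (iCy3)2 dimer in Table 2, converted to seconds. Because the text does not state whether the tabulated amplitudes are quoted before or after normalization to 𝜏 = 250 μs, the sketch applies that normalization explicitly as an assumption.

```python
import math

def multi_exp(tau, amplitudes, times):
    """Evaluate sum_i alpha_i * exp(-tau / t_i); tau and times share the same units."""
    return sum(a * math.exp(-tau / t) for a, t in zip(amplitudes, times))

def normalized_multi_exp(tau, amplitudes, times, tau_ref=250e-6):
    """Model curve normalized to its value at tau_ref (250 microseconds by default)."""
    return multi_exp(tau, amplitudes, times) / multi_exp(tau_ref, amplitudes, times)

# Visibility TCF parameters for the +1 (iCy3)2 dimer as listed in Table 2 (times in seconds).
DIMER_AMPLITUDES = (0.8, 0.18, 0.09, 0.31)
DIMER_TIMES_S = (275e-6, 2.9e-3, 43.8e-3, 2.5)

# Model curve sampled at a few logarithmically spaced delays from 250 us to 2.5 s.
delays = [250e-6 * 10 ** k for k in range(5)]
curve = [normalized_multi_exp(tau, DIMER_AMPLITUDES, DIMER_TIMES_S) for tau in delays]
```

Sampling on a logarithmic grid, as in the last two lines, mirrors how decays spanning four decades in time are usually displayed.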
Although the dynamic processes that give rise to fluctuations of the flux and visibility signals are related to one another, the differing decay times show that they are not identical.

From Figs. 2.15 A and 2.15 B, we see that the decay of the +1 iCy3 monomer-labeled ss-dsDNA construct is significantly faster than that of the +1 (iCy3)2 dimer-labeled construct. The decay of the monomer-labeled construct is dominated by two relatively fast components (𝑡1 ≲ 0.5 ms, 𝑡2 ≲ 4 ms), and exhibits a slowly decaying, low-amplitude ‘baseline’ with 𝑡3 ≲ 1.2 s and amplitude 𝐴3 ~0.03. In comparison, the (iCy3)2 dimer-labeled construct exhibits much slower relaxation behavior with four well-separated decay components: 𝑡1 ~0.28 ms, 𝑡2 ~2.9 ms, 𝑡3 ~44 ms and 𝑡4 ~2.5 s. We note that the slowest time scale component (on the order of seconds) is present in all our data sets. In our control measurements and previous studies [14] we attributed this slow, low-amplitude process to room vibrations.

Figure 2.15 Two-point time correlation functions (TCFs) of (A, C) the visibility, 〈𝜹𝒗(𝝉)𝜹𝒗(𝟎)〉, and (B, D) the flux, 〈𝜹𝒇(𝝉)𝜹𝒇(𝟎)〉, for the +1 iCy3 monomer-labeled ss-dsDNA construct and the +1 (iCy3)2 dimer-labeled ss-dsDNA construct. (A) Visibility and (B) flux TCFs, respectively, using the integration time window 𝑻𝒘 = 250 μs. Dashed curves are multi-exponential fits to the data with fitting parameters given in Table 2. (C) Visibility and (D) flux TCFs, respectively, of the (iCy3)2 dimer-labeled ss-dsDNA construct as a function of 𝑻𝒘. Note that all TCFs and fits are shown normalized to the point 𝝉 = 250 μs.

Our finding that the two-point TCF for the +1 iCy3 monomer-labeled ss-dsDNA construct exhibits a relatively fast decay is consistent with ensemble spectroscopic measurements [24]–[26], from which we concluded that the local environment immediately surrounding the iCy3 monomer probe is structurally disordered.
We speculated that the disorder is due to misalignment of complementary Watson-Crick base pairing, which is induced by the presence of the iCy3 monomer in one strand and a thymidine spacer at the opposite position in the complementary strand. In contrast, the relatively slow decay of the +1 (iCy3)2 dimer-labeled ss-dsDNA construct suggests that the local environment immediately surrounding the (iCy3)2 dimer probes is relatively well ordered at room temperature and under physiological buffer salt conditions, which is also in agreement with the results of our prior studies [24].

Table 2 Multi-exponential fit parameters for the flux and visibility 2-point TCFs of the iCy3 monomer- and (iCy3)2 dimer-labeled ss-dsDNA constructs. Note that the TCFs and fits are normalized to the point 𝜏 = 250 μs.

| TCF | DNA construct | 𝐴1 | 𝑡1 (μs) | 𝐴2 | 𝑡2 (ms) | 𝐴3 | 𝑡3 (ms) | 𝐴4 | 𝑡4 (s) |
|---|---|---|---|---|---|---|---|---|---|
| visibility | +1 (iCy3)2 dimer | 0.8 | 275 | 0.18 | 2.9 | 0.09 | 43.8 | 0.31 | 2.5 |
| visibility | +1 iCy3 monomer | 0.98 | 493 | 0.32 | 4.2 | 0.03 | 1,200 | − | − |
| flux | +1 (iCy3)2 dimer | 0.92 | 265 | 0.17 | 3.9 | 0.09 | 56.7 | 0.32 | 2.7 |
| flux | +1 iCy3 monomer | 0.99 | 373 | 0.45 | 3.0 | 0.03 | 1,300 | − | − |

We next considered the effects of varying the integration window, 𝑇𝑤, on the TCFs of the +1 (iCy3)2 dimer-labeled ss-dsDNA construct (see Figs. 2.15 C and 2.15 D). On the shortest time scale (0.25 ms < 𝜏 < 10 ms), the flux and visibility TCFs exhibit similar decays for all values of 𝑇𝑤. However, for larger values of 𝑇𝑤 (= 10 ms, 100 ms), the decay of the visibility TCFs at longer times (10 ms < 𝜏 < 100 ms) is faster than for small values of 𝑇𝑤 (= 250 μs, 1 ms; see Fig. 2.15 C). In contrast, the long-time decay behavior of the flux TCFs does not vary with 𝑇𝑤 (Fig. 2.15 D).
Thus, the conformational dynamics of the (iCy3)2 dimer-labeled ss-dsDNA constructs on time scales shorter than 10 ms appear to be reflected by fluctuations of both the flux and visibility signals, while the dynamics on time scales longer than 10 ms are most accurately reflected by fluctuations of the visibility alone. In the calculations that follow, we constructed two-point TCFs of the visibility by stitching together the decays using 𝑇𝑤 = 250 μs over the range 250 μs – 25 ms, and using 𝑇𝑤 = 10 ms over the range 25 ms – 2.5 s.

Section 4.2. Studies of local DNA breathing of +1, -1 and -2 (iCy3)2 dimer-labeled ss-dsDNA fork constructs.

Having established protocols for constructing experimentally derived functions of the visibility (i.e., the PDFs and the two-point and three-point TCFs), we next examined the sensitivity of these functions to varying the (iCy3)2 dimer probe position relative to the ss-dsDNA fork junction. In Figs. 2.16 A – C, we present results for the +1, -1 and -2 (iCy3)2 dimer-labeled ss-dsDNA constructs, which were obtained at room temperature (23 °C) and physiological buffer salt conditions ([NaCl] = 100 mM, [MgCl2] = 6 mM). The left column shows the visibility PDFs, 𝑃(𝑣), the middle column shows the two-point TCFs, 𝐶̅(2)(𝜏), which are overlaid with multi-exponential fits (parameters listed in Table 3), and the right column shows contour plots of the three-point TCFs, 𝐶̅(3)(𝜏1, 𝜏2). Like the PDF of the +1 (iCy3)2 dimer-labeled ss-dsDNA construct (discussed above and shown in Fig. 2.16 A), the PDFs of the -1 and -2 dimer-labeled constructs exhibit a major ‘low visibility’ feature in the region 0 ≲ 𝑣 ≲ 0.1, and minor ‘high visibility’ features in the region 0.1 ≲ 𝑣 ≲ 0.4. However, the relative weights of the ‘high visibility’ features depend on the (iCy3)2 dimer probe labeling position relative to the ss-dsDNA fork junction, with ‘high visibility’ features notably less prominent for the -1 construct than for the +1 and -2 constructs.
Furthermore, the PDF of the -2 construct is significantly different from that of the +1 construct, as expected given the differential stabilities of stacked bases within the dsDNA versus ssDNA regions spanning the ss-dsDNA fork junction.

In the second and third columns of Fig. 2.16, respectively, we present the two-point and three-point TCFs, which characterize the conformational dynamics of the (iCy3)2 dimer-labeled ss-dsDNA constructs as a function of probe labeling position. The two-point TCFs are shown overlaid with multi-exponential fits (dashed red curves) whose parameters are listed in Table 3. For each of the (iCy3)2 dimer-labeled ss-dsDNA constructs, the four decay components are well separated in time: 𝑡1 ~0.3 – 0.6 ms, 𝑡2 ~2 – 4 ms, 𝑡3 ~50 – 200 ms and 𝑡4 ~1 – 2 s. As mentioned previously, the seconds-long decay 𝑡4 is due to mechanical room vibrations. We thus find that there are three relevant, well-separated decay components present in all the (iCy3)2 dimer-labeled ss-dsDNA constructs that we studied under physiological salt conditions and for most elevated and reduced salt conditions (see Table 3).

Figure 2.16 Probability distribution functions (PDFs, left column), two-point time-correlation functions (TCFs, middle column) and three-point TCFs (right column) of the PS-SMF visibility for the (A) +1, (B) -1 and (C) -2 (iCy3)2 dimer-labeled ss-dsDNA fork constructs. The integration windows used for the PDFs and three-point TCFs are indicated in the insets. The two-point TCFs were stitched together over the range 250 μs – 25 ms using 𝑇𝑤 = 250 μs, and from 25 ms – 2.5 s using 𝑇𝑤 = 10 ms. The two-point TCFs are shown overlaid with multi-exponential fits (dashed red curves) whose parameters are listed in Table 3. The three-point TCFs are plotted as two-dimensional contour diagrams. Experiments were performed at room temperature (23 °C) and physiological buffer salt conditions ([NaCl] = 100 mM, [MgCl2] = 6 mM).

From the two-point TCFs of Fig. 2.16, we see that the +1 construct exhibits the slowest overall decay (Fig. 2.16 A). As the position of the (iCy3)2 dimer is varied across the ss-dsDNA junction from +1 to -1 (see Fig. 2.16 B), and again from -1 to -2 (Fig. 2.16 C), the relaxation dynamics become faster. These observations are consistent with the notion that the conformational motions of the (iCy3)2 dimer probe depend on the stabilities and dynamics of the DNA bases and sugar-phosphate backbones immediately adjacent to the probe. The Watson-Crick (WC) base pairs within the duplex side of the ss-dsDNA junction are largely stacked, while stacking interactions between bases within the ssDNA side of the junction are much weaker due to the lack of complementary base pairing. We note that at short times (10 ms < 𝜏1, 𝜏2 < 100 ms) the three-point TCFs, 𝐶̅(3), exhibit positive amplitude for the +1 and -1 constructs, and negative amplitude for the -2 construct. These observations suggest that the dominant kinetic pathways that govern conformational transitions for the +1 and -1 constructs, which undergo exchange dynamics relatively slowly, are distinct from the pathways that govern the transitions of the -2 construct, which undergoes relatively fast exchange dynamics.

Table 3 Multi-exponential fit parameters of the visibility 2-point TCFs for the (iCy3)2 dimer-labeled ss-dsDNA fork constructs studied in this work. Note that the TCFs and fits are normalized to the point 𝜏 = 250 μs.
(iCy3)2 dimer position   [NaCl] (mM)   [MgCl2] (mM)   𝛼1     𝑡1 (μs)   𝛼2     𝑡2 (ms)   𝛼3     𝑡3 (ms)   𝛼4     𝑡4 (s)   𝛼5     𝑡5 (s)
+1                       300           6              0.90   250       0.24   2.2       0.18   18        0.14   0.15     0.20   2.0
+1                       100           6              0.8    300       0.18   3.7       0.20   170       0.30   3.1      NA     NA
+1                       20            6              1.0    270       0.21   2.8       0.13   90        0.23   1.5      NA     NA
+1                       20            0              1.0    290       0.26   3.7       0.19   82        0.12   1.1      NA     NA
-1                       100           6              0.62   600       0.27   2.2       0.13   53        0.14   1.4      NA     NA
-2                       300           6              1.0    360       0.13   5.1       0.12   200       0.17   2.5      NA     NA
-2                       100           6              1.0    390       0.19   3.0       0.12   93        0.08   2.0      NA     NA
-2                       20            6              1.0    310       0.18   5.9       0.13   100       0.21   1.6      NA     NA
-2                       20            0              1.0    360       0.20   2.9       0.10   22.0      0.07   1.2      0.12   4.6

Section 4.3. Salt concentration-dependent breathing of +1 and -2 (iCy3)2 dimer-labeled ss-dsDNA fork constructs.
We next consider the effects of varying buffer salt concentration on the equilibrium properties and dynamics of the (iCy3)2 dimer-labeled ss-dsDNA fork constructs. We examined the effects of varying salt concentration on the slowest (+1 position) and the fastest (-2 position) of the ss-dsDNA constructs (see Fig. 2.17 and Fig. 2.18, respectively). In the case of the +1 construct the dimer probe is positioned within the dsDNA region immediately adjacent to the ss-dsDNA fork junction, while the -2 construct has the dimer probe positioned within the ssDNA region immediately adjacent to the junction.

Figure 2.17 Salt concentration-dependent PDFs (left column), two-point TCFs (middle column) and three-point TCFs (right column) of the signal visibility for the +1 (iCy3)2 dimer-labeled ss-dsDNA construct. Experiments were performed at room temperature (23 °C) and using (A) [NaCl] = 300 mM, [MgCl2] = 6 mM, (B) [NaCl] = 100 mM, [MgCl2] = 6 mM, (C) [NaCl] = 20 mM, [MgCl2] = 6 mM, and (D) [NaCl] = 20 mM, [MgCl2] = 0 mM.

Figure 2.18 Salt concentration-dependent PDFs (left column), two-point TCFs (middle column) and three-point TCFs (right column) of the signal visibility for the -2 (iCy3)2 dimer-labeled ss-dsDNA construct. Experiments were performed at room temperature (23 °C) and using (A) [NaCl] = 100 mM, [MgCl2] = 6 mM, (B) [NaCl] = 20 mM, [MgCl2] = 6 mM, (C) [NaCl] = 20 mM, [MgCl2] = 0 mM, and (D) [NaCl] = 300 mM, [MgCl2] = 6 mM.

At physiological salt conditions ([NaCl] = 100 mM, [MgCl2] = 6 mM) the DNA duplex is thought to adopt primarily the Watson-Crick right-handed conformation, although thermally excited conformational macrostates may be populated transiently. The marginal stability of the DNA duplex results from a near balance between opposing thermodynamic forces (i.e., enthalpy-entropy compensation) [72].
For example, the forces that favor base-stacking are due primarily to the positive solvent entropy change upon expulsion of the water molecules that are otherwise confined between flat base surfaces (i.e., the hydrophobic effect). In contrast, the forces that oppose base stacking are enthalpic. These are due primarily to the strain of the sugar-phosphate backbones and electrostatic repulsion between negatively charged phosphate groups, which are 70% screened by the diffuse ‘ion cloud’ of monovalent sodium ions within the condensation layer immediately surrounding the negatively charged DNA-water interface. Decreasing salt concentration (relative to physiological conditions) destabilizes the DNA duplex by reducing the electrostatic screening between negatively charged phosphate groups [72], [73], [75]. On the other hand, increasing salt concentration above physiological conditions can also destabilize the DNA duplex. The origins of duplex destabilization at moderately elevated salt concentrations (> 100 mM NaCl) are not fully understood, although this may be partly due to a reduction of the water entropy associated with the formation of structured solvation shells around increasing concentrations of ions and counter-ions in bulk solution. At salt concentrations much higher than physiological (~1 M), the duplex may be either stabilized or destabilized through Hofmeister effects [80], [81]. Few details are currently available about how salt concentration affects the equilibrium distribution of conformational macrostates at and near ss-dsDNA fork junctions, or the transition barriers that mediate their interconversion.

In Fig. 2.17, we present the results of our salt concentration-dependent studies for the +1 (iCy3)2 dimer-labeled ss-dsDNA construct. These results are organized, from top row to bottom, in order of decreasing salt concentration.
We note that for all the salt concentrations that we studied, the PDFs exhibit a pattern of low and high visibility features, and most two-point and three-point TCFs exhibit three well-separated decay components, like those discussed in previous sections. An exception is the TCF corresponding to the highest salt condition shown in Fig. 2.17 A, which exhibits four well-separated decay components (see Table 3). As described in previous work [13], [14], the presence of three well-separated decay components suggests that the simplest kinetic network model that can be used to simulate these systems involves four macrostates in thermal equilibrium. At the highest salt concentration that we studied ([NaCl] = 300 mM, [MgCl2] = 6 mM, Fig. 2.17 A), the PDF of the +1 (iCy3)2 dimer-labeled ss-dsDNA construct exhibits increased population of ‘high visibility’ macrostates within the range 0.15 ≲ 𝑣 ≲ 0.45 in comparison to physiological salt conditions (see Fig. 2.17 B). Moreover, the TCFs decay significantly faster, suggesting that the transition barriers for state-to-state interconversion are reduced. When the salt concentration is decreased relative to physiological conditions to [NaCl] = 20 mM, [MgCl2] = 6 mM (Fig. 2.17 C), the PDF exhibits a slight narrowing of the distribution of ‘low visibility’ macrostates within the range 0 ≲ 𝑣 ≲ 0.15 and a reduced population of ‘high visibility’ macrostates within the range 0.25 ≲ 𝑣 ≲ 0.45. Furthermore, the TCFs exhibit faster dynamics at these low salt concentrations, suggesting that the most heavily weighted macrostates exhibit shorter lifetimes in comparison to physiological conditions. Finally, elimination of divalent magnesium at the lowest sodium concentration ([NaCl] = 20 mM, [MgCl2] = 0 mM, Fig. 2.17 D) leads to the reappearance in the PDF of ‘high visibility’ macrostates in the range 0.1 ≲ 𝑣 ≲ 0.3, albeit with a Boltzmann-weighted distribution quite different from that observed at physiological conditions.
The dynamics exhibited by the TCFs are fast, suggesting that the transition barriers of the most heavily weighted macrostates are reduced relative to physiological salt conditions. In summary, the effects of raising and lowering salt concentration relative to physiological conditions for the +1 (iCy3)2 dimer-labeled ss-dsDNA construct are to shift the equilibrium balance of ‘high’ and ‘low visibility’ conformational macrostates to favor ‘high visibility’ states and to lower the free energy of activation for state-to-state interconversion.

We next examined the effects of varying salt concentration on the -2 (iCy3)2 dimer-labeled ss-dsDNA construct (see Fig. 2.18). The TCF of the -2 DNA construct at physiological salt concentrations exhibits three well-separated decay components, which are significantly faster than for the +1 construct. Moreover, the PDF of the -2 construct at physiological salt concentrations exhibits significant population of ‘high visibility’ states within the range 0.15 ≲ 𝑣 ≲ 0.35, indicating that the bases and sugar-phosphate backbones immediately adjacent to the dimer probes at the -2 position adopt a different distribution of conformational macrostates than that of the +1 (iCy3)2 dimer-labeled ss-dsDNA construct. The relatively fast dynamics of the -2 construct indicate that the transition barriers of the most heavily weighted macrostates are relatively low compared to the +1 construct. Interestingly, the effects of increasing and decreasing salt concentration on the -2 dimer-labeled DNA construct relative to physiological conditions are opposite to those we observed for the +1 construct. At elevated monovalent salt concentration ([NaCl] = 300 mM, [MgCl2] = 6 mM, Fig. 2.18 A), the PDF exhibits a narrowing of the ‘low visibility’ states within the range 0 ≲ 𝑣 ≲ 0.15 and reduced population of ‘high visibility’ states.
Moreover, the relaxation dynamics are significantly slowed, suggesting that the activation barriers for state-to-state interconversion are elevated. A similar effect occurs for reduced monovalent salt concentration ([NaCl] = 20 mM, [MgCl2] = 6 mM, Fig. 2.18 C). Elimination of divalent magnesium at the reduced sodium concentration ([NaCl] = 20 mM, [MgCl2] = 0 mM, Fig. 2.18 D) leads to a slight broadening of the PDF and faster relaxation dynamics, like those observed at physiological salt conditions. In addition, at this low salt concentration the TCFs exhibit four well-separated decay components (see Table 3). We note that the primary negative amplitude feature of the three-point TCF, seen at physiological salt concentrations, becomes positive at elevated and reduced salt concentrations under conditions in which the dynamics have slowed. These observations indicate that the effects of varying salt concentration relative to physiological conditions for the -2 (iCy3)2 dimer-labeled ss-dsDNA construct are to shift the distribution of conformational macrostates to favor ‘low visibility’ conformational macrostates and to raise the free energy of activation for state-to-state interconversion.

5. CONCLUSION

In this work, we have introduced a novel experimental method, called polarization-sweep single-molecule fluorescence (PS-SMF) microscopy, to study site-specific DNA ‘breathing’ fluctuations at and near ss-dsDNA fork junctions. PS-SMF uses exciton-coupled (iCy3)2 dimer-labeled ss-dsDNA constructs to directly monitor the fluctuations of the dimer probes, which depend sensitively on the local conformations of the DNA bases and sugar-phosphate backbones immediately adjacent to the dimer probes at specific site positions. Our results (summarized in Fig. 2.16 – Fig.
2.18) indicate that the bases and sugar-phosphate backbones sensed by the (iCy3)2 dimer probes can adopt four quasi-stable local conformations whose relative stabilities and transition state barriers depend on probe labeling position relative to the ss-dsDNA fork junction. Under physiological buffer salt conditions ([NaCl] = 100 mM, [MgCl2] = 6 mM), the +1 (iCy3)2 dimer-labeled ss-dsDNA construct exhibits a relatively broad distribution of local base and backbone conformations within the duplex region of the ss-dsDNA junction (Fig. 2.16). The conformational macrostates at this position undergo the slowest dynamics of all the systems that we investigated, indicating that the most thermodynamically stable states within the duplex side of the fork junction are also mechanically stable with relatively long population lifetimes. In comparison, the -2 dimer-labeled ss-dsDNA construct exhibits significantly faster dynamics and a distinctly different distribution of conformational macrostates, indicating that the thermodynamically favored conformations of bases and sugar-phosphate backbones sensed by the dimer probes within the ssDNA region of the fork junction are mechanically unstable and distinct from the duplex.

The energetics of the (iCy3)2 dimer-labeled ss-dsDNA constructs can be systematically varied by increasing or decreasing salt concentrations relative to physiological conditions. The position-dependent distribution of conformational macrostates is affected by salt concentration in complementary ways. For the +1 construct, increasing or decreasing salt concentration relative to physiological conditions shifts the equilibrium distribution to favor macrostates that are mechanically unstable (with relatively low transition barriers, see Fig. 2.17). For the -2 construct, varying salt concentration shifts the equilibrium distribution of macrostates to favor those that are mechanically stable (with relatively high transition barriers, see Fig. 2.18). In Fig.
2.19, we illustrate a hypothetical mechanism to account for these observations. In the case of the +1 dimer-labeled ss-dsDNA construct, the local base and backbone conformations adjacent to the dimer probes are within the DNA duplex region of the fork junction (Fig. 2.19 A). At physiological salt conditions, the majority of conformational macrostates sensed by the dimer probe are dominated by WC base stacking, and these conformations are mechanically stable. Increasing or decreasing salt concentration leads to disruption of the local WC conformations within the duplex region and the reduction of mechanical stability (Fig. 2.19 B). In the case of the -2 dimer-labeled ss-dsDNA construct, the local base and backbone conformations adjacent to the dimer probes are within the ssDNA region (Fig. 2.19 C). At physiological salt conditions, the majority of conformational macrostates sensed by the dimer in the ssDNA region are dominated by unstacked base conformations, which are distinct from the duplex region and mechanically unstable. Further evidence for the presence of unstacked and dynamically labile base conformations within the ssDNA region of oligo(dT)15 tails of ss-dsDNA fork constructs was obtained in microsecond-resolved single-molecule FRET experiments [14] and corroborated by small-angle x-ray scattering experiments [82]. At elevated or reduced salt concentration, the distribution of conformational macrostates sensed by the dimer probes within the ssDNA region of the fork junction is shifted to mechanically stable base-stacked conformations (Fig. 2.19 D). The above hypothetical mechanism is consistent with the findings of previous ensemble studies of (iCy3)2 dimer-labeled ss-dsDNA fork constructs.

PS-SMF data, such as those presented in the current work, can be further analyzed using a kinetic network model [13], [56] to provide quantitative information about the relative stabilities of the various macrostates and their free energies of activation.
The structural and kinetic properties of the ss-dsDNA fork junction revealed by these experiments can provide mechanistic insights about how replication and repair proteins that assemble at these sites carry out their biological functions. The analysis of these data with a kinetic network model is detailed in Chapters 3 and 4. The PS-SMF method monitors the signal visibility, 𝑣, from exciton-coupled (iCy3)2 dimer probes that are rigidly inserted within the sugar-phosphate backbones of the ss-dsDNA fork construct. Unlike single-molecule optical experiments that solely monitor fluorescence intensity, the visibility observable is a reduced quantity that is directly related to the alignment of the monomeric subunits of the dimer probe. Therefore, PS-SMF is a useful means to measure local structure at the single-molecule level on the scale of a few Angstroms, which cannot be studied using other single-molecule optical approaches.

Figure 2.19 Schematic diagram illustrating the hypothesized mechanism of salt-induced instability of (iCy3)2 dimer-labeled ss-dsDNA fork constructs labeled at (A, B) the +1 position and (C, D) the -2 position. Left column panels (A, C) represent the physiological salt condition and right column panels (B, D) represent elevated or reduced salt conditions.

The PS-SMF experiments presented here focus on local DNA conformations at and near model ss-dsDNA fork junctions, which are relevant to the non-base-sequence specific assembly of replication and repair proteins at these sites. However, the PS-SMF approach could be applied advantageously to study the local conformations and conformational fluctuations relevant to base-sequence specific recognition of proteins that function to regulate genes, in addition to the thermodynamic and kinetic effects of epigenetic modification of bases.

CHAPTER 3 : SIMULATING A KINETIC NETWORK TO OBTAIN AN EXPERIMENTALLY DERIVED FREE ENERGY LANDSCAPE

1.
OVERVIEW

This Chapter contains work from the forthcoming article “Multi-state kinetic network modeling of time resolved single-molecule data using a generalized master equation approach.” authored by Jack Maurer, Claire Albrecht, Andrew H. Marcus and Peter von Hippel. This work was funded by the National Institutes of Health (NIGMS Grant GM-215981 to P.H.v.H. and A.H.M.). Andrew H. Marcus was the principal investigator for this work. The contents of the article have been expanded upon here to serve as a comprehensive overview and introduction to the development of computational methods which were used for the analysis of all single-molecule data presented in this thesis. Additionally, the computational methods discussed here will serve as the analytical basis for gp32 FRET studies and future PS-SMF studies.

2. INTRODUCTION

The methodology described in Chapter 2, sections 3.6 and 3.7, served to introduce the statistical framework used to analyze experimental data, generate the relevant data surfaces, and then form an interpretation of the observed dynamics using kinetic networks and a generalized master equation. This Chapter expands on those sections and introduces the concepts of a memoryless Markov chain and the rate matrix, with its associated eigenvectors, as the primary tools for simulating a kinetic network of interconverting macrostates. This methodology can retrieve an experimentally measured free energy landscape with only a handful of statistical data surfaces at its disposal. The use of a kinetic network to model this free energy surface makes the underlying assumption that our experimental observable reports only on thermodynamic macrostates, with the many microstates within those energetic minima being obscured by either the time resolution of the instrument or the sensitivity of the measurement itself to small fluctuations in the signal observable.
The final portion of this Chapter focuses on the implementation of our kinetic network simulations in a highly parallel optimization pipeline. This computational approach alleviates the inherent bottleneck on optimization due to the sheer number of possible networks. Our approach also allows the exploration of the parameter space to be carried out in two stages: an initially coarse search of the parameter space with a custom-built genetic algorithm, followed by refinement of the initially optimized values with a more conventional steepest descent method. The nuances, strengths and weaknesses of this approach are discussed.

3. COMPUTATIONAL METHODS

Section 3.1 Simulating a kinetic network using Markov chains

The kinetic network model proposed here is a Markov chain of jumps between macrostates. The Markov nature of the chain implies a memoryless system: there is no memory of previous states, only the current state and its probability of jumping to another [70]. We employ this model (Eqns. 31 and 32) to simulate our experimental TCFs (Eqns. 22 and 29).

\[ \bar{C}^{(2)}(\tau) = \sum_{i,j=1}^{M} \delta A_j \, p_{ij}(\tau) \, \delta A_i \, p_i^{eq} \tag{31} \]

\[ \bar{C}^{(3)}(\tau_1, \tau_2) = \sum_{i,j,k=1}^{M} \delta A_k \, p_{jk}(\tau_2) \, \delta A_j \, p_{ij}(\tau_1) \, \delta A_i \, p_i^{eq} \tag{32} \]

where $\delta A_i$ is the fluctuation in the $i$th observable, as defined in the previous section, $p_i^{eq}$ is the probability of state-$i$ in the long-time limit, i.e., the equilibrium probability, and $p_{ij}(\tau)$ is the conditional probability of the system going from state-$i$ to state-$j$ in time $\tau$. Each $A_i$ is an optimization parameter, which will be further constrained by the experimental histogram. We simulate the histogram of our observable, $A$, as $P(A) = \sum_{j=1}^{N} p_j(A)$, where $p_j(A) = \alpha_j \exp\left[ -(A - \langle A_j \rangle)^2 / 2\sigma_j^2 \right]$ with $\alpha_j = p_j^{eq} / (\sigma_j \sqrt{2\pi})$, $\langle A_j \rangle$ is the mean observable value, and $\sigma_j$ is the standard deviation of the $j$th distribution.
The area of each of these distributions corresponds to the $p_j^{eq}$, such that $p_j^{eq} = \int_{-\infty}^{\infty} p_j(A)\,dA = \alpha_j \sigma_j \sqrt{2\pi}$, and the sum over the $p_j^{eq}$ is unity, i.e., $\sum_{j=1}^{N} p_j^{eq} = 1$. We require that the 𝒑𝒆𝒒 values resulting from the histogram fits be consistent with the 𝒑𝒆𝒒 values we obtain from the TCF simulations, as discussed below. This allows us to constrain our model to three experimental data surfaces: the histogram, the two-point TCF and the three-point TCF. The conditional probabilities and equilibrium probabilities are calculated by solving the kinetic master equation,

\[ \dot{\mathbf{p}}(t) = \mathbf{K} \, \mathbf{p}(t) \tag{33} \]

This describes the flow of probability, $\dot{\mathbf{p}}(t)$, at time $t$, given the current probability, $\mathbf{p}(t)$, and the rates of interconversion that exist in the system, which are contained within the rate matrix 𝑲. As written in Eqn. 33, $\dot{\mathbf{p}}(t)$, $\mathbf{p}(t)$ and 𝑲 are 𝑀x𝑀 matrices, where 𝑀 is the number of states in the system. The $\dot{\mathbf{p}}(t)$ matrix contains the time derivatives of the $p_{ij}$ elements, which describe the conditional probability of going from state 𝑖 to state 𝑗, where 𝑖 and 𝑗 can be any of the 𝑀 states in the system. The $\mathbf{p}(t)$ matrix contains the current probability of the system, where the $i$th column is the array of 𝑀 probabilities at time 𝑡 given that the system started in state-𝑖. The rate matrix, 𝑲, contains the microscopic rate constants, $k_{ij}$, which are the inverse of the time for the system to go from state 𝑖 to state 𝑗, i.e., $k_{ij} = t_{ij}^{-1}$, and the layout of the matrix (that is, where the nonzero elements lie) sets the connectivity (Eqn. 34).

\[ \mathbf{K} = \begin{bmatrix} -\sum_{j \neq 1} k_{1j} & k_{21} & \cdots & k_{M1} \\ k_{12} & -\sum_{j \neq 2} k_{2j} & \cdots & k_{M2} \\ \vdots & \vdots & \ddots & \vdots \\ k_{1M} & k_{2M} & \cdots & -\sum_{j \neq M} k_{Mj} \end{bmatrix} \tag{34} \]

The on-diagonal elements contain the negative sum of rates out of the $i$th state, and the off-diagonal elements contain the rates from the $i$th to the $j$th state, where 𝑖 is the column and 𝑗 is the row. A properly constructed rate matrix will have each column sum to zero.
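The construction of the rate matrix described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the helper names `build_rate_matrix` and `column_sums`, the dictionary format for the rates, and the numerical rate values are all hypothetical and chosen only for the example. States are indexed from zero, and the element in row j, column i holds the rate from state i to state j, following the layout of Eqn. 34.

```python
def build_rate_matrix(n_states, rates):
    """Assemble K (Eqn. 34) from a dict {(i, j): k_ij} of rates from state i to state j.

    Hypothetical helper for illustration; states are indexed 0 .. n_states - 1.
    """
    K = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), k_ij in rates.items():
        if i == j or k_ij < 0:
            raise ValueError("rates must be non-negative and connect distinct states")
        K[j][i] = k_ij  # column i = origin state, row j = destination state
    for i in range(n_states):
        # Diagonal element: negative sum of all rates leaving state i.
        K[i][i] = -sum(K[j][i] for j in range(n_states) if j != i)
    return K


def column_sums(K):
    """Column sums of K; each should vanish for a properly constructed rate matrix."""
    n = len(K)
    return [sum(K[row][col] for row in range(n)) for col in range(n)]


# Illustrative three-state linear chain 0 <-> 1 <-> 2 (rates in s^-1, made-up values).
example_rates = {(0, 1): 100.0, (1, 0): 50.0, (1, 2): 10.0, (2, 1): 20.0}
K_example = build_rate_matrix(3, example_rates)
```

In this example the two elements that would connect states 0 and 2 directly remain zero, so the matrix encodes a linear chain, and `column_sums(K_example)` provides a quick consistency check before any simulation is attempted.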
We also ensure that any loops within the network satisfy detailed balance, preserving the flow of probability of the fluctuations within the system at thermal equilibrium [56], [70], [83]. The solution to the master equation (Eqn. 33) gives a matrix of all possible conditional probabilities, and the long-time limit of the conditional probabilities provides the equilibrium probabilities (i.e., $p_{ij}(\tau) \to p_j^{eq}$ as $\tau \to \infty$), which correspond to each of the sub-distributions in the experimental histogram. So, to simulate our TCFs we solve the master equation for all possible initial conditions, i.e., the system is allowed to begin in any of the available states. The technique we use to solve the master equation is called a similarity transform [79]. We start by diagonalizing 𝑲 to obtain the eigenvalues, $\lambda_i$, and eigenvectors, $\boldsymbol{\nu}_i$. Using the eigenvalues, we construct a matrix exponential,

\[ [e^{\lambda t}] = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & e^{\lambda_2 t} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_M t} \end{bmatrix} \]

The structure of the rate matrix, 𝑲, requires that one eigenvalue always be zero; we let this be $\lambda_1$, and all other eigenvalues, $\lambda_i$, are negative. Then we construct a matrix, 𝑼, where each column is an eigenvector of 𝑲, giving $\mathbf{U} = [\boldsymbol{\nu}_1, \boldsymbol{\nu}_2, \ldots, \boldsymbol{\nu}_M]$. Using these components, we formulate our similarity transform,

\[ \mathbf{p}(t) = \mathbf{U} \, [e^{\lambda t}] \, \mathbf{U}^{-1} \, \mathbb{I} \tag{35} \]

where $\mathbf{U}^{-1}$ is the inverse of 𝑼 and $\mathbb{I}$ is the 𝑀x𝑀 identity matrix, accounting for all possible initial conditions [70]. The resulting $\mathbf{p}(t)$ is an 𝑀x𝑀 matrix of conditional probabilities, where the elements $p_{ij}(t)$ give the probability of the system transferring from state 𝑖 to state 𝑗 in the time 𝑡. As we let $t \to \infty$, each column of $\mathbf{p}(t)$ gives the 𝑀x1 vector of equilibrium probabilities, 𝒑𝒆𝒒. The 𝒑𝒆𝒒 will be further constrained by the experimental histogram as discussed in more detail below.
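As a rough numerical check of the solution described above, the following Python sketch propagates a small network and evaluates the two-point TCF of Eqn. 31. It is an illustration only and departs from the text in one respect: rather than diagonalizing 𝑲 as in Eqn. 35, it evaluates the mathematically equivalent matrix exponential exp(𝑲t) with a truncated Taylor series and scaling-and-squaring, purely so that the example needs nothing beyond the Python standard library. All function names, the two-state rate values and the observable values are assumptions made for the example.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=20):
    """exp(m) via a truncated Taylor series with scaling and squaring."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms + 1):
        term = [[x / order for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def propagator(K, t):
    """p(t) = exp(K t); element [j][i] is the probability of state j at time t
    given a start in state i (the column convention used in the text)."""
    return mat_exp([[k * t for k in row] for row in K])


def two_point_tcf(K, A, tau, t_long=50.0):
    """Eqn. 31, with p_eq read off from a column of the long-time propagator."""
    n = len(K)
    p_eq = [row[0] for row in propagator(K, t_long)]
    mean_A = sum(p * a for p, a in zip(p_eq, A))
    dA = [a - mean_A for a in A]
    P = propagator(K, tau)
    return sum(dA[j] * P[j][i] * dA[i] * p_eq[i]
               for i in range(n) for j in range(n))


# Illustrative two-state system: k_12 = 2 s^-1, k_21 = 1 s^-1 (made-up values).
K_two_state = [[-2.0, 1.0],
               [2.0, -1.0]]
A_two_state = [0.1, 0.4]  # hypothetical visibility of each macrostate
```

For this two-state example the single non-zero eigenvalue is −(k_12 + k_21) = −3 s⁻¹, so the simulated TCF should decay as a single exponential with that rate, in line with the expectation of 𝑀 − 1 decay components for 𝑀 states. The value of `t_long` must be much longer than the slowest relaxation time of the network for the equilibrium probabilities to be read off reliably.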
To better understand the time dependence of the solution, we can write the solution explicitly for the $n$th column vector as

\[ \mathbf{p}_n(t) = c_1^n \boldsymbol{\nu}_1 e^{\lambda_1 t} + c_2^n \boldsymbol{\nu}_2 e^{\lambda_2 t} + \cdots + c_M^n \boldsymbol{\nu}_M e^{\lambda_M t} \]

As mentioned above, the first eigenvalue of 𝑲 is always zero, so this expression simplifies to

\[ \mathbf{p}_n(t) = \mathbf{p}^{eq} + c_2^n \boldsymbol{\nu}_2 e^{\lambda_2 t} + \cdots + c_M^n \boldsymbol{\nu}_M e^{\lambda_M t} \tag{36} \]

Now that we have expressions for our conditional and equilibrium probabilities, we can substitute them into Eqn. 31 to see the time dependence of the simulated two-point TCF,

\[ \bar{C}^{(2)}(\tau) = \mathcal{A}_2 e^{\lambda_2 \tau} + \mathcal{A}_3 e^{\lambda_3 \tau} + \cdots + \mathcal{A}_M e^{\lambda_M \tau} \tag{37} \]

Here the $\mathcal{A}_i$ are the relative weights of the 𝑀 − 1 collective relaxation processes. This form of $\bar{C}^{(2)}$ explicitly shows that the simulation comprises 𝑀 − 1 decay components for a model of 𝑀 states. We use this as our criterion for determining the minimum number of macrostates for a given system. An important point in our analysis is that 𝑀 − 1 decay components indicate the need for a minimum of 𝑀 states to explain the data. This means that the lower limit on the number of possible conformational macrostates is set by the number of decays seen in the two-point TCF. It also implies that a robust and reproducible method is needed for evaluating the proper number of decays to represent the system. For our evaluation of the number of decays, we turn to general estimates of model complexity in the form of the Akaike and Bayesian information criteria.

Section 3.2 Estimating the number of macrostates and minimally necessary model complexity

We construct our two-point time correlation function from experimental data, as described previously, and fit it with a sum of exponential decays, starting with one decay. For each fit we obtain the R-squared value and calculate the Akaike and Bayesian information criteria (AIC and BIC, respectively).
These calculations are done as follows:

\[ AIC = 2k - 2\ln(\mathcal{L}) \tag{38a} \]

\[ BIC = k \ln(n) - 2\ln(\mathcal{L}) \tag{38b} \]

where 𝑘 is the number of parameters in the model, $\ln(\mathcal{L})$ is the log-likelihood of the model and 𝑛 is the number of data points. These criteria balance the complexity of the model with the goodness of fit. The interplay between the complexity of the model being used to construct a fit and the potential information content in the underlying data ensures that overfitting is avoided [84]. For the purposes of fitting the two-point TCF with a sum of exponentials, we use the model,

\[ h(t) = \sum_{i=1}^{H} \alpha_i \exp\left( -\frac{t}{\tau_i} \right) + \beta \tag{39} \]

where $h(t)$ is our exponential fit, 𝐻 is the total number of exponentials, $\alpha_i$ is the amplitude of the $i$th decay, $\tau_i$ is the decay time of the $i$th exponential and 𝛽 is an overall offset. This has 2𝐻 + 1 parameters for a model with 𝐻 exponentials, which gives us our value for 𝑘 in the AIC and BIC calculations. The value for 𝑛 can be obtained by counting the number of data points in the two-point TCF, so what remains is to calculate the log-likelihood. We maintain a consistent data point spacing and density in the TCFs to avoid issues related to variable data point number in the AIC/BIC estimations.

The log-likelihood describes the probability of the data given a set of known parameters [84], and it quantifies how well the data and the fit agree. In other words, it addresses how good an explanation the model is for the data. To calculate this, we first determine the residuals of the sum-of-exponentials fit to the data. We assume that for a good fit, the residuals should be normally distributed about zero. We first optimize a Gaussian distribution to the residuals to estimate the value of the standard deviation. Then we can write down our likelihood function as

\[ \mathcal{L}_i = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{x_i^2}{2\sigma^2} \right) \]

where the $x_i$ are the data points from the residual calculation ($i \in [1, n]$) and 𝜎 is the value from the optimized fit. We calculate this for all 𝑛 data points in the residual and then take the sum over the log of the likelihood, i.e., $\sum_{i=1}^{n} \ln(\mathcal{L}_i)$. Now we have all the pieces required to calculate the AIC and BIC (Eqns. 38a and 38b). We therefore perform fits of one, two, three, etc. exponentials until we find which case is most favored based on our information criteria calculations. To compare models with the information criteria, take the minimum value of AIC (BIC) across all models sampled and subtract it from all the other AIC (BIC) values. Typically, the rule of thumb is that if the difference of AIC (BIC) is about 10 or more, the model with the smaller value is the preferred case. An alternative, and often valuable, way of calculating AIC and BIC is to use either built-in functions within the software Origin, or another statistical software interface, that can compare two models against one another for the relative strength of model fit. This provides an additional estimate of the AIC and BIC beyond our own approach, which can differ due to the nature of the log-likelihood function that is employed for model comparison.

Another consideration is whether any decays are present in the data that are not due to the conformational dynamics. For example, there could be drift from the instrument or photophysical phenomena that manifest in the two-point TCF, which need to be carefully characterized and removed. One way to test for these parasitic fluctuations is to design control measurements. For example, in our PS-SMF measurements we use an isotropic sample (e.g., rhodamine dye in solution) and a highly oriented sample (e.g., stretched film of a dye molecule) to identify the control timescales.
If there is a decay in the TCFs of the control samples, which are selected such that they are not expected to have interesting conformational dynamics probed by the experiments, then we assign these decay timescales to instrumental effects and we do not incorporate them into our model estimate for the number of states. The total number of states will therefore be $M = n_{exp} + 1 - n_c$, where $n_c$ is the number of timescales assigned to the instrument from the control experiments. Then when we construct our simulated TCFs we add these control time scales before comparing with the experimental data. This transforms Eqns. 31 and 32 to

\[ \bar{C}^{(2)}(\tau) = \sum_{i,j=1}^{M} \delta A_j \, p_{ij}(\tau) \, \delta A_i \, p_i^{eq} + \sum_{s=1}^{n_c} \beta_s \, e^{-\eta_s \tau} \tag{40} \]

\[ \bar{C}^{(3)}(\tau_1, \tau_2) = \sum_{i,j,k=1}^{M} \delta A_k \, p_{jk}(\tau_2) \, \delta A_j \, p_{ij}(\tau_1) \, \delta A_i \, p_i^{eq} + \sum_{s=1}^{n_c} \Gamma_s \, e^{-\eta_s \tau_1} e^{-\eta_s \tau_2} \tag{41} \]

where $n_c$ is the number of controls, the $\eta_s$ are the decay rates (inverse time scales) of the control decays, and $\beta_s$ and $\Gamma_s$ are the amplitudes of the control decays in the two-point and three-point TCF, respectively. Typically, we allow some variation in the control amplitudes and decay values when optimizing our models to account for variations due to the amplitudes coming from the decays in our kinetic network models.

With the minimally necessary number of macrostates determined on a dataset-by-dataset basis, we continue on to optimize those same datasets within our kinetic network analysis. Occasionally, the number of macrostates determined via AIC and BIC comparisons fails to yield optimized fits using a kinetic network approach. In these rare cases, optimization attempts are halted and then restarted using one additional state beyond the originally prescribed number. Next, we describe the details of the various computational tools we employed to apply this framework to analyze single-molecule data.
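The information-criterion comparison described earlier in this section (Eqns. 38a and 38b with 𝑘 = 2𝐻 + 1) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code used in this work: the function names are hypothetical, and the standard deviation 𝜎 is taken here as the root-mean-square residual, which stands in for the Gaussian fit to the residual distribution described in the text.

```python
import math


def gaussian_log_likelihood(residuals):
    """Sum of ln(L_i) for residuals assumed to be normally distributed about zero.

    Assumption: sigma is the root-mean-square residual, standing in for the
    Gaussian fit to the residual distribution described in the text.
    """
    n = len(residuals)
    sigma = math.sqrt(sum(x * x for x in residuals) / n)
    if sigma == 0.0:
        raise ValueError("residuals are all zero; the likelihood is undefined")
    return sum(-0.5 * math.log(2.0 * math.pi * sigma ** 2) - x * x / (2.0 * sigma ** 2)
               for x in residuals)


def information_criteria(residuals, n_exponentials):
    """Return (AIC, BIC) from Eqns. 38a and 38b with k = 2H + 1 parameters."""
    k = 2 * n_exponentials + 1
    n = len(residuals)
    log_l = gaussian_log_likelihood(residuals)
    return 2 * k - 2 * log_l, k * math.log(n) - 2 * log_l


def relative_scores(scores):
    """Subtract the minimum so that the preferred model has a score of zero."""
    best = min(scores)
    return [s - best for s in scores]
```

In use, one would compute the residuals of each candidate fit (one, two, three, etc. exponentials), pass them to `information_criteria` together with the number of exponentials, and then apply `relative_scores` to the resulting AIC or BIC values; differences of about 10 or more favor the model with the smaller value, following the rule of thumb given above.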
Section 3.3 Optimization methods for application of a kinetic network model to experimental single-molecule data

We have discussed the mathematics needed to simulate time-correlation functions and probability distribution functions, and ultimately to leverage that information for reconstruction of a free energy landscape describing the system. As previously discussed, obtaining the free energy landscape requires determination of the equilibrium probabilities and kinetic rates which interconnect all macrostates of the system. To determine these quantities using our approach, we employ a large-scale optimization of potential kinetic networks, allowing each network to be simulated separately. In this section we discuss the fundamental components needed to define a single network and our approach to generalizing the simulation of all possible network connectivities. Optimization of the kinetic network that best represents the experimental data necessitates that all possible network topologies be generated and tested for their ability to reproduce the three aforementioned data surfaces. The exception to this bottom-up exhaustive approach is when physical insights or logically motivated mechanisms necessitate or exclude certain transitions within the network. The fundamental setup for any network stems from Eqn. 34, which defines the rate matrix 𝑲. Fig. 3.1 illustrates the relation between a single form of the rate matrix and a particular network connectivity, where the non-zero elements of 𝑲 define each connection in the network.

Figure 3.1 An explicit form of the rate matrix is depicted, with the only zero elements being those which would close the connection between state 1 and state 3 in this linear chain model. A schematized network is shown as the illustrated outcome of such a rate matrix to demonstrate the relationship between a particular form of 𝑲 and the resulting network of macrostates.
We considered two methods to produce all possible network topologies: a reductive ‘top-down’ method and an additive ‘bottom-up’ method. The reductive approach would start with a fully connected network and stepwise eliminate the connection between any two states, yielding a new network. The process of sequential reductions in connectivity could then be repeated until all possible networks were generated. Due to the complicated nature of properly handling microscopic reversibility in our kinetic network models [83], as detailed below, we have instead chosen to employ the additive ‘bottom-up’ method.

Section 3.3.1 Generating kinetic network models

We begin by generating the sequence space for all permutations of N distinguishable objects with a generic permutation generating function within MATLAB. For the kinetic networks of interest, there is a two-fold redundancy in the set of permutations, since the same sequence run forward or backward is equivalent in a kinetic network approach, but distinct in the case of pure permutations. We remove these redundancies to obtain the complete set of linear connectivities, which we build upon for more complex networks. With the set of linear models, we generate higher-order networks with connections between various points in the chain. The result is a set of unique networks containing all possible loop pathways, given a base linear model. An important consideration in this process is the need to satisfy detailed-balance conditions, also known as microscopic reversibility [83]. This network-dependent constraint on the microscopic rates is the primary driver behind the decision to generate higher-order networks from the basic linear chains, rather than start with a fully connected network and eliminate connections to generate all possible networks.
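The removal of the forward/backward redundancy can be sketched in a few lines of Python (an illustration only, not the MATLAB routine used in this work; the function name is hypothetical). Keeping only the permutations whose first label is smaller than the last retains exactly one member of each forward/backward pair, leaving N!/2 linear chains.

```python
from itertools import permutations


def linear_chains(n_states):
    """Return every linear chain of n_states labeled states, one per reversal pair."""
    chains = []
    for perm in permutations(range(1, n_states + 1)):
        # A sequence and its reverse describe the same chain; keep one of them.
        if perm[0] < perm[-1]:
            chains.append(perm)
    return chains


counts = {n: len(linear_chains(n)) for n in (4, 5, 6)}
```

For 4, 5 and 6 states this gives 12, 60 and 360 chains, respectively.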
Each time a connection added to an existing network yields a closed loop, a single microscopic rate must be redefined in terms of a product between the forward and backward microscopic rates contained in the newly established loop. We transform each higher-order network into a Boolean mask using the criterion 𝑲 > 0. This maps each connectivity into a binary matrix of zero and non-zero entries. With each newly obtained higher-order network, we compare the Boolean mask to the currently accumulated set of Boolean masks, such that each unique network appears only once in the final set. Using the same set of Boolean masks, we can later populate each 𝑘𝑖𝑗 value in the proper 𝑖th column and 𝑗th row. Finally, the diagonal elements are calculated as the negative sums of the off-diagonal elements in each column (Eqn. 34). This generates the proper starting point for each unique kinetic network model by defining the rate matrix, 𝑲, from our generated connectivities. The only remaining components needed to define a model are the observable values defining the macrostates, 𝐴, and the standard deviations of those same macrostates, 𝜎, for which guesses can easily be generated.

To better illustrate the fundamental need for a generalized approach to modeling networks, we include Table 4 below, which shows the connectivity categories and number of models for a steadily increasing number of macrostates. We note that for our present use cases, we have not employed ‘branching’ network models, which are models with higher connectivity than linear chain models but no closed loops. The model generating algorithm has the capability to produce such models, but their inclusion in our investigations has thus far been unwarranted.

Table 4 Number of models, per model category, for N=4,5,6 in the course of model generation.
Model categories are defined by their connectivity, with linear models having no closed loops and all higher-level model categories being defined by the number of loops they contain.

Number of      Linear    Single Loop    Multi-loop    Total
Macrostates    Models    Models         Models        Models
4              12        15             7             34
5              60        188            363           611
6              360       2148           12303         14811

In the case of M=5 the number of models becomes burdensome, and in the case of M=6 the total number of possible models becomes intractable even with highly parallel computing capabilities. To address this issue of rapidly increasing model numbers with increasing M, we use a filtering method to remove models from the total set of simulated possibilities. A filter can either include or exclude a particular state-to-state transition within the set of possible networks. Table 5 shows the newly calculated number of models when two filters are included in the generation of the model space.

Table 5 Number of models, per model category, for N=4,5,6, when filters are applied during model generation. Filters applied in most cases are based on physically intuitive connections, or lack thereof, in the set of kinetic networks.

Number of      Linear    Single Loop Models    Multi-loop Models    Total Models
Macrostates    Models    After Filtering       After Filtering      After Filtering
4              4         4                     6                    14
5              18        55                    89                   162
6              85        601                   3267                 3953

Clearly, the inclusion of a few physically intuitive filters, be they transitions that are strictly forbidden or transitions that are highly likely to occur, can vastly reduce the model space to a level that is tractable with HPC resources. Once all models are generated, appropriately filtered, and their rate matrices initialized, we can simulate the aforementioned data surfaces and compare them with experimentally derived results using a custom nonlinear least-squares fitting routine.
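The filtering step can be pictured as a test applied to each Boolean connectivity mask. The Python sketch below is a hypothetical illustration restricted to linear chains: the particular required and forbidden transitions are assumptions made for the example and are not the filters used to produce Table 5.

```python
from itertools import permutations

import numpy as np


def chain_mask(chain, n_states):
    """Symmetric Boolean mask with True wherever two states in the chain are adjacent."""
    mask = np.zeros((n_states, n_states), dtype=bool)
    for a, b in zip(chain[:-1], chain[1:]):
        mask[a, b] = mask[b, a] = True
    return mask


def passes_filters(mask, required=(), forbidden=()):
    """Keep a network only if it has every required and no forbidden connection."""
    has_required = all(mask[i, j] for i, j in required)
    has_forbidden = any(mask[i, j] for i, j in forbidden)
    return has_required and not has_forbidden


n_states = 4
chains = [p for p in permutations(range(n_states)) if p[0] < p[-1]]
masks = [chain_mask(c, n_states) for c in chains]

# Hypothetical filters: states 0 and 1 must connect; states 0 and 3 must not.
kept = [m for m in masks if passes_filters(m, required=[(0, 1)], forbidden=[(0, 3)])]
```

With these two example filters, 4 of the 12 four-state linear chains survive. The same test applies unchanged to masks of looped networks.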
Section 3.3.2 Broad exploration with home-built genetic algorithm

The first phase of the nonlinear least-squares optimization uses a home-built genetic algorithm to broadly search the parameter space. The inherent advantage of using a genetic algorithm at the onset of parameter refinement is that it better ensures that the global chi-squared minimum is explored during the optimization, rather than only a subset of local chi-squared minima. We employ steepest descent refinement methods after an initial round of optimization with the genetic algorithm to ensure we have reached the bottom of the minimum. We have found in all cases that close fits are required from the genetic algorithm for the secondary refinement steps to be effective, whereas poor fits or random starting points simply cannot be effectively refined using the steepest descent approach.

The genetic algorithm developed in our lab works on the same basic principles as any genetic algorithm. A set of initial guesses for the parameters is made, the guesses are sorted by their chi-squared, and then a subset of those guesses is retained based on quality of fit to the data (Fig. 3.3). Those not retained are replaced by random mixtures of the retained guess parameters, and then all guesses are subject to random mutations of variable magnitude. This process is repeated until the overall best outcome fails to improve after a chosen number of parameter mixing and mutation events. We have introduced a few variations and home-built methods that distinguish our approach from more conventional ones and that have been instrumental in obtaining sufficient fits to our single-molecule data [14]. In the remainder of this section, we discuss these newly introduced features.

New sampling methods

The first step in the genetic algorithm is the initialization of many possible parameters within a single model.
In this case, this constitutes unique guesses of each parameter that comprises a single kinetic network model. In our approach, we let the size of the initial guess pool be roughly ten times larger than the subsequent “generations” of guesses (Fig. 3.3). This is due in part to the overall increase of the parameter space size which is necessitated by moving from 𝑀 = 3, as was done in [14], to 𝑀 > 3 states. This increase in parameter space dimension leads to a significant barrier in the combinatorics of best fit possibilities. An additional consideration on the combinatorics is the span of each parameter’s allowed values, which can require a highly open boundary on the inverse rate constants (𝑘𝑖𝑗⁻¹) to ensure each parameter takes on its optimal value.

By opening the boundary on the inverse rate constants to span multiple decades in time, the uniformity of parameter space sampling becomes a concern. The use of standard random number generation, with a multiplier on the uniform (0,1) interval to meet the maximum allowed value on each parameter, fails to give equal probabilities for each power-of-ten interval of the allowed space (Fig. 3.2). This is particularly problematic in the modeling efforts described here, since the inverse rate constants are given the interval of (10⁻⁶, 10²) seconds to explore. To combat the systematic oversampling of long times compared to short times in the inverse rate constants, a new sampling method was developed which ensures an equal probability and number of sample occurrences within each decade of the allowed parameter space. This method simply determines the number of decades being spanned, devises a probability distribution function that can be sampled from, and then returns a uniform number of sampling occurrences within each decade of the allowed boundary.

Figure 3.2 Two panels compare the results of sampling two separate distributions.
In the first panel, an interval spanning many orders of magnitude is not sampled evenly across decades when using conventional RNG. To overcome this issue, we custom built the LogRandomUniform sampling method. The results of sampling with our method are also shown in the first panel. The second panel depicts the results of selecting population members from either a half-normal or uniform distribution within the mixing and mutation phase of the genetic algorithm.

Sort and select good fits

Once all parameters are initialized and a population set of simulated outcomes is produced, the next step is evaluation of individual guess quality (Fig. 3.3). The selection process for retention versus elimination of modeled outcomes at each step of the overall algorithm typically comes down to a single choice of a cutoff percentage. In this case, the chi-squared sorted guesses which fall below the cutoff percentage line are removed, while those above the line are retained and their parameters used for mixing in the subsequent step. In a typical genetic algorithm, a uniform distribution is used to select a model outcome from the chi-squared sorted list for mixing. To better leverage the large number of model outcomes being calculated, however, we use a half-Gaussian distribution function to select the model outcomes to be mixed. This leads to more rapid convergence of each iteration of the genetic algorithm, at the cost of slightly narrowing the exploration of the parameter space due to the preference for a subset of potential parameters to be mixed. In principle, this approach also aims to avoid mixing poor guess outcomes, which fall very near to the cutoff percentage line, into later generations.

Mix and mutate

Once a set of models has been chosen for retention, a small fraction is kept without alteration to the parameters, typically the top 2-4 individual outcomes.
The fraction of the guess population thrown out at each iteration is regenerated with random mixtures of the parameters from all retained members of the previous iteration. This step generates entirely new guesses independent of the previous iteration’s fit quality, which helps prevent trapping within a local chi-squared minimum of the large combinatoric space. Once the set of models is regenerated, all parameters are probabilistically subjected to random mutations of their parameter values from the previous iteration (Fig. 3.3). Again, because the span of each parameter can be quite significant, our logarithmically sampled random number generator is used to ensure that a minimal mutation on the order of the parameter space lower bound can be achieved, as well as a maximal mutation on the order of 10% of the upper bound.

Lastly, once the best fit within a single iteration of the genetic algorithm fails to improve the chi-squared by more than 1% over typically 35-40 repeated generations of mixing and mutation steps, the best fit outcome is saved, and the saved outcomes are sorted based on their chi-squared. The number of repeated generations within an iteration is an adjustable parameter of the routine. By increasing the number of repeats or decreasing the percentage-wise chi-squared improvement cutoff, iterations can be refined longer for successively better outcomes. This does come at the expense of total iterations, which limits the extent of parameter space explored by the genetic algorithm within a fixed amount of time. This procedure allows the best outcome across tens of thousands of guesses to be examined, while the statistics across best fit outcomes are used to motivate adjustment of the allowed parameter space. The algorithm is run for a fixed number of iterations, typically chosen to be ~500-1000 depending on the number of states 𝑀 and the span of the allowed parameter space.
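The sort, mix and mutate steps described above can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions, not the home-built MATLAB routine: `log_uniform` is the standard log-uniform draw (used here in place of the LogRandomUniform method), and the retained fraction, half-normal width and mutation probability are hypothetical defaults.

```python
import numpy as np


def log_uniform(low, high, size, rng):
    """Draw samples whose base-10 logarithm is uniform between low and high."""
    return 10.0 ** rng.uniform(np.log10(low), np.log10(high), size)


def next_generation(pop, chi2, low, high, rng,
                    keep_frac=0.5, n_elite=3, p_mutate=0.2):
    """Return the next population after one sort, mix and mutate step."""
    pop = pop[np.argsort(chi2)]  # best fit (lowest chi-squared) first
    n_pop, n_par = pop.shape
    n_keep = max(n_elite, int(keep_frac * n_pop))
    retained = pop[:n_keep]
    new_pop = pop.copy()

    # Replace discarded members with mixtures of retained members. Parents are
    # picked per parameter from a half-normal distribution over sorted ranks,
    # so better fits are chosen more often.
    columns = np.arange(n_par)
    for row in range(n_keep, n_pop):
        ranks = np.abs(rng.normal(0.0, 0.3 * n_keep, n_par)).astype(int)
        parents = np.minimum(ranks, n_keep - 1)
        new_pop[row] = retained[parents, columns]

    # Mutate everything except the elite members. Step sizes are log-uniform
    # between the lower bound and 10% of the upper bound, with random sign.
    mutate = rng.random((n_pop, n_par)) < p_mutate
    mutate[:n_elite] = False
    step = log_uniform(low, 0.1 * high, (n_pop, n_par), rng)
    step *= rng.choice([-1.0, 1.0], size=(n_pop, n_par))
    return np.where(mutate, np.clip(new_pop + step, low, high), new_pop)


rng = np.random.default_rng(0)
low, high = np.full(4, 1e-6), np.full(4, 1e2)  # bounds on inverse rates (s)
population = log_uniform(low, high, (20, 4), rng)
chi2 = rng.random(20)                           # stand-in fit scores
children = next_generation(population, chi2, low, high, rng)
```

In a full optimization, `chi2` would come from simulating the three data surfaces for each member, and the step would be repeated until the best chi-squared stops improving.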
Individual iterations are stopped when the best fit chi-squared across the total population meets the criteria discussed in the previous paragraph. These criteria currently serve as a placeholder for a stricter convergence condition. However, despite the lack of a strict convergence condition, the optimized results of the genetic algorithm routinely approach a level which can be safely refined using a more traditional steepest descent method, as will be discussed below.

Figure 3.3 A diagrammatic depiction of the genetic algorithm workflow, starting from initialization of random guesses to form a population, followed by simulation and model fit quality assessment. Subsequent mixing, mutation and re-evaluation steps are carried out until one of the possible stopping criteria is triggered.

Section 3.3.3 Deterministic optimization with Pattern Search

The second phase of the nonlinear least-squares optimization for determination of the optimal kinetic network employs a commercially available optimization routine from MATLAB known as ‘Pattern Search’. ‘Pattern Search’ utilizes a mesh grid refinement routine across all parameters allowed to float in the model under consideration. If the model under consideration has N floating parameters being optimized, then at every step of the ‘Pattern Search’ optimization an N-dimensional grid is generated about the current parameter values in both the positive and negative directions, yielding 2N total possibilities for the next best fit result. The initial starting point for the routine is the set of values found during the genetic algorithm. Each parameter receives a positive or negative shift in its value, with the magnitude being proportional to the current size of the mesh grid in all N dimensions. The mesh grid size starts off at its maximally allowed value, which is based on the allowed boundaries for each parameter.
As the optimization evolves, the mesh grid size contracts to smaller values if no lower chi-squared is observed across the 2N possibilities of the current refinement step. This mesh grid size contraction propagates until the result being refined reaches the level of precision allowed by the lower bounds on all parameters, thus resulting in a plateaued minimization of the chi-squared between target data and model. Given that the mesh grid is only allowed to contract in this routine, the ability of the routine to reach a globally optimized result is highly dependent on the quality of the input fit parameters determined in the genetic algorithm. Whether these input parameters correspond to the global minimum or to a local minimum in the chi-squared determines the success of the ‘Pattern Search’ refinement. For this reason, the working protocol is to allow the genetic algorithm roughly 500-1000 iterations of optimization prior to submitting any result to ‘Pattern Search’ for refinement. 1000 iterations are likely excessive for three-state models but may not be sufficient for five- or six-state models given the higher dimensional parameter space. However, for four states we have observed that 500-1000 iterations of the genetic algorithm achieve robust refinement results in most cases.

Multiple polling methods exist for the generation of the 2N possible outcomes at each step of ‘Pattern Search’. A polling method in this context is how the generation and evaluation of the N-dimensional mesh grid at every step is performed. In this work, we use the ‘GPS Poll Method’, which causes the mesh grid size, and therefore the degree of perturbation to each of the N parameters, to maintain its value in both the positive and negative directions at each step. Additionally, inverse proportionality across all dimensions is maintained with the current step number, leading to the largest shifts in any parameter being made early in the refinement, while small shifts occur later.
This ensures that a funneling of the chi-squared to its local minimum is achieved. Other polling methods were used early on in this work, namely the ‘MADS’ polling method, which is a random magnitude, random direction method that aims to avoid the deterministic trajectory of the ‘GPS’ method for a given input. However, the performance of the ‘MADS’ method was inferior to the ‘GPS’ method in nearly all cases and was not explored further for that reason.

The number of optimization iterations, the number of simulation evaluations, or the total run time of a single run of ‘Pattern Search’ can each be used as a stopping criterion. We use the total run time of ‘Pattern Search’ as our stopping criterion, since it maps naturally onto HPC job submission run times. While convergence within the allowed number of model calculations is not guaranteed, a typical value of 50,000 model calculations, with each step of ‘Pattern Search’ making 2N such calculations, is seen to be sufficient for four states. Given that the number of model calculations at each step grows with the size of the model under consideration, a value of 50,000 maximum model calculations may not be sufficient for five states and beyond. For this reason we regularly employ a 24 hour run time across all data sets for single runs of ‘Pattern Search’.

Section 3.4 Details of calculating the chi-squared across three surfaces

A critical component of the optimization routines described above is how the calculation and evaluation of the best-fit chi-squared is performed. For each surface the chi-squared is calculated in a standard way,

\[ \chi^2 = \sum_i \frac{(\psi_i - \bar{\psi}_i)^2}{\psi_i} \tag{42} \]

where 𝜓𝑖 is the data and 𝜓̅𝑖 is the simulated fit. As will be discussed further below, the division by the data is sometimes omitted, and otherwise is carefully tailored to avoid dividing by zero.
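A minimal Python sketch of the per-surface chi-squared of Eq. 42 is given here for concreteness. The `normalize` switch and the `floor` guard are assumptions introduced for this illustration: they stand in for omitting the division by the data and for the cutoffs used to avoid dividing by zero, and they are not taken from the code used in this work.

```python
import numpy as np


def chi_squared(data, model, normalize=True, floor=0.0):
    """Sum of squared residuals, optionally divided point by point by the data.

    Points where data <= floor are skipped when normalizing, so the division
    never encounters a zero (an assumed guard for this sketch).
    """
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)
    squared = (data - model) ** 2
    if not normalize:
        return float(squared.sum())
    valid = data > floor
    return float((squared[valid] / data[valid]).sum())


chi_example = chi_squared([1.0, 2.0, 4.0], [1.0, 1.0, 2.0])  # 0 + 1/2 + 4/4
```

The example evaluates to 1.5; a data point equal to zero would simply be left out of the normalized sum.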
The chi-squared values for each surface are then summed together into a total chi-squared value, which is the quantity minimized by the optimization routines, i.e., 𝜒𝑡𝑜𝑡 = 𝜒ℎ𝑖𝑠𝑡 + 𝜒𝐶2 + 𝜒𝐶3. Since the minimization of the chi-squared guides the optimization at every level, it is paramount to achieving an accurate representation of the data via a model fit.

There are two central issues at play in simultaneously fitting the multiple statistical functions described here. The first issue is the disparate magnitudes of the data comprising these statistical functions, because the total chi-squared score across distinct models is compared based on the sum over all three surfaces’ individual chi-squared scores. The magnitude of the curve describing the probability distribution function is on the order of 10⁻¹, while the three-point time correlation function is of order 10⁻⁵. When all statistical functions experience a similar percent error residual between model outcome and data, the chi-squared only reflects the residual of the highest magnitude statistical surface, while the remaining functions can contribute less by orders of magnitude. Untreated, this leads the optimization to fit only the highest magnitude statistical function without regard for the remaining functions. Therefore, a constant multiplier (𝛼ℎ𝑖𝑠𝑡, 𝛼𝐶2, 𝛼𝐶3) must be applied to the individual chi-squared of each statistical function so that their contributions to the total chi-squared are of similar magnitude given a similar degree of residual error:

\[ \chi_{tot} = \alpha_{hist}\,\chi_{hist} + \alpha_{C2}\,\chi_{C2} + \alpha_{C3}\,\chi_{C3} \tag{43} \]

The second issue that arises in the fitting of these statistical functions is the density of points used to represent the time correlation functions across the individual surfaces.
As was discussed in [14], a logarithmic sampling of τ in both the two- and three-point correlation functions is used to achieve enough data points for a robust representation, without calculating every possible τ product available at a particular data binning resolution. This avoids an overly dense representation of the two- and three-point TCFs at late τ, which would greatly increase the calculation time from raw data and the simulation time during modeling. The simple result of this approach is that the first decade of τ contains very few points compared to the final few decades (Fig. 3.2). It is worth noting that this problem would not only persist but be amplified if τ were to be exhaustively sampled at the data binning resolution. The calculation of chi-squared performed for each statistical function is an average over all squared residuals between data and model. Given the much greater number of points in the average from long values of τ compared to short values, a simple average is dominated by the long τ values. To compensate for this density issue, we apply an artificial weighting function to the square of the residuals prior to taking the average. These arbitrary weight functions, or masks (𝜂𝛽, where 𝛽 = histogram, C2 or C3), are applied to each element of the 𝜒𝑡𝑜𝑡 calculation (Eq. 44) and are calculated assuming a steadily increasing noise profile of the data. This noise profile originates from both the decaying dynamical correlation in the data and the loss of pair-wise products which contribute at late τ compared to early τ, leading to a statistically less accurate measure of the correlation functions at late τ.

\[ \chi_{tot} = \alpha_{hist}\,\chi_{hist}\,\eta_{hist} + \alpha_{C2}\,\chi_{C2}\,\eta_{C2} + \alpha_{C3}\,\chi_{C3}\,\eta_{C3} \tag{44} \]

To best capture the essential features of each unique data set, a home-built chi-squared weight function is created to address the subtle differences between the inputs being optimized.
The basic elements of the chi-squared weight function are inputs of a constant noise floor 𝛾0, a variable noise profile 𝛾(𝜏) across the various decades of 𝜏, as well as a choice in the discretization 𝑁𝑠𝑒𝑔𝑚𝑒𝑛𝑡𝑠 of the chi-squared surface to address data point density. The underlying assumption of the noise mask 𝜀(𝜏) is that the noise profile of the correlation function surfaces strictly increases and that the idealized weight function applied to the chi-squared residuals would result in equal contributions from each decade, or segment of a decade, across all 𝜏. This is in opposition to a blind application of a squared residual error function, which is dominated by the latest decades of the correlation functions. The complete noise mask is simply defined by the sum:

\[ \varepsilon(\tau) = \sum_{i=1}^{N_{segments}} \gamma_i(\tau) + \gamma_0 \tag{45} \]

The weight surface for the correlation functions is then defined by applying the noise mask to the underlying data and establishing equal contributions from every segment to the pure chi-squared. This is done by multiplying each segment’s chi-squared contribution in the presence of the noise mask by an appropriate scalar 𝑆 to meet the magnitude of the maximally contributing segment.

\[ \eta_{C2/C3} = \sum_{j=1}^{N_{segments}} S_j \, \frac{(\psi_j - \varepsilon_j \psi_j)^2}{\psi_j} \tag{46} \]

The result of this approach is to obtain unique weight functions for every data set, which only rely on steadily increasing noise and differences in data point density. For the case of the histogram, the weight function is simplified as the inverse of the data surface itself:

\[ \eta_{Hist} = \frac{1}{PDF} \tag{47} \]

As is often the case, one set of parameters does not work for every data set. Therefore, the noise floor, noise profile and discretization of the calculated functions are left as inputs to be adjusted on a data set-by-data set basis. Additionally, cutoffs in the correlation times are established for all correlation functions to avoid dividing the error residuals by zero, as commonly occurs in the data at the longest values of 𝜏.
A typical set of values is regularly used at the onset of the optimization process, with adjustments being made once the optimization reaches a certain threshold in the balance of statistical contributions from all three data surfaces. It should be noted that the interplay between these inputs allows the weight functions to be tailored to a particularly problematic or unique data set in a manner which was not possible for the simpler and more conventional 𝜏⁻ⁿ approach.

The resulting weight functions and artificial noise profile are then applied to the data to determine an average chi-squared contribution across all three surfaces. This determines the starting point for the scalar multipliers to be used in the optimization. Further adjustment of the scalar multipliers (the 𝛼𝑖’s in Eq. 44) by the user is typically necessary. However, this approach further reduces computational run time, as the number of unique optimization trials needed to achieve proper scalar multipliers is dramatically reduced.

The final statistical tool we employ to guide the optimization of our kinetic networks via updates to the weight function parameters is a graphical interface which visualizes the error from multiple perspectives. The perspectives are the perceived error from the use of the weight functions, the true error from both the percent error and raw residual, as well as the form of the weight functions themselves. The combination of these statistical data surfaces offers a complete view of where the optimization is missing its target outcome, how that target outcome can be better configured, and lastly whether the interpretation of the unweighted error is improving or worsening via alterations to the weight function parameters. Taken in totality, the outputs of this user interface are a convenient and concise way of motivating updates to the optimization across all data sets. The details of this approach are discussed below in section 3.4.1.
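One simple way to obtain starting values for the scalar multipliers is sketched below in Python. This is an assumed recipe, not the procedure used in this work: each surface is given the same hypothetical fractional error, and the multipliers are chosen so that the resulting chi-squared contributions are equal. The surface values are made up and only mimic the orders of magnitude quoted earlier in the text.

```python
import numpy as np


def surface_chi2(data, model):
    """Chi-squared of one surface: squared residuals divided by the data."""
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)
    return float(np.sum((data - model) ** 2 / data))


def balancing_multipliers(surfaces, frac_error=0.05):
    """Multipliers that equalize the chi-squared cost of a common fractional error."""
    reference = {}
    for name, data in surfaces.items():
        data = np.asarray(data, dtype=float)
        reference[name] = surface_chi2(data, data * (1.0 + frac_error))
    largest = max(reference.values())
    return {name: largest / value for name, value in reference.items()}


def total_chi2(surfaces, models, alphas):
    """Weighted sum of the per-surface chi-squared values."""
    return sum(alphas[name] * surface_chi2(surfaces[name], models[name])
               for name in surfaces)


# Made-up, strictly positive surfaces with disparate magnitudes.
surfaces = {
    "hist": np.array([0.1, 0.2, 0.1]),
    "C2": np.array([1e-3, 5e-4, 1e-4]),
    "C3": np.array([1e-5, 5e-6, 1e-6]),
}
alphas = balancing_multipliers(surfaces)
```

With these inputs the histogram multiplier is 1 and the C2 and C3 multipliers are roughly 250 and 25,000, so a model that misses every surface by the same percentage is penalized equally on each.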
Section 3.4.1 Quality of fit assessment via Global Chi-squared Statistics

In order to assess the quality of fits and deal with the pitfalls of simultaneously fitting three data surfaces to a single model, a modeled chi-squared weight function is generated for every data set using a noise profile based on the experimental data. We refer to this approach as a ‘weights calculator’, where the tabulation of the chi-squared outcome for each model within a data set is optimized based on a constructed interpretation of the residuals rather than a pure interpretation. However, the pure residuals and percent error between model and data are still tabulated and tracked at the initial steps of refinement to help guide the proper selection of ‘weights calculator’ parameters. To better elucidate this process, the chi-squared is calculated using both the ‘weights calculator’ and the raw error interpretation for a high quality and a lower quality fit to the same data. The visualization of this approach for both cases is shown in Figs. 3.4-3.11. The actual weight functions used for these examples are shown in Figs. 3.5 and 3.9. Notably, these surfaces differ from any smooth analytical function which might otherwise be employed for the purpose of weighting the chi-squared.

In Fig. 3.4, a high-quality fit to experimental single-molecule data can be seen. The resulting ‘weights calculator’ interpretation of this data (Fig. 3.7) and the raw error interpretation (Fig. 3.6) of the data are in relative agreement, with both displaying low values of their respective error functions in all regions where the fit to data is good and the data itself contains little noise. Visual inspection of the actual fits to the data shows a convincingly close representation of the underlying dynamics and conformational states which yielded the experimental results.

Figure 3.4 Example of a ‘good’ fit to the data from a model outcome.
The PDF, C2 and C3 are shown against their respective model outcomes. The residuals are, on average, near their minimum.

Figure 3.5 Weight surfaces applied to the C2 and C3, respectively. These surfaces are applied to the residuals of the data versus the model prior to calculating the chi-squared (see Eq. 44).

Figure 3.6 ‘Chi-Stats’ overview for the panels that report on statistical measures of ‘true’ error (no weight function applied). The case shown here is for the high-quality fit of Fig. 3.4.

Figure 3.7 ‘Chi-Stats’ overview for the panels and statistical measures of the ‘weighted’ error via the application of weight functions to the data residuals. The case shown here is for the high-quality fit of Fig. 3.4.

In Figs. 3.8-3.11, a lower quality fit to the same experimental data is shown, using the same ‘weights calculator’ parameters to inspect the chi-squared and error. The raw error in this case is much greater than before, and this is captured by the ‘weights calculator’ surfaces as a greater magnitude in the interpretation of chi-squared from both the two- and three-point correlation functions, in the regions where the model deviates most significantly from the data. Notably, some regions of the ‘weights calculator’ surfaces are greater in magnitude than the raw error would suggest is necessary. This is due to the issue of data point density and dilution of the statistical average chi-squared that was mentioned previously. Ultimately, these panels of interpreted versus actual residual error are used to guide the adjustment of the parameters used to construct the ‘weights calculator’ surfaces.

Figure 3.8 Example of a ‘bad’ fit to the data from a model outcome. The PDF, C2 and C3 are shown against their respective model outcomes. The residuals are, on average, not near their minimum.

Figure 3.9 Weight surfaces applied to the C2 and C3, respectively.
These surfaces are applied to the residuals of the data versus the model prior to calculating the chi-squared (see Eq. 44). The surfaces shown here are notably not the same as those of Fig. 3.5, leading to the discrepancy in observed fit quality between the two examples.

Figure 3.10 ‘Chi-Stats’ overview for the panels that report on statistical measures of ‘true’ error (no weight function applied). The case shown here is for the low-quality fit of Fig. 3.8.

Figure 3.11 ‘Chi-Stats’ overview for the panels and statistical measures of the ‘weighted’ error via the application of weight functions to the data residuals. The case shown here is for the low-quality fit of Fig. 3.8.

Section 3.4.2 Optimizing models in parallel on a high-performance computing cluster

Traditionally, we have used physically motivated intuition to guess the proper network, or a subset of probable networks, and then proceeded with a limited number of optimizations on a handful of local PCs [14]. While this approach certainly reduces the need for large-scale computing, it does leave out many possibilities which may appear unintuitive but represent the data better than physically motivated expectations. With the ability to arbitrarily generate all network models for 𝑀 states within a class of network topologies, as discussed in section 3.3.1, the sheer number of possible models requiring optimization quickly becomes unwieldy. Here, we used a local high-performance computing (HPC) cluster to scale the optimization of the best fit kinetic networks across a dozen or more data sets simultaneously. The limitation on the number of models being run simultaneously is simply a matter of hardware availability and utilization, as is the case with any shared computing resource. With the use of the HPC, the entire allowable model space can be evaluated at a sufficient level of sampling within a few days for the genetic algorithm.
The underlying MATLAB optimization routines at both the genetic algorithm and pattern search phases were wrapped into bash routines to automate the submission of jobs as well as the deployment of all possible data sets onto the HPC simultaneously, achieving optimization of all models in parallel with one another. The constraints on typical laboratory computing resources are of course primarily driven by the sheer size of the optimization, which typically limits optimization of a single model to a single CPU core. Random access memory requirements also limit the total number of models capable of being optimized on a single PC, irrespective of the clock rate and number of independent cores available on the CPU. No GPU-based computing was entertained during this work, although in principle it could aid the deployment of more models simultaneously onto local computing resources or greater parallelization within a single model, possibly reducing the need for the HPC. For more than five states the number of models grows so substantially that even typical HPC resources cannot handle simultaneous optimization of all possible models, unless unduly long periods of time are tolerable to achieve results. In these cases, it is beneficial to introduce a filter onto the entire model space, which is motivated by simple physical considerations of the system under study, as was discussed in section IIIA.

Section 3.4.3 Construction of ‘weights calculator’ surfaces

The approach taken in this work to handle the simultaneous fitting of the three data surfaces is to construct three separate surfaces which individually weight the chi-squared contribution from each segment of data across the entirety of each surface. The basic premise of the ‘weights calculator’ is to model the noise profile of the data and then calculate the anticipated chi-squared contribution from each data segment. In order to do this, a few parameters must be defined to construct the model noise profile.
These are called the ‘noise floor’, ‘noise ramp’ and ‘number of data segments.’ Each parameter will be discussed and defined below. A data segment is defined as a short section of consecutive data points from either the two- or three-point correlation function over some discrete length of the independent variable Tau. Segments can be as coarse as entire decades in Tau or as fine as individual data points. Coarsely segmenting the data alleviates the effects of outliers, which can otherwise produce unusual contributions to the final weight surface. However, averaging together noise contributions using coarse segments results in some loss of sensitivity to the nuances of each underlying data surface and its unique contours. We have composed the noise profile with two main elements: a ‘noise floor’ and a ‘noise ramp.’ The ‘noise floor’ is defined as the baseline noise contribution to the entire underlying data surface. This is a simple percentage of the data surface included in the noise mask applied to the data. The ‘noise ramp’ is a linearly increasing noise contribution as a function of Tau, which allows for the latest points in the correlation functions to have a greater percentage-wise error than earlier points. This gain in noise with increasing Tau is expected because the signal-to-noise ratio of these statistical functions grows with the total number of valid pair or triple products, which decreases at late Tau, and because correlation is lost at late Tau. Altering the ‘noise floor’ or ‘noise ramp’ has a dramatic effect on the resulting weight functions. The basic principle behind alterations to these parameters is to better weight segments of the data surfaces that are underfit relative to other data segments which are overfit. The interpretation of the relative under- or overweighting of certain data segments is made through examination of the underlying fits to the data as well as inspection of the chi-squared results through the lens of both the ‘weights calculator’ and raw error representations.
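The noise model described above can be sketched in a few lines. The following Python example is a minimal illustration and not the MATLAB routine used in this work. The function names `noise_weights` and `weighted_chi2`, the default parameter values, the equal-count segmentation and the choice of weights as the inverse squared model noise are all assumptions made for illustration, and the sketch treats a one-dimensional two-point correlation function rather than a full data surface.

```python
import numpy as np

def noise_weights(tau, data, noise_floor=0.05, noise_ramp=0.10, n_segments=4):
    """Hypothetical 1D weights built from a 'noise floor' plus a linear 'noise ramp'."""
    tau = np.asarray(tau, dtype=float)
    data = np.asarray(data, dtype=float)
    # Fractional noise rises linearly in Tau from noise_floor to noise_floor + noise_ramp.
    frac = noise_floor + noise_ramp * (tau - tau.min()) / (tau.max() - tau.min())
    sigma = frac * np.abs(data) + 1e-12  # small offset avoids division by zero
    # Average the model noise within each contiguous data segment
    # (assumption: segments hold equal numbers of points).
    for idx in np.array_split(np.arange(tau.size), n_segments):
        sigma[idx] = sigma[idx].mean()
    return 1.0 / sigma**2  # assumed weighting: inverse variance of the model noise

def weighted_chi2(data, model, weights):
    """Mean weighted squared residual between data and model."""
    residual = np.asarray(data, dtype=float) - np.asarray(model, dtype=float)
    return float(np.mean(weights * residual**2))

tau = np.logspace(-6, 0, 12)             # hypothetical Tau axis (seconds)
c2 = 0.2 * np.exp(-tau / 1e-3) + 0.01    # hypothetical two-point correlation data
w = noise_weights(tau, c2, n_segments=3)
print(weighted_chi2(c2, 1.1 * c2, w))    # chi-squared for a model that is 10% too high
```

Raising `noise_floor` or `noise_ramp` in this sketch lowers the weights, uniformly or preferentially at late Tau respectively, while `n_segments` controls how coarsely the noise is averaged.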
The approach taken to produce high-quality optimized fits to single-molecule data is to first run the genetic algorithm, use the statistical tools described previously to guide a balanced optimization across all three data surfaces and then allow the optimization to evolve for some time. Once reasonably high-quality fits are achieved in the genetic algorithm, the results serve as starting points for refinement in ‘Pattern Search’. Once again, outcomes of ‘Pattern Search’ are subjected to the same statistical examination, with balance changes or alterations to the weight surfaces being motivated by the combination of the statistics and visual inspection of the optimized fits.

4. CONCLUSION

We have developed an analytical and computational framework for extraction of kinetic network parameters from single-molecule measurements, independent of system observable. This information can be leveraged to construct the free energy landscape governing system dynamics and offers a mechanistic view of the conformational changes occurring within macromolecular assemblies. The approach is generalizable to a wide variety of experimental systems, after considering both the relevant control experiments to identify sources of noise as well as the degree to which equilibrium has been established during periods of data acquisition. The ability to analyze data in a parallel fashion opens the possibility of leveraging large data set statistics and best fit parameters to tailor the optimization of a large combinatoric space toward achieving a sensible and robust result. The ability to implement parallel optimizations hinges on the availability of computational resources, but for systems where physically intuitive constraints can be placed on the interconnectivity of the underlying kinetic network, the demand for computational resources can be significantly reduced.
In cases where the system is well understood from a conformational macrostate perspective, but no current methodology affords the assignment of microscopic kinetic rate constants, our framework presents the opportunity to uncover a richer picture of the underlying dynamics in such systems. We plan to apply this analysis framework to problems pertaining to the assembly and function of protein-DNA complexes, where the unique assignment of free energy landscape parameters can elucidate the biochemical mechanisms driving the core function of these macromolecular machines.

CHAPTER 4: ANALYSIS OF PS-SMF DATA FROM SS-DSDNA JUNCTION USING A KINETIC NETWORK MODEL

1. OVERVIEW

This Chapter contains work from the article “DNA ‘Breathing’ free energy landscape determination via kinetic network analysis of polarization-sweep single-molecule fluorescence microscopy data” authored by Jack Maurer, Claire Albrecht, Andrew H. Marcus and Peter von Hippel. This work was funded by the National Institutes of Health (NIGMS Grant GM-215981 to P.H.v.H. and A.H.M.). Andrew H. Marcus was the principal investigator for this work. The contents of the article have been expanded upon here to serve as a comprehensive overview and introduction to the methods which will be used throughout the remaining Chapters. This Chapter will more formally analyze the results of ‘polarization-sweep’ experiments. A series of structural inferences are made based on both ensemble and computational work to assign conformational macrostates to a local geometry in the resulting kinetic networks.

2. INTRODUCTION

Obtaining an experimental measure of free energy has been of much interest to biophysical studies of heterogeneous systems, namely protein-DNA and protein-protein assemblies, where the dynamical evolution of macromolecular structure determines biological function.
The role of DNA ‘breathing’ in the assembly and function of protein complexes at DNA replication forks within the T4 bacteriophage replisome has long been appreciated based on ensemble biophysical studies, which quantified the presence and extent of such thermally populated conformational states in DNA [2], [3]. Ensemble studies using fluorescent nucleobase analogues were able to show that structural differences in base-stacking and base-pairing persist between adjacent sites at ss-dsDNA junctions, with unique spectral signatures inside and outside of the junction [85]. More recent ensemble studies employing (iCy3)2 dimer-labeled ss-dsDNA constructs demonstrated that the local structure and degree of disorder near ss-dsDNA junctions were a function of the specific sites investigated [24]. Taken together, there is substantial evidence for the occurrence of thermally populated DNA ‘breathing’ states at replication fork junctions, whose conformations and relative stabilities are a sensitive function of labeling position. Our recent work studying DNA ‘breathing’ using polarization-sweep single-molecule fluorescence (PS-SMF) microscopy demonstrates that these thermally activated DNA conformers can be measured directly and are transiently populated in the vicinity of a DNA replication fork. Our results indicate that the bases and sugar-phosphate backbones sensed by the (iCy3)2 dimer probes can adopt four quasi-stable local conformations, whose relative stabilities and transition state barriers depend on probe labeling position relative to the ss-dsDNA fork junction. Additionally, the energetics of the (iCy3)2 dimer-labeled ss-dsDNA constructs were shown to be systematically varied by increasing or decreasing salt concentration relative to physiological conditions. The position-dependent distribution of conformational macrostates was shown to be affected by salt concentration in complementary ways.
In this current work, we present the results of a kinetic network model analysis of the same PS-SMF data using a generalized master equation approach in combination with a massively parallel optimization scheme deployed on high-performance computing cluster resources, which is the topic of a forthcoming article. We quantify the thermodynamic minima of all conformational states and the activation barrier heights connecting states within each unique kinetic network, under all experimental conditions examined. The combination of our experimental and analytical approach can be used to sensitively study the mechanism of protein-DNA complex recognition, assembly, and function, with single-molecule sensitivity and microsecond time resolution, allowing for full quantification of the free energy landscape governing fundamental biological processes in replication, repair and beyond. The results of our kinetic network model analysis reveal that the thermodynamically most stable macrostate is typically B-form DNA, with the secondarily most stable macrostate being attributed to a left-handed conformation that is supported by ensemble studies using circular dichroism measurements [24]. Recent computational studies and previous ensemble experimental studies suggest that local Hoogsteen base-pairing conformations can persist as the thermodynamically most stable macrostate at sites where nucleic acids have undergone methylation [86], [87]. These Hoogsteen base-pairing interactions form through a flipping out of a base-pair from the double-stranded region, followed by rotation of that base-pair about its glycosidic bond, to eventually form a Hoogsteen base-pair, where the pucker of the sugar-phosphate backbone can undergo a conformational transition depending on the site and identity of the methylated base-pair [86].
However, even in the absence of methylation, Hoogsteen base-pairs between complementary nucleic acids still appeared in the free energy surfaces of the same simulations. We hypothesize that such a Hoogsteen base-pairing interaction is responsible for the left-handed state that we observe in our experiments. This phenomenon is diagrammatically captured in Fig. 4.10. Altering the salt concentration reorganizes the thermodynamic stability of each conformational macrostate and in some cases results in a re-ordering of the most stable conformational macrostate. The thermodynamic transition barriers obtained from our analysis are modulated by the concentration of salt in a manner which supports the inferred structural assignments given to the four conformational macrostates, based on varying degrees of hydrogen bonding and base-stacking interactions in the local vicinity of (iCy3)2 dimer probes that label the ss-dsDNA fork junction. We use the culmination of these results obtained from our model simulations to postulate structural details of each macrostate and offer a proposed mechanism for the interconversion of each conformer within the optimized kinetic network of all experimental data sets.

3. MATERIALS AND METHODS

All experimental data analyzed in this paper were obtained using the methodologies described in [120]. For the analysis that is presented here, the theoretical framework employed has been previously used to successfully obtain free energy landscape parameters from single-molecule FRET data [14]. A detailed description of the theoretical framework and computational optimization pipeline is the subject of Chapter 3. The simulation and optimization of kinetic network models was carried out in MATLAB using the combination of a home-built genetic algorithm and customized mesh-grid refinement based on MATLAB’s ‘PatternSearch’ toolbox.
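The simulation step for a single kinetic network can be illustrated with a short sketch. The Python example below is not the MATLAB routine used in this work. It assumes a standard first-order master equation, dp/dt = Kp, and computes the equilibrium populations and a two-point time correlation function of a per-state observable. The function names, the two-state network, its rate constants and the observable values are hypothetical and chosen only for illustration.

```python
import numpy as np

def rate_matrix(rates, n_states):
    """Build K for dp/dt = K @ p from a dict {(i, j): k_ij} of i -> j rate constants."""
    K = np.zeros((n_states, n_states))
    for (i, j), k in rates.items():
        K[j, i] += k  # population gained by state j from state i
        K[i, i] -= k  # population lost from state i
    return K

def equilibrium(K):
    """Equilibrium populations: the eigenvector of K with eigenvalue closest to zero."""
    evals, evecs = np.linalg.eig(K)
    p = np.real(evecs[:, np.argmin(np.abs(evals))])
    return p / p.sum()

def propagator(K, tau):
    """Conditional probabilities P[j, i] of finding state j at time tau given state i at 0."""
    evals, evecs = np.linalg.eig(K)
    return np.real(evecs @ np.diag(np.exp(evals * tau)) @ np.linalg.inv(evecs))

def two_point_tcf(K, observable, taus):
    """C2(tau) = sum_ij dA_j P[j, i](tau) p_eq_i dA_i for a per-state observable A."""
    A = np.asarray(observable, dtype=float)
    p_eq = equilibrium(K)
    dA = A - p_eq @ A  # fluctuation about the equilibrium mean
    return np.array([dA @ propagator(K, t) @ (dA * p_eq) for t in taus])

# Hypothetical two-state network with rate constants in 1/s.
K = rate_matrix({(0, 1): 2.0, (1, 0): 1.0}, n_states=2)
taus = np.array([0.0, 0.1, 1.0])
print(equilibrium(K))
print(two_point_tcf(K, [0.0, 1.0], taus))
```

An optimizer would adjust the entries of `rates` until the simulated equilibrium distribution and correlation functions match the measured ones. The three-point correlation function follows the same pattern with two propagators applied in sequence.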
Our MATLAB routine is intended to simulate a single network connectivity, from the set of all possible network connectivities, for any given data set. In order to scale our optimization to meet the demands of simultaneously accounting for many different network connectivities, across many distinct data sets, we developed a Linux Bash routine that is capable of deploying thousands of model optimizations simultaneously on a high-performance computing cluster. By removing the bottleneck on the optimization of many model networks, each with large parameter spaces defining them, we were able to obtain the globally optimal kinetic network that best explains the observed macrostate probability distribution and conformational dynamics at DNA replication forks. To achieve robust optimized fits across all data surfaces in our routine, as shown in Fig. 4.1, we devised a nonlinear chi-squared weighting scheme which accounts for the disparity in magnitude, noise and data point density across the three data surfaces. The details of this scheme are found in Chapter 3.

4. RESULTS AND DISCUSSION

4.1 Positional dependence of DNA replication fork free energy landscapes

The optimized kinetic network outcomes under physiological salt conditions ([NaCl] = 100 mM, [MgCl2] = 6 mM) at the +1, -1 and -2 labeling positions are shown in Fig. 4.1 and superimposed onto the experimental data in each panel. The fit outcomes are shown in dashed grey, dashed red and dotted red for the probability distribution function, two-point TCF and three-point TCF, respectively. We emphasize that the quality of fits shown is high and significantly better than the dozens, sometimes hundreds, of lower quality model outcomes, which are not shown for the sake of brevity. Fig. 4.1 illustrates the positional dependence of the four conformational macrostates’ thermodynamic stability, obtained from optimizing a kinetic network to our experimental data.
Before discussing the relative stabilities and making inferred structural assignments to the macrostates depicted in Fig. 4.1, we briefly review the experimental and theoretical basis for such assignments. Previous studies of DNA ‘breathing’, ensemble studies of average structure, as well as conventional knowledge about the thermodynamics of nucleic acid chemistry offer a handful of structural motifs which can be used to infer structural assignments of our conformational macrostates [2], [3], [24], [72]. Fig. 4.2 shows a logical progression of DNA ‘breathing’ modes which can occur over a range of temperatures, or salt concentrations.

Figure 4.1 Optimized fits from our kinetic network analysis for the position dependence of DNA ‘breathing’ under physiological salt conditions. Panels A, B, and C are for the +1, -1 and -2 positions, respectively.

Figure 4.2 Depiction of the various modes and extents of DNA ‘breathing’ one might expect (taken from [1]). Panel (a) shows fully stable B-form DNA. Panel (b) depicts a breathing mode where the bases lift off one another. Panel (c) depicts a breathing mode where the bases break their complementary hydrogen bonding interaction. Panel (d) is a pre-melting transition where a mixture of base-unstacking and base-unpairing is observed. Panel (e) is a fully melted case, where the two strands have completely unannealed.

In panel A of Fig. 4.2 the canonical B-form DNA is depicted, characterized by fully stacked and hydrogen-bonded base-pairs. In panel B the bases have lifted off one another, disrupting the stacking interactions. In panel C, the bases have broken their complementary hydrogen bonding such that the duplex is opened up to the surrounding environment. In panel D some combination of base unstacking and loss of hydrogen bonding is depicted, leading to a highly disordered and unstable conformation with many bases flipping outside the duplex.
Finally, panel E depicts a nearly fully melted structure which has lost all resemblance to dsDNA. The energetics of each possibility in this series have been addressed in detail, while also accounting for the role of water in the entropy of the system, through a series of calorimetric studies [72]. Treating these various degrees of freedom as energetically distinct structural fluctuations which perturb the arrangement of the sugar-phosphate backbone, we can begin to make assignments to our kinetic network results obtained on DNA replication fork junctions. Starting with the +1 position under physiological conditions, we attribute the state 𝑆2 to B-form DNA, which is most stable at the +1 position and is the expected predominant conformer in duplex DNA. The state 𝑆3 is secondarily most stable, which we attribute to a left-handed conformation, consistent with ensemble studies as well as recent computational studies on the role of Hoogsteen base-pairing [24], [86]. The remaining two states 𝑆1 and 𝑆4 are relatively unstable compared to 𝑆2 and 𝑆3 at the +1 position but remain essential to explaining the observed conformational dynamics in the context of a generalized master equation. The low visibility state 𝑆1 is attributed to a propped-open state at the junction, where the loss of complementary hydrogen bonding leads to a locally open conformation that is consistent with the loss of excitonic coupling between the two Cy3 monomers, yielding a low visibility signal. Finally, 𝑆4 is attributed to the bases surrounding the junction lifting off one another, thereby disturbing base-stacking interactions and leading to a more parallel alignment of the sugar-phosphate backbone that is consistent with a stronger polarized response in our Cy3 dimer probes and a higher visibility value in our model.
It is interesting to note that the equilibrium probability of 𝑆1 exceeds that of 𝑆4 at all positions under physiological conditions, consistent with the notion that base-stacking interactions rather than base-pairing interactions are the predominant enthalpic contribution to duplex stabilization in dsDNA [72], [88]. At the -1 and -2 positions the thermodynamically most stable state shifts to 𝑆3 from 𝑆2, as seen in panels B and C of Fig. 4.1. As previously mentioned, we assign the state 𝑆3 to a left-handed conformation. This redistribution in the equilibrium probabilities of the two most stable, and oppositely handed, states is consistent with ensemble studies of (iCy3)2 dimer-labeled fork constructs, where the handedness of the (iCy3)2 dimer probe was shown to invert from right- to left-handed, moving across the junction from the +1 to -1 position [24]. Notably, 𝑆2 becomes the secondarily most stable state at the -1 and -2 positions, causing the relative stability of 𝑆2 and 𝑆3 in the duplex to become the mirror image of the relative stabilities of 𝑆2 and 𝑆3 outside the junction in the single-stranded region. The reduced stability of 𝑆1 and 𝑆4 at the -1 position, relative to both the +1 and -2 positions, suggests that the free energy landscape at the -1 position is unique and dominated by mostly two states. The uniqueness of the -1 position in our analysis is in agreement with previous ensemble measurements made on DNA systems containing base analogues, which showed that the degree of base-stacking was greatly reduced at the -1 position relative to all neighboring positions [85]. As was noted in the initial work on which these analyses are based, the dynamical trends in the two-point and three-point TCFs are expected given the differential stabilities of stacked bases within dsDNA versus ssDNA regions spanning the ss-dsDNA fork junction [120]. The free energy surfaces under physiological conditions for the +1, -1, and -2 positions, calculated using Eqn.
21, are shown in Fig. 4.3. The diagrammatic depiction of networks composed of states 𝑆1 through 𝑆4 illustrates the microscopic rate constants determined from our kinetic network analysis, which correspond to timescales on the order of microseconds to seconds. Similarly, it can be appreciated that many of the transition barriers between conformational macrostates are within the range of ~2-6 𝑘𝐵𝑇, with only a few of the rarer transitions having higher transition barriers. The optimization results for all studies are summarized in Tables 6, 7 and 8.

Figure 4.3 Free energy landscapes and associated kinetic networks for each of the positions examined under physiological salt conditions. Microscopic rate constants are shown and labeled by the state-to-state transitions they correspond to.

A summary of the thermodynamic and mechanical stability at each position studied under physiological conditions is presented in Fig. 4.4. Panel A of Fig. 4.4 summarizes the free energy minima associated with each of the four conformational macrostates as a function of position. Panels B and C of Fig. 4.4 show the minimum transition barrier height out of each macrostate, at each position. Panels B and C separate the minimum barrier heights for states 𝑆2, 𝑆3 and 𝑆1, 𝑆4, respectively. By plotting the minimum transition barrier height as a function of labeling position, for each macrostate, the mechanical stability of each underlying conformer can be better understood. The mechanical stability of 𝑆2 is significantly higher than any of the remaining macrostates at the +1 position, suggesting that transitions out of 𝑆2 are unlikely to occur compared to all other transition pathways (Fig. 4.4 Panel B). The mechanical stability of 𝑆3 is relatively low at the +1 position, but steadily increases across the junction to reach a maximum at the -2 position. Simultaneously, the mechanical stability of 𝑆2 decreases moving outward across the junction, reaching a minimum at the -2 position.
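The two stability measures used in this summary, the free energy minimum of each macrostate and the minimum barrier height out of it, can be illustrated with a short sketch. Eqn. 21 is defined earlier in this dissertation and is not reproduced in this section, so the Python example below assumes the common Boltzmann relation between equilibrium populations and free energies and an Arrhenius-type relation between rate constants and barrier heights, which may differ in detail from Eqn. 21. The function names, populations, rate constants and the `prefactor` value are hypothetical placeholders rather than results of this work.

```python
import numpy as np

def relative_free_energies(p_eq):
    """Free energy minima in units of k_B T, referenced to the most populated state
    (assumed Boltzmann relation: G_i = -ln(p_i / p_max))."""
    p_eq = np.asarray(p_eq, dtype=float)
    return -np.log(p_eq / p_eq.max())

def min_exit_barriers(rates, prefactor):
    """Lowest barrier (k_B T) out of each state, assuming k_ij = prefactor * exp(-dG_ij).

    rates is a dict {(i, j): k_ij} of i -> j rate constants in 1/s."""
    barriers = {}
    for (i, _j), k in rates.items():
        dg = np.log(prefactor / k)
        # The fastest exit route has the lowest barrier and limits the state's lifetime.
        barriers[i] = min(dg, barriers.get(i, np.inf))
    return barriers

p_eq = [0.10, 0.60, 0.25, 0.05]                       # hypothetical populations of four states
rates = {(0, 1): 1e3, (0, 2): 1e1, (1, 0): 1e2}       # hypothetical rate constants (1/s)
print(relative_free_energies(p_eq))
print(min_exit_barriers(rates, prefactor=1e6))        # hypothetical prefactor (1/s)
```

In this picture a macrostate can be thermodynamically stable (low free energy) while remaining mechanically fragile (low minimum exit barrier), which is why the two quantities are summarized separately.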
Notably, the mechanical stabilities of 𝑆2 and 𝑆3 are in close competition at the -1 position, indicating that more rapid interconversion of the two conformers is occurring, further supporting previous studies which suggested a uniqueness of the conformational landscape at the -1 position compared to neighboring positions [85]. The mechanical stabilities of 𝑆1 and 𝑆4 are significantly lower on average than those of either 𝑆2 or 𝑆3. The mechanical stability of 𝑆1 drops off moving outward across the junction, while 𝑆4 becomes more mechanically stable moving from the +1 to the -1 position before returning to a level of mechanical stability at the -2 position which reflects the +1 position (Fig. 4.4 Panel C).

Figure 4.4 Summary diagrams for the thermodynamic and mechanical stability of the +1, -1 and -2 positions under physiological salt conditions. Panel A illustrates the free energy minima associated with each of the four conformational macrostates as a function of probe position. Panels B and C depict the minimum transition barrier governing the mechanical stability of each macrostate, as the probe position is varied.

The continuous decrease in mechanical stability under physiological conditions of 𝑆2, moving outward across the junction, is consistent with the assignment of 𝑆2 to B-form DNA, where the degree of complementary base pairing decreases moving from the +1 to -2 positions in our model replication fork junctions. The steady increase in the mechanical stability of 𝑆3 can be understood from the loss of base-stacking interactions in the single-stranded region that occurs moving outward across the junction. This loss of an enthalpic penalty from the disruption of base-stacking interactions tends to stabilize the formation of an unstacked, left-handed conformer at the -1 and -2 positions [72], [82], in which a Hoogsteen base-pair has formed, relative to the predominantly B-form conformer seen in the duplex [86].
The decreasing mechanical stability of 𝑆1 across the junction is consistent with the loss of enthalpically favored hydrogen-bonded base-pairs and base-stacking interactions at both nearest neighbor positions that would tend to stabilize a locally propped-open conformation. The trend in mechanical stability for 𝑆4 is perhaps most difficult to interpret. At the +1 and -2 positions, the mechanical stability of 𝑆4 is quite low, consistent with the disruption of base-stacking interactions in the duplex being the largest enthalpic penalty to overcome during a conformational fluctuation. The sharp rise in the mechanical stability of 𝑆4 at the -1 position must be due to some unique interplay between the enthalpic penalty of disrupting base-stacking interactions in the duplex and a higher degree of conformational and solvent entropy upon formation of the thermodynamically unstable 𝑆4 macrostate.

4.2 Salt dependence of DNA replication fork free energy landscapes

The optimized outcomes for the salt dependent studies at the +1 and -2 positions are shown in Fig. 4.5 and Fig. 4.7, Panels A-D, with the same color coding as described for the positional dependence of Fig. 4.1. We carried out four unique sets of salt dependent experiments, at both the +1 and -2 positions, with variable NaCl and MgCl2 concentrations, namely, [NaCl] = 100 mM, [MgCl2] = 6 mM; [NaCl] = 20 mM, [MgCl2] = 6 mM; [NaCl] = 20 mM, [MgCl2] = 0 mM; [NaCl] = 300 mM, [MgCl2] = 6 mM. This spanning set of conditions is intended to address the role of elevated and depleted monovalent ions as well as bivalent ions, as they influence the propensity for conformational fluctuations to occur at replication fork junctions. Notably, few details are currently available about how salt concentration affects the equilibrium distribution of conformational macrostates at and near ss-dsDNA fork junctions, or the transition barriers that mediate their interconversion.
Figure 4.5 Optimized fits from our kinetic network analysis for the salt dependence of DNA ‘breathing’ at the +1 position. Panels A-D are labeled by the salt condition they represent.

Fig. 4.5, Panels A-D, show the analysis results for salt dependent effects on the conformational macrostates observed at the +1 position, across all four sets of conditions. The analysis results for physiological salt concentrations ([NaCl] = 100 mM, [MgCl2] = 6 mM) at the +1 position were already discussed in section 4.1. Reduction of monovalent salt concentration ([NaCl] = 20 mM, [MgCl2] = 6 mM) results in the slight destabilization of the B-form macrostate 𝑆2 and an increase in the thermodynamic stability of the locally propped-open macrostate 𝑆1, relative to physiological salt conditions. Previous ensemble studies of DNA melting temperature curves as a function of salt concentration suggest a general destabilization of DNA at lower salt concentrations, consistent with a higher propensity for non-B-form conformers as salt concentration is decreased in our measurements [80], [89], [90]. The thermodynamic stability of 𝑆3 is decreased by the loss of monovalent salt at the +1 position, as is the thermodynamic stability of 𝑆4. The loss of thermodynamic stability of 𝑆3 and 𝑆4 at decreased monovalent salt is true at both the +1 and -2 positions (Fig. 4.5 Panel C, Fig. 4.7 Panel C), suggesting a role played by monovalent salt in the stabilization of the left-handed, bases unstacked, Hoogsteen base-paired 𝑆3 conformer as well as the right-handed, bases unstacked, 𝑆4 conformer. Given our proposition that both 𝑆3 and 𝑆4 are mediated by alteration of the base-stacking interactions, one would expect the enthalpic penalty for unstacking bases to be mutually affected by the loss of monovalent salt for either handedness of the (iCy3)2 dimer probe, both in the duplex and single-stranded regions.
The subsequent loss of magnesium under already low monovalent salt conditions ([NaCl] = 20 mM, [MgCl2] = 0 mM) leads to the thermodynamic stability of 𝑆3 and 𝑆4 being restored to levels which are similar to physiological salt conditions, while 𝑆1 experiences its global free energy maximum across all conditions and constructs examined. It is thought that magnesium plays a special role in the stabilization of key structures in DNA, with a significantly higher degree of sensitivity in the melting temperature dependence of DNA due to alterations in magnesium concentration than sodium or potassium concentration [80]. In contrast to the effects of lowering monovalent salt, where the stability of conformers mediated by unstacked bases was decreased, the effect of lowering bivalent salt appears to raise the stability of 𝑆3 and 𝑆4, suggesting that monovalent and bivalent salts, in mediating base-stacking interactions, have opposite effects on free energy at the +1 position.

Raising the concentration of monovalent salt at the +1 position relative to physiological conditions ([NaCl] = 300 mM, [MgCl2] = 6 mM) causes the thermodynamic minimum of the free energy landscape to shift towards 𝑆3, while 𝑆4, 𝑆2 and 𝑆1 all become thermodynamically less stable. Interestingly, a fifth macrostate, 𝑆5, emerges under elevated salt concentrations. This additional macrostate is minimally necessary to explain the results of our PS-SMF experiments using a kinetic network analysis. While the apparent conformational heterogeneity at the +1 position is increased by raising the concentration of monovalent salt, the free energy of 𝑆3 is lower and further removed from that of neighboring macrostates than at any of the remaining salt concentrations examined at the +1 position.
The greater stability of 𝑆3 compared to all other macrostates present at the +1 position, under elevated monovalent salt, is consistent with increased stability of duplex DNA during both melting curve and force pulling experiments of DNA under elevated monovalent salt [80], [88]. The appearance of an additional conformational macrostate at the +1 position under elevated monovalent salt could simply be the result of 𝑆5 dwell times becoming long enough to be experimentally accessible by the range of integration times available to our PS-SMF experiment [120]. This is consistent with the slowest overall dynamics observed at the +1 position being under elevated monovalent salt (Fig. 4.5 Panels A-D), as well as the emergence of a fifth macrostate at the -2 position under decreased salt and the loss of magnesium, where similarly the slowest dynamics are observed across all conditions explored at the -2 position (Fig. 4.7 Panels A-D). The complete free energy landscapes for the salt dependence of the +1 position are shown in Fig. 4.6, Panels A-D, along with the timescales of state-to-state transitions in the optimized kinetic networks. A convenient summary of the thermodynamic and mechanical stability of conformational macrostates at the +1 position is given in Fig. 4.9, Panels A-C.

Figure 4.6 Free energy landscapes and associated kinetic networks for each of the salt conditions examined at the +1 position. Microscopic rate constants are shown and labeled by the state-to-state transitions they correspond to.

The salt induced effects on the mechanical stability of the +1 position are summarized in Fig. 4.9, Panels B and C. Raising the concentration of monovalent salt at the +1 position ([NaCl] = 300 mM, [MgCl2] = 6 mM) tends to increase the mechanical stability of 𝑆3 and 𝑆4, the conformers mediated by base-stacking interactions.
In contrast, the mechanical stability of 𝑆1 and 𝑆2, the conformers mediated by hydrogen bonding of complementary base-pairs within our model, is decreased under these conditions. Lowering the concentration of monovalent salt away from physiological conditions at the +1 position ([NaCl] = 20 mM, [MgCl2] = 6 mM) has the same effect on the mechanical stability of each macrostate as raising the concentration: the mechanical stability of 𝑆3 and 𝑆4 is increased and the mechanical stability of 𝑆1 and 𝑆2 is decreased. This sort of interplay is reminiscent of the concept of cold denaturation, where the optimal conditions for the thermodynamic and mechanical stability of biologically relevant conformations occur precisely at physiological conditions, and any alteration to those conditions tends to destabilize the system [72]. The loss of bivalent salt, at already low monovalent salt concentration ([NaCl] = 20 mM, [MgCl2] = 0 mM), causes the mechanical stability of each conformer at the +1 position to revert toward the picture seen under physiological conditions. This trend in mechanical stability echoes the similar restoration of relative thermodynamic stabilities at the +1 position upon loss of bivalent salt, at already depleted monovalent salt concentrations. It is notable that the mechanical stabilities of 𝑆3 and 𝑆4, and of 𝑆1 and 𝑆2, are modulated in equal and opposite ways by altering salt concentration at the +1 position, supporting the notion that the enthalpic and entropic contributions to the intramolecular interactions mediating these pairs of conformational macrostates are interrelated.
Fig. 4.7, Panels A-D, shows the analysis results for salt dependent effects on the conformational macrostates observed at the -2 position, across all four sets of conditions. Physiological salt conditions ([NaCl] = 100 mM, [MgCl2] = 6 mM) at the -2 position were previously described in section 3.1.
Decreasing the concentration of monovalent salt at the -2 position ([NaCl] = 20 mM, [MgCl2] = 6 mM) causes the thermodynamically most stable macrostate to shift away from 𝑆3 toward 𝑆2. Much like the effects observed at the +1 position, the loss of monovalent salt tends to destabilize the states 𝑆3 and 𝑆4, which are governed by base-stacking interactions in our model. The stability of the propped-open 𝑆1 macrostate is decreased by the loss of monovalent salt at the -2 position. This destabilization occurs in an opposite manner to the +1 position, suggesting that the salt induced instability of base-pairing interactions has opposite effects on the duplex and single-stranded regions, much like the hypothesized mechanism outlined in our initial work [120].
Figure 4.7 Optimized fits from our kinetic network analysis for the salt dependence of DNA ‘breathing’ at the -2 position. Panels A-D are labeled by the salt condition they represent.
The loss of bivalent salt at lowered monovalent salt concentrations ([NaCl] = 20 mM, [MgCl2] = 0 mM) causes the thermodynamic stability of 𝑆1, 𝑆3 and 𝑆4 to all increase while 𝑆2 remains the thermodynamic minimum. This change brings the relative thermodynamic stability of each conformational macrostate into closer agreement with what was observed under physiological conditions, much like the results of losing bivalent salt at the +1 position, albeit with slightly less strict correspondence between the two salt concentration regimes at the -2 position. Interestingly, a fifth macrostate is required to explain these experimental results at the -2 position, much like the results of elevated monovalent salt at the +1 position. As previously mentioned, these two salt concentration regimes, at each respective position, correspond to the slowest overall dynamics observed across all conditions investigated.
Finally, raising monovalent salt concentration at the -2 position ([NaCl] = 300 mM, [MgCl2] = 6 mM) also causes the thermodynamic minimum to shift away from 𝑆3 toward 𝑆2, further suggesting that the role played by salt parallels the effects seen in cold denaturation. Under these same conditions, 𝑆1 is slightly stabilized while 𝑆4 is destabilized. At the -2 position, increasing monovalent salt tends to stabilize conformers mediated by hydrogen bonding, 𝑆1 and 𝑆2, while destabilizing conformers mediated by base-stacking interactions, 𝑆3 and 𝑆4. The free energy landscapes and timescales connecting macrostates within each kinetic network are shown in Fig. 4.8 for the -2 position salt series results.
Figure 4.8 Free energy landscapes and associated kinetic networks for each of the salt conditions examined at the -2 position. Microscopic rate constants are shown and labeled by the state-to-state transitions they correspond to.
The salt induced effects on the mechanical stability of the -2 position are summarized in Fig. 4.9, panels E, F. On average, the mechanical stability of the -2 position is modulated to a lesser extent than the +1 position. One might expect the mechanical stability of conformational macrostates to be less sensitive to the alteration of salt in the single-stranded region (-2 position), as the conformational landscape is already quite disordered and more metastable than duplex DNA (+1 position). Raising the concentration of monovalent salt at the -2 position ([NaCl] = 300 mM, [MgCl2] = 6 mM) tends to increase the mechanical stability of 𝑆3 and 𝑆4, the conformers mediated by base-stacking interactions, a trend similar to that observed at the +1 position. The mechanical stability of 𝑆1 and 𝑆2, the conformers mediated by hydrogen-bonding interactions, is decreased by increasing the concentration of monovalent salt, again reflecting the trend observed at the +1 position under elevated monovalent salt.
Decreasing the concentration of monovalent salt at the -2 position ([NaCl] = 20 mM, [MgCl2] = 6 mM) tends to increase the mechanical stability of 𝑆4 and decrease the mechanical stability of 𝑆1, while leaving the mechanical stability of 𝑆2 and 𝑆3 relatively unchanged. The decrease in mechanical stability of 𝑆1, as well as the increased mechanical stability of 𝑆4, agree with the salt dependent results at the +1 position. This suggests that the mechanisms governing changes to the enthalpy and entropy of conformational changes between these macrostates are similar in the duplex and single-stranded regions upon lowering monovalent salt, whereas the same does not appear to be true for the mechanical stability of 𝑆2 and 𝑆3 at the -2 position. The loss of bivalent salt ([NaCl] = 20 mM, [MgCl2] = 0 mM) causes the mechanical stability of 𝑆2 and 𝑆3 to sharply increase, while the mechanical stability of 𝑆1 and 𝑆4 take on their global minima at the -2 position. Again, two of the four conformational macrostates, namely 𝑆2 and 𝑆4, see changes in their mechanical stability which reflect the trends observed at the +1 position upon loss of magnesium, while the remaining two macrostates, 𝑆1 and 𝑆3, appear to be differentially affected by the loss of bivalent salt compared with the results obtained in the duplex.
Figure 4.9 Summary diagrams for the thermodynamic and mechanical stability of the +1 and -2 positions under a series of salt conditions. Panels A and D illustrate the free energy minima associated with each of the four conformational macrostates as a function of probe position (+1, -2). Panels B, C, E and F depict the minimum transition barrier governing the mechanical stability of each macrostate, as the salt concentration is varied, at the +1 (B and C) and -2 (E and F) positions, respectively.
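As a minimal sketch of how relative free energy minima such as those summarized in Fig. 4.9 relate to macrostate populations, the following Python snippet applies a Boltzmann inversion, G_i = -RT ln(p_i / p_max), to a set of equilibrium populations. The populations, the temperature and the function name are illustrative assumptions; they are not values or code taken from the analysis pipeline described in this work.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 0.0019872041


def relative_free_energies(populations, temperature_k=298.15):
    """Free energy of each macrostate relative to the most populated one (kcal/mol).

    populations: dict mapping a macrostate label to its equilibrium population.
    The values are renormalized, so unnormalized occupancies are acceptable.
    """
    total = sum(populations.values())
    if total <= 0.0:
        raise ValueError("populations must sum to a positive number")
    probs = {state: p / total for state, p in populations.items()}
    p_max = max(probs.values())
    rt = R_KCAL_PER_MOL_K * temperature_k
    # States with zero population have no finite free energy and are skipped.
    return {s: -rt * math.log(p / p_max) for s, p in probs.items() if p > 0.0}


# Hypothetical populations, chosen only to illustrate the calculation.
example_populations = {"S1": 0.10, "S2": 0.30, "S3": 0.50, "S4": 0.10}
for state, g in sorted(relative_free_energies(example_populations).items()):
    print(f"{state}: {g:.2f} kcal/mol above the minimum")
```

In this convention the most stable macrostate sits at zero, so a shift of the thermodynamic minimum between salt conditions appears as a change in which state carries the zero value.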
The general trends in the thermodynamic and mechanical stability of conformational macrostates at the +1 and -2 positions under a variety of salt conditions suggest that similar interactions mediate 𝑆1 and 𝑆2 as well as 𝑆3 and 𝑆4. Notably, in most optimized kinetic networks, these pairs of states (𝑆1 and 𝑆2, 𝑆3 and 𝑆4) are directly connected as nearest neighbors in the layout of the network, suggesting that transitions between these conformers are regularly observed. As previously stated, we propose that 𝑆1 is a right-handed, locally propped-open conformation and 𝑆2 is B-form DNA. We also propose that 𝑆3 is a left-handed, bases-unstacked, Hoogsteen base-paired conformation and 𝑆4 is a right-handed, bases-unstacked conformation. Using the insights gained from the results of our kinetic network analysis, we can propose a structural model to accompany the free energy landscape governing the conformational fluctuations we observe at replication fork junctions in ss-dsDNA.
5. CONCLUSION
5.1 Proposed structural model of conformational macrostates at replication fork junctions
With the results of our kinetic network model analysis, we may now begin to fill in a proposed structural model for the conformational macrostates occurring in the vicinity of a ss-dsDNA replication fork junction. Our analysis suggests that four thermodynamically stable macrostates persist under physiological conditions inside and outside of the junction. The assignments given on a structural basis are depicted in Fig. 4.10, with each position examined in this study shown separately to emphasize the structural differences from the duplex side out into the single-stranded region. Furthermore, our analysis suggests that two of these macrostates dominate the free energy landscape, namely 𝑆2, B-form DNA, and 𝑆3, a left-handed Hoogsteen base-paired conformer [86].
These two macrostates are the most thermodynamically stable within the duplex and single-stranded regions, under all conditions examined. The remaining two macrostates, 𝑆1 and 𝑆4, are present in all data sets with various degrees of occupancy, owing to the trends observed in the thermodynamic and mechanical stability of all macrostates under various salt concentrations. We attribute 𝑆1 to a locally propped-open state where the complementary hydrogen bonds have been broken to allow a bubble to form locally, much like the mechanism of DNA ‘breathing’ first proposed [2]. The final macrostate, 𝑆4, is attributed to a right-handed, bases-unstacked conformation which is often rare in DNA only cases but can be important in the formation of protein-DNA complexes. The structural model depicted in Fig. 4.10 summarizes these results and emphasizes the sensitivity we have to the site-specific labeling position of the DNA using i(Cy3)2 dimer probes. Building off these DNA only studies, we can now begin to examine the effects of protein binding on the conformational landscape of DNA. We can also imagine examining sequence and polarity dependent dynamics in the vicinity of junctions or other such locally important regions of DNA, where the enthalpy and entropy effects of various mixtures of AT versus GC base pairs, base-stacks and locations within the major and minor grooves of DNA can all influence the energetics.
Figure 4.10 Proposed structural model based on the results of our kinetic network analysis of replication fork junctions under numerous solvent conditions. Rows are labeled by position of the i(Cy3)2 dimer probe while columns label the conformation of the sugar-phosphate backbone, as sensed by the probes in each of the sites examined. The central column depicts the mechanism of action for formation of a Hoogsteen base-pair at the replication fork junction.
Table 6 Optimized kinetic network model parameters for the free energy minima and activation barriers (left column) and optimized PDF parameters (right column) for all positions and conditions studied. Blank entries denote connections or states which did not exist in the optimized network models.
Table 7 Optimized kinetic network model parameters for the kinetic rate constants for all positions and conditions studied. Entries with ‘Inf’ denote connections which did not exist in the optimized network models.
CHAPTER 5: APPLICATION OF PS-SMF TO PROTEIN-DNA COMPLEXES
1. OVERVIEW
During this thesis work, numerous collaborations were undertaken between the primary author, Jack Maurer, and other members of the Marcus and von Hippel Lab. The biochemically focused collaborations were primarily carried out with Dr. Patrick Herbert and pertained to the study of key protein-DNA complexes formed during the assembly and function of replisome machinery at ss-dsDNA junctions. The first section of this Chapter aims to introduce the systems studied, detail their biochemical and biophysical importance and then motivate the studies internal to our laboratory, where unique dynamical insights were gained. Some of the information presented here is the topic of a forthcoming research article, still in preparation for journal submission at this time, titled “Investigating local DNA conformational trapping dynamics during DNA polymerase holoenzyme assembly and exonuclease proof-reading activity”. Two protein-DNA complexes were primarily studied by Dr. Patrick Herbert using the PS-SMF method and numerous ensemble approaches. These complexes were the clamp-clamp loader (gp45 clamp, gp44/62 clamp loader) and DNA polymerase holoenzyme (gp43 polymerase).
2. INTRODUCTION
The T4 bacteriophage clamp-clamp loader system is a well-studied, biologically relevant, macromolecular machine which can load the sliding clamp (gp45) onto a DNA p/t junction for eventual complexation with DNA polymerase (gp43). Progressive chromosomal replication relies on the presence of the clamp to achieve recognition and nucleotide addition in the DNA polymerase holoenzyme, with rates of addition that are orders of magnitude faster in the presence of the clamp than in its absence [91]. To achieve this enormous reduction in the time required to make a copy of DNA, the clamp must first be successfully loaded and positioned at a p/t junction, then complex with DNA polymerase to initiate replication. The clamp itself cannot load onto a DNA strand because it is circular and closed in the absence of the clamp loader. Therefore, some conformational rearrangement must first facilitate binding of the clamp-clamp loader complex to DNA, with the clamp at least partially open to allow loading, followed by a second rearrangement in which the clamp closes down onto the DNA strand and the clamp loader departs in preparation for DNA polymerase activity and function. There has been significant effort dedicated to the structure and biochemistry of the clamp-clamp loader system of T4 bacteriophage, as well as the corresponding clamp-clamp loader system of eukaryotes and archaea [91]–[94]. Structural studies have been able to elucidate key conformations in the formation of the clamp-clamp loader complex as well as the fully accommodated clamp on a DNA template strand. However, many of these structural studies have been based on crystallography data and lack the dynamical information needed to report on the kinetics of these processes, while also suffering from the potential bias which crystallization conditions can induce in preferentially producing a particular structure from a slew of possibilities [33].
Similar efforts have been made toward elucidation of the structure and biochemical mechanisms of the DNA polymerase holoenzyme [93]–[97].
The highest quality structure of the T4 bacteriophage clamp-clamp loader system bound to a DNA template was reported in 2011 by Kelch et al. [91]. The insights gained in their study are depicted schematically in Fig. 5.1. The detailed atomistic map produced from their work led to a hypothesized mechanism for the assembly of the clamp, by the clamp loader, onto a DNA template. Step 1 in Fig. 5.1 shows that in the absence of ATP, the clamp loader AAA+ modules cannot organize into a spiral shape. Then in step 2, ATP binding causes the AAA+ modules to form a spiral that can bind and open the clamp. Next, in step 3, the primer-template DNA is threaded through the gaps between the clamp subunits I & III and the clamp loader A and A’ domains. In step 4, DNA binding in the interior chamber of the clamp loader activates ATP hydrolysis. They hypothesize this most likely occurs through flipping of the switch residue and release of the Walker B glutamate seen in their crystal structures. The 5th and final step is ATP hydrolysis at the B subunit, which breaks the interface at the AAA+ modules of the B and C subunits and allows closure of the clamp around primer-template DNA. Further ATP hydrolyses at the C and D subunits dissolve the symmetric spiral of AAA+ modules, thus ejecting the clamp loader because the recognition of DNA and the clamp is broken. The clamp is now loaded onto primer-template DNA and the clamp loader is free to recycle for another round of clamp loading. This postulated mechanism of assembly and function is elegant and physically conceivable, but linking static crystal structures with highly dynamical processes occurring in a solution phase context is difficult at best. To better understand the mechanisms that underlie these critical protein-DNA complexes, Dr.
Patrick Herbert performed experiments on these reconstituted systems in vitro, using a variety of experimental methods.
Figure 5.1 Schematic depiction of the clamp-clamp loader reaction cycle taken from Kelch et al. [91].
Most critically, the recent development of PS-SMF opened the possibility of studying the rapid interconversion of multiple conformational macrostates, local to a DNA p/t junction, with high temporal resolution and exquisite specificity for probe labeling site. The aim of these studies was to uncover the relevant conformations of DNA p/t junctions which are leveraged by the clamp-clamp loader to achieve binding and function. Ultimately, determining the free energy landscape in the presence of the clamp-clamp loader at a p/t junction yields unique information that can aid in the discovery of new mechanistic insights and offer support toward previous structural assignments made on static complexes.
The logical next system to study, given the results obtained on a clamp-clamp loader complex, is DNA polymerase bound to a p/t junction. DNA polymerase can polymerize a copy of DNA from a template strand with enormous accuracy and relatively high speed. This occurs at both the leading and lagging strands of an unwound DNA template, where the leading strand proceeds in a more straightforward fashion while the lagging strand experiences a series of start-stop-stall cycles that leads to the formation of Okazaki fragments. Typically, an incorrect base is incorporated only once per 10^5 to 10^8 nucleotide additions, leading to ultra-high fidelity in the process of DNA replication. A critical feature of this protein-DNA complex is the known switching behavior between two functional modes when bound to DNA, where each mode of operation is structurally distinct [98], [99].
These are the exonuclease mode, where the polymerase rearranges the DNA to perform an important proofreading and editing step to eliminate possible misincorporation of an incorrect nucleotide, and the polymerase mode, where nucleotides are added to the growing DNA copy. This system is a classic example of a macromolecular machine which leverages conformational rearrangements as the basis for the multitude of functions it performs. Additionally, these conformational rearrangements occur on a very local scale within the DNA template, namely at the level of single nucleotides. Owing to the highly local dynamics of DNA polymerase and the logical extension of the clamp-clamp loader work, Dr. Patrick Herbert also obtained PS-SMF data on DNA p/t junctions bound with DNA polymerase, where it was shown that the degree of conformational change could be tuned by the concentration of magnesium in the local solvent environment. This result agrees with previous studies that examined the degree of reversible exo-to-pol switching dynamics using circular dichroism and base analogue probes [99]. Crystal structures of DNA polymerase have been obtained, but structures bound to a DNA template in either the exonuclease or polymerase mode remain elusive. Recent computational studies have offered a hypothetical view of exo-to-pol switching in DNA polymerase, which suggests a kinetic pathway that requires transition times on the order of tens of milliseconds to switch between the two configurations (Fig. 5.2) [98].
Figure 5.2 Proposed kinetic scheme obtained by computational studies of Dodd et al. [100]. The end-to-end transition is between the exo state and pol state of the bound DNA holoenzyme, which contains the clamp as well as DNA polymerase. Notably, the results suggest a net transition time of ~10 ms between the two conformational macrostates.
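To illustrate how a net end-to-end transition time can follow from the microscopic rate constants of a multi-step kinetic scheme, the Python sketch below computes the mean first-passage time along a strictly linear chain of states. The assumption of a linear chain, the function name and every rate constant shown are hypothetical placeholders; none of them are taken from the computational study cited above.

```python
def end_to_end_time(forward, backward):
    """Mean first-passage time (s) from state 0 to state N of a linear chain.

    forward[i]  : rate constant (1/s) for the step i -> i+1, for i = 0..N-1
    backward[i] : rate constant (1/s) for the step i -> i-1, for i = 0..N-1
                  (backward[0] is ignored, since state 0 has no predecessor)
    """
    if len(forward) != len(backward):
        raise ValueError("forward and backward must have the same length")
    total = 0.0
    step_time = 0.0  # mean time needed to advance from state i-1 to state i
    for i, (f, b) in enumerate(zip(forward, backward)):
        if f <= 0.0:
            raise ValueError("every forward rate must be positive")
        # Each back-step from state i costs another pass through i-1 -> i.
        back_cost = (b / f) * step_time if i > 0 else 0.0
        step_time = 1.0 / f + back_cost
        total += step_time
    return total


# Placeholder rates: two irreversible steps of 200 1/s each give 10 ms overall.
print(end_to_end_time([200.0, 200.0], [0.0, 0.0]))
# Allowing the intermediate to fall back lengthens the net transition time.
print(end_to_end_time([200.0, 200.0], [0.0, 200.0]))
```

The recursion makes explicit that reversibility of intermediate steps, and not only the slowest forward step, sets the net time between the two end states.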
Both the clamp-clamp loader and DNA polymerase complexes offer an exciting opportunity to obtain detailed kinetic and thermodynamic information on the local conformational changes driving key functions within the replisome machinery. This Chapter focuses on the PS-SMF results obtained on these complexes. These results were analyzed with a large-scale optimization to determine the kinetic network which best explained the observed thermodynamic behavior. The kinetic network results are largely presented here, drawing occasionally on results obtained by Dr. Patrick Herbert using more traditional ensemble techniques that aid in the explanation and interpretation of single-molecule studies. Single-molecule studies of DNA only samples labeled well within the duplex side of the p/t junctions are also discussed here, as they are foundational to building up an understanding of the conformational landscape that is leveraged by the fully assembled protein-DNA complexes.
3. MATERIALS AND METHODS
Table 8 shows the sequences and nomenclature of the internally (iCy3)2 dimer-labeled p/t-DNA constructs. Oligonucleotide samples were purchased from Integrated DNA Technologies (IDT, Coralville, IA) and used as received. For our absorbance and circular dichroism (CD) measurements, solutions were prepared with sample concentrations of 1 mM and a standard aqueous buffer of 10 mM Tris, 100 mM NaCl, and 6 mM MgCl2, unless otherwise stated. We combined complementary oligonucleotide strands to form (Cy3)2 dimer-labeled p/t-DNA constructs, which contain both ds and ss DNA regions. For ‘duplex’ constructs, the (iCy3)2 dimer probes were positioned deep within the double-stranded region (at the +15 position) of the p/t-DNA construct. For ‘primer-template’ (p/t) DNA constructs, the (iCy3)2 dimer probes were positioned at the +4, +3, +2 and +1 positions relative to the ss–dsDNA junction.
Prior to the experiments, the sample solutions were annealed by heating to 95°C for 3 minutes before they were allowed to slowly cool to room temperature. The expression and purification methods of gp45, gp44/62, and gp32 proteins have been previously reported [101]. T4 Bacteriophage DNA Polymerase gp43 was purchased from Molecular Cloning Laboratories (MCLAB, San Francisco, CA).
Table 8 Base sequences of p/t-DNA constructs used in these studies.
+1 Construct
3’ CTC CCT CGT GTC GTC CGT TCT TGG CTG G(CY3)T TTG GTT GGT TAG GTC TTG TTT TGC CGG TCA 5’
5’ GAG GGA GCA CAG CAG GCA AGA ACC GAC C(CY3)A 3’
+2 Construct
3’ CTC CCT CGT GTC GTC CGT TCT TGG CTG (CY3)GT TTG GTT GGT TAG GTC TTG TTT TGC CGG TCA 5’
5’ GAG GGA GCA CAG CAG GCA AGA ACC GAC (CY3)CA 3’
+3 Construct
3’ CTC CCT CGT GTC GTC CGT TCT TGG CT(CY3) GGT TTG GTT GGT TAG GTC TTG TTT TGC CGG TCA 5’
5’ GAG GGA GCA CAG CAG GCA AGA ACC GA(CY3) CCA 3’
+4 Construct
3’ CTC CCT CGT GTC GTC CGT TCT TGG C(CY3)T GGT TTG GTT GGT TAG GTC TTG TTT TGC CGG TCA 5’
5’ GAG GGA GCA CAG CAG GCA AGA ACC G(CY3)A CCA 3’
+15 Construct
3’ CTC CCT CGT GTC GT(CY3) CCG TTC TTG GCT GGT TTG GTT GGT TAG GTC TTG TTT TGC CGG TCA 5’
5’ GAG GGA GCA CAG CA(CY3) GGC AAG AAC CGA CCA 3’
All sample chamber preparation, incubation times and experimental PS-SMF protocols are the same as outlined in Chapter 3 for DNA only studies of replication fork junctions at the +1, -1 and -2 positions. All ensemble data referenced in this Chapter, primarily linear absorbance and circular dichroism (CD) data, were collected using experimental protocols which have been well documented previously [24], [25], [68].
4. RESULTS AND DISCUSSION
Section 4.1 Duplex region studies of DNA p/t junctions
The results of DNA only studies are presented first as the basis for later assignments and inferences in the presence of bound protein assemblies.
DNA constructs labeled with (iCy3)2 dimer probes at the +1, +2, +3, +4 and +15 positions were studied using PS-SMF to obtain thermodynamic and kinetic parameters and to construct free energy landscapes that report on the conformational mechanisms within the duplex region of p/t junctions. The corresponding ensemble studies on these constructs were also performed and are shown below in Fig. 5.3, panels A and B. It can be immediately appreciated from these ensemble studies that the +1 and +2 positions are unique compared to the rest of the duplex positions examined. The steady drop in the CD signal moving from +3 to +1 suggests that the conformational freedom of the sugar-phosphate backbone must be steadily increasing as the junction is approached.
Figure 5.3 Linear absorbance (a) and CD (b) spectra of +1 (red), +2 (orange), +3 (green), +4 (blue) and +15 (black) constructs.
Additionally, the clear shift in the linear absorbance signal from a strongly coupled right-handed signature at the +15 and +4 positions to a less coupled signature at the +1 and +2 positions suggests an increase in dynamics and decrease in relative stability as the junction is approached from the duplex side of these constructs. This inference from the absorbance spectra stems from similarities of the +2 and +1 positions to the -1 position in previous ensemble studies, which is known to be uniquely disordered [1], [24]. The results for DNA only PS-SMF experiments, with optimized fits overlaid, for all positions studied, are given in Figure 5.4, panels A-E. Network topologies and kinetic rate constants are shown below each set of optimized fits.
Figure 5.4 Panel A: Experimental data and optimized fits for the +1 dimer (sequence given in Table 8). PDF, C2, C3, and kinetic network shown.
Figure 5.4 Panel B: Experimental data and optimized fits for the +2 dimer (sequence given in Table 8). PDF, C2, C3, and kinetic network shown.
Figure 5.4 Panel C: Experimental data and optimized fits for the +3 dimer (sequence given in Table 8). PDF, C2, C3, and kinetic network shown.
Figure 5.4 Panel D: Experimental data and optimized fits for the +4 dimer (sequence given in Table 8). PDF, C2, C3, and kinetic network shown.
Figure 5.4 Panel E: Experimental data and optimized fits for the +15 dimer (sequence given in Table 8). PDF, C2, C3, and kinetic network shown.
From the results of these optimized fits, we can construct free energy landscapes and begin to make inferences about the potential structure based on considerations of thermodynamic and mechanical stability. The structural assignments inferred in Chapter 4 will be drawn upon here to make similar conclusions about the conformational macrostates which underlie the visibility assignments in our PS-SMF measurements. Starting at the +1 position, we see a relatively large subpopulation of 𝑆1, 𝑆3 and 𝑆4 relative to 𝑆2. We assign 𝑆2 to B-form DNA, as was done in Chapter 4 and is consistent with the CD results shown for this construct. Comparison of the +1 position in this Chapter to the +1 position examined in Chapter 4 reveals a key difference between them, which is the propensity for 𝑆1 versus 𝑆3. The +1 position in Chapter 4 has a terminal GC base-pair directly at the junction, versus a terminal AT base-pair in the dimer of Chapter 5. The extra hydrogen bond of a GC base-pair is indicative of a greater propensity to maintain complementary base-pairing. As such, 𝑆1, the propped-open state due to loss of complementary hydrogen bonding, is more probable with a terminal AT base-pair than with a GC base-pair. The relative stability of 𝑆3 is lower for the dimer with a terminal AT base-pair than for the GC dimer.
This is likely due to the base-stacking interactions further inside the duplex which act to stabilize most right-handed conformations in the presently considered case versus the case considered in Chapter 4, given the assignment of 𝑆3 to a left-handed conformer. This hypothesis agrees with the increased probability of 𝑆4 in the presently considered +1 dimer, since this state was previously attributed to a right-handed, bases-unstacked conformation. The dynamics at the +1 position are the slowest seen in all DNA only studies of the p/t junctions considered in this Chapter. The +1 position in Chapter 4 also displayed the overall slowest dynamics compared to positions within the vicinity of a replication fork junction (-1 and -2). This general trend seen at the +1 position suggests that a unique free energy landscape governs conformational fluctuations right at the site of ss-dsDNA junctions.
Examination of the +2 position results shows that most of the probability still resides in 𝑆2, the right-handed B-form DNA conformation, consistent with ensemble results. Interestingly, the relative probabilities of 𝑆1, 𝑆3 and 𝑆4 are all reduced compared to the probabilities seen in the duplex at the +3, +4 and +15 positions. This result is consistent with the observed dynamics, since the +2 position displays the most rapid loss of correlation in both the two- and three-point TCF compared to the surrounding positions. The +2 position is also the only dimer-labeled position flanked by GC base-pairs at both nearest neighbor sites. The inability to resolve very rapid interconversion events occurring at the +2 position is likely the reason for this apparent greater stability of 𝑆2 at the +2 position compared to the +3, +4 and +15 positions, all of which have stronger right-handed CD signals.
Notably, the optimized network topologies for the +1 and +2 positions were identical in the presently considered data, despite a blind search of the complete network topology space, suggesting some conservation in the allowed transitions of these conformational fluctuations near p/t junctions. Also notable is the conserved trend in the persistent connection between states 𝑆2 and 𝑆3 as well as 𝑆1 and 𝑆4 in previous and current kinetic network results at both replication fork and p/t junctions.
The results of our kinetic network analysis at the +3 and +4 positions are generally consistent with the emerging picture of conformational dynamics in the duplex region of p/t junctions. State 𝑆2 is thermodynamically most stable, 𝑆1 and 𝑆3 are relatively close in terms of their respective probabilities, and 𝑆4 remains least stable. The local sequence context of the +3 and +4 positions is quite similar (see Table 8), suggesting that the observed dynamics ought to be similar, as well as the propensity for particular conformations to occur. This appears to be the case based on our kinetic network modeling, which suggests a highly similar picture for the +3 and +4 positions in terms of the thermodynamic stability dictating the free energy landscapes at these local sites. This is also in agreement with CD and linear absorbance data on these constructs, which suggest the tightest agreement between the +3 and +4 positions relative to all other pairwise comparisons (Fig. 5.3).
Lastly, the analysis results at the +15 position show a high probability for 𝑆2 relative to all other states, completing the retention of a B-form DNA conformer as the predominant structure at all duplex positions, as the ensemble data would suggest is the case.
The probability of 𝑆1 is greatest at the +15 position, other than 𝑆2, suggesting that the locally propped-open conformation is available deep in the duplex to a greater extent than either of the oppositely handed unstacked base conformations, 𝑆3 and 𝑆4. The dynamics are also very rapid at the +15 position, evidenced by the fast component in the C2 and significant negative decay at 250 μs in the C3. This aligns well with previous hypotheses that the rapid loss and gain of complementary hydrogen-bonding, consistent with a locally propped-open state, is occurring on the microsecond timescale in duplex DNA [1]–[3].
Examining the mechanical stability of these thermally populated conformational macrostates can offer valuable insights into the important mechanistic pathways which yield the observed kinetics. To do this, we will make use of the same approach developed in Chapter 4, where the mechanical stability of each macrostate is discussed in terms of the lowest transition barrier connecting it to any other macrostate in the kinetic network. The mechanical stability of each macrostate can be inferred based on the fastest kinetic rate driving transitions away from the macrostate under consideration. All rates and connections are shown in the kinetic network diagrams of Fig. 5.4 panels A-E, and Table 10 at the end of this Chapter.
Starting at the +1 position, the mechanical stability of the propped-open macrostate 𝑆1 is relatively low and determined exclusively by its connection to 𝑆4. The mechanical stability of B-form DNA 𝑆2, the thermodynamically most stable macrostate throughout the entire duplex, is determined by transitions to 𝑆4, which are an order of magnitude faster than transitions toward 𝑆3 in the presently considered +1 positional data.
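The criterion described above, in which mechanical stability is set by the fastest rate leaving a macrostate, can be sketched in a few lines of Python. The conversion of a rate into a barrier assumes an Arrhenius-type relation, barrier / kBT = ln(A / k), with an arbitrary prefactor A; the prefactor, the function name and all rate constants below are hypothetical and are not the optimized values reported in this Chapter. Because A is shared by all states, the ranking of the macrostates does not depend on its value.

```python
import math


def mechanical_stability(rates, prefactor=1.0e6):
    """Fastest exit pathway and the corresponding lowest barrier for each state.

    rates: dict mapping (source, destination) to a rate constant in 1/s.
    Connections that are absent from a network can simply be omitted; entries
    that are None, non-positive or infinite are ignored.
    prefactor: assumed Arrhenius prefactor in 1/s (an arbitrary choice).
    """
    fastest = {}
    for (src, dst), k in rates.items():
        if k is None or k <= 0.0 or math.isinf(k):
            continue
        if src not in fastest or k > fastest[src][1]:
            fastest[src] = (dst, k)
    return {
        src: {"partner": dst, "rate": k, "barrier_kT": math.log(prefactor / k)}
        for src, (dst, k) in fastest.items()
    }


# Hypothetical rate constants on a four-state network, for illustration only.
example_rates = {
    ("S1", "S4"): 2000.0,
    ("S2", "S3"): 100.0,
    ("S2", "S4"): 1000.0,
    ("S3", "S2"): 200.0,
    ("S4", "S1"): 4000.0,
    ("S4", "S2"): 5000.0,
}
for state, info in sorted(mechanical_stability(example_rates).items()):
    print(f"{state}: exits fastest to {info['partner']}, "
          f"lowest barrier = {info['barrier_kT']:.2f} kBT")
```

A state with a single connection, such as the hypothetical 𝑆1 and 𝑆3 entries here, has its mechanical stability determined exclusively by that connection.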
The presence of a terminal AT base-pair could offer some insight into the relative propensity for 𝑆2 to prefer fluctuations toward the bases-unstacked, right-handed structure of 𝑆4, which is then able to be propped open into 𝑆1. By contrast, previous studies at the +1 position, in which the terminal base-pair was GC, tended to prefer the left-handed 𝑆3 macrostate over the right-handed 𝑆4 or 𝑆1 macrostates. The mechanical stability of 𝑆3 in the presently considered data is slightly higher than that of 𝑆4, but determined exclusively by its transition back toward 𝑆2. The relative mechanical stabilities of all four conformational macrostates at the +2 position are quite similar to those at the +1 position, as is the network topology. The key difference in the two kinetic networks is the reduced mechanical stability of 𝑆4 at the +1 position, which makes very rapid transitions to either 𝑆2 or 𝑆1, causing the relative abundance of 𝑆4 in the PDF to fall off moving from +1 to +2 in the duplex. The mechanical stability of 𝑆2 is also slightly lower at the +2 position, but given the decreased mechanical stability of all non-majority conformers, the relative abundance of 𝑆2 is not altered much.

Moving deeper into the duplex to the +3, +4 and +15 positions, the network topology rearranges slightly compared to the +1 and +2 positions, yielding a new core set of connections between all four macrostates in the system. This shift going from +1/+2 toward deeper duplex sites is seemingly supported by the ensemble CD and absorbance data, which show a systematic difference between the +1/+2 positions and all sites deeper into the duplex (Fig. 5.3). The mechanical stability of 𝑆2 is carefully balanced by its transitions to both 𝑆1 and 𝑆3 at the +3 and +4 positions. 𝑆1 has low mechanical stability in general, with transitions toward either 𝑆4 or back to 𝑆2 being preferable at both the +3 and +4 positions. 𝑆4 is mechanically unstable at both the +3 and +4 positions and is dominated by transitions to 𝑆1.
At the +15 position, the mechanical stability of 𝑆2 is carefully balanced by transitions toward all three remaining macrostates, with subtle differences distinguishing the predominant pathway for a fluctuation to occur. The mechanical stability of 𝑆1 is dominated by transitions toward 𝑆4, similar to 𝑆3, which also prefers transitions toward 𝑆4 over an immediate return to 𝑆2. Fluctuations out of 𝑆4 toward 𝑆3 occur very rapidly, while transitions toward 𝑆2 and 𝑆1 are relatively balanced. Notably, the mechanical stability of 𝑆2 is significantly greater than that of all three remaining macrostates at the +15 position, as would be expected for B-form DNA deep inside the duplex. The analysis of these DNA only data allows us to now consider the effects of protein binding events at p/t junctions, as the free energy landscape is rearranged by the functional action of such macromolecular machines.

Section 4.2 Protein-DNA complex studies at and near DNA p/t junctions

Section 4.2.1 Clamp-Clamp Loader Studies

The PS-SMF experimental results for DNA complexed with the clamp-clamp loader, with optimized fits overlaid, are given in Fig. 5.6. Linear absorption and circular dichroism spectra of the clamp-clamp loader protein-DNA complex are shown in Fig. 5.5. Examination of the ensemble data for the clamp-clamp loader shows that the +1 position is uniquely affected by binding of the clamp, whereas the +3 and +15 positions are only weakly perturbed. Notably, the ensemble remains right-handed at the +1 position in the presence of the clamp loader, suggesting that B-form DNA and other such right-handed conformers remain the dominant species. For these reasons, only the +1 position was investigated in the presence of the clamp-clamp loader complex, given its unique response to binding.
ATPγS was added to the experimental samples for the clamp-clamp loader, as this is known to activate the clamp loader and enable binding of the clamp, but not to allow the clamp loader to fully dissociate at normal rates, owing to the non-hydrolyzable nature of ATPγS. This ultimately enables dynamical studies of the stalled complex at a p/t junction. The clear shift in the CD spectra upon titration of ATPγS suggests that the fully assembled complex forms at the +1 position only when the clamp loader is activated.

Figure 5.5 Linear absorbance and CD spectra of +15 (a, b), +3 (c, d), and +1 (e, f) constructs (red) upon the addition of gp45 clamp (green) and activated gp44/62 clamp loader (blue).

Figure 5.6 Experimental data and optimized fits for the +1 dimer and clamp-clamp loader complex. PDF, C2, C3, and kinetic network shown.

Our kinetic network analysis for the +1 position in complex with the clamp-clamp loader required a minimum of five conformational macrostates to explain the observed dynamics, much as the high monovalent salt conditions at the +1 position, discussed in Chapter 4, required an additional conformational macrostate. The thermodynamically most stable state upon clamp-clamp loader binding is still 𝑆2. It is known that the open clamp adopts a spiral geometry to accommodate the native helical structure of B-form DNA, suggesting that initial loading of the clamp onto a p/t junction should favor the native B-form structure, which our results also suggest [91], [92], [102], [103]. The overall thermodynamic stability of 𝑆1 is lowered at the +1 position by the presence of the clamp-clamp loader. It has been hypothesized, drawing on crystal structures, that ionic interactions within the central channel of the clamp help to stabilize the interaction between charged regions of DNA and the charged interior of the clamp pore [91], [92].
In this case, the ability of the DNA to be propped open at the junction is likely reduced because of the steric constraint placed on the newly encapsulated DNA junction in the presence of the clamp-clamp loader. The thermodynamic stability of 𝑆3 remains close to the +1 position DNA only results, suggesting that the clamp-clamp loader does little to leverage or deplete the interactions leading to formation of a left-handed conformer during its assembly. Most notably, the thermodynamic stability of 𝑆4 is significantly increased in the presence of the clamp-clamp loader. Given this increased stability of 𝑆4, it is likely that this conformational macrostate plays an important role in the formation of the fully bound protein-DNA complex. We have previously assigned 𝑆4 to a right-handed, bases-unstacked conformation, consistent with observed trends in salt-dependent DNA only studies and indicative of a more parallel geometry of the sugar-phosphate backbone, as would be required for a high visibility observation (Eq. 15). Interestingly, previous crystallographic studies have shown that p/t DNA junctions must be threaded through the clamp subunits I & III and the clamp loader A and A’ domains [91] (see Fig. 5.1). This gap between the two proteins forming the complex is relatively small compared to the diameter of B-form DNA. We hypothesize that the enriched signal from 𝑆4 in our PS-SMF experiments, consistent with a right-handed, bases-unstacked conformation, could reflect the conformation selected for in the accommodation of DNA into the clamp-clamp loader complex. Such a conformation would have its sugar-phosphate backbone extended along the central axis of the DNA to a greater extent than B-form, possibly leading to a reduction in the overall diameter of the double helix at the +1 position, such that loading could occur through the clamp subunits I & III and the clamp loader A and A’ domains.
The fifth macrostate, 𝑆5, has an appreciable thermodynamic stability on the order of that of 𝑆3. This higher visibility state could also be involved in the loading process of DNA into the protein complex, as it further corroborates the necessary elongation of the DNA double helical axis and accompanying narrowing of the DNA diameter to fit into the protein complex. Alternatively, the emergence of a fifth macrostate could be due to the slower dynamics in the presence of the clamp-clamp loader, which, much like the elevated monovalent salt conditions at the +1 position in Chapter 4, lead to longer dwell times in the 𝑆5 conformational macrostate, such that it falls within the achievable temporal resolution of our PS-SMF experiments. It is clear from the redistribution of the free energy landscape at the +1 position in the presence of the clamp-clamp loader that 𝑆4 plays an important role in the assembly and function of the overall complex and that 𝑆2, native B-form DNA, remains the most thermodynamically stable, owing to the spiral geometry the clamp is known to adopt upon binding and activation of the clamp loader.

Examination of the mechanical stability of the clamp-clamp loader complex offers unique mechanistic insights. The mechanical stability of B-form DNA, 𝑆2, is dominated by transitions to 𝑆4, the same macrostate which is enriched and seemingly leveraged by complex formation. The mechanical stability of 𝑆4 is similarly dominated by its transition to 𝑆2, demonstrating that the free energy landscape is dominated by transitions between these two conformational macrostates, as evidenced by the relative height of all remaining transition barriers given in Table 9. This rapid interconversion between 𝑆2 and 𝑆4 supports the notion that 𝑆4 could be the p/t junction conformation that the clamp-clamp loader selects for as it tries to dynamically load the DNA into the central pore.
Because the complex is stalled and binds and unbinds transiently, the lower mechanical stability of both 𝑆2 and 𝑆4 could reflect this selection mechanism during binding/unbinding events. The mechanical stability of 𝑆3 is relatively high compared to other macrostates. It could be that this left-handed state forms rarely due to the influence of the clamp-clamp loader, but that intermolecular contacts tend to stabilize it upon formation. Alternatively, the enthalpy-entropy compensation of forming 𝑆3 could be uniquely stabilizing, despite high transition barriers into 𝑆3. The mechanical stability of 𝑆5 is closely balanced by transitions to either 𝑆1 or 𝑆2. Notably, since 𝑆5 is a high visibility state like 𝑆4, its conformation could be similar to that of 𝑆4, but not suitable for binding of the clamp-clamp loader. Interestingly, a similar connection between high visibility states exists in the DNA polymerase data, suggesting a productive versus unproductive route to protein assembly. The mechanical stability of 𝑆1 is low and determined exclusively by its transition back to 𝑆5.

Section 4.2.2 DNA polymerase studies

Figure 5.7 (a) CD spectra of gp43 binding to the +4 p/t construct at zero Mg2+ concentration. Upon addition of physiological Mg2+ (6 mM), gp43 becomes active (green trace). (b) Further increasing the Mg2+ concentration results in a predominantly left-handed conformation at the +4 position.

Ensemble circular dichroism data for the +4 position upon addition of DNA polymerase (gp43) and dNTPs, with increasing levels of titrated magnesium, are shown in Fig. 5.7. The +4 position is strongly right-handed in the DNA only case. The titration of DNA polymerase and dNTPs initially causes a very slight shift in the CD spectra, suggesting some weak association. However, the addition of bivalent salt in the form of magnesium causes the CD spectra to respond sharply and eventually invert to a left-handed conformation at highly elevated levels of magnesium.
This sort of interconversion in DNA polymerase between two oppositely handed conformers is reminiscent of trends observed in the CD spectra of coupled base analogues over a range of calcium concentrations, which tuned the degree of exonuclease versus polymerase activity in the bound complex at a p/t junction [99]. Taking the observed balance between the two active modes of DNA polymerase at a p/t junction as a starting point for our PS-SMF studies, we can identify which conformational macrostates in our model correspond to the exonuclease and polymerase configurations.

Figure 5.8 Experimental data and optimized fits for the +4 dimer and DNA polymerase complex. PDF, C2, C3, and kinetic network shown.

The thermodynamic stabilities of 𝑆1, 𝑆2 and 𝑆3 at the +4 position are all significantly altered by the binding of DNA polymerase, as seen in Fig. 5.8. Macrostates 𝑆4 and 𝑆5 are considerably less stable but necessary to explain the observed dynamics. Additionally, 𝑆4 plays an important intermediate role in the optimized kinetic network, supporting its necessity in the observed dynamics. Given the assignment of 𝑆3 to a left-handed, bases-unstacked state, this is likely the exonuclease mode within the complex. This is supported by a few key pieces of information. First, ensemble CD results show that a left-handed conformation becomes favored as bivalent salt is titrated into solution to drive the DNA polymerase bound p/t junction into the exonuclease pocket (Fig. 5.7). Additionally, crystallographic studies as well as computational studies suggest that the arrangement of DNA in the exonuclease mode is perturbed away from B-form, with the sugar-phosphate backbone being rearranged locally [91], [98], [104].
This local rearrangement is more consistent with the high visibility left-handed conformer 𝑆3 than with the low visibility propped-open conformer 𝑆1, since the crystal geometry of the DNA sugar-phosphate backbone in the exonuclease pocket is far from the 90 degree geometry associated with low signal visibility. The computational studies performed on the DNA holoenzyme suggested a timescale of interconversion between exonuclease and polymerase modes on the order of milliseconds [98]. The states 𝑆1 and 𝑆3 are directly connected in the optimized kinetic network, and both transition directly back to B-form DNA, 𝑆2. The time scale of interconversion between 𝑆1 and 𝑆3 is on the order of 1 ms, making them the most rapidly interconverting pair of macrostates within the network. This, together with the previously mentioned structural considerations, suggests that if 𝑆3 is the exonuclease mode then 𝑆1 is the polymerase mode, leaving 𝑆2 as B-form DNA. In this case, 𝑆4 plays an important role as the intermediate which tends to drive transitions from 𝑆2 into the rapidly interconverting 𝑆3 and 𝑆1. Given our assignment of 𝑆4 as a right-handed, bases-unstacked conformer, it could be that the loss of base-stacking interactions allows the transition to exonuclease activity to proceed without a significant enthalpic penalty from disruption of favorable stacking interactions. Upon forming the exonuclease pocket, the complex rapidly interconverts between its proof-reading and polymerizing functions, until returning to native B-form DNA. Also of note is that 𝑆2 and 𝑆5 rapidly interconvert, but 𝑆5 is not connected with any other states in the network. It could be that 𝑆5 is structurally related to the other high visibility macrostate 𝑆4, with which 𝑆2 also rapidly exchanges, but that 𝑆5 is not a productive structure for the formation of the exonuclease mode.
In this case B-form DNA is rapidly sampling between two closely related conformers, 𝑆4 and 𝑆5, but only leverages 𝑆4 to induce dynamic switching between exonuclease and polymerase modes.

The mechanical stability of a DNA p/t junction when bound to DNA polymerase is highly suggestive of a quasi-one-way loop mechanism. The thermodynamically most stable macrostate, 𝑆2, is mechanically less stable with respect to transitions toward 𝑆5 and 𝑆4 than it is with respect to transitions toward 𝑆1 and 𝑆3, the proposed polymerase and exonuclease modes respectively. As mentioned previously, it could be that 𝑆5 is structurally related to 𝑆4 but differs enough that it proves unproductive in inducing rapid switching between exonuclease and polymerase activity within the complex. When 𝑆2 makes a transition into 𝑆4, the mechanical stability of 𝑆4 barely favors a transition directly back to 𝑆2 over a transition to 𝑆3. However, when 𝑆4 does transition to 𝑆3, the mechanical stability of 𝑆3 is dominated by rapid fluctuations toward 𝑆1, with the mechanical stability of 𝑆1 also being dominated by transitions back toward 𝑆3. Transition barriers back to 𝑆4 from 𝑆3 are significant and tend to cause 𝑆3 and 𝑆1 to rapidly interconvert until transitioning back to 𝑆2, where the cycle begins anew. Notably, 𝑆2 has high transition barriers to directly forming either 𝑆3 or 𝑆1, leading to the formation of a quasi-one-way loop mechanism. In this loop, the intermediate macrostate 𝑆4 is required to induce either of the two active modes of the polymerase starting from B-form DNA, 𝑆2. Additionally, both the direct formation of the active modes (𝑆3 and 𝑆1) from 𝑆2 and backward transitions from active modes to the intermediate 𝑆4 are rare. As a result, conformational fluctuations within the DNA polymerase complex occur mostly in a single direction within the kinetic network once the productive intermediate 𝑆4 successfully forms the exonuclease macrostate.

5. CONCLUSION

The interpretation of these results suggests that the DNA p/t junction at the +1 position rapidly fluctuates between multiple thermodynamically stable macrostates, which are selected for by the clamp-clamp loader complex for binding. The macrostate 𝑆4 appears to be the predominant conformer leveraged by the clamp-clamp loader during binding. Our proposals for the structures of 𝑆2 and 𝑆4 are consistent with previous crystallographic studies, which have pointed out the small channel through which DNA must pass to load into the central pore of the clamp-clamp loader complex, as well as the spiral geometry that the clamp itself adopts upon activation by the clamp loader to fit the helical backbone of B-form DNA [91], [92], [102]. The free energy landscape of the +1 position is significantly rearranged by the action of the clamp-clamp loader, suggesting that observed structural changes in ensemble CD and absorbance measurements performed on this same system are being recapitulated at the single-molecule level.

The proposed mechanism of DNA polymerase at a DNA p/t junction appears to be consistent with both previous ensemble biochemical studies and recent computational studies detailing the role of exonuclease versus polymerase modes as they dynamically interconvert [91], [98], [99]. Interestingly, the mechanism born out of our kinetic network analyses suggests a highly one-directional flow of probability around the optimized network of macrostates. The timescales of interconversion and relative abundances of these macrostates match the intuitive picture put forth by previous studies and are again consistent with both CD and absorbance results that suggest a close competition between active polymerase modes.
The free energy landscape of the polymerase data is significantly different from that of the +4 position DNA only data, suggesting that we retain sensitivity to protein binding in our single molecule assays, even without the clamp present to tether the polymerase to the junction.

The experimental and computational methods described in Chapters 3-5, detailing the PS-SMF approach and its potential utility, can be readily extended to other protein-DNA systems. The most exciting and meaningful targets for such studies will surely be protein-DNA complexes which are highly dynamic on a local scale, precluding the possibility of more traditional FRET measurements and demanding time resolution not typically available to most camera-based imaging systems. The extension of these results might come in the form of additional independent studies beyond the scope of the clamp-clamp loader and DNA polymerase, but interesting insights might also be gained by altering the solvent environment during single-molecule experiments, or by considering the effect of alternate sequences and junction topologies on the ability of these same protein complexes to form and remain stable. The often-accepted notion that replication machinery is agnostic to sequence could be tested further to examine possible issues of fidelity or rate limiting steps in replisome assembly due to alterations of both the junction itself and the surrounding solvent.

Table 9 Optimized kinetic network model parameters for the free energy minima and activation barriers (left column) and optimized PDF parameters (right column) for all positions and conditions studied. Blank entries denote connections or states which did not exist in the optimized network models.

Table 10 Optimized kinetic network model parameters for the kinetic rate constants for all positions and conditions studied. Entries with ‘Inf’ denote connections which did not exist in the optimized network models.
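For readers who wish to relate a set of rate constants to macrostate populations and free energy minima, the following Python sketch shows one generic route: build the master-equation generator from a rate matrix, solve for the stationary populations, and convert them to relative free energies in units of k_BT using the Boltzmann relation. This is a textbook treatment offered under stated assumptions, not the optimization pipeline used in this work; the function names and the example rates are hypothetical, and a zero entry is assumed to mark an absent connection.

```python
import numpy as np

def steady_state(rates):
    """Stationary populations of a connected kinetic network.

    rates[i][j] is the rate constant for the transition i -> j;
    diagonal entries are ignored.
    """
    k = np.array(rates, dtype=float)
    np.fill_diagonal(k, 0.0)
    n = k.shape[0]
    # Master equation dp/dt = G p, with inflow k[i, j] into state j
    # and total outflow sum_j k[i, j] from state i on the diagonal.
    generator = k.T - np.diag(k.sum(axis=1))
    # The rows of G are linearly dependent, so replace the last one
    # with the normalization condition sum(p) = 1.
    a = np.vstack([generator[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(a, b)

def relative_free_energy(populations):
    """Free energies in units of k_B T, relative to the most populated state."""
    p = np.asarray(populations, dtype=float)
    return -np.log(p / p.max())

# Hypothetical two-state example: A -> B at 1 s^-1, B -> A at 3 s^-1.
p = steady_state([[0.0, 1.0],
                  [3.0, 0.0]])
g = relative_free_energy(p)
print("populations:", p)
print("free energies (k_B T):", g)
```

For the two-state example, balancing the forward and backward fluxes places three quarters of the population in state A, so state B sits ln 3 (in units of k_BT) above it.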
CHAPTER 6: CONCLUDING SUMMARY

The technical developments of PS-SMF detailed in Chapters 2 and 3 make it possible to obtain dynamical estimates of the local conformational free energy landscape of ss-dsDNA junctions as well as protein-DNA complexes. Owing to the high degree of local specificity and microsecond time resolution, the persistence of four conformational macrostates emerged as the minimally necessary picture to explain the observed data. The sensitivity of the measurement to both position and salt concentration was demonstrated in Chapters 2 and 4. The analysis protocol developed for these single molecule experiments is robust in terms of the spanning set of possible kinetic networks optimized during analysis. Another strength of the approach is the ability to address data sets with largely disparate timescales in the two- and three-point TCFs as well as largely different PDFs. This inherent lack of bias for any particular experimental data set facilitates large scale optimization without much alteration to optimization parameters, thereby removing most of the tedious effort required to meet the needs of each individual data set.

The application of PS-SMF to protein-DNA complexes in Chapter 5 proves to be a useful analytical tool for investigating the biochemical mechanisms at play in such complexes. The availability of conformations at ss-dsDNA junctions appears to play an important role in the binding and function of these macromolecular machines. The role of intermediates in the formation of the productive modes of each complex appears important, as do the thermodynamic barriers which drive directed action through particular paths in the optimized networks. Potential structural and steric arguments exist which might elucidate the role each conformational macrostate plays in the process of protein complex assembly and function.
The appendix details various forays into the use of broadband excitation of single molecules to obtain potentially useful information that is uniquely available to broadband and higher-order spectroscopies. Despite the technical successes of such efforts, no new insights emerged which were not already available from ensemble experiments. Some room does exist to improve and redesign these experimental approaches to better harness the inherent strengths of single molecule spectroscopy, namely dynamics and resolution of heterogeneity.

Lastly, interesting future directions drawing on combinations and extensions of these works can be imagined. The role of therapeutics in the disruption or enhancement of the conformational space of a protein, DNA or protein-DNA complex could be examined and understood through the lens of PS-SMF experiments. Efforts are currently underway in a variety of industrial and academic settings to leverage SM-FRET as a tool for drug discovery and drug mechanism in a variety of biophysical systems. Owing in part to the longstanding success and significant development of FRET protocols, analyses and biophysical implications, these efforts are making headway. A similar direction could be envisioned for PS-SMF experiments. Additionally, multiplexed experiments employing both a FRET pair and a Cy3 homodimer for simultaneous FRET and PS-SMF measurements could be used to examine issues of allosteric control and colocalization effects from multi-subunit complexes, and could potentially be coordinated with therapeutic candidates to address a wide variety of biophysical questions pertinent to human health and disease.

APPENDIX: INVESTIGATIONS INTO BROADBAND INTERFEROMETRIC MEASUREMENTS PERFORMED ON SINGLE MOLECULES

1. OVERVIEW

The research results presented in this final appendix are a compilation of various unpublished ultrafast single-molecule approaches developed within the Marcus and von Hippel labs by Jack Maurer and Amr Tamimi, with the help of Anabel Chang and Lulu Enkhbataar. Broadly speaking, few labs have achieved ultrafast spectroscopy results on single molecules. This is largely due to the need for novel detection schemes for single molecule signals as well as the often-fragile nature of individual molecules, which in a fluorescence-based experiment can present significant limitations on available acquisition time. Two main areas of focus were taken on by the contributing authors in this appendix. The first is the development and implementation of linear and nonlinear interferometric measurements on single molecules, in both one and two dimensions. The second is the design and implementation of novel time-correlated experiments on single molecules using broadband laser sources. These time-correlated experiments closely echo the approach taken in Chapter 3 to probe the inherently cross-polarized transitions of an electronically coupled i(Cy3)2 dimer. During these investigations, a rather intuitive but robust single molecule sample setup was devised by Jack Maurer to enable the long-term imaging of fluorophores on the time scale of hours, which was essential to performing these experiments.

An important point should be made regarding these studies and the possibilities offered up by their initial success. While the technological achievement that these results represent is a first of its kind, the biophysical insights gained from the data sets are either redundant when compared to ensemble results or lacking in sensitivity compared to simpler experiments using CW laser sources.
The rather high level of effort and difficulty required to make these experiments successful raises the question: how valuable must these experimental insights be to outweigh their cost in terms of time and resources? In the case of the protein-DNA or DNA only systems investigated in the Marcus and von Hippel group, the answer is surely that the value of the results does not outweigh the costs of performing the experiments. In the closing remarks of this appendix, alternate systems and use cases will be briefly discussed where the approaches described herein might find enhanced utility.

2. INTRODUCTION

In many chemical systems a degree of heterogeneity persists both statically and temporally, which tends to obscure the underlying structures and dynamics at the individual molecule level. This is particularly true of biological systems, where a high degree of conformational disorder exists across multiple scales and numerous subsystems within the cell [1]. The spectroscopic study of single molecules permits real time measurement of system observables, enabling study of their time evolution. Gaining access to system subpopulations and the instantaneous value of these spectroscopic observables can negate the effects of ensemble averaging, as demonstrated by a variety of well-studied approaches [11], [14], [105], [106]. Recently, advancements have been reported in the detection and analysis of one-dimensional linear and nonlinear spectroscopic signals obtained from single molecules under broadband pulsed excitation [52], [54], [107], [108]. Meanwhile, multidimensional approaches have exclusively been applied to traditional molecular ensembles or spatially localized ensembles on the order of tens of nanometers [24], [25], [47], [109], [110].
Multidimensional spectroscopic techniques can uniquely provide information such as the presence or absence of coupling between optical transitions in photophysical aggregates, monitor the evolution of excited state population dynamics, and provide insight into the relative contributions of homogeneous and inhomogeneous line broadening effects in heterogeneous systems. In the presently discussed work, we successfully performed linear, nonlinear, and multidimensional experiments on single molecules of dsDNA containing monomers and dimers of Cy3 fluorophores rigidly inserted into the sugar-phosphate backbone. The signals reported here are acquired using a collinear pulse train and phase modulation technique developed by Marcus and coworkers [48], [111]. Acquisition of the linear response along a single interferometric time delay, followed by Fourier transform, yields the linear absorption spectrum, while tracking of the nonlinear response over two independent time delays yields the two-dimensional interferogram of a single molecule and the accompanying two-dimensional spectrum via Fourier transform. The linear as well as nonlinear signals in our multidimensional data sets are obtained through single photon time and phase tagging of the Stokes shifted fluorescence signal of individual molecules [69]. This phase-tagging method is based on earlier techniques which similarly employed single photon time and phase tagging to probe the optical response of single molecules [13], [57]. In higher-order time-resolved spectroscopic approaches, the set of possible observables and the information they contain is much broader than in conventional continuous-wave or temporally static spectroscopies [112]–[115].
However, these higher-order approaches rely on lengthy experimental acquisition times, often across multiple time delays, which results in a measurement time scale that has not been accessible to single molecule techniques due to the rapid and irreversible photobleaching that ubiquitously occurs in fluorescence-based approaches using organic dyes [77]. The work presented here overcomes the traditionally inaccessible regime of higher-order measurements on single molecules by devising a scheme which permits the fluorophores under study to sustain emission for many hours on end. Recent work to better understand the mechanism of dark-state formation, as well as to improve the overall yield and stability of organic dyes, could aid in the refinement of the approach outlined here [116], [117]. Notably, the lengthy integration times required for successfully obtaining the full two-dimensional histogram tend to cause the results to closely resemble ensemble studies when measuring the traditional rephasing and non-rephasing signals for estimates of the conformational heterogeneity [24]. Alternative approaches to obtaining these data sets, which may enable better access to dynamical effects at the single-molecule limit, are discussed in the conclusions section.

The work contained here is largely an investigative project to test the possibilities of each potential measurement for its utility within the broader field of biophysical science. As benchmarks of utility, we examine the effects of time averaging and total experimental acquisition time on the demodulation of linear and nonlinear responses of single molecules by time and phase tagging of single photons. We also present preliminary attempts to perform time-correlated analyses on a variety of linear signals obtained from single molecules, both as a comparison with the established PS-SMF approach and as a unique path toward isolating dynamical fluctuations in the presence of broadband excitation.
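To give a concrete sense of why the total number of detected photons matters for demodulation by phase tagging, the following Python sketch simulates photons whose detection probability is modulated at a reference phase and recovers the modulation by averaging a unit phasor over all photons. The estimator (the mean of exp(iφ) over tagged photons), the detection model, and all names and numerical values are assumptions made for illustration; they are not taken from the acquisition or analysis software used in this work.

```python
import numpy as np

def simulate_photon_phases(n_trials, amplitude, theta, rng):
    """Draw reference phases for detected photons by rejection sampling.

    The detection probability is assumed proportional to
    1 + amplitude * cos(phi - theta), with 0 <= amplitude <= 1.
    """
    phi = rng.uniform(0.0, 2.0 * np.pi, n_trials)
    threshold = 1.0 + amplitude * np.cos(phi - theta)
    accepted = rng.uniform(0.0, 1.0 + amplitude, n_trials) < threshold
    return phi[accepted]

def demodulate(phases):
    """Complex demodulated signal: the mean unit phasor over all photons.

    For the detection model above, the expectation value is
    (amplitude / 2) * exp(1j * theta).
    """
    return np.mean(np.exp(1j * phases))

rng = np.random.default_rng(0)
phases = simulate_photon_phases(200_000, amplitude=0.6, theta=0.8, rng=rng)
z = demodulate(phases)
print(f"{phases.size} photons: |z| = {abs(z):.3f}, arg(z) = {np.angle(z):.3f} rad")

# The scatter of the estimate shrinks roughly as 1/sqrt(N) with photon number.
for n in (200, 2_000, 20_000):
    estimates = [abs(demodulate(simulate_photon_phases(n, 0.6, 0.8, rng)))
                 for _ in range(20)]
    print(f"{n:>6d} trials: std of |z| over 20 repeats = {np.std(estimates):.4f}")
```

In this toy model the recovered amplitude and phase converge on the input modulation as photons accumulate, which is one simple way to reason about acceptable signal-to-noise thresholds as a function of integration time.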
We compare single molecule results to those obtained in bulk for the same dsDNA systems in the case of fully sampled interferometric measurements. The thresholds for acceptable SNR are established by examination of the resulting one- and two-dimensional spectra over a wide range of integration times and total photon numbers. The uncertainty in our measurements is established through well-known bootstrapping methods for statistical characterization of error when sampling from an underlying probability distribution [69]. Lastly, we introduce a novel method for fluorophore lifetime enhancement using a closed-loop nitrogen purged imaging system.

3. MATERIALS AND METHODS

Section A.3.1: Experimental Setup

Linear and nonlinear spectroscopic measurements are carried out by sending the output of a high-repetition-rate non-collinear optical parametric amplifier (NOPA) into either a single Mach-Zehnder interferometer (MZI) or a pair of MZIs, each of which contains two acousto-optic modulators (AOMs) that sweep the relative phase of the two MZI arms according to 𝜙𝑖𝑗(𝑡) = 𝜈𝑖𝑗𝑚𝑇, with 𝜈𝑖𝑗 = 𝜈𝑖 − 𝜈𝑗 being the difference frequency of the two AOMs within the arms of each MZI and 𝑚𝑇 describing the 𝑚-th pulse in the pulse train with an interpulse separation period of 𝑇 [48], [111]. The recombined pulse train from each MZI is sent to a 50/50 beam splitter, and then relayed to a Nikon TE2000 microscope where a total internal reflection (TIRF) geometry is used to excite a sample of immobilized single molecules. Fluorescence is spatially isolated by use of a 100 μm pinhole and collected by a high NA oil immersion objective (100x), which directs the transmitted photon stream to either an EMCCD camera for spatial imaging or an APD for time and phase tagging. A schematic of the experimental apparatus is shown in Fig. A.1. Single molecule samples are prepared by passivating the surface of a quartz microscope slide with a combination of mPEG (M.W.=500) and Biotin mPEG.
A glass coverslip is passivated using mPEG (M.W.=500). The chemical details of this procedure are described elsewhere [14]. A sample chamber is assembled from a passivated quartz slide and a section of passivated glass coverslip cut to size with a diamond-tipped scoring tool. The quartz slide is drilled using a diamond-tipped drill bit to produce two holes separated by ~2 cm along the long axis of the slide. Two 100 µL pipet tips are inserted into the holes in the quartz slide, and the protruding tips are shaved off at the biotinylated surface with a razor blade, taking care to avoid the imaging area. Four 1 µL droplets of TRIS buffer (100 mM NaCl, 10 mM TRIS, 6 mM MgCl2) are cast between the two drilled inlets on the quartz slide, and the prepared section of glass coverslip is carefully lowered onto the drop-cast region. The edges of the glass coverslip are sealed to the slide using a two-part rapid-cure 3M epoxy. Two short (~4-6 cm) segments of Tygon tubing (1/8” internal diameter) are inserted into the remaining ends of the pipet tips and sealed in place using epoxy. The open ends of the Tygon tubing are fitted with 1/8”-to-male luer lock adapters. Once all epoxy has dried fully, a 300 µL portion of TRIS buffer is injected into the sample chamber using a standard 1 mL syringe to check for leaks or clogs. Fully assembled sample chambers may be stored in a fridge for up to 10 days.

If no leaks or clogs are found, 100 µL of a Neutravidin solution (0.1 mg/mL) is injected into the sample chamber and allowed to react for ~1 min prior to a complete flush of the sample chamber using 300 µL of TRIS buffer. Next, 100 µL of 1 pM biotinylated dsDNA solution is injected into the sample chamber and allowed to react for ~1 min prior to a complete flush with 300 µL of TRIS buffer. The completed sample chamber is placed in a protective dark environment to reduce photodegradation of the bound Neutravidin.
Figure A.1 Schematic of the experimental apparatus used for ultrafast linear and nonlinear single molecule measurements.

A solution of 30 mg of Trolox, 120 µL of 1 M NaOH, 100 mg of glucose and 10 mL of TRIS buffer is prepared and mixed for a minimum of 2 hours to ensure complete dissolution. Separate solutions of 89 mg of glucose oxidase in 1 mL of buffer and 57 mg of catalase in 1 mL of buffer are prepared and divided into 60 µL aliquots, with a single aliquot of each needed per sample. The remaining aliquots can be kept at -20 °C for long-term storage.

An INSI tech series 720p peristaltic pump with a C60 Flex hairpin loop is attached to two long pieces (~1.5’ each) of 1/8” Tygon tubing, one at each port of the hairpin loop. One of the Tygon tubes is equipped with a 1/8”-to-male luer lock adapter with a 1.5” two-way stopcock attached. The other Tygon tube is attached to a 1/8”-to-male syringe adapter. A third piece of Tygon tubing is cut (1-1.5’) and fitted with a 1/8”-to-male luer lock adapter and a 1.5” two-way stopcock at one end, with the opposite end attached to a 1/8”-to-male syringe adapter. One 5 mL reservoir with an airtight septum, two 20-gauge needles and two 18-gauge needles are acquired for final assembly. The Tygon tubing lines are cleaned with ethanol and dried prior to use.

Once the peristaltic pump assembly is finished, a mixture (TOSS) is prepared from 4.9 mL of Trolox solution, 50 µL of catalase and 50 µL of glucose oxidase. The TOSS mixture is mixed thoroughly and centrifuged, and the supernatant is divided into manageable portions. The pump system is assembled by securing the empty reservoir, with its septum in place, near the experimental apparatus using a ring stand or similar support. The septum is pierced just once with one 18-gauge needle, which is inserted completely into the reservoir. The septum is then pierced just once with one 20-gauge needle, which is also inserted completely into the reservoir.
Using a syringe with a separate needle attached, aliquots of the TOSS solution are transferred into the reservoir through the 18-gauge needle until it is nearly full (~3.5-4 mL). A second 18-gauge needle is inserted, and both pump lines are attached to the two 18-gauge needles. TOSS solution is flowed through the dead volume of the pump until it is full. N2 gas, flowed in via the 20-gauge needle, can be used to generate back pressure in the Tygon pump lines if peristalsis fails to extract TOSS solution from the reservoir. The Tygon tubing leads are attached to the sample slide, the stopcocks are closed, and the reservoir is brought up to full volume. The final 20-gauge needle, attached to a pure N2 tank, is inserted and a gentle stream of N2 gas into the reservoir is initiated. The opposing 20-gauge needle and its Tygon tubing line are vented into a vial of water to avoid pressure buildup in the reservoir. The pump system is gently run with the slide attached, stopcocks open and nitrogen flowing for ~20-30 min to ensure that no clogs or leaks form. If the system is intact after this waiting period, the pump is shut down. N2 gas is left running and the system is allowed to purge under N2 overnight. After a full purge period, the sample slide is imaged and gently pumped every few hours for ~5 min to replenish the working volume of TOSS inside the sample chamber.

Section A.3.2 Linear Signal Derivation

In the high-flux regime at a single MZI delay of $t_{ij}$, the linear signal obtained by integration of the fluorescent photon stream is described by Eq. (53), which comes from considering the effects of imparting a phase-modulated pulse train onto the transition dipole moment of a chromophore [24], [68], [69]. We start with the form of the electric field for any one pulse that has a complex-valued phase, $i\nu_j m T$, imparted to it by the AOMs:

$$\vec{E}_j(\omega, t) = \vec{a}_j(\omega)\, e^{i\omega t_j - i\nu_j m T} \tag{48}$$

Utilizing the results of Appendix A.10 in [48], we can write the resulting product of the $j$-th field-matter interaction with the transition dipole moment of the molecule, $\vec{\mu}_{ng}$, as a ket, where the possible transition frequencies, $\omega_{ng}$, are summed over:

$$|\psi_j\rangle = i \sum_n \vec{\mu}_{ng} \cdot \vec{a}_j(\omega_{ng})\, e^{i\omega_{ng} t_j - i\nu_j m T}\, |n\rangle \tag{49}$$

Next, we can consider the overlap between any two such field-matter interactions to produce a population capable of fluorescence by radiative decay. These interactions can come from either two distinctly modulated pulses, from separate arms of the MZI, or from the same pulse. We will assume a non-zero directional overlap between vector elements in the sum.

$$\langle \psi_i | \psi_j \rangle = \sum_n |\mu_{ng}|^2\, \alpha_i^{*}(\omega_{ng})\, \alpha_j(\omega_{ng})\, e^{i\omega_{ng} t_j - i\nu_j m T}\, e^{-i\omega_{ng} t_i + i\nu_i m T}\, \langle n | n \rangle \tag{50}$$

Assuming the fields of the pulses to be spectrally identical, and their times of arrival independent of one another:

$$= \sum_n |\mu_{ng}|^2\, |\alpha(\omega_{ng})|^2\, e^{i\omega_{ng}(t_j - t_i) - i(\nu_j - \nu_i) m T} \tag{51}$$

Utilizing the simplifications stated above, we arrive at:

$$\langle \psi_i | \psi_j \rangle = \sum_n |\mu_{ng}|^2\, |\alpha(\omega_{ng})|^2\, e^{i\omega_{ng} t_{ij} - i\phi_{ij}} \tag{52}$$

When taking $i \neq j$, we obtain a phase-modulated term, where $\phi_{ij} \neq 0$. In the case of $i = j$ we obtain a background term which does not contribute to the modulated portion of the time-varying fluorescence signal. Combining terms from both cases yields an overall expression for the linear signal given by Eq. (53):

$$A(\phi_{ij}, t_{ij}) = A_{\mathrm{bkgd}} + A_{\mathrm{lin}}(\phi_{ij}, t_{ij}) = 2\sum_n |\alpha(\omega_{ng})|^2 |\mu_{ng}|^2 + 2\,\mathrm{Re} \sum_n |\alpha(\omega_{ng})|^2 |\mu_{ng}|^2 \exp\{ i[\phi_{ij}(t) - \omega_{ng} t_{ij}] \} \tag{53}$$

The sum is carried over all molecular excited state levels (labeled $n$), $\mu_{ng}$ is the transition dipole matrix element that couples the ground and $n$th excited states, $\omega_{ng}$ is the complex-valued optical transition frequency and $|\alpha(\omega)|^2$ is the intensity of the laser pulse spectrum [111]. The two terms contributing to Eq. (53) are $A_{\mathrm{bkgd}}$ and $A_{\mathrm{lin}}(\phi_{ij}, t_{ij})$.
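As an illustration of how the two terms of Eq. (53) behave, the following minimal sketch evaluates the background and phase-modulated contributions for a discrete set of transitions. The function name, the tuple layout and the single transition near 532 nm are assumptions made for this example and are not taken from the analysis code used in this work.

```python
import cmath
import math


def linear_signal(phi, t_ij, transitions):
    """Evaluate the two terms of Eq. (53).

    transitions: iterable of (omega_ng, mu_sq, alpha_sq) tuples, where
    mu_sq = |mu_ng|^2 and alpha_sq = |alpha(omega_ng)|^2 (illustrative inputs).
    Returns (A_bkgd, A_lin).
    """
    a_bkgd = 0.0
    a_lin = 0.0
    for omega_ng, mu_sq, alpha_sq in transitions:
        weight = alpha_sq * mu_sq
        # Phase-independent background term.
        a_bkgd += 2.0 * weight
        # Phase-modulated term: 2 Re{ weight * exp(i[phi - omega_ng * t_ij]) }.
        a_lin += 2.0 * (weight * cmath.exp(1j * (phi - omega_ng * t_ij))).real
    return a_bkgd, a_lin


# Hypothetical single transition near 532 nm (omega in rad/fs, delay in fs).
transitions = [(3.54, 1.0, 1.0)]
for phi in (0.0, math.pi / 2, math.pi):
    bkgd, lin = linear_signal(phi, 0.0, transitions)
    print(f"phi = {phi:.2f}: A_bkgd = {bkgd:.2f}, A_lin = {lin:+.2f}")
```

At zero delay the modulated term swings between plus and minus the background level as the phase is swept, so the total signal remains non-negative, as expected for a fluorescence count rate.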
$A_{\mathrm{bkgd}}$ represents the constant background that arises from single-pulse interactions, stray light and detector noise, all of which are independent of the phase sweep in the MZI. The second term, $A_{\mathrm{lin}}(\phi_{ij}, t_{ij})$, depends on the phase of the MZI and represents the modulated contribution to the time-varying fluorescence that can be detected and demodulated to obtain the signal quadratures.

In the regime of single molecules, where the flux of fluorescent photons is low and on the order of $10^3$ to $10^5$ per second, an alternate approach for demodulating the signal amplitude and phase is required [13], [57], [69]. In this regime, we regard each TTL detection event registered at the APD as a Dirac delta function $\delta(\phi - \phi_k)$ that instantaneously samples the signal phase $\phi_{ij}$ from a probability distribution specified by the high-flux photon count rate of Eq. (53). Over the course of a fixed integration period $T_{PT}$, the phase-dependent photon rate arising from $N$ detection events can be treated as the discrete sum in Eq. (54):

$$A^{PT}(\phi, t_{ij}) = \frac{1}{T_{PT}} \sum_{k=1}^{N} \delta(\phi - \phi_k) \tag{54}$$

In order to isolate the linear signal from the background via Fourier transform, the photon rate must be phase-tagged with respect to the phase coordinate. This is mathematically equivalent to summing over the $N$ photon phase factors:

$$Z^{PT}_{\mathrm{lin}}(t_{ij}) = \frac{1}{2\pi} \int_0^{2\pi} A^{PT}(\phi, t_{ij})\, e^{-i\phi}\, d\phi \tag{55a}$$

$$= \frac{1}{\pi T_{PT}} \int_0^{2\pi} \left[ \sum_{k=1}^{N} \delta(\phi - \phi_k) \right] e^{-i\phi}\, d\phi = \frac{1}{\pi T_{PT}} \sum_{k=1}^{N} e^{-i\phi_k} \tag{55b}$$

$$= f^{PT}\, |v^{PT}(\tau)|\, e^{i[\gamma^{PT}(\tau) - \omega_R \tau]} \tag{55c}$$

Here $v^{PT}(\tau)$ is the visibility of the signal, $\gamma^{PT}(\tau)$ is the phase of the signal and $\omega_R$ is the frequency of the optical reference used for down-sampling [48], [69], [111]. The quadratures typically obtained from demodulation of analog signals correspond to the real and imaginary parts of the complex-valued linear signal obtained by the discrete sum over the $N$ photon phase factors.
$$Z^{PT}_{\mathrm{lin}}(\tau) = X^{PT}_{\mathrm{lin}}(\tau) - i Y^{PT}_{\mathrm{lin}}(\tau) \equiv \frac{1}{\pi T_{PT}} \left[ \sum_{k=1}^{N} \cos(\phi_k) - i \sum_{k=1}^{N} \sin(\phi_k) \right] \tag{56}$$

The usual definitions for the amplitude and phase of a complex-valued vector apply. The complex-valued molecular susceptibility can then be obtained by Fourier transformation of the complex-valued linear signal with respect to the MZI delay, which yields the molecular overlap spectrum [111].

Section A.3.3 Nonlinear Signal Derivation

In the high-flux regime at two unique MZI delays of $t_{ij}$ and $t_{lk}$, the nonlinear signal obtained by integration of the fluorescent photon stream is described by Eqs. (60) and (61), which come from the same considerations as were discussed for Eq. (53). The form of each field is the same as before, but owing to the combinatorics of four-pulse sequences versus two-pulse sequences, more terms exist which can contribute to the time-varying fluorescence signal. These terms are modulated at combinations of the underlying linear phase signatures, yielding nonlinear terms. For the sake of brevity, a single term out of all possible terms will be derived and considered. The approach taken is generally applicable to all remaining terms.

Utilizing the results of Appendix A.12 in [48], we can write the resulting product of the $i$-th, $j$-th and $k$-th field-matter interactions with the transition dipole moment of the molecule, $\vec{\mu}_{ng}$, as a ket, where the possible transition frequencies are individually summed over. We consider only the rephasing and nonrephasing pathways in this appendix. This limits the scope of signal terms to those arising from the overlap of first and third order wave packets [48]. The terms arising from the overlap of two mutually second order wave packets, by contrast, are exclusive to double quantum coherence terms, which are not considered in this work [48], [110].
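Notably, the middle interaction in all third order wave packet terms of this approach is taken to act oppositely to the other two interactions, yielding a difference in sign from the phase imparted by the middle pulse:

$$|\psi_{ijk}\rangle = |\psi_i\rangle |\psi_j^{*}\rangle |\psi_k\rangle = i \sum_a \vec{\mu}_{ag}\, \vec{a}_i(\omega_{ag})\, e^{i\omega_{ag} t_i - i\nu_i m T} |a\rangle \sum_b \vec{\mu}_{bg}\, \vec{a}_j(\omega_{bg})\, e^{-i\omega_{bg} t_j + i\nu_j m T} \langle b| \sum_c \vec{\mu}_{cg}\, \vec{a}_k(\omega_{cg})\, e^{i\omega_{cg} t_k - i\nu_k m T} |c\rangle \tag{57}$$

Given the constraint in our experiments that relevant signal terms must result in an excited state population, two of these transition dipole elements need to be the same (i.e. $b = c$), with the remaining transition dipole element being equal to that of the first order wave packet that overlaps this third order wave packet in the course of a four-pulse interaction sequence [48].

$$= i \sum_a \vec{\mu}_{ag}\, \vec{a}_i(\omega_{ag})\, e^{i\omega_{ag} t_i - i\nu_i m T} |a\rangle \sum_b \vec{\mu}_{bg}\, \vec{a}_j(\omega_{bg})\, e^{-i\omega_{bg} t_j + i\nu_j m T}\, \vec{\mu}_{bg}\, \vec{a}_k(\omega_{bg})\, e^{i\omega_{bg} t_k - i\nu_k m T} \langle b | b \rangle \tag{58}$$

If we take the pulses to be spectrally identical:

$$= i \sum_{a,b} \vec{\mu}_{ag}\, \vec{a}(\omega_{ag})\, e^{i\omega_{ag} t_i - i\nu_i m T}\, |\mu_{bg}|^2\, a(\omega_{bg})^2\, e^{-i\omega_{bg} t_j + i\nu_j m T}\, e^{i\omega_{bg} t_k - i\nu_k m T}\, |a\rangle \tag{59}$$

We can now overlap this third order wave packet with a first order wave packet (Eq. 49) to obtain the general form of the rephasing and nonrephasing signal terms:

$$\langle \psi_l | \psi_{ijk} \rangle = \sum_{a,b} |\mu_{ag}|^2\, a(\omega_{ag})^2\, e^{i\omega_{ag} t_i - i\nu_i m T}\, e^{-i\omega_{ag} t_l + i\nu_l m T}\, |\mu_{bg}|^2\, a(\omega_{bg})^2\, e^{-i\omega_{bg} t_j + i\nu_j m T}\, e^{i\omega_{bg} t_k - i\nu_k m T}$$

$$= \sum_{a,b} |\mu_{ag}|^2\, a(\omega_{ag})^2\, |\mu_{bg}|^2\, a(\omega_{bg})^2\, e^{i\omega_{ag}(t_i - t_l) - i(\nu_i - \nu_l) m T}\, e^{-i\omega_{bg}(t_j - t_k) + i(\nu_j - \nu_k) m T} \tag{60}$$

The identification of purely rephasing versus nonrephasing terms comes from the explicit consideration of which pulse produces the first order wave packet.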
In general, there are four such relevant terms (plus their complex conjugates):

$$S_{RP} + S_{NRP} = \langle \psi_1 | \psi_{234} \rangle + \langle \psi_2 | \psi_{134} \rangle + \langle \psi_3 | \psi_{124} \rangle + \langle \psi_4 | \psi_{123} \rangle + \langle \psi_{123} | \psi_4 \rangle + \langle \psi_{124} | \psi_3 \rangle + \langle \psi_{134} | \psi_2 \rangle + \langle \psi_{234} | \psi_1 \rangle$$

$$= 2\,\mathrm{Re}\left[ \langle \psi_1 | \psi_{234} \rangle + \langle \psi_2 | \psi_{134} \rangle + \langle \psi_3 | \psi_{124} \rangle + \langle \psi_4 | \psi_{123} \rangle \right] \tag{61}$$

Isolation of the terms which contain differences of the underlying linear signal phases ($\phi_{21} - \phi_{43}$) leads to identification of the rephasing signal, while isolation of terms containing the sum of the underlying linear signal phases ($\phi_{21} + \phi_{43}$) leads to identification of the nonrephasing signal [48].

In a typical ensemble experiment, both linear signal phases are tracked independently, then multiplied and low-pass filtered to achieve mixing and produce the necessary reference signals for demodulation of nonlinear signals using a lock-in amplifier. In the case of low-flux single-molecule experiments, however, every photon is tagged for its time of arrival and its phase with respect to both linear reference signals. The construction of the nonlinear signal in the low-flux regime is very similar to the approach taken for the linear signal case (Eqns. 54-56). Instead of mixing analog signals, as is done in ensemble experiments, simple multiplication of the photon phase tags arising from the two independent linear references yields the nonlinear signal. This results in a total phase factor for the rephasing and nonrephasing signals which appears at the difference and sum of the linear phase tags, respectively. We can define two linear photon phase tag terms, one for each MZI:

$$A^{PT}_{21}(\phi_{21}, t_{21}) = \frac{1}{T_{PT}} \sum_{k=1}^{N} \delta(\phi_{21} - \phi_{21}^{k}) \tag{62}$$

$$A^{PT}_{43}(\phi_{43}, t_{43}) = \frac{1}{T_{PT}} \sum_{z=1}^{N} \delta(\phi_{43} - \phi_{43}^{z}) = \frac{1}{\pi T_{PT}} \sum_{z=1}^{N} e^{-i\phi_{43}^{z}} \tag{63}$$

Here, $\phi_{21}$ is the general coordinate that tracks the phase relationship between the two arms of the first MZI, while $\phi_{21}^{k}$ is the instantaneous phase assigned to the $k$-th photon detection event. The same follows for $\phi_{43}$, which tracks the second MZI.
Each of these terms is a discrete sum over the instantaneous phase of each photon arriving at the APD, with respect to either of the two linear reference signals. Multiplication of these linear phase-tagged sums, prior to averaging over the phase coordinate, yields the nonrephasing photon tags:

$$A^{PT}_{NRP}(\phi_{21}, \phi_{43}, t_{21}, t_{43}) = \frac{1}{T_{PT}} \sum_{k=1}^{N} \delta(\phi_{21} - \phi_{21}^{k})\, \delta(\phi_{43} - \phi_{43}^{k}) \tag{64}$$

Exchange of the sign on the second reference signal yields the rephasing photon tags:

$$A^{PT}_{RP}(\phi_{21}, \phi_{43}, t_{21}, t_{43}) = \frac{1}{T_{PT}} \sum_{k=1}^{N} \delta(\phi_{21} - \phi_{21}^{k})\, \delta(\phi_{43} + \phi_{43}^{k}) \tag{65}$$

Averaging over each phase coordinate respectively, we obtain:

$$Z^{PT}_{\mathrm{NRP}}(t_{21}, t_{43}) = \frac{1}{2\pi} \int_0^{2\pi}\!\!\int_0^{2\pi} A^{PT}_{NRP}(\phi_{21}, \phi_{43}, t_{21}, t_{43})\, e^{-i(\phi_{21} + \phi_{43})}\, d\phi_{21}\, d\phi_{43} = \frac{1}{\pi T_{PT}} \sum_{k=1}^{N} e^{-i(\phi_{21}^{k} + \phi_{43}^{k})} \tag{66}$$

$$Z^{PT}_{\mathrm{RP}}(t_{21}, t_{43}) = \frac{1}{2\pi} \int_0^{2\pi}\!\!\int_0^{2\pi} A^{PT}_{RP}(\phi_{21}, \phi_{43}, t_{21}, t_{43})\, e^{-i(\phi_{21} - \phi_{43})}\, d\phi_{21}\, d\phi_{43} = \frac{1}{\pi T_{PT}} \sum_{k=1}^{N} e^{-i(\phi_{21}^{k} - \phi_{43}^{k})} \tag{67}$$

Both of these terms can be separated into their real and imaginary parts to obtain the quadratures for the rephasing and nonrephasing signals, as was done for the linear signals in Eq. 56. With demodulation of both the linear and nonlinear phase-tagged signals in hand, we can examine the results of single-molecule experiments leveraging the discrete sums of Eqns. 55b, 66 and 67.

4. RESULTS AND DISCUSSION

Section A.4.1 Linear Ultrafast Single Molecule Measurements

Representative linear interferometric measurements performed on single molecules are shown in Fig. A.2. These measurements were obtained with a 1-second integration time and a sampling of 51 time points along the $t_{21}$ axis. The two resulting ~1-minute scans display similarly well-resolved spectral distributions centered within the laser bandwidth, yet their spectral features appear to differ. The differences between scans are likely due to noise at the level of 1-minute acquisition times, but in principle they could arise from unique sets of conformational fluctuations during data acquisition.
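The discrete sums of Eqns. 55b, 66 and 67, on which these measurements rely, can be sketched in a few lines. The code below is an illustrative assumption rather than the analysis pipeline used in this work: the function names are hypothetical, and the synthetic photon phases are drawn from an assumed distribution proportional to 1 + v cos(φ − γ), which mimics the modulated count rate of Eq. (53) for a single visibility v and phase γ.

```python
import cmath
import math
import random


def demodulate_linear(phases, t_pt):
    """Eq. (55b): Z_lin = (1 / (pi * T_PT)) * sum_k exp(-i * phi_k)."""
    return sum(cmath.exp(-1j * p) for p in phases) / (math.pi * t_pt)


def demodulate_nonlinear(phases_21, phases_43, t_pt):
    """Eqs. (66) and (67): per-photon sum (NRP) and difference (RP) phase factors."""
    pairs = list(zip(phases_21, phases_43))
    norm = math.pi * t_pt
    z_nrp = sum(cmath.exp(-1j * (p21 + p43)) for p21, p43 in pairs) / norm
    z_rp = sum(cmath.exp(-1j * (p21 - p43)) for p21, p43 in pairs) / norm
    return z_nrp, z_rp


def sample_phases(n, visibility, gamma, rng):
    """Rejection-sample n phases from p(phi) proportional to 1 + v*cos(phi - gamma)."""
    phases = []
    while len(phases) < n:
        phi = rng.uniform(0.0, 2.0 * math.pi)
        if rng.uniform(0.0, 1.0 + visibility) <= 1.0 + visibility * math.cos(phi - gamma):
            phases.append(phi)
    return phases


# Synthetic example with assumed visibility 0.5 and phase 0.7 rad.
rng = random.Random(0)
t_pt = 1.0
phases = sample_phases(20000, 0.5, 0.7, rng)
z_lin = demodulate_linear(phases, t_pt)
# For this distribution, <exp(-i*phi)> = (v / 2) * exp(-i * gamma).
visibility_est = 2.0 * math.pi * t_pt * abs(z_lin) / len(phases)
phase_est = -cmath.phase(z_lin)
print(f"recovered visibility ~ {visibility_est:.3f}, phase ~ {phase_est:.3f} rad")
```

The nonlinear case uses the same per-photon bookkeeping: each photon carries one phase tag from each MZI, and the sum or difference of the two tags is formed before the phasors are accumulated.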
Further refinement of the SNR in these experiments, leading to a reduction of the integration time, as well as compressed sensing in the time domain, could begin to resolve dynamics at the single-molecule level in the form of time-resolved linear absorption spectra.

Figure A.2 Two representative linear interferometric experiments on single molecules of dsDNA containing an i(Cy3)2 dimer. Interferogram data is shown in the first column and the resulting Fourier transform is shown in the second column.

Recent insights gleaned from single-molecule studies of protein-DNA interactions reveal that transitions governing the mechanical stability of certain macromolecular complexes occur on time scales of hundreds of milliseconds to tens of seconds. Such time scales could be amenable to rapid linear interferometric experiments, which would provide a time-resolved sampling of the linear absorbance spectrum. However, the information gained from such efforts would need to exceed the dynamical information already available from more conventional single-molecule techniques. In the case of rapidly sampled linear absorption spectra, one could model the dipole-dipole interactions occurring within the i(Cy3)2 dimer, using the tools developed by the Marcus lab [24], [46], [47], to obtain structural parameters on a scan-by-scan basis. The limitations here are primarily due to the SNR of such rapidly obtained spectra, but the inherently greater capacity to address structure as well as dynamics could prove valuable, whereas in more conventional single-molecule measurements the structural details are often inferred or limited to a small set of geometric parameters.

One important consideration in improving these measurements is the optical setup employed. In the experiments presented here, the output of the NOPA is set at a repetition rate of 144 kHz.
Given that the pulse duration in our instrument is on the order of 30 fs, the number of fluorescent photons detected is certainly dominated by the fluorescence lifetime of the i(Cy3)2 dimer, which is orders of magnitude longer. Higher repetition rates at the pulse generation source could provide a better photon budget during experiments. Notably, groups with demonstrated success in related measurements have often employed MHz repetition rate systems [107], [108].

As a demonstration of how the quality of linear measurements depends on the number of detected photons, a series of linear absorption spectra are shown in Fig. A.3, where the integration time at each step in the $t_{21}$ delay space was varied from 42 s (Column A) to 1 s (Column B) and finally 100 ms (Column C). The data were obtained during a full two-dimensional experiment, such that both MZI 1 (top row) and MZI 2 (bottom row) could be examined simultaneously for their relative loss in SNR as the allowable integration time was varied.

Figure A.3 A series of linear Fourier transforms are shown as a function of integration time, ranging from 42 s (left column) to 100 ms (right column). The top and bottom rows are the linear transforms obtained by demodulation of each interferometer's linear signal contribution, which are prepared simultaneously in the course of a two-dimensional experimental protocol.

The lower limit of 100 ms shows a relatively poorly resolved spectrum, while the upper limit at the full 42-second integration time is well resolved. Importantly, these scans were performed with only 9 steps in either the $t_{21}$ or $t_{43}$ domain, to minimally satisfy the Nyquist sampling frequency such that the total acquisition time of the full interferogram could be kept to a minimum. From these initial attempts, it appears that experiments utilizing integration times on the order of 100 ms are possible, but will be limited by their SNR.
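The trade-off between integration time and SNR described above can be put in rough numbers. The sketch below is a back-of-the-envelope estimate, not a measured result: it uses the 144 kHz repetition rate and the 42 s, 1 s and 100 ms integration times quoted in the text, assumes a detected count rate of 10^4 photons per second (a value chosen from within the 10^3 to 10^5 per second range stated for single molecules), and assumes that the SNR scales as the square root of the number of detected photons.

```python
import math

REP_RATE_HZ = 144e3    # NOPA repetition rate quoted in the text
COUNT_RATE_HZ = 1e4    # assumed detected photon rate (illustrative only)
INTEGRATION_TIMES_S = (42.0, 1.0, 0.1)


def photon_budget(integration_s, count_rate_hz=COUNT_RATE_HZ, rep_rate_hz=REP_RATE_HZ):
    """Return (excitation pulses, detected photons) per delay step."""
    pulses = rep_rate_hz * integration_s
    photons = count_rate_hz * integration_s
    return pulses, photons


def relative_snr(integration_s, reference_s):
    """SNR relative to a reference integration time, assuming sqrt(N) scaling."""
    return math.sqrt(integration_s / reference_s)


for t in INTEGRATION_TIMES_S:
    pulses, photons = photon_budget(t)
    print(f"{t:6.1f} s: {pulses:.3g} pulses, {photons:.3g} photons, "
          f"SNR relative to 42 s = {relative_snr(t, 42.0):.3f}")
```

Under these assumptions, shortening the integration time from 42 s to 100 ms costs roughly a factor of twenty in SNR, which is in line with the qualitative loss of resolution seen across the columns of Fig. A.3.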
Section A.4.2 Nonlinear Ultrafast Single Molecule Measurements

Nonlinear experiments were also performed on single molecules, enabled both by novel time and phase tagging of single photons and by the closed-loop sample chamber discussed in the previous section [69]. Initially, these experiments were investigated for their feasibility given the technical hurdle they presented. Later, attempts were made to average together multiple scans from the same molecule to obtain high-SNR reconstructions of the rephasing and nonrephasing spectra. The minimal interferometric sampling and integration time at each point in the delay space was used for individual two-dimensional scans, owing to the long-term drift which can occur in the instrument over the course of one experiment and which tends to obscure the true time-zero of each interpulse delay.

An example two-dimensional spectrum of a single molecule is shown in Fig. A.4. All individual scans were obtained by taking 10 steps of 2.5 fs along the $t_{21}$ dimension and 9 steps of 2.5 fs along the $t_{43}$ dimension. At each point in the delay space, the signal is integrated for 42 seconds. This results in a total of 90 steps and a scan time of 63 minutes. A single 63-minute experiment can yield results without further averaging, but the best results are obtained by averaging over multiple 63-minute interferograms. A symmetric grid in $t_{21}$ and $t_{43}$ is constructed from the asymmetric sampling in post-processing using a retiming and rephasing procedure. This result is then passed to a two-dimensional Fourier transform, yielding the two-dimensional spectrum. Multiple retimed and rephased data sets from the same molecule were averaged to obtain the results displayed in Fig. A.4. For comparison to Fig. A.4, a Cy3 monomer spectrum obtained in ensemble studies is shown in Fig. A.5, courtesy of Dr. Dylan Heussman [25]. The results of single-molecule experiments and ensemble experiments closely resemble one another.
While their equivalence provides an excellent check on the validity of the single-molecule approach, it also suggests that any ability to study conformational subpopulations or dynamics, without the effects of ensemble averaging, is severely limited. As has been noted in numerous other studies regarding ultrafast spectroscopic measurements on single molecules [54], [107], [112], the typically low flux of fluorescent photons from a single molecule demands a lengthy integration time to achieve reasonable SNR in the demodulation of linear or nonlinear signals. These signals must also be sufficiently sampled in the time domain to avoid aliasing from frequencies beyond the Nyquist limit, which places a constraint on the minimum separation and number of points sampled in the interpulse delay space. This innate limitation on data acquisition time, juxtaposed with the traditionally short-lived fluorescent emission from single molecules, demands an alternative experimental approach if sub-ensemble-averaged information is desired. Compressed sensing approaches offer reduced acquisition times without loss of spectral information [112], [118], [119], but have not yet been applied at the level of single molecules.

Figure A.4 Two-dimensional spectra obtained on a single dsDNA molecule containing a monomer of Cy3 rigidly inserted into the sugar-phosphate backbone. The real (upper row) and imaginary (lower row) parts of the nonrephasing and rephasing spectra are shown on the left and right, respectively.

Figure A.5 Real part of the rephasing spectrum obtained on an ensemble of dsDNA molecules containing a monomer of Cy3. This ensemble result can be directly compared with the upper right panel of Fig. A.4.
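The acquisition-time and sampling constraints discussed in this section follow directly from the scan parameters quoted for Fig. A.4. The short sketch below reproduces that arithmetic; the function names are illustrative, and the reading of the Nyquist limit as applying to the down-sampled frequencies measured relative to the optical reference is an assumption made here.

```python
def scan_time_minutes(n_steps_21, n_steps_43, integration_s):
    """Total acquisition time of a two-dimensional delay grid, in minutes."""
    return n_steps_21 * n_steps_43 * integration_s / 60.0


def nyquist_frequency_thz(step_fs):
    """Nyquist frequency 1 / (2 * dt) for a delay step dt in femtoseconds.

    1 fs^-1 corresponds to 1000 THz, hence the factor of 1000.
    """
    return 1000.0 / (2.0 * step_fs)


# 10 x 9 delay points, 42 s per point, 2.5 fs steps (values from the text).
print(scan_time_minutes(10, 9, 42.0), "minutes per two-dimensional scan")
print(nyquist_frequency_thz(2.5), "THz Nyquist frequency")
```

Because the total time is the product of the grid size and the per-point integration time, reducing either one is the only route to shorter scans, which is why minimal sampling and compressed sensing are raised in the text.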
Section A.4.2.1 Averaging effects on signal-to-noise and error analysis of nonlinear single molecule data sets

Figures A.6 and A.7 show the real (top row) and imaginary (bottom row) rephasing spectra obtained from a single dsDNA molecule containing either a Cy3 monomer (Fig. A.6) or dimer (Fig. A.7) incorporated at the +1 position. These spectra are subjected to a test of SNR similar to that applied to the linear spectra of Fig. A.3, where the integration time is varied from a maximum of 42 seconds down to a minimum of 5 seconds at each point in the time domain.

Figure A.6 Rephasing 2D spectra obtained from single Cy3 monomers incorporated into dsDNA. The integration time at each time step is variably adjusted to demonstrate the limits on SNR.

Figure A.7 Rephasing 2D spectra obtained from single Cy3 dimers incorporated into dsDNA. The integration time at each time step is variably adjusted to demonstrate the limits on SNR.

As can be seen from the resulting rephasing spectra for both the monomer and dimer constructs, the SNR of the rephasing spectra drops dramatically as the integration time is lowered. This drop in SNR occurs well before an integration time of 5 seconds is reached. This is somewhat expected given the higher-order contributions to nonlinear signals, which greatly reduce the magnitude of the nonlinear components demodulated from the time-varying fluorescent photon stream. These results also reveal that experimental acquisition times are unlikely to fall below the time scale of tens of minutes using the presently discussed protocols, which reduces utility at the single-molecule limit.

With the use of photon time and phase tagging on single molecules, errors present in the discrete detection can greatly influence experimental results given the relatively low number of photons detected. Therefore, the uncertainty in the demodulated signal quadratures resulting from low photon numbers requires quantification.
This quantification is based on the scaling of the signal magnitude with the number of photons detected in the window $T_{PT}$ of Eqns. 66 and 67. If the $N$ photon detection events, corresponding to $N$ measurements $(x_1, x_2, x_3, \ldots, x_N)$ of the quantity $x$, are statistically independent, then the distribution of measurement outcomes obeys Gaussian statistics. In this case, the SNR of these measurements can be assigned as the ratio of the distribution average to the standard deviation of the distribution, $SNR = \bar{x}/\sigma_x$, which scales with $\sqrt{N}$ [69]. To demonstrate that the nonlinear signals demodulated from single molecules are both valid and not subject to additional sources of error beyond statistical sampling uncertainty, we plot the demodulated magnitude of the real-valued rephasing and nonrephasing signals versus the number of photons used to calculate the average. As can be seen in Fig. A.8, the magnitudes of the real-valued rephasing and nonrephasing signals scale as $\sqrt{N}$ over the range of photon numbers examined. The standard deviation of the signal at each photon number $N$ is poorly determined by conventional error estimation, as occurs when the underlying probability distribution is not sufficiently sampled. In this work, we have estimated the standard deviation of the underlying distribution by employing a bootstrapping approach, which randomly samples, with replacement, from the same underlying data set of $N$ photons to form a more reliable estimate of the standard deviation at each photon number examined [69].

In order to assess the validity of the signals obtained via discrete time and phase tagging, an examination of the power dependence for both linear and nonlinear signals is necessary. This can effectively rule out detector saturation effects, photon pile-up and the possibility of dark counts forming the major contribution to the demodulated amplitudes.
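The bootstrapping approach described above for estimating the standard deviation can be sketched as follows. This is a generic illustration under stated assumptions, not the implementation of [69]: the function names are hypothetical, the statistic is taken to be the normalized magnitude of the phase-tagged sum, and the input phases are synthetic. For a nonlinear signal, the per-photon phase would be the sum or difference of the two linear phase tags.

```python
import cmath
import math
import random


def demodulated_magnitude(phases):
    """Normalized magnitude |sum_k exp(-i * phi_k)| / N of the phase-tagged sum."""
    return abs(sum(cmath.exp(-1j * p) for p in phases)) / len(phases)


def bootstrap_std(phases, n_resamples, rng):
    """Bootstrap estimate of the standard deviation of the demodulated magnitude.

    Each resample draws N photons, with replacement, from the same N photons.
    """
    n = len(phases)
    estimates = []
    for _ in range(n_resamples):
        resample = [phases[rng.randrange(n)] for _ in range(n)]
        estimates.append(demodulated_magnitude(resample))
    mean = sum(estimates) / n_resamples
    variance = sum((e - mean) ** 2 for e in estimates) / (n_resamples - 1)
    return math.sqrt(variance)


# Synthetic, unmodulated photon phases (assumed data for illustration only).
rng = random.Random(0)
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(400)]
print("magnitude:", demodulated_magnitude(phases))
print("bootstrap std:", bootstrap_std(phases, 200, rng))
```

Repeating the procedure for subsets of increasing photon number gives an error bar at each N, which is the quantity needed to judge whether a demodulated magnitude stands above the sampling noise.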
We expect that in our regime of single molecules, the number of photons $N$ emitted from the sample is dominated by the square of the molecular dipole projected onto the electric field, $N \propto \sum_n |\alpha(\omega_{ng})|^2 |\mu_{ng}|^2$. Based on Eq. (53), we expect a linear dependence between the field intensity and the magnitude of the linear signal. Therefore, if the number of photons $N$ and the linear signal magnitude both vary linearly with the field intensity, their ratio should be constant and independent of the incident field intensity. A non-constant ratio would imply that signal artifacts are present to a significant degree. In the case of the nonlinear signals, the magnitude is proportional to the square of the field intensity, as shown in Eq. (60). In this instance, we expect the ratio of the demodulated nonlinear signal magnitude to the incident field intensity to obey a linear relationship. Therefore, as the incident field intensity is attenuated, this ratio should scale linearly with it.

A final test of the validity of both the linear and nonlinear signals is to attenuate the emitted photon stream of a single molecule while holding the excitation constant. This attenuation of the fluorescent photons can test whether the photon flux arriving at the detector is introducing saturation artifacts. Saturation and photon pile-up can result in an artificial overestimate of the signal magnitudes. If no spurious effects are present, then the magnitude of both the linear and nonlinear signals should be constant when attenuating the emitted fluorescent photons. The signals are integrated over the same fixed number of photons at each stage of attenuation to guarantee a similarly reliable estimate of the quadrature magnitudes.

Figure A.8 Scaling of nonrephasing and rephasing signals for variable photon number N used to average the signal from single molecules.
Figure A.9 shows the linear and nonlinear signal magnitude scaling of a Cy3 monomer in dsDNA for both pre- and post-attenuation, corresponding to attenuation of the incident field intensity and of the emitted photon flux, respectively. Figure A.10 shows the same scaling for a Cy3 dimer in dsDNA. Both cases reveal the predicted trends in signal scaling, affirming that the measured signals are predominantly due to the time-varying fluorescence emitted from the excited state populations we prepare and not to systematic artifacts of the experimental setup.

Figure A.9 Scaling of the linear and nonlinear signal magnitudes from a Cy3 monomer in dsDNA. The pre- and post-attenuation reflects a reduction in the incident laser power and emitted photon flux, respectively.

Figure A.10 Scaling of the linear and nonlinear signal magnitudes from a Cy3 dimer in dsDNA. The pre- and post-attenuation reflects a reduction in the incident laser power and emitted photon flux, respectively.

Section A.4.3 Time-correlated measurements of linear signals from single molecules under broadband excitation

To test the possible limits on obtaining dynamical estimates of conformation in dsDNA systems, time-correlated measurements were taken on single molecules under broadband excitation. The motivation for these investigations is the possible increase in sensitivity to conformational fluctuations afforded by a broader probe of the distribution describing the absorption spectra of i(Cy3)2 dimers. Additionally, simulated studies of polarization-sweep experiments utilizing broadband excitation suggested a possible interdependency between the measured signal visibility and the interpulse delay, $t_{21}$, within one MZI. The combination of enhanced spectral sampling with a broad source and the potential encoding of conformational dynamics as a function of interpulse delay led to the setup of such an experiment. A diagram of the experimental setup is shown in Fig. A.11.
The visibility, or complex amplitude, of the demodulated linear signal components (Eq. 56) was analyzed for fluctuations about its mean value, the same analytical approach employed in Chapter 3 for PS-SMF experiments.

Figure A.11 Experimental apparatus for broadband polarization-sweep measurements. Half-wave plates (HP), linear polarizers (LP) and a quarter-wave plate (QWP) are labeled according to their position within the interferometer.

The simulated results of a non-rotating (left panel) versus rotating (right panel) broadband polarization experiment are shown as a function of $t_{21}$ in Fig. A.12. The simulation assumes a constant geometry and handedness for the i(Cy3)2 dimer.

Figure A.12 Linear signal components arising from either symmetric or anti-symmetric transitions, in either a rotating (right panel) or non-rotating (left panel) polarization scheme.

Eqns. 68 and 69 show the dependence of the signal arising from the symmetric versus anti-symmetric manifolds of the coupled i(Cy3)2 dimer, in the non-rotating and rotating cases respectively. Eqn. 68 is equivalent to Eqn. 53 of the appendix, section 3, except that the orientation of the single molecule, $\varphi_0$, causes the signal magnitude to drop as overlap with the linear polarization of the laser is lost. The derivation of this signal is identical to the approach shown for Eqn. 53 of the appendix, section 3, except that a dipole-field inner product is first considered to account for the singular orientation of all dipoles probed in the experiment. $A^{+}$ is the signal from the symmetric manifold, while $A^{-}$ is the signal from the anti-symmetric manifold.

$$A^{\pm}(\varphi, \tau_{21}) = \sum_n |\mu^{\pm}_{ng} \cdot E(\omega)|^2\, [1 + \cos(\omega\tau_{21} - \Omega m T)]\, \cos^2(\varphi_0) \tag{68}$$

$$A^{\pm}_{Rot}(\varphi, \tau_{21}) = \sum_n |\mu^{\pm}_{ng} \cdot E(\omega)|^2\, [1 \pm 2\sin(\omega\tau_{21} - \Omega m T + 2\varphi_0)] \tag{69}$$

Rotation of the laser's polarization results in the dipole orientation giving rise to an overall signal phase, rather than the overall signal diminishment seen in the case of a non-rotating polarization.
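The distinction drawn above, that the dipole orientation appears as an amplitude factor in the non-rotating scheme but as a phase in the rotating scheme, can be checked numerically. The sketch below is a simplified illustration for a single term of the sums in Eqns. 68 and 69, not the simulation used to generate Figs. A.12 and A.13. The function names, the unit dipole-field weight, and the modulation depth of the rotating signal (set to 1 here so that the model count rate stays non-negative) are all assumptions made for this example.

```python
import cmath
import math


def first_harmonic(signal, n_steps=64):
    """Demodulate signal(theta) at the modulation frequency over one cycle.

    theta stands in for the modulation phase (Omega * m * T in the text).
    """
    total = 0j
    for m in range(n_steps):
        theta = 2.0 * math.pi * m / n_steps
        total += signal(theta) * cmath.exp(1j * theta)
    return total / n_steps


def non_rotating(theta, x, phi0, weight=1.0):
    """One term of Eq. (68); x = omega * tau_21, weight = |mu . E|^2 (assumed 1)."""
    return weight * (1.0 + math.cos(x - theta)) * math.cos(phi0) ** 2


def rotating(theta, x, phi0, weight=1.0, depth=1.0, sign=1):
    """One term of Eq. (69) with an assumed modulation depth and manifold sign."""
    return weight * (1.0 + sign * depth * math.sin(x - theta + 2.0 * phi0))


x = 0.4  # arbitrary value of omega * tau_21
for phi0 in (0.0, math.pi / 6, math.pi / 3):
    z_nr = first_harmonic(lambda th: non_rotating(th, x, phi0))
    z_r = first_harmonic(lambda th: rotating(th, x, phi0))
    print(f"phi0 = {phi0:.3f}: non-rotating |Z| = {abs(z_nr):.3f}, "
          f"phase = {cmath.phase(z_nr):+.3f}; rotating |Z| = {abs(z_r):.3f}, "
          f"phase = {cmath.phase(z_r):+.3f}")
```

In this model the non-rotating amplitude falls as cos²(φ0) at fixed phase, while the rotating amplitude is independent of φ0 and its phase advances by 2φ0.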
In both cases, one can ask whether the measured signal from a single molecule would fluctuate if the molecule changed its conformation. To test this, two oppositely handed geometries of a coupled i(Cy3)2 dimer were assumed in the simulation, and the difference between the signal quadratures arising from one conformer versus the other was calculated. This was done for both the rotating and non-rotating cases. Fig. A.13 shows that the rotating case has much greater signal contrast between the oppositely handed conformers, suggesting that the signal visibility would be sensitive to such a conformational change occurring during the course of an experiment.

Figure A.13 The signal contrast between two oppositely handed conformers in the case of rotating (right panel) versus non-rotating (left panel) polarization under broadband excitation. Direct differences in demodulated quadrature data are plotted.

However, experimental implementation of this rotating polarization scheme resulted in a persistent signal visibility when exciting an isotropic sample of rhodamine (Fig. A.14). This is unexpected within our model and is inconsistent with the isotropic nature of rhodamine in solution. Notably, this persistent visibility occurred only when employing a broadband QWP, which maintains a quarter wave of retardance across the entire laser bandwidth. No tuning of the relative polarization in the two arms of the MZI could eliminate this artifact, i.e. there was no interpulse interference contributing to the signal modulation as one might expect in the non-rotating scheme. This was confirmed by removal of the QWP. Since the aim of a rotating polarization scheme is to examine fluctuations of the signal visibility, this persistent contribution from the broadband QWP prevented reliable experimental results from being obtained.
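For reference, a persistent visibility of this kind can be quantified directly from the photon phase tags. The sketch below is a minimal illustration and not the analysis code used in this work. It assumes each detected photon carries a phase tag φ drawn from a distribution proportional to 1 + V cos(φ − φs), for which the visibility follows from the first circular moment as V = 2|⟨exp(iφ)⟩|. The function names and the simulated photon stream are hypothetical.

```python
import numpy as np

def simulate_phase_tags(n_photons, visibility, signal_phase, rng):
    """Draw tags from p(phi) ~ 1 + V*cos(phi - signal_phase) by rejection sampling."""
    tags = np.empty(0)
    while tags.size < n_photons:
        phi = rng.uniform(0.0, 2.0 * np.pi, size=n_photons)
        u = rng.uniform(0.0, 1.0 + visibility, size=n_photons)
        keep = u < 1.0 + visibility * np.cos(phi - signal_phase)
        tags = np.concatenate([tags, phi[keep]])
    return tags[:n_photons]

def estimate_visibility(phase_tags):
    """Return (visibility, phase) from the first circular moment of the tags."""
    moment = np.mean(np.exp(1j * phase_tags))
    return 2.0 * np.abs(moment), np.angle(moment)

rng = np.random.default_rng(seed=0)

# Simulated stream with a 15% modulation (value chosen for illustration).
tags = simulate_phase_tags(200_000, visibility=0.15, signal_phase=0.5, rng=rng)
v_est, phase_est = estimate_visibility(tags)
print(f"estimated visibility = {v_est:.3f}, phase = {phase_est:.3f} rad")

# An unmodulated stream gives a visibility near the shot-noise floor,
# which falls off as 1/sqrt(N) with the number of photons N.
flat = rng.uniform(0.0, 2.0 * np.pi, size=200_000)
print(f"flat-stream visibility = {estimate_visibility(flat)[0]:.4f}")
```

Comparing the estimate for a control sample against the flat-stream floor is one simple way to decide whether a residual modulation is statistically significant.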
It should be noted that multiple broadband QWPs, with varying degrees of accuracy and precision, were tested in an attempt to eliminate this effect. No such effect is observed in the PS-SMF experiments of Chapter 3, where a narrowband QWP intended for use at 532 nm exclusively is employed.

Figure A.14 A histogram of phase tags obtained on an isotropic sample of rhodamine at 𝑡21 = 0 using the experimental setup shown in Fig. A.11. A ~15% modulation of the signal can be observed.

The inability to reliably perform rotating polarization experiments under broadband excitation led us to test both non-rotating time-correlated experiments (conventional parallel polarization setup) and a time-correlated experiment reminiscent of linear dichroism studies, in which the two arms of the MZI are held cross-polarized but no QWP is employed, much like Phelps et al. [13]. The non-rotating experiment attempts to leverage the unique set of spectral interference patterns, which evolve at the modulation frequency, as a function of the interpulse delay, 𝑡21. The hypothesis in this case is that, if there is an optimal interpulse delay that produces the best (or worst) spectral overlap with the fluctuating absorption profile of a single molecule, then the time-correlated statistics of the signal visibility should yield distinct outcomes as a function of 𝑡21. In essence, this experiment attempts to encode conformational dynamics in the signal fluctuations observed as a function of 𝑡21. The results of time-correlation function (TCF) analyses in the non-rotating case are shown for both the rhodamine control and a +1 dimer in Fig. A.15.

Figure A.15 Non-rotating polarization time-correlation function analysis of a control (rhodamine) and a single molecule containing a Cy3 dimer. Data are binned within a narrow range of 𝑡21 to boost the statistics available at low signal levels.

The TCFs in Fig. A.15 are dominated by noise in the case of the rhodamine control and are only very weakly resolved in the case of the +1 dimer.
Notably, these data are integrated with a binning window of 1 ms, which typically results in high-SNR TCFs in the more conventional PS-SMF experiment at similar levels of fluorescent photon flux. These initial experimental attempts suggested that a non-rotating time-correlated approach would yield noisy results at best, without much sensitivity to the interpulse delay. Given this noise profile, we chose to also examine the fluctuations in the pure photon flux, which in previous single-molecule studies has been shown to be sensitive to experimental conditions and therefore might show a distinction between TCFs as a function of 𝑡21. Had a clear 𝑡21 dependence been observed, it would have motivated refinement or improvement of these experiments. The pure flux TCFs as a function of 𝑡21 are shown in Fig. A.16. The SNR of these TCFs is certainly better than that of the TCFs computed from the visibility signals in Fig. A.15. However, there is no clear separation between the TCFs in the pure flux case, further suggesting that any inherent dependence of the measured fluctuations on 𝑡21 is either absent or too weak to resolve. This test is incomplete given the limited number of molecules included in the statistics. Nonetheless, no obvious distinction in the TCFs as a function of 𝑡21 emerges to motivate further investigation of such phenomena.

Figure A.16 Non-rotating polarization time-correlation function analysis of photon flux for a single molecule containing a Cy3 dimer. Data are binned within a narrow range of 𝑡21 to boost the statistics available at low signal levels.

The final experimental protocol tested during time-correlated measurements employing a broadband source used two cross-polarized pulses in the two arms of an MZI and no QWP, examined over a range of 𝑡21 values.
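The binning and correlation steps described above can be sketched as follows. This is a minimal illustration that assumes the conventional two-point definition C(τ) = ⟨δx(t) δx(t + τ)⟩, with δx the deviation of the binned signal from its mean. The estimator, its normalization and the function names are assumptions and may differ from the TCF analysis of Chapter 3. The constant-rate photon stream stands in for a hypothetical control with no conformational dynamics.

```python
import numpy as np

def bin_photons(arrival_times, bin_width, duration):
    """Count photons in consecutive windows of `bin_width` seconds."""
    n_bins = int(round(duration / bin_width))
    counts, _ = np.histogram(arrival_times, bins=n_bins, range=(0.0, duration))
    return counts

def two_point_tcf(x, max_lag):
    """C[k] = <dx(t) dx(t + k)> for lags k = 0..max_lag, in units of bins."""
    dx = np.asarray(x, dtype=float) - np.mean(x)
    n = dx.size
    return np.array([np.mean(dx[: n - k] * dx[k:]) for k in range(max_lag + 1)])

rng = np.random.default_rng(seed=0)

# Hypothetical control: constant-rate photon stream, binned at 1 ms.
duration = 10.0      # seconds (assumed)
rate = 5_000.0       # photons per second (assumed)
n_photons = rng.poisson(rate * duration)
arrivals = np.sort(rng.uniform(0.0, duration, size=n_photons))
counts = bin_photons(arrivals, bin_width=1e-3, duration=duration)

tcf = two_point_tcf(counts, max_lag=5)
print("lag (ms):", np.arange(tcf.size))
print("TCF     :", np.round(tcf, 3))
```

For a shot-noise-limited stream such as this one, only the zero-lag value (the variance of the binned counts) is expected to differ from zero on average, so any decay at nonzero lag in real data reflects correlated fluctuations of the signal.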
This experiment is much like the linear dichroism studies mentioned previously, but it notably lacks the background signal visibility in control experiments that complicated the interpretation of experiments employing a broadband QWP, and it holds potential for uniquely encoded dynamics as a function of interpulse delay (unlike previous linear dichroism experiments). Control experiments were performed on both rhodamine and a stretched Cy3 film. Both cases show no correlation in the TCFs of Fig. A.17.

Figure A.17 Two-point TCFs for both control data sets, in the cross-polarized case, binned at 1 ms. Rhodamine (left) and Cy3 film (right).

The same experiment was then performed on single molecules containing a Cy3 dimer at either the +1 or the +15 position. The TCFs for both cases when 𝑡21 = 0 are shown in Fig. A.18. This is the first and only experimental protocol to display time-correlated fluctuations in the signal visibility when employing a broadband source. While binning at 1 ms results in TCFs that are distinguishable above the noise floor, these results have significantly lower SNR than similar experiments performed on these same constructs with the PS-SMF instrument. Additionally, there is only weak separation between the +1 and +15 TCFs, even though previous studies showed these constructs to be distinctly different in their conformational landscapes.

Figure A.18 Two-point TCFs of the signal visibility for a +1 and +15 dimer in the cross-polarized experimental protocol. Binned at 1 ms with 𝑡21 = 0.

Whether an exhaustive sampling of the 𝑡21 domain would yield unique insights for this cross-polarized protocol remains undetermined, as those experiments were not attempted due to the inability to perform accurate retiming of the interferometric signals prior to time-correlated analyses.
Ultimately, it was deemed unlikely that time-correlated experiments of signal visibility under broadband excitation would yield meaningful results above and beyond those already available from CW PS-SMF experiments.

5. CONCLUSION

The results of these studies demonstrate the feasibility of detecting linear and nonlinear signals from single molecules under broadband excitation, in either one- or two-dimensional protocols. Notably, the detection of nonlinear signals from single molecules is not new [107]. However, multi-dimensional measurements on a single molecule do appear to be the first of their kind. Additionally, these studies place limits on SNR as a function of integration time, which informs the potential for dynamically resolved spectra to be obtained. Time-correlated analysis of linear signals failed to produce robust statistics or meaningfully different outcomes under a variety of experimental protocols, although improvements to the instrument or creative implementation of alternate approaches may still prove useful. The possible applications of this work are likely outside the scope of single-molecule biophysics, where fluctuations are rapid and unlikely to be interferometrically sampled on relevant time scales. The ability to address single crystals or semiconductor nanostructures within an ensemble of statically heterogeneous members could prove useful, although advanced approaches already exist for characterization of most solid-state materials. Systems exhibiting slow dynamics could be examined using fully sampled interferometric techniques, or sparsely sampled compressed sensing approaches, should the relevant timescales be sub-minute. It appears unlikely that time-correlated analyses of this sort will exceed the utility offered by simpler, more conventional CW experiments. Systems with more significant line shape fluctuations, or generally broader excited-state manifolds, might be well probed by such approaches.
But again, dynamics would need to be sufficiently slow to obtain well resolved photon statistics for each configuration of the system. 232 [1] P. H. Von Hippel, N. P. Johnson, and A. H. Marcus, ‘Fifty years of DNA “breathing”: Reflections on old and new approaches’, Biopolymers, vol. 99, no. 12, pp. 923–954, 2013, doi: 10.1002/bip.22347. [2] M. P. Printz and P. H. von Hippel, ‘On the Kinetics of Hydrogen Exchange in Deoxyribonucleic Acid, pH and Salt Effects’, Biochemistry, vol. Vol 7, no. No. 9, pp. 3194– 3206, 1964. [3] B. Mcconnell and P. H. Von Hippel, ‘Hydrogen Exchange as a Probe of the Dynamic Structure of DNA II. Effects of Base Composition and Destabilizing Salts’, 1970. doi: J. Mol. Bio. [4] N. G. Nossal and B. M. Peterlin, ‘DNA replication by bacteriophage T4 proteins. The T4, 43, 32, 44-62 and 45 proteins are required for strand displacement synthesis at nicks in duplex DNA’, Journal of Biological Chemistry, vol. 254, no. 13, pp. 6032–6037, 1979, doi: 10.1016/s0021-9258(18)50515-9. [5] C. F. Morris, N. K. Sinha, and B. M. Alberts, ‘Reconstruction of bacteriophage T4 DNA replication apparatus from purified components: Rolling circle replication following de novo chain initiation on a single-stranded circular DNA template (replication forrK/riDonucleoside-tripnosphate-dependent priming/DNA strand’, 1975. [6] J. D. McGhee and P. H. von Hippel, ‘Formaldehyde as a Probe of DNA Structure. IV. Mechanism of the Initial Reaction of Formaldehyde with DNA.’, UTC, 1977. [Online]. Available: https://pubs.acs.org/sharingguidelines [7] J. D. McGhee and P. H. von Hippel, ‘Formaldehyde as a Probe of DNA Structure. II. Reaction with Endocyclic Imino Groups of DNA Basest’, UTC, 1975. [Online]. Available: https://pubs.acs.org/sharingguidelines [8] W. Lee, J. P. Gillies, D. Jose, B. A. Israels, P. H. Von Hippel, and A. H. 
Marcus, ‘Single- molecule FRET studies of the cooperative and non-cooperative binding kinetics of the bacteriophage T4 single-stranded DNA binding protein ( gp32 ) to ssDNA lattices at replication fork junctions’, vol. 44, no. 22, pp. 10691–10710, 2016, doi: 10.1093/nar/gkw863. [9] T. C. Mueser, J. M. Hinerman, J. M. Devos, R. A. Boyer, and K. J. Williams, ‘Structural analysis of bacteriophage T4 DNA replication: A review in the Virology Journal series on bacteriophage T4 and its relatives’, Virology Journal, vol. 7. 2010. doi: 10.1186/1743- 422X-7-359. [10] T. L. Carus, D. R. Natura, S. Greek, R. Empire, T. S. Chymist, and R. Boyle, ‘Introduction to Single Molecule Imaging and Mechanics : Seeing and Touching Molecules One at a Time’. [11] W. E. Moerner and L. Kador, ‘Optical detection and spectroscopy of single molecules in a solid’, Phys Rev Lett, vol. 62, no. 21, pp. 2535–2538, 1989, doi: 10.1103/PhysRevLett.62.2535. 233 [12] T. Ha, T. H. Enderle, D. F. Ogletreet, D. S. Chemla, P. R. Selvin, and S. Weiss, ‘Probing the interaction between two single molecules: Fluorescence resonance energy transfer between a single donor and a single acceptor’, 1996. [13] C. Phelps, W. Lee, D. Jose, P. H. von Hippel, and A. H. Marcus, ‘Single-molecule FRET and linear dichroism studies of DNA breathing and helicase binding at replication fork junctions’, Proceedings of the National Academy of Sciences, vol. 110, no. 43, pp. 17320– 17325, 2013, doi: 10.1073/pnas.1314862110. [14] B. Israels, C. S. Albrecht, A. Dang, M. Barney, P. H. Von Hippel, and A. H. Marcus, ‘Sub- millisecond conformational transitions of short single-stranded DNA lattices by photon correlation single-molecule FRET’. [15] M. F. Juette et al., ‘Didemnin B and ternatin-4 differentially inhibit conformational changes in eEF1A required for aminoacyl-tRNA accommodation into mammalian ribosomes’, Elife, vol. 11, 2022, doi: 10.7554/eLife.81608. [16] S. R. Hansen, D. S. White, M. Scalf, I. R. Corrêa, L. M. Smith, and A. 
A. Hoskins, ‘Multi- step recognition of potential 5’ splice sites by the Saccharomyces cerevisiae U1 snRNP’, Elife, vol. 11, Aug. 2022, doi: 10.7554/eLife.70534. [17] C. P. Lapointe, R. Grosely, A. G. Johnson, J. Wang, I. S. Fernández, and J. D. Puglisi, ‘Dynamic competition between SARS-CoV-2 NSP1 and mRNA on the human ribosome inhibits translation initiation’, doi: 10.1073/pnas.2017715118/-/DCSupplemental. [18] M. Y. Lee et al., ‘Fluorescent labeling of abundant reactive entities (FLARE) for cleared- tissue and super-resolution microscopy’, Nature Protocols, vol. 17, no. 3. Nature Research, pp. 819–846, Mar. 01, 2022. doi: 10.1038/s41596-021-00667-2. [19] A. M. Koester, K. Tao, M. Szczepaniak, M. J. Rames, and X. Nan, ‘Nanoscopic Spatial Association between Ras and Phosphatidylserine on the Cell Membrane Studied with Multicolor Super Resolution Microscopy’, Biomolecules, vol. 12, no. 8, Aug. 2022, doi: 10.3390/biom12081033. [20] L. A. Barner et al., ‘Multiresolution nondestructive 3D pathology of whole lymph nodes for breast cancer staging’, J Biomed Opt, vol. 27, no. 03, Mar. 2022, doi: 10.1117/1.jbo.27.3.036501. [21] K. C. Neuman and A. Nagy, ‘Single-molecule force spectroscopy: Optical tweezers, magnetic tweezers and atomic force microscopy’, Nature Methods, vol. 5, no. 6. pp. 491– 505, Jun. 2008. doi: 10.1038/nmeth.1218. [22] D. R. Jacobson and T. T. Perkins, ‘Free-energy changes of bacteriorhodopsin point mutants measured by single-molecule force spectroscopy’, doi: 10.1073/pnas.2020083118/- /DCSupplemental. [23] D. T. Edwards, M.-A. Leblanc, and T. T. Perkins, ‘Modulation of a protein-folding landscape revealed by AFM-based force spectroscopy notwithstanding instrumental limitations’, doi: 10.1073/pnas.2015728118/-/DCSupplemental. [24] D. Heussman, J. Kittell, P. H. Von Hippel, and A. H. 
Marcus, ‘Temperature-dependent local conformations and conformational distributions of cyanine dimer labeled single-stranded- 234 double-stranded DNA junctions by 2D fluorescence spectroscopy’, Journal of Chemical Physics, vol. 156, no. 4, 2022, doi: 10.1063/5.0076261. [25] D. Heussman, J. Kittell, L. Kringle, A. Tamimi, P. H. Von Hippel, and A. H. Marcus, ‘Measuring local conformations and conformational disorder of ( Cy3 ) 2 dimer labeled DNA fork junctions using absorbance , circular dichroism and two- dimensional fluorescence spectroscopy I . Introduction’, pp. 1–38. [26] A. H. Marcus, D. Heussman, J. Maurer, C. S. Albrecht, P. Herbert, and P. H. Von Hippel, ‘Studies of Local DNA Backbone Conformation and Conformational Disorder Using Site- Specific Exciton-Coupled Dimer Probe Spectroscopy’, Annu Rev Phys Chem, 2023, doi: 10.1146/annurev-physchem-090419. [27] L. Kringle et al., ‘Temperature-dependent conformations of exciton-coupled Cy3 dimers in double-stranded DNA’, Journal of Chemical Physics, vol. 148, no. 8, 2018, doi: 10.1063/1.5020084. [28] W. Lee, P. H. Von Hippel, and A. H. Marcus, ‘Internally labeled Cy3 / Cy5 DNA constructs show greatly enhanced photo-stability in single-molecule FRET experiments’, vol. 42, no. 9, pp. 5967–5977, 2014, doi: 10.1093/nar/gku199. [29] N. F. Dupuis, E. D. Holmstrom, and D. J. Nesbitt, ‘Single-molecule kinetics reveal cation- promoted DNA duplex formation through ordering of single-stranded helices’, Biophys J, vol. 105, no. 3, pp. 756–766, Aug. 2013, doi: 10.1016/j.bpj.2013.05.061. [30] E. D. Holmstrom, N. F. Dupuis, and D. J. Nesbitt, ‘Kinetic and Thermodynamic Origins of Osmolyte-In fl uenced Nucleic Acid Folding’, 2015, doi: 10.1021/jp512491n. [31] N. F. Dupuis, E. D. Holmstrom, and D. J. Nesbitt, ‘Single-molecule kinetics reveal cation- promoted DNA duplex formation through ordering of single-stranded helices’, Biophys J, vol. 105, no. 3, pp. 756–766, Aug. 2013, doi: 10.1016/j.bpj.2013.05.061. 
[32] Bio 3400, ‘DNA Structure’, Bio3400. (n.d.). DNA Structure. How much DNA do you have? http://bio3400.nicerweb.com/Locked/media/ch10/DNA.html . [33] S. Gellerl, ‘X-RAY STRUCTURAL CRYSTALLOGRAPHY’. [Online]. Available: www.annualreviews.org [34] H. Ki, K. Young Oang, J. Kim, and H. Ihee, ‘Ultrafast X-Ray Crystallography and Liquidography’, The Annual Review of Physical Chemistry is online at, vol. 68, pp. 473– 97, 2017, doi: 10.1146/annurev-physchem. [35] B. Von Ardenne, M. Mechelke, and H. Grubmüller, ‘Structure determination from single molecule X-ray scattering with three photons per image’, Nat Commun, vol. 9, no. 1, pp. 1– 9, 2018, doi: 10.1038/s41467-018-04830-4. [36] T. Xie, T. Saleh, P. Rossi, and C. G. Kalodimos, ‘Conformational states dynamically populated by a kinase determine its function’, Science (1979), vol. 370, no. 6513, Oct. 2020, doi: 10.1126/science.abc2754. 235 [37] A. V. Reshetnyak et al., ‘Mechanism for the activation of the anaplastic lymphoma kinase receptor’, Nature, vol. 600, no. 7887, pp. 153–157, Dec. 2021, doi: 10.1038/s41586-021- 04140-8. [38] K. H. Gardner and L. E. Kay, ‘THE USE OF 2 H, 13 C, 15 N MULTIDIMENSIONAL NMR TO STUDY THE STRUCTURE AND DYNAMICS OF PROTEINS’, 1998. [Online]. Available: www.annualreviews.org [39] M. E. Wall, S. C. Gallagher, and J. Trewhella, ‘LARGE-SCALE SHAPE CHANGES IN PROTEINS AND MACROMOLECULAR COMPLEXES’, 2000. [Online]. Available: www.annualreviews.org [40] M. E. Mäeots and R. I. Enchev, ‘Structural dynamics: review of time-resolved cryo-EM’, Acta Crystallographica Section D: Structural Biology, vol. 78. International Union of Crystallography, pp. 927–935, Aug. 01, 2022. doi: 10.1107/S2059798322006155. [41] R. M. Glaeser, ‘How Good Can Single-Particle Cryo-EM Become? What Remains Before It Approaches Its Physical Limits?’, 2019, doi: 10.1146/annurev-biophys-070317. [42] J. Jumper et al., ‘Highly accurate protein structure prediction with AlphaFold’, Nature, vol. 596, no. 7873, pp. 583–589, Aug. 
2021, doi: 10.1038/s41586-021-03819-2. [43] S. Torino, M. Dhurandhar, A. Stroobants, R. Claessens, and R. G. Efremov, ‘Time-resolved cryo-EM using a combination of droplet microfluidics with on-demand jetting’, Nat Methods, Aug. 2023, doi: 10.1038/s41592-023-01967-z. [44] M. Holm et al., ‘mRNA decoding in human is kinetically and structurally distinct from bacteria’, Nature, 2023, doi: 10.1038/s41586-023-05908-w. [45] T. Xie, T. Saleh, P. Rossi, D. Miller, and C. G. Kalodimos, ‘Imatinib can act as an allosteric activator of Abl kinase’, 2021. [46] J. R. Widom, ‘Local Conformations and Excited State Dynamics of Porphyrins and Nucleic Acids By 2-Dimensional Fluorescence Spectroscopy’, Doctoral Dissertation, no. December, 2013. [47] J. R. Widom, N. P. Johnson, P. H. Von Hippel, and A. H. Marcus, ‘Solution conformation of 2-aminopurine dinucleotide determined by ultraviolet two-dimensional fluorescence spectroscopy’, New J Phys, vol. 15, pp. 1–16, 2013, doi: 10.1088/1367-2630/15/2/025028. [48] P. F. Tekavec, G. A. Lott, and A. H. Marcus, ‘Fluorescence-detected two-dimensional electronic coherence spectroscopy by acousto-optic phase modulation’, Journal of Chemical Physics, vol. 127, no. 21, pp. 1–21, 2007, doi: 10.1063/1.2800560. [49] K. S. Wilson and C. Y. Wong, ‘Calibrating a spatially encoded time delay for transient absorption spectroscopy’, pp. 3–7. [50] X. Fu, H. Kaur, M. L. Rodgers, E. J. Montemayor, S. E. Butcher, and A. A. Hoskins, ‘Identification of transient intermediates during spliceosome activation by single molecule fluorescence microscopy’, Proc Natl Acad Sci U S A, vol. 119, no. 48, Nov. 2022, doi: 10.1073/pnas.2206815119. 236 [51] N. F. Dupuis, E. D. Holmstrom, and D. J. Nesbitt, ‘Single-molecule kinetics reveal cation- promoted DNA duplex formation through ordering of single-stranded helices’, Biophys J, vol. 105, no. 3, pp. 756–766, Aug. 2013, doi: 10.1016/j.bpj.2013.05.061. [52] L. Piatkowski, E. Gellings, and N. F. 
Van Hulst, ‘Broadband single-molecule excitation spectroscopy’, Nat Commun, vol. 7, pp. 1–9, 2016, doi: 10.1038/ncomms10411. [53] G. U. Nienhaus, ‘Single-molecule fluorescence studies of protein folding.’, Methods Mol Biol, vol. 490, pp. 311–337, 2009, doi: 10.1007/978-1-59745-367-7_13. [54] D. Brinks et al., ‘Ultrafast dynamics of single molecules’, Chem Soc Rev, vol. 43, no. 8, pp. 2476–2491, 2014, doi: 10.1039/c3cs60269a. [55] A. S. Backer, A. S. Biebricher, G. A. King, G. J. L. Wuite, I. Heller, and E. J. G. Peterman, ‘Single-molecule polarization microscopy of DNA intercalators sheds light on the structure of S-DNA’, no. March, 2019. [56] C. Phelps, B. Israels, D. Jose, M. C. Marsh, P. H. von Hippel, and A. H. Marcus, ‘Using microsecond single-molecule FRET to determine the assembly pathways of T4 ssDNA binding protein onto model DNA replication forks’, Proceedings of the National Academy of Sciences, vol. 114, no. 18, pp. E3612–E3621, 2017, doi: 10.1073/pnas.1619819114. [57] M. C. Fink, K. V. Adair, M. G. Guenza, and A. H. Marcus, ‘Translational diffusion of fluorescent proteins by molecular fourier imaging correlation spectroscopy’, Biophys J, vol. 91, no. 9, pp. 3482–3498, 2006, doi: 10.1529/biophysj.106.085712. [58] M. K. Knowles, M. G. Guenza, R. A. Capaldi, and A. H. Marcus, ‘Cytoskeletal-assisted dynamics of the mitochondrial reticulum in living cells’, Proceedings of the National Academy of Sciences, vol. 99, no. 23, pp. 14772–14777, 2002, doi: 10.1073/pnas.232346999. [59] V. Glembockyte, L. Grabenhorst, K. Trofymchuk, and P. Tinnefeld, ‘DNA Origami Nanoantennas for Fluorescence Enhancement’, Acc Chem Res, vol. 54, no. 17, pp. 3338– 3348, 2021, doi: 10.1021/acs.accounts.1c00307. [60] E. R. Beyerle and M. G. Guenza, ‘Identifying the leading dynamics of ubiquitin: A comparison between the tICA and the LE4PD slow fluctuations in amino acids’ position’, J Chem Phys, vol. 155, no. 24, p. 244108, Dec. 2021, doi: 10.1063/5.0059688. [61] M. G. Saunders and G. 
A. Voth, ‘Coarse-graining methods for computational biology’, Annu Rev Biophys, vol. 42, no. 1, pp. 73–93, May 2013, doi: 10.1146/annurev-biophys-083012- 130348. [62] J. Wang et al., ‘Twisting and swiveling domain motions in Cas9 to recognize target DNA duplexes, make double-strand breaks, and release cleaved duplexes’, Frontiers in Molecular Biosciences, vol. 9. Frontiers Media S.A., Jan. 09, 2023. doi: 10.3389/fmolb.2022.1072733. [63] G. A. Fitzgerald, D. S. Terry, A. L. Warren, M. Quick, J. A. Javitch, and S. C. Blanchard, ‘Quantifying secondary transport at single-molecule resolution’, Nature, vol. 575, no. 7783, pp. 528–534, Nov. 2019, doi: 10.1038/s41586-019-1747-5. 237 [64] W. B. Asher et al., ‘Single-molecule FRET imaging of GPCR dimers in living cells’, Nat Methods, vol. 18, no. 4, pp. 397–405, Apr. 2021, doi: 10.1038/s41592-021-01081-y. [65] C. P. Lapointe et al., ‘eIF5B and eIF1A reorient initiator tRNA to allow ribosomal subunit joining’, Nature, vol. 607, no. 7917, pp. 185–190, Jul. 2022, doi: 10.1038/s41586-022- 04858-z. [66] P. H. Von Hippel and E. Delagoutte, ‘Review A General Model for Nucleic Acid Helicases and Their “Coupling” within Macromolecular Machines Most well-studied proteins that interact with ssNA’, 2001. [67] E. M. S. Stennett, N. Ma, A. Van Der Vaart, and M. Levitus, ‘Photophysical and dynamical properties of doubly linked Cy3-DNA constructs’, Journal of Physical Chemistry B, vol. 118, no. 1, pp. 152–163, 2014, doi: 10.1021/jp410976p. [68] L. Kringle, N. P. D. Sawaya, J. Widom, C. Adams, M. G. Raymer, and A. H. Marcus, ‘Temperature-dependent conformations of exciton-coupled Cy3 dimers in double-stranded DNA’, no. Cd. [69] A. Tamimi, T. Landes, J. Lavoie, B. J. Smith, M. G. Raymer, and A. H. Marcus, ‘Fluorescence-detected Fourier transform electronic spectroscopy by photon counting phase-tagging’, pp. 1–22. [70] C. Phelps, B. Israels, M. C. Marsh, P. H. Von Hippel, and A. H. 
Marcus, ‘Using Multiorder Time-Correlation Functions (TCFs) to Elucidate Biomolecular Reaction Pathways from Microsecond Single-Molecule Fluorescence Experiments’, Journal of Physical Chemistry B, vol. 120, no. 51, pp. 13003–13016, 2016, doi: 10.1021/acs.jpcb.6b08449. [71] M. Dhar, J. A. Dickinson, and M. A. Berg, ‘Efficient, nonparametric removal of noise and recovery of probability distributions from time series using nonlinear-correlation functions: Additive noise’. [72] P. Vaitiekunas, C. Crane-Robinson, and P. L. Privalov, ‘The energetic basis of the DNA double helix: A combined microcalorimetric approach’, Nucleic Acids Res, vol. 43, no. 17, pp. 8577–8589, Sep. 2015, doi: 10.1093/nar/gkv812. [73] M. T. Record, P. L. Dehaseth, and T. M. Lohman, ‘Interpretation of Monovalent and Divalent Cation Effects on the lac Repressor-Operator Interaction^’. [Online]. Available: https://pubs.acs.org/sharingguidelines [74] ‘Record-QtrlyrevofBiophysics-1978’. [75] T. M. Lohman~, P. L. Dehaseth, and R. M. Thomas, ‘ANALYSIS OF ION CONCENTRATION EFFECTS ON THE KINETICS OF PROTEIN-NUCLEIC ACID INTERACTIONS. APPLICATION TO LAC: REPRESSOR-OPERATOR INTERACTIONS’, North-Holland Publishing Company, 1978. [76] S. D. Chandradoss, A. C. Haagsma, Y. K. Lee, J. H. Hwang, J. M. Nam, and C. Joo, ‘Surface passivation for single-molecule protein studies’, Journal of Visualized Experiments, no. 86, Apr. 2014, doi: 10.3791/50549. 238 [77] T. Cordes, J. Vogelsang, and P. Tinnefeld, ‘On the Mechanism of Trolox as Antiblinking and Antibleaching Reagent’, no. iv, pp. 5018–5019, 2009. [78] M. Dhar, J. A. Dickinson, and M. A. Berg, ‘Efficient, nonparametric removal of noise and recovery of probability distributions from time series using nonlinear-correlation functions: Additive noise’, J Chem Phys, vol. 159, no. 5, Aug. 2023, doi: 10.1063/5.0158199. [79] L. Yu et al., ‘A Comprehensive Review of Fluorescence Correlation Spectroscopy’, Frontiers in Physics, vol. 9. Frontiers Media S.A., Apr. 12, 2021. 
doi: 10.3389/fphy.2021.644450. [80] R. Owczarzy, B. G. Moreira, Y. You, M. A. Behlke, and J. A. Wälder, ‘Predicting stability of DNA duplexes in solutions containing magnesium and monovalent cations’, Biochemistry, vol. 47, no. 19, pp. 5336–5353, May 2008, doi: 10.1021/bi702363u. [81] Z. J. Tan and S. J. Chen, ‘Nucleic acid helix stability: Effects of salt concentration, cation valence and size, and chain length’, Biophys J, vol. 90, no. 4, pp. 1175–1190, 2006, doi: 10.1529/biophysj.105.070904. [82] C. H. Mak, ‘Unraveling base stacking driving forces in DNA’, Journal of Physical Chemistry B, vol. 120, no. 26, pp. 6010–6020, Jul. 2016, doi: 10.1021/acs.jpcb.6b01934. [83] D. Colquhoun, K. A. Dowsland, M. Beato, and A. J. R. Plested, ‘How to impose microscopic reversibility in complex reaction mechanisms’, Biophys J, vol. 86, no. 6, pp. 3510–3518, 2004, doi: 10.1529/biophysj.103.038679. [84] K. P. Burnham and D. R. Anderson, ‘Multimodel inference: Understanding AIC and BIC in model selection’, Sociological Methods and Research, vol. 33, no. 2. pp. 261–304, Nov. 2004. doi: 10.1177/0049124104268644. [85] D. Jose et al., ‘Spectroscopic studies of position-specific DNA “‘breathing’” fluctuations at replication forks and primer-template junctions’, 2009. [86] C. Yang, E. Kim, M. Lim, and Y. Pak, ‘Computational Probing of Watson-Crick/Hoogsteen Breathing in a DNA Duplex Containing N1-Methylated Adenine’, J Chem Theory Comput, vol. 15, no. 1, pp. 751–761, Jan. 2019, doi: 10.1021/acs.jctc.8b00936. [87] J. Douglas and P. H. Von Hippel, ‘Effects of Methylation on the Stability of Nucleic Acid Conformations STUDIES AT THE POLYMER LEVEL*’, 1978. [Online]. Available: http://www.jbc.org/ [88] J. M. Huguet, C. V Bizarro, N. Forns, S. B. Smith, C. Bustamante, and F. Ritort, ‘Single- molecule derivation of salt dependent base-pair free energies in DNA’, PNAS, vol. 107, 2010, doi: 10.1073/pnas.1001454107/-/DCSupplemental. [89] R. 
Owczarzy et al., ‘Effects of Sodium Ions on DNA Duplex Oligomers: Improved Predictions of Melting Temperatures’, Biochemistry, vol. 43, no. 12, pp. 3537–3554, Mar. 2004, doi: 10.1021/bi034621r. [90] J. M. Huguet, C. V. Bizarro, N. Forns, S. B. Smith, C. Bustamante, and F. Ritort, ‘Single- molecule derivation of salt dependent base-pair free energies in DNA’, Proc Natl Acad Sci U S A, vol. 107, no. 35, pp. 15431–15436, 2010, doi: 10.1073/pnas.1001454107. 239 [91] B. A. Kelch, D. L. Makino, M. O’Donnell, and J. Kuriyan, ‘How a DNA polymerase clamp loader opens a sliding clamp’, Science (1979), vol. 334, no. 6063, pp. 1675–1680, Dec. 2011, doi: 10.1126/science.1211884. [92] C. Gaubitz et al., ‘Structure of the human clamp loader bound to the sliding clamp: a further twist on AAA+ mechanism’, doi: 10.1101/2020.02.18.953257. [93] E. Delagoutte and P. H. Von Hippel, ‘Molecular mechanisms of the functional coupling of the helicase (gp41) and polymerase (gp43) of bacteriophage T4 within the DNA replication fork’, Biochemistry, vol. 40, no. 14, pp. 4459–4477, Apr. 2001, doi: 10.1021/bi001306l. [94] E. Delagoutte and P. H. Von Hippel, ‘Function and assembly of the bacteriophage T4 DNA replication complex: Interactions of the T4 polymerase with various model DNA constructs’, Journal of Biological Chemistry, vol. 278, no. 28, pp. 25435–25447, Jul. 2003, doi: 10.1074/jbc.M303370200. [95] S. W. Nelson, R. Kumar, and S. J. Benkovic, ‘RNA primer handoff in bacteriophage T4 DNA replication: The role of single-stranded DNA-binding protein and polymerase accessory proteins’, Journal of Biological Chemistry, vol. 283, no. 33, pp. 22838–22846, Aug. 2008, doi: 10.1074/jbc.M802762200. [96] A. Yuzhakov, Z. Kelman, and M. O’donnell, ‘Trading Places on DNA-A Three-Point Switch Underlies Primer Handoff from Primase to the Replicative DNA Polymerase onto DNA (Figure 1A). This primase displacement task’, 1999. [97] P. Pietroni, M. C. Young, G. J. Latham, and P. H. 
Von Hippel, ‘Structural Analyses of gp45 Sliding Clamp Interactions during Assembly of the Bacteriophage T4 DNA Polymerase Holoenzyme I. CONFORMATIONAL CHANGES WITHIN THE gp44/62-gp45-ATP COMPLEX DURING CLAMP LOADING*’, 1997. [Online]. Available: http://www.jbc.org/ [98] T. Dodd, M. Botto, F. Paul, R. Fernandez-Leiro, M. H. Lamers, and I. Ivanov, ‘Polymerization and editing modes of a high-fidelity DNA polymerase are linked by a well- defined path’, Nat Commun, vol. 11, no. 1, Dec. 2020, doi: 10.1038/s41467-020-19165-2. [99] K. Datta, N. P. Johnson, and P. H. Von Hippel, ‘DNA conformational changes at the primer- template junction regulate the fidelity of replication by DNA polymerase’, doi: 10.1073/pnas.1012277107/-/DCSupplemental. [100] T. Dodd, M. Botto, F. Paul, R. Fernandez-Leiro, M. H. Lamers, and I. Ivanov, ‘Polymerization and editing modes of a high-fidelity DNA polymerase are linked by a well- defined path’, Nat Commun, vol. 11, no. 1, Dec. 2020, doi: 10.1038/s41467-020-19165-2. [101] K. Pant et al., ‘The role of the C-domain of bacteriophage T4 gene 32 protein in ssDNA binding and dsDNA helix-destabilization: Kinetic, single-molecule, and cross-linking studies’, PLoS One, vol. 13, no. 4, Apr. 2018, doi: 10.1371/journal.pone.0194357. [102] G. J. Latham, D. J. Bacheller, P. Pietroni, and P. H. Von Hippel, ‘Structural Analyses of gp45 Sliding Clamp Interactions during Assembly of the Bacteriophage T4 DNA Polymerase Holoenzyme II. THE gp44/62 CLAMP LOADER INTERACTS WITH A SINGLE 240 DEFINED FACE OF THE SLIDING CLAMP RING*’, 1997. [Online]. Available: http://www.jbc.org/ [103] G. J. Latham, D. J. Bacheller, P. Pietroni, and P. H. Von Hippel, ‘Structural Analyses of gp45 Sliding Clamp Interactions during Assembly of the Bacteriophage T4 DNA Polymerase Holoenzyme III. THE gp43 DNA POLYMERASE BINDS TO THE SAME FACE OF THE SLIDING CLAMP AS THE CLAMP LOADER*’, 1997. [Online]. Available: http://www.jbc.org/ [104] Y. Shamoo and T. A. 
Steitz, ‘Building a Replisome from Interacting Pieces: Sliding Clamp Complexed to a Peptide from DNA Polymerase and a Polymerase Editing Complex’, 1999.
[105] W. E. Moerner and D. P. Fromm, ‘Methods of single-molecule fluorescence spectroscopy and microscopy’, Review of Scientific Instruments, vol. 74, no. 8, pp. 3597–3619, 2003, doi: 10.1063/1.1589587.
[106] W. Lee, D. Jose, C. Phelps, A. H. Marcus, and P. H. Von Hippel, ‘A Single-Molecule View of the Assembly Pathway, Subunit Stoichiometry, and Unwinding Activity of the Bacteriophage T4 Primosome (helicase-primase) Complex’, 2013, doi: 10.1021/bi400231s.
[107] M. Liebel, C. Toninelli, and N. F. Van Hulst, ‘Nonlinear spectroscopy of a single molecule’, Nat Photonics, vol. 12, no. 1, pp. 1–6, 2018, doi: 10.1038/s41566-017-0056-5.
[108] R. Moya, A. C. Norris, T. Kondo, and G. S. Schlau-Cohen, ‘Observation of robust energy transfer in the photosynthetic protein allophycocyanin using single-molecule pump–probe spectroscopy’, Nat Chem, vol. 14, no. 2, pp. 153–159, 2022, doi: 10.1038/s41557-021-00841-9.
[109] V. Tiwari, Y. A. Matutes, A. T. Gardiner, T. L. C. Jansen, R. J. Cogdell, and J. P. Ogilvie, ‘Spatially-resolved fluorescence-detected two-dimensional electronic spectroscopy probes varying excitonic structure in photosynthetic bacteria’, 2018, doi: 10.1038/s41467-018-06619-x.
[110] A. Perdomo-Ortiz, J. R. Widom, G. A. Lott, A. Aspuru-Guzik, and A. H. Marcus, ‘Conformation and electronic population transfer in membrane-supported self-assembled porphyrin dimers by 2D fluorescence spectroscopy’, Journal of Physical Chemistry B, vol. 116, no. 35, pp. 10757–10770, 2012, doi: 10.1021/jp305916x.
[111] P. F. Tekavec, T. R. Dyke, and A. H. Marcus, ‘Wave packet interferometry and quantum state reconstruction by acousto-optic phase modulation’, Journal of Chemical Physics, vol. 125, no.
19, 2006, doi: 10.1063/1.2386159.
[112] S. Roeding, N. Klimovich, and T. Brixner, ‘Optimizing sparse sampling for 2D electronic spectroscopy’, Journal of Chemical Physics, Art. no. 084201, 2017, doi: 10.1063/1.4976309.
[113] T. Brixner, G. Krampert, P. Niklaus, and G. Gerber, ‘Generation and characterization of polarization-shaped femtosecond laser pulses’, Appl Phys B, vol. 74, no. SUPPL., pp. 133–144, 2002, doi: 10.1007/s00340-002-0911-y.
[114] A. J. Kiessling and J. A. Cina, ‘Monitoring the evolution of intersite and interexciton coherence in electronic excitation transfer via wave-packet interferometry’, Journal of Chemical Physics, vol. 152, no. 24, 2020, doi: 10.1063/5.0008766.
[115] S. M. Hart et al., ‘Activating charge-transfer state formation in strongly-coupled dimers using DNA scaffolds’, Chem Sci, pp. 13020–13031, 2022, doi: 10.1039/d2sc02759c.
[116] S. M. Hart, J. L. Banal, M. Bathe, and G. S. Schlau-Cohen, ‘Identification of Nonradiative Decay Pathways in Cy3’, J Phys Chem Lett, pp. 5000–5007, 2020, doi: 10.1021/acs.jpclett.0c01201.
[117] A. K. Pati et al., ‘Tuning the Baird aromatic triplet-state energy of cyclooctatetraene to maximize the self-healing mechanism in organic fluorophores’, Proc Natl Acad Sci U S A, vol. 117, no. 39, pp. 24305–24315, 2020, doi: 10.1073/pnas.2006517117.
[118] A. G. Numerik and / Optimierung, ‘Introduction to Compressed Sensing’.
[119] J. N. Sanders et al., ‘Compressed sensing for multidimensional spectroscopy experiments’, Journal of Physical Chemistry Letters, vol. 3, no. 18, pp. 2697–2702, 2012, doi: 10.1021/jz300988p.
[120] J. Maurer, C. S. Albrecht, D. Heussman, P. Herbert, P. H. von Hippel, and A. H. Marcus, ‘Polarization-sweep single-molecule fluorescence microscopy of exciton-coupled (Cy3)2 dimer-labeled DNA fork constructs’, submitted to JCPB, 2023.