EVOLUTION OF METAL AND PEPTIDE BINDING IN THE S100 PROTEIN FAMILY
by
LUCAS CLAYTON WHEELER
A DISSERTATION
Presented to the Department of Chemistry and Biochemistry
and the Graduate School of the University of Oregon
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
December 2017
DISSERTATION APPROVAL PAGE
Student: Lucas Clayton Wheeler
Title: Evolution of Metal and Peptide Binding in the S100 Protein Family
This dissertation has been accepted and approved in partial fulfillment of the
requirements for the Doctor of Philosophy degree in the Department of Chemistry
and Biochemistry by:
James Prell, Chair
Michael Harms, Advisor
Bradley Nolen, Core Member
Patrick Phillips, Core Member
Alice Barkan, Institutional Representative
and
Sara D. Hodges, Interim Vice Provost and Dean of the Graduate School
Original approval signatures are on file with the University of Oregon Graduate
School.
Degree awarded December 2017
© 2017 Lucas Clayton Wheeler
DISSERTATION ABSTRACT
Lucas Clayton Wheeler
Doctor of Philosophy
Department of Chemistry and Biochemistry
December 2017
Title: Evolution of Metal and Peptide Binding in the S100 Protein Family
Proteins perform an incredible array of functions facilitated by a diverse
set of biochemical properties. Changing these properties is an essential molecular
mechanism of evolutionary change, and many major questions in protein evolution
center on this topic. How do new functional biochemical features evolve? How
do proteins change following gene duplication events? I used the S100 protein
family as a model to probe these aspects of protein evolution. The S100s are
signaling proteins that play a diverse range of biological roles, binding calcium
ions, transition metal ions, and other proteins. Calcium drives a conformational
change allowing S100s to bind to diverse peptide regions of target proteins. I used
a phylogenetic approach to understand the evolution of these diverse biochemical
features. Chapter I comprises an introduction to the dissertation. Chapter II is
a co-authored literature review assessing available evidence for global trends in
protein evolution. Chapter III describes mapping of transition metal binding
onto a maximum likelihood S100 phylogeny. Transition metal binding sites and
metal-driven structural changes are conserved, ancestral features of the S100s.
However, they are highly labile at the amino acid level. Chapter IV further
characterizes the biophysics of metal binding in the S100A5 lineage, revealing that
the oft-cited Ca2+/Cu2+ antagonism of S100A5 is likely due to an experimental
artifact of previous studies. Chapter V uses the S100 family to investigate the
evolution of binding specificity. Binding specificity for a small set of peptides
is measured across the duplicate S100A5 and S100A6 clades. Ancestral sequence reconstruction
reveals a pattern of clade-level conservation and apparent subfunctionalization
along both lineages. In Chapter VI, peptide phage display, deep sequencing, and
machine learning are combined to quantitatively reconstruct the evolution of
specificity in S100A5 and S100A6. S100A5 has subfunctionalized from the ancestor,
while S100A6 specificity has shifted. The importance of unbiased approaches to
measure specificity is discussed. This work highlights the lability of conserved
functions at the biochemical level, and measures changes in specificity following
gene duplication. Chapter VII summarizes the results of the dissertation, considers
the implications of these results, and discusses limitations and future directions.
This dissertation includes both previously published and unpublished co-authored
material.
CURRICULUM VITAE
NAME OF AUTHOR: Lucas Clayton Wheeler
GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:
University of Oregon, Eugene, OR
Montana State University, Bozeman, MT
DEGREES AWARDED:
Doctor of Philosophy, Chemistry, 2017, University of Oregon
Bachelor of Science, Biochemistry, 2012, Montana State University
AREAS OF SPECIAL INTEREST:
Evolutionary biochemistry
Biophysics
Evolutionary biology
PROFESSIONAL EXPERIENCE:
PhD candidate & Graduate Research Fellow, University of Oregon, 2014-2017
PhD student & Graduate Teaching Fellow, University of Oregon, 2012-2013
Undergraduate Research Assistant, Montana State University, 2009-2012
PUBLICATIONS:
Wheeler LC, Harms MJ (2017). Increased peptide binding specificity for an
S100 protein over evolutionary time (in prep)
Wheeler LC, Anderson JA, Morrison AJ, Wong CE, Harms MJ (2017).
Conservation of specificity in two low specificity proteins (in review)
Wheeler LC, Harms MJ (2017). Human S100A5 binds Ca2+ and Cu2+
independently. BMC Biophysics (in press)
Wheeler LC, Donor MT, Prell JS, Harms MJ (2016). Multiple Evolutionary
Origins of Ubiquitous Cu2+ and Zn2+ Binding in the S100 Protein Family.
PLoS ONE 11(10): e0164740. doi:10.1371/journal.pone.0164740
Wheeler LC, An-Lim S, Marqusee S, Harms MJ (2016). The thermostability
and specificity of ancient proteins. Curr Opin Struct Biol. (LCW and SAL
contributed equally to the work)
ACKNOWLEDGEMENTS
I thank my advisor, Mike Harms, for his excellent mentorship and his
patience. He has pushed me to wrestle with new concepts and given me the
opportunity to learn a diverse set of skills. I also thank the members of the
Harms group for their constructive feedback, helpful conversations, and friendship
throughout my time in the lab. I would especially like to thank Zach Sailer for
his friendship and for always being willing to help me wrestle with new ideas.
Thanks are also in order for the members of the Beer and Theory Society, who have
devoted their time weekly, over the course of several years, to creating an excellent
environment for learning math and physics that we all wish we’d done a better job
of learning in college. In the same vein, I thank the members of the Quantitative
Problem Solving and Research Communication Consortium for their devotion to
helping peers solve challenging problems and representing the ideas of open science
and collaboration. Thanks are in order for my long-time roommates Adam and
Forrest as well as for the members of my trivia team. We’ve had a lot of good times
during my years in Eugene. My friend Stacey Wagner has been very helpful to me
over the years and I appreciate her willingness to get together and trade advice.
The research in this dissertation was supported in part by a grant, R01GM117140,
from the National Institutes of Health, and I was personally supported by NIH
training grant T32 GM007759 for three years of my PhD. I thank my committee
for their assistance throughout my PhD work and with the preparation of this
document. Special thanks go to Jim Prell, who has been an excellent committee
chair as well as a collaborator. I have been inspired by his knowledge, enthusiasm, and
strong distaste for false dichotomies. Furthermore, I thank Jim Prell and Alice
Barkan for their assistance in applying for postdoctoral positions. I also thank
Micah Donor, Shion An-Lim, and Susan Marqusee with whom I have collaborated.
I thank Doug Turnbull and Maggie Weitzman in GC3F for their help with next-
generation sequencing. I thank Carol Higginbotham at COCC and Ed LaChapelle
at Bend Research, Inc. for encouraging me to pursue a scientific career very early
in my life. I thank my undergraduate research advisor, Trevor Douglas, for helping
me to develop a firm footing in biochemistry, molecular biology, and experimental
design before starting graduate school. I would like to thank my good friend
Ryan Russel Allen for his friendship, support, and steady stream of humorous
correspondence throughout my undergraduate and graduate careers. Finally, I
would like to thank my family and my girlfriend for their constant support and
love.
For my mother Terry, my father Ed, my sister April, my girlfriend Rutendo, and
my consigliere Tucker, without all of whom I could never have maintained enough
sanity to finish my PhD.
TABLE OF CONTENTS
Chapter Page
I. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
II. THERMOSTABILITY AND SPECIFICITY OF ANCIENT PROTEINS:
ASSESSING THE EVIDENCE FOR GLOBAL TRENDS . . . . . . . 14
Author Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 14
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Reconstructed Precambrian Ancient Proteins . . . . . . . . . . . . 17
Trends in Thermostability Are Complex . . . . . . . . . . . . . . . 20
Can Reconstruction Errors Inflate Ancestral Thermostability? . . . 21
A Trend from Promiscuous to Specific is Not Yet Established . . . 23
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Bridge to Chapter III . . . . . . . . . . . . . . . . . . . . . . . . . 26
III. MULTIPLE EVOLUTIONARY ORIGINS OF UBIQUITOUS CU2+
AND ZN2+ BINDING IN THE S100 PROTEIN FAMILY . . . . . . . 28
Author Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 28
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . 54
Bridge to Chapter IV . . . . . . . . . . . . . . . . . . . . . . . . . 62
IV. HUMAN S100A5 BINDS CA2+ AND CU2+ INDEPENDENTLY . . . 64
Author Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 64
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Bridge to Chapter V . . . . . . . . . . . . . . . . . . . . . . . . . . 83
V. CONSERVATION OF PEPTIDE BINDING SPECIFICITY IN S100A5
AND S100A6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Author Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 85
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . 105
Bridge to Chapter VI . . . . . . . . . . . . . . . . . . . . . . . . . 114
VI. EVOLUTION OF INCREASED BINDING SPECIFICITY IN
S100A5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Author Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 116
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . 139
Bridge to Chapter VII . . . . . . . . . . . . . . . . . . . . . . . . . 152
VII. SUMMARY AND CONCLUDING REMARKS . . . . . . . . . . . . . 154
APPENDICES
A. SUPPLEMENTAL MATERIAL FOR CHAPTER III . . . . . . . . . 158
Supplemental Figures . . . . . . . . . . . . . . . . . . . . . . . . . 158
B. SUPPLEMENTAL MATERIAL FOR CHAPTER IV . . . . . . . . . 166
Supplemental Figures . . . . . . . . . . . . . . . . . . . . . . . . . 166
C. SUPPLEMENTAL MATERIAL FOR CHAPTER V . . . . . . . . . 168
Supplemental Figures . . . . . . . . . . . . . . . . . . . . . . . . . 168
D. SUPPLEMENTAL MATERIAL FOR CHAPTER VI . . . . . . . . . 177
Supplemental Figures . . . . . . . . . . . . . . . . . . . . . . . . . 177
REFERENCES CITED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
LIST OF FIGURES
Figure Page
1 Ancestral Sequence Reconstruction (ASR) can be used to trace
the history of evolving proteins . . . . . . . . . . . . . . . . . . . . . . 16
2 Ancient Reconstructed Ancestors Exhibit Elevated
Thermostability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3 Models for increased specificity of proteins over time . . . . . . . . . . 24
4 Transition metal binding occurs at a common site in diverse S100s . . . 30
5 Model-based phylogenetics reveal several S100 subfamilies . . . . . . . 35
6 Phylogeny, synteny, and taxonomic distribution provide a picture of S100
evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7 Transition metal binding is conserved in the S100 family . . . . . . . . 41
8 Early-branching tunicate S100 binds transition metals at a non-canonical
site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
9 Human S100A5 does not bind transition metals at the same site as B and
the calgranulins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
10 Measurements of Cu2+ binding to wildtype S100A5 in the presence
of Ca2+ are difficult to interpret . . . . . . . . . . . . . . . . . . . . . . 68
11 S100A5 can bind Ca2+ and Cu2+ without antagonism . . . . . . . . . . 70
12 Wildtype S100A5 forms higher-order oligomers . . . . . . . . . . . . . 73
13 Ca2+ and Cu2+ induce increases in α-helical secondary structure
measured by far UV circular dichroism . . . . . . . . . . . . . . . . . . 75
14 Human S100A5 and S100A6 exhibit peptide binding specificity . . . . . 89
15 Diverse peptides bind at the human S100A5 peptide interface . . . . . 92
16 S100A5 and S100A6 arose by gene duplication . . . . . . . . . . . . . . 93
17 S100A5 and S100A6 paralogs exhibit conserved properties . . . . . . . 95
18 Small changes are sufficient to alter binding specificity . . . . . . . . . 99
19 Testing the increased specificity hypothesis requires extensive
sampling of targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
20 Set of binding peptides can be estimated using phage display. . . . . . 123
21 A subpopulation of phage respond to addition of competitor . . . . . . 124
22 Peptide binding can be predicted from amino acid sequence . . . . . . 127
23 Changes in binding sets over time . . . . . . . . . . . . . . . . . . . . . 131
24 Sequence logos of S100 multiple sequence alignment . . . . . . . . . . . 159
25 Bayesian phylogeny of the S100 protein family . . . . . . . . . . . . . . 160
26 Representative ITC data and single-site fits . . . . . . . . . . . . . . . 161
27 Far UV CD spectra of S100 proteins . . . . . . . . . . . . . . . . . . . 162
28 Biophysical characterization of tunA . . . . . . . . . . . . . . . . . . . 163
29 tunB mass spectrometry dilution experiment . . . . . . . . . . . . . . . 164
30 Sedimentation velocity AUC analysis of tunA and tunB . . . . . . . . . 165
31 Raw data corresponding to integrated heats in figure 11 . . . . . . . . . 167
32 Randomer phage enrichment is dependent on Ca2+ and protein . . . . 169
33 Representative raw ITC data traces for each protein . . . . . . . . . . . 170
34 Far UV CD spectra are diagnostic for S100A5 and S100A6 . . . . . . . 171
35 Phage enrichment is reduced by the competitor peptide . . . . . . . . . 178
36 We can identify the number of counts that reliably reports on frequency
in a sequenced phage pool . . . . . . . . . . . . . . . . . . . . . . . . . 178
37 Enrichment distributions for all proteins . . . . . . . . . . . . . . . . . 179
38 We can estimate how addition of competitor alters frequencies . . . . . 180
39 Estimating the error rates for individual models . . . . . . . . . . . . . 181
LIST OF TABLES
Table Page
1 Fit parameters from pytc Bayesian fits . . . . . . . . . . . . . . . . . . 71
2 Protein binding model statistics . . . . . . . . . . . . . . . . . . . . . . 126
3 Binding of 12-mer phage display peptides does not depend
on solubilizing flanks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
4 Parameters for binding of A6cons to S100A5 and S100A6 . . . . . . . . 172
5 Parameters for binding of A5cons to S100A5 and S100A6 . . . . . . . . 173
6 Parameters for binding of NCX1 to S100A5 and S100A6 . . . . . . . . 174
7 Parameters for binding of SIP to S100A5 and S100A6 . . . . . . . . . . 175
8 Thermodynamic parameters for binding of the A5cons and SIP peptides
to hA5 ancestral reversion mutants . . . . . . . . . . . . . . . . . . . . 175
9 Accession numbers of S100 proteins used to build the multiple sequence
alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
10 Number of sequencing reads for each sample . . . . . . . . . . . . . . . 177
11 Features used for supervised machine learning . . . . . . . . . . . . . 182
12 Predicted E and measured binding constants for peptides . . . . . . . . 183
CHAPTER I
INTRODUCTION
Evolution is the Driving Force of Biological Diversity
One of the most striking aspects of life is the vast diversity of forms and
functions displayed by living things. Organisms are beautifully adapted to a broad
range of environments and lifestyles: from bacteria that thrive in deep-sea thermal
vents [1], to plants that live in high alpine meadows [2], to exquisitely colorful
poisonous frogs that roam the rainforest [3]. With such diversity on display it is
easy enough to forget that all organisms on Earth share a common ancestor in the
distant past [4, 5, 6, 7]. Over unfathomable stretches of time, life has diversified
from that common ancestor into the amazingly complex and dynamic biosphere
that is familiar to us today. Perhaps the most incredible facet of this diversity is
that it is produced via the stochastic process of evolution [4, 8, 9, 10, 11]. How this
random process generates the rich biology observed on Earth is the primary driving
question of evolutionary biology.
Evolution occurs via the change of heritable traits over the course of many
generations of organisms. The process acts on traits that are displayed in some
way at the macroscopic, organismal level [8, 12, 13, 14, 15]. However, at the heart
of trait heritability are the genes that encode traits at the genetic level. The
projection of underlying genetics into phenotypes is commonly referred to as the
genotype-phenotype map [16, 17]. Evolutionary processes such as natural selection
and genetic drift drive the fixation of mutant genes that lead to new traits in
populations [18, 13, 12, 19]. Over time this fixation process can lead to substantial
changes in the genetic makeup of a population of organisms and result in the
formation of new species with different organism-level traits [20, 21, 22, 23, 14].
Most functionally important genes encode proteins, which are the workhorses
of molecular processes in living organisms [24, 25]. They catalyze chemical
reactions, form the basis of structural scaffolds, transport ions and small molecules,
act as signals, and regulate the function and production of other molecules [24, 25].
The emergent outcome of all the intertwining protein roles is ultimately manifested
in the macroscopic phenotype of an organism. The vast array of protein functions
necessary to construct organisms requires a large diversity of proteins, all of which
are encoded by genes and integrated into the broader system. This framework
imposes an extremely complex set of constraints that govern the ways in which new
traits, those visible to selection, can be achieved. Thus, a molecular-level
understanding of evolution is critical to understanding the process on larger scales.
Evolutionary Biochemistry is a Powerful Tool for Understanding Biology
Most studies in traditional evolutionary biology have focused on
understanding the genetics of evolution. Genetics provides a very useful tool
to dissect the basis of evolutionary logic at the level of encoding architecture.
A genetic framework has also been critical for developing a population-level
understanding of evolution and creating useful mathematical models of evolutionary
processes [14, 15, 26, 27, 18, 13, 28, 12, 19, 29]. These models have made it
possible to make predictions that can be tested experimentally, thus furthering our
ability to understand evolutionary dynamics and outcomes [30, 31, 32, 33, 34, 35].
However, the rules that ultimately govern the inner workings of biological
organisms are those of physics and chemistry [36, 37, 38, 39, 40, 41].
Although the phenomenological genetic “laws” that govern evolutionary
processes are now largely understood, it is unclear how they are connected to
the physical laws that govern the universe. This disconnect is one of the most
prominent barriers to understanding molecular evolution. To truly understand
how evolution works at the molecular level this relationship must be determined.
The need to understand how physicochemical principles shape evolutionary
outcomes has spawned the field of evolutionary biochemistry. This field seeks
to understand the evolution of molecular phenotypes at the biochemical level
and to relate the molecular phenotypes to implications for evolution at larger
scales [42, 43, 44]. Many pressing questions remain unanswered. Are there general
evolutionary trends in biochemical features over very long time scales? How robust
are protein functions to alterations in amino acid coding sequence and how does
this robustness affect the maintenance of important traits during evolution? How
do protein copies evolve after they are generated by gene duplication events?
How do correlations between mutations in protein sequences shape evolutionary
possibilities? Can we understand large-scale evolutionary processes in terms of
simpler molecular-level constraints and rules? These questions are unified by the
broader inquiry: how do physical rules shape the genotype-phenotype map?
The field of evolutionary biochemistry has rapidly expanded since its
inception and provided a great deal of insight into evolution at the molecular
level. A critical workhorse of evolutionary biochemistry has been ancestral
sequence reconstruction (ASR) [45, 43, 46, 47]. ASR is a statistical technique
that utilizes a molecular phylogeny to infer the sequences of ancestral nodes
[48, 49, 43, 50]. This technique has allowed many researchers to directly assess
ancestral protein activities using biochemical experiments, making it extremely
powerful for characterizing evolutionary history [44, 43, 46, 47]. In some cases,
entire evolutionary trajectories—composed of historical substitutions—have been
reconstructed [51, 52, 53]. Relationships between protein structure, function,
and evolutionary history have been characterized for a wide variety of proteins
[53, 54, 55, 52, 56, 46, 57, 58, 59, 60]. Much has been learned about how
biochemistry and biophysics constrain and shape protein evolution. Furthermore,
evolutionary approaches have been used—with great success—to winnow the
substitutions observed in extant proteins down to those that are important for a
given biochemical function. For example, ASR was used to identify residues that
are important for the binding selectivity of the drug Gleevec by Abl and Src kinases
[59].
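To make the phrase "infer the sequences of ancestral nodes" concrete, the sketch below shows the core calculation behind marginal ASR for a single alignment column. It is a minimal illustration, not the pipeline used in this dissertation: it assumes a hypothetical three-taxon tree with made-up branch lengths and an equal-rates substitution model with uniform equilibrium frequencies (a simplification; real analyses typically use empirical amino acid models such as LG or JTT with rate heterogeneity). Felsenstein's pruning algorithm gives the likelihood of the observed residues for each possible root state, and normalizing those likelihoods yields a posterior over ancestral residues.

```python
import math

# 20 standard amino acids. The equal-rates model below is a simplifying
# assumption for illustration, not an empirical amino acid model.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
K = len(ALPHABET)


def transition_prob(i, j, t):
    """P(state i -> state j) along a branch of length t (expected
    substitutions per site) under an equal-rates model with K states."""
    decay = math.exp(-K * t / (K - 1))
    if i == j:
        return 1.0 / K + (K - 1) / K * decay
    return 1.0 / K - decay / K


def conditional_likelihood(node):
    """Felsenstein pruning: return L where L[s] is the likelihood of all
    residues observed below `node`, given that `node` is in state s.

    A node is either a leaf (a one-letter observed residue) or a list of
    (child, branch_length) tuples."""
    if isinstance(node, str):
        return [1.0 if a == node else 0.0 for a in ALPHABET]
    result = [1.0] * K
    for child, t in node:
        child_l = conditional_likelihood(child)
        for s in range(K):
            # Sum over every state the child could be in at the end of the branch.
            result[s] *= sum(transition_prob(s, x, t) * child_l[x] for x in range(K))
    return result


def ancestral_posterior(tree):
    """Marginal posterior over residues at the root, assuming uniform
    equilibrium frequencies (1/K) as the root prior."""
    lik = conditional_likelihood(tree)
    joint = [l / K for l in lik]
    total = sum(joint)
    return {ALPHABET[s]: joint[s] / total for s in range(K)}


# Hypothetical tree and one alignment column: ((H:0.1, H:0.2):0.05, C:0.3)
tree = [([("H", 0.1), ("H", 0.2)], 0.05), ("C", 0.3)]
posterior = ancestral_posterior(tree)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

In this toy column the two closely related taxa share H on short branches, so H receives most of the posterior weight and the residue seen only on the long branch (C) is a distant second. The recursion recomputes each subtree naively, which is fine for a toy tree but would be replaced by cached, log-space calculations on real alignments.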
Detailed biochemical studies have also helped to clarify the importance of
phenomena such as epistasis—the non-additivity of mutations [46, 32, 61, 62]—and
pleiotropy—in which proteins have roles in multiple distinct biological processes
[63, 64]—in determining evolutionary outcomes. These effects can reduce the
evolutionary degrees of freedom allowed for a protein and result in effects such as
historical contingency [65, 46]. For example, to evolve specificity for a new hormone
ligand, the glucocorticoid receptor required a permissive substitution that alleviated
the effects of an otherwise deleterious functional substitution in the ligand-binding
site [46]. Studies that combine biochemical and functional work have further
demonstrated that the broader systemic architecture of the cellular environment
can constrain the mechanisms by which biochemical changes underlie organismal
phenotypes [66, 67, 68, 69]. Certain systems have far greater constraints on the
allowed biochemical changes. For example, the evolution of new flower colors in
plants often requires both functional amino acid substitutions and regulatory
changes, but the genes that are subject to these different types of changes vary
depending on pleiotropic consequences [70, 71, 72, 73].
Evolutionary biochemistry has provided great insight into the molecular
mechanisms of evolution. However, there is a key limitation that is prevalent
in most previous work. Evolutionary biochemical studies have focused almost
exclusively on proteins that exhibit very rigidly defined biochemical features.
For example, the evolution of binding specificity has largely been studied in
proteins such as enzymes and transcription factors that exhibit exquisite binding
specificity for targets [74, 75, 76, 77]. These studies have revealed key patterns in
the evolution of specificity, such as the recurring pattern of subfunctionalization
and neofunctionalization following gene duplications [53, 58, 52, 78, 79]. Similarly,
studies of proteins binding to other biologically-relevant targets such as metal
ions have traditionally considered very well-defined coordination systems, like
those found in metalloproteases and zinc finger proteins [80]. However, many
proteins do not exhibit such exquisite biochemical properties
[81, 82, 83, 84, 85, 86, 87, 88]. A large number of proteins bind to targets with
low specificity and limited binding-site conservation. The biological relevance of the
biochemical properties of these proteins is less well understood. It is thus unclear
how well evolutionary studies of typical protein model systems translate to the
broad array of proteins with plastic biochemical properties.
The S100 Protein Family is a Useful Model System to Probe the Evolution of
Low-specificity Proteins
This dissertation focuses on case studies in evolutionary biochemistry that
address unanswered questions in molecular evolution. Chapter II consists of a
literature review addressing the evidence for global trends in protein evolution over
very long time scales. The remaining studies are unified by questions surrounding
the evolution of protein-target interactions in proteins that have labile binding
interfaces and/or highly-variable binding partners. Each case study dissects a
specific aspect of evolution at the molecular level. The studies use a combination
of experiments and computational analysis methods to address how biochemistry
relates to broader questions in evolutionary biology.
Chapters III, IV, and V of this dissertation make extensive use of the S100
proteins as an experimental model system, which warrants an introduction to
the protein family. The S100s are a large family of small, calcium-dependent
signaling proteins [89, 90, 91, 92, 93]. The proteins are generally homodimeric
and transduce signals via a calcium-driven conformational change [94, 89]. The
family originated at the base of the Metazoan lineage and subsequently diversified
over several hundred million years [95, 91, 96]. Mammals possess approximately
thirty S100 genes including those encoding fusion proteins, in which the S100 acts
as a single domain inside a larger domain architecture [91, 96, 97, 98, 99]. S100
proteins play a wide array of biological roles inside and outside of cells, including
inflammatory signaling [100, 101], regulation of cell proliferation [102, 103, 104],
antimicrobial activity [105, 106], and control of apoptosis in some cell types
[107, 108]. The diversity of functions performed by the S100s is perhaps surprising
considering the small size of the proteins, overall similarity of S100 amino acid
sequences, and conservation of the folded form. However, the proteins have
evolved an array of useful biochemical features that aid in carrying out biological
functions. The proteins possess the ability to bind both calcium ions and other
metal ions. Calcium-induced conformational changes result in the exposure of a
hydrophobic patch on the S100 dimer surface, which facilitates binding of target
proteins [94, 93]. The specificity of these hydrophobic binding sites varies among
members of this family, although it has not been systematically studied prior to the
work in this dissertation [93, 109]. It is sometimes presumed that this biochemical
specificity contributes to the biological specialization of the S100s [93]. This notion
is supported by the fact that only some S100s are capable of binding to certain
target proteins. For example, many S100s bind to and activate the inflammatory
RAGE protein, but this is not a universal trait of the family [101, 104]. However, it
has also been proposed that most functional specificity of S100 proteins is achieved
by control of differential expression [89, 90, 100].
The biological importance of binding to metal ions other than calcium—which
occurs at different ion binding sites—has not been well studied. This dissertation
primarily uses the S100s as a model to address evolutionary questions, because they
have a rich evolutionary history and diverse biochemical features, and they exhibit low
specificity for interaction partners. However, the evolutionary biochemical work
presented in Chapters III, IV, and V also sheds light on biologically relevant aspects
of the S100 family. The work provides several opportunities and resources for more
biologically-oriented future studies.
Chapter-by-chapter Breakdown of Dissertation
Chapter II comprises a literature review—co-written with Shion An Lim
(SAL), Susan Marqusee (SM), and my advisor Michael J. Harms (MJH). The
review addresses the question of whether or not proteins display global evolutionary
trends over very long time scales. Two case studies are used as key examples
of hypothesized trends: the gradual reduction of protein thermostability due to
cooling of the Earth and the gradual increase in protein binding specificity due to
continued specialization in ever-more-complex proteomes. Based on a thorough
summary of evolutionary biochemistry literature, there does in fact appear to be
some evidence for a gradual decline of thermostability on the billions-of-years time
scale. However, there are still relatively few studies that probe this question. A
more substantial body of evidence will need to be accumulated to make a strong
argument for a global trend. The need for more experimental evidence is even more
pronounced with regard to the question of broad trends in specificity. There are
few studies that have addressed this question directly, and none to date that have
done so using a truly unbiased experimental approach. This chapter provides an
overview of the idea that there are global trends in protein evolution, makes a
strong case that further experimental studies are needed to resolve the ongoing
debates on this topic, and suggests strategies and experiments to advance
current understanding in the field. The literature review presented in this chapter
was published in the journal Current Opinion in Structural Biology [47].
Chapter III probes the evolutionary lability of a biologically-important
biochemical feature. The S100 protein family is used as a model system to address
this question. The phylogenetic history of the S100s is reconstructed to yield the
highest-quality phylogeny of the S100 family to date. The history of transition
metal binding in the S100s is then traced by mapping the results of detailed in
vitro measurements of metal-ion binding onto this high-quality phylogeny. These
results show that binding of transition metals is conserved across almost the entire
S100 family, a broader result than any previous study has reported. Using mutagenesis
studies, it is further established that not all S100 proteins use the same amino acids
or even the same site to bind metal ions. The binding of metal ions to a very early
branching S100 protein is measured for the first time, which demonstrates that
binding of transition metals is an ancestral feature of the S100 protein family. The
results of this chapter speak to the surprising lability of S100 protein metal
binding sites at the amino acid level, highlighting the fact that an ancestral
molecular phenotype can be maintained at the overall level of behavior even while
the underlying biochemical basis fluctuates over evolutionary time. The work in
this chapter has been published as a research article in PLoS One, co-authored with
Micah T. Donor (MTD), James S. Prell (JSP), and Michael J. Harms (LC Wheeler
is the first author) [96].
Chapter IV delves further into the biophysics of metal binding in one
particular member of the S100 protein family, S100A5. Little is known about
the biological roles of S100A5. A previous publication indicated that the protein
exhibits antagonism between the binding of Ca2+ and Cu2+ ions. This feature is
unique amongst S100 proteins and has been considered one of the key features
of S100A5. Proposed biological roles for the protein typically involve Ca2+/Cu2+
antagonism. In Chapter IV, it is demonstrated that antagonism between the binding
of Ca2+ and Cu2+ is likely an artifact of the experiments done in the original study.
Instead, it is shown that S100A5 can bind Ca2+ and Cu2+ independently, which
changes the biological implications of metal binding to the protein. Furthermore,
this chapter adds to the evolutionary story of metal binding by demonstrating
another unique biochemical modification that has evolved in the S100 family. The
work in this chapter is currently in press as a research article in the journal BMC
Biophysics, co-authored with Michael J. Harms.
Chapters V and VI address the evolution of binding specificity in two proteins
following gene duplication from a common ancestor. Again, the S100 protein
family proves to be a useful model system to address this question. The proteins
S100A5 and S100A6 arose from a duplication approximately 300 million years ago.
They subsequently evolved to have different protein-binding specificity, distinct
expression patterns, and perform different cellular roles. Despite having distinct
specificity, both proteins can be described as sloppy or having very low biochemical
specificity. Previous studies addressing the question of evolving specificity
have used highly-specific proteins and small sets of known binding partners that
are biased by a priori knowledge. For these reasons, previous studies offer limited
insight into the evolution of specificity in low-specificity proteins. The sloppiness
of S100s makes them an excellent system to study how binding specificity evolves in
an inherently noisy low-specificity system.
Chapter V comprises a biochemical study of the evolution of peptide binding
specificity in the S100A5-S100A6 clade. The oldest ancestor of S100A5 and
S100A6 is resurrected using ancestral sequence reconstruction (ASR). Detailed
calorimetric measurements of binding to a small set of peptide targets are then
used to compare specificity across a set of orthologous and paralogous S100A5 and
S100A6 proteins. It is demonstrated that peptide binding is driven primarily by
the hydrophobic effect and that specificity is readily changed by introducing
mutations into the peptide binding interface. Furthermore, this work reveals that
the specificity of S100A5 and S100A6 has undergone an apparent pattern of
subfunctionalization. This result is striking, because it demonstrates that proteins
with very low biochemical specificity can undergo similar patterns of evolution to
proteins with high specificity. The work in this chapter is in review as a research
article in the journal Biochemistry, co-authored with Jeremy A. Anderson (JAA),
Anneliese J. Morrison (AJM), Caitlyn E. Wong (CEW), and Michael J. Harms.
The submitted article has also been uploaded to the preprint server bioRxiv [110].
Chapter VI introduces new experimental and analysis pipelines for studying
the evolution of specificity. An unbiased high-throughput approach, incorporating
phage display and deep sequencing, is used to measure the binding of a large
random peptide library to human S100A5, human S100A6, and the last common
ancestor. Strikingly, the pipeline reveals a lack of sequence-based rules governing
the binding preferences of the S100 proteins. Instead, preferences appear to
be defined by general physicochemical features of the peptide targets that can be
used to generate a predictive model. The pipeline reveals overall patterns in the
evolution of specificity along the S100A5 and S100A6 lineages. S100A5 exhibits
a strong signal of subfunctionalization, while S100A6 appears to differ little from
the ancestor. This chapter highlights the importance of using unbiased approaches
to study the evolution of specificity and speaks to the necessity of understanding
different classes of protein features when probing molecular evolution. The work
in this chapter is being prepared as a research article that will be submitted to the
journal Molecular Biology and Evolution (MBE), co-authored with Michael J. Harms.
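The enrichment idea behind a phage-display selection read out by deep sequencing can be made concrete with a small sketch. This is not the Chapter VI pipeline, whose statistics are not described in this overview; it assumes each read is an exact peptide sequence, adds a pseudocount so peptides seen in only one pool stay finite, and uses the Kyte-Doolittle hydropathy scale as one example of a general physicochemical feature. The function names are hypothetical.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale (one example physicochemical feature).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def log2_enrichment(input_reads, selected_reads, pseudocount=1.0):
    """Log2 ratio of each peptide's frequency after vs. before selection.

    Hypothetical helper: the pseudocount keeps peptides that appear in only
    one of the two pools from producing log(0).
    """
    before = Counter(input_reads)
    after = Counter(selected_reads)
    peptides = set(before) | set(after)
    n_before = sum(before.values()) + pseudocount * len(peptides)
    n_after = sum(after.values()) + pseudocount * len(peptides)
    return {
        p: math.log2(((after[p] + pseudocount) / n_after)
                     / ((before[p] + pseudocount) / n_before))
        for p in peptides
    }


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle score of a peptide (GRAVY-style)."""
    return sum(KD[aa] for aa in peptide) / len(peptide)
```

For example, `log2_enrichment(["LLIVA", "DEKRS"], ["LLIVA", "LLIVA"])` gives the hydrophobic `LLIVA` a positive score. Relating such scores to features like `mean_hydropathy` across a large library is one simple way to look for physicochemical preferences rather than strict sequence motifs.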
Broader Impacts
The studies described in this dissertation contribute to the broader evolutionary
biochemistry literature by addressing a set of topics that have remained ambiguous.
There has been a lack of studies addressing the evolution of biochemical features
in proteins that have highly diverse sets of binding partners. Much of the
experimental basis for understanding evolution of protein binding specificity has
instead been based on proteins with exquisite specificity profiles [74, 111]. For
example, enzymes, receptors, and transcription factors that have well-defined
chemical binding preferences are workhorses of evolutionary biochemistry studies
[46, 53, 52, 58]. The work presented in the following chapters probes key aspects
of the evolution of binding specificity in proteins without such obvious rules. The
S100 proteins act as an excellent model system to tackle these problems, because
they have a variety of conserved biochemical behaviors that have nonetheless been
labile at the amino acid level during diversification of the family [96]. In particular,
the ability of the S100s to bind to a variety of transition metals with similar
affinities, and the ability to bind extremely diverse short peptide regions of target
proteins are used as exemplary biochemical features.
Studies on both the binding of metal ions and peptides reveal several key
evolutionary trends that speak to the evolution of biochemical features in sloppy
proteins such as the S100s. 1) A biochemical output, such as binding of transition
metals or peptides with moderate affinity, can be achieved and conserved despite
extensive variability in amino acid ligands that form binding sites. 2) Specificity
can nonetheless be achieved and conserved in proteins with highly diverse binding
partners and labile binding sites. 3) Evolutionary patterns in proteins with low
biochemical specificity nonetheless resemble those observed in high-specificity
proteins. 4) Evolutionary patterns can differ along duplicate lineages following gene
duplication. 5) Unbiased high-throughput techniques are essential for inferring
historical patterns of specificity in proteins with large diverse sets of binding
partners. These observations contribute substantially to our understanding of
what types of biochemical features are important during the evolution of proteins
that do not meet the criterion of exquisite binding specificity. Despite relaxed
binding rules, flexible binding sites, and highly diverse binding partners, these
proteins nonetheless exhibit evolutionary patterns that are reminiscent of those
the field has come to expect from canonical examples. This key result suggests
that proteins such as the S100s, despite the variability of their biochemical
behaviors, operate under rules similar to those of other proteins. Therefore,
proteins with highly variable binding partners and labile binding sites do not
necessarily represent a fundamentally different class of proteins—subject to special
evolutionary constraints—but rather are similarly constrained by evolutionary and
biochemical forces in a way that can be understood by careful experimentation.
CHAPTER II
THERMOSTABILITY AND SPECIFICITY OF ANCIENT PROTEINS:
ASSESSING THE EVIDENCE FOR GLOBAL TRENDS
Author Contributions
Lucas Wheeler, Shion An-Lim, Michael Harms, and Susan Marqusee
conceptualized the review and chose specific topics. Lucas Wheeler and Shion
An-Lim conducted the literature review. Michael Harms and Susan Marqusee administered the project.
Michael Harms and Lucas Wheeler generated figures. Lucas Wheeler, Shion An-
Lim, Michael Harms, and Susan Marqusee wrote and edited the manuscript.
Abstract
Were ancient proteins systematically different from modern proteins? The
answer to this question is profoundly important, shaping how we understand the
origins of protein biochemical, biophysical, and functional properties. Ancestral
sequence reconstruction (ASR), a phylogenetic approach to infer the sequences
of ancestral proteins, may reveal such trends. We discuss two proposed trends:
a transition from higher to lower thermostability and a tendency for proteins to
acquire higher specificity over time. We review the evidence for elevated ancestral
thermostability and discuss its possible origins in a changing environmental
temperature and/or reconstruction bias. We also conclude that there is, as yet,
insufficient data to support a trend from promiscuity to specificity. Finally, we
propose future work to understand these proposed evolutionary trends.
Introduction
Ancestral sequence reconstruction (ASR) has opened a window into the
sequences and properties of ancient proteins [45, 44]. In ASR, a multiple sequence
alignment of modern protein sequences is used to construct a phylogenetic tree
and the sequences of ancient proteins are inferred for specific ancestors on this
tree (Figure 1a). By synthesizing the genes encoding these sequences, researchers can
experimentally characterize the reconstructed ancient proteins. This approach
has yielded an explosion of results in recent years, revealing important mechanistic
insights into the evolution of protein forms and functions [112, 113, 51, 114, 115,
116, 117, 118, 60, 56].
One intriguing possibility is to use ASR to investigate whether ancient
proteins were systematically different in the past, leading to parallel, directional
changes in properties over evolutionary time (Figure 1a). Such trends are
inaccessible using comparisons between modern proteins. For example, studies
of the modern proteins in Figure 1b would lead one to believe the last common
ancestor had a ‘blue’ trait. By allowing direct measurement of ancestral properties,
ASR can reveal properties (‘red’, in this case) not evident in the modern proteins.
If the evolution of protein properties were directional, it would provide a new
level at which to explain and understand these properties. This is of deep interest
to evolutionary biochemists seeking to identify the general principles that shape
protein evolution. Further, a trend could mean that sampling evolutionary history
would provide access to qualitatively different proteins [119] — a boon to engineers
looking for proteins with desirable properties as templates for further engineering
[120, 121].
FIGURE 1 Ancestral Sequence Reconstruction (ASR) can be used to trace the
history of evolving proteins. (a) The ASR pipeline. A multiple sequence alignment
(MSA) of extant sequences of a protein family is generated using an alignment tool.
The MSA is then used to estimate an appropriate model of sequence evolution and
to estimate a phylogenetic tree. The sequences at ancestral nodes of interest (filled
black circles) are then inferred (underlined) based on the tree and a phylogenetic
evolutionary model. The maximum likelihood sequences are those with the highest
likelihood of generating the known sequences of modern proteins given the tree
and phylogenetic model. Genes encoding the inferred ancestral proteins can be
synthesized, expressed, and purified using standard molecular biology tools. The
properties of the ancestral proteins can then be experimentally characterized. (b)
A phylogenetic tree showing the evolution of a protein that can vary between two
properties—red and blue. The last common ancestor was red, but the modern
proteins are blue because of parallel changes along the lineages. This red ancestor
can only be accessed using an approach like ASR.
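The inference step in Figure 1a can be illustrated for a single alignment site. The sketch below computes the marginal posterior of each possible root state with Felsenstein's pruning algorithm under a simple symmetric k-state (Jukes-Cantor-like) substitution model, chosen only for brevity; real ASR studies use empirical amino-acid models and dedicated phylogenetics software. The nested-list tree encoding and all function names are hypothetical.

```python
import math


def mk_prob(same, t, k):
    """Transition probability under the symmetric k-state (Mk) model.

    t is the branch length in expected substitutions per site.
    """
    decay = math.exp(-k * t / (k - 1))
    return (1 + (k - 1) * decay) / k if same else (1 - decay) / k


def partial_likelihoods(node, states):
    """Felsenstein pruning: P(leaf data below node | state at node).

    A leaf is a str (its observed state); an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {s: 1.0 if s == node else 0.0 for s in states}
    k = len(states)
    result = {s: 1.0 for s in states}
    for child, t in node:
        child_lik = partial_likelihoods(child, states)
        for s in states:
            # Sum over every state the child could have had.
            result[s] *= sum(mk_prob(s == c, t, k) * child_lik[c] for c in states)
    return result


def root_posterior(tree, states):
    """Marginal posterior of each root state, assuming a uniform root prior."""
    lik = partial_likelihoods(tree, states)
    total = sum(lik.values())
    return {s: lik[s] / total for s in states}


# Two 'A' leaves on one clade, one 'G' leaf on a longer branch.
tree = [([("A", 0.1), ("A", 0.1)], 0.05), ("G", 0.3)]
posterior = root_posterior(tree, "ACGT")
ml_state = max(posterior, key=posterior.get)
```

Taking the highest-posterior state at each site yields a marginal maximum likelihood ancestor; production tools additionally estimate branch lengths and use models such as LG or JTT for amino acids.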
Recent work has suggested two trends over evolutionary time: decreasing
protein stability [112] and increasing specificity [60]. Particularly for protein
engineers, these trends could be extremely powerful, as high stability and broad
substrate specificity are desirable traits that could be accessed using ASR. In this
review, we assess the evidence supporting and contradicting these trends, as well as
the future work required to test and extend these conclusions.
Reconstructed Precambrian ancient proteins exhibit elevated thermostability
We begin by evaluating evidence from ASR studies that indicate the
deepest ancestors of mesophilic proteins were highly thermostable. Over billion-
year timescales, reconstructed ancestral proteins display systematically higher
thermostability. Reconstructed EF-Tu [112], thioredoxin [118], DNA gyrase [116],
nucleotide diphosphate kinase [115], and β-lactamase [60] all exhibit melting
temperatures (Tm) far higher than their extant descendants. Some have argued
that this is a universal trend [119] and have interpreted this as evidence for an
ancient, hot environment [112]. The evidence, however, is not completely universal,
as reconstructed RNase H along a mesophilic lineage gives a relatively flat trend in
stability over similar time scales [56].
One difficulty in comparing these studies is that different proteins have
different absolute requirements for stability. For example, the Tm values of EF-Tu
bacterial homologs are generally ∼2 °C above the environmental temperature
(Tenv), while the Tm values of RNase H are ∼30 °C above Tenv. As a result, Tm values
between protein families are not directly comparable. One way to overcome this
challenge is to convert the measured Tm of each protein to an estimate of Tenv, as
Tm often correlates with the growth temperature of the organism from which it was
derived [122]. In most cases, this correlation arises to maintain stability above some
critical threshold [123]. Empirically, Tm generally rises by ∼1 °C per 1 °C of Tenv,
with an offset reflecting the required stability of the protein (e.g., 2 °C for EF-Tu
and 30 °C for RNase H) [122]. This correlation has been directly established for
three of the proteins above (EF-Tu, nucleotide diphosphate kinase, and RNase H [112, 115, 56])
and holds generally for many other proteins [122].
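The Tm-to-Tenv conversion described above amounts to solving a linear relation. A minimal sketch, assuming that within one protein family Tm = slope × Tenv + offset with the slope near 1 as stated in the text: anchoring the line on a modern homolog whose Tm and Tenv are both known makes the family-specific offset cancel. The function name and the numbers in the usage line are hypothetical.

```python
def estimate_tenv(tm_ancestor, tm_modern, tenv_modern, slope=1.0):
    """Map an ancestral Tm onto an environmental-temperature estimate.

    Assumes Tm = slope * Tenv + offset holds across the family, so
    Tm_anc - Tm_mod = slope * (Tenv_anc - Tenv_mod); the offset cancels.
    """
    return tenv_modern + (tm_ancestor - tm_modern) / slope


# Hypothetical values: modern homolog Tm 60 °C from a 37 °C host,
# reconstructed ancestor Tm 85 °C.
print(estimate_tenv(85.0, 60.0, 37.0))  # 62.0
```

With the default slope of 1, this reduces to subtracting the modern homolog's Tm - Tenv offset from the ancestral Tm, which is why families with very different offsets (EF-Tu versus RNase H) can be placed on one Tenv scale.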
When placed on the Tenv scale, reconstructed proteins report an elevated
environmental temperature ∼3 billion years ago, though with significant scatter.
Figure 2a shows the estimated Tenv over time for 17 ancestors of proteins found
in the lineages leading to mesophilic E. coli. A total least-squares fit to the data
reveals a highly significant negative slope that explains 75% of the variation in the
data (R2 = 0.75). In contrast, the estimated Tenv over time for ancestors leading
to thermophile T. thermophilus exhibits a slope statistically indistinguishable from
0 (Figure 2b). When taken in aggregate, these data support the hypothesis that
the deepest ancestors had stabilities similar to proteins from modern thermophiles.
While these data focus on the E. coli and T. thermophilus lineages, their deepest
ancestors are shared both with each other and with most modern bacteria, thus
suggesting a global transition away from ancient thermostability, at least along
mesophilic lineages. It is not clear from these sparse, lineage-specific data whether
mesophilicity evolved in parallel along many lineages or whether it evolved on a few
key, early ancestral branches.
FIGURE 2 Ancient reconstructed ancestors exhibit elevated thermostability.
Estimated environmental temperatures experienced by proteins on lineages
leading to (panel a) E. coli or (panel b) T. thermophilus. Point/line series indicate
individual protein families: EF-Tu (red), thioredoxin (orange), β-lactamase
(green), RNase H (blue), and nucleotide diphosphate kinase (purple). Measured
melting temperatures for ancestors that give rise to E. coli proteins were mined
from published literature [112, 115, 118, 60, 56]. These were then converted to
estimates of Tenv using measured relationships [112, 115, 56] or by adding an
offset determined by the difference in Tm and Tenv for the E. coli (panel a) or
T. thermophilus homologs (panel b). Time estimates were drawn from original
publications or estimated from Battistuzzi et al. [124]. Time errors are standard
errors. Tenv standard errors were set to ±10 °C to account for uncertainty in Tm
and the Tm/Tenv correlation. (This is a conservative estimate: when measured for
NDK, RNase H, and EF-Tu [112, 115, 56], the standard error in Tm was <5 °C
and the variance of the Tm-to-Tenv relationship was <5 °C.) The black line is a fit
determined by total least-squares linear regression. To find the standard deviation
of fit slopes, we generated 1000 pseudo-datasets sampled from the time and Tenv
uncertainties. For E. coli, the fits reject a slope of 0 (p = 3 × 10^-8). For
T. thermophilus, the fits fail to reject a zero slope (p = 0.45).
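The fitting procedure in the Figure 2 caption (a total least-squares line plus refits on pseudo-datasets drawn from the stated uncertainties) can be sketched as follows. This is an illustration under assumptions, not the analysis code used for the figure: orthogonal distances depend on axis units, so each axis is divided by its mean uncertainty before fitting, and the p-value for a zero slope uses a normal approximation to the spread of pseudo-dataset slopes. All function names are hypothetical.

```python
import math
import random


def tls_fit(x, y, x_scale=1.0, y_scale=1.0):
    """Total least-squares (orthogonal) straight-line fit.

    Each axis is divided by a characteristic scale first, because orthogonal
    distance depends on units. Returns (slope, intercept) in original units.
    """
    xs = [xi / x_scale for xi in x]
    ys = [yi / y_scale for yi in y]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((v - mx) ** 2 for v in xs)
    syy = sum((v - my) ** 2 for v in ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    # Closed-form orthogonal-regression slope (undefined when sxy == 0).
    slope_scaled = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    slope = slope_scaled * y_scale / x_scale
    intercept = sum(y) / n - slope * sum(x) / n
    return slope, intercept


def pseudo_dataset_slopes(x, y, x_err, y_err, n_pseudo=1000, seed=0):
    """Refit the slope on datasets redrawn from Gaussian x and y uncertainties."""
    rng = random.Random(seed)
    # Assumption: scale each axis by its mean reported uncertainty.
    x_scale = sum(x_err) / len(x_err)
    y_scale = sum(y_err) / len(y_err)
    slopes = []
    for _ in range(n_pseudo):
        xs = [rng.gauss(xi, ex) for xi, ex in zip(x, x_err)]
        ys = [rng.gauss(yi, ey) for yi, ey in zip(y, y_err)]
        slopes.append(tls_fit(xs, ys, x_scale, y_scale)[0])
    return slopes


def p_value_zero_slope(slopes):
    """Two-sided normal-approximation p-value for the hypothesis slope = 0."""
    n = len(slopes)
    mean = sum(slopes) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in slopes) / (n - 1))
    return math.erfc(abs(mean) / (sd * math.sqrt(2)))
```

Applied to ancestor data, `x` would hold time estimates (negative for the past, so a cooling trend gives a negative slope), `y` the Tenv estimates, and `x_err`/`y_err` their standard errors, with ±10 °C for Tenv as in the caption.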
Trends in Thermostability Are Complex
While ASR studies suggest that the most ancient proteins were highly
thermostable, they do not support a smooth trend in thermostability over time.
Ancestors exhibit extensive random scatter about the proposed trend. Such variation is
expected because, over more recent timescales, protein stability fluctuates in response to
neutral drift or adaptation in an apparently random fashion [114, 117, 125, 126, 127].
The observed variation may also reflect uncertainty in the reconstruction, multiple
heterogeneous environments experienced by ancient organisms, or uncertainty in
the map between Tm and Tenv.
This scatter extends to the mechanism of stabilization. A recent study of
the evolution of thermostability in RNase H revealed that the thermodynamic
mechanism of stabilization for the ancestral proteins could fluctuate, even as
the Tms of the proteins varied smoothly [56]. This indicates that, even while
under selection to maintain stability in a given environment, proteins are free to
accumulate mutations to access alternate mechanisms of stabilization. Practically,
studying multiple ancestors may reveal new sequence and thermodynamic
determinants of stability. Although thermostability and the mechanism of
stabilization appear to change independently for RNase H, the generality of this
result for other proteins remains unknown.
Finally, these ASR studies generally used small, monomeric, and well-behaved
proteins. Although such simple proteins may be representative of the first proteins
to arise, studies on a greater diversity of protein families will reveal whether
observed trends are applicable to the entire proteome.
Can Reconstruction Errors Inflate Ancestral Thermostability?
While existing data are suggestive, further work must be done to test the
hypothesis of ancient thermostability. The primary concern is that ancestral
proteins are statistical reconstructions that cannot be directly verified. Even with
good statistical support, it is unlikely that the reconstructed ancestor will have
the exact sequence of the true ancestral state. Addressing and understanding this
uncertainty will be critical for establishing or refuting the hypothesis that the
earliest proteins were thermostable.
High stability is unlikely to arise from random errors in the reconstruction.
To account for uncertainty, ASR studies have generated different versions
of ancestral sequences to assess the robustness of the measured stability to
phylogenetic errors. For example, Hart et al. measured ten alternate sequences
of a ∼3-billion-year-old ancestor and found a Tm of 76.7 ± 2 °C (compared to
68.0 °C for RNase H from E. coli) [56]. Using such approaches, many sources of
random error have been investigated: uncertain tree topology [112, 115, 128, 49],
alternate evolutionary models [129], choice of reconstruction method [114, 49],
different amino acid frequencies [112], and reconstruction ambiguity [112, 115, 119,
56, 130]. In all such studies, the properties of the ancestors have proven robust to
uncertainty.
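One common way to build an alternate version of an ancestor from the per-site posterior probabilities produced by ASR is sometimes called an "AltAll" reconstruction: keep the most probable state everywhere except at ambiguous sites, where the runner-up state is swapped in. The sketch below assumes posteriors are given as one dictionary per site and uses a 0.2 cutoff for "ambiguous"; both the cutoff and the function name are assumptions for illustration, not the protocol of the studies cited above.

```python
def ml_and_altall(site_posteriors, threshold=0.2):
    """Build an ML ancestor and an 'AltAll' variant from per-site posteriors.

    site_posteriors: list of dicts mapping amino acid -> posterior probability.
    The AltAll variant uses the second-best state wherever that state's
    posterior is at least `threshold` (an assumed cutoff).
    """
    ml, alt = [], []
    for probs in site_posteriors:
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        best = ranked[0][0]
        ml.append(best)
        if len(ranked) > 1 and ranked[1][1] >= threshold:
            alt.append(ranked[1][0])
        else:
            alt.append(best)
    return "".join(ml), "".join(alt)


posteriors = [{"A": 0.9, "G": 0.1}, {"L": 0.55, "I": 0.45}, {"K": 0.7, "R": 0.25, "Q": 0.05}]
ml_seq, alt_seq = ml_and_altall(posteriors)  # ("ALK", "AIR")
```

Characterizing both versions asks whether a measured property, such as an elevated Tm, survives replacing every plausible ambiguous site at once.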
Of greater concern are sources of systematic error in ASR, in particular
a bias towards elevated stability for deeper ancestors [131, 132, 133, 134]. Some
have argued that ASR could be biased towards consensus sequences, which may
lead to an increase in stability [132, 135, 136]. Simulations have also suggested
that maximum likelihood (ML), the most popular form of ASR, may give rise to
artificially elevated stability [131]. If different stabilizing mutations accumulate
along different lineages, ML may incorrectly incorporate all of the stabilizing
mutations, creating an artificially stable ancestor. There is also concern that
variable amino acid distributions and mutation rates can alter reconstructions
[133, 134].
There have been some limited experimental tests of these computational
predictions of bias. Comparisons between ancestral and consensus sequences
have shown distinct statistical and functional properties [115, 116, 120, 137].
This suggests that any consensus bias that exists must be subtle. Other work has
indirectly addressed this concern: the fluctuation of the molecular basis of stability over
evolutionary time in the RNase H family is not consistent with bias arising from a
single, convergent stabilization mechanism [56, 131].
Important experiments remain. One test would be a systematic comparison
of ancestors reconstructed using both ML and an alternative, Bayesian, method.
A Bayesian reconstruction averages over uncertainty; therefore, it is not expected
to have the same stability bias as ML reconstructions [131]. Observing high
thermostability in ancient Bayesian ancestors would be strong evidence that
thermostability is not an artifact of the ML method. The experiment is not perfect,
however, as Bayesian ancestors have more errors than ML ancestors as a result of
incorporating uncertainty [49]. Because of this, they may not accurately reflect
the ancestral state. For example, one study found that a Bayesian ancestor had
fundamentally different folding properties than the ML ancestor or any modern
protein in the family [114], consistent with a poor reconstruction.
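Why Bayesian ancestors are expected to carry more errors than ML ancestors can be shown with a short calculation. Assuming sites are independent and posteriors are calibrated (both idealizations), the ML state at a site is correct with probability max_a p(a), while a state drawn from the posterior is correct with probability Σ_a p(a)², which is never larger. The per-site sampling scheme and function names below are hypothetical simplifications of a Bayesian reconstruction.

```python
import random


def expected_correct_sites(site_posteriors):
    """Expected number of correctly inferred sites, assuming calibrated posteriors.

    ML picks the most probable state at each site, so it is right with
    probability max_a p(a). A posterior draw picks state a with probability
    p(a), so it is right with probability sum_a p(a)**2 <= max_a p(a).
    """
    ml = sum(max(p.values()) for p in site_posteriors)
    sampled = sum(sum(v * v for v in p.values()) for p in site_posteriors)
    return ml, sampled


def sample_bayesian_ancestor(site_posteriors, rng):
    """Draw one ancestral sequence by sampling each site from its posterior."""
    return "".join(
        rng.choices(list(p), weights=list(p.values()))[0] for p in site_posteriors
    )


posteriors = [{"A": 0.9, "G": 0.1}, {"L": 0.55, "I": 0.45}]
ml_expected, sampled_expected = expected_correct_sites(posteriors)
draw = sample_bayesian_ancestor(posteriors, random.Random(1))
```

The gap between the two expectations grows with the number of ambiguous sites, which is why averaging over uncertainty can remove the ML stability bias while adding errors of its own.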
Another test for bias would be to study the thermostability of reconstructed,
recent ancestors of rapidly evolving proteins with known mesophilic ancestral
environments. A rapidly evolving protein will accumulate a number of mutations
similar to that of the deep ancestors studied to date, albeit on a much shorter
timescale. If ML reconstructions lead to biased stability, we would predict that
recent ancestors of rapidly evolving proteins would exhibit erroneously elevated
stability.
A Trend from Promiscuous to Specific is Not Yet Established
Another proposed trend is that proteins have, on average, changed from lower
to higher specificity over deep evolutionary time [60, 119]. This stems from the
idea that low specificity proteins — particularly enzymes — were important for the
ability of primordial organisms to perform diverse chemical processes with a limited
proteome [138] (Figure 3a). It is also well established that increased specificity
often follows gene duplication via subfunctionalization from a multi-functional or
promiscuous ancestral protein [139, 140] (Figure 3b). Given these considerations,
proteins may, on average, increase in specificity over time.
To date, few attempts have been made to investigate the specificity of the
deepest ancestors. One recent study found that an ancestral β-lactamase was
both promiscuous and less efficient than its descendants [60]. Likewise, a study
of RuBISCO found a promiscuous and inefficient ancestor, though this may be an
artifact of poor reconstruction [141]. Other studies have determined the activities of
ancient proteins, but not their specificity [114, 115]. On the basis of these data, it
is difficult to make solid conclusions about specificity trends; more measurements of
ancestral specificity are warranted.
The second model — gene duplication followed by subfunctionalization —
could conceivably operate continuously through evolution, leading to progressively
higher specificity proteins over all evolutionary timescales (Figure 3b). Studies of
the evolution of specificity for ancestors from the last ∼500 million years suggest,
however, that on average, proteins do not tend towards higher specificity over time.
Some promiscuity-to-specificity transitions have been identified [60, 58, 53, 142, 59].
However, other studies have found switches between two high-specificity states [52,
55], evolution through a less-specific intermediate [143, 57, 79], and even decreased
specificity over time [144].
FIGURE 3 Models for increased specificity of proteins over time. (a) Large dotted
ellipses denote cells. Small ellipses are proteins, colored by their specificity. Because
early proteomes were presumably smaller than modern proteomes, it has been
proposed that ancient proteins had to be promiscuous to achieve all the necessary
chemistry. As organisms evolved, their proteomes expanded, allowing each protein
to become more specific. (b) Higher specificity (subfunctionalization) is one of the
possible outcomes of a gene duplication event. A gene encoding a low-specificity
ancestral protein duplicates. Its descendants can then gain specificity and lose the
promiscuous trait.
This complexity likely arises because specificity is, at minimum, a bimolecular
process that involves both the protein and its target. Further, constraints placed
by the architecture of the larger system into which the proteins are embedded have
been shown to shape specificity [79, 144, 145, 146, 147, 148, 149, 111]. For example,
bioinformatic analyses have revealed that protein components of higher-complexity
regulatory modules tend to possess lower specificity than those in simpler modules
[150]. We therefore believe that it will be difficult to resolve a global evolutionary
trend from lower to higher specificity.
Conclusions
A number of ASR studies are starting to reveal a consistent pattern of
elevated thermostability for the deepest ancestors. This trend of decreasing
thermostability among mesophilic lineages is not smooth, involving fluctuations in
both Tm and mechanism of stabilization. Whether this reflects a real evolutionary
signal or simply an artifact of the reconstruction method remains to be seen. From
an engineering perspective, an ML reconstruction of an ancient ancestor appears to
be a reasonable strategy for generating a thermostable, thermophilic-like protein
that differs from a simple consensus sequence. This approach is not guaranteed to
work; for example, reconstructed RNase H displays non-thermophilic-like
thermostability ∼3 billion years in the past. On average, however, deep ancestral
proteins appear to be more stable than their modern counterparts. We should also
note that these are deep trends, and thus we would not predict recent ancestors to
exhibit any detectable trend in stability, consistent with recent studies [114, 117, 126].
Information about the specificity of deep ancestral proteins remains sparse
and will thus require further investigation. Studies of more recent proteins indicate
that multiple modes of specificity evolution can be at play, suggesting a lack of
general trends.
Protein evolution is often viewed as a random, microscopically-reversible
trajectory along a fitness landscape. A global trend would suggest that the fitness
landscape changed in a systematic way, even while microscopic reversibility held.
Such systematic changes in fitness landscape would, in turn, shape the pathways
taken by proteins and provide another level at which to understand the emergence
of new properties. ASR studies are hinting at a change in fitness landscape. This
may help us, at a broad-brush level, gain insight into the origins of protein features
and properties.
Bridge to Chapter III
In this chapter, the current evolutionary biochemistry literature was reviewed
to assess the available evidence for broad evolutionary trends in protein properties.
Two highly-referenced examples of trends were analyzed: the hypothesis that
proteins have undergone a gradual, monotonic decrease in thermal stability over
long time scales and the assertion that proteins generally become more specific over
time as proteomes become increasingly complex. The conclusion was drawn that
there is some evidence to support a long-term decreasing trend in thermostability.
However, there is still relatively sparse information available. More experiments will
need to be targeted toward addressing this question to establish firm conclusions.
With regard to the evolution of specificity, this chapter concluded that there
is far too little evidence to draw strong conclusions. Furthermore, unlike
decreases in thermostability, there is no unifying theoretical reason to expect global,
parallel increases in specificity over time. This chapter established the need for
further experimentation and more complete theories to address the issue of global
trends in protein evolution. Chapter III addresses trends in a specific biochemical
feature during the evolution of an entire protein family. An interesting observation
is made regarding the evolutionary lability of this feature and its underpinnings at
the amino acid level.
CHAPTER III
MULTIPLE EVOLUTIONARY ORIGINS OF UBIQUITOUS CU2+
AND ZN2+ BINDING IN THE S100 PROTEIN FAMILY
Author Contributions
Lucas Wheeler and Michael Harms conceptualized the study and designed
experiments. Michael Harms acquired funding for the study. Lucas Wheeler and
Micah Donor performed experiments and analyzed experimental data. Michael
Harms and James Prell administered the project. Michael Harms conducted
phylogenetic analyses. Lucas Wheeler and Michael Harms wrote and edited the
manuscript.
Abstract
The S100 proteins are a large family of signaling proteins that play critical
roles in biology and disease. Many S100 proteins bind Zn2+, Cu2+, and/or Mn2+
as part of their biological functions; however, the evolutionary origins of binding
remain obscure. One key question is whether divalent transition metal binding
is ancestral, or instead arose independently on multiple lineages. To tackle this
question, we combined phylogenetics with biophysical characterization of modern
S100 proteins. We demonstrate an earlier origin for established S100 subfamilies
than previously believed, and reveal that transition metal binding is widely
distributed across the tree. Using isothermal titration calorimetry, we found that
Cu2+ and Zn2+ binding are common features of the family: the full breadth of
human S100 paralogs—as well as two early-branching S100 proteins found in the
tunicate Oikopleura dioica—bind these metals with µM affinity and stoichiometries
ranging from 1:1 to 3:1 (metal:protein). While binding is consistent across the tree,
structural responses to binding are quite variable. Further, mutational analysis
and structural modeling revealed that transition metal binding occurs at different
sites in different S100 proteins. This is consistent with multiple origins of transition
metal binding over the evolution of this protein family. Our work reveals an
evolutionary pattern in which the overall phenotype of binding is a constant feature
of S100 proteins, even while the site and mechanism of binding are evolutionarily
labile.
Introduction
The S100 protein family is an important group of calcium binding proteins
found in vertebrates [89, 91]. Humans possess 27 family members that play diverse
functional roles in inflammation [151, 101, 152], cell proliferation [153, 154, 155],
and innate immunity [156, 105, 157]. S100 proteins are particularly prominent in
inflammatory diseases and cancers, where they are used both as clinical markers
and drug targets [100, 158, 159, 160, 161, 102, 162, 163, 164, 165]. S100 proteins
are found only in chordates and are highly diverged from other calcium binding
proteins [91, 100].
Most S100 proteins share a common homodimeric structure in which ∼10 kDa
monomers come together to form a compact α-helical fold (Fig 4A). Each monomer
binds two Ca2+ ions in conserved calcium binding motifs, inducing a conformational
change that exposes a hydrophobic surface [166, 94, 167]. This surface can then
interact with and modulate the activity of downstream target proteins [168, 169].
FIGURE 4 Transition metal binding occurs at a common site in diverse S100
proteins. Overlay of the crystal structures of S100B (orange, PDB 3CZT) and
S100A12 (blue, PDB 1ODB) bound to Ca2+ and transition metals. Ions are shown
as colored spheres: Ca2+ (blue), Zn2+ (gray) and Cu2+ (copper). Residues ligating
the transition metals are shown as sticks. Boxed region is shown in detail in
panel B.
In addition to Ca2+, many S100 proteins interact with divalent transition
metals such as Zn2+, Cu2+, or Mn2+ as part of their biological functions [170, 171].
Such functions include metal transport [172], modulation of signaling [173], and
antimicrobial activity [105]. Their transition metal binding constants tend to be
∼µM, consistent with their roles in metal transport and metal-dependent signaling
[174, 175]. Despite the importance of these metals, transition metal binding
has not been studied systematically across the family [170, 171]. While one key
transition metal site—at the dimer interface—has been studied extensively (Fig
4B), the transition metal binding capacity of many S100 proteins remains unknown.
For many others, there are conflicting reports about the binding affinities, sites, and
stoichiometries for binding to divalent transition metals [170, 171].
Evolutionary history provides a powerful lens through which to understand
this metal binding diversity and its accompanying functional diversity.
Understanding when a feature evolved in the family, and thus which homologs
might share the feature, helps translate observations for one family member
into predictions about other family members. One key question is whether
transition metal binding is a shared ancestral feature, or whether it has been
acquired independently on multiple lineages. Although all five crystal structures
of S100 proteins bound to transition metals have similar binding sites (Fig 4B),
experimental evidence suggests that other S100s bind to divalent transition metals
at a different site than the one identified crystallographically [176, 177], consistent
with at least one more acquisition of transition metal binding.
A well-supported phylogeny of the S100 protein family would allow
observations of transition metal binding to be mapped as evolutionary characters,
thereby allowing inferences about the evolutionary history of the character. Several
phylogenies have been published [91, 100, 178, 95, 179]; however, these trees are
not fully consonant with one another, making interpretation difficult. Previous
analyses were limited by the number of S100 sequences available, particularly from
early-branching vertebrate species. Further, all but one [95] relied on distance-
based phylogenetic methods. Increased taxonomic sampling, combined with more
advanced phylogenetic methods, will provide a much clearer picture of S100
evolution.
We therefore set out to understand the evolution of transition metal binding
in this family through a combination of phylogenetic analysis and biochemical
characterization of select human paralogs. Further, to establish the ancient features
of the family, we performed the first-ever biochemical characterization of two early-
diverging S100 proteins from the tunicate Oikopleura dioica. Our work sheds light
on the evolutionary process that gave rise to the diversity of modern S100 proteins, as well
as revealing the broad-brush evolution of the transition-metal binding phenotype of
this important protein family.
Results
The S100 family arose in the ancestor of Olfactores
Our first goal was to establish the taxonomic distribution of the S100 family.
We began with an iterative BLAST approach. We used the full set of 27 human
S100 family members (S1 Table in supplementary directory) as a starting point
for PSI-BLAST against the NCBI non-redundant protein database. In addition
to identifying thousands of S100 sequences, this protocol picked up non-S100
calcium binding proteins such as calmodulin and troponin, indicating that we had
saturated S100 proteins in the database. We filtered our hits by reverse BLAST.
All S100 hits were within vertebrates, with the exception of four hits from the
tunicate Oikopleura dioica. To further support the taxonomic distribution of the
S100s, we then used BLAST to search directly in the genomes and transcriptomes
from representative tunicates, cephalochordates, hemichordates, and echinoderms.
Only a transcriptome from the tunicate Molgula tectiformis yielded a further
S100 hit. We also queried the HMMER database, but found no new S100 family
members. The presence of S100 proteins in tunicates and vertebrates (Olfactores),
but not other chordates, suggests that the first S100 arose in the last common
ancestor of tunicates and vertebrates, ∼700 million years ago [180]. These results
are consistent with previous studies that noted the relative youth of the S100
family [91, 100, 95, 179].
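The reverse-BLAST filter can be pictured as a simple decision rule. The minimal Python sketch below assumes that each forward hit has already been searched back against human proteins, and that a hit is kept only when its top reverse hit is one of the human S100 seeds. The exact acceptance criterion is not stated in this chapter, and the hit identifiers and seed subset are hypothetical.

```python
# Hypothetical subset of the human S100 seed set used for the reverse search.
HUMAN_S100_SEEDS = {"S100A1", "S100A2", "S100B", "S100P"}


def reverse_blast_filter(best_reverse_hit, seeds):
    """Keep forward hits whose top reverse hit is a human S100 seed.

    best_reverse_hit maps each forward-hit ID to the ID of the best-scoring
    human protein found when that hit is used as a reverse query.
    """
    return sorted(hit for hit, top in best_reverse_hit.items() if top in seeds)


# Hypothetical forward hits: two S100s and two non-S100 EF-hand proteins.
forward_hits = {
    "hit_001": "S100A1",
    "hit_002": "CALM1",  # calmodulin
    "hit_003": "S100B",
    "hit_004": "TNNC1",  # troponin C
}
print(reverse_blast_filter(forward_hits, HUMAN_S100_SEEDS))  # ['hit_001', 'hit_003']
```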
Model-based phylogenetic approaches reveal well-supported clades
We next constructed a phylogenetic tree, using sequences drawn from across
Olfactores. Phylogenetic analyses of this family are challenging as it is large
and diverse. For example, the average sequence identity of the 27 human family
members is 29.5%, with the most divergent pair (A3 and A14) only 13.2% identical.
Further, the small size of these proteins (∼100 amino acids) means they have few
evolutionary characters and, thus, relatively weak phylogenetic signal. Finally,
many S100 paralogs exhibit highly specific tissue distributions, meaning that
transcriptomes can provide very incomplete pictures of the S100 complement of a
given organism.
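The identity figures above depend on how gapped columns are treated. The sketch below assumes identity is computed only over alignment columns where neither sequence has a gap. This is one common convention, and the chapter does not say which one was used. Under that assumption, it computes the average over all pairs and finds the most divergent pair. The toy sequences are invented for illustration.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


def identity_summary(aligned):
    """Mean identity over all unordered pairs, plus the least identical pair."""
    scores = {
        (p, q): percent_identity(aligned[p], aligned[q])
        for p, q in combinations(sorted(aligned), 2)
    }
    mean = sum(scores.values()) / len(scores)
    worst_pair = min(scores, key=scores.get)
    return mean, worst_pair, scores[worst_pair]


# Toy alignment (invented sequences, not S100 data).
toy = {"seqA": "MSE-LEKA", "seqB": "MSELLEKA", "seqC": "MTQ-LDRA"}
print(identity_summary(toy))
```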
To construct a tree despite these difficulties, we assembled a high-quality
dataset of 564 sequences, from 52 species, through targeted searches of key
genome/transcriptome/proteome databases (S2 Table, S1 Spreadsheet in
supplementary directory). In an effort to bracket the class-level evolutionary
origin of each S100 ortholog—despite incomplete sequence data and possible
differential loss along each lineage—we included multiple species within each
class: two Tunicata (one Ascidiacea, one Appendicularia), two Agnathans (jawless
fishes), seven Chondrichthyans (cartilaginous fishes), eight Actinopterygii (ray-
finned fishes), three Sarcopterygii (lobe-finned fishes), seven Amphibians, fourteen
Sauropsids (birds and reptiles), and seven Mammals (two monotremes, two
therians, and three eutherians). We generated a 133-character alignment from
these sequences (Fig 24 in supplement and S2 Fig in supplementary directory, S1
Alignment) and used this for model-based phylogenetics.
We used both maximum likelihood (ML) and Bayesian approaches to
construct phylogenetic trees for the family (Fig 5, S1 Tree and S2 Tree in
supplementary directory, Fig 25 in supplement). Both approaches resolved well-
supported clades containing each of the human seed paralogs. This allowed us to
assign the orthology, relative to the human proteins, for 500 of the 564 sequences
in our data set (S1 Spreadsheet in supplementary directory). In addition, the
ML and Bayesian approaches revealed a set of consonant clades: A2/A3/A4;
A5/A6; the calgranulins (A7/A8/A9/A12); A13/A14; and the so-called “fused”
family (cornulin/ trichohyalin/repetin/hornerin/filaggrin) (Fig 5 and Fig 25 in
supplement). In the Bayesian consensus tree, no further relationships could be
resolved. Several other clades were resolved in the ML tree (Fig 5): A2/A3/A4
groups with A5/A6, A10 with A11, and A13/A14 with A16. In both trees, the
total branch length was extremely large, reflecting the high diversity of the
family.
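Comparing the two trees amounts to asking which clades they share. The sketch below represents trees as nested Python tuples and clades as sets of leaf names. The two toy trees are hypothetical. They only mirror the pattern described above: a few clades appear in both trees, and extra groupings appear only in the ML tree. Treating the trees as rooted is a simplification. For unrooted phylogenies, comparing bipartitions would be the more appropriate test.

```python
def clades(tree):
    """Non-trivial clades of a nested-tuple tree, each as a frozenset of leaf names."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)  # the clade containing every leaf carries no information
    return found


# Hypothetical trees; a tuple with more than two children is an unresolved polytomy.
ml_tree = ((("A2", "A3", "A4"), ("A5", "A6")), (("A13", "A14"), "A16"))
bayes_tree = (("A2", "A3", "A4"), ("A5", "A6"), ("A13", "A14"), "A16")

shared = clades(ml_tree) & clades(bayes_tree)
ml_only = clades(ml_tree) - clades(bayes_tree)
print("shared:", sorted(sorted(c) for c in shared))
print("ML only:", sorted(sorted(c) for c in ml_only))
```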
We were particularly interested in placing the tunicate S100 proteins on
the tree. If we could assign the orthology of these proteins, we could potentially
identify the most ancient S100 ortholog(s). Unfortunately, the placement of these
sequences on the tree was neither evolutionarily reasonable nor stable between
phylogenetic runs. For example, a single tunicate protein might end up on a long
branch within a clade of mammalian proteins in one analysis, and then in an
entirely different location in another. We thus excluded the tunicate proteins from
the final phylogenetic analysis.
Uncertainty in the deepest branching pattern precluded rooting of the tree.
We attempted to root the phylogeny by three methods; however, none proved
successful. The first method was to include non-S100 calcium-binding proteins
identified in our BLAST searches (sentan, calcineurins, troponins, and calmodulins)
as an outgroup. With the exception of sentan, these non-S100 proteins grouped
together; however, the branch leading to the clade was too long to allow robust
placement relative to the S100 proteins—minor changes to the alignment and/or
tree-building protocol would radically change their relationship to the rest of
FIGURE 5 Model-based phylogenetics reveal several S100 subfamilies. Maximum
likelihood phylogeny of 564 S100 proteins drawn from 52 Olfactores species.
Wedges are collapsed clades of shared orthologs, with wedge height denoting
number of included taxa and wedge length denoting the longest branch length within
the clade. Support values are SH-supports, derived from an approximate likelihood
ratio test. Rooting is arbitrary, but roughly balances the distribution of jawless
fishes across the ancestral node. Icons indicate taxonomic classes represented
within each clade: tunicates (black), jawless fishes (pink), cartilaginous fishes
(purple), ray-finned fishes (light blue), lobe-finned fishes (blue), amphibians (green),
birds/reptiles (yellow), and mammals (red). Inset shows estimated divergence times
for each taxonomic class in millions of years before present.
the tree. We also attempted to use the tunicate proteins, but as they could not
be placed, this was ineffective. Finally, we attempted to minimize the number
of duplications and losses across the tree; however, the lack of resolution of the
deepest nodes also made identifying the precise origin (and thus gain/loss) of each
paralog problematic.
Synteny and taxonomic distribution further support relationships among S100
proteins
Because model-based phylogenetic methods provided relatively weak support
for relationships within the family, we used the taxonomic distribution of
orthologs and synteny to further support the relationships we observed in the
model-based approaches. Fig 6 shows distribution of observed orthologs to human
genes across the species included in our analysis. (Species phylogenies taken from
[181, 182, 183, 184, 185, 186, 187, 188, 189]). We mapped these orthologs onto the
arrangement of these genes in the human genome (top). Four S100 genes (G, B, P,
and Z) are scattered on different chromosomes, while twenty-two S100 genes (A1
through A10) form a contiguous block on a single chromosome. This tight linkage
group has been noted previously [100, 179, 190], and arose at least as early as the
bony vertebrates [179].
There is a strong correlation between the S100 subfamilies identified in model-
based phylogenetics and the distribution of the genes across human chromosome 1.
Proteins with shared evolutionary relationships form blocks across this region,
suggesting local expansion by gene duplication. The ML relationships among
orthologs are shown above the plot in Fig 6. The clades identified in our model-based
phylogenetics form individually contiguous blocks: A13-A16, A2-A6, A7-A12,
FIGURE 6 Model-based phylogeny, synteny, and taxonomic distribution provide a
consonant picture of S100 evolution. The human S100 orthologs are shown across
the top, in the order they occur in the human genome. B, P, G, and Z occur on
different chromosomes; A1-A10 are in a contiguous region of chromosome 1. Sentan,
an evolutionary relative, is also on a different chromosome. Species are shown
on the left, organized by taxonomy. Color indicates taxonomic class, as in Fig 5.
Squares denote the presence of an ortholog to the human gene for each species;
a number in the box indicates the number of co-orthologous genes found in that
species (if more than one); squares fused into a rectangle indicate a gene found
in an earlier branching lineage that subsequently duplicated somewhere along the
lineage leading to Homo sapiens. Total number of genes found for each species
are shown on the left. The number of genes that were not orthologous to human
genes (or could not be classified) are shown on right. Top tree shows the maximum-
likelihood phylogeny of the family mapped onto the S100 genes found in the human
genome. Circles denote SH support ≥ 0.85 (black); ≥ 0.75 (gray), < 0.75 (white).
Branches supported by both the ML phylogeny and synteny are shown in black;
branches supported by only the ML tree are shown in gray.
S100-fused, and A11-A10. This consonance between the phylogenetic signal and
genomic arrangement supports the shared ancestry of these subfamilies.
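The consonance test described above reduces to checking whether the members of each clade occupy consecutive positions in the gene order. In the sketch below, the gene order is illustrative only and is not taken from a genome assembly. It loosely follows the block order named in the text, with "fused" standing in for the whole S100-fused cluster.

```python
def forms_contiguous_block(gene_order, members):
    """True if every gene in `members` sits in one uninterrupted run of `gene_order`."""
    positions = sorted(gene_order.index(gene) for gene in members)
    return positions[-1] - positions[0] + 1 == len(positions)


# Illustrative order only (hypothetical); "fused" stands in for the S100-fused cluster.
order = ["A1", "A13", "A14", "A16", "A2", "A3", "A4", "A5", "A6",
         "A7", "A8", "A12", "A9", "fused", "A11", "A10"]

ml_clades = {
    "A13/A14/A16": {"A13", "A14", "A16"},
    "A2-A6": {"A2", "A3", "A4", "A5", "A6"},
    "calgranulins": {"A7", "A8", "A9", "A12"},
    "A10/A11": {"A10", "A11"},
}
for name, members in ml_clades.items():
    print(name, forms_contiguous_block(order, members))
```

A clade split by unrelated genes, such as {"A1", "A10"} in this toy order, returns False.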
The species distribution of these orthologs then provides insight into the
diversification of the family. For example, A10, A11, or their common ancestor
(A10/A11) are found in all vertebrates, demonstrating that this protein arose no
later than the last common ancestor of vertebrates. Because some genes may have
been missed within each species—either through lineage-specific loss or incomplete
genomic/transcriptomic coverage—this is a lower bound on the age of the gene.
After its origin, A10/A11 diversified in later lineages. In the bony fishes,
A10/A11 expanded, as reflected in the increased numbers of genes co-orthologous to
A10/A11. A10/A11 gave rise to the tetrapod paralogs A10 and A11 via tandem
gene duplication in the ancestor of the lobe-finned fishes.
Another ancient S100 by this analysis is A1, which, intriguingly, brackets the
other end of the contiguous S100 genomic region in mammals and some fishes [179].
The simplest interpretation of this pattern would be that the A1 or A10/A11 gene
was the earliest gene in this syntenic block, and that the remaining family arose by
serial expansion from that starting point.
Other ancient S100 orthologs are B, P, and Z. Our tree provides some
evidence that A1 and Z share a common ancestor, and that B and P share
a common ancestor. Intriguingly, these four ancient proteins are scattered
throughout vertebrate genomes, rather than being a part of the expanded gene
region containing A1-A10. This suggests that the last common ancestor of jawed
vertebrates had a collection of four to five S100 proteins, but that only the region
containing A1-A10 then continued to expand with the radiation of the vertebrates.
Sentan—a close evolutionary relative to the S100 family that does not possess
the diagnostic pseudo EF-hand of true family members—also arose in the early
vertebrates. Given the ambiguity of the deepest branching of the tree, it is unclear
whether it is an outgroup or, instead, a duplication of an established S100 paralog.
The gene block containing A1-A10 expanded by what appears to be a set of
local gene duplication events. A13/A14 and A16 likely arose next, at least by the
ancestor of bony vertebrates. Like A10/A11, these genes were duplicated through
the whole genome duplications of teleost fishes, giving rise to multiple S100 genes
that are co-orthologous to the human genes in bony fishes. The tetrapod paralogs
A13 and A14 did not arise until the amniotes, when they formed via duplication
from A13/A14. The next phase of expansion was local duplication that led to
the ancestors of A2-A6, A7-A12, and the S100-fused proteins in early tetrapods.
These founding genes then expanded across the tetrapods, with several duplicates
preserved in Sauropsids. The final mammalian complement was achieved by several
more duplications. The A7-A12 and S100-fused clades—which are directly adjacent
in mammalian chromosomes—continue to rapidly expand by duplication.
Transition metal binding is nearly universal across the family
With the phylogenetic tree in hand, we next set out to determine the
distribution of transition metal binding across the tree. Previously reported
transition metal binding is scattered across the tree (Fig 7, red proteins) [170, 171].
If this feature were ancestral, we predicted that transition metal binding would be
present across the majority of the tree. To test this hypothesis we used isothermal
titration calorimetry (ITC) to measure the ability of human S100 proteins to
bind to Zn2+ and Cu2+—the two most prevalent transition metals encountered
biologically—under approximately physiological conditions (125 mM ionic strength,
pH 7.4, 25◦C). We chose proteins that would maximize the sampling across clades.
Some of the proteins we selected have been reported to bind transition metals,
albeit with variable stoichiometry [177, 191]. The other paralogs have, to our
knowledge, yet to be characterized.
We found that Zn2+ and Cu2+ binding were universally distributed across
the tree: every S100 protein we characterized bound to Zn2+ and/or
Cu2+ with low micromolar affinity (Fig 7 and Fig 26 in supplement, S3 Table in
supplementary directory) [105, 177, 192, 193, 194, 195, 196]. With one exception,
stoichiometry ranged from 1:1 to 3:1 (metal:monomer). These binding affinities
and stoichiometries are similar to previously measured transition metal binding
affinities for S100 proteins [171, 192, 195, 197]. Buffer-specific enthalpies ranged
from -5.4 to 6.1 kcal/mol; the majority of the enthalpies were negative. All of the
proteins tested bound to both Zn2+ and Cu2+, with the exception of A1 which did
not bind Cu2+ under our experimental conditions. The Zn2+ binding isotherm for
A6 and the Cu2+ binding isotherms for A2 and A4 were not well fit by standard
binding models (as is often observed in ITC studies of metal binding [198]);
however, the shapes of these curves still provided insight into their stoichiometry. The
A6/Zn2+ and A4/Cu2+ curves exhibited two phases, consistent with two binding
sites. The A2/Cu2+ curve was quite broad, consistent with >2 metals binding
per monomer. Representative binding isotherms for Zn2+ and Cu2+ to a variety
of S100 proteins—including the three problematic curves—are shown in Fig 26 in
supplement. All measured thermodynamic parameters are reported in S3 Table in
supplementary directory.
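For reference, the "standard binding model" most often fit to ITC data treats the protein as carrying n independent, identical sites. The following is a minimal sketch of that model, assuming a single class of sites. It is not necessarily the model or software used in this work. It also ignores heats of dilution and the volume displacement that real fitting routines correct for. All concentrations and the Kd, ΔH, and cell-volume values are hypothetical.

```python
import math


def bound_concentration(sites, ligand, kd):
    """[bound ligand] for independent, identical sites.

    Smaller root of B**2 - (S + L + Kd) * B + S * L = 0, all in mol/L.
    """
    b = sites + ligand + kd
    return (b - math.sqrt(b * b - 4.0 * sites * ligand)) / 2.0


def cumulative_heat(protein, ligand, n, kd, dh, cell_volume):
    """Total heat (kcal if dh is kcal/mol and cell_volume is in L) after adding `ligand`.

    protein, ligand, kd in mol/L; n is the apparent number of sites per monomer.
    Ignores heats of dilution and cell-volume displacement.
    """
    return bound_concentration(n * protein, ligand, kd) * dh * cell_volume


# Hypothetical titration: 50 uM protein, Kd = 1 uM, n = 1, dH = -5 kcal/mol, 0.2 mL cell.
totals = [5e-6 * i for i in range(21)]
heats = [cumulative_heat(50e-6, x, 1.0, 1e-6, -5.0, 2e-4) for x in totals]
per_injection = [after - before for before, after in zip(heats, heats[1:])]
print([f"{q:.2e}" for q in per_injection])
```

In this model the fitted n scales the concentration of competent sites. An apparent stoichiometry well below 1, as reported for the C79S mutant, can therefore be read as only a fraction of the protein being able to bind.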
We next asked if the structural response to these metals, like the binding
constant, was consistent across the tree. We measured Zn2+-induced changes in
FIGURE 7 The human S100 paralogs are shown on the left, organized as on the
top of Fig 6. Asterisks indicate S100 proteins investigated in the current study; red
color indicates a protein for which transition metal binding has been noted in the
literature previously. Biochemical properties of the human paralogs are shown as
columns. Circles denote stoichiometry of binding for Cu2+ (orange), Zn2+ (gray),
and Ca2+ (blue). X indicates that the protein does not bind the metal; empty space
is unmeasured. Arrows indicate the change in far-UV CD signal with the indicated
metal: no change (black), increase (blue), and decrease (red). The transition metal
binding site is indicated as canonical (B-like) or alternate (some other site).
secondary structure by comparing the far-UV circular dichroism (CD) spectra
of these proteins with EDTA versus saturating Zn2+ (Fig 27 in supplement). We
found the response was variable across the family (Fig 7) [177, 192, 195, 199, 200,
201, 202, 203, 204, 205, 206]. For some proteins, Zn2+ induced a decrease in CD
signal (P, A2, and A4); in others, it had no effect (A1, A11, A5, and A6). We also
observed Zn2+-induced precipitation of A14, which was rapidly reversed by the
addition of excess EDTA. We further asked whether the structural
response to Zn2+ exhibited by these proteins correlated with the response to their
canonical agonist Ca2+. We found that they were largely uncorrelated (Fig 7 and
Fig 27 in supplement). For example, P has decreased CD signal with both Zn2+
and Ca2+, while A2 shows decreased signal with Zn2+ and increased signal with
Ca2+.
When placed onto the phylogenetic tree, a few patterns in these responses
emerge (Fig 7). Phylogenetically close members of the family appear to display
similar structural responses to Zn2+ binding. For example, the closely related
A2 and A4 proteins show qualitatively similar decreases in CD signal in the
presence of Zn2+ relative to the apo form. Likewise, the far-UV CD signal of direct
sister proteins A5 and A6 is insensitive to Zn2+. That said, such patterns are not
universal. For example, B and P are direct sisters but have opposite structural
responses to Zn2+. Further, family members exhibit all possible combinations of
increased and decreased CD signal with the addition of Ca2+ and Zn2+, revealing
the variability of this trait over evolutionary time.
Early-diverging tunicate S100s bind transition metals
Given that all human paralogs we characterized were capable of binding
transition metals, we predicted that this was a conserved, early feature of the
protein family. To test this prediction, we turned to two tunicate homologs,
which represent some of the earliest-diverging S100 proteins. We selected two
Oikopleura dioica proteins—tunA (tunicate A, CBY12809.1) and tunB (tunicate
B, CBY30360.1)—for characterization. Although the orthology of these proteins is
unclear, the proteins sample the breadth of tunicate S100 diversity, sharing only
26.2% sequence identity. We expressed and purified these proteins and then characterized
their metal binding features.
Because these proteins have not been characterized previously, we first
performed a baseline characterization to verify that they behave like other S100
proteins. We first measured Ca2+ binding. Like many other S100 proteins, both
tunA and tunB bound Ca2+ with nanomolar to micromolar dissociation constants
and 2:1 (per monomer) stoichiometry (Fig 8A and Fig 28 in supplement). Further,
both proteins exhibited changes in secondary and/or tertiary structure—as
measured by far-UV circular dichroism (CD) and intrinsic fluorescence—with
the addition of saturating amounts of Ca2+ (Fig 8B and 8C and Fig 28 in
supplement). All of the observed changes were strictly metal dependent and
reversible upon the addition of EDTA. Metal-dependent changes in conformation,
as reflected in these changes in spectroscopic signals, are a hallmark of S100
proteins [177, 207, 208, 209].
We then assessed the ability of these proteins to form homodimers—a
key feature of most S100 proteins—using native electrospray-ionization mass
spectrometry (nanoESI) [210]. For tunB, we detected homodimers (Fig 8D). The
FIGURE 8 Early-branching tunicate S100 binds transition metals at a non-
canonical site. Colors indicate the metal present during experiment: Zn2+ is
gray, Ca2+ is blue. A) Ca2+ binding to tunB by ITC. Top panel shows power
traces for injections; bottom curve shows integrated heats and model fit to
extract thermodynamic parameters. B) Far-UV circular dichroism spectra of
the apo protein (black), Ca2+ bound protein (blue), or Zn2+ bound protein
(gray). C) Intrinsic fluorescence spectra, with samples colored as in panel B. D)
Mass spectrum of tunB. Notes above each peak indicate molecular weight and
corresponding oligomeric state. E) Zn2+ binding to tunB by ITC, with subpanels as
in A. F) Homology model of tunB overlaid on the crystal structure of human S100B
(PDB: 3CZT). Ligating residues are shown as sticks, with C atoms shown as
spheres. A and B chains of the dimer are shown in orange and purple, respectively.
The Zn2+ ion is shown as a gray sphere. Top panel shows the overlay, with boxes
highlighting the regions enlarged at right. Bottom left panel shows the S100B
structure with Zn2+ chelation. Bottom right panel shows the tunB homology model,
highlighting the residues that would have to chelate Zn2+.
narrow distribution of relatively low charge states observed in the nanoESI mass
spectra for both the monomer and dimer ions indicates that the proteins are not
denatured under these conditions and undergo little unfolding during the ionization
process. The broad mass spectral peaks observed are the result of adduction of
residual sodium from solution that has survived buffer exchange. To see if the
dimer peaks were the result of non-specific aggregation during the electrospray
process, we measured dimerization at protein concentrations at which non-specific
dimerization is not expected (< 1 µM, see methods). We found homodimers, even
at 10 nM protein, consistent with a specific tunB dimer (Fig 29 in supplement).
We also observed a small amount of homotetramer; however, the tetramer was
not robust to dilution and is likely an artifact of the electrospray process (Fig
29 in supplement). For tunA, we detected homodimers; however, these were not
robust to dilution, suggesting that dimerization is relatively weak for this protein
(Fig 28 in supplement). We corroborated these observations for tunA and tunB
using a sedimentation velocity experiment (Fig 30 in supplement). Under these
conditions, we found that tunB was primarily a dimer. In contrast, tunA exhibited
both monomer and dimer species, consistent with this protein forming a weaker
dimer. Further work is required to determine the precise distribution of oligomeric
species in solution for these proteins; however, these results are consistent with
both proteins having the ability to form homodimers, like other S100 proteins [211].
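The dilution experiments above rest on a simple monomer-dimer equilibrium (2M ⇌ D, Kd = [M]²/[D]). For a specific dimer, the fraction of protein in the dimer falls predictably once the total concentration drops below Kd. A tight dimer, by contrast, stays mostly dimeric even at 10 nM. The sketch below computes that fraction. Both dimerization constants are hypothetical, since none were measured in this work, and nonspecific clustering during electrospray is not modeled.

```python
import math


def fraction_in_dimer(total_monomer, kd):
    """Fraction of monomer units found in dimers for 2M <=> D, Kd = [M]**2 / [D].

    total_monomer and kd share units (e.g. mol/L). The free monomer concentration
    solves [M] + 2[M]**2 / Kd = total, written in a cancellation-free form.
    """
    free = 2.0 * total_monomer / (1.0 + math.sqrt(1.0 + 8.0 * total_monomer / kd))
    return 1.0 - free / total_monomer


# Hypothetical dimerization constants: a tight (1 nM) and a weak (10 uM) dimer.
for kd in (1e-9, 1e-5):
    fractions = [round(fraction_in_dimer(c, kd), 3) for c in (1e-5, 1e-6, 1e-8)]
    print(f"Kd = {kd:g} M:", fractions)
```

With the tight constant, most protein remains dimeric at 10 nM. With the weak one, the dimer essentially disappears on dilution, which matches the behavior described for tunA above.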
We next turned our attention to Zn2+ binding. By ITC, both tunA and
tunB bound to Zn2+ with nM to µM affinity and stoichiometries of 2:1 (Fig 8E
and Fig 28 in supplement). We attempted to verify these stoichiometries by ESI-
MS; however, we were unable to disentangle specific from non-specific metal
adduction in these samples. We then measured the changes in secondary and
FIGURE 9 Human S100A5 does not bind transition metals at the same site as
B and the calgranulins. A) Mutated residues mapped onto the NMR structure
of Ca2+–bound human A5 (PDB: 2KAY). Dimer chains are colored purple and
orange. H17, C43, C79 and Ca2+ ions are shown as spheres. The location of H17
corresponds to the transition metal site in calgranulins and B (Fig 4B); C43 and
C79 are in different regions of S100A5. B) Binding free energies measured for Cu2+
(copper) and Zn2+ (gray) to human A5 and its mutants. Zn2+ binding constants
could not be extracted for the C43S and C43S/C79S proteins (*). C) Integrated
heats for ITC titration of Zn2+ onto A5 (black), A5/H17A (blue) and A5/C43S
(red). D) Integrated heats for ITC titration of Cu2+ onto A5 in the absence
(orange) or presence (green) of saturating (500 µM) Zn2+.
tertiary structure measured by far-UV CD and intrinsic tyrosine fluorescence.
Although both proteins bound Zn2+ tightly, only tunB displayed a pronounced
structural response, similar to that induced by Ca2+ binding (Fig 8B and 8C).
The secondary structure of tunA was insensitive to Zn2+ binding although the
protein displayed a moderate increase in intrinsic tyrosine fluorescence (Fig 28 in
supplement).
Transition metal binding occurs at independently evolved binding sites
The broad distribution of transition metal binding across human paralogs,
along with the observed transition metal binding in the early-branching tunicate
proteins, suggests that transition metal binding is an essentially universal property
of this family. We next sought to understand to what extent transition metal
binding across the family reflects a common binding site, or rather convergent
acquisition of metal binding on multiple lineages. Transition metal binding
to S100 proteins has been extensively characterized in B and the “calgranulin”
clade (A7, A8, A9, A12, A15), where it occurs at the same site, using similar
ligating residues (Fig 4B). B is an ancient protein, arising at least as early as
the cartilaginous fishes (Fig 6). In contrast, the calgranulins arose ∼80 million
years later in the ancestor of amniotes (Fig 6). If the common site reflects shared
ancestry, we would expect to observe the same site across a wide variety of
descendants—possibly explaining the ubiquity of transition-metal binding across
the tree.
We first investigated the clade containing A2, A3, A4, A5, and A6. All
members of this clade possess a conserved histidine that, in B and the calgranulins,
coordinates transition metals (Fig 9A). We chose to investigate human A5,
as it binds to both Zn2+ and Cu2+ with 1:1 stoichiometry, and thus simplifies
identification of the binding site. We mutated His17 to Ala in human A5 and
measured metal binding of the mutant. Surprisingly, the H17A mutation had
only a small effect on Zn2+ binding (1.3 ± 0.3 µM for wild type versus 3.0 ± 0.1 µM
for H17A), suggesting that H17 is not directly involved in Zn2+ binding in human A5. Additionally, this
mutation did not compromise Cu2+ binding (Fig 9B). Previous reports suggested
that Cys residues in the loop between helices 2 and 3, as well as those near the
N and C-termini, could play a role in binding divalent transition metals in this
clade [170, 177, 193]. We therefore mutated these residues to serine in A5 and
measured binding of Zn2+ and Cu2+ to the mutants. Mutating the C-terminal Cys
(C79S) had no effect on Cu2+ binding, but drastically altered the Zn2+
binding curve (Fig 9C). The apparent stoichiometry of binding fell to
∼0.1, which is consistent with only a small fraction of the protein
being competent to bind Zn2+. Additionally, the enthalpy of binding is mostly
ablated. These results clearly indicate that C79 is involved in Zn2+, but not Cu2+
binding. We attempted to ablate Cu2+ binding by also mutating the loop Cys
residue (C43S), but Cu2+ binding was unaffected in this double Cys mutant
(C43S/C79S; Fig 9B). These results show that Zn2+ and Cu2+ not only
bind outside the B/calgranulin site, but bind at different sites on the same protein.
To confirm that these metals bind at different sites, we also measured binding of
Cu2+ to Zn2+-saturated human A5 and found no evidence of competition between
the two metals (Fig 9D). Finally, because mutating H17, C43, and C79 did not
disrupt Cu2+ binding, we hypothesized that the metal might bind at one of the
Ca2+ binding motifs. We therefore repeated the Cu2+ binding curve in the presence
of saturating (2 mM) Ca2+. We observed extensive aggregation, however, which
made interpretation of the ITC binding isotherm impossible. This suggests that
previously noted antagonism between Ca2+ and Cu2+ [195] may be an artifact of
aggregation rather than true antagonism.
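Converting the Zn2+ binding constants to free energies shows why the H17A effect is described as small. This sketch rests on three assumptions. It takes 25 °C (the ITC temperature used here) with a 1 M standard state, treats the reported 1.3 µM and 3.0 µM values as dissociation constants, and ignores their uncertainties.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M.

    Uses dG = RT ln(Kd) with a 1 M standard state; more negative means tighter.
    """
    return R_KCAL * temp_k * math.log(kd_molar)


wild_type = binding_free_energy(1.3e-6)  # human A5 + Zn2+
h17a = binding_free_energy(3.0e-6)       # A5/H17A + Zn2+
print(round(wild_type, 2), round(h17a, 2), round(h17a - wild_type, 2))
# -8.03 -7.53 0.5
```

On this scale the mutation costs roughly 0.5 kcal/mol out of about 8 kcal/mol of binding free energy.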
We next turned our attention to the tunicate protein tunB. This protein
behaves like a conventional S100 protein, forming a homodimer, binding to Ca2+
and changing its structure in response to metals (Fig 5A–5E). Further, it binds to
transition metals with a 2:1 stoichiometry. To determine if it could bind metals at
the canonical transition metal binding site, we constructed a homology model for
the protein and then inspected the residues that would form the S100B/calgranulin
binding pocket. These are Asp, Gln, Asn, and Lys (Fig 8F). The lack of a His or
Cys residue suggests this site is not capable of binding transition metals. Thus,
transition metal binding in this early-branching ortholog almost certainly occurs at
a different site.
Discussion
Our work provides a high-level view of the evolution of the S100 protein
family and the ability of its members to bind to divalent transition metals. It
also provides the best-resolved phylogeny yet determined for this family. All
characterized human paralogs, as well as two early-branching tunicate S100
homologs, bind to transition metals with physiologically relevant ∼µM binding
constants. However, different S100 proteins bind these metals at different
sites. Thus, the apparently “conserved” feature of transition metal
binding actually reflects independent acquisition of metal binding on multiple
lineages. Further, the structural changes induced by transition metal binding are
variable, suggesting quite different mechanisms of binding and possible functional
consequences for different family members.
Transition metal binding occurs at independently evolved binding sites
Our work, combined with previous publications, reveals at least four
sites—and therefore four evolutionary origins—of transition metal binding in
the S100 family: the B/calgranulin site (Fig 4B), A5’s Cys-79 site (Fig 9), an N-
terminal Cys in A2 [177], and a unique glutamate-rich site in human A13 [176].
The plasticity of this feature likely reflects the relative ease, biochemically,
of creating transition metal binding sites [212, 213, 214]. A few amino acid
substitutions can create a new site, while a few other substitutions ablate an
existing site. This is similar to the evolutionary behavior of phosphorylation sites,
which can shift rapidly over evolutionary time [215]. Additionally, some S100
proteins may bind transition metals at one of their Ca2+ binding motifs.
For example, Gribenko et al. proposed that human S100P may bind Zn2+ in
one of the Ca2+ binding motifs [216]. EF-hands often discriminate Ca2+ from Zn2+
and Cu2+, however, so this likely does not explain all of the observed transition
metal binding [176, 217, 218, 219].
Another feature of Zn2+ and Cu2+ binding in this family is the variability of
structural responses to the same metal. Even closely related S100 proteins undergo
different conformational changes when bound to a transition metal (Fig 7). This
likely allows different orthologs to play different functional roles in response to
transition metal binding. This can be seen for proteins that have been studied
in detail. For example, human A13, which binds Cu2+ at a unique site, has been
proposed to be involved in chaperoning Cu2+ as part of FGF release [172]. A9
provides another example of diverse responses to transition metals. When A9 is
alone, Zn2+ binding is strictly necessary for one function (TLR4-activation) [220],
but strongly inhibits another function (arachidonic acid binding) [221]. This site is
modified in vivo through the formation of a heterodimer with A8, which changes
the ligating residues for one half of the site [105, 222]. This creates an extremely
high-affinity site for Mn2+ and Zn2+ that inhibits bacterial growth by starving
bacteria of these metals [105].
Much of the transition metal binding we have observed has no known
role, but the observed binding constants (∼µM) are consistent with biological
concentrations of divalent transition metals. In particular, many S100 proteins are
found in the extracellular space [90], where Zn2+ concentrations can be high enough
to occupy these sites [223, 224]. We expect further roles of transition metal binding
to be identified in this family as it is further characterized [170, 171].
Expansion of the family
In addition to providing insight into the evolution of transition metal
binding, our phylogenetic analysis provides insight into the overall pattern of
expansion of the S100 protein family. Previous phylogenies used highly incomplete
taxonomic sampling and, with the exception of [95], distance-based phylogenetics
[91, 100, 179]. We used many more sequences, from many more taxa, and applied
a combined model-based/synteny analysis to better disentangle the history of
the family. Our work provides support for evolutionary relationships between
A13–A16, A2–A6, the calgranulins, the S100-fused proteins, and A10/A11, despite
relatively weak support for these clades from model-based phylogenetics alone.
These relationships also support the previously proposed model of local
gene duplication [91, 100, 179].
Our work provides evidence for earlier origins of many S100 family members
than previously reported. For example, we found that the S100A2-A6 clade likely
arose in the ancestor of all tetrapods, and that it had the complete mammalian
complement by the ancestor of amniotes. In contrast, Zimmer et al. proposed this
clade arose in the ancestor of mammals [91]. Some orthologs (A1, B, P, and Z)
have likely been present since the last common ancestor of vertebrates. Further,
we expect that many S100 proteins actually arose even earlier than our analysis
suggests. Despite having broader sampling than previous studies, our sampling
of tunicates, jawless fishes, and cartilaginous fishes was still relatively sparse.
Further, we relied heavily on transcriptomes, which likely underestimate the S100
complements for these organisms. As more genomic and transcriptomic datasets for
these species become available, we expect to observe even earlier origins of many of
the mammalian S100 orthologs.
Another difference between our tree and the published tree by Kraemer
et al. [95] is that we do not see radical, parallel expansion of the S100s in bony
fishes. Rather, most S100 proteins from the bony fishes are orthologous to
mammalian S100s. For example, we identified 15 S100 proteins in Takifugu
rubripes (pufferfish). All but two of them could be assigned as orthologs to
human proteins (Fig 6). That said, many of these do represent lineage-specific
duplications—likely via the whole genome duplications that have occurred in
teleost fishes—that are co-orthologous to human proteins. The difference between
our results and the previous phylogeny likely arises from our much broader
sequence sampling, as the Kraemer et al. dataset was strongly biased towards
sequences taken from teleosts [95].
Despite extensive taxonomic sampling, the phylogenetic tree we report
is not fully resolved: the deepest branches remain obscure. The large amount of
sequence divergence between many S100 family members, their relatively short
sequences, and the large number of orthologs make full resolution of this family
challenging. Resolution can likely be increased for individual subfamilies within
the tree through even denser sampling. For example, adding further amniotes may
help resolve the relationships
between the amniote-specific clades identified in our analysis. We also believe
increasing the sampling of amphibians would be particularly powerful, as we relied
heavily on amphibian transcriptomes and likely missed S100 proteins. Better
characterization of S100 proteins from amphibians may help disentangle the origins
and relationships of some of the tetrapod-specific S100s (such as the calgranulins)
which are, as yet, difficult to resolve. Further, signal for these relatively recent
proteins could be boosted by using codon rather than amino acid substitution
models.
Conclusion
Our work reveals that transition metal binding is both ubiquitous and
evolutionarily labile within the S100 protein family. Many have noted that much
of the diversity of S100 function is determined by altered expression of family
members [100, 207, 225, 226, 227, 228, 229]; however, our work highlights that
these regulatory changes have also been accompanied by changes in sequence and
biochemistry. In particular, the ease of creating and destroying transition metal
binding sites has allowed rapid changes in this feature of S100 proteins. As a result,
new metal binding behavior can be exploited to achieve functional diversity in
the family [170, 171], even while Ca2+ binding and its induced structural changes
remain relatively conserved (Fig 7).
This biochemical diversification occurred rapidly during the expansion of the
S100 proteins, which are a relatively young protein family. The details of how
this diversification occurred are likely to encompass a rich evolutionary story.
As new S100s arose via gene duplication, were they required to maintain metal
binding while continuing to evolve? Or, have there been multiple cycles of loss and
subsequent regain over the course of S100 evolution? What was the exact nature of
metal-binding in the last common ancestor of all S100 proteins? Our observations
provide groundwork to begin to ask these questions.
Materials and Methods
Sequence Set
We generated a database of 564 S100 protein sequences, sampled from 52
chordate species, with an emphasis on even taxonomic sampling (S1 Spreadsheet).
Previous publications and preliminary database searches revealed S100 proteins
were restricted to the chordates [91, 100, 95], so we selected specific chordate
species and characterized their S100 protein complements through extensive
BLAST searches [230]. We used human proteins as seed sequences (including
sentan and the S100-fused proteins, S1 Table in supplementary directory). No
published genome or transcriptome data were available for some species, so we
generated de novo transcriptomes from RNA-seq data in the Sequence Read Archive
[231] using Trinity with default parameters [232]. The sources for our analysis are
shown in S2 Table in supplementary directory.
We removed duplicate sequences (>95% identity) from within each species
using cd-hit [233], and removed sequences shorter than 45 amino acids. We then
used reverse BLAST searches of all remaining sequences against the human proteome
to verify that they encoded S100 proteins. We aligned the sequences using msaprobs [234]
followed by manual refinement in aliview [235]. Refinement was minimal and
consisted of truncating variable N-terminal and C-terminal extensions, as well as
several ambiguous indels. (We truncated the fused S100 protein sequences to 150
amino acids covering the S100 domain prior to alignment). The final alignment was
132 columns and had robustly aligned key columns (Fig 24 in supplement and S2
Fig in supplementary directory, S1 Alignment in supplementary directory).
Phylogenetic Trees
We generated the ML tree using phyml [236] with SPR moves starting from
the neighbor-joining tree. Searches from 10 random starting trees did not yield a
tree with higher likelihood. We found LG+Γ8 was the highest likelihood model [237].
We calculated aLRT-SH supports for each node [238]. In pilot analyses, the tunicate
sequences were placed unpredictably on the tree (for example, grouping with
mammals or in other implausible positions). We therefore excluded
them from the final ML analysis (S1 Tree in supplementary directory).
We generated a Bayesian phylogenetic tree using Exabayes [239]. We ran two
replicate MCMC runs starting from different random trees, each consisting of one
main and three heated chains. We stopped the runs after 10 million generations,
giving a final average standard deviation of split frequencies of 3.97% and a log-likelihood ESS of 3,315. We
sampled substitution models in addition to trees, giving a final 99.8% posterior
probability for the JTT model [240]. We used uniform priors for all parameters. We
discarded the first 15% of the trees as burn-in and generated a consensus tree by
majority-rule, collapsing all nodes with posterior probabilities <50% (S2 Tree).
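The burn-in and consensus steps can be illustrated with a minimal Python sketch. It assumes each sampled tree has already been reduced to a set of canonically encoded bipartitions (splits); the function name `majority_rule_splits` and the toy splits are hypothetical and are not part of ExaBayes. The sketch discards the first 15% of samples, computes each split's posterior frequency over the remaining trees, and keeps only splits present in more than half of them.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

# A split is represented by one side of a bipartition (a set of taxon labels).
# Assumption: splits are already canonicalized (e.g. always the side that
# excludes a fixed reference taxon), so identical splits compare equal.
Split = FrozenSet[str]


def majority_rule_splits(trees: List[List[Split]],
                         burnin_frac: float = 0.15,
                         threshold: float = 0.5) -> Dict[Split, float]:
    """Return splits whose posterior frequency exceeds `threshold` after burn-in."""
    n_burn = int(len(trees) * burnin_frac)  # discard the first 15% of samples
    kept = trees[n_burn:]
    counts: Counter = Counter()
    for tree in kept:
        counts.update(set(tree))  # count each split at most once per tree
    n = len(kept)
    return {split: c / n for split, c in counts.items() if c / n > threshold}


# Toy example with hypothetical splits over taxa a-e.
A = frozenset({"a", "b"})
B = frozenset({"a", "b", "c"})
C = frozenset({"c", "d"})
samples = [[C]] * 3 + [[A, B]] * 10 + [[A]] * 5 + [[C]] * 2
for split, freq in majority_rule_splits(samples).items():
    print(sorted(split), round(freq, 3))
```

Keeping only splits with frequency strictly above 0.5 guarantees that any two retained splits co-occur in at least one sampled tree, so they are mutually compatible and always define a single (possibly multifurcating) consensus tree.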
Molecular cloning and Protein Expression/Purification
S100 proteins were expressed from synthesized genes in a pET28/30
vector that had an N-terminal, TEV-cleavable His tag (Millipore). Proteins
were expressed in Rosetta (DE3) pLysS E. coli cells (Millipore). A saturated
overnight culture was used to inoculate 1.5 L cultures at a 1:150 ratio. Bacteria were
grown with shaking at 37 °C to log phase (OD600 ∼ 0.8–1.0), followed by induction
of protein expression with 1 mM IPTG for ∼16 hr at 16 °C. Cells were harvested
by centrifugation. Pellets were frozen at −20 °C, where they were stored for up
to 2 months. Cells were lysed by sonication in 25 mM Tris, 100 mM NaCl, 25 mM
imidazole, pH 7.4.
Primary purification was done with a 5 mL HiTrap Ni-affinity column (GE
Health Science) on an ÄKTA PrimePlus FPLC (GE Health Science), using a 25 mL
gradient between 25 and 500 mM imidazole. Pooled fractions were then incubated
overnight at 4 °C in the presence of TEV protease (∼1:5). This cleaves the His
tag from the protein, leaving the amino acids Ser-Asn in front of the wild-type
starting methionine. Proteins were further purified by hydrophobic interaction
chromatography (HIC) using a 5 mL HiTrap phenyl-sepharose column (GE
Health Science). This step takes advantage of the Ca2+-dependent exposure of a
hydrophobic binding surface on the S100 proteins. Proteins were equilibrated with
2 mM CaCl2 and loaded onto the HIC column, followed by a 30 mL gradient elution
in 25 mM Tris, 100 mM NaCl, 5 mM EDTA, pH 7.4. Proteins were then dialyzed
into 4 L of 25 mM Tris, 100 mM NaCl, pH 7.4 buffer overnight at 4 °C. To remove
the small amount of uncleaved His-tagged protein present, proteins were then
passed over another 5 mL HiTrap Ni-affinity column and the flow through collected.
Finally, if any protein contaminants remained by SDS-PAGE, we performed a final
anion-exchange chromatography step using a 5 mL HiTrap DEAE column (GE) in 25 mM
Tris buffer, pH 7.0–8.5 (depending on protein), with a 50 mL gradient to 500 mM NaCl.
Purified proteins were dialyzed overnight against 2 L of 25 mM TES (or Tris),
100 mM NaCl, pH 7.4, containing 2 g Chelex-100 resin (Bio-Rad) to remove divalent
metals. Purity of final protein products was >95% by SDS-PAGE and MALDI-TOF
mass spectrometry. Final protein products were flash frozen dropwise
in liquid nitrogen and stored at −80 °C. Typical protein yields were ∼20 mg/L of
culture.
Protein characterization
Prior to all biophysical measurements, we thawed and exchanged all proteins
into an appropriate buffer by two serial NAP-25 desalting columns (GE Health
Science). We then used A280 to determine protein concentration using an empirical
extinction coefficient for each protein. To determine extinction coefficients, we first
used ProtParam [241, 242] to calculate the extinction coefficient for each protein
in 6 M GdmHCl ($\varepsilon_{\text{6M Gdm}}$). We then compared A280 for an
identical concentration of protein in native buffer versus in 6 M GdmHCl. We
could then estimate a native extinction coefficient using the relationship
$$\varepsilon_{\text{native}} = \varepsilon_{\text{6M Gdm}} \cdot \frac{A_{280,\text{native}}}{A_{280,\text{6M Gdm}}}$$
For some proteins no correction from the predicted extinction coefficient was
necessary. Extinction coefficients (M⁻¹ cm⁻¹) used for calculation of protein
concentration are as follows: hA5, 5540; hA6, 5434; tunA, 1490; tunB, 5699;
hA2, 3230; hA4, 3230; hA14, 7115; hA1, 8480; hA11, 4595; hP, 2980. We also
corrected for scatter in all A280 measurements [243].
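The concentration calculation above can be sketched in Python. The per-residue coefficients (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹) are the values ProtParam uses; the function names and the example absorbance readings are hypothetical, and the scatter correction is omitted for brevity.

```python
# Per-residue molar absorptivities at 280 nm (M^-1 cm^-1), as used by ProtParam.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125


def eps_predicted(n_trp: int, n_tyr: int, n_cystine: int = 0) -> float:
    """Sequence-based extinction coefficient (M^-1 cm^-1)."""
    return n_trp * EPS_TRP + n_tyr * EPS_TYR + n_cystine * EPS_CYSTINE


def eps_native(eps_6m_gdm: float, a280_native: float, a280_6m_gdm: float) -> float:
    """Scale the denatured-state coefficient by the native/denatured A280 ratio,
    measured for identical protein concentrations."""
    return eps_6m_gdm * a280_native / a280_6m_gdm


def concentration_molar(a280: float, eps: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (eps * l)."""
    return a280 / (eps * path_cm)


# Hypothetical protein with one Trp and one Tyr, and made-up absorbance readings.
eps_gdm = eps_predicted(n_trp=1, n_tyr=1)    # 6990 M^-1 cm^-1
eps_nat = eps_native(eps_gdm, 0.350, 0.340)  # native sample absorbs slightly more
print(f"{concentration_molar(0.350, eps_nat) * 1e6:.1f} uM")
```

As a consistency check, the native reading divided by the corrected coefficient returns the same concentration as the denatured reading divided by the predicted coefficient, which is exactly what the ratio correction is meant to enforce.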
We performed ITC experiments in 25 mM buffer, 100 mM NaCl, pH 7.4,
that had been Chelex-treated and filtered at 0.22 µm. We selected Tris or TES as
the buffering species on a case-by-case basis to ensure observable heats of binding.
We equilibrated and simultaneously degassed samples, either by applying vacuum to
the solution or by centrifugation at 18,000 × g at the experimental temperature
for 60 minutes. We dissolved metals (CaCl2, ZnCl2, or CuCl2) directly into the
experimental buffer immediately prior to each experiment. We performed all
experiments at 25 °C using a MicroCal ITC-200 or a MicroCal VP-ITC (GE Health
Sciences). Data were collected using low gain or no gain, with a syringe stir speed
of 750 rpm. Shot spacing ranged from 120 to 2400 s depending on gain settings and
relaxation time of the binding process. These settings were optimized on a
per-protein basis. Data were fit to one- or two-site models using Origin 7.
For binding curves with obvious 1:1 stoichiometry, we used the one-site model in
Origin. For data with apparent 2:1 stoichiometry, evident from the location of
inflection points in the data, we attempted a fit with the included two-site model.
If the two-site model could not be fit, we used a single-site binding model with a
floating stoichiometry to extract an apparent binding constant across sites.
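For reference, a single class of n identical, independent sites gives a closed-form (quadratic) expression for the concentration of bound metal, from which the cumulative heat in the cell follows as Q = V·ΔH·[ML]. The Python sketch below implements that textbook model under simplifying assumptions (constant protein concentration, no correction for volume displaced by each injection). It is not Origin's implementation, and the function names and titration parameters are hypothetical. Letting n float, as described above, is what allows a single-site fit to absorb an apparent 2:1 stoichiometry.

```python
import math
from typing import List


def bound_ligand(p_tot: float, l_tot: float, kd: float, n: float = 1.0) -> float:
    """[ML] (M) for n identical, independent sites per protein (exact quadratic)."""
    s = n * p_tot  # total concentration of binding sites
    b = s + l_tot + kd
    return 0.5 * (b - math.sqrt(b * b - 4.0 * s * l_tot))


def injection_heats(p_tot: float, l_tots: List[float], kd: float,
                    dh: float, v_cell: float, n: float = 1.0) -> List[float]:
    """Heat (J) per injection from the change in cumulative Q = V * dH * [ML].

    Simplification: protein concentration and cell volume are held constant,
    i.e. there is no correction for volume displaced by each injection.
    """
    heats, q_prev = [], 0.0
    for l_tot in l_tots:
        q = v_cell * dh * bound_ligand(p_tot, l_tot, kd, n)
        heats.append(q - q_prev)
        q_prev = q
    return heats


# Hypothetical titration: 50 uM protein, Kd = 1 uM, dH = -20 kJ/mol, 200 uL cell.
metal = [i * 5e-6 for i in range(1, 21)]  # total metal (M) after each injection
for i, dq in enumerate(injection_heats(50e-6, metal, 1e-6, -20e3, 200e-6), 1):
    print(i, f"{dq * 1e6:.2f} uJ")
```

A real fit would minimize the squared difference between these predicted heats and the integrated injection peaks over Kd, ΔH, and n.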
We collected far-UV circular dichroism data from 200 to 250 nm using
a J-815 CD spectrometer (Jasco) with a 1 mm quartz spectrophotometer cell
(Starna Cells, Inc. Catalog No. 1-Q-1). We prepared 20–50 µM samples in a
TES buffer identical to that used for ITC. Prior to experiments, we centrifuged
samples at 18,000 × g in a temperature-controlled centrifuge at the experimental
temperature. We collected 5 scans for each condition, and then averaged the
spectra and subtracted a blank buffer spectrum using the Jasco spectra analysis
software suite. We converted raw ellipticity into mean residue molar ellipticity using
the concentration and number of residues in each protein. We collected intrinsic
tyrosine and/or tryptophan fluorescence using a J-815 CD spectrometer (Jasco)
with an attached model FDT-455 fluorescence detector (Jasco) using a 1 cm quartz
cuvette (Starna Cells, Inc.). We prepared 5–20 µM samples exactly as we did for
our CD experiments. We collected 3–5 replicate scans for each condition, and
then averaged the spectra and subtracted a blank buffer spectrum (averaged from
10–15 buffer blank spectra) using the Jasco spectra analysis software suite. For all
spectroscopic measurements, we verified the reversibility of metal-induced changes
to the spectra by measuring the apo spectrum, adding the appropriate metal and
re-measuring the spectrum, and then adding excess EDTA and re-measuring the
spectrum.
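The conversion from raw ellipticity to mean residue ellipticity fits in a one-line function. This sketch assumes the common convention [θ] = θ_obs / (10 · l · c · n), with θ_obs in millidegrees, path length l in cm, molar concentration c, and n residues, which gives units of deg cm² dmol⁻¹. The function name and the example reading (a hypothetical 92-residue protein at 30 µM) are illustrative only.

```python
def mean_residue_ellipticity(theta_mdeg: float, conc_molar: float,
                             path_cm: float, n_residues: int) -> float:
    """Mean residue molar ellipticity in deg cm^2 dmol^-1.

    theta_mdeg: observed (blank-subtracted) ellipticity in millidegrees
    conc_molar: protein concentration in mol/L
    path_cm:    cell path length in cm (a 1 mm cell is 0.1 cm)
    n_residues: number of residues (some conventions use peptide bonds instead)
    """
    return theta_mdeg / (10.0 * path_cm * conc_molar * n_residues)


# Hypothetical reading: -20 mdeg for 30 uM of a 92-residue protein in a 1 mm cell.
print(round(mean_residue_ellipticity(-20.0, 30e-6, 0.1, 92)))  # prints -7246
```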
Native electrospray ionization time-of-flight mass spectrometry (nano ESI-MS)
To prepare samples for mass spectrometry, small (∼200 µL) aliquots of each
protein were dialyzed for at least 24 hr against 2–4 L of either 10 or 100 mM
ammonium acetate, pH 7.4, to remove salt and exchange into a buffer better suited
to MS. Samples were then diluted to ∼10 µM in the dialysis buffer prior to the
experiment. All mass spectra were acquired
using a Waters Synapt G2-Si ion-mobility mass spectrometer equipped with a
nanoelectrospray (nanoESI) source and operated in Sensitivity mode. NanoESI
emitters were pulled from borosilicate capillaries (ID 0.78 mm) to a tip ID of
approximately 1 µm using a Sutter Flaming-Brown P-97 micropipette puller.
3–5 µL of sample were loaded into an emitter, a platinum wire was placed in
electrical contact with the solution, and a potential of +0.8–1.2 kV was applied
to the wire to initiate electrospray. The source temperature was equilibrated to
ambient temperature, trap and transfer collision voltages were set to 25 V and 5 V,
respectively, and the trap gas used was argon at a flow rate of 5 mL/min. Reported
spectra are the sum of ∼3 minutes of continuously collected data. Mass calibration
was achieved using the series of singly charged Cs(CsI)ₙ⁺ cluster peaks produced
from nanoESI of 0.1 M aqueous cesium iodide (Aldrich).
We carefully controlled for spurious dimers in our nanoESI-MS experiments.
Non-specific dimers (and high-order oligomers) can arise if, by chance, more than
one monomer ends up in an electrospray drop. These non-specific aggregates are
expected to follow a roughly Poisson distribution of oligomeric states, governed
by the bulk concentration of monomers in solution. These non-specific species
can be distinguished from specific oligomeric species by measuring the mass
spectrum over a wide range of protein concentrations. Dimers observed at 10 µM
could be the result of non-specific interactions; dimers observed at 10 nM are
almost certainly not. This can be seen by considering the distribution of non-
specific species across drops. Under our instrumental conditions, electrospray
creates drops ∼100–200 nm in diameter, meaning that 10 nM protein solution
will yield drops that contain, on average, ∼0.003–0.025 protein molecules. Taking
the upper limit of 0.025 protein molecules per drop, a Poisson distribution predicts
that only ∼0.03% of drops would contain non-specific dimers. Increasing to 100 nM
protein raises this to 2.4% of drops. At 1 µM, non-specific dimers become quite
significant (25.6% of drops), but this is accompanied by a large number of
non-specific trimers (21.4%).
Although many factors, including relative ionization efficiency and instrumental
conditions, can affect the observed abundances of ions formed from electrospray,
these effects should be largely independent of initial solution concentration under
the instrumental conditions used here.
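The drop-occupancy argument above can be reproduced with a short Python sketch, assuming spherical drops and independent (Poisson) partitioning of monomers; the function names are illustrative. The expected number of molecules per drop is λ = c · V · N_A, and the fraction of drops containing exactly k molecules is e^(−λ) λ^k / k!.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def mean_molecules_per_drop(conc_molar: float, diameter_nm: float) -> float:
    """Expected number of protein molecules (lambda) in a spherical drop."""
    radius_m = diameter_nm * 1e-9 / 2.0
    volume_l = (4.0 / 3.0) * math.pi * radius_m ** 3 * 1e3  # m^3 -> L
    return conc_molar * volume_l * AVOGADRO


def poisson(k: int, lam: float) -> float:
    """Fraction of drops containing exactly k molecules."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


for conc in (10e-9, 100e-9, 1e-6):
    lam = mean_molecules_per_drop(conc, 200.0)  # upper end of the 100-200 nm range
    print(f"{conc * 1e9:>6.0f} nM  lambda={lam:.4f}  "
          f"dimers={poisson(2, lam):.2%}  trimers={poisson(3, lam):.2%}")
```

A 100 nm drop at 10 nM gives λ ≈ 0.003 and a 200 nm drop gives λ ≈ 0.025. Rounding λ to 0.025, 0.25, and 2.5 for 10 nM, 100 nM, and 1 µM gives dimer fractions of about 0.03%, 2.4%, and 25.6% of drops, and a trimer fraction of 21.4% at 1 µM.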
We interpreted the mass spectra shown in Fig 8D and S14 using this logic.
Mass spectra of proteins at low concentrations (10–100 nM) exhibit unexpectedly
abundant monomers and dimers, consistent with a specific dimer. Mass spectra at
high concentrations (1–10 µM) exhibit dimers but not trimers, again consistent with
a specific dimer rather than non-specific, Poisson-governed aggregation in drops.
The small population of tetramer for tunB at 10 µM could either reflect a true
tetramer or a random partitioning of two dimers into an electrospray drop at this
high concentration.
Sedimentation velocity analytical ultracentrifugation
Samples were concentrated to ∼50 µM and then dialyzed overnight at 4 °C against
2 L of buffer (25 mM TES, 100 mM NaCl, 1 mM TCEP, pH 7.4) using 6000–8000
MWCO dialysis tubing. Prior to sedimentation velocity experiments, proteins were
centrifuged at >18,000 × g for 30 min in a temperature-controlled centrifuge.
AUC experiments were performed at 50,000 × g in sector-shaped cells with sapphire
windows (Beckman) on a Beckman ProteomeLab XL-1 analytical ultracentrifuge.
Because of the low extinction coefficients of the proteins, sedimentation was
monitored using interference optics rather than absorbance at 280 nm. Sedimentation
velocity data were fit numerically to the Lamm equation, and the c(s) distribution
was determined using SedFit [244, 245]. Estimated sedimentation coefficients and
molecular masses of species present in solution were calculated from the fits.
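As background on how a sedimentation coefficient relates to mass, the Svedberg equation gives M = sRT / [D(1 − v̄ρ)]. SedFit's c(s) analysis instead infers D from s through a fitted frictional ratio, so the Python sketch below only illustrates the underlying relationship. The function name and inputs (s = 2.0 S, D = 8 × 10⁻⁷ cm²/s, v̄ = 0.73 mL/g, ρ = 1.0 g/mL, 20 °C) are hypothetical.

```python
R = 8.314462618  # gas constant, J mol^-1 K^-1


def svedberg_molar_mass(s_svedberg: float, d_cm2_s: float,
                        vbar_ml_g: float, rho_g_ml: float,
                        temp_k: float = 293.15) -> float:
    """Molar mass (g/mol) from the Svedberg equation M = sRT / (D(1 - vbar*rho))."""
    s = s_svedberg * 1e-13                  # 1 S = 1e-13 s
    d = d_cm2_s * 1e-4                      # cm^2/s -> m^2/s
    buoyancy = 1.0 - vbar_ml_g * rho_g_ml   # dimensionless buoyancy factor
    return s * R * temp_k / (d * buoyancy) * 1e3  # kg/mol -> g/mol


# Hypothetical values for a small protein dimer.
print(f"{svedberg_molar_mass(2.0, 8e-7, 0.73, 1.0) / 1e3:.1f} kDa")  # prints 22.6 kDa
```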
Homology model
The homology model of tunB was constructed with Modeller 9.17 [246] using
46 Ca2+-bound crystal structures (without bound peptide targets) as a combined
template (PDB: 1e8a, 1gqm, 1j55, 1k96, 1k9k, 1mho, 1mr8, 1odb, 1qlk, 1xk4, 1xyd,
1yut, 1yuu, 1zfs, 2egd, 2h2k, 2h61, 2k7o, 2kay, 2l51, 2psr, 2q91, 2wnd, 2wor, 2wos,
2y5i, 3c1v, 3cga, 3cr2, 3cr4, 3cr5, 3czt, 3d0y, 3d10, 3gk1, 3gk2, 3gk4, 3hcm, 3icb,
3iqo, 3lk0, 3lk1, 3lle, 3m0w, 3psr, 3rlz, and 4duq). Alignment was generated using
the PAIRWISE alignment method with default parameters. The model was generated
as a dimer, with the single tunicate sequence mapped to both the A and B chains.
We used automodel with default parameters to generate 20 models and selected the
best by DOPE score. The final model had an RMSD of
0.65 Å relative to the crystal structure of S100B bound to Ca2+ and Zn2+ (PDB:
3czt).
Bridge to Chapter IV
In this chapter, the phylogenetic tree of the S100 protein family was
reconstructed. Subsequently, the binding of transition metal ions to the proteins
was mapped onto the phylogeny by using a large set of human paralogs as
representative clade members. Some data were already available in the literature
and these were incorporated into the analysis. The results indicated that binding
of transition metal ions is an almost universally conserved feature of the S100
family. However, binding stoichiometry, metal-driven conformational changes,
binding site ligands, and binding site location varied across the family. It was
thus concluded that binding of transition metals is a conserved feature of S100s
at the level of activity, but has diversified extensively at the biochemical level.
Two early-branching S100 proteins from the tunicate Oikopleura dioica were
also characterized for the first time. These proteins displayed all the hallmark
biochemical features of the more well-studied S100 proteins from higher metazoans:
homodimerization, calcium-binding, transition metal binding, and metal-ion driven
conformational changes. These results indicate that these canonical biochemical
features of the S100s are ancestral to the family. This chapter highlights the
striking evolutionary lability of an overall conserved biochemical feature, which
has broader implications for understanding the evolutionary meanderings of protein
traits. Chapter IV delves deeper into the biochemistry of metal binding in a specific
member of the S100 protein family. S100A5 is a calcium binding protein found in a
small subset of human tissues. Little is known about the biological roles of S100A5.
Previous work indicated that S100A5 displays antagonism between binding of Ca2+
and Cu2+ ions, which is one of the most commonly cited features of the protein.
The interplay between Ca2+ and Cu2+ binding by S100A5 is further characterized
in this chapter. It is shown that S100A5 can actually bind to both Ca2+ and Cu2+
simultaneously without antagonism. Furthermore, it is demonstrated that the
apparent antagonism observed in previous studies is likely due to aggregation of the
protein induced by binding of metals. This chapter further highlights the diversity
of biochemical modifications found in the S100 family and provides important
data on S100A5 that will be useful for other researchers trying to understand its
biological functions.
CHAPTER IV
HUMAN S100A5 BINDS CA2+ AND CU2+ INDEPENDENTLY
Author Contributions
Lucas Wheeler and Michael Harms conceived the study and designed the
experiments. Lucas Wheeler performed all experiments and data analysis. Michael
Harms secured funding for the work. Lucas Wheeler wrote the manuscript and
generated the figures. Both authors have read and approved the manuscript.
Abstract
S100A5 is a calcium binding protein found in a small subset of amniote
tissues. Little is known about the biological roles of S100A5, but it may be involved
in inflammation and olfactory signaling. Previous work indicated that S100A5
displays antagonism between binding of Ca2+ and Cu2+ ions—one of the most
commonly cited features of the protein. We set out to characterize the interplay
between Ca2+ and Cu2+ binding by S100A5 using isothermal titration calorimetry
(ITC), circular dichroism spectroscopy (CD), and analytical ultracentrifugation
(AUC).
We found that human S100A5 is capable of binding both Cu2+ and Ca2+
ions simultaneously. The wildtype protein was extremely aggregation-prone in the
presence of Cu2+ and Ca2+. A Cys-free version of S100A5, however, was not prone
to precipitation or oligomerization. Mutation of the cysteines does not disrupt
the binding of either Ca2+ or Cu2+ to S100A5. In the Cys-free background, we
measured Ca2+ and Cu2+ binding in the presence and absence of the other metal
using ITC. Saturating concentrations of either Ca2+ or Cu2+ do not disrupt binding
of the other metal. Ca2+ and Cu2+ binding induce structural changes in S100A5, which
are measurable using CD spectroscopy. We show via sedimentation velocity AUC
that the wildtype protein is prone to the formation of soluble oligomers, which are
not present in Cys-free samples.
S100A5 can bind Ca2+ and Cu2+ ions simultaneously and independently. This
observation is in direct contrast to previously-reported antagonism between binding
of Cu2+ and Ca2+ ions. The previous result is likely due to metal-dependent
aggregation. Little is known about the biology of S100A5, so an accurate
understanding of the biochemistry is necessary to make informed biological
hypotheses. Our observations suggest the possibility of independent biological
functions for Cu2+ and Ca2+ binding by S100A5.
Background
S100A5 is a member of the calcium-binding S100 protein family. The protein
is primarily homodimeric and is capable of binding one Ca2+ ion each at its
EF-hand and pseudo-EF-hand sites [207, 247]. S100A5 undergoes a notable
conformational change upon calcium-binding, resulting in the rotation and
extension of a helix [207]. This Ca2+-driven exposure of a hydrophobic surface
is the primary mode of signal transduction in the S100 proteins [94]. Through
interactions with metals and protein targets, S100s play a variety of biological roles
including control of cell proliferation, inflammatory signalling, and antimicrobial
activity [101, 92, 103, 89].
S100A5 is expressed primarily in the olfactory bulb and olfactory sensory
neurons (OSNs). Its expression is dramatically upregulated by odor stimulation
[248, 195, 249]. It has been proposed that S100A5 is actively involved in olfactory
signalling due to its expression profile [195]. Expression of the protein has also
been observed in a small number of other tissues [249]. It is used as a biomarker
for several types of brain cancers and inflammatory disorders and appears to be
involved in inflammation via activation of RAGE [104, 247, 250]. Genetic work on
S100A5 has been minimal, which has limited our understanding of its biological
roles.
The first biochemical study of human S100A5 identified it as a novel Ca2+,
Cu2+, and Zn2+ binding protein [195]. The authors used flow-dialysis to measure
binding of the metal ions to the protein and concluded that S100A5 is capable
of binding four Ca2+ ions, four Cu2+ ions, and two Zn2+ ions per homodimer.
One of the most striking observations of that study was the strong antagonism
between the binding of Cu2+ and Ca2+ ions to the protein. This feature is one of
the most highly cited aspects of S100A5. Because little is known about the protein,
this claim appears in descriptions across databases such as UniProt, NCBI,
WikiGenes, and GeneCards [251, 252, 253]. While most S100s are capable of binding
transition metal ions, antagonism with binding of Ca2+ is not known outside the
S100A5 lineage. Thus, this unique feature of S100A5 provoked speculation about
its possible biological implications [195, 170]. It was suggested that S100A5 might
act as a Cu2+ and Ca2+ regulated signal during olfaction or as a Cu2+ sink to
accommodate high Cu2+ concentrations in the olfactory bulb [195].
We sought to characterize this presumably important feature of S100A5 in
more detail. Previously, we characterized the binding of Cu2+ and Zn2+ to a large
number of S100 proteins including S100A5 [96]. Via ITC competition experiments,
we established that these two metals bind at different sites on the protein and
do not compete for binding [96]. We found that mutation of Cys43 and Cys79
led to a loss of Zn2+ binding. In contrast, neither of these residues was necessary
for binding of Cu2+. Given the original report of Ca2+/Cu2+ antagonism, we
suspected that Ca2+ and Cu2+ may compete for the same sites on S100A5.
Here we report our study of the interplay between Ca2+ and Cu2+ binding by
S100A5. Using a cysteine-free variant (C43S/C79S) of the protein, we show that
binding of Ca2+ and Cu2+ is not in fact antagonistic. The protein can bind the
two metals simultaneously and independently, and both metals induce notable
structural changes. Furthermore, we establish that the cysteine-containing (WT)
protein is prone to forming higher-order oligomers in solution, while the
cysteine-free variant is almost entirely dimeric. We suggest that this propensity
for formation of large oligomeric species and precipitation under our experimental
conditions may underlie the apparent antagonism observed in the original S100A5
report. Our results may suggest new biological roles for Cu2+ binding by this
protein.
Results
Ca2+ and Cu2+ binding to S100A5 are not antagonistic
Antagonism between Cu2+ and Ca2+ binding was previously identified as
a distinct feature of S100A5 relative to other S100 proteins [195, 170, 171]. We
hypothesized that Cu2+ and Ca2+ may bind using the same ligands, thus explaining
the antagonism as direct competition. It was suggested in the original paper that
Cu2+ and Ca2+ might share some ligands [195]. We performed ITC competition
experiments to test whether Cu2+ and Ca2+ directly compete. We titrated Cu2+
onto S100A5 in the presence of saturating Ca2+. However, these experiments were
difficult to interpret due to extensive precipitation in the samples containing both
ions. ITC traces were very noisy and apparent stoichiometries were systematically
low (≈ 0.2), suggesting that a large portion of the protein sample was not
competent to bind Cu2+ (Figure 10). Together, these observations suggested that
a metal-driven aggregation process could be occurring in our samples.
FIGURE 10 Measurements of Cu2+ binding to wildtype S100A5 in the presence of
Ca2+ are difficult to interpret. Representative ITC trace showing Cu2+ titrated
onto Ca2+-bound wildtype S100A5. Inset shows raw data trace. Data were
characteristically noisy and the apparent fraction competent was systematically
low.
We found previously that neither of the two native Cys residues in S100A5
was required for Cu2+ binding [96]. We also noticed that, unlike the wildtype
protein, the Cys-free mutant did not precipitate in the presence of saturating Ca2+
and Cu2+. We therefore used ITC and the Cys-Ser double mutant to characterize
the interplay between binding of the two metal ions. Because some
of the metal-binding curves were complex and difficult to fit, we used a Bayesian
Markov Chain Monte Carlo (MCMC) sampler, as implemented in pytc, to estimate
thermodynamic parameters for all binding models [254]. We also included a floating
"fraction competent" parameter to capture uncertainty in the relative protein and
metal concentrations (following SEDPHAT [255]). This was necessary because a
number of factors make it difficult to obtain accurate estimates of concentrations
for components of this system. S100A5 has no tryptophan residues and, therefore,
a low extinction coefficient that makes absorbance-based concentration estimates
unreliable. Further, water absorption by dry metal salts, as well as interactions
between metal ions and buffer, can make estimates of metal concentration
difficult. Because of these uncertainties, ITC has been noted to provide poor
estimates of stoichiometry for protein-metal binding [256].
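To make the fraction-competent idea concrete, the Python sketch below computes the 1:1 complex concentration from the exact quadratic solution of a single-site equilibrium. It assumes, for illustration only, that the fraction competent simply scales the effective protein concentration; the helper name is hypothetical, and this treatment is not taken from pytc or SEDPHAT, whose internal parameterization may differ.

```python
import math


def complex_concentration(protein_total, ligand_total, kd, fx_competent=1.0):
    """Return [PL] for a single-site (1:1) binding equilibrium.

    All concentrations share one unit (e.g. uM). Hypothetical helper:
    fx_competent is assumed to scale the protein that is able to bind.
    """
    p = protein_total * fx_competent  # effective binding-competent protein
    # Solve [PL]^2 - (P + L + Kd)[PL] + P*L = 0 and keep the physical root.
    b = p + ligand_total + kd
    disc = max(b * b - 4.0 * p * ligand_total, 0.0)
    return (b - math.sqrt(disc)) / 2.0


# Apparent stoichiometry: bound ligand per nominal protein at excess ligand.
for fx in (1.0, 0.2):
    pl = complex_concentration(50.0, 500.0, 1.81, fx_competent=fx)
    print(f"fx_competent={fx}: apparent n = {pl / 50.0:.2f}")
```

With ligand in large excess, a fraction competent of 1.0 gives an apparent stoichiometry close to 1, whereas 0.2 gives roughly 0.2. This shows how an unmodeled loss of binding-competent protein, for example through aggregation, can appear as a low stoichiometry.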
We first used ITC to remeasure binding of Cu2+ ions to the apo form of the
S100A5 double mutant. We found that the Cu2+ binding data were best described by a
single-site binding model. In line with our previous observations, the protein bound
Cu2+ with a Kd(µM) whose 95% credibility region was 0.94 ≤ 1.81 ≤ 3.90
(Figure 11A, Table 1). We next measured the binding of Cu2+ in the presence
of saturating Ca2+. Ca2+ had no detectable effect on the binding of Cu2+ to the
protein, giving a Kd(µM) of 0.65 ≤ 0.96 ≤ 1.47 (Figure 11B; Table 1).
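The interval notation used here (lower ≤ estimate ≤ upper) can be read as the 2.5th percentile, a central estimate, and the 97.5th percentile of the posterior samples. The sketch below computes such a summary with the Python standard library. Treating the middle value as the posterior median is an assumption, since the text does not state which point estimate is reported, and the samples in the usage lines are synthetic, not the thesis data.

```python
import random
import statistics


def credible_region_95(samples):
    """Return (2.5th percentile, median, 97.5th percentile) of samples.

    statistics.quantiles with n=40 returns 39 cut points at k/40, so
    index 0 is 2.5%, index 19 is 50%, and index 38 is 97.5%.
    """
    cuts = statistics.quantiles(samples, n=40)
    return cuts[0], cuts[19], cuts[38]


rng = random.Random(0)
# Synthetic, illustrative posterior draws for a Kd in uM.
kd_samples = [rng.lognormvariate(0.6, 0.35) for _ in range(10_000)]
lo, mid, hi = credible_region_95(kd_samples)
print(f"Kd (uM): {lo:.2f} <= {mid:.2f} <= {hi:.2f}")
```

An equal-tailed region like this is easy to compute from MCMC output; a highest-density region would be an alternative choice for skewed posteriors such as those of Kd.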
We next performed the inverse set of experiments. We used ITC to measure
the binding of Ca2+ to the protein in the apo and Cu2+-saturated forms. For
each condition, we used four different titrant-to-protein concentration ratios to
better resolve the complex Ca2+ binding curve and then globally fit a binding model
to all four datasets (Figure 11C). This binding curve had two distinct phases and could be
fit with a two-site binding polynomial (Figure 11C). These Ca2+ binding curves
presented a challenging model-fitting problem due to their complex shape. The
individual enthalpies and binding constants may therefore be underdetermined
in our analysis. To resolve realistic parameter values from the binding
polynomial model, we constrained the dilution heat and dilution intercept in the
Bayesian fit to physically reasonable ranges (see Methods).
FIGURE 11 S100A5 can bind Ca2+ and Cu2+ simultaneously without antagonism.
Plots show integrated data and global Bayesian fits from replicate isothermal
titration calorimetry experiments: a) Cu2+ binding to apo protein, b) Cu2+ binding
to Ca2+-saturated protein, c) Ca2+ binding to apo protein, and d) Ca2+ binding
to Cu2+-saturated protein. Points are integrated titration shots. Lines are 100
curves drawn from the posterior distribution of the MCMC samples. For Cu2+
binding experiments, technical replicates are shown in blue and red. Ca2+ binding
experiments were performed with fixed protein concentration and four different
titrant-to-protein ratios: 8X (blue), 10X (purple), 15X (red), and 18X (green). For
clarity, Y-axes display total heat per shot, so that curves from different titrant
concentrations fall on different areas of the graph. Raw data corresponding to these
integrated heats are displayed in Figure 31 in the supplement.
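For readers unfamiliar with binding polynomials, the sketch below shows how a two-site polynomial turns two constants into an average number of bound ions. It assumes stepwise association constants K1 = 1/Kd,1 and K2 = 1/Kd,2 acting on the free Ca2+ concentration x, so that P = 1 + K1·x + K1·K2·x². Whether pytc parameterizes its binding polynomial exactly this way is not stated in the text, and the function name is hypothetical.

```python
def mean_bound(x, kd1, kd2):
    """Average ligands bound per two-site unit from a binding polynomial.

    P(x) = 1 + K1*x + K1*K2*x**2 with K1 = 1/kd1 and K2 = 1/kd2 (assumed
    stepwise association constants). nbar = x * dlnP/dx
    = (K1*x + 2*K1*K2*x**2) / P. x, kd1, kd2 share one unit (e.g. uM).
    """
    k1, k2 = 1.0 / kd1, 1.0 / kd2
    term1 = k1 * x           # statistical weight of the singly bound state
    term2 = k1 * k2 * x * x  # statistical weight of the doubly bound state
    return (term1 + 2.0 * term2) / (1.0 + term1 + term2)


# Apo-protein Ca2+ central estimates quoted in this section (uM).
for free_ca in (0.5, 2.0, 10.0, 100.0):
    nbar = mean_bound(free_ca, 0.46, 6.33)
    print(f"[Ca2+]free = {free_ca:6.1f} uM -> nbar = {nbar:.2f}")
```

Working with the stepwise form keeps the model to two affinity parameters, matching the two Kd values reported for Ca2+ in Table 1.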
TABLE 1 Key parameters determined via global fits of ITC data using the Bayesian
MCMC fitter in pytc. Values are 95% credibility regions from the posterior
distributions. ∆H values are reported in kcal·mol−1 and Kd values in µM. The final
row, fraction competent (fx. comp.), is a nuisance parameter that captures what
fraction of the metal and protein in solution is competent for the measured
reaction. n/a, not applicable.
Titrant    | Cu2+               | Cu2+               | Ca2+               | Ca2+
Competitor | none               | Ca2+               | none               | Cu2+
∆H1        | −5.7 ≤ −3.4 ≤ −2.8 | −4.1 ≤ −3.6 ≤ −3.2 | −1.8 ≤ −1.4 ≤ −0.7 | −1.5 ≤ −1.2 ≤ −0.7
∆H2        | n/a                | n/a                | −5.7 ≤ −4.6 ≤ −3.7 | −5.0 ≤ −4.1 ≤ −3.4
Kd,1       | 0.9 ≤ 1.8 ≤ 3.9    | 0.7 ≤ 1.0 ≤ 1.5    | 0.4 ≤ 0.5 ≤ 2.7    | 0.03 ≤ 0.2 ≤ 2.2
Kd,2       | n/a                | n/a                | 1.9 ≤ 6.3 ≤ 34.9   | 1.9 ≤ 10.5 ≤ 100
fx. comp.  | 1.40 ≤ 1.43 ≤ 1.47 | 1.15 ≤ 1.17 ≤ 1.19 | 0.61 ≤ 0.66 ≤ 0.69 | 0.54 ≤ 0.58 ≤ 0.61
We observed one high-affinity site (Kd(µM): 0.14 ≤ 0.46 ≤ 2.68) and one
lower-affinity site (Kd(µM): 1.85 ≤ 6.33 ≤ 34.88). The values were roughly
consistent with those reported in the literature [247]. The presence of saturating
Cu2+ did not inhibit the binding of Ca2+ ions (Figure 11D; Table 1). The Kd value
of the high-affinity site (Kd(µM): 0.03 ≤ 0.18 ≤ 2.16) was not distinguishable within
uncertainty from that of the apo protein. The Kd of the low-affinity site (Kd(µM):
1.86 ≤ 10.46 ≤ 100.3) is similarly indistinguishable from that for the apo protein
(Table 1). Our results clearly demonstrate that Ca2+ and Cu2+ ions do not display
strong antagonism when binding to S100A5.
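The comparison "not distinguishable within uncertainty" can be made explicit by checking whether two 95% credible regions overlap, as in the sketch below for the Ca2+ Kd values quoted above. Overlap of credible regions is only a rough heuristic (a formal comparison would examine the posterior of the difference), and the helper name is hypothetical.

```python
def regions_overlap(a, b):
    """True if two (lower, estimate, upper) credible regions overlap."""
    return a[0] <= b[2] and b[0] <= a[2]


# Kd(uM) regions quoted in the text: (apo, Cu2+-saturated) for each Ca2+ site.
sites = {
    "Kd,1": ((0.14, 0.46, 2.68), (0.03, 0.18, 2.16)),
    "Kd,2": ((1.85, 6.33, 34.88), (1.86, 10.46, 100.3)),
}
for name, (apo, with_cu) in sites.items():
    print(name, "regions overlap:", regions_overlap(apo, with_cu))
```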
S100A5 is prone to oligomerization and metal-driven aggregation
We hypothesized that the metal-driven aggregation process observed in our
ITC experiments with the wildtype protein contributed to the apparent antagonism
that was previously reported. To further examine this aggregation process we used
sedimentation velocity AUC to test for the presence of oligomers in solution. We
hypothesized that the oligomerization of the wildtype protein was driven by the
presence of Cysteine residues. Because some samples contained Cu2+, we
were unable to use a reducing agent in either the ITC or AUC experiments.
We performed sedimentation velocity AUC experiments on both the
wildtype and Cys-Ser double mutant proteins in both the apo form and the form
loaded simultaneously with Cu2+ and Ca2+. We fit the Lamm equation to the
sedimentation data using SedFit to calculate the c(s) distribution of the protein
in each condition [244, 245]. We found that apo S100A5 formed higher-order
oligomers, up to at least dodecamers (Figure 12A). Addition of Cu2+ and Ca2+
caused a large amount of precipitation in wildtype S100A5 that we removed by
extensive centrifugation prior to loading the cell. The remaining soluble protein
was indistinguishable from the apo protein (Figure 12B). In contrast, when we
performed the same experiments with the Cys-Ser double mutant we found that
the protein was primarily dimeric in solution (Figure 12C, 12D), even with the
addition of Cu2+ and Ca2+. However, monomers were also detectable in the double
mutant samples. The monomer peak appears to be more prominent in the apo-
protein sample than in the sample saturated with Cu2+ and Ca2+, suggesting that
binding of metals may stabilize the dimeric form (Figure 12C, 12D). Our AUC
results clearly demonstrate that oligomerization of S100A5 is driven by the native
cysteine residues, which also likely cause the visible aggregation we observed in the
ITC experiments. This observation strongly suggests that the previously-reported
apparent antagonism between Ca2+ and Cu2+ was due to oligomerization and/or
aggregation.
FIGURE 12 Wildtype S100A5 is prone to the formation of higher-order oligomers.
Sedimentation velocity AUC distribution plots showing a) apo wildtype S100A5,
b) wildtype S100A5 saturated with Cu2+ and Ca2+, c) apo Cys-Ser double mutant,
and d) Cys-Ser double mutant saturated with Cu2+ and Ca2+. Data are normalized
to the same scale. Homodimers are the peaks near s = 2. The Cys-Ser double
mutant plots show evidence of some monomer (peak near s = 1) in solution.
Binding of Ca2+ and Cu2+ induces reversible changes in S100A5 secondary
structure
One hallmark feature of the S100 proteins is the change in secondary
structure observed upon binding of metal ions [96, 166, 199]. Metal-induced
conformational changes expose a binding interface that can bind downstream
targets and regulate their activities [94, 247]. In the original biochemical
characterization of S100A5, the authors found that the secondary structure of the
protein is insensitive to the binding of metal ions [195]. However, we previously
found that binding of Ca2+ ions to wildtype S100A5 induces a significant (≈25%)
reversible increase in α-helical secondary structure, which is consistent with
the changes observed in published NMR data [207, 247]. Due to instantaneous
sample precipitation, we were unable to reliably measure structural changes of
wildtype S100A5 in the presence of Cu2+. However, the Cys-Ser double mutant
protein alleviates this issue. We collected far-UV circular dichroism spectra of
the mutant protein in the apo form, bound to Ca2+, bound to Cu2+, and bound to
both metals simultaneously. The Cys-Ser mutant displays a notable increase in
α-helical signal (222 nm) upon binding of Ca2+, identical to the wildtype protein.
Interestingly, addition of Cu2+ also induces an increase in α-helical signal that
is approximately half of that induced by Ca2+. The spectrum of S100A5 bound
simultaneously to both metals is identical to that of the Ca2+-bound form (Figure
13). This structural change is not due to oligomerization, as the protein remains
a dimer under these conditions by AUC (Figure 12D). All the metal-induced
structural changes were instantly reversible by the addition of a molar excess of
EDTA. These results may help to explain the minor differences in Ca2+ binding
to the Cu2+-bound form of the protein, such as its slightly less negative ∆H values,
which may be due to moderate structural differences from the apo protein. Despite the lack
of antagonism between binding affinities for Ca2+ and Cu2+ ions, there is still
indication of some structural interplay between the two metals.
FIGURE 13 Ca2+ and Cu2+ induce increases in α-helical secondary structure
measured by far-UV circular dichroism. Curves show mean molar ellipticity vs.
wavelength for each experimental condition: Apo (gray), bound to Cu2+ (orange),
bound to Ca2+ (blue), and bound to both Cu2+ and Ca2+ (red).
Discussion
S100A5 is one of the lesser-known members of the S100 protein family.
Its expression pattern is very narrow and its biological functions are mostly
uncharacterized. However, it has been the target of multiple biochemical studies
that have sought to characterize the properties of the protein itself. Binding
of metals and proteins to S100A5 has been studied using various techniques
[195, 96, 207, 104, 247]. X-ray crystallography and NMR have been used to solve
structures of both apo and Ca2+-bound forms of the protein [207, 257]. Despite
the available biochemical data, some aspects of S100A5 have remained ambiguous. For
example, the stoichiometry of transition metal binding and structural responses to
metal binding have been variably reported [195, 96].
One of the most noted features of S100A5 is the strong antagonism between
binding of Ca2+ and Cu2+ ions. This feature is reported in the gene descriptions
found in many databases [252, 253, 251]. In this study we set out to characterize
this unique feature of S100A5, hypothesizing that it was due to competition
between the two metals for shared ligands. However, we found an absence of
direct binding antagonism between Ca2+ and Cu2+. Neither metal ion affects the
binding constant for the other. Instead, we observed a propensity of the protein for
oligomerization and metal-induced aggregation. It is possible that the reduction of
binding-competent protein caused by this aggregation process was interpreted in
the original flow dialysis study of S100A5 as antagonism between Ca2+ and Cu2+.
We also report notable changes in the secondary structure of S100A5 upon binding
of both Ca2+ and Cu2+, which is contrary to the original report that S100A5 structure
is insensitive to the binding of metals.
One intriguing implication of our observations is that the Cu2+ binding site of
S100A5 must be quite distinct from that of other S100 proteins. Ca2+ and Cu2+
clearly do not share ligands, or there would be evidence of competition in our
ITC experiments. Cysteine residues are thought to be involved in metal-binding
in some other S100s [177, 170] and we previously showed that the Cys-free mutant
of S100A5 displays compromised Zn2+ binding [96]. However, neither native Cys
residue of S100A5 is required for Cu2+ binding. Furthermore, we showed that Zn2+
and Cu2+ do not share ligands, as they do not compete at all in ITC experiments
[96]. In addition, mutation of His17—which is present in the canonical transition
metal site of many S100s—also had no effect on Cu2+ binding in S100A5 [96]. The
results presented here with the Cys-free mutant also clearly rule out the possibility
of oligomer-dependent Cu2+ binding, such as could be achieved by the formation
of a new site in a higher-order oligomeric species. Thus, the location of the Cu2+
binding site on S100A5 remains unknown. Further characterization, such as scanning
mutagenesis, will be necessary to determine the identity of Cu2+ ligands.
Biological roles for the binding of transition metals have been established
for some S100s and suggested for many others [170, 171, 105, 177, 172]. The
binding constants that we measured for Ca2+ and Cu2+ suggest the possibility of
physiologically relevant interactions in some tissues. Free Ca2+ concentrations in
rat olfactory neurons reach ≈ 2 µM during nerve stimulation [258]. Likewise, pools
of Cu2+ are released in and around olfactory neurons during signaling, reaching
concentrations as high as 10 µM in the synapse [259, 260, 223, 261]. Further,
despite high Cu2+ concentrations, the olfactory bulb in rats does not have elevated
expression of the typical copper chaperone metallothionein [262]. It has been
suggested that S100A5 may play a role as a Cu2+ buffer or chaperone in olfactory sensory neurons
during olfactory signaling [195]. The fact that Cu2+ is able to induce structural
changes in S100A5 suggests it could play a more active role: S100A5 could actually
respond to Cu2+ and propagate a resulting signal by interacting with downstream
targets.
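As a rough, illustrative check of whether the measured Cu2+ affinity permits substantial binding at the synaptic concentrations quoted above, the sketch below computes single-site occupancy, θ = [X]/(Kd + [X]), across the Kd credible region. It assumes free Cu2+ equals the quoted 10 µM (no ligand depletion), a single independent site, and that the in vitro Kd of the Cys-free mutant applies in tissue; none of these is established for the cellular environment.

```python
def fractional_occupancy(free_ligand, kd):
    """Occupancy of one independent site: [X] / (Kd + [X]), same units."""
    return free_ligand / (kd + free_ligand)


# Cu2+ at 10 uM, using the lower bound, central value, and upper bound of the
# 95% credible region for the Cu2+ Kd (uM) reported for the Cys-free mutant.
for kd in (0.94, 1.81, 3.90):
    theta = fractional_occupancy(10.0, kd)
    print(f"Kd = {kd:.2f} uM -> occupancy at 10 uM Cu2+ = {theta:.2f}")
```

Under these simplifying assumptions, the predicted occupancy stays above roughly 0.7 across the whole credible region.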
Because binding is not antagonistic, Cu2+-dependent functions could be achieved even
in the presence of saturating Ca2+ levels. Furthermore, there could be synergistic
functional roles for binding of Ca2+ and Cu2+. For example, if S100A5 is acting as
a Cu2+ chaperone, Ca2+ binding could expose the hydrophobic binding interface and
thereby facilitate binding of the protein targets to which Cu2+ is being delivered.
In addition, S100A5 is capable of binding Zn2+ ions, which are also at high
concentration in the olfactory bulb, with similar affinity to Cu2+ [261]. Zn2+ and
Cu2+ also bind noncompetitively, and thus all three metals could potentially engage
in synergistic activities [96].
One final possibility is that the oligomerization process we observed in
this study may actually have a biological function. Wildtype S100A5 is prone
to the formation of oligomers even in the apo form and is subject to extensive
aggregation in solutions containing Ca2+ and Cu2+ even at relatively low protein
concentrations. Roles for metal-driven oligomerization in S100s have been
suggested previously [89, 211, 216, 263]. It is conceivable that Ca2+ and Cu2+ drive
oligomerization of S100A5 in cells to facilitate a biological function, but further
experiments would be required to determine if this process occurs in the reducing
environment of the cell at physiologically-relevant concentrations of S100A5, Ca2+
and Cu2+.
Future experiments are needed to elucidate the biochemical features and
biological functions of S100A5 that remain unknown. It will be important to
identify the Cu2+ ligands in S100A5 to fully understand the biochemical interplay
between the binding of various biologically relevant metals. To understand how
Ca2+, Cu2+, and Zn2+ contribute to the biological activity of S100A5, experiments
should be targeted at directly testing how these metals interact with the protein
in vivo. The identification of more S100A5 biological targets and an increase in
functional studies will be required to determine the chief roles of S100A5 in its
cellular environment.
Conclusions
Antagonism between binding of Ca2+ and Cu2+ ions to S100A5 is one of the
most oft-cited aspects of this protein. Several possible biological roles have been
suggested. Using careful biophysical characterization, we discovered that binding
of Ca2+ and Cu2+ ions is not antagonistic. A Cys-free mutant version of the
protein makes measurements of metal binding using ITC possible and shows that
the protein is capable of binding both metals simultaneously and independently.
Rather than binding antagonism, it appears that the wildtype protein is prone to
oligomerization and aggregation and that these behaviors may have contributed
to the original interpretation. Furthermore, we also measured the effects of Ca2+
and Cu2+ binding on S100A5 secondary structure and found that both metals are
capable of inducing increases in α-helical secondary character. These results also
contrast with the original report on S100A5 [195] but are consistent with previously
published NMR data [207]. The ability to bind Ca2+ and Cu2+ independently as
well as the structural response to Cu2+ may suggest new Cu2+ dependent biological
roles for S100A5.
Methods
Protein expression and purification
We previously generated the 6-histidine-tagged cysteine double-mutant
construct in a pet28/30 vector [96]. In this study, the protein was expressed
and purified using the same protocol detailed in the previous publication.
Briefly, the protein was expressed in a 1.5 L culture of Rosetta (DE3) pLysS
cells (Millipore). Cells were lysed by sonication and treatment with DNase and
lysozyme. Subsequently, the tagged protein was purified using HisTrap Ni2+
affinity columns (GE). The tag was then cleaved using TEV protease and the
cleaved protein was further purified using Ca2+-dependent hydrophobic interaction
chromatography. Finally, the sample was run over a second HisTrap Ni2+ affinity
column to remove any uncleaved protein. The purified protein was dialyzed with
6000-8000 MWCO tubing (Fisher) against 2 L of 25 mM Tris, 100 mM NaCl, pH 7.4
with 2 g Chelex resin (Bio-Rad). The dialyzed protein was filter-sterilized (0.22 µm),
flash-frozen dropwise in liquid nitrogen, and stored at −80◦C. We experimentally
determined the extinction coefficient (5002 M−1 cm−1) of the Cys-Ser double
mutant. We measured the A280 of the protein at the same concentration in both
buffer and denaturing 6 M GdHCl (Sigma). We used ProtParam [241] to predict
an extinction coefficient for the protein based on sequence and then calculated the
corrected coefficient using the equation
$\varepsilon_{\text{native}} = \varepsilon_{\text{6 M GdHCl}} \cdot A_{280,\text{native}} / A_{280,\text{6 M GdHCl}}$.
Concentration measurements were also corrected for scattering in samples [243].
Due to the low extinction coefficient of the protein, concentration is difficult to
measure with high confidence, even with this careful protocol.
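The coefficient correction above, followed by a Beer-Lambert concentration estimate, can be sketched in Python as below. The predicted coefficient and absorbance readings in the usage lines are made-up placeholders rather than measured values, the function names are hypothetical, and the scattering correction [243] is deliberately omitted.

```python
def native_extinction_coefficient(eps_denatured, a280_native, a280_denatured):
    """Scale a sequence-predicted (denatured) coefficient by the A280 ratio.

    Both absorbances must be measured at the same protein concentration.
    """
    return eps_denatured * a280_native / a280_denatured


def concentration_molar(a280, eps, path_cm=1.0):
    """Beer-Lambert law: c = A / (eps * l), with eps in M^-1 cm^-1."""
    return a280 / (eps * path_cm)


# Placeholder readings, for illustration only (not data from this study).
eps_native = native_extinction_coefficient(5000.0, 0.250, 0.260)
conc_uM = concentration_molar(0.250, eps_native) * 1e6
print(f"eps_native = {eps_native:.0f} M^-1 cm^-1, c = {conc_uM:.1f} uM")

# Sensitivity: an absorbance error of 0.01 shifts the estimate by about 2 uM.
print(f"error per 0.01 A280: {concentration_molar(0.01, eps_native) * 1e6:.1f} uM")
```

With a coefficient near 5000 M−1 cm−1, a small absolute error in A280 becomes a several-percent error in a ~50 µM concentration, which is why a floating fraction-competent parameter was useful in the ITC fits.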
Isothermal titration calorimetry
Samples were prepared in 25 mM TES (Sigma), 100 mM NaCl (Thermo
Scientific), buffer at pH 7.4. Protein was thawed from a frozen stock and exchanged
into the experimental buffer using NAP-25 desalting columns (GE Healthcare).
For competition experiments the experimental buffer also contained either 1 mM
CaCl2 (Sigma) or 0.25 mM CuCl2 (Sigma). Titrant solutions were prepared in
matching experimental buffer so that titrant and cell solutions had identical composition. Anhydrous
CaCl2 or CuCl2 was dissolved directly in the buffer and diluted to the appropriate
concentration immediately prior to experiments. Fresh stocks were made for each
set of experiments. Experiments were performed with 50-80 µM protein at 25◦C.
Two technical replicates of each Cu2+ binding experiment were performed. To
resolve the complex Ca2+ binding curves, four Ca2+ binding experiments were
performed using four different concentrations of titrant. Raw data were integrated
using the NITPIC software package, which accounts for uncertainty in the baseline,
and the integrated heats were exported in standard SEDPHAT format [264]. We then
used the Bayesian MCMC sampler included in pytc to estimate model parameters
globally against all experiments simultaneously [254]. We used the maximum likelihood
estimate as a starting point and then explored the likelihood surface with 100
walkers, each taking 20,000 steps. We discarded the first 10% of steps as burn-in.
We verified convergence by performing the sampling procedure several times. A single-site
binding model was used for Cu2+ titration data and a two-site binding polynomial
was used for Ca2+ titration data [265, 266]. For Ca2+ binding fits, we constrained
the dilution heat and dilution intercept to between −3.0 and 0.0 kcal/mol and between
0 and 10,000 kcal/mol/shot, respectively. All other priors were uniform.
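The sampling scheme described above (a starting point, a fixed number of steps, discarding the first 10% as burn-in, and bounded uniform priors) can be illustrated with a minimal random-walk Metropolis sampler. This is a swapped-in, single-chain technique chosen for brevity; it is not the multi-walker sampler in pytc, and the toy Gaussian likelihood, step size, and all names are hypothetical.

```python
import math
import random


def metropolis(log_likelihood, start, bounds, n_steps=20_000, step=0.5,
               burn_in=0.1, seed=0):
    """Random-walk Metropolis sampling of a single parameter.

    A uniform prior on [low, high] is imposed by rejecting proposals that
    fall outside the bounds. The first `burn_in` fraction of the chain is
    discarded before returning the remaining samples.
    """
    rng = random.Random(seed)
    low, high = bounds
    current, current_ll = start, log_likelihood(start)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        if low <= proposal <= high:
            proposal_ll = log_likelihood(proposal)
            # Accept with probability min(1, L(proposal) / L(current));
            # 1 - random() lies in (0, 1], so the log is always defined.
            if math.log(1.0 - rng.random()) < proposal_ll - current_ll:
                current, current_ll = proposal, proposal_ll
        chain.append(current)
    return chain[int(burn_in * n_steps):]


# Toy likelihood: a Gaussian centred on 1.8 with width 0.5 (illustrative).
toy_ll = lambda x: -0.5 * ((x - 1.8) / 0.5) ** 2
samples = metropolis(toy_ll, start=1.0, bounds=(0.0, 10.0))
print(f"kept {len(samples)} samples, mean = {sum(samples) / len(samples):.2f}")
```

Running several chains from different seeds and comparing their summaries is a simple analogue of the repeated-sampling convergence check described above.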
Sedimentation velocity analytical ultracentrifugation
Experiments were done in 25 mM TES (Sigma), 100 mM NaCl (Thermo
Scientific), 100 µM EDTA at pH 7.4 with the appropriate metal added directly
to the buffer during preparation. Metals were added to a final concentration
of 250 µM. Samples were prepared at 40 µM in the appropriate experimental
buffer by overnight dialysis (6000-8000 MWCO) against 2 L at 4◦C. Before
ultracentrifugation, samples were centrifuged at 18,000 × g at 4◦C in a
temperature-controlled centrifuge for 30 minutes. Ultracentrifugation was done with sapphire
windows at 50,000 × g in sector-shaped cells (Beckman) on a Beckman
ProteomeLab XL-1. Sedimentation was monitored using interference mode rather
than absorbance at 280 nm due to the low extinction coefficient of S100A5. The
Lamm equation was fit to the sedimentation data—using SedFit—to calculate the
continuous c(s) distribution [244, 245]. Estimated sedimentation coefficients of the
species present in solution were calculated from the numerical fits.
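As a rough guide to how peak positions in a c(s) distribution relate to oligomeric states, the sketch below uses the scaling s ∝ M^(2/3), which holds only for particles of identical shape, density, and partial specific volume. Real S100A5 oligomers need not satisfy these assumptions, so the estimates are indicative at best; the function names and the choice of the dimer peak (s ≈ 2) as reference are illustrative assumptions, not part of the SedFit analysis.

```python
def oligomer_size(s, s_ref, n_ref=2):
    """Estimate subunits per particle from a sedimentation coefficient.

    Assumes s is proportional to M**(2/3) for all species (same shape and
    density), so n = n_ref * (s / s_ref) ** 1.5.
    """
    return n_ref * (s / s_ref) ** 1.5


def expected_s(n, s_ref, n_ref=2):
    """Inverse relation: sedimentation coefficient expected for n subunits."""
    return s_ref * (n / n_ref) ** (2.0 / 3.0)


# Reference: the homodimer peak near s = 2 (Figure 12).
for n in (1, 2, 4, 12):
    print(f"{n:2d}-mer: s ~ {expected_s(n, 2.0):.1f} S")
```

Under these assumptions a monomer would sediment near 1.3 S and a dodecamer near 6.6 S; departures from these values would reflect differences in shape or hydration.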
Circular dichroism spectroscopy
Far-UV circular dichroism spectra (200–250 nm) were collected on a J-815 CD
spectrometer (Jasco) with a 1 mm quartz cell (Starna Cells, Inc.). We prepared
50 µM samples in a Chelex (Bio-Rad) treated, 25 mM TES (Sigma), 100 mM NaCl
(Thermo Scientific), 100 µM EDTA, buffer at pH 7.4. Samples were subsequently
diluted to 25 µM in buffers containing no metal (apo), 1 mM Ca2+, 1 mM Cu2+,
or both 1 mM Ca2+ and 1 mM Cu2+, all prepared in the stock buffer above.
Samples were centrifuged at 18,000 × g at 25◦C in a temperature-controlled
centrifuge (Eppendorf) before experiments. Spectra were collected at 25◦C in
a Jasco peltier multi-cell sample unit. Reversibility of metal-induced structural
changes was confirmed by adding a molar excess of EDTA to the metal-saturated
samples and repeating spectra collection. In all cases, addition of EDTA returned
the samples to the apo state. Five scans of each condition were collected. These
scans were then averaged—using Jasco spectra analysis software—to minimize
noise. Buffer blank spectra were generated for each condition. Applicable blanks
were subtracted in the Jasco spectra analysis software. Blank-corrected data were
exported as text files and raw signal was converted into mean molar ellipticity using
the concentration and the number of residues (Nres = 95) in our S100A5 construct
using the equation: $\mathrm{MME} = \mathrm{CD}_{\text{signal}} / (10 \cdot c\,(\mathrm{M}) \cdot L\,(\mathrm{cm}) \cdot N_{\text{res}})$.
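The conversion to mean molar ellipticity can be sketched as a one-line function. This sketch assumes the raw CD signal is in millidegrees (giving deg cm² dmol⁻¹) and that c is the molar concentration of polypeptide chains; the raw reading in the usage line is a hypothetical value, not a measurement from Figure 13.

```python
def mean_molar_ellipticity(cd_mdeg, conc_molar, path_cm, n_res):
    """Convert a raw CD signal to mean molar ellipticity per residue.

    Implements MME = CD / (10 * c(M) * L(cm) * Nres). With CD in
    millidegrees the result is in deg cm^2 dmol^-1.
    """
    return cd_mdeg / (10.0 * conc_molar * path_cm * n_res)


# Conditions from this section: 25 uM protein, 1 mm (0.1 cm) cell, 95 residues.
signal_222 = -20.0  # hypothetical raw reading at 222 nm, in mdeg
print(f"{mean_molar_ellipticity(signal_222, 25e-6, 0.1, 95):.0f}")
```

Normalizing by concentration, path length, and residue count is what allows spectra collected under different conditions to be compared on one axis, as in Figure 13.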
Bridge to Chapter V
In this chapter, the metal binding behavior of S100A5 was characterized
biophysically. A misunderstood aspect of S100A5 biochemistry was resolved. It
has long been thought that S100A5 exhibits strong antagonism between the binding
of Ca2+ and Cu2+ ions. However, it is demonstrated here that this antagonistic
behavior was likely an artifact of techniques used in the original biochemical study
of S100A5. Instead, it is shown that the protein is prone to the formation of
higher-order oligomeric species. By eliminating this oligomerization process with point
mutations, the metal binding behavior of S100A5 could be characterized. The
protein is capable of binding Ca2+ and Cu2+ ions simultaneously. Additionally,
the two metals were observed to induce distinct conformational changes in the
protein. This chapter is an important addition to the S100 literature, because it
overturns an erroneous paradigm and suggests new biological roles for Ca2+ and
Cu2+ ion binding by S100A5. Chapter V turns to another basic biochemical feature
of the S100 protein family: binding of small peptide regions of target proteins. Two
S100 proteins, S100A5 and S100A6, which arose via gene duplication in the ancestor
of amniotes, are used as a model to study the diversification of binding specificity in
duplicate lineages. Ancestral sequence reconstruction is used to resurrect the last
common ancestor of all S100A5 and S100A6 proteins, allowing the evolutionary
history of binding specificity to be directly characterized. The proteins are shown
to display conserved specificity at the level of gene clades, and both lineages
appear to have subfunctionalized relative to the ancestor. The work in Chapter V
demonstrates that proteins with low biochemical specificity nonetheless display
evolutionary patterns consistent with those observed in highly specific proteins
following gene duplication.
CHAPTER V
CONSERVATION OF PEPTIDE BINDING SPECIFICITY IN S100A5 AND
S100A6
Author Contributions
Lucas Wheeler and Michael Harms conceived the study and designed the
experiments. Lucas Wheeler, Jeremy Anderson, Anneliese Morrison, and Caitlyn
Wong performed the experiments. Lucas Wheeler and Michael Harms analyzed
the experimental datasets. Michael Harms secured funding for the work. Michael
Harms and Lucas Wheeler wrote the manuscript and generated figures. All authors
have read and approved the manuscript.
Abstract
S100 proteins bind linear peptide regions of target proteins and modulate
their activity. The peptide binding interface, however, has remarkably low
specificity and can interact with many target peptides. It is not clear if the
interface discriminates targets in a biological context, or whether biological
specificity is achieved exclusively through external factors such as subcellular
localization. To discriminate these possibilities, we used an evolutionary
biochemical approach to trace the evolution of paralogs S100A5 and S100A6. We
first used isothermal titration calorimetry to study the binding of a collection of
peptides with diverse sequence, hydrophobicity, and charge to human S100A5 and
S100A6. These proteins bound distinct, but overlapping, sets of peptide targets.
We then studied the peptide binding properties of S100A5 and S100A6 orthologs
sampled from across five representative amniote species. We found that the pattern
of binding specificity was conserved along all lineages, for the last 320 million
years, despite the low specificity of each protein. We next used Ancestral Sequence
Reconstruction to determine the binding specificity of the last common ancestor
of the paralogs. We found the ancestor bound the whole set of peptides bound by
modern S100A5 and S100A6 proteins, suggesting that paralog specificity evolved
by subfunctionalization. To rule out the possibility that specificity is conserved
because it is difficult to modify, we identified a single historical mutation that,
when reverted in human S100A5, gave it the ability to bind an S100A6–specific
peptide. These results indicate that there are strong evolutionary constraints on
peptide binding specificity, and that, despite being able to bind a large number of
targets, the specificity of S100 peptide interfaces is indeed important for the biology
of these proteins.
Introduction
Many proteins have low specificity interfaces that can interact with a wide
variety of targets [267, 84, 81, 87, 268, 269, 270, 85, 145, 79, 82]. Such interfaces are
difficult to dissect. Crucially, it is not obvious that their specificity is biologically
meaningful: maybe such proteins are essentially indiscriminate, and biological
specificity is encoded by external factors such as subcellular localization or
expression pattern [83, 81, 93].
An evolutionary perspective allows us to probe whether specificity is,
indeed, an important aspect of these interfaces [43]. If there are functional and
evolutionary constraints on binding partners, we would expect conservation of
binding specificity similar to that observed for high–specificity protein families
[52, 53]. In contrast, if specificity is unimportant, we would expect it to fluctuate
randomly over evolutionary time. Further, previous work has revealed common
patterns in the evolution of specificity
[271, 76, 47], including partitioning of ancestral binding partners among descendant
lineages [58, 272, 55, 273] and transitions through more promiscuous intermediates
[57, 79, 143]. If low–specificity proteins exhibit similar patterns, this would be strong
evidence that the low–specificity interface has conserved binding properties, and
that the interface makes a meaningful contribution to biological specificity.
S100 proteins are an important group of low–specificity proteins [100, 89].
Members of the family act as metal sensors [172], pro–inflammatory signals
[274, 156, 101, 104], and antimicrobial peptides [105]. Most S100s bind to linear
peptide regions of target proteins via a short hydrophobic interface exposed
upon Ca2+ binding (Fig 14A). S100s recognize extremely diverse protein targets
[94, 89, 275]. No simple sequence motif for discriminating binders from non–binders
has yet been defined. The breadth of targets is much greater than that of other
low–specificity proteins such as kinases and some hub proteins, which recognize
well–defined, but degenerate, sequence motifs [267, 81, 269, 79, 82].
We set out to determine whether there was conserved specificity for two S100
paralogs, S100A5 and S100A6. These proteins arose by gene duplication in the
amniote ancestor ≈ 320 million years ago [276, 96]. S100A6 regulates the cell
cycle and cellular motility in response to stress [277]. It binds to many targets
including p53 [278, 279], RAGE [101], Annexin A1 [275], and Siah–interacting
protein [280]. A crystal structure of human S100A6 bound to a fragment of Siah–
interacting protein revealed that peptides bind via the canonical hydrophobic
interface shared by most S100 proteins [280]. The biology of S100A5 is less well
understood. It binds both RAGE [101, 104] and a fragment of the protein NCX1
[281] at the canonical binding site. It is highly expressed in mammalian olfactory
tissues [282, 283, 284], but its specific targets and their biological roles remain
unclear.
Using a combination of in vitro biochemistry and molecular phylogenetics,
we addressed three key questions regarding the evolution of specificity in S100A5
and S100A6. First: do the two human proteins exhibit specificity relative to one
another? Second: is the set of binding partners recognized by each protein fixed
over time, or does the set of partners fluctuate? And, third: do we see similar
patterns of specificity change after gene duplication for these low–specificity
proteins compared to high–specificity proteins? Unsurprisingly, we find that
S100A5 and S100A6 both bind to a wide variety of diverse peptides. Surprisingly,
we find that the set of partners, despite being diverse, has been conserved over
hundreds of millions of years. Further, we observe a pattern of subfunctionalization
for these low–specificity proteins that is identical to that observed in high–
specificity proteins. This suggests that these low–specificity interfaces are indeed
under selection to maintain a specific—if large—set of binding targets.
Results
Human S100A5 and S100A6 interact with diverse peptides at the same binding site
FIGURE 14 Human S100A5 and S100A6 exhibit peptide binding specificity. A)
Published structures of S100 family members bound to both Ca2+ and peptide
targets at the canonical hydrophobic interface (PDB: 3IQQ, 1QLS, 3RM1, 2KRF,
4ETO, 2KBM, 1MWN, 3ZWH). Structures are aligned to the Ca2+–bound
structure of human S100A5 (2KAY). Peptides are shown in red. Blue spheres
are Ca2+ ions. B) Binding specificity of hA5 and hA6. Boxes indicate whether
the peptide binds to hA5 (purple) and/or hA6 (orange). If a peptide does not bind
by ITC (KD > 100 µM), the box is white. Peptide names are indicated on the
left. Peptide sequences, aligned using MUSCLE [285], are shown on the right.
Solubilizing flanks, which contribute minimally to binding (Table 3 in supplement),
are shown in lowercase letters. Annexin 1 (An1) and Annexin 2 (An2) binding
measurements are from a published study [275]. C) ITC heats for the titration
of NCX1 (blue) and SIP (red) peptides onto hA5 (top) and hA6 (bottom).
Points are integrated heats extracted from each shot. Lines are 100 different fit
solutions drawn from the fit posterior probability distributions. For the hA5/NCX1
and hA6/SIP curves, we used a single–site binding model. For hA5/SIP and
hA6/NCX1, we used a blank dilution model. Thermodynamic parameters for these
fits are in Tables 4–7 in supplement.
We first systematically compared the binding specificity of human S100A5
(hA5) relative to human S100A6 (hA6) for a collection of six peptides (Fig 14B).
Peptide targets have been reported for both hA5 and hA6 [280, 101, 278, 275, 279,
281, 104], but only two targets have been directly compared between paralogs.
Using isothermal titration calorimetry (ITC), Streicher and colleagues found
that a peptide fragment of Annexin 1 bound to hA6 but not hA5, and a peptide
fragment of Annexin 2 bound to neither [275] (Fig 14B). To better quantify the
relative specificity of these proteins, we used ITC to measure the binding of two
additional peptides to recombinant hA5 and hA6. The first was a peptide from
Siah–interacting protein (SIP) previously reported to bind to hA6 [280]. We found
that this peptide bound to hA6 with a KD of 20 µM, but did not bind hA5 (Fig
14B, C). The second was a 12–amino acid fragment of the protein NCX1 that was
reported to bind to hA5 [281]. We found that this peptide bound to hA5 with
a KD of 20 µM, but did not bind hA6 (Fig 14B, C).
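The KD values above, and the KD > 100 µM cutoff used for "no binding," can be related to how much protein is actually occupied during a titration. The following is a minimal sketch of the exact single–site (1:1) binding equilibrium. It assumes ideal behavior and is not the pytc implementation used for the fits. The protein and peptide concentrations in the usage loop are hypothetical.

```python
import math


def complex_concentration(p_total, l_total, kd):
    """[PL] for a 1:1 (single-site) equilibrium P + L <-> PL.

    Solves Kd = [P][L]/[PL] exactly via the quadratic; all arguments
    must share one concentration unit (micromolar below).
    """
    b = p_total + l_total + kd
    # The smaller root is the physical one, since [PL] <= min(P_t, L_t).
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


def fraction_bound(p_total, l_total, kd):
    """Fraction of total protein present as the PL complex."""
    return complex_concentration(p_total, l_total, kd) / p_total


# Hypothetical cell conditions: 50 uM protein, 100 uM peptide.
for kd in (20.0, 100.0, 1000.0):
    print(f"Kd = {kd:6.0f} uM -> fraction bound = {fraction_bound(50.0, 100.0, kd):.2f}")
```

As KD rises well above the concentrations in the cell, the complex barely forms. The integrated heats then approach the linear dilution baseline described by the blank dilution model.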
To further characterize the specificity of the interface, we used phage display
to identify two additional peptides, one enriched by each protein. We panned a
commercial library of random 12–mer peptides fused to M13 phage with either hA5
or hA6. Phage enrichment was strictly dependent on Ca2+ (Fig 32 in supplement).
Three sequential rounds of binding and amplification with either hA5 or hA6
led to enrichment of the “A5cons” and “A6cons” peptides (Fig 14B, Fig 32 in
supplement). We then used ITC to measure binding of these peptides to hA5
and hA6. To ensure solubility, we added polar N– and C–terminal flanks before
characterizing binding. A6cons bound to both hA5 and hA6 (Fig 14B). In contrast,
A5cons bound hA5 but not hA6 (Fig 14B). To verify that binding was driven by
the central region, we re–measured binding in the presence and absence of different
versions of the flanks (Table 3 in supplement).
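This section does not spell out how the A5cons and A6cons sequences were derived from the enriched phage pool. A common approach is to sequence individual clones after the final round of panning and take a position–by–position consensus. The sketch below assumes that approach and takes pre–aligned 12–mers of equal length as input. The clone sequences in the example are made up for illustration.

```python
from collections import Counter


def consensus(clones):
    """Position-wise majority consensus of equal-length, pre-aligned peptides.

    Ties are resolved in favor of the residue seen first at that position,
    because Counter.most_common orders equal counts by first appearance.
    """
    if not clones:
        raise ValueError("need at least one clone sequence")
    length = len(clones[0])
    if any(len(seq) != length for seq in clones):
        raise ValueError("all clone sequences must have the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*clones))


# Hypothetical 12-mer inserts from sequenced clones after round three.
clones = ["ACDEFGHIKLMN", "ACDEFGHIKLMW", "ACDQFGHIKLMN"]
print(consensus(clones))
```

A majority consensus need not match any individual clone exactly, so binding of the consensus peptide itself still has to be verified, as done here by ITC.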
The peptides that bind to hA5 and hA6 are diverse in sequence,
hydrophobicity, and charge (Fig 14B). One explanation for this diversity could
be that the peptides bind at different interfaces on the protein. To test this
possibility, we used NMR to identify residues whose chemical environment changed
upon peptide binding. We first verified the published assignments for hA5 using
a 3D NOESY–TROSY experiment [207]. We then collected 1H–15N TROSY–HSQC
NMR spectra of Ca2+–bound protein in the presence of either the A5cons or
A6cons peptide. By comparing the bound and unbound spectra, we could identify
peaks whose location shifted dramatically or that broadened due to exchange.
In addition to our own work, we also included previously reported experiments
probing the hA5/NCX1 peptide interaction in the analysis [281]. For all three
peptides, we observed a consistent pattern of perturbations in helices 3 and 4
and, to a lesser extent, helix 1 upon peptide binding (Fig 15A–C). These results
suggest that all three peptides bind at the canonical interface. In addition to this
spectroscopic evidence, binding of all of these peptides was strictly dependent
on the presence of Ca2+ (Fig 15D–F)—consistent with binding at the interface
exposed on Ca2+ binding [207].
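The perturbation analysis above was done by comparing spectra, and the chapter does not describe an automated procedure. As a hypothetical illustration of the logic, the sketch below flags an assigned residue as perturbed when no peak in the peptide–bound spectrum lies within a tolerance of its free–state position. The 0.05 ppm tolerance and the 0.2 weighting of 15N shifts are assumptions. Nearest–peak matching can also mislabel a residue when an unrelated peak happens to lie nearby.

```python
import math


def perturbed_residues(free_peaks, bound_peaks, tol=0.05, n_weight=0.2):
    """Residues whose free-state peak has no counterpart in the bound spectrum.

    free_peaks: {residue_number: (1H ppm, 15N ppm)} from the assigned free spectrum
    bound_peaks: [(1H ppm, 15N ppm), ...] picked in the peptide-bound spectrum
    tol: maximum weighted distance (ppm) for a peak to count as unchanged
    n_weight: scale applied to 15N differences (hypothetical choice)
    Peaks that shifted and peaks that broadened beyond detection are both flagged.
    """
    flagged = []
    for residue, (h, n) in free_peaks.items():
        distances = [math.hypot(h - bh, n_weight * (n - bn)) for bh, bn in bound_peaks]
        if not distances or min(distances) > tol:
            flagged.append(residue)
    return sorted(flagged)


# Hypothetical peak lists (ppm).
free = {10: (8.00, 120.0), 11: (7.50, 115.0), 12: (8.50, 125.0)}
bound = [(8.01, 120.1), (7.90, 118.0)]
print(perturbed_residues(free, bound))
```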
The S100A5 and S100A6 clades exhibit conserved binding specificity
Although hA5 and hA6 bind to diverse peptide targets at the same interface,
they exhibit distinct specificity relative to one another (Fig 14B). Which
peptides bind each paralog could be arbitrary if specificity fluctuates over evolutionary
time. In contrast, if specificity at the interface is strongly constrained, we would
expect conserved specificity between paralogs. We therefore set out to study the
evolution of the differences in peptide binding between the human proteins.
FIGURE 15 Diverse peptides bind at the human S100A5 peptide
interface. Structures show NMR data mapped onto the structure of Ca2+–bound
hA5 (2KAY [207]). To indicate the expected peptide binding location, we aligned
a structure of hA6 in complex with the SIP peptide (2JTT [280]) to the hA5
structure, and then displayed the SIP peptide in red. Panels A–C show binding
for NCX1, A5cons, and A6cons, respectively. In panel A, yellow residues are those
noted as responsive to NCX1 binding in [281]. In panels B and C, yellow residues
are those whose 1H–15N TROSY–HSQC peaks could not be identified in the
peptide–bound spectrum because the peaks either shifted or broadened. Panels
D–F show ITC data for binding of the peptides above in the presence of 2 mM
Ca2+ (blue) or 2 mM EDTA (red). Points are integrated heats extracted from each
shot. Lines are 100 different fit solutions drawn from the fit posterior probability
distributions. For the Ca2+ curves, we used a single–site binding model. For the
EDTA curves, we used a blank dilution model. Insets show raw ITC power traces
for the Ca2+ binding curves. Thermodynamic parameters for these fits are in Tables
4–7 in supplement.
FIGURE 16 S100A5 and S100A6 arose by gene duplication in the amniote
ancestor. A) Maximum likelihood phylogeny for S100A5, S100A6 and their close
homologs. Wedges denote collections of paralogs (S100A1, S100A2, S100A3,
S100A4, S100A5, or S100A6). Wedge height corresponds to the number of
sequences and wedge length to the longest branch in that clade. SH supports,
estimated using an approximate likelihood ratio test [236], are shown above the
branches. Scale bar shows branch length in substitutions per site. Reconstructed
ancestors are denoted with circles. All proteins, with the exception of those in
the A1 clade, are taken from amniotes. A1 contains S100 proteins from bony
vertebrates and was used as an outgroup to root the tree. Panels B and C
show relative conservation of residues across amniote paralogs mapped onto the
structures of hA5 (2KAY, [207]) and hA6 (1K96, [286]). Colors denote conservation
from < 20% (dark red) to 100% (white). Sequences were taken from the alignment
used to generate the phylogeny in panel A. Dashed circles denote the peptide
binding surface for one of the two chains. Blue spheres show the location of bound
Ca2+ in the structures.
We first constructed a maximum–likelihood phylogeny of the clade containing
S100A2, S100A3, S100A4, S100A5, and S100A6 (Fig 16A). We built the tree using
the EX/EHO+Γ8 evolutionary model [287], which uses different evolutionary
models for sites in different structural classes. As expected from previous
phylogenetic and syntenic analyses [91, 96], S100A5 and S100A6 are paralogs
that arose by gene duplication in the amniote ancestor, with S100A2, S100A3, and
S100A4 forming a closely related outgroup (Fig 16A). To set our expectation for
conservation of specificity, we then calculated the conservation of residues at the
binding site across S100A5 and S100A6 homologs. Fig 16B and C show the relative
conservation of residues on hA5 (Fig 16B) and hA6 (Fig 16C). Taken as a whole,
the peptide binding region does not exhibit higher conservation than other regions
in the protein. We therefore predicted substantial variability in the peptide binding
specificity across S100A5 and S100A6 orthologs.
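This section does not define the conservation score behind Fig 16B and C. One simple choice, assumed in the sketch below, is the fraction of aligned sequences that share the most common residue at each column, ignoring gaps. Comparing the mean score over interface columns with the mean over all other columns then tests whether the binding region is unusually conserved. The toy alignment and the interface column indices are hypothetical.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Per-column fraction of non-gap residues equal to the most common residue."""
    scores = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != gap]
        if not residues:
            scores.append(0.0)  # all-gap column: treated as unconserved
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


def mean_score(scores, columns):
    """Mean conservation over a collection of 0-based column indices."""
    columns = list(columns)
    return sum(scores[i] for i in columns) / len(columns)


# Hypothetical four-sequence alignment; columns 1 and 2 stand in for the interface.
alignment = ["MKLA", "MKIA", "MRLS", "MKLS"]
scores = column_conservation(alignment)
interface = {1, 2}
rest = set(range(len(scores))) - interface
print(scores, mean_score(scores, interface), mean_score(scores, rest))
```

In this toy case the interface and non–interface columns have the same mean score. That is the kind of result the text reports for the S100A5 and S100A6 binding region.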
To test the prediction that specificity has fluctuated over time, we
expressed and purified S100A5 and S100A6 orthologs from human, mouse (Mus
musculus), Tasmanian devil (Sarcophilus harrisii), American alligator (Alligator
mississippiensis), and chicken (Gallus gallus). We then characterized the peptide
binding specificity of these S100A5 and S100A6 orthologs against four peptides:
A5cons, A6cons, SIP, and NCX1 (Fig 17A). We selected these peptides because
there is direct evidence that these peptides bind at the canonical binding interface
(Fig 15, as well as [280, 281]). Surprisingly, we found that binding specificity was
broadly conserved among orthologs within each of the S100A5 and S100A6 clades (Fig 17A). All
S100A5 orthologs bound NCX1, A5cons, and A6cons, but not SIP. In contrast,
all S100A6 orthologs bound SIP and A6cons, but not A5cons. The only labile
character is NCX1 binding to S100A6. The sauropsid and marsupial S100A6
orthologs bound NCX1, but not the eutherian mammal representatives. We also
characterized binding of these peptides to human S100A4 as an outgroup. Binding
for this protein was intermediate between the S100A5 and S100A6 clades: it bound
A5cons and A6cons, but not SIP or NCX1. Thermodynamic parameters for these
binding experiments are given in Tables 4–7 in supplement. Representative ITC
traces for each protein are shown in Fig 33 in supplement.
FIGURE 17 S100A5 and S100A6 paralogs exhibit conserved properties. A) Peptide
binding specificity mapped onto the phylogenetic tree as a collection of binary
characters. Each square denotes binding of a specific peptide to an ortholog
sampled from the species indicated at right. Squares are filled if binding was
observed by ITC. Ancestors are shown in the middle, with red arrows indicating
changes that occurred after duplication that were then conserved across orthologs.
The results for ancA5/A6 were identical for both the ML and “AltAll” ancestors.
Full thermodynamic parameters are in Tables 4–7 in supplement. B) Far–UV
CD spectra for apo (gray) and Ca2+–bound (purple) hA5. C) Far–UV CD spectra for
apo (gray) and Ca2+–bound (orange) hA6. D) Spectroscopic properties mapped
onto the phylogeny. The left column shows the ratio of the CD signal at 222 nm to
that at 208 nm for the apo protein. The right column shows the percentage increase
in signal at 222 nm upon addition of Ca2+. Dashed lines show the mean values
across all experiments. Raw spectra are given in Fig 34 in supplement.
The strong conservation of peptide binding suggested that other features—
such as structural features—might be conserved between paralogs as well. To
test for this, we characterized the secondary structure and response to Ca2+ for
all proteins using far–UV circular dichroism (CD) spectroscopy. A Ca2+–driven
change in α–helical secondary structure is a conserved feature of S100 proteins
[96, 100]. We asked whether this behavior was conserved across orthologs, which
would indicate similar structural properties. As with peptide binding, we found
that the CD spectrum and response to Ca2+ were characteristic of each clade (Fig
17B–D, Fig 34 in supplement). S100A5 orthologs exhibited deep minima at 208 and
222 nm, corresponding to a largely α–helical secondary structure (Fig 17B,D). This
signal increased upon addition of saturating Ca2+, consistent with the ordering
of the C–terminus of the human protein reported by NMR [207]. In contrast, all
S100A6 orthologs exhibited a deeper minimum at 208 nm, likely corresponding to a
mixture of α–helical and random coil secondary structure. The secondary structure
of these proteins changed comparatively little on addition of Ca2+ (Fig 17C,D).
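The two quantities plotted in Fig 17D can be computed directly from the CD spectra. The sketch below makes three assumptions: the signals at 208 and 222 nm are in the same units (for example, mean residue ellipticity), the Ca2+ response is the percentage change in the magnitude of the 222 nm signal, and the input values are hypothetical. The chapter does not state its exact sign convention.

```python
def ratio_222_208(signal_222, signal_208):
    """Ratio of the CD signal at 222 nm to that at 208 nm.

    Both helical minima are negative, so the ratio is positive; the units
    cancel as long as both wavelengths use the same units.
    """
    return signal_222 / signal_208


def percent_increase_222(apo_222, ca_222):
    """Percentage increase in the magnitude of the 222 nm signal on adding Ca2+."""
    return 100.0 * (abs(ca_222) - abs(apo_222)) / abs(apo_222)


# Hypothetical mean residue ellipticities (arbitrary but consistent units).
apo = {208: -25.0, 222: -20.0}
ca_222 = -23.0
print(ratio_222_208(apo[222], apo[208]), percent_increase_222(apo[222], ca_222))
```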
Specificity evolved from an apparently promiscuous ancestor
Surprisingly, despite the diversity of peptides that bind to each paralog,
peptide binding specificity is conserved across the orthologs of each paralog. We next asked
whether these proteins exhibited comparable evolutionary patterns to those
observed in high–specificity proteins, such as the partitioning of ancestral binding
partners along duplicate lineages [58, 272, 55]. Using our phylogeny, we performed
ancestral sequence reconstruction (ASR) to reconstruct the last common ancestors
of S100A5 orthologs (ancA5) and S100A6 orthologs (ancA6) [288]. These proteins
were well reconstructed, having mean posterior probabilities of 0.93 and 0.96,
respectively. Their sequences are given in Table 8 in supplement. We expressed
and purified both of these proteins. We found that they shared similar secondary
structures and Ca2+–binding responses with their descendants by far–UV CD (Fig
17D). We then measured binding to the suite of four peptides described above
using ITC. These ancestors gave the pattern we would expect given the binding
specificities of the derived proteins (Fig 17A). AncA5 is indistinguishable from a
modern S100A5 ortholog, binding A5cons, A6cons, and NCX1, but not SIP (Fig
17A). AncA6 also behaves as expected, binding A6cons and SIP, but not A5cons.
It does not bind NCX1, consistent with this character being labile in the S100A6
lineage (Fig 17A).
We next characterized the last common ancestor of S100A5 and S100A6
(ancA5/A6). This reconstruction had a mean posterior probability of 0.83 (Table
S6). AncA5/A6 has a secondary structure content identical to ancA6 and the
S100A6 descendants. It also responds to Ca2+ in a similar fashion (Fig 17D,
Fig 34 in supplement). Unlike any modern protein, however, ancA5/A6 binds
to all four peptides (Fig 17A). To verify that this result was not an artifact of the
reconstruction, we also made an “AltAll” version of ancA5/A6 in which we
swapped all ambiguous sites in the maximum–likelihood ancestor with their next
most likely alternative [50] (Table 8 in supplement, methods). This protein is quite
different from ancA5/A6, differing at 21 of 93 sites, but its binding profile for the
four peptides was identical to that of the maximum–likelihood ancestor. Thermodynamic
parameters for these binding experiments are given in Tables 4–7 in supplement.
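Two reconstruction summaries are used above: the mean posterior probability of a reconstruction and the AltAll sequence. Both can be derived from per–site posterior probabilities. The sketch below is a hypothetical illustration, not the ASR software used in the chapter. Each site is a dictionary of amino–acid posteriors, and a site counts as "ambiguous" when its second–best state has a posterior of at least 0.2. That cutoff is an assumption here; the criterion actually used is described in the methods rather than in this section.

```python
def mean_posterior(sites):
    """Mean, over sites, of the posterior probability of the most likely state."""
    return sum(max(site.values()) for site in sites) / len(sites)


def ml_and_altall(sites, alt_cutoff=0.2):
    """Return (ML sequence, AltAll sequence) from per-site posteriors.

    sites: list of {amino_acid: posterior} dicts, one per reconstructed site.
    At each site whose runner-up state has posterior >= alt_cutoff
    (hypothetical threshold), AltAll takes the runner-up instead of the ML state.
    """
    ml, altall = [], []
    for site in sites:
        ranked = sorted(site.items(), key=lambda item: item[1], reverse=True)
        best = ranked[0][0]
        ml.append(best)
        if len(ranked) > 1 and ranked[1][1] >= alt_cutoff:
            altall.append(ranked[1][0])
        else:
            altall.append(best)
    return "".join(ml), "".join(altall)


# Hypothetical posteriors for a four-site stretch of an ancestor.
sites = [{"M": 1.0}, {"K": 0.7, "R": 0.3}, {"L": 0.9, "I": 0.1}, {"E": 0.55, "D": 0.45}]
ml, alt = ml_and_altall(sites)
n_diff = sum(a != b for a, b in zip(ml, alt))
print(ml, alt, n_diff, round(mean_posterior(sites), 4))
```

Applied across a whole protein, the count of differing sites corresponds to the comparison reported above for ancA5/A6 (21 of 93 sites).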
Binding specificity can be changed with a single mutation
Our work revealed that S100A5 and S100A6, despite having low overall
specificity, display the same basic evolutionary patterns as high–specificity proteins
[58, 55, 273]: they exhibit conserved partners across modern orthologs and display
a pattern of subfunctionalization from a less specific ancestor. While suggestive,
this does not establish that there is selection to maintain specificity. Another
possibility is that switching specificity is intrinsically difficult, and that the pattern
we observe reflects this difficulty rather than selective pressure to maintain a
particular specificity profile.
To distinguish these possibilities, we attempted to shift the binding specificity
of hA5 by introducing mutations at the binding interface. We selected five
historical substitutions that occurred along the branch between ancA5/A6 and
ancA5: e2A, i44L, k54D, a78M, m83A (with the ancestral amino acid in lowercase
and modern amino acid in uppercase). We chose these substitutions using three
criteria: 1) the ancestral amino acid was conserved in S100A6 orthologs, 2) the
derived amino acid was conserved in S100A5 orthologs, and 3) the substitutions were
located at the peptide binding interface. Fig 18A shows the positions of candidate
substitutions mapped onto the structure of hA5 [207].
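The three selection criteria above amount to a simple filter over aligned sequences. The sketch below is hypothetical. It assumes gap–free sequences so that alignment positions equal residue numbers, represents the interface as a set of 1–based positions, and uses toy sequences rather than the real S100 alignment. Substitutions are written in the chapter's notation, with the ancestral residue in lowercase and the derived residue in uppercase.

```python
def candidate_reversions(anc, derived, a6_orthologs, a5_orthologs, interface):
    """Substitutions on the ancA5/A6 -> ancA5 branch meeting all three criteria.

    anc, derived: ancestral (ancA5/A6) and derived (ancA5) sequences, aligned
    a6_orthologs, a5_orthologs: aligned S100A6 and S100A5 ortholog sequences
    interface: set of 1-based positions at the peptide binding interface
    """
    hits = []
    for pos, (a, d) in enumerate(zip(anc, derived), start=1):
        if a == d or pos not in interface:  # no substitution, or off-interface
            continue
        ancestral_kept_in_a6 = all(seq[pos - 1] == a for seq in a6_orthologs)
        derived_kept_in_a5 = all(seq[pos - 1] == d for seq in a5_orthologs)
        if ancestral_kept_in_a6 and derived_kept_in_a5:
            hits.append(f"{a.lower()}{pos}{d}")
    return hits


# Toy sequences (hypothetical, not the real S100 alignment).
print(candidate_reversions("MEKIA", "MAKLM", ["MEKIA", "MEKIS"], ["MAKLM", "MAKVM"], {2, 4, 5}))
```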
We reversed each of these sites individually to the ancestral state in hA5.
We then measured binding of two clade–specific peptides, SIP and A5cons, to
each mutant using ITC (Table 9 in supplement). We found that reverting a
single substitution (A83m) to its ancestral state in hA5 enabled it to bind the
SIP peptide (Fig 18B). This reversion does not compromise binding to A5cons,
thus recapitulating the ancestral specificity (Table 9 in supplement). Reversion to the ancestral
methionine at residue 83 likely makes more favorable hydrophobic packing
interactions with the SIP peptide than the extant alanine. This demonstrates that
a single mutation at the peptide binding interface is capable of shifting specificity
in S100A5. None of the remaining four ancestral reversions led to measurable
changes in A5cons or SIP binding. Amino acids at these positions either do not
interact with these peptides, or the ancestral and derived amino acids interact in a
roughly equivalent fashion.
FIGURE 18 Small changes are sufficient to alter binding specificity at the interface.
A) Ca2+–bound structure of human S100A5 (2KAY) [207] with ancestral reversions
marked in gray (no effect on SIP binding) and red (A83m—allows SIP binding).
Blue spheres are Ca2+ ions. B) ITC traces showing titration of SIP onto hA5
A83m (red) versus wildtype hA5 (blue). ITC experiments were performed at
25 °C in 25 mM TES, 100 mM NaCl, 2 mM CaCl2, 1 mM TCEP, pH 7.4. Points
are integrated heats extracted from each shot. For each experiment, we sampled
fit parameters using Bayesian Markov chain Monte Carlo (MCMC) as implemented in
pytc. For the A83m curve, we used a single–site binding model. For the wt
curve, we used a blank dilution model, where the linear slope is indicative of
peptide dilution without binding. Lines are 100 different solutions drawn from
the Bayesian posterior probability distributions. C) ITC traces from experiments
done at multiple temperatures: 10 °C (purple), 15 °C (green), 20 °C (blue), and 25 °C
(red). Experiments were performed in 25 mM TES, 100 mM NaCl, 2 mM CaCl2,
1 mM TCEP, pH 7.4. Dots are integrated heats with uncertainty calculated using
NITPIC [264]. There is a clear temperature dependence of the binding enthalpy.
A global van’t Hoff model was fit to the data using the Bayesian MCMC fitter
in pytc. We were unable to fit the model without inclusion of a ΔC°p parameter
(−0.40 ≤ −0.36 ≤ −0.32 kcal·mol⁻¹·K⁻¹), suggesting a change in heat capacity
on binding (that is, a temperature–dependent binding enthalpy), which is indicative
of a hydrophobically driven interaction. Lines are 100 curves drawn from the posterior
distribution of the fits. Fit parameters are given in Table 9 in supplement. D) Van’t Hoff
plot of the temperature dependence data. The thick black line shows the maximum–likelihood
curve; gray lines are 500 curves drawn from the posterior distribution of the Bayesian
fit. There is slight but detectable curvature in the plot, consistent with the small ΔC°p
parameter obtained from the global model.
Another way to view specificity is in terms of binding mechanism. If binding
affinity is mostly due to the hydrophobic effect, we would predict that binding could
be altered relatively easily by small changes to packing interactions. To
test the relative contributions of the hydrophobic effect versus polar contacts to
binding affinity, we carried out a van’t Hoff analysis for the binding of A5cons to hA5. We
performed ITC at temperatures ranging from 10 °C to 25 °C and then globally
fit van’t Hoff models to the binding isotherms (Fig 18C–D). We first attempted
fits assuming a temperature–independent binding enthalpy (ΔC°p = 0), but the fits
did not converge. When we allowed ΔC°p to float, we found it was negative
(−0.40 ≤ −0.36 ≤ −0.32 kcal·mol⁻¹·K⁻¹), indicating that binding is driven by the hydrophobic
effect [289]. This observation is consistent with binding at the hydrophobic surface
exposed by the Ca2+–induced conformational change [207] and may help to explain
why specificity can be readily altered via a single substitution in the interface.
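The role of ΔC°p in the global fit can be made concrete with the integrated van’t Hoff relations. In these relations, a constant heat capacity change makes both ΔH and ΔS temperature dependent:

ΔH(T) = ΔH_ref + ΔC°p (T − T_ref)
ΔS(T) = ΔS_ref + ΔC°p ln(T/T_ref)
K = exp(−ΔG/RT)

The sketch below shows only these relations, not pytc's fitting code. The value ΔC°p = −0.36 kcal·mol⁻¹·K⁻¹ is taken from the text. The reference ΔG and ΔH at 25 °C are hypothetical values chosen for illustration.

```python
import math

R = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def association_constant(T, T_ref, dG_ref, dH_ref, dCp):
    """K_A (M^-1) at temperature T (kelvin) from an integrated van't Hoff model.

    dG_ref and dH_ref are in kcal/mol at T_ref; dCp is in kcal mol^-1 K^-1
    and is assumed constant over the temperature range.
    """
    dS_ref = (dH_ref - dG_ref) / T_ref
    dH = dH_ref + dCp * (T - T_ref)
    dS = dS_ref + dCp * math.log(T / T_ref)
    dG = dH - T * dS
    return math.exp(-dG / (R * T))


T_REF = 298.15
DG_REF, DH_REF = -6.4, -8.0  # hypothetical values at 25 C (kcal/mol)
DCP = -0.36                  # from the van't Hoff analysis in the text

for celsius in (10, 15, 20, 25):
    T = celsius + 273.15
    kd_uM = 1e6 / association_constant(T, T_REF, DG_REF, DH_REF, DCP)
    print(f"{celsius} C: Kd ~ {kd_uM:.1f} uM, dH = {DH_REF + DCP * (T - T_REF):.2f} kcal/mol")
```

Because ΔC°p is negative, ΔH becomes more negative as temperature rises. This is the temperature dependence of the binding enthalpy seen in the ITC data.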
Discussion
Our work highlights the paradoxical nature of peptide binding specificity
for these low–specificity S100 proteins. The binding interface has low specificity,
interacting with very diverse peptides with no obvious binding motif (Fig 14B).
Further, the specificity is fragile, and can be altered with a single point mutation
(Fig 18). One might therefore conclude that this binding specificity is only weakly
constrained. Yet binding specificity has been conserved over 320 million
years along both lineages, exhibiting a pattern of subfunctionalization similar to
what has been observed previously for the evolution of high–specificity proteins
(Fig 17). This strongly suggests that binding specificity is important, despite
being very broad.
Low specificity through a hydrophobic interface
The binding specificity of these proteins is likely driven almost entirely
by shape complementarity and packing. The protein interface exposed on Ca2+
binding is hydrophobic and likely makes few protein–peptide polar contacts. This
prediction is supported, at least for the hA5/A5cons interaction, by the negative
ΔC°p on binding, pointing to an important contribution from the hydrophobic
effect on binding (Fig 18C). The lack of polar contacts is the likely explanation
for the low specificity of the interface. Peptides need only match hydrophobicity
and packing, meaning that a large number of possible peptides bind with similar
affinity.
The hydrophobic nature of the interface explains the low specificity, but
makes the conservation of specificity over 320 million years quite surprising.
There is likely no diagnostic set of polar contacts that can be conserved to maintain
specificity. It should therefore be straightforward to change specificity with minimal
perturbation. Indeed, we found that a single mutation, from a small to a large
hydrophobic amino acid, is able to switch the specificity of the interface (Fig 18A).
Yet, over evolutionary time, binding specificity—at least for this set of targets—has
been maintained (Fig 17). Amazingly, this is achieved without strict conservation
of the binding site. The peptide binding region does not exhibit higher conservation
than other residues in either S100A5 or S100A6 (Fig 16B–C).
Our work shows that protein binding specificity is likely an important feature
of these proteins, but does not reveal the set of biological targets for S100A5 and
S100A6. Identifying these targets will require further experiments. This could
include coupling S100A5 and S100A6 knockouts to proteomics or transcriptomics,
pull–downs followed by proteomics, and/or large–scale screens of peptide targets
via a technique like phage display. We also anticipate that external factors—such
as coexpression, large complex assembly, and subcellular localization—will add
critical additional layers of specificity to the low–specificity binding interfaces of
these proteins. Understanding the interplay between the biochemical specificity and
these external factors will be important for dissecting the biology of these proteins.
S100s may allow the evolution of new calcium regulation
The existence of a conserved set of binding partners also has intriguing
implications for the evolution of Ca2+ signaling pathways in vertebrates. This can
be seen by contrasting S100 proteins with calmodulin, a protein that also exposes a
protein interaction surface and regulates the activity of target proteins in response
to Ca2+ [84]. It has been proposed that calmodulin provides a universal Ca2+
response across tissues, while S100 proteins allow for fine–tuned, tissue–specific
responses [100, 89]. Our results allow us to extend this idea along an evolutionary
axis.
Our results suggest that S100 proteins may provide a minimally pleiotropic
pathway for the evolution of new Ca2+ regulation. Calmodulin is broadly expressed
across tissues. As a result, a mutation that causes a protein to interact with
calmodulin will have the same effect in all tissues where that protein is expressed.
This could lead to unfavorable pleiotropic effects that prevent fixation of the
mutation. In contrast, S100 proteins have highly differentiated tissue expression.
S100A5, for example, is expressed almost exclusively in olfactory tissues. This
means that a protein that acquires an interaction with S100A5 will do so only
in olfactory tissue, with minimal pleiotropic effects in other tissues. The pattern
of subfunctionalization we observed is consistent with this idea (Fig 17A), as
subfunctionalization is one way to escape adaptive conflict that arises due to
pleiotropic effects of mutations [66, 290]. This is only possible because S100A5
evolved a distinct binding profile relative to S100A6 (and presumably other S100
proteins), meaning that acquisition of a new S100A5 interaction does not imply an
interaction with a large number of other S100 proteins, which would itself lead to
extensive pleiotropy.
Additionally, our results suggest that S100 proteins would provide a much
simpler path for the evolution of new Ca2+ regulation than calmodulin. The
calmodulin sequence has been highly conserved for over a billion years across
fungi and animals. As a result, evolution of a new calmodulin–regulated target
requires that the target change its sequence to bind to calmodulin.
This would likely mean that slowly evolving proteins would not be able to evolve
Ca2+ regulation, as neither calmodulin nor a potential new target would be able
to acquire the necessary mutations to form the new interaction. In contrast, S100
proteins are evolving rapidly. For example, human S100A5 and S100A6 are only
53% identical in sequence, despite sharing an ancestor ≈ 320 million years ago. This
means that, particularly after gene duplication, S100 proteins can acquire new
interactions through mutations to the S100 itself. This would allow them to capture
slowly evolving target proteins, opening a different avenue for the evolution of Ca2+
regulation that would not be accessible by calmodulin alone.
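The 53% figure is a pairwise sequence identity. The chapter does not state how gaps were handled. The sketch below assumes a pre–aligned pair and counts identities over columns where neither sequence has a gap, which is one of several common conventions. The sequences in the example are toy strings, not S100A5 or S100A6.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped (an assumed
    convention; others divide by alignment length or by the shorter sequence).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy aligned pair: 3 identities over 4 ungapped columns.
print(percent_identity("MKLA-", "MRLAS"))
```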
Evolution of low–specificity proteins
Our results also shed light on the evolution of low specificity proteins in
general. Many proteins besides S100 proteins exhibit low specificity, including
other signaling proteins [84, 83], hub proteins [81, 269, 145, 82], and many others
[267, 79, 268, 85, 87]. Further experiments will be required to determine the
generality of our observations for low–specificity proteins, but our work suggests
that low–specificity proteins can evolve with similar dynamics to the high–
specificity proteins that have been studied in detail. Partners for low–specificity
proteins can be strongly conserved and evolve by subfunctionalization, just like a
high–specificity protein.
One important question is whether S100A5 and S100A6 did, indeed, gain
specificity over time. The current study, like many others [291, 292, 293, 271,
294, 58, 119], revealed an ancestral protein that appears less specific than its
descendants. Some have proposed this is a general evolutionary trend [291, 271,
119]. Caution is warranted before interpreting these data as evidence for this
hypothesis. We selected a small set of peptides to study; therefore, other patterns
may be consistent with our observations. For example, it could be that the proteins
both acquired more peptides that we did not sample in this experiment (actual
neofunctionalization), while becoming more specific for the chosen set of targets
(apparent subfunctionalization). Particularly given the large number of targets
for these proteins, distinguishing these possibilities will require an unbiased,
high–throughput approach to measuring specificity. Advances in high–throughput protein
characterization have made such experiments tractable [295, 296, 149, 297, 298].
With the right method, we will be able to resolve whether the shifts in specificity
we observed indeed reflect increased specificity over evolutionary time, or instead
the small size of the binding set we investigated.
Whatever the precise evolutionary process, our results reveal that S100
proteins—despite binding diverse peptides at a low–specificity hydrophobic
interface—have maintained the same binding profile for the last 320 million years.
Low specificity implies neither an absence of specificity nor a lack of evolutionary constraint.
Materials and Methods
Molecular cloning, expression and purification of proteins
Synthetic genes encoding the S100 proteins, codon–optimized for
expression in E. coli, were ordered from GenScript. The accession numbers for
the modern sequences are: Homo sapiens S100A5: P33763, S100A6: P06703;
Mus musculus S100A5: P63084, S100A6: P14069; Sarcophilus harrisii S100A5:
G3W581, S100A6: G3W4S8; Alligator mississippiensis S100A5: XP_006264408.1,
S100A6: XP_006264409.1; Gallus gallus S100A6: Q98953. All accession
numbers are for the UniProt database [299], with the exception of the Alligator
mississippiensis accessions, which are for the NCBI database [300].
Genes were sub–cloned into a pET28/30 vector containing an N–terminal
His tag with a TEV protease cleavage site (Millipore). Expression was carried
out in Rosetta (DE3) pLysS E. coli cells. 1.5 L cultures were inoculated at a
1:100 ratio with saturated overnight culture. E. coli were grown to high log–phase
(OD600 ≈ 0.8–1.0) with 250 rpm shaking at 37 °C. Cultures were induced
by addition of 1 mM IPTG along with 0.2% glucose and grown overnight at 16 °C.
Cultures were centrifuged and the cell pellets were frozen at −20 °C and stored for up to 2
months. Lysis of the cells was carried out via sonication in 25 mM Tris, 100 mM
NaCl, 25 mM imidazole, pH 7.4.
Purification of all S100s used in this study was carried out as follows. The
initial purification step was performed using a 5 mL HiTrap Ni–affinity column (GE
Healthcare) on an ÄKTA PrimePlus FPLC (GE Healthcare). Proteins were
eluted using a 25 mL gradient from 25–500 mM imidazole in a background buffer
of 25 mM Tris, 100 mM NaCl, pH 7.4. Peak fractions were pooled and incubated
overnight at 4 °C with ≈1:5 TEV protease (produced in the lab). TEV protease
removes the N–terminal His–tag from the protein and leaves a small Ser–Asn
sequence N–terminal to the wildtype starting methionine. Next, hydrophobic
interaction chromatography (HIC) was used to purify the S100s from remaining
bacterial proteins and the added TEV protease. Proteins were passed over a
5 mL HiTrap phenyl–sepharose column (GE Healthcare). Due to the Ca2+–dependent
exposure of a hydrophobic binding surface, the S100 proteins adhere
to the column only in the presence of Ca2+. Proteins were pre–saturated with 2 mM
Ca2+ before loading on the column and eluted with a 30 mL gradient from 0 mM
to 5 mM EDTA in 25 mM Tris, 100 mM NaCl, pH 7.4. Peak fractions were pooled
and dialyzed against 4 L of 25 mM Tris, 100 mM NaCl, pH 7.4 buffer overnight at
4 °C to remove excess EDTA. The proteins were then passed once more over the
5 mL HiTrap Ni–affinity column (GE Healthcare) to remove any uncleaved
His–tagged protein. The cleaved protein was collected in the flow–through. Finally,
protein purity was examined by SDS–PAGE. If any trace contaminants appeared
to be present we performed anion chromatography with a 5mL HiTrap DEAE
column (GE). Proteins were eluted with a 50mL gradient from 0–500mM NaCl in
25mM Tris, pH 7.08.5 (dependent on protein isolectric point) buffer. Pure proteins
were dialyzed overnight against 2L of 25mM TES (or Tris), 100mM NaCl, pH 7.4,
containing 2 g Chelex–100 resin (BioRad) to remove divalent metals. After final
purification step, the purity of proteins products was assessed by SDS PAGE and
MALDI–TOF mass spectrometry to be >95%. Final protein products were flash
106
frozen, dropwise, in liquid nitrogen to form frozen spherical pellets and stored at
−80◦C. Protein yields were typically on the order of 25mg/1.5L of culture.
Isothermal titration calorimetry
ITC experiments were performed in 25 mM TES, 100 mM NaCl, 2 mM CaCl2,
1 mM TCEP, pH 7.4. Although most experiments were performed at 25◦C, some
were done at lower temperatures to ensure measurable binding heats
and sufficient curvature for fitting. Samples were equilibrated and degassed by
centrifugation at 18,000 × g at the experimental temperature for 30 minutes.
Peptides (GenScript, Inc.) were dissolved directly into the experimental buffer
prior to each experiment. All experiments were performed on a MicroCal
ITC–200 or a MicroCal VP–ITC (Malvern). Gain settings were determined on
a case–by–case basis to ensure data quality. A 750 rpm syringe stir speed was
used for all ITC–200 experiments, while 400 rpm was used for experiments
on the VP–ITC. Spacing between injections ranged from 300–900 s depending
on gain settings and relaxation time of the binding process. These settings were
optimized for each binding interaction that was measured. Titration data were
fit to a single–site binding model using the Bayesian fitter in pytc. For each
protein/peptide combination, one clean ITC trace was used to fit the binding
model. Negative results were double–checked to ensure accuracy. Some were repeated
at lower temperatures (10◦C or 15◦C) to confirm lack of binding: because peptide
binding enthalpy is expected to depend on temperature, a binding event that happens
to have a near-zero enthalpy at one temperature should produce measurable heat at another.
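The single–site model referred to above can be illustrated with a short sketch. This is not pytc code and does not use the pytc API; it is a minimal NumPy illustration that assumes an idealized titration in which each injection simply adds to the cell volume (real instruments use an overfilled cell, which needs a displaced-volume correction) and ignores heats of dilution. All function names and parameter values are hypothetical.

```python
import numpy as np

def complex_conc(p_tot, l_tot, K):
    """Concentration (M) of the 1:1 protein-peptide complex.

    Exact root of K = [PL] / ((p_tot - [PL]) * (l_tot - [PL])), written as
    2*p*l / (b + sqrt(b^2 - 4*p*l)) to avoid cancellation for tight binding.
    """
    b = p_tot + l_tot + 1.0 / K
    disc = np.sqrt(b * b - 4.0 * p_tot * l_tot)
    return 2.0 * p_tot * l_tot / (b + disc)

def injection_heats(p_cell, l_syringe, cell_vol, inj_vol, n_inj, K, dH):
    """Heat of each injection (units of dH * mol), idealized additive volumes.

    p_cell, l_syringe: starting cell protein and syringe peptide conc. (M)
    cell_vol, inj_vol: volumes (L); K: association constant (1/M)
    """
    heats = []
    q_prev = 0.0
    for i in range(1, n_inj + 1):
        vol = cell_vol + i * inj_vol
        p_tot = p_cell * cell_vol / vol          # protein diluted by injections
        l_tot = l_syringe * i * inj_vol / vol    # cumulative peptide added
        q = dH * vol * complex_conc(p_tot, l_tot, K)  # total heat so far
        heats.append(q - q_prev)                 # incremental heat per injection
        q_prev = q
    return np.array(heats)

# Hypothetical example: 50 uM protein in 200 uL, 500 uM peptide, 20 x 2 uL
heats = injection_heats(50e-6, 500e-6, 200e-6, 2e-6, 20, K=1e6, dH=-5000.0)
```

The curvature needed for a well-determined fit depends on the Wiseman parameter c = K·[protein]cell; fits are commonly cited as best constrained for c roughly between 1 and 1000.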
2D HSQC NMR experiments
We collected 2D 1H–15N TROSY-HSQC NMR spectra for 2 mM hA5 in
the presence of Ca2+ alone and with the addition of 2 mM A5cons peptide. We also
collected the spectrum of 0.5 mM hA5 with the addition of 0.5 mM A6cons peptide,
which was done at lower concentration due to the poorer solubility of A6cons in
aqueous buffer. We transferred published assignments to the Ca2+-alone spectrum
(BMRB: 16033, [207]), and then used 3D NOESY-TROSY spectra to verify the
assignments. We were able to unambiguously assign 76 peaks of the 91 non-proline
amino acids in the Ca2+-bound form. We then added saturating A5cons
or A6cons peptide to the sample and remeasured the TROSY-HSQC spectrum, noting
which peaks had either shifted or entered intermediate exchange (disappeared) upon
addition of the peptide. Of the 76 unambiguously assigned non-proline amino acids,
26 shifted or disappeared in the A5cons-bound form, and 35 shifted or disappeared
in the A6cons-bound form.
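The text does not state the criterion used to call a peak "shifted." As an illustration only, the sketch below assumes a common choice: a weighted combined chemical shift perturbation, sqrt(Δδ_H² + (α·Δδ_N)²), with a hypothetical 15N weight α = 0.14 and a hypothetical 0.05 ppm cutoff. Residues assigned in the Ca2+-alone spectrum but missing from the peptide-bound spectrum are counted as broadened by intermediate exchange. The function name and toy shifts are hypothetical.

```python
import math

def classify_peaks(free, bound, alpha=0.14, cutoff=0.05):
    """Classify assigned residues after peptide addition.

    free, bound: dicts residue -> (1H ppm, 15N ppm).
    alpha (15N weight) and cutoff (ppm) are illustrative assumptions.
    """
    result = {}
    for res, (h0, n0) in free.items():
        if res not in bound:
            # Peak vanished: treated as intermediate exchange broadening
            result[res] = "disappeared"
            continue
        h1, n1 = bound[res]
        csp = math.sqrt((h1 - h0) ** 2 + (alpha * (n1 - n0)) ** 2)
        result[res] = "shifted" if csp > cutoff else "unchanged"
    return result

# Toy data (hypothetical residues and shifts)
calls = classify_peaks({1: (8.0, 120.0), 2: (7.5, 115.0), 3: (8.2, 118.0)},
                       {1: (8.0, 120.1), 2: (7.7, 115.0)})
n_perturbed = sum(v != "unchanged" for v in calls.values())
```

Counting "shifted" plus "disappeared" residues in this way corresponds to the tallies reported above (for example, 26 of 76 for A5cons).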
All NMR experiments were performed at 25◦C on an 800 MHz (18.8 T)
Bruker spectrometer at Oregon State University. TROSY spectra were collected
with 32 transients, 1024 direct points with a spectral width of 12820 Hz in 1H, and 256 indirect
points with a spectral width of 2837 Hz in 15N. NOESY-TROSYs were run with
8 transients, non-uniform sampling with 15% of data points used, and a 150 ms
mixing time. All spectra were processed using NMRPipe [301]; data were visualized
and assignments transferred using the CCPNMR Analysis program [302].
Far–UV CD spectroscopy
Far–UV circular dichroism spectra (200–250 nm) were collected on a J–815
CD spectrometer (Jasco) with a 1 mm quartz cell (Starna Cells, Inc.). We
prepared 20–40 µM samples in a Chelex (Bio–Rad) treated, 25 mM TES (Sigma),
100 mM NaCl (Thermo Scientific) buffer at pH 7.4. Samples were centrifuged at
18,000 × g at 25◦C in a temperature–controlled centrifuge (Eppendorf) before
experiments. Spectra were measured in the absence and presence of saturating
Ca2+. Reversibility of Ca2+–induced structural changes was confirmed by
subsequently adding a molar excess of EDTA to the Ca2+–saturated samples and
repeating the measurements. Five scans were collected for each condition and
averaged to minimize noise. A buffer blank spectrum was subtracted using the
built–in subtraction feature in the Jasco spectral analysis software. Raw ellipticity
was then converted into mean residue molar ellipticity based on the concentration and
residue length of each protein. These calculations were performed on the buffer–blanked
data.
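The averaging, blank subtraction, and unit conversion described above can be sketched as follows. Blank subtraction was done in the Jasco software; it is reproduced here in NumPy only for illustration. The conversion uses the standard relation [θ] = θ(mdeg) / (10 · c · l · N), giving deg cm² dmol⁻¹ for molar concentration c, path length l in cm (0.1 cm for the 1 mm cell), and N residues. The residue count and signal values in the example are hypothetical.

```python
import numpy as np

def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert ellipticity (mdeg) to mean residue molar ellipticity
    (deg cm^2 dmol^-1): [theta] = theta_mdeg / (10 * c * l * N)."""
    theta = np.asarray(theta_mdeg, dtype=float)
    return theta / (10.0 * conc_molar * path_cm * n_residues)

def process_scans(sample_scans, blank_scans, conc_molar, path_cm, n_residues):
    """Average repeated scans, subtract the buffer blank, then convert."""
    sample = np.mean(sample_scans, axis=0)   # e.g. five scans per condition
    blank = np.mean(blank_scans, axis=0)
    return mean_residue_ellipticity(sample - blank, conc_molar, path_cm, n_residues)

# Hypothetical: 20 uM protein, 1 mm cell, 90 residues, -20 mdeg at one wavelength
mre = mean_residue_ellipticity(-20.0, 20e-6, 0.1, 90)
```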
Preparation of biotinylated proteins for phage display
A small amount of each purified protein was biotinylated
using the EZ–link BMCC–biotin system (ThermoFisher Scientific). This kit
uses a maleimide linker to attach biotin at a Cys residue on the protein. ≈1 mg
BMCC–biotin was dissolved directly in 100% DMSO to a concentration of 8 mM
for labeling. Proteins were exchanged into 25 mM phosphate, 100 mM NaCl, pH 7.4
using a NAP–25 desalting column (GE Health Science) and degassed for 30 minutes
at 25◦C using a vacuum pump (Malvern Instruments). While stirring at room
temperature, 8 mM BMCC–biotin was added dropwise to a final 10X molar excess.
Reaction tubes were sealed with PARAFILM (Bemis) and the maleimide–thiol
reactions were allowed to proceed for 1 hour at room temperature with stirring.
The reactions were then transferred to 4◦C and incubated with stirring overnight
to allow completion of the reaction. Excess BMCC–biotin was removed from the
labeled proteins by exchanging again over a NAP–25 column (GE Health Science),
followed by three concentration–wash steps on a Nanosep 3K spin
column (Pall Corporation), into the Ca–TeBST loading buffer. Complete
labeling was confirmed by MALDI–TOF mass spectrometry by observing the
≈540 Da shift in the protein peak. Final stocks of labeled proteins were prepared
at 10 µM by dilution into the loading buffer.
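The reagent arithmetic behind the 10X molar excess, and the mass-shift check, can be sketched as below. This assumes one labeled Cys per protein chain (so chain concentration is the relevant quantity) and uses a hypothetical ±10 Da tolerance around the ≈540 Da shift; function names and example numbers are illustrative.

```python
def reagent_volume_ul(protein_um, protein_volume_ul, molar_excess=10.0, stock_mm=8.0):
    """Volume (uL) of BMCC-biotin stock giving the requested molar excess.

    Assumes one reactive Cys per protein chain.
    """
    protein_nmol = protein_um * protein_volume_ul / 1000.0  # uM * uL = pmol -> nmol
    reagent_nmol = molar_excess * protein_nmol
    return reagent_nmol / stock_mm                          # 1 mM = 1 nmol/uL

def labeling_complete(mass_unlabeled_da, mass_labeled_da,
                      expected_shift_da=540.0, tol_da=10.0):
    """True if the MALDI peak moved by the expected biotin-linker mass."""
    shift = mass_labeled_da - mass_unlabeled_da
    return abs(shift - expected_shift_da) <= tol_da

# Hypothetical: 1 mL of 100 uM protein needs 125 uL of 8 mM stock for 10X excess
vol = reagent_volume_ul(100, 1000)
```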
Phage display panning
Phage display experiments were performed using the PhD–12 peptide phage
display kit (NEB). All steps involving the pipetting of phage–containing samples
were performed using filter tips (Rainin) to prevent cross–contamination. Samples (100 µL)
containing phage (2.5 × 10^10 PFU), 0.01 µM biotinylated protein (or 0.01 µM biotin
in the negative control), and 50 µM peptide competitor (in competitor samples)
were prepared at room temperature in Ca–TeBST loading buffer
(25 mM TES, 100 mM NaCl, 2 mM CaCl2, 0.01% Tween–20, pH 7.4) to ensure
saturation of the S100s with Ca2+. Samples were incubated at room temperature
for 1 hr. Each sample was then applied to one well of a 96–well high–capacity
streptavidin plate (previously blocked using PhD–12 kit blocking buffer and washed
6X with 150 µL loading buffer). Samples were incubated on the plate with gentle
shaking for 20 min. Then 1 µL of 10 mM biotin (NEB) was added to each sample
on the plate and incubated for an additional five minutes to compete away purely
biotin–dependent interactions. Samples were then carefully removed from the plate
by pipetting and discarded. Each well was washed 5X with 200 µL of loading
buffer by applying the solution to the well and then immediately removing it by
pipetting. Finally, 100 µL of EDTA–TeBST (25 mM TES, 100 mM NaCl, 5 mM
EDTA, 0.01% Tween–20, pH 7.4) elution buffer was applied to each well and the
plate was incubated with gentle shaking for 1 hr at room temperature to elute. Two
replicates of the experiment were performed with each protein.
Eluates were carefully removed from the plate by pipetting and stored at 4◦C.
Eluates were titered to quantify enrichment as follows. Serial dilutions of the
eluates from 10^−1 to 10^−6 were prepared in LB medium. These were used to inoculate
200 µL aliquots of mid–log–phase ER2738 E. coli (NEB) by adding 10 µL to each.
Each 200 µL aliquot was then mixed with 3 mL of pre–melted top agar, applied to
an LB/agar/XGAL/IPTG (Rx Biosciences) plate, and allowed to cool. The plates
were incubated overnight at 37◦C to allow formation of plaques. The next morning,
blue plaques were counted and used to calculate the phage concentration (PFU/mL).
Enrichment was calculated as the ratio of the titer of each experimental sample to
that of the biotin–only negative control.
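The titer and enrichment calculation described above reduces to a few lines. This sketch assumes the count comes from a single plate in the countable range and that 10 µL of the diluted eluate was plated, as in the protocol; plaque counts in the example are hypothetical.

```python
def titer_pfu_per_ml(plaques, dilution, plated_volume_ml=0.010):
    """PFU/mL of the undiluted eluate.

    plaques: blue plaques counted on one plate
    dilution: dilution of the plated tube (e.g. 1e-4 for the 10^-4 tube)
    plated_volume_ml: volume of diluted eluate added to cells (10 uL)
    """
    return plaques / (plated_volume_ml * dilution)

def enrichment(sample_titer, control_titer):
    """Enrichment relative to the biotin-only negative control."""
    return sample_titer / control_titer

# Hypothetical counts: 50 plaques at 10^-4 (sample) vs 50 at 10^-2 (control)
e = enrichment(titer_pfu_per_ml(50, 1e-4), titer_pfu_per_ml(50, 1e-2))  # ~100-fold
```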
For subsequent rounds of panning, the eluates were amplified as follows.
Cultures (20 mL) of a 1:100 dilution of an ER2738 overnight culture were prepared. Each 20 mL
culture was inoculated with the entire remaining volume of one phage eluate. The
cultures were incubated at 37◦C with shaking for 4.5 hours to allow phage growth.
Bacteria were then removed by centrifugation, and the top 80% of the supernatant was
removed carefully with a filtered serological pipette and transferred to a fresh tube
containing 1/6 volume of PEG/NaCl (20% w/v PEG–8000, 2.5 M NaCl). Samples
were incubated overnight at 4◦C to precipitate phage. Precipitated phage were
isolated by centrifugation and subsequently purified by an additional PEG/NaCl
precipitation on ice for 1 hr. Isolated phage were each resuspended in 200 µL of sterile
loading buffer, titered to measure PFU/mL, and stored at 4◦C for use in the next
panning round. This process was repeated for 3 total rounds of panning. Plaques
were picked from the final-round eluate titer plates and amplified in 1 mL ER2738
culture for 4.5 hours. ssDNA was isolated from the phage cultures using the
Qiagen M13 spin kit. Ten plaques per replicate experiment were Sanger sequenced
(GeneWiz, Inc.). These plaque sequences were used to construct the A5cons and
A6cons consensus peptides.
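The text does not say how the consensus peptides were derived from the sequenced plaques. One simple possibility, shown here purely as an assumption, is a per-position plurality rule over equal-length, ungapped 12-mers. Real phage hits may bind in different registers and need alignment first, and ties are not handled specially here. The input sequences are hypothetical, not actual phage hits.

```python
from collections import Counter

def consensus(seqs):
    """Per-position plurality consensus of equal-length, ungapped peptides.

    Ties are not resolved by any explicit rule (an assumption to revisit).
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must all have the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

# Hypothetical 12-mers
cons = consensus(["ACDEFGHIKLMN", "ACDEFGHIKLMY", "ACDQFGHIKLMN"])
```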
Phylogenetics and ancestral reconstruction
We used targeted BLAST searches to build a database of 49 S100A2–S100A6
sequences sampled from across the amniotes, as well as six teleost fish S100A1
sequences as an outgroup. We attempted to achieve even taxonomic sampling
across amniotes. Database accession numbers are in Table 9 in the supplement. We
used MSAProbs for the initial alignment [234], followed by manual refinement.
Our final alignment is available as a supplemental Stockholm file (File S1 in
supplementary directory).
We constructed our phylogenetic tree using the EX/EHO+Γ8 model, which
incorporates information about secondary structure and solvent accessibility into
the phylogenetic inference [287]. We assigned the secondary structure and solvent
accessibility of each site using 115 crystallographic and NMR structures of S100A2,
S100A3, S100A4, S100A5 and S100A6 paralogs: 1a03, 1a4p, 1b4c, 1bt6, 1cb1, 1cdn,
1cfp, 1clb, 1cnp, 1ig5, 1igv, 1irj, 1jwd, 1k2h, 1k8u, 1k9p, 1ksm, 1kso, 1m31, 1mq1,
1nsh, 1ozo, 1psb, 1psr, 1sym, 1uwo, 1yur, 1yus, 2bca, 2bcb, 2cnp, 2cxj, 2jpt, 2jtt,
2k8m, 2kax, 2ki4, 2ki6, 2kot, 2l0p, 2l50, 2l5x, 2le9, 2lhl, 2llt, 2llu, 2lnk, 2pru, 2rgi,
2wc8, 2wcb, 2wce, 2wcf, 3ko0, 3nsi, 3nsk, 3nsl, 3nso, 3nxa, 1b1g, 1e8a, 1gqm, 1j55,
1k96, 1k9k, 1mho, 1mr8, 1odb, 1qlk, 1xk4, 1xyd, 1yut, 1yuu, 1zfs, 2egd, 2h2k,
2h61, 2k7o, 2kay, 2l51, 2psr, 2q91, 2wnd, 2wor, 2wos, 2y5i, 3c1v, 3cga, 3cr2, 3cr4,
3cr5, 3czt, 3d0y, 3d10, 3gk1, 3gk2, 3gk4, 3hcm, 3icb, 3iqo, 3lk0, 3lk1, 3lle, 3m0w,
3psr, 3rlz, 4duq, 1mwn, 1qls, 2k2f, 2kbm, 3iqq, 3rm1, 3zwh, 4eto. We calculated
the secondary structure for each site using DSSP and the solvent accessibility using
NACCESS [303, 304]. To remove redundancy (whether from identical sequences
solved under slightly different conditions or from the multiple models in NMR
ensembles), we took the majority–rule consensus secondary structure and the average
solvent accessibility for all structures with identical sequences before averaging
across unique sequences. We then assigned the secondary structure for each column
using a majority rule across unique sequences. We assigned the solvent accessibility
as the average across unique sequences at that site. Our structural annotation is
available in our alignment Stockholm file (File S1 in supplementary directory).
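The two-level redundancy removal described above (collapse structures with identical sequences first, then combine across unique sequences) can be sketched as follows. This is an illustration, not the code used in the study. It assumes each structure or NMR model has already been reduced to per-column secondary-structure labels (for example, DSSP states mapped to helix, sheet, or coil) and per-column accessibilities, with None at gapped columns. The input format, function names, and example values are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

def _majority(labels):
    labels = [x for x in labels if x is not None]
    return Counter(labels).most_common(1)[0][0] if labels else None

def _average(values):
    values = [x for x in values if x is not None]
    return mean(values) if values else None

def annotate_columns(structures):
    """structures: list of (sequence, ss_labels, accessibilities), one per
    structure or NMR model, with per-alignment-column lists (None = gap)."""
    groups = defaultdict(list)
    for seq, ss, acc in structures:
        groups[seq].append((ss, acc))
    # Level 1: collapse structures/models that share an identical sequence
    unique = []
    for members in groups.values():
        ss_cols = zip(*[m[0] for m in members])
        acc_cols = zip(*[m[1] for m in members])
        unique.append(([_majority(c) for c in ss_cols],
                       [_average(c) for c in acc_cols]))
    # Level 2: majority-rule SS and mean accessibility across unique sequences
    ss_final = [_majority(c) for c in zip(*[u[0] for u in unique])]
    acc_final = [_average(c) for c in zip(*[u[1] for u in unique])]
    return ss_final, acc_final
```

Collapsing identical sequences first prevents a protein that was solved many times (or deposited as a large NMR ensemble) from dominating the site annotation.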
We then constructed our tree using the EX/EHO+Γ8 model [287], enforcing
correct species relationships within groups of orthologs [128]. We compared the
final likelihood of this tree to trees generated using LG+Γ8 and JTT+Γ8 models
[240, 237]. Although the EX/EHO model has seven more floating parameters than
either LG or JTT, the final tree had a log–likelihood 61 units higher than the next–best
model. An AIC test strongly supports the more complex model (p = 3 × 10^−30).
One important output from an EX/EHO calculation is χ, a term that measures
the fraction of sites that use the structural models relative to a linear combination
of all of them [287]. For our analysis, χ = 0.72. We rooted the tree using the
S100A1 sequences, which included S100s from several bony fishes.
To reconstruct ancestors under the EX/EHO+Γ8 model, we used PAML with
each of the six possible EX/EHO matrices [288, 305],
as well as their linear combination. We then mixed the resulting ancestral posterior
probabilities using the secondary structure calls and apparent accessibility at each
site, as well as χ (see Equation 3 in [287]). The code implementing this approach is
posted on GitHub: https://github.com/harmslab/exexho_phylo_mixer. We assigned
gaps using parsimony. We generated the AltAll sequence as described in Eick et
al. [50]. This incorporates uncertainty in the reconstruction by taking the next–best
reconstruction at all ambiguous sites. We took each site at which the
posterior probability of the next–best reconstruction was greater than 0.20 and
introduced that alternate reconstruction at that site. Our AltAll sequence
differed from the maximum likelihood sequence at 21 positions (24% of sites). File
S2 in supplementary directory has the posterior probabilities of reconstructions at
each site in the ancestor, as well as the final sequences characterized.
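The AltAll rule described above (swap in the second-best state wherever its posterior probability exceeds 0.20) can be sketched as below. This is an illustration, not the study's code; it assumes per-site posteriors are already available as dictionaries and ignores gap assignment, which was done separately by parsimony. The example posteriors are hypothetical.

```python
def ml_and_altall(posteriors, threshold=0.20):
    """posteriors: list (one entry per site) of {amino_acid: posterior}.

    Returns (ML sequence, AltAll sequence); AltAll takes the second-best
    state at every site where that state's posterior exceeds threshold.
    """
    ml, alt = [], []
    for site in posteriors:
        ranked = sorted(site.items(), key=lambda kv: kv[1], reverse=True)
        best = ranked[0][0]
        ml.append(best)
        if len(ranked) > 1 and ranked[1][1] > threshold:
            alt.append(ranked[1][0])   # ambiguous site: use next-best state
        else:
            alt.append(best)
    return "".join(ml), "".join(alt)

# Hypothetical three-site posteriors
ml_seq, alt_seq = ml_and_altall([{"A": 0.9, "S": 0.1},
                                 {"L": 0.6, "I": 0.35, "V": 0.05},
                                 {"K": 0.75, "R": 0.25}])
```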
Bridge to Chapter VI
In this chapter, the behavior of two low–specificity proteins was characterized
following gene duplication from a shared ancestor. Two members of the S100
protein family, S100A5 and S100A6, were used as a model system. The biochemical
specificity of the two human proteins was characterized by measuring the binding of
two known peptide targets and two target peptides identified via phage display.
The human proteins displayed clear patterns of binding specificity. The
study was then expanded to characterize conservation of the specificity profiles
across clades of S100A5 and S100A6 orthologs. Surprisingly, despite the highly
variable nature of S100 binding partners, there was a clear signal of conservation
in specificity profiles. Finally, ancestral sequence reconstruction was used to
resurrect the last common ancestor of all S100A5 and S100A6 proteins. The
binding specificity of this ancestor for the same set of peptides was measured,
revealing an apparent pattern of subfunctionalization along both duplicate lineages.
Furthermore, careful biophysical experiments and a mutagenesis study showed
that peptide binding specificity is readily altered by a single amino
acid substitution and that binding is driven primarily by the hydrophobic effect. This
chapter revealed that proteins with low biochemical specificity nonetheless undergo
similar patterns of evolutionary change to high–specificity proteins following gene
duplication. Chapter VI introduces a new method for directly measuring the
biochemical specificity of proteins in an unbiased fashion. Random–peptide phage
display and high–throughput sequencing are combined with ancestral sequence
reconstruction to directly trace the evolution of binding specificity
in S100A5 and S100A6. This method improves upon the results presented in
Chapter V by allowing an estimate to be made of the entire set of possible binding
partners for each protein, including the ancestor. This technique highlights
the subtlety of evolutionary changes in specificity following a gene duplication.
While the low–throughput methods shown in Chapter V indicate that specificity
may have subfunctionalized in both the S100A5 and S100A6 lineages, the unbiased
high–throughput approach introduced in Chapter VI demonstrates that human
S100A5 has indeed undergone a constriction of specificity onto a subset of ancestral
binding partners. Meanwhile, the specificity of human S100A6 appears to have shifted
relative to the ancestor, indicative of neofunctionalization. This key result shows the importance
of using unbiased methods to probe the evolution of specificity. A low–throughput
method can suggest an incorrect picture of how specificity evolved, simply because
too few targets were sampled.
CHAPTER VI
EVOLUTION OF INCREASED BINDING SPECIFICITY IN
S100A5
Author Contributions
Lucas Wheeler and Michael Harms conceived the study and designed the
experiments. Lucas Wheeler performed all experiments. Michael Harms and Lucas
Wheeler analyzed experimental datasets. Michael Harms secured funding for the
work. Lucas Wheeler and Michael Harms wrote the manuscript and generated the
figures. Michael Harms and Lucas Wheeler edited the manuscript. All authors have
read and approved the manuscript.
Abstract
Some have hypothesized that ancestral proteins are, on average, less specific
than their descendants. If true, this would provide directionality to evolution and
suggest that reconstructed ancestral proteins would be practical starting points
for engineering. In support of this idea, studies of reconstructed ancestral proteins
have revealed ancestors that interact with more targets than their descendants.
These experimental results are, however, also compatible with divergence from a
common set of ancestral partners: the set of partners shifts, rather than shrinks,
along each lineage. We set out to distinguish these two possibilities for a historical
evolutionary transition. Previously, we studied the acquisition of peptide binding
specificity in the proteins S100A5 and S100A6. Using a handful of peptides, we
found that the reconstructed last common ancestor of these proteins bound to
more peptides than its descendants. In the current study, we revisit this transition,
estimating changes in the total set of peptides that bind to each protein using a
quantitative phage display experiment coupled to supervised machine learning.
We uncover a more nuanced picture of the historical transition. Human S100A5
exhibits increased specificity over time, binding a subset of the peptides recognized
by the ancestor. In contrast, human S100A6 actually loses specificity, acquiring new
targets and binding to a larger number of peptides than the ancestral protein. The
S100A5 result is a direct demonstration that the total set of partners recognized by
a protein can shrink over time. In contrast, our findings along the S100A6 lineage
caution against interpreting changes in binding for a small number of targets as
evidence that the ancestor is less specific than its descendants.
Introduction
Changes in protein specificity are critical for evolutionary change [292, 271,
290, 306, 77, 307, 55, 273]. One intriguing suggestion is that, on average, proteins
become more specific over evolutionary time [138, 76, 47]. If true, this would be
a directional “arrow” for protein evolution [112, 308, 119, 47]. Such features are
rare (and controversial) but could ultimately provide fundamental insight into the
evolutionary process [131, 47]. For example, increasing specificity might indicate
that proteins become less evolvable over time, as they have fewer promiscuous
interactions that can be exploited to acquire new functions. From a practical
standpoint, it has also been suggested that less-specific reconstructed ancestors
would be powerful starting points for engineering new protein functions [60].
There are several reasons that proteins may evolve towards higher specificity.
First, gene duplication followed by subfunctionalization could lead to a partitioning
of ancestral binding partners between descendants, and thus increase specificity
along each lineage [309, 58, 55, 273]. Second, as metabolic and interaction networks
become more complex, proteins must use more sophisticated rules to “parse” the
environment: if an ancestral protein had to discriminate between fewer targets
than modern proteins, it could be less specific and still achieve the same biological
activity [58]. Finally, on the deepest evolutionary timescales, it has been pointed
out that the proteome of the last universal common ancestor was small. As a
result, each protein would have been required to perform multiple tasks and hence
have lower specificity [138, 76].
The increasing-specificity hypothesis can be represented as a Venn diagram:
the set of targets recognized by the ancestor is larger than the sets of targets
recognized by its descendants (Fig 19A). The sets in this diagram consist of all
possible interaction targets, not just those encountered biologically. From an
evolutionary perspective, promiscuous interactions—targets that a protein does
not encounter biologically, but would recognize if present—are critical for the
evolution of new function and are thus a component of its specificity. Furthermore,
if ancestors are to be used as good starting points for engineering applications they
must possess a larger set of allowed binding partners.
Much of the empirical support for the increasing-specificity hypothesis
comes from ancestral reconstruction studies. The results from one such study
are shown in Fig 19B. We previously studied the evolution of peptide binding
specificity in the amniote proteins S100A5 and S100A6. These proteins bind
to ≈ 12 amino acid linear peptide regions of target proteins to modulate their
activity (Fig 19B) [94, 280, 207, 101, 277, 275, 278, 281, 89]. We found that
S100A5 and S100A6 orthologs bound to distinct peptides, but that the last
common ancestor bound to all of the peptides we tested (Fig 19B) [110]. Other
studies, probing other classes of interaction partners, have found similar results:
the ancestor interacts with a broader range of partners than extant descendants
[292, 58, 60, 310, 119, 291, 55, 293, 141, 311, 273, 110].
Such results are not, however, sufficient to test the increasing-specificity
hypothesis. This can be seen in Fig 19C, which illustrates two radically different
Venn diagrams consistent with our experimental observations of peptide binding in
Fig 19B. One possibility is increasing specificity (the descendant sets are smaller
than the ancestral set). Another possibility is shifting specificity (the descendant
sets remain the same size but diverge in their composition). Distinguishing these
possibilities requires estimating the populations in each region of the Venn diagram,
which can only be done with a much larger, unbiased sample of the set of binding
partners (Fig 19C).
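Estimating the populations of each region of the Venn diagram amounts to classifying every peptide in a common set by which proteins it binds. The sketch below assumes binary binder calls are already available as Python sets; the peptide names are hypothetical placeholders. Under increasing specificity, descendant binders should fall mostly in regions that include the ancestor; under shifting specificity, large descendant-only regions would appear.

```python
from collections import Counter

def venn_counts(anc, a5, a6):
    """Count peptides in each region of a three-set Venn diagram.

    anc, a5, a6: sets of peptides called as binders of each protein.
    Keys are (binds_anc, binds_a5, binds_a6) membership tuples.
    """
    counts = Counter()
    for pep in anc | a5 | a6:
        counts[(pep in anc, pep in a5, pep in a6)] += 1
    return counts

# Hypothetical binder sets
regions = venn_counts({"p1", "p2", "p3", "p4"}, {"p1", "p2"}, {"p2", "p5"})
```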
To perform a proper test for the evolution of increased specificity, we set
out to estimate changes in the total set of peptides between ancA5/A6 and
two of its descendants—human S100A5 (hA5) and human S100A6 (hA6). This
evolutionary transition is an ideal model to probe this question. We already have
a reconstructed ancestral protein that exhibits an apparent gain in specificity
over time, at least for a small collection of peptides [110]. Further, because they
bind to ≈12 amino acid peptides, the set of binders is discrete and enumerable
(20^12 ≈ 4 × 10^15 possible targets).
We estimated changes in the total sets of partners recognized by these
proteins using a combination of high-throughput characterization, machine learning,
and in vitro biochemistry. We start by measuring the protein-specific enrichment
of a huge collection of peptides using phage display. This is a noisy measure of
binding that also suffers from sampling issues, as each experiment samples a
different set of peptides. To solve these problems, we use supervised machine
learning to train models linking amino acid sequence to peptide enrichment for
each protein. We then calibrate these models against measured binding constants
for individual peptides. Finally, we apply each calibrated model to a common set of
one million peptides, allowing us to estimate the changes in the binding set for the
proteins over time. This approach provides a quantitative estimate of changes in
specificity over time, revealing that S100A5 and S100A6 did not evolve by a simple
process of increasing specificity. This implies that the evidence for the global
increasing-specificity hypothesis should be re-evaluated.
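The size of the 12-mer space, and the idea of scoring every protein's model on one shared peptide set, can be sketched as follows. The text does not describe how the common set of one million peptides was generated; uniform sampling over the 20 canonical amino acids with a fixed seed is an assumption made here for illustration, and it ignores any codon bias in the commercial library.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PEPTIDE_LENGTH = 12

# Number of distinct 12-mers over the 20 canonical amino acids (~4.1e15)
SPACE_SIZE = len(AMINO_ACIDS) ** PEPTIDE_LENGTH

def common_peptide_set(n, seed=0):
    """Draw n uniformly random 12-mers with a fixed seed, so that every
    protein's model can be applied to exactly the same peptides."""
    rng = random.Random(seed)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(PEPTIDE_LENGTH))
            for _ in range(n)]

# One million peptides cover about 2.4e-10 of the space
fraction_sampled = 1_000_000 / SPACE_SIZE
```

Because the same seed reproduces the same list, each calibrated model is evaluated on identical peptides, which is what makes the resulting Venn diagrams directly comparable.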
Results
Our goal was to measure changes in the total binding sets between
human S100A5 (hA5), human S100A6 (hA6), and their last common ancestor
(ancA5/A6). We therefore performed high-throughput characterization of peptide
binding to these three proteins. To account for uncertainty in the reconstructed
ancestral sequence, we studied two different versions of the last common ancestor:
“ancA5/A6” and “altAll.” ancA5/A6 is the maximum likelihood reconstruction of
the ancestral sequence; altAll has all ambiguous sites in the reconstruction flipped
to their next most-likely state [50, 110]. Both proteins have the same low-resolution
peptide specificity (Fig 19B) [110].
Estimating peptide interactions by phage display
We first assayed the binding of tens of thousands of peptides to each protein
using phage display. We panned a commercial library of randomized 12-mer
FIGURE 19 Testing the increased specificity hypothesis requires extensive
sampling of targets. A) Venn diagram of the increasing-specificity hypothesis.
The large circle is the set of targets recognized by the ancestor; the smaller circles are
the sets of targets recognized by its descendants. There is no strict requirement that
descendants be subsets of the ancestor. B) Experimentally measured changes in
peptide binding specificity for S100A5 and S100A6 (taken from [110]). Structure:
location of peptide (red) binding to a model of S100A5 (gray, PDB: 2KAY).
Bound Ca2+ are shown as blue spheres. Phylogeny: Boxes represent binding of
four different peptides (arranged left to right) to nine different proteins (arranged
top to bottom). A white box indicates the peptide does not bind that protein; a
colored box indicates the peptide binds. Colors denote ancA5/A6 (green), S100A5
(purple), and S100A6 (orange). Red arrows highlight ancestral peptides lost in the
modern proteins. C) Venn diagrams show overlap in peptide binding sets between
ancA5/A6, S100A5, and S100A6. Crosses denote experimental observations.
Columns show two evolutionary scenarios: increasing specificity (left) versus
shifting specificity (right). Rows show to different sampling methods: small sample
(top) versus random sampling (bottom). Colors are as in panel B.
peptides expressed as fusions with the M13 phage coat protein. The S100 peptide-
binding interface is only exposed upon Ca2+-binding (Fig 19B); therefore, we
performed phage panning experiments in the presence of Ca2+ and then eluted
the bound phage using EDTA (Fig 20A). The population of enriched phage
will be a mixture of phage that bind at the site of interest and phage that bind
adventitiously (blue and purple phage, Fig 20A). Peptides in this latter category
enrich in a Ca2+-dependent manner through avidity or binding at an alternate site
[312, 313]. To separate these populations, we repeated the panning experiment in
the presence of a saturating concentration of a competitor peptide known to bind
at the site of interest (Fig 20B) [110]. This should lower enrichment of peptides
that bind at the site of interest, while allowing any adventitious interactions to
remain. By comparing the competitor and non-competitor pools, we can distinguish
between actual and adventitious binders.
We performed this experiment with and without competitor, in biological
duplicate, for hA5, hA6, ancA5/A6, and altAll. We found that phage enriched
strongly for all proteins relative to a biotin-only control (Fig 35 in supplement).
Further, the addition of competitor peptide reduced enrichment in all
samples (Fig 35 in supplement). After panning, we sequenced the resulting phage
pools, as well as the input library, by Illumina sequencing. We applied strict quality control
(see methods), discarding any peptide that exhibited fewer than six counts (Fig 36 in
supplement). After quality control, we had a total of 265 million reads spread over
17 samples (Table 10 in supplement).
We estimated changes in the frequencies of peptides between samples
with and without competitor peptide. For each peptide i, we determined
E_i = −ln(β_i/α_i), where β_i and α_i are the frequencies of the peptide in the
FIGURE 20 Set of binding peptides can be estimated using phage display. Rows
show two different experiments, done in parallel, for each protein. Biotinylated,
Ca2+-loaded S100 is added to a population of phage either alone (row A) or with
saturating competitor peptide added in trans (row B). Phage that bind to the
protein (blue or purple) are pulled down using a streptavidin plate. Bound phage
are then eluted using EDTA, which disrupts the peptide binding interface. In the
absence of competitor (row A), phage bind adventitiously (purple) as well as at the
interface of interest (blue). In the presence of competitor (row B), only adventitious
binders are present.
non-competitor and competitor samples, respectively. Defined this way, a more
negative value of E corresponds to a larger decrease in peptide frequency upon
addition of competitor peptide. We used a clustering approach to estimate E for
≈40,000 different peptides for each protein. We found that E exhibited a bimodal
distribution for all four proteins, apparently reflecting two underlying processes
(Fig 21A, Fig 37 in supplement). The dominant peak consists of “unresponsive”
peptides whose frequencies change little in response to competitor peptide. A
second, broader, distribution describes “responsive” peptides whose frequencies
change dramatically with the addition of competitor. There was no systematic
difference in estimates of E between biological replicates (Fig 21B, Fig 38
in supplement). For hA5, the regression line between replicates has a slope of 1.06 and
an intercept of −0.05. This axis of variation explains ≈81% of the total variation
FIGURE 21 A subpopulation of the phage respond to the addition of competitor
peptide. A) Distribution of enrichment values for peptides taken from pooled
biological replicates of hA5. The measured distribution (gray) can be fit by the
sum of two Gaussian distributions: responsive (blue) and unresponsive (purple),
which sum to the total (yellow). B) Enrichment values from biological replicates
are strongly correlated. Axes are enrichment for replicate #1 or replicate #2.
Points are individual peptides. Distributions for each replicate are shown on the
top and right, respectively. The red dashed line is the best fit line (orthogonal
distance regression), explaining ≈81% of the variation in the data.
in the data. There are two distinct regions in the correlation plot, corresponding
to the unresponsive and responsive peptide distributions. The unresponsive
distribution forms a large cloud about zero. In contrast, the responsive peptide
distribution extends along the 1:1 line in a correlated fashion.
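The three quantities in this paragraph (per-peptide E, the two-Gaussian decomposition, and the orthogonal replicate regression) can be sketched in NumPy as below. This is not the clustering-based estimator used in the study. The read-count filter on combined counts and the 0.5 pseudocount are assumptions, the mixture is fit by a plain expectation–maximization (EM) loop because the text does not state the fitting procedure, and the regression is total least squares (equivalent to orthogonal distance regression with equal errors on both axes). "Fraction of variance along the fitted axis" is one reading of the ≈81% figure. All names and data are hypothetical.

```python
import numpy as np

def enrichment_E(noncomp, comp, min_count=6, pseudocount=0.5):
    """E_i = -ln(beta_i / alpha_i) from read-count dicts (peptide -> count)
    for the non-competitor (beta) and competitor (alpha) pools."""
    n_beta, n_alpha = sum(noncomp.values()), sum(comp.values())
    E = {}
    for pep in set(noncomp) | set(comp):
        b, a = noncomp.get(pep, 0), comp.get(pep, 0)
        if a + b < min_count:          # assumed quality filter
            continue
        beta = (b + pseudocount) / n_beta
        alpha = (a + pseudocount) / n_alpha
        E[pep] = -np.log(beta / alpha)
    return E

def tls_fit(x, y):
    """Orthogonal (total least squares) line through replicate E values.
    Returns slope, intercept, fraction of variance along the fitted axis."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    evals, evecs = np.linalg.eigh(np.cov(np.vstack([x, y])))  # ascending
    vx, vy = evecs[:, -1]                    # direction of largest variance
    slope = vy / vx
    return slope, y.mean() - slope * x.mean(), evals[-1] / evals.sum()

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def fit_two_gaussians(E, n_iter=200):
    """Two-component Gaussian mixture fit to E values by EM."""
    E = np.asarray(E, dtype=float)
    mu = np.percentile(E, [10, 90])          # spread initial means apart
    sd = np.full(2, E.std())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each peptide
        dens = np.stack([w[k] * normal_pdf(E, mu[k], sd[k]) for k in range(2)])
        resp = dens / dens.sum(axis=0)
        # M-step: update weights, means, and standard deviations
        nk = resp.sum(axis=1)
        w = nk / len(E)
        mu = (resp @ E) / nk
        sd = np.sqrt((resp * (E - mu[:, None]) ** 2).sum(axis=1) / nk)
    return w, mu, sd
```

In this sketch the narrow component near zero plays the role of the "unresponsive" peptides and the broader, negatively shifted component the "responsive" ones.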
Supervised machine learning allows prediction of binding
Our phage display experiment yielded a collection of peptides whose
enrichment is disrupted by competitor; however, this information is not sufficient
to construct the desired Venn diagram. First, a Venn diagram requires knowing
the binding for a common set of peptides to all proteins. Because the total sets of
partners are large for all proteins, we observed different peptides in each experiment
(hA5 vs. hA6 vs. ancA5/A6 vs. altAll). Second, phage display is an imperfect
proxy for binding. It has confounding non-biological factors: peptides are in the
context of phage particles, the protein becomes immobilized by a biotin tag, there
is the possibility of avidity, and enrichment is determined by off-rate rather than
equilibrium.
To solve these issues, we sought to relate our measured E for these peptides
back to binding of a common set of peptides. We used supervised machine learning
to train models to predict binding from amino acid sequence for each protein. We
then applied each model to an identical set of peptides, allowing us to directly
compute a Venn diagram for peptide specificity.
We trained our models against 57 chemical features that we could
readily calculate from an amino acid sequence. These included measures of
hydrophobicity, hydrogen bonding, geometry, secondary structure propensity, and
electrostatics. In addition to these specific features, we also defined 20 “meta”
features by taking the principal components of the entire aaindex database [314],
which reports 590 quantitative values for each of the 20 amino acids. For most
chemical features, we simply added the values for each amino acid in a sequence.
For example, we would sum up the number of hydrogen bond donors across the
sequence and treat that as a chemical feature. We also used CIDER to calculate
a few non-additive electrostatic features for each sequence [315], such as the
isoelectric point. A full list of the features we calculated is given in Table 11 (in
supplement).
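As an illustration of an additive feature, the sketch below sums a per-residue scale over a peptide. The Kyte-Doolittle hydropathy scale is used only as an example of a hydrophobicity measure; it is not claimed to be one of the 57 features in Table 11, and `additive_feature` is a hypothetical helper. Non-additive features such as the isoelectric point computed with CIDER are not covered here.

```python
# Kyte-Doolittle hydropathy values, used here only as an illustrative per-residue scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def additive_feature(seq, scale):
    """Sum a per-residue property over every amino acid in the sequence."""
    return sum(scale[aa] for aa in seq)


print(additive_feature("ILV", KYTE_DOOLITTLE))  # approximately 12.5
```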
We calculated these 57 features for the entire sequence and for all sliding
windows ranging from 1 to 11 amino acids (Fig 22A). This introduces neighbor-
neighbor correlation between features that improves model power. Overall, we
calculated the features for 78 sliding windows on each peptide, giving us a total
of 57 × 78 = 4,446 features per sequence (Fig 22A). We then trained a random
forest regression model to predict E using the features of the ≈40,000 peptides we observed
for each protein. A random forest model averages the predictions of a collection of
decision trees, each grown on a random subsample of the data and input features [316].
Prior to training, we withheld 10% of the peptides as a test set. We then optimized
hyperparameters such as the number of trees and the choice of data-weighting scheme
using k-fold cross-validation within the training set (k = 10). After training, the R²
between our model and the training set was ≈97% for all proteins (Table 2).
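The window count can be checked with a short sketch, assuming the 78 windows are the full peptide plus every contiguous window of width 1 to 11; this assumption reproduces 57 × 78 = 4,446 for a 12-mer. `sliding_windows` and `featurize` are hypothetical names, and the feature functions passed in are placeholders for the real chemical features.

```python
def sliding_windows(seq, max_width=11):
    """Full sequence plus every contiguous window of width 1..max_width."""
    windows = [seq]
    for width in range(1, max_width + 1):
        windows.extend(seq[i:i + width] for i in range(len(seq) - width + 1))
    return windows


def featurize(seq, feature_fns, max_width=11):
    """Evaluate every feature function on every window; returns a flat feature vector."""
    return [fn(window) for window in sliding_windows(seq, max_width) for fn in feature_fns]
```

For a 12-mer there are 12 + 11 + ... + 2 = 77 windows of width 1 to 11, and adding the full peptide gives 78.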
TABLE 2 Protein binding model statistics. R² and AUC are given in percent; FPR and
FNR are false positive and false negative rates.
protein     num. training observations   R² (train)   R² (test)   AUC    FPR    FNR
hA5         40,887                       97.6         85.1        98.9   0.35   0.35
hA6         42,156                       97.4         82.9        96.1   0.41   0.41
ancA5/A6    43,938                       97.7         84.2        97.4   0.35   0.35
altAll      51,903                       96.6         80.0        95.1   0.45   0.15
After our final optimization, we tested our models against their test sets. R²
for test sets ranged from 80–85% (Fig 22B, Table 2). For all models, the regression
line reveals a slope slightly greater than one (e.g. 1.16 for hA5, Fig 22B). Further,
the scatter is nonrandom, with the most negative values of E being overestimated
and the most positive values underestimated. This makes intuitive sense, as the
best-of-the-best and the worst-of-the-worst enriching sequences likely depend
strongly on details not captured by our rather crude amino acid model.
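A minimal sketch of the train/test procedure using scikit-learn, assuming a feature matrix `X` (one row per peptide) and measured enrichments `E`. The 10% hold-out and 10-fold cross-validation follow the text; the tuning grid over `n_estimators` is illustrative only, and the data-weighting schemes mentioned above are not modeled. We regress measured on predicted values to get the slope, which is our assumption about the axis orientation in Fig 22B.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def train_enrichment_model(X, E, seed=0):
    """Hold out 10% as a test set, tune the forest by 10-fold CV, then score it."""
    X_tr, X_te, E_tr, E_te = train_test_split(X, E, test_size=0.10, random_state=seed)
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid={"n_estimators": [50, 100]},  # illustrative grid only
        cv=10,
        scoring="r2",
    )
    search.fit(X_tr, E_tr)  # refits the best setting on the full training set
    model = search.best_estimator_
    pred_te = model.predict(X_te)
    # slope of measured vs. predicted; exceeds 1 when predictions are compressed toward the mean
    slope, _intercept = np.polyfit(pred_te, E_te, 1)
    return {
        "model": model,
        "r2_train": r2_score(E_tr, model.predict(X_tr)),
        "r2_test": r2_score(E_te, pred_te),
        "slope": slope,
    }
```

Regressing measured on predicted values gives a slope above one when the model pulls extreme values of E toward the mean, which is the pattern described above.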
To calculate a Venn diagram, we need to classify peptides as binders or non-
binders. We therefore tested how well our models would operate as classifiers.
To facilitate this comparison, we normalized E for each protein such that the
competitor peptide had an enrichment value of −1, computing Enorm = E/|Ecomp|,
where Ecomp is the enrichment of the competitor peptide. We then asked whether our
models could classify test-set peptides into those with measured Enorm < −1 and
those with Enorm ≥ −1.
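The normalization and the classification rule can be stated directly in code. This is a sketch assuming `E_comp` is negative, as the text implies (so the competitor lands at −1); the function names are hypothetical.

```python
import numpy as np


def normalize_enrichment(E, E_comp):
    """Scale enrichment so the competitor peptide sits at -1 (assumes E_comp < 0)."""
    return np.asarray(E, dtype=float) / abs(E_comp)


def classify(E_norm, cutoff=-1.0):
    """True for peptides whose normalized enrichment falls below the cutoff."""
    return np.asarray(E_norm) < cutoff


E_norm = normalize_enrichment([-4.0, -2.0, 0.0, 2.0], E_comp=-2.0)  # [-2, -1, 0, 1]
print(classify(E_norm))  # [ True False False False]
```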
FIGURE 22 Peptide binding can be predicted from amino acid sequence. A)
Schematic showing our strategy for training a binding model. We break the 12-
mer peptide into 78 different sliding windows. For each peptide, we calculate 57
features (black box), giving a total of 4,446 features per peptide. We then use
40,000 peptides to train a model predicting E (green arrows). B) Correlation
between predicted Enorm and measured Enorm for ≈4,000 peptides in test set for
hA5. Each point is a peptide. Red line is least squares regression line. Blue dashed
line is our classification line (see panel C). C) Receiver Operating Characteristic
(ROC) curves for binding models. Colored series show ability of models to classify
measured Enorm as ≤ −1 (the blue dashed line from panel B). Curves are hA5
(purple), hA6 (orange), ancA5/A6 (dark green), and altAll (light green). Black
line is the ROC curve for predicting the binding of 44 isolated peptides. D) Error
rates for predicting isolated peptides that bind as a function of Enorm cutoff for the
classifier. False negative rate (red) and false positive rate (red) cross at Enorm =
−1.19 (dashed line) with a value of ≈0.35. Solid lines are fits of the modified Hill
equation to the error rates.
We swept along cutoffs in predicted values of Enorm and calculated our false
positive and false negative rate using the measured values of Enorm for test-set
peptides. As expected, increasing the cutoff increased the false positive rate and
decreased the false negative rate for each model. We quantified this behavior with
Receiver Operating Characteristic (ROC) curves. A ROC curve is a plot of the true
positive rate against the false positive rate as one changes the classifier cutoff. A
perfect predictor will have a cutoff value where the false positive rate is 0 and the
true positive rate is 1. As a consequence, the Area Under the Curve (AUC) will
be 1.0. In contrast, a random predictor will follow the 1:1 line and will have an
AUC of 0.5. All of our models had steep ROC curves that gave AUC values from
0.95 to 0.99 (Fig 22C). Given the amino acid sequence of a 12-mer peptide, we
can therefore predict with high confidence whether a peptide will respond to the
addition of competitor peptide in a phage display experiment.
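A sketch of the cutoff sweep described above, assuming arrays of predicted Enorm and a boolean label for measured Enorm < −1. A peptide is called a binder when its prediction is at or below the cutoff; sweeping the cutoff upward from −∞ traces the ROC curve from (0, 0) to (1, 1), and the area is integrated with the trapezoidal rule. `roc_sweep` is a hypothetical name; scikit-learn's `roc_curve` and `roc_auc_score` would give equivalent results if the predictions are negated.

```python
import numpy as np


def roc_sweep(pred, is_binder):
    """Sweep a cutoff on predicted Enorm; a peptide is called a binder if pred <= cutoff.

    Returns false positive rates, true positive rates, and the trapezoidal AUC.
    Assumes both classes are present.
    """
    pred = np.asarray(pred, dtype=float)
    y = np.asarray(is_binder, dtype=bool)
    # from "call nothing" (-inf) up to "call everything" (largest prediction)
    cutoffs = np.concatenate(([-np.inf], np.unique(pred)))
    tpr = np.array([np.mean(pred[y] <= c) for c in cutoffs])
    fpr = np.array([np.mean(pred[~y] <= c) for c in cutoffs])
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
    return fpr, tpr, auc
```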
We next set out to calibrate our phage enrichment values against binding of
isolated peptides. We did this by calculating Enorm for 44 peptide/protein pairs
and then measuring their binding using Isothermal Titration Calorimetry (Table
12 in supplement). We used 17 peptides, some with known binding properties
[280, 275, 281], others that were in the freezer for other projects, and still others
that were extracted from the human proteome as possible S100 targets. We measured
binding of 16 of these peptides to hA5, 13 to hA6, 8 to ancA5/A6, and 6 to altAll.
We classified any peptide with a measurable binding constant (KD ≲ 100 µM) as
“binding” and all others as “non-binding.”
We then swept along Enorm and attempted to classify the 44 measured
binders. The ROC curve came off the diagonal, with an AUC of 0.71 (Fig 22C).
While this is substantially worse than our predictions of Enorm < −1 for test-set
peptides, it is not unexpected given that we trained our model on phage display data
and are now attempting to use it to predict isolated peptide binding. To verify that
this low AUC indicated real binding signal, we simulated 44-observation ROC curves
using a random predictor. We found that the probability of observing an AUC of
0.71 or greater by chance was 0.007, a strong indication that there is signal in our
binding model.
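The chance-level test can be sketched as a Monte Carlo simulation, using the fact that the AUC equals the probability that a randomly chosen binder is scored lower than a randomly chosen non-binder (ties count one half). The split of the 44 pairs into binders and non-binders is not given in this section, so `n_binders` is a placeholder the reader must supply; the resulting p-value depends on it, and the sketch is not claimed to reproduce the reported value of 0.007. Function names are hypothetical.

```python
import numpy as np


def auc_mann_whitney(score, is_binder):
    """AUC when lower scores predict binding; ties count as one half."""
    s = np.asarray(score, dtype=float)
    y = np.asarray(is_binder, dtype=bool)
    pos, neg = s[y], s[~y]
    diff = neg[None, :] - pos[:, None]  # > 0 when a binder scored below a non-binder
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)


def random_auc_pvalue(observed_auc, n_binders, n_total, n_sims=10000, seed=0):
    """Fraction of random-predictor AUCs at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n_total, dtype=bool)
    y[:n_binders] = True
    null = np.array([auc_mann_whitney(rng.random(n_total), y) for _ in range(n_sims)])
    return np.mean(null >= observed_auc)
```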
To identify a cutoff for predicting binders, we plotted the false positive rate
and false negative rate against Enorm for all 44 peptides. We then identified the
value of Enorm at which the two curves cross, which minimizes the larger of the false
positive and false negative rates. To estimate this crossover point, we fit the
modified Hill equation to each curve, which empirically captures the basic shape of
these curves. We found that
these curves crossed for Enorm = −1.19, with false positive and false negative
rates of ≈0.35. These rates are high and therefore preclude confidently predicting
whether a specific peptide binds. This is, however, sufficient to determine a Venn
diagram for the binding specificity of these proteins.
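A sketch of the crossover estimate, assuming predicted Enorm values and binary binding calls for the isolated peptides. The exact form of the modified Hill equation is not given here, so a four-parameter logistic curve is swapped in as a stand-in; the crossover is then located with a bracketing root finder on the difference between the two fitted curves. All names are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq, curve_fit


def sigmoid(x, lo, hi, x0, width):
    """Four-parameter logistic; a negative width gives a decreasing curve."""
    return lo + (hi - lo) / (1.0 + np.exp(-(x - x0) / width))


def error_rates(cutoffs, pred, is_binder):
    """FPR and FNR when peptides with pred < cutoff are called binders."""
    pred = np.asarray(pred, dtype=float)
    y = np.asarray(is_binder, dtype=bool)
    fpr = np.array([np.mean(pred[~y] < c) for c in cutoffs])
    fnr = np.array([np.mean(pred[y] >= c) for c in cutoffs])
    return fpr, fnr


def crossover(cutoffs, fpr, fnr):
    """Fit both error curves, then find where they intersect."""
    cutoffs = np.asarray(cutoffs, dtype=float)
    mid = np.median(cutoffs)
    p_fpr, _ = curve_fit(sigmoid, cutoffs, fpr, p0=[0.0, 1.0, mid, 0.5], maxfev=10000)
    p_fnr, _ = curve_fit(sigmoid, cutoffs, fnr, p0=[0.0, 1.0, mid, -0.5], maxfev=10000)

    def gap(x):
        return sigmoid(x, *p_fpr) - sigmoid(x, *p_fnr)

    x_star = brentq(gap, cutoffs.min(), cutoffs.max())  # needs a sign change across the range
    return x_star, sigmoid(x_star, *p_fpr)
```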
Venn diagrams can be estimated using MCMC
We next used our trained and calibrated models to estimate the Venn
diagram describing the binding sets for the modern and ancestral proteins (Fig 19).
We applied our models for hA5, hA6, ancA5/A6, and altAll to a common collection
of 1,000,000 random 12-mer peptides, classifying any peptide with Enorm < −1.19
as binding. We then calculated the overlap between these sets, placing the counts
for each region of the Venn diagram into the vector $\vec{V}_{obs}$.
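The bookkeeping for the observed Venn diagram can be sketched as follows, assuming each model's calls on the shared peptide set are stacked into a boolean matrix (peptides × proteins). Each peptide's pattern of calls is encoded as a bitmask, so region 0 is "binds none" and region 2^n − 1 is "binds all". Uniform, independent sampling of residues for the random 12-mers is an assumption, and the function names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptides(n, length=12, seed=0):
    """Random peptides with each residue drawn independently and uniformly."""
    rng = np.random.default_rng(seed)
    draws = rng.choice(list(AMINO_ACIDS), size=(n, length))
    return ["".join(row) for row in draws]


def venn_counts(calls):
    """Count peptides per Venn region; bit k of the region index = 'binds protein k'."""
    calls = np.asarray(calls, dtype=bool)
    n_prot = calls.shape[1]
    codes = calls.astype(int) @ (2 ** np.arange(n_prot))
    return np.bincount(codes, minlength=2 ** n_prot)
```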
Because we have high (and uncertain) false positive and false negative rates,
the counts in $\vec{V}_{obs}$ may not be identical to the real populations of the Venn diagram
($\vec{V}$). We therefore sampled over counts in $\vec{V}$, as well as possible false positive and
false negative rates, using Bayesian Markov Chain Monte Carlo (MCMC). We
wrote a transition matrix T that maps $\vec{V}$ into $\vec{V}_{obs}$ ($\vec{V}_{obs} = \vec{V} \cdot T$). T defines the
probability of each class of miscall given all false positive and false negative rates.
For example, one element in T encodes the probability that we mistakenly identify
a hA5-specific peptide as a hA6-specific peptide (e.g. the false negative rate for
hA5 times the false positive rate for hA6). The details of matrix construction are
given in the supplemental text.
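One way to build T is sketched below, assuming that miscalls are independent across proteins. Each entry multiplies, over proteins, the probability of the observed call given the true membership: 1 − FNR for a retained binder, FNR for a missed binder, FPR for a spurious call, and 1 − FPR for a correct rejection. This is an illustration, not the construction given in the supplemental text. For two proteins it reproduces the example above: the probability that an hA5-specific peptide is observed as hA6-specific is FNR(hA5) × FPR(hA6); with more proteins, each entry also carries a factor for every other protein.

```python
import itertools

import numpy as np


def transition_matrix(fpr, fnr):
    """T[i, j] = P(observed Venn region j | true Venn region i).

    Regions are bitmasks: bit k set means "binds protein k". Rows sum to one,
    so V_obs = V @ T for a row vector V of true region counts.
    """
    n = len(fpr)
    size = 2 ** n
    T = np.ones((size, size))
    for i, j in itertools.product(range(size), repeat=2):
        for k in range(n):
            is_binder = (i >> k) & 1
            called = (j >> k) & 1
            if is_binder:
                T[i, j] *= (1 - fnr[k]) if called else fnr[k]
            else:
                T[i, j] *= fpr[k] if called else (1 - fpr[k])
    return T
```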
We allowed each protein to have its own false positive and false negative error
rates. We set the prior probabilities for error rates by estimating the false positive
and false negative rate for binding to each protein at the cutoff of Enorm < −1.19
(Fig 39 in supplement). We then used MCMC to sample values of $\vec{V}$ and the error
rates, comparing the resulting vector to ~Vobs. We ran two samplers in parallel until
convergence (≈2 million steps each). This allowed us to estimate both the Venn
diagram and our uncertainty in its composition.
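A compact Metropolis sampler illustrates the idea under several stated assumptions: the true Venn fractions are parameterized by a softmax over unconstrained logits (rather than sampling integer counts directly), each error rate has a Beta prior centered on its calibration estimate with a hypothetical concentration of 50, the observed counts follow a multinomial distribution with probabilities given by the fractions times T, and only a single chain is run. The transition-matrix helper is repeated in vectorized form so the block runs on its own. None of the names or settings are taken from the original analysis.

```python
import numpy as np


def transition_matrix(fpr, fnr):
    """T[i, j] = P(observed Venn region j | true region i); regions are bitmasks."""
    n = len(fpr)
    bits = (np.arange(2 ** n)[:, None] >> np.arange(n)) & 1  # membership bits per region
    T = np.ones((2 ** n, 2 ** n))
    for k in range(n):
        true_k, obs_k = bits[:, k][:, None], bits[:, k][None, :]
        T *= np.where(true_k == 1,
                      np.where(obs_k == 1, 1 - fnr[k], fnr[k]),
                      np.where(obs_k == 1, fpr[k], 1 - fpr[k]))
    return T


def unpack(theta, n):
    """Map unconstrained parameters to Venn fractions and error rates."""
    logits = theta[:2 ** n]
    frac = np.exp(logits - logits.max())
    rates = 1.0 / (1.0 + np.exp(-theta[2 ** n:]))  # first n are FPRs, last n are FNRs
    return frac / frac.sum(), rates


def log_posterior(theta, v_obs, prior_rates, n, conc=50.0):
    frac, rates = unpack(theta, n)
    a, b = conc * prior_rates, conc * (1 - prior_rates)
    lp = np.sum((a - 1) * np.log(rates) + (b - 1) * np.log1p(-rates))  # Beta priors
    lp += np.sum(np.log(rates) + np.log1p(-rates))  # Jacobian of the logit transform
    lp += -0.5 * np.sum(theta[:2 ** n] ** 2) / 100.0  # weak N(0, 10^2) prior on logits
    p_obs = frac @ transition_matrix(rates[:n], rates[n:])
    return lp + np.sum(v_obs * np.log(p_obs))  # multinomial log-likelihood up to a constant


def run_mcmc(v_obs, prior_fpr, prior_fnr, n_steps=5000, step=0.05, seed=0):
    """Random-walk Metropolis; returns sampled true Venn fractions, one row per step."""
    rng = np.random.default_rng(seed)
    n = len(prior_fpr)
    v_obs = np.asarray(v_obs, dtype=float)
    prior_rates = np.clip(np.concatenate([prior_fpr, prior_fnr]), 1e-3, 1 - 1e-3)
    theta = np.concatenate([np.log(v_obs / v_obs.sum() + 1e-9),
                            np.log(prior_rates / (1 - prior_rates))])
    current = log_posterior(theta, v_obs, prior_rates, n)
    samples = np.empty((n_steps, 2 ** n))
    for i in range(n_steps):
        proposal = theta + rng.normal(scale=step, size=theta.size)
        candidate = log_posterior(proposal, v_obs, prior_rates, n)
        if np.log(rng.random()) < candidate - current:  # Metropolis acceptance rule
            theta, current = proposal, candidate
        samples[i] = unpack(theta, n)[0]
    return samples
```

Multiplying the sampled fractions by the total number of peptides gives samples of the true Venn counts, and 95% credibility regions can be read off as the 2.5th and 97.5th percentiles of the retained samples, e.g. `np.percentile(samples[burn:], [2.5, 97.5], axis=0)` after discarding a burn-in of length `burn`.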
hA5 is more specific than hA6 or the ancestor
We found that the total size of each binding set ranged from 1.3% [0.9,1.8]
of peptides (for hA5) to 22.6% [21.8,22.9] of all peptides (for the altAll construct)
(Fig 23). The values in the brackets denote the 95% credibility region from the
posterior distribution. The large sizes of these sets likely reflect the low-specificity,
hydrophobic nature of the S100 binding interface [110].
We found that hA5 exhibits increased specificity relative to the ancestral
proteins, apparently evolving by a process of subfunctionalization. The hA5 peptide
set is a subset of the ancestral binding set (Fig 23). While ≈85% of peptides are
FIGURE 23 Changes in binding sets over time. Circles denote estimated binding
sets for hA5 (purple), hA6 (orange), or ancestors (green). Areas and numbers in
each region indicate the percent of random peptides in that region of the Venn
diagram. The left panel shows the maximum likelihood ancestor (ancA5/A6); the
right panel shows the altAll reconstruction.
shared with the ancestor, only ≈9% of peptides arose specifically for binding to
hA5. This result is robust to phylogenetic uncertainty, as very similar overlaps
are observed for ancA5/A6 (86.5% [83.9,95.2]) and the altAll construct (84.6%
[81.1,91.4]). The hA5 peptide set was also largely a subset of the hA6 set: 80.6%
[76.8,88.5] of hA5 peptides are also hA6 peptides. This demonstrates that, although
the protein has subfunctionalized from the ancestor, it has constricted onto a set of
peptides that mostly overlaps with the paralogous hA6 (Fig 23).
The results for hA6 depend on the reconstruction used for the ancestral state.
The hA6 binding set was much larger than hA5—consisting of 10.1% [9.2,11.7]
of peptides. This is expanded relative to the ancA5/A6 set. While there is an
extensive overlap (37.4% [36.4,38.4]), most hA6 binding targets were acquired
after gene duplication (Fig 23). Fully 62.0% [61.1,63.1] of peptides are unique to
hA6. In this scenario, hA6 kept its ancestral partners, and then added a large
collection of new partners. However, this pattern varies when inferred using the
altAll construct. 82.9% [80.6,85.2] of hA6 peptides are shared with the altAll
ancestor (Fig 23). Under this alternate scenario, hA6 binding specificity is the
result of both subfunctionalization and the acquisition of new binding partners
distinct from the ancestor (neofunctionalization). Subfunctionalization in this
scenario appears to have occurred to a far lesser extent than in the paralogous
hA5 lineage, with the hA6 binding set representing only a minor constriction of the
ancestral set. However, the maximum likelihood scenario is strongly supported over
the alternative one. Furthermore, the overall distribution of enrichment values for
the altAll construct was systematically lower than that for any other protein. This
result suggests that the alternate reconstruction may actually have compromised,
rather than simply distinct, activity.
The alternate reconstruction is a very aggressive attempt to incorporate
phylogenetic uncertainty. In this case, the altAll protein differs at 21 sites from
the maximum-likelihood ancA5/A6 and has a much lower likelihood. The results
for hA5 are not changed between these estimates, but the interpretation of hA6
does differ. Thus it is difficult to conclude the exact nature of the transition
that occurred along the hA6 lineage. Nonetheless, it is clear that the patterns
observed along the hA5 and hA6 lineages are distinctly different. The transitions
in specificity that occurred following gene duplication are more nuanced than a
simple partitioning of ancestral binding partners across descendant proteins.
Discussion
Previously, we used a low-throughput approach to characterize the evolution
of the biochemical specificity of S100A5 and S100A6 [110]. We observed a strong
signal of phylogenetic conservation in the pattern of peptide binding specificity. For
a small set of peptides, the pattern of specificity is diagnostic for each clade. The
small sample of binding targets suggested a pattern in which the ancestral binding
partners were partitioned along descendant lineages to yield more specific derived
proteins. This result was consistent with many previous low-throughput studies
showing patterns of subfunctionalization following gene duplication [292, 58, 60,
310, 119, 291, 55, 293, 141, 311, 273, 110].
Re-evaluating the empirical support for the increasing specificity hypothesis
In this study we used an unbiased, quantitative phage display experiment
to reveal a more subtle pattern. We observed increased specificity on one lineage:
the set of peptides recognized by hA5 shrank dramatically and consists almost
entirely of peptides drawn from the ancestral set of peptides. In contrast, hA6
appears to have actually shifted and most likely expanded its set of peptides,
although we were unable to resolve these two scenarios given the difference
between the maximum likelihood and alternate ancestral reconstructions. The
single pattern, when viewed with higher resolution, resolves into two distinct
patterns. Our observations suggest that the empirical support for the increasing
specificity hypothesis is rather weak. Patterns of specificity may evolve following
gene duplication in much more complex ways than previously thought. Our results
reveal how problematic inferring specificity from a small set of targets can be.
Despite apparent subfunctionalization in low-throughput experiments, hA5 and
hA6 exhibit opposite patterns of specificity when probed using a high-throughput
approach. hA5 gained specificity, binding to a small subset of the ancestral
sequences. hA6, in contrast, lost specificity, acquiring entirely new binding targets.
These findings do not refute the increasing specificity hypothesis, but rather help
us to understand the nature of the inference. A small, biased set of targets is
insufficient to infer “absolute” changes in specificity.
To date, there is still insufficient evidence to support or refute the hypothesis
of increasing specificity over long time scales [47]. Our results suggest that
unbiased high-throughput studies should be used to provide the statistical power
needed to address this question. The approach presented here
will be broadly useful for addressing these questions systematically. Nonetheless,
the key limitation of our results for inferring such global patterns in the evolution
of specificity is the shallow time-scale of S100 evolution. S100A5 and S100A6 arose
via gene duplication in the amniote ancestor ≈320 million years ago [91, 96, 110].
This time-scale is far shorter than that on which we might expect to observe the
effects of global trends that permeate all of life. Thus, this study has instead tested
what happens after a more recent gene duplication, presumably independent of
global trends that have been proposed [138, 76, 47]. Currently, no high-throughput
studies of very deep ancestral protein-protein interactions have been conducted.
Several previous studies have targeted very old (>3 BYA) ancestral enzymes and
observed apparent promiscuity-to-specificity transitions. However, these studies
were on enzymes that are already exquisitely specific compared to sloppy S100
protein-protein interactions. Furthermore, other evolutionary scenarios, such
as neofunctionalization and transitions through promiscuous intermediates, are
known to occur in some systems [53, 79, 143, 57, 317]. To directly address the
hypothesis of increasing specificity, high-throughput studies should be performed
that trace specificity in sequentially deeper ancestral proteins stretching backward
in evolutionary time. Such studies would provide a direct test of the hypothesis
that proteins have undergone long time-scale promiscuity-to-specificity trends.
Our results display a very strong signal for subfunctionalization along
the hA5 lineage. This observation suggests that some evolutionary patterns of
specificity may be consistent with hypothetical expectations even in proteins
with very low biochemical specificity. However, we have only characterized one
gene duplication event, and even here have observed subtlety in the patterns
of specificity. Ultimately, more proteins from a diversity of families should be
studied using similar methods. Applying unbiased high-throughput screens to
diverse proteins with diverse functions will further clarify the generality of our
observations. Furthermore, empirical studies will help to build improved theoretical
models of proteins with evolving specificity. The role of constraints such as
pleiotropy, epistasis, and architectural properties of protein-protein interaction
networks can be determined.
Quantifying the evolution of specificity informs S100 biology and biochemistry
The large sets for each protein likely reflect the hydrophobic nature of the
hA5 and hA6 binding interfaces. Previously we showed that the binding of one A5-
specific peptide was driven primarily by the hydrophobic effect [110]. The binding
set of hA6 may be larger than that of hA5 due to its extended binding surface
relative to other S100 proteins [280]. This larger extended surface may allow it to
accommodate a larger number of register-shifted peptides. Rather than only using
the canonical interface, peptides can wrap around the protein and bind into an
extended groove. This may explain both its broader specificity and the acquisition
of targets not observed in the ancestral protein.
Interestingly, the scope of these binding set sizes mirrors the tissue
distributions of the two proteins. In mammals, S100A5 has an extremely narrow
tissue distribution, being found primarily in the olfactory bulb and olfactory
sensory neurons [282, 283, 284]. In contrast, S100A6 is expressed ubiquitously.
This is counterintuitive if one starts with the “parsing environment” perspective, as
S100A6 has broader specificity even while experiencing more diverse environments.
This also suggests that S100A5 has become specialized for a subset of biological
targets.
It remains unknown whether the hA5 and hA6 binding sets are shared among
modern orthologs, or whether these sets have fluctuated relative to one another.
We previously found strong evidence for conservation of specificity (for a small set
of peptides) in orthologs across amniote species. These results suggested an overall
conservation of biochemical specificity in the S100s. However, as noted above there
is insufficient sampling in the low throughput experiments to distinguish differences
in the gross specificity of the proteins. Thus, the high-throughput approach used in
this study would need to be applied to sets of orthologs to quantitatively determine
the degree to which patterns of specificity were conserved as the lineages diverged.
Future directions
The current analysis was specifically designed to probe targets that may
not be realized biologically. Even very weak binders can act as starting points
for future evolutionary optimization; therefore, our inclusive approach is the right
one for addressing how biochemical specificity evolved in this system. However,
this approach leaves important questions unanswered: how do these results
translate across a range of binding affinities? How does the inference of binding
sets change if we restrict our analysis to only the highest affinity binders? This
cannot be answered given our data, as the correlation between enrichment in the
phage display experiment and binding affinity is too noisy to allow this to be done
rigorously. There are several ways the current work can be extended and developed.
An improved method with decreased noise in the estimate of affinities would
allow these questions to be quantitatively answered. The scope of binding sets,
as a function of stratified binding affinity, could be traced through time within
and across lineages. Sampling multiple ancestors along a lineage would allow us to
detect patterns that occur as proteins evolved specificity for new binding partners.
Would we observe continuous trends, akin to a gradually shrinking Venn diagram?
Or would we instead observe random fluctuations, as the Venn diagram
dilates between more and less specific nodes? Would these patterns be sensitive to
the architectural constraints of the chosen protein system?
Finally, similar methods could be utilized to study the evolution of other
types of binding interactions. High-throughput “bind-and-seq” assays have
previously been applied to study DNA and RNA binding specificity of extant
proteins [295, 298, 318, 319]. High-throughput mass spectrometry has been used to
characterize the enzymatic specificity of venom proteases [320]. One could envision
a high-throughput enzyme assay to measure activity against a diverse, unbiased
set of small molecule substrates. These approaches could be coupled to ASR,
analogous to what we have done here, and used to probe a broad range of
protein-target interactions.
Implications for protein engineering
Protein engineers seek to design proteins with specific functions, a goal that is
often achieved through rational design and directed evolution [321, 322, 323, 324,
325]. Protein engineers have proposed using ancestral proteins as starting points for
engineering, as they may be less specific and therefore more generic starting
points for an engineering protocol [60]. Thus characterizing global evolutionary
trends in specificity, if they exist, could potentially aid engineering efforts
that seek to use ancestral starting points. The ability to build a quantitative
picture of how specificity evolves in diverse protein systems would provide a
framework for understanding and engineering protein binding properties. Applying
unbiased high-throughput approaches such as ours will help reveal such patterns
in biochemical specificity. The ability to control this feature rationally
would be a boon to protein engineers.
Conclusions
Our work provides direct evidence for a transition in which an ancestral
set of binding partners was partitioned along a derived lineage. With respect to
S100A5, the ancestor would indeed be a better engineering starting point (and
presumably evolutionary starting point) given its lower overall specificity. The
work also cautions against interpreting low-throughput data as evidence for such a
change, as the pattern observed along the S100A6 lineage does not cleanly conform
to the promiscuity-to-specificity concept. This protein appears to have expanded
or shifted its binding set. Overall, the work presented here has allowed us to
quantitatively characterize the subtleties of evolutionary transitions in specificity,
revealing a more nuanced picture than expected from simple hypotheses.
Materials and Methods
Molecular cloning, expression and purification of S100 proteins
Proteins were expressed in a pET28/30 vector containing an N-terminal His
tag with a TEV protease cleavage site (Millipore). For each protein, expression
was carried out in Rosetta E. coli (DE3) pLysS cells. 1.5 L cultures were inoculated
at a 1:100 ratio with saturated overnight culture. E. coli were grown to high
log-phase (OD600 ≈0.8–1.0) with 250 rpm shaking at 37 °C. Cultures were induced
by addition of 1 mM IPTG along with 0.2% glucose overnight at 16 °C. Cultures
were centrifuged and the cell pellets were frozen at −20 °C and stored for up to 2
months. Lysis of the cells was carried out via sonication on ice in 25 mM Tris, 100
mM NaCl, 25 mM imidazole, pH 7.4. The initial purification step was performed
at 4 °C using a 5 mL HiTrap Ni-affinity column (GE Health Science) on an
ÄKTA PrimePlus FPLC (GE Health Science). Proteins were eluted using a 25 mL
gradient from 25-500 mM imidazole in a background buffer of 25 mM Tris, 100mM
NaCl, pH 7.4. Peak fractions were pooled and incubated overnight at 4 °C with
≈1:5 TEV protease (produced in the lab). TEV protease removes the N-terminal
His-tag from the protein and leaves a small Ser-Asn sequence N-terminal to the
wildtype starting methionine. Next hydrophobic interaction chromatography (HIC)
was used to purify the S100s from remaining bacterial proteins and the added TEV
protease. Proteins were passed over a 5 mL HiTrap phenyl-sepharose column (GE
Health Science). Due to the Ca2+-dependent exposure of a hydrophobic binding
surface, the S100 proteins adhere to the column only in the presence of Ca2+.
Proteins were pre-saturated with 2mM Ca2+ before loading on the column and
eluted with a 30mL gradient from 0 mM to 5 mM EDTA in 25 mM Tris, 100 mM
NaCl, pH 7.4.
Peak fractions were pooled and dialyzed against 4 L of 25 mM Tris, 100
mM NaCl, pH 7.4 buffer overnight at 4 °C to remove excess EDTA. The proteins
were then passed once more over the 5 mL HiTrap Ni-affinity column (GE Health
Science) to remove any uncleaved His-tagged protein. The cleaved protein was
collected in the flow-through. Finally, protein purity was examined by SDS-PAGE.
If any trace contaminants appeared to be present we performed anion exchange
chromatography with a 5 mL HiTrap DEAE column (GE). Proteins were eluted
with a 50 mL gradient from 0-500 mM NaCl in 25 mM Tris, pH 7.4 buffer. Pure
proteins were dialyzed overnight against 2L of 25 mM TES (or Tris), 100 mM
NaCl, pH 7.4, containing 2 g Chelex-100 resin (BioRad) to remove divalent metals.
After the final purification step, the purity of protein products was assessed by
SDS-PAGE and MALDI-TOF mass spectrometry to be >95%. Final protein
products were flash frozen, dropwise, in liquid nitrogen and stored at −80 °C.
Protein yields were typically on the order of 25 mg/1.5 L of culture.
Isothermal Titration Calorimetry
For all peptides, we attempted to measure binding at 25 °C. ITC experiments
were performed in 25 mM TES, 100mM NaCl, 2 mM CaCl2, 1mM TCEP, pH
7.4. Samples were equilibrated and degassed by centrifugation at 18,000 × g at
the experimental temperature for 35 minutes. Peptides were dissolved directly
into the experimental buffer prior to each experiment. All experiments were
performed on a MicroCal ITC-200. Gain settings were determined on a
case-by-case basis to ensure quality data. A 750 rpm syringe stir speed was used for
all experiments. Spacing between injections ranged from 300s-900s depending
on gain settings and relaxation time of the binding process. These settings were
optimized for each binding interaction that was measured. A single-site binding
model was fit to the titration data using the Bayesian MCMC fitter in pytc
(https://github.com/harmslab/pytc). For each protein/peptide combination, one
clean ITC trace was used to fit the binding model. Negative results were double-
checked to ensure accuracy.
Preparation of biotinylated proteins for phage display
A mutant version of hA5 with a single N-terminal Cys residue was
generated via site-directed mutagenesis using the QuikChange Lightning system
(Agilent). The Cys was introduced in the Ser-Asn tag left over from TEV protease
cleavage as Ser-Asn-Cys. The proteins were expressed and purified as described
in the previous section. A small amount of the purified proteins was biotinylated
using the EZ-link BMCC-biotin system (ThermoFisher Scientific). ≈1 mg BMCC-
biotin was dissolved directly in 100% DMSO to a concentration of 8 mM for
labeling. Proteins were exchanged into 25mM phosphate, 100mM NaCl, pH 7.4
using a Nap-25 desalting column (GE Health Science) and degassed for 30 min
at 25 °C using a vacuum pump (Malvern Instruments). While stirring at room
temperature, 8mM BMCC-biotin was added dropwise to a final 10X molar excess.
Reaction tubes were sealed with PARAFILM (Bemis) and the maleimide-thiol
reactions were allowed to proceed for 1 hour at room temperature with stirring.
The reactions were then transferred to 4 °C and incubated with stirring overnight
to allow completion of the reaction. Excess BMCC-biotin was removed from the
labeled proteins by exchanging again over a Nap-25 column (GE Health Science),
and subsequently a series of 3 concentration-wash steps on a NanoSep 3K spin
column (Pall Corporation), into the Ca-TeBST loading buffer. Complete
labeling was confirmed by MALDI-TOF mass spectrometry by observing the
≈540Da shift in the protein peak. Final stocks of labeled proteins were prepared
at 10 µM by dilution into the loading buffer.
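The volume of BMCC-biotin stock needed for a given molar excess follows from a one-line mass balance. This is a sketch; the protein concentration and volume in the usage example are chosen purely for illustration and are not values from this protocol, and `bmcc_volume_ul` is a hypothetical helper.

```python
def bmcc_volume_ul(protein_uM, protein_volume_ul, molar_excess=10.0, stock_mM=8.0):
    """Volume (µL) of BMCC-biotin stock giving the requested molar excess over protein."""
    protein_nmol = protein_uM * protein_volume_ul / 1000.0  # µM x µL = pmol; /1000 gives nmol
    biotin_nmol = molar_excess * protein_nmol
    return biotin_nmol / stock_mM  # a 1 mM stock holds 1 nmol per µL


print(bmcc_volume_ul(100.0, 1000.0))  # 125.0
```

For instance, 1 mL of a hypothetical 100 µM protein solution contains 100 nmol of protein, so a 10X excess needs 1,000 nmol of BMCC-biotin, or 125 µL of the 8 mM stock.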
Phage display
Phage display experiments were performed using the PhD-12 peptide phage
display kit (NEB). All steps involving the pipetting of phage-containing samples
were done using filter tips (Rainin). We prepared 100 µL samples containing phage
(5.5 × 10¹¹ PFU), 0.01 µM biotin-protein (or biotin alone in the negative
control), and 20 µM peptide competitor (in competitor samples) at room temperature
in a background of Ca2+-TeBST loading buffer (50mM TES, 100mM NaCl, 2mM CaCl2,
0.01% Tween-20, pH 7.4) to ensure Ca2+-saturation of the S100 proteins. Samples were incubated
at room temperature for 2hr. Each sample was then applied to one well of a 96-
well high-capacity streptavidin plate (previously blocked using PhD-12 kit blocking
buffer and washed 6X with 150 µL loading buffer). Samples were incubated on
the plate with gentle shaking for 20 min. Next, 1 µL of 10 mM biotin (NEB) was
added to each sample on the plate and incubated for an additional five minutes
to compete away purely biotin-dependent interactions. Samples were then pulled
from the plate carefully by pipetting and discarded. Each well was washed 5X with
200 µL of loading buffer by applying the solution to the well and then immediately
pulling it off by pipetting. Finally, 100 µL of EDTA-TeBST elution buffer (50 mM
TES, 100 mM NaCl, 5 mM EDTA, 0.01% Tween-20, pH 7.4) was applied to each
well and the plate was incubated with gentle shaking for 1 hr at room temperature
to elute. Eluates were pulled from the plate carefully by pipetting and stored at
4 °C. Eluates were titered to quantify eluted phage as follows. Serial dilutions of
the eluates from 1:10 to 1:10^5 were prepared in LB medium. Each dilution was used to
inoculate a 200 µL aliquot of mid-log-phase ER2738 E. coli (NEB) by adding 10 µL.
Each 200 µL aliquot was then mixed with 3 mL of pre-melted top agar, applied
to an LB agar XGAL/IPTG (Rx Biosciences) plate, and allowed to cool. The plates
were incubated overnight at 37 °C to allow formation of plaques. The next morning,
blue plaques were counted and used to calculate phage concentration (PFU/mL).
Enrichment was calculated as the ratio of the phage titer of each experimental
sample to that of the biotin-only negative control.
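The titer and enrichment arithmetic can be sketched as follows. This is a minimal illustration, assuming the full 10 µL of each dilution is plated (as described above) and that each titer comes from a single countable plate; the function names and plaque counts are hypothetical.

```python
def titer_pfu_per_ml(plaques, dilution_factor, volume_plated_ul=10.0):
    """Phage concentration (PFU/mL) of the undiluted eluate.

    plaques: blue plaques counted on one plate
    dilution_factor: e.g. 1e4 for a 1:10^4 dilution
    volume_plated_ul: volume of the dilution added to the cells (10 uL here)
    """
    volume_ml = volume_plated_ul / 1000.0
    return plaques * dilution_factor / volume_ml


def enrichment(sample_titer, biotin_only_titer):
    """Ratio of an experimental eluate titer to the biotin-only control titer."""
    return sample_titer / biotin_only_titer


# Hypothetical plaque counts, for illustration only
protein = titer_pfu_per_ml(plaques=120, dilution_factor=1e4)  # about 1.2e8 PFU/mL
control = titer_pfu_per_ml(plaques=45, dilution_factor=1e2)   # about 4.5e5 PFU/mL
print(f"{protein:.2e} {control:.2e} {enrichment(protein, control):.1f}")
```

Counting a single dilution per sample keeps the sketch short; averaging over several countable plates would be a natural refinement.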
To generate the pre-conditioned phage library, the naïve library was first
screened in duplicate against each of the four proteins as described above. Each
of these lineages was subsequently amplified in ER2738 E. coli (NEB) as follows.
20mL 1:100 dilutions of an ER2738 overnight culture were prepared. Each 20mL
culture was inoculated with one entire sample of remaining phage eluate. The
cultures were incubated at 37◦C with shaking for 4.5 hours to allow phage growth.
Bacteria were then removed by centrifugation and the top 80% of the culture was
removed carefully with a filtered serological pipette and transferred to a fresh
tube containing 1/6 volume of PEG/NaCl (20% w/v PEG-8000, 2.5M NaCl).
Samples were incubated overnight at 4◦C to precipitate phage. Precipitated
phage were isolated by centrifugation and subsequently purified by an additional
PEG/NaCl precipitation on ice for 1 hr. These individually amplified pools were
then resuspended in 200 µL each of sterile loading buffer and mixed together to
form a pre-conditioned library in order to minimize the impact of sampling on the
subsequent panning experiment. The pool was diluted 1:1 with 100% glycerol and
stored at −20◦C for use in the final panning experiments.
Preparation of deep sequencing libraries
Phage genomic ssDNA was isolated from leftover amplified eluates from
each round of panning using the M13 spin kit (Qiagen). Products were stored
in low TE buffer. These ssDNA were used as the template for 2 replicate PCRs
with the Cs1 forward (5′-acactgacgacatggttctacagtggtacctttctattctcactct-3′)
and PhD96seq-Cs2 reverse (5′-tacggtagcagagacttggtctccctcatagttagcgtaacg-3′)
primers. The amplicons were gel-purified using the GeneJet gel
extraction kit (Thermo Scientific), and the replicate reactions were pooled. The pooled
products were then used as templates for a secondary reaction with the barcoded primers,
and the final products were again gel-purified using the GeneJet gel extraction kit. Concentration
of barcoded samples was measured by A260/A280 using a 1mm cuvette on an
Eppendorf biospectrometer. Multiplexing was done by mixing samples according
to mass. The concentration of the multiplexed library was corrected using qPCR
with the P5 and P7 Illumina flow-cell primers. The library was then diluted to a
final concentration of 10nM and Illumina sequenced on two lanes of a HiSeq 4000
instrument, using the Cs1 F’ as the R1 sequencing primer. The lanes were spiked
with 20% PhiX control DNA due to the relatively low diversity of the library.
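Mixing samples by mass and diluting the library to a target molarity both rely on converting mass concentration to molar concentration. A minimal sketch follows, assuming an average of about 660 g/mol per base pair of double-stranded DNA (a common approximation) and a hypothetical 300 bp amplicon; it does not model the qPCR correction described above.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def ng_per_ul_to_nm(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    # 1 ng/uL = 1e-3 g/L; divide by g/mol to get mol/L, then multiply by 1e9 for nM
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def dilution_volumes(stock_nm, target_nm, final_ul):
    """Stock and diluent volumes (uL) that give final_ul at target_nm (C1V1 = C2V2)."""
    stock_ul = target_nm * final_ul / stock_nm
    return stock_ul, final_ul - stock_ul


# Hypothetical library: 10 ng/uL of a 300 bp amplicon, diluted to 10 nM in 20 uL
stock = ng_per_ul_to_nm(10.0, 300)  # about 50.5 nM
print(round(stock, 1), dilution_volumes(stock, 10.0, 20.0))
```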
Phage display analysis pipeline
We performed quality control on three read features. First, we verified
that the sequence had exactly the anticipated length from the start of the phage
sequence through the stop codon. Second, we only kept sequences in which
the invariant phage sequence differed by at most one base from the anticipated
sequence. This allows for a single point mutation and/or sequencing error, but
not wholesale changes in the sequence. Finally, we kept only reads with an average
Phred score greater than 15. The vast majority of the reads that failed our quality
control lacked the variable region, representing reversion to phage with a
wild-type-like coat protein. This analysis is encoded in the hops_count.py script,
which takes a gzipped FASTQ file as input and returns the counts for every peptide
in the file. Before our main analysis, we discarded any peptide that had fewer than
6 reads associated with it (see Table 10 in supplement). In total, 74.0% of reads
passed our quality control and read cutoff.
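The three read filters and the count cutoff can be sketched as below. This is not the authors' counting script; it is a minimal illustration assuming Phred+33 quality encoding, reads already trimmed so that the full read spans the expected region, and hypothetical positions for the invariant and variable regions. For brevity it counts the variable-region DNA sequence rather than translated peptides.

```python
import gzip
from collections import Counter


def mean_phred(quality_line, offset=33):
    """Average Phred score of a read, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def passes_qc(seq, qual, expected_len, invariant, inv_start, min_phred=15):
    if len(seq) != expected_len:                   # exact anticipated length
        return False
    region = seq[inv_start:inv_start + len(invariant)]
    if mismatches(region, invariant) > 1:          # at most one base may differ
        return False
    return mean_phred(qual) > min_phred            # average quality above 15


def count_reads(fastq_gz, expected_len, invariant, inv_start,
                var_start, var_len, min_count=6):
    """Count variable-region sequences passing QC; drop those seen < min_count times."""
    counts = Counter()
    with gzip.open(fastq_gz, "rt") as fh:
        while fh.readline():                       # header line; empty at EOF
            seq = fh.readline().strip()
            fh.readline()                          # '+' separator line
            qual = fh.readline().strip()
            if passes_qc(seq, qual, expected_len, invariant, inv_start):
                counts[seq[var_start:var_start + var_len]] += 1
    return {k: v for k, v in counts.items() if v >= min_count}
```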
We clustered peptides using our own implementation of the DBSCAN
algorithm [326] using the Damerau-Levenshtein distance [327]. The main parameter
for DBSCAN clustering is ε, the neighborhood cutoff. Clusters are defined as
sets of sequences that can be reached from one another through a series of moves,
each spanning a distance of at most ε. We found that
ε = 1 gave the best results for our downstream machine learning analysis. Our
whole enrichment pipeline, including clustering, can be run given a peptide-count
file for the non-competitor experiment and a peptide-count file for the competitor
experiment using the hops_enrich.py script.
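A minimal sketch of ε-neighborhood clustering on sequence distance follows. It assumes the restricted (optimal string alignment) form of the Damerau-Levenshtein distance and a DBSCAN minimum-neighbor count of 1, under which DBSCAN clusters reduce to the connected components of the graph linking peptides within ε of each other; the authors' own implementation may differ on both points. The example uses the A5cons and A6cons cores from Table 3 plus two made-up single-substitution variants.

```python
from itertools import combinations


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def cluster_peptides(peptides, eps=1):
    """Group peptides linked by chains of steps with distance <= eps.

    Equivalent to DBSCAN with a minimum neighbor count of 1. Compares all
    pairs with union-find, so the cost is O(n^2) in the number of peptides.
    """
    parent = list(range(len(peptides)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(peptides)), 2):
        if osa_distance(peptides[i], peptides[j]) <= eps:
            parent[find(i)] = find(j)

    clusters = {}
    for i, pep in enumerate(peptides):
        clusters.setdefault(find(i), []).append(pep)
    return list(clusters.values())


peps = ["SSFQDWLLSRLP", "SSFQDWLLSRLA", "SSFQDWLLSRAA", "GFDWRWGMEALT"]
print(cluster_peptides(peps, eps=1))
```

The first three peptides form one cluster through a chain of single substitutions even though the first and third differ at two positions, which is the ε-step reachability described above.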
We implemented our machine learning model in Python 3 extended with
numpy [328], scipy [329], and matplotlib [330]. We used sklearn for our random
forest regression [331, 316, 332]. A full list of the calculated features is shown
in Table 11 (in supplement). As noted, some features were calculated using
CIDER [315]. Our full implementation, including all data files, is available at
https://github.com/harmslab/hops.
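The regression step can be illustrated with scikit-learn's RandomForestRegressor. In this sketch the three global features (hydrophobic fraction, aromatic fraction, net charge per residue) are hypothetical stand-ins for the feature set in Table 11, and both the peptides and their enrichment values are randomly generated, so the fitted numbers carry no biological meaning.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_features(pep):
    """A few illustrative global features (hypothetical, not the Table 11 set)."""
    n = len(pep)
    hydrophobic = sum(pep.count(a) for a in "AILMFVW") / n
    aromatic = sum(pep.count(a) for a in "FWY") / n
    net_charge = (sum(pep.count(a) for a in "KR") - sum(pep.count(a) for a in "DE")) / n
    return [hydrophobic, aromatic, net_charge]


rng = np.random.default_rng(0)
peptides = ["".join(rng.choice(list(AMINO_ACIDS), 12)) for _ in range(2000)]
X = np.array([peptide_features(p) for p in peptides])
# Synthetic enrichment values, for illustration only
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0, 0.1, len(peptides))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
print("feature importances:", model.feature_importances_)
```

Held-out R² and `feature_importances_` are one way to ask how much explanatory power global peptide properties carry.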
145
Identifying the read count cutoff
One critical question is at what point the number of reads correlates with
the frequency of a peptide. If we set the cutoff too low, we incorporate noise
into downstream analyses. If we set the cutoff too high, we remove valuable
observations from our dataset. To identify an appropriate cutoff, we studied the
mapping between $c_i$ (the number of reads arising from peptide i) and $f_i$ (the actual
frequency of peptide i in the experiments). Our goal was to find $P(f_i \mid c_i, N)$: the
probability that peptide i has frequency $f_i$ given that we observe it $c_i$ times in N counts.
Using Bayes' theorem, we can write

$$P(f_i \mid c_i, N) = \frac{P(c_i \mid f_i, N)\,P(f_i)}{P(c_i)},$$
where N is the total number of reads. We calculated $P(c_i \mid f_i, N)$ assuming a
binomial sampling process: what is the probability of observing exactly $c_i$ counts
given N independent samples from a population in which peptide i has frequency $f_i$?
This gives the curve seen in Fig 36A (in supplement). We then estimated $\hat{P}(f_i)$
from the distribution of frequencies in the input library, constructing a histogram
of apparent peptide frequencies (Fig 36B in supplement). Empirically, we found
that frequencies followed an exponential distribution over the measurable range
of frequencies. Finally, we assumed that all counts have equal prior probabilities,
turning $P(c_i)$ into a scalar that normalizes $P(f_i \mid c_i, N)$ so that it integrates
to 1.
Using the information from Fig 36A and B (in supplement), we could then
calculate $P(f_i \mid c_i, N)$ for any number of reads N in an experiment. Fig 36C (in
supplement) shows this calculation for N = 2.0 × 10^7 reads, a typical number
of reads from our experimental replicates. This curve is linear above 6 reads. Below
this, counts no longer correlate linearly with frequency, as it is possible to obtain
5 reads by random sampling of low-frequency library members. We therefore used a
cutoff of 6 counts for all downstream analyses.
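The posterior calculation can be sketched on a grid of candidate frequencies. This assumes an exponential prior with a hypothetical mean frequency (standing in for the empirical histogram of Fig 36B) and uses the read depth N = 2.0 × 10^7 quoted above; the grid bounds are also hypothetical. With this idealized prior the posterior mean stays essentially linear in c, so the sketch illustrates the calculation rather than reproducing the curve in Fig 36C.

```python
import numpy as np
from scipy.stats import binom


def posterior_frequency(c, N, f_grid, prior_mean):
    """Grid approximation of P(f | c, N).

    Likelihood: binomial probability of c reads out of N at frequency f.
    Prior: exponential with a hypothetical mean frequency prior_mean.
    Normalizing over the grid plays the role of dividing by P(c).
    """
    log_post = binom.logpmf(c, N, f_grid) - f_grid / prior_mean
    post = np.exp(log_post - log_post.max())  # subtract max to avoid underflow
    return post / post.sum()


N = 20_000_000                           # typical read depth quoted in the text
f_grid = np.linspace(1e-10, 2e-6, 5000)  # hypothetical grid of frequencies
prior_mean = 5e-8                        # hypothetical prior mean frequency

for c in range(1, 11):
    post = posterior_frequency(c, N, f_grid, prior_mean)
    print(c, np.sum(f_grid * post))      # posterior mean frequency given c reads
```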
Measuring enrichment values
We next set out to measure changes in the frequency of peptides between the
competitor and non-competitor samples. The simplest way to do this would be to
identify peptides seen in both experiments, and then measure how their frequencies
change between conditions. Unfortunately, these proteins all bind a wide swath of
peptide targets and relatively few peptides were shared between conditions. This
approach would thus exclude the majority of sequences. For example, only 8,672 of
the 112,681 unique peptides observed for hA5 were present in both the competitor
and non-competitor samples, even after pooling biological replicates. Worse, because we
are interested in peptides that are lost when competitor peptide is added, ignoring
peptides with no counts in the competitor sample means ignoring some of the most
informative peptides.
To solve this problem, we clustered similar peptides and measured enrichment
for peptide clusters rather than individual peptides. We extracted all peptides
that were observed across the competitor and non-competitor samples for a given
protein. We then used DBSCAN to cluster those peptides according to sequence
similarity, as measured by their Damerau-Levenshtein distance [327, 326].
This revealed extensive structure in our data. For example, hA5 yielded 8,645
clusters with more than one peptide, incorporating more than half of the unique
peptides (Fig 21A, Fig 38A in supplement). We chose clustering parameters
that led to highly similar peptides within each cluster, as can be seen in the
representative sequence logos for three clusters of hA5 (Fig 38B in supplement).
Sequences that were not placed in clusters were treated as clusters with a size of
one.
We then used the enrichment of each cluster to estimate the enrichment of
individual peptides. We defined enrichment as:

$$E_{\text{cluster}} = -\ln\left(\frac{\sum_{i=1}^{N} \beta_i}{\sum_{i=1}^{N} \alpha_i}\right), \qquad (6.1)$$
where N is the total number of peptides in the cluster, $\beta_i$ is the frequency of
peptide i in the competitor sample, and $\alpha_i$ is the frequency of peptide i in the
non-competitor sample. We then made the approximation that all members of the
cluster have the same enrichment:

$$E_i \approx E_{\text{cluster}}, \qquad (6.2)$$

allowing us to estimate the enrichment of all N peptides in the cluster (Fig 38C in
supplement). Peptides lost because of competition for the interface will add zeros
to the numerator of Eq. 6.1, lowering the ratio of competitor to non-competitor
frequency for that cluster. Peptides missed because of finite sampling will add zeros
evenly to the competitor and non-competitor samples, leading to no net enrichment.
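Eq. 6.1 and the approximation of Eq. 6.2 can be written in a few lines. The sketch assumes peptide frequencies are stored in dictionaries keyed by sequence, with absent peptides contributing zero; the peptide names and frequencies in the example are hypothetical. As written, Eq. 6.1 gives a larger value when the competitor-to-non-competitor ratio falls, and a cluster with no competitor reads at all would give an infinite value, a case this sketch does not handle.

```python
import numpy as np


def cluster_enrichment(beta, alpha):
    """E_cluster = -ln(sum(beta) / sum(alpha)), as in Eq. 6.1.

    beta: frequencies of the cluster's peptides in the competitor sample
    alpha: frequencies of the same peptides in the non-competitor sample
    """
    return -np.log(np.sum(beta) / np.sum(alpha))


def peptide_enrichments(clusters, beta_freq, alpha_freq):
    """Assign every peptide the enrichment of its cluster (Eq. 6.2)."""
    out = {}
    for members in clusters:
        beta = [beta_freq.get(p, 0.0) for p in members]    # absent -> 0
        alpha = [alpha_freq.get(p, 0.0) for p in members]
        e = cluster_enrichment(beta, alpha)
        for p in members:
            out[p] = e
    return out


# Hypothetical cluster: pepB and pepC are never seen with competitor present
clusters = [["pepA", "pepB", "pepC"]]
beta_freq = {"pepA": 1e-6}
alpha_freq = {"pepA": 1e-6, "pepB": 2e-6, "pepC": 1e-6}
print(peptide_enrichments(clusters, beta_freq, alpha_freq))  # each equals ln(4)
```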
We tested this cluster-based approximation using the 8,672 peptides of
hA5 for which we could directly calculate enrichment (that is, those peptides
seen in both the competitor and non-competitor experiments). We calculated
the enrichment of each peptide individually and compared these values to those
obtained by the cluster method. There is no systematic difference in the values
estimated using the two methods, and a linear model relating the two sets of
estimates explains 98.4% of the variation between them.
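The comparison between per-peptide and cluster-based estimates amounts to a linear regression. A minimal sketch with SciPy's `linregress` follows; under this reading, a slope near 1 with an intercept near 0 corresponds to no systematic difference, and the squared correlation is the fraction of variance explained. Applying Eq. 6.1 to a single peptide is an assumption about how the individual values were computed, and the example values are hypothetical.

```python
import numpy as np
from scipy.stats import linregress


def peptide_enrichment(beta_i, alpha_i):
    """Eq. 6.1 applied to a single peptide seen in both samples."""
    return -np.log(beta_i / alpha_i)


def compare_estimates(individual, cluster_based):
    """Regress cluster-based on individual enrichments; return slope, intercept, R^2."""
    fit = linregress(individual, cluster_based)
    return fit.slope, fit.intercept, fit.rvalue ** 2


# Hypothetical frequencies for three peptides and their cluster-based estimates
individual = peptide_enrichment(np.array([1e-6, 2e-6, 5e-7]), np.array([2e-6, 2e-6, 2e-6]))
cluster_based = np.array([0.70, 0.02, 1.35])
print(compare_estimates(individual, cluster_based))
```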
Principal Component Analysis
To generate the aaindex meta features, we performed a principal component
analysis on all 590 features from the aaindex database. Any missing value was
assigned the mean value of that feature. Prior to performing the PCA, we
standardized each feature to a mean of zero and a standard deviation of 1. This
yielded 20 principal components.
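The meta-feature construction maps onto three scikit-learn steps: mean imputation, standardization, and PCA. The sketch uses random placeholder values in place of the aaindex table. Assuming the table has one row per amino acid (aaindex scales assign one value to each of the 20 residues), PCA can return at most 20 components, consistent with the 20 components reported above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def aaindex_meta_features(table):
    """table: array of shape (20 amino acids, n_scales), NaN for missing values."""
    filled = SimpleImputer(strategy="mean").fit_transform(table)  # per-feature mean
    scaled = StandardScaler().fit_transform(filled)               # mean 0, SD 1
    pca = PCA()                                                   # keep all components
    scores = pca.fit_transform(scaled)                            # (20, n_components)
    return scores, pca.explained_variance_ratio_


# Placeholder data standing in for the 590 aaindex scales
rng = np.random.default_rng(0)
table = rng.normal(size=(20, 590))
table[rng.random(table.shape) < 0.05] = np.nan  # simulate missing entries
scores, evr = aaindex_meta_features(table)
print(scores.shape, evr[:3])
```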
Incorporating uncertainty into an estimate of a Venn diagram
We used a Bayesian approach to estimate the overlaps between the binding
sets of proteins, despite high false positive and false negative rates. Consider a set
of peptides binding to the proteins A and B. The binding of these peptides can be
described by a Venn diagram with four regions: $[A \cup B]^c$ (peptides that bind neither
A nor B), $A \setminus B$ (peptides that bind A alone), $B \setminus A$ (peptides that bind B alone),
and $A \cap B$ (peptides that bind both A and B). The number of peptides in each
region is given by $\vec{V}$, while the number of peptides observed in each region is given
by $\vec{V}_{obs}$. $\vec{V}$ and $\vec{V}_{obs}$ can differ because there may be both false positives (at rates $m_A$
and $m_B$) and false negatives (at rates $n_A$ and $n_B$). We can write a row-stochastic
matrix that describes the probability of observing a peptide in a region given its
actual region as:
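$$
T =
\begin{pmatrix}
P([A\cup B]^c \mid [A\cup B]^c) & P(A\setminus B \mid [A\cup B]^c) & P(B\setminus A \mid [A\cup B]^c) & P(A\cap B \mid [A\cup B]^c) \\
P([A\cup B]^c \mid A\setminus B) & P(A\setminus B \mid A\setminus B) & P(B\setminus A \mid A\setminus B) & P(A\cap B \mid A\setminus B) \\
P([A\cup B]^c \mid B\setminus A) & P(A\setminus B \mid B\setminus A) & P(B\setminus A \mid B\setminus A) & P(A\cap B \mid B\setminus A) \\
P([A\cup B]^c \mid A\cap B) & P(A\setminus B \mid A\cap B) & P(B\setminus A \mid A\cap B) & P(A\cap B \mid A\cap B)
\end{pmatrix}
$$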
where each conditional probability $P(X \mid Y)$ describes the probability of observing the
peptide in region X given that it is actually in region Y. If we know this matrix and we know
the real population in each region, we can calculate $\vec{V}_{obs}$ by:

$$\vec{V}_{obs} = \vec{V}\,T.$$
We can construct T using the false positive and false negative rates for binding to
protein A or B. For example, the probability of seeing a peptide that binds to A alone
when it actually does not bind to either A or B would be

$$P(A \setminus B \mid [A \cup B]^c) = m_A - m_A m_B:$$

the probability of a false positive for A less the probability of a false positive for both A
and B. Using appropriate combinations of false positive and false negative rates, we can
calculate every value in T:
$$
T =
\begin{pmatrix}
1-(m_A+m_B-m_Am_B) & m_A-m_Am_B & m_B-m_Am_B & m_Am_B \\
n_A-n_Am_B & 1-(n_A+m_B-n_Am_B) & n_Am_B & m_B-n_Am_B \\
n_B-m_An_B & m_An_B & 1-(m_A+n_B-m_An_B) & m_A-m_An_B \\
n_An_B & n_B-n_An_B & n_A-n_An_B & 1-(n_A+n_B-n_An_B)
\end{pmatrix}
$$
This can be readily extended to any number of proteins with any number of possible
overlaps.
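Because each entry of T is a product of per-protein error probabilities, T can be built as a Kronecker product of one 2 × 2 observation matrix per protein. This construction is an illustrative choice, not necessarily how the original code builds T; it assumes (as the closed-form entries above do) that errors for A and B are independent, and it reproduces those entries for the region order [neither, A only, B only, both]. Adding a protein adds one more Kronecker factor, which is one way to see the extension to more proteins. The rates and counts are hypothetical.

```python
import numpy as np


def protein_matrix(m, n):
    """2x2 observation matrix for one protein.

    Rows: actual state (0 = does not bind, 1 = binds); columns: observed state.
    m = false positive rate, n = false negative rate.
    """
    return np.array([[1 - m, m],
                     [n, 1 - n]])


def venn_transition(mA, nA, mB, nB):
    """4x4 row-stochastic T for regions ordered [neither, A only, B only, both].

    The region index is a + 2*b, where a and b are 1 if the peptide binds A or B.
    """
    return np.kron(protein_matrix(mB, nB), protein_matrix(mA, nA))


T = venn_transition(mA=0.1, nA=0.3, mB=0.05, nB=0.2)  # hypothetical rates
print(T.sum(axis=1))                                  # every row sums to 1

V = np.array([900.0, 50.0, 30.0, 20.0])  # hypothetical true counts per region
V_obs = V @ T                            # expected observed counts, V_obs = V T
print(V_obs)
```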
We can then estimate $\vec{V}$ using Bayesian Markov chain Monte Carlo (MCMC). We
first write a likelihood function:

$$\ln\left[P(\vec{V}_{obs} \mid \vec{V}, \{m\}, \{n\})\right] = -\frac{1}{2}\sum_i \left[\frac{\left(V_{obs,i} - (\vec{V}T)_i\right)^2}{\sigma_i^2} + \ln\left(\sigma_i^2\right)\right]$$

where i indexes regions in the Venn diagram, $\sigma_i^2$ is the uncertainty of the counts in region
i, {m} is the set of false positive rates, and {n} is the set of false negative rates. We can
then sample values of $\vec{V}$, {m}, and {n} by MCMC. For $\vec{V}$, we used the prior:
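$$\ln\left[P(\vec{V})\right] = \begin{cases} -\infty & \text{if any } V_i < 0 \\ 0 & \text{if all } V_i \ge 0 \end{cases},$$

thus requiring all regions to have non-negative counts. We also constrained the total number of
counts in $\vec{V}$ to be within 5% of the total number of counts in $\vec{V}_{obs}$ (N):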
$$\ln\left[P(\vec{V})\right] = \begin{cases} -\infty & \sum_i V_i < 0.95N \\ 0 & 0.95N \le \sum_i V_i \le 1.05N \\ -\infty & \sum_i V_i > 1.05N \end{cases}$$
For every false positive or false negative rate (denoted as $r_j$), we used the prior:

$$\ln\left[P(r_j)\right] = \begin{cases} -\infty & r_j < 0 \\ -\dfrac{(r_j - \hat{\mu}_j)^2}{2\sigma_j^2} - \dfrac{1}{2}\ln\left(2\pi\sigma_j^2\right) & 0 \le r_j \le 1 \\ -\infty & r_j > 1 \end{cases}$$

where $\hat{\mu}_j$ is the estimate of the value of $r_j$ from our binding experiments and $\sigma_j$ was set
to 0.2. For values outside of 0 and 1, the log prior is $-\infty$, enforcing bounds on these
parameters.
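The likelihood and priors above combine into a log-posterior that can be sampled. The sketch below uses a plain random-walk Metropolis sampler, an assumption since the text does not say which MCMC sampler was used. It treats each $\sigma_i^2$ as a known input (set equal to the observed counts in the usage example, a Poisson-like assumption) and uses hypothetical step sizes and rate estimates.

```python
import numpy as np

SIGMA_PRIOR = 0.2  # prior width for every rate, as stated in the text


def transition(mA, nA, mB, nB):
    # T for regions [neither, A only, B only, both], built from per-protein errors
    PA = np.array([[1 - mA, mA], [nA, 1 - nA]])
    PB = np.array([[1 - mB, mB], [nB, 1 - nB]])
    return np.kron(PB, PA)


def log_posterior(theta, V_obs, sigma2, mu_hat):
    """theta = [V (4 region counts), mA, nA, mB, nB]."""
    V, rates = theta[:4], theta[4:]
    N = V_obs.sum()
    # Priors on V: non-negative counts, total within 5% of N
    if np.any(V < 0) or not (0.95 * N <= V.sum() <= 1.05 * N):
        return -np.inf
    # Gaussian priors on the rates, truncated to [0, 1]
    if np.any(rates < 0) or np.any(rates > 1):
        return -np.inf
    lp = np.sum(-(rates - mu_hat) ** 2 / (2 * SIGMA_PRIOR ** 2)
                - 0.5 * np.log(2 * np.pi * SIGMA_PRIOR ** 2))
    # Gaussian likelihood of the observed counts given V T
    resid = V_obs - V @ transition(*rates)
    ll = -0.5 * np.sum(resid ** 2 / sigma2 + np.log(sigma2))
    return ll + lp


def metropolis(V_obs, sigma2, mu_hat, n_steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.concatenate([V_obs.astype(float), mu_hat])  # start at observed values
    step = np.concatenate([np.full(4, 0.02 * V_obs.sum()), np.full(4, 0.02)])
    current = log_posterior(theta, V_obs, sigma2, mu_hat)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.normal(0.0, step)
        new = log_posterior(proposal, V_obs, sigma2, mu_hat)
        if np.log(rng.random()) < new - current:           # Metropolis acceptance
            theta, current = proposal, new
        samples.append(theta.copy())
    return np.array(samples)
```

Usage, with hypothetical observed counts and rate estimates ordered as [mA, nA, mB, nB]:

```python
V_obs = np.array([800.0, 100.0, 60.0, 40.0])
samples = metropolis(V_obs, V_obs.copy(), np.array([0.1, 0.3, 0.05, 0.2]), n_steps=5000)
print(samples[2500:, :4].mean(axis=0))  # posterior mean counts, discarding burn-in
```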
Bridge to Chapter VII
In this chapter a novel experimental pipeline was developed to quantify protein
binding specificity in an unbiased fashion. Peptide phage display and high-throughput
sequencing were coupled to trace the meanderings of peptide binding specificity in
a clade formed by two S100s, S100A5 and S100A6, that arose from an ancient gene
duplication. Vast datasets of peptides bound by the proteins were generated, and the
data were then analyzed using a random forest machine learning approach. The results
of this pipeline demonstrate that the bulk of explanatory power that allows peptide
binding partners to be predicted comes from overall biochemical features of the peptide
targets. This is in stark contrast to previous studies of more specific protein
families, in which sequence properties are sufficient to predict targets. Estimating the
total sets of binding partners recognized by S100A5, S100A6, and ancA5/A6 (the
resurrected ancestor of the two clades) reveals how total sets of potential binding
partners and the biochemical determinants of these sets evolved. An interesting historical
evolutionary pattern is uncovered, wherein diversification appears to have happened
on both protein lineages following gene duplication. However, the S100A5 lineage
experienced subfunctionalization of binding partners relative to the ancestral protein,
while S100A6 appears to have shifted its specificity laterally. This result demonstrates
that diversification of biochemical phenotypes can be nuanced and complex when viewed
from an unbiased, global perspective. It further highlights a disagreement between
low-throughput and high-throughput methods for measuring changes in specificity and
introduces a broadly useful technique for addressing this issue. Furthermore, this chapter
emphasizes that experiments with low-specificity proteins, such as the S100s, should
focus on understanding how global biochemical features underlie binding preferences,
rather than on sequence motifs. In Chapter VII, the results of the
entire dissertation are summarized and the implications are discussed. The broader
contributions to the field of evolutionary biochemistry are highlighted. The limitations
of the results are also discussed and future directions are outlined that expand on the
work presented here.
CHAPTER VII
SUMMARY AND CONCLUDING REMARKS
Contributions to the field of evolutionary biochemistry
The tools of evolutionary biochemistry have become an important approach to
understanding molecular evolution. This field provides the basis to determine how the
biochemical properties of organisms—at the level of proteins—shape the evolution of new
phenotypes. A broad array of biologically-relevant molecular-level changes have been
characterized, revealing the importance of biochemical constraints in contributing to
evolutionary changes in signalling [52], metabolism [53], nutrient transport [55], and many
other aspects of biology. In many cases these changes required biochemical modifications
that altered the recognized binding partners of proteins [58, 53, 272, 273, 60, 290, 292].
Despite substantial progress in understanding the molecular basis of evolutionary
alterations, there are still a vast number of unanswered questions. One limitation
of previous studies has been to focus on proteins with very specific sets of binding
partners. Exquisite specificity is important for biology. However, many proteins exhibit
the ability to interact with highly-diverse binding partners. These proteins are also
critical for many biological processes, but previous work has shied away from using such
proteins as models. The biological roles and relevance of biochemically-defined sets of
binding partners are less obvious in these cases than in more canonical examples of
proteins with tight specificity. This dissertation focused on helping to close the gap in
understanding how proteins with highly-variable binding partners and binding sites evolve
new biochemical specificity.
The work presented in this dissertation contributes to the field of
evolutionary biochemistry, in particular to understanding the
evolution of proteins with diverse biochemical features, which differ from many other
model protein systems in the variability of their binding partners. The S100 proteins
proved to be a useful model for probing the evolution of binding specificity. Tracing the
evolution of both metal-ion and small-peptide binding by the S100s revealed several
key observations. 1) A biochemical output can be conserved over evolutionary time
despite extensive amino acid turnover in binding sites. 2) Proteins with low biochemical
specificity can nonetheless be subject to evolutionary constraints that maintain a given
specificity profile. 3) The evolutionary patterns that follow gene duplications in
low-specificity proteins are similar to those observed in high-specificity proteins. 4) Following
gene duplication, the duplicate lineages can undergo differential changes in specificity, such
as subfunctionalization on one lineage and neofunctionalization on the other.
Limitations and future directions
The work in this dissertation has contributed to understanding how proteins with
highly-variable binding partners evolve new biochemical specificity. The studies presented
here are the first to address this problem in a systematic way. They pave the way for future
studies to improve upon methods and model systems and to connect what has been learned to
more nuanced questions regarding the evolution of binding specificity.
Nonetheless, there are limitations to this work that should be considered. For
example, the evolution of metal binding was traced across the entire S100 protein family.
This work revealed two key observations: 1) the overall biochemical output of transition
metal binding can be maintained despite extreme lability of metal-binding sites, and 2) there
is substantial variation in the structural output of metal binding. These observations
suggested that there has been some degree of biochemical specialization in the response
to metal binding that could potentially have direct biological consequences. The known
roles of transition metal binding in S100s range from antimicrobial sequestration of metals
[105] to metal chaperoning as part of a signalling pathway [172]. It seems unlikely that
a biochemical output conserved across the family has been maintained for hundreds of
millions of years in the absence of a biological role. However, the roles of transition metal
binding remain unknown for most S100 proteins [170, 171, 96]. The work presented here
remains merely suggestive of the biological relevance of this biochemical behavior. The
downstream output of the S100 biochemistry was not characterized in its natural cellular
environment. Future work should thus focus on understanding what role transition metal
binding plays in the relevant physiological context. Furthermore, although this work
revealed the extreme variability of binding site ligands and locations, it nonetheless leaves
open the question of exactly which ligands are used throughout the family. For example,
the Cu2+ binding ligands of S100A5 remain unknown. Future studies should thus also
seek to map out the transition metal binding sites of S100s. Eliminating binding sites
in vivo via site-directed mutagenesis will further allow the biology of transition metal
binding to be probed.
With regard to the studies of peptide binding specificity presented in this
dissertation, an unbiased approach was used to trace the evolutionary history of peptide
binding specificity in S100A5 and S100A6 following gene duplication. This method
allowed the total scope of binding partners to be estimated for each of the duplicate
lineages. Combining the method with ASR further allowed the evolutionary dimension
to be directly assessed, revealing the historical patterns of changes in specificity that
occurred following duplication. The patterns, when compared to those observed via
a more traditional low-throughput method, highlighted the importance of using such
a global, unbiased approach. The low-throughput method provided valuable, gold-standard
information that clearly demonstrates conservation of specificity profiles within
paralogous lineages. However, it lacked the resolution to characterize how the size
and diversity of binding sets had changed during evolution. What appeared to be a
symmetric pattern of subfunctionalization from a less-specific ancestor on both the S100A5
and S100A6 lineages was revealed by the high-throughput approach to be far more
nuanced. S100A5 did in fact undergo subfunctionalization, but S100A6 actually underwent
a shift in specificity. It is even possible that the scope of binding partners for S100A6
increased over evolutionary time. This result clearly shows the superiority of
using an unbiased high-throughput approach to characterize the evolution of specificity,
an advantage accentuated by the very low specificity of proteins like the S100s.
The unbiased nature of the approach used to measure specificity is the key
strength of the method. However, this is also one of its key limitations. The very fact
that the approach utilized a random set of peptide targets divorces it from direct
biological implications. The low-throughput study revealed strong conservation of binding
specificity profiles in duplicate lineages. This result strongly argues that there is indeed
biological relevance for the biochemical specificity of these proteins, because it seems
very unlikely that these profiles would be maintained purely by chance over 320 million
years of evolution without some sort of selection to maintain the set of binding partners.
This argument is particularly strong considering that specificity can be readily altered in
these proteins by a single amino acid substitution. However, the biological implications
of biochemical specificity in S100A5 and S100A6 were not directly assessed and remain
firmly in the realm of speculation. This limitation opens up three key questions that
future studies should seek to address. 1) How does the biologically-realized set of binding
partners compare to the scope of all possible partners defined by biochemistry? 2) What
are the biological forces that winnow the possible set of partners to the realized set? 3)
What is the biological output of the biochemical specificity that has been conserved
for so long? Answering these questions will provide a more complete picture of how
specificity evolves in the S100s, the forces that shape it, and the biological implications
for evolutionary changes in specificity.
APPENDIX A
SUPPLEMENTAL MATERIAL FOR CHAPTER III
Supplemental Figures
This section includes the supplemental figures referenced in chapter III. Other
supplemental files such as spreadsheets, newick trees, and multiple sequence alignments
are included in the chapter 3 subdirectory of the zipped supplemental directory
submitted with this dissertation.
FIGURE 24 Sequence logo indicates relative frequency of amino acids at each
position in the alignment. Taller letters indicate higher frequency at that position.
Arrows indicate 13 key residues we used to verify/anchor the alignment.
FIGURE 25 Tree is a majority rule consensus tree, with all nodes with posterior
probabilities <50% collapsed into polytomies. Wedges are collapsed clades of
shared orthologs, with wedge height denoting number of included taxa and wedge
length denoting longest branch length within the clade. Support values are posterior
probabilities. Rooting is arbitrary given the poor resolution at the base of the
taxonomic tree. Icons indicate taxonomic classes represented within each clade:
tunicates (black sea squirt), jawless fishes (pink lamprey), cartilaginous fishes
(purple ray), ray-finned fishes (light blue fish), lobe-finned fishes (blue coelacanth),
amphibians (green frog), birds/reptiles (yellow lizard), and mammals (red mouse).
Inset shows estimated divergence times for each taxonomic class in millions of years
before present.
FIGURE 26 Each panel is a single human paralog, indicated by the name on the
graph. Color of fit indicates metal used as titrant: Zn2+ (gray) or Cu2+ (copper).
Top sub-panel for each panel is a raw power vs. time curve. Bottom sub-panel for
each panel is integrated heat versus molar ratio. The model fit is denoted by the
heavy line through the fit points.
FIGURE 27 Curves are far-UV CD spectra (mean molar ellipticity vs. wavelength).
Colors represent metal: apo (black), Zn2+ (gray), and Ca2+ (blue). Paralog is
indicated to the right of each spectrum.
FIGURE 28 A) ITC trace for binding of Ca2+. B) ITC trace for binding of Zn2+.
C) Far-UV CD spectra for tunA in apo form (black), presence of Ca2+ (blue) and
presence of Zn2+ (gray). D) Intrinsic fluorescence spectra for tunA with conditions
as in panel C. E-H) ESI-MS spectra for tunA, titrating from 10 µM to 0.01 µM
protein. Icons indicate species (monomer or dimer). Numbers indicate charge
state. Dimer is lost preferentially during dilution, suggesting it is an artifact of
the electrospray process.
FIGURE 29 tunB mass spectra at concentrations of a) 10 µM, b) 1 µM, c) 0.1
µM, and d) 0.01 µM demonstrate that tunB homodimers are robust to dilution,
indicating that this is a specific interaction. Homotetramer is observed only in the
most concentrated sample, thus homotetramer signal likely arises from non-specific
interactions during the electrospray process.
FIGURE 30 Graph shows the distribution of sedimentation coefficients determined
for tunA (black) and tunB (blue). The apparent masses of the homodimer peaks are
indicated above each peak, with the mass expected from the amino acid sequence of
the protein in parentheses.
APPENDIX B
SUPPLEMENTAL MATERIAL FOR CHAPTER IV
Supplemental Figures
This section includes the supplemental figures referenced in chapter IV.
FIGURE 31 Raw data corresponding to integrated heats in figure 11. a) hA5
binding Cu2+, b) Ca2+-loaded hA5 binding Cu2+, c) hA5 binding Ca2+, and
d) Cu2+-loaded hA5 binding Ca2+.
APPENDIX C
SUPPLEMENTAL MATERIAL FOR CHAPTER V
Supplemental Figures
This section includes the supplemental figures referenced in chapter V. Other
supplemental files such as spreadsheets, newick trees, and multiple sequence alignments
are included in the chapter 5 subdirectory of the zipped supplemental directory
submitted with this dissertation.
TABLE 3 Binding of 12-mer phage display peptides does not depend on solubilizing
flanks. List of phage display consensus peptides used in the study. The sequences
of flank variants of A5cons and A6cons are shown. Flanks are indicated by lower-
case letters. The third column shows dissociation constants for peptides binding to
hA5 with 95% credibility regions from Bayesian fits of one ITC dataset per variant.
Flank variants bind with similar KD.
Peptide Name Amino Acid Sequence KD(µM)
A5cons (variant 1) rshsSSFQDWLLSRLPgggsae 4.9 ≤ 6.1 ≤ 7.8
A5cons (variant 2) ----SSFQDWLLSRLP-ggsae 1.1 ≤ 2.8 ≤ 7.9
A5cons (variant 3) rshsSSFQDWLLSRLP------ 7.2 ≤ 9.6 ≤ 13.1
A6cons (variant 1) rshsGFDWRWGMEALTgggsae 0.3 ≤ 0.9 ≤ 2.4
A6cons (variant 2) ----GFDWRWGMEALT-ggsae 1.5 ≤ 2.5 ≤ 4.0
FIGURE 32 Randomer phage enrichment is dependent on Ca2+ and protein. Bar
graphs show the plaque forming units (PFU) for phage solutions after the third
round of enrichment for screens using hA5 (A) or hA6 (B). For each round of
panning, we incubated phage with biotinylated protein, pulled down bound phage
via a streptavidin plate, and finally eluted the phage from the protein with an
elution buffer. To verify that binding occurred in a Ca2+-dependent manner, we
compared Ca2+-loading/EDTA-elution to EDTA-loading/EDTA-elution. We also
performed a Ca2+-loading/EDTA-elution experiment using biotin alone. Insets
show sequence logos (WebLogo) generated from 20 plaque sequences from each
Ca2+/EDTA panning experiment. The most frequent residue at each position was
used to generate the A5cons and A6cons peptides.
FIGURE 33 ITC traces show baseline-corrected titration of various peptides onto
S100 proteins in the presence of 2 mM Ca2+. All experiments were done with
≈ 100 µM protein in 25 mM TES, 100 mM NaCl, 1 mM TCEP at pH 7.4,
25 °C.
FIGURE 34 Far UV CD spectra are diagnostic for the S100A5 and S100A6 clades.
CD spectra are mapped onto a diagram of the S100A5-S100A6 clade. Curves
are spectra of apo (gray) and Ca2+-bound (orange/purple) proteins. The S100A5
proteins (purple) are characterized by a deep alpha-helical signal at 222nm that
substantially increases in response to binding of Ca2+. S100A6 proteins (orange)
show comparatively minimal response and maintain a deeper peak at 208nm. These
patterns hold for the ancestors at the base of each clade. The spectra of ancA5/A6
and the ancA5/A6 altAll version (both shown in green) resemble that of an extant
S100A6, indicating that the large Ca2+-driven conformational change seen in the
extant S100A5s is a derived feature of this lineage.
TABLE 4 Thermodynamic parameters for binding of the peptide
rshsGFDWRWAMEALTggsae (A6cons) to S100A5 and S100A6 proteins. Species
abbreviations are “alli” (alligator), “gal” (chicken), “sar” (Tasmanian devil),
“m” (mouse), and “h” (human). Fit parameters, with standard deviations from the
fits, for the data shown schematically in Fig 4A. Parameters are for a single-site
binding model. “NA” indicates that there was no detectable binding. We
floated the fraction competent parameter to capture uncertainty in peptide and
protein concentration, particularly given the low extinction coefficients of S100A5
and S100A6. If an experiment was done at both 10 and 25 °C, the parameters
correspond to the 10 °C experiment.
protein KA (M−1) ∆H° (kcal/mol) fx comp. num reps T (°C)
ancA5/A6 8.30e5± 1.9e5 −12.20± 1.2 0.70± 0.02 2 25
altAll 1.10e5± 5.2e4 −8.70± 0.6 0.90± 0.07 2 25
ancA5 7.70e5± 2.1e5 −3.30± 0.8 0.80± 0.07 2 25
alliA5 4.40e5± 6.8e4 −10.60± 0.7 0.70± 0.01 2 25
sarA5 2.50e5± 1.6e5 −5.90± 2.5 0.90± 0.13 2 25
mA5 2.10e5± 5.4e4 −11.70± 2.7 1.10± 0.05 2 25
hA5 4.10e5± 9.8e4 8.50± 1.5 1.00± 0.02 2 25
ancA6 2.80e5± 1.9e5 −6.40± 2.7 1.10± 0.14 2 25
alliA6 9.50e4± 4.7e4 −10.40± 3.7 0.60± 0.09 2 25
gA6 4.20e5± 2.0e5 −8.10± 2.0 0.70± 0.06 2 25
sarA6 1.40e5± 6.7e4 −6.20± 1.9 0.80± 0.10 2 25
mA6 2.60e5± 1.2e5 −6.40± 1.5 0.60± 0.05 2 25
hA6 2.00e5± 4.8e4 9.60± 1.4 0.80± 0.02 2 25
hA4 2.80e6± 6.5e6 −1.80± 0.5 0.60± 0.04 2 25
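The tables report only KA and ∆H°; the remaining single-site quantities follow from the standard relations KD = 1/KA, ∆G° = −RT ln KA (1 M standard state), and −T∆S° = ∆G° − ∆H°. The Python sketch below applies them to the ancA5/A6 row of Table 4. It is an illustration of those textbook relations, not the analysis code used for the dissertation; the helper name `derived_thermo` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def derived_thermo(ka, dh, t_celsius):
    """From KA (M^-1) and dH (kcal/mol), return KD (uM), dG and -T*dS (kcal/mol)."""
    t_kelvin = t_celsius + 273.15
    kd_um = 1.0e6 / ka                       # KD = 1/KA, converted from M to uM
    dg = -R_KCAL * t_kelvin * math.log(ka)   # dG = -RT ln KA
    minus_tds = dg - dh                      # dG = dH - T*dS  =>  -T*dS = dG - dH
    return kd_um, dg, minus_tds


# ancA5/A6 binding A6cons (Table 4): KA = 8.30e5 M^-1, dH = -12.20 kcal/mol, 25 C
kd, dg, minus_tds = derived_thermo(8.30e5, -12.20, 25)
print(f"KD = {kd:.2f} uM, dG = {dg:.2f} kcal/mol, -TdS = {minus_tds:.2f} kcal/mol")
```

The same conversion gives KD ≈ 2.4 µM for hA5 binding A6cons (KA = 4.10e5 M−1), the same order as the 3 µM listed for that pair in Table 12.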
TABLE 5 Thermodynamic parameters for binding of the peptide
rshsSSFQDWLLSRLPgggsae (A5cons) to S100A5 and S100A6 proteins. Species
abbreviations are “alli” (alligator), “gal” (chicken), “sar” (Tasmanian devil),
“m” (mouse), and “h” (human). Fit parameters, with standard deviations from the
fits, for the data shown schematically in Fig 4A. Parameters are for a single-site
binding model. “NA” indicates that there was no detectable binding. We
floated the fraction competent parameter to capture uncertainty in peptide and
protein concentration, particularly given the low extinction coefficients of S100A5
and S100A6. If an experiment was done at both 10 and 25 °C, the parameters
correspond to the 10 °C experiment.
protein KA (M−1) ∆H° (kcal/mol) fx comp. num reps T (°C)
ancA5/A6 9.30e4± 3.0e4 −5.20± 1.6 1.40± 0.07 2 25
altAll 4.70e4± 2.2e4 −3.90± 1.3 1.30± 0.19 2 25
ancA5 1.30e5± 3.6e4 −6.90± 1.3 0.90± 0.05 2 10, 25
alliA5 2.30e4± 3.8e3 13.80± 2.4 1.10± 0.07 2 10, 25
sarA5 2.10e5± 1.5e5 −4.80± 1.9 0.70± 0.1 2 25
mA5 4.70e4± 1.9e4 −6.90± 2.1 0.60± 0.08 2 25
hA5 3.60e5± 2.1e5 −5.70± 1.7 0.80± 0.06 2 25
ancA6 NA NA NA 2 10, 25
alliA6 NA NA NA 2 25
gA6 NA NA NA 2 25
sarA6 NA NA NA 2 10, 25
mA6 NA NA NA 2 25
hA6 NA NA NA 2 25
hA4 1.70e4± 5.1e3 −4.10± 0.8 0.90± 0.3 2 25
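Under a single-site model, the complex concentration after each addition of titrant follows from mass balance, and each injection's heat is ∆H° times the change in complex. The sketch below assumes that the fraction-competent parameter scales the active protein concentration in the cell. The captions do not say which species it scales, so this is an assumption. The sketch also ignores injection dilution and displaced cell volume, which real ITC fitting software corrects for. All function names and the 200 µL cell volume are hypothetical.

```python
import math


def complex_conc(ka, p_total, l_total, fx_comp=1.0):
    """Equilibrium [PL] (M) for P + L <-> PL with association constant ka (M^-1).

    fx_comp scales the active protein concentration (assumed convention).
    """
    p = fx_comp * p_total
    b = p + l_total + 1.0 / ka
    # [PL]^2 - b*[PL] + p*L = 0; the smaller root is the physical solution
    return (b - math.sqrt(b * b - 4.0 * p * l_total)) / 2.0


def injection_heats(ka, dh, p_total, l_totals, v_cell, fx_comp=1.0):
    """Heat (kcal) of each injection, ignoring dilution and displaced volume."""
    heats, previous = [], 0.0
    for l_total in l_totals:
        pl = complex_conc(ka, p_total, l_total, fx_comp)
        heats.append(dh * v_cell * (pl - previous))  # dh in kcal/mol, v_cell in L
        previous = pl
    return heats


# ancA5 with A5cons (Table 5): KA = 1.3e5 M^-1, dH = -6.9 kcal/mol, fx comp. 0.9;
# 100 uM protein, ligand stepped to 200 uM, 200 uL cell (hypothetical volume)
steps = [i * 10e-6 for i in range(1, 21)]
heats = injection_heats(1.3e5, -6.9, 100e-6, steps, 200e-6, fx_comp=0.9)
```

Because the model is fit to the shape of the heat curve, an error in the assumed protein or peptide concentration is absorbed by fx comp. This is why values well away from 1 (for example ancA5 with NCX1) signal a poorly determined fit.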
TABLE 6 Thermodynamic parameters for binding of the peptide
RRLLFYKYVYKR (NCX1) to S100A5 and S100A6 proteins. Species abbreviations
are “alli” (alligator), “gal” (chicken), “sar” (Tasmanian devil), “m” (mouse), and
“h” (human). Fit parameters, with standard deviations from the fits, for the data
shown schematically in Fig 4A. Parameters are for a single-site binding model.
“NA” indicates that there was no detectable binding. We floated the fraction
competent parameter to capture uncertainty in peptide and protein concentration,
particularly given the low extinction coefficients of S100A5 and S100A6. If an
experiment was done at both 10 and 25 °C, the parameters correspond to the
10 °C experiment. (*) Data from ancA5 binding to NCX1 were difficult to fit. The
binding curves for this interaction had shallow curvature and did not appear to
reach a saturated baseline even at higher titrant-to-titrand molar ratios, which led
to the high fitted fraction competent.
protein KA (M−1) ∆H° (kcal/mol) fx comp. num reps T (°C)
ancA5/A6 3.3e4± 7.6e3 −1.70± 0.3 0.60± 0.05 2 25
altAll 2.3e4± 8.3e3 −3.80± 1.2 0.60± 0.05 2 25
ancA5* 1.98e5± 1.7e5 −0.68± 0.4 2.90± 0.20 2 10
alliA5 5.80e3± 1.6e3 −7.20± 2.0 0.90± 0.26 2 10, 25
sarA5 2.50e4± 1.7e4 −2.80± 1.3 0.70± 0.20 2 25
mA5 1.20e5± 1.7e5 −1.30± 0.5 0.80± 0.20 2 25
hA5 5.50e4± 1.3e4 −3.60± 0.9 1.40± 0.10 2 25
ancA6 NA NA NA 2 10, 25
alliA6 4.60e4± 3.3e4 −2.50± 0.2 0.70± 0.15 2 10, 25
gA6 1.10e5± 1.7e4 3.40± 0.6 1.70± 0.05 2 25
sarA6 1.30e4± 5.8e3 −4.30± 1.8 0.90± 0.30 2 25
mA6 NA NA NA 2 25
hA6 NA NA NA 2 25
hA4 NA NA NA 2 25
TABLE 7 Thermodynamic parameters for binding of the peptide
SEGLMNVLKKIYEDG (SIP) to S100A5 and S100A6 proteins. Species
abbreviations are “alli” (alligator), “gal” (chicken), “sar” (Tasmanian devil),
“m” (mouse), and “h” (human). Fit parameters, with standard deviations from the
fits, for the data shown schematically in Fig 4A. Parameters are for a single-site
binding model. “NA” indicates that there was no detectable binding. We
floated the fraction competent parameter to capture uncertainty in peptide and
protein concentration, particularly given the low extinction coefficients of S100A5
and S100A6. If an experiment was done at both 10 and 25 °C, the parameters
correspond to the 10 °C experiment.
protein KA (M−1) ∆H° (kcal/mol) fx comp. num reps T (°C)
ancA5/A6 1.30e4± 1.7e3 −8.10± 0.9 1.50± 0.01 2 25
altAll 2.40e4± 1.2e4 −1.50± 0.5 1.30± 0.20 2 25
ancA5 NA NA NA 2 25
alliA5 NA NA NA 2 10, 25
sarA5 NA NA NA 2 25
mA5 NA NA NA 2 25
hA5 NA NA NA 2 25
ancA6 3.90e4± 3.0e2 3.80± 0.3 1.20± 0.02 2 10, 25
alliA6 3.00e4± 9.3e3 3.20± 0.7 1.40± 0.09 2 25
gA6 5.80e4± 8.9e3 4.00± 0.4 1.90± 0.04 2 25
sarA6 3.30e5± 1.5e5 0.90± 0.1 2.00± 0.02 2 25
mA6 3.90e5± 2.8e5 0.50± 0.3 1.80± 0.02 2 10, 25
hA6 3.80e4± 5.7e3 2.90± 0.3 1.50± 0.03 2 15, 25
hA4 NA NA NA 2 25
TABLE 8 Thermodynamic parameters for binding of the A5cons and SIP peptides
to hA5 ancestral reversion mutants. Table entries show the 95% credible region from
the posterior distribution of each parameter. Parameters are for a single-site
binding model. “NA” parameters indicate that there was no detectable binding.
All experiments were done at 25 °C.
protein peptide KA (×10^5 M−1) ∆H° (kcal/mol) fx comp.
hA5 A5cons
hA5 SIP NA NA NA
hA5/E2a A5cons 1.2 ≤ 1.3 ≤ 1.4 −5.2 ≤ −5.1 ≤ −4.88 0.87 ≤ 0.90 ≤ 0.93
hA5/E2a SIP NA NA NA
L44i A5cons 1.6 ≤ 1.7 ≤ 1.9 −5.2 ≤ −5.1 ≤ −4.88 0.87 ≤ 0.90 ≤ 0.93
L44i SIP NA NA NA
D54k A5cons 3.2 ≤ 3.5 ≤ 3.7 −5.1 ≤ −5.0 ≤ −4.9 1.15 ≤ 1.17 ≤ 1.19
D54k SIP NA NA NA
M78a A5cons 1.0 ≤ 1.1 ≤ 1.2 −3.5 ≤ −3.3 ≤ −3.1 1.24 ≤ 1.28 ≤ 1.32
M78a SIP NA NA NA
A83m A5cons 2.6 ≤ 2.8 ≤ 3.0 −5.5 ≤ −5.5 ≤ −5.3 1.52 ≤ 1.53 ≤ 1.55
A83m SIP 0.4 ≤ 0.6 ≤ 1.0 −0.8 ≤ −0.6 ≤ −0.4 0.99 ≤ 1.22 ≤ 1.54
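Table 8 summarizes each parameter by a 95% credible region from its posterior. One common way to get such a summary from posterior samples (for example, MCMC draws) is an equal-tailed interval around the median. The caption does not state whether an equal-tailed or highest-density region was used, or which point estimate sits in the middle, so both choices below are assumptions. The draws are synthetic and serve only as an illustration.

```python
import random
import statistics


def quantile(ordered, q):
    """Linearly interpolated q-quantile of an already sorted list."""
    pos = q * (len(ordered) - 1)
    i = int(pos)
    if i + 1 >= len(ordered):
        return ordered[-1]
    frac = pos - i
    return ordered[i] + frac * (ordered[i + 1] - ordered[i])


def equal_tailed_interval(samples, mass=0.95):
    """Return (lower, median, upper) bounding the central `mass` of the samples."""
    ordered = sorted(samples)
    tail = (1.0 - mass) / 2.0
    return quantile(ordered, tail), statistics.median(ordered), quantile(ordered, 1.0 - tail)


# synthetic posterior draws for KA in units of 1e5 M^-1 (not real data)
rng = random.Random(1)
draws = [rng.gauss(1.3, 0.05) for _ in range(20000)]
lo, mid, hi = equal_tailed_interval(draws)
print(f"{lo:.1f} <= {mid:.1f} <= {hi:.1f}")
```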
TABLE 9 Accession numbers of S100 proteins used to build the multiple sequence
alignment.
paralog accession species
A1 F1R758 Danio rerio
A1 A5WW32 Danio rerio
A1 H2TQM5 Takifugu rubripes
A1 H2ST19 Takifugu rubripes
A1 H2L492 Oryzias latipes
A1 H2M1B8 Oryzias latipes
A1 G3NKS0 Gasterosteus aculeatus
A1 G3PEI0 Gasterosteus aculeatus
A2 P29034 Homo sapiens
A2 F6Q7Q8 Ornithorhynchus anatinus
A2 P10462 Bos taurus
A2 G3W672 Sarcophilus harrisii
A2 JH205580.1 Pelodiscus sinensis
A3 P33764 Homo sapiens
A3 P62818 Mus musculus
A3 A4FUH7 Bos taurus
A3 G3W5T7 Sarcophilus harrisii
A3 F6SL13 Monodelphis domestica
A3 F6Q7S6 Ornithorhynchus anatinus
A3 JH205580.1 Pelodiscus sinensis
A4 P35466 Bos taurus
A4 predicted* Crocodylus porosus
A4 P26447 Homo sapiens
A4 H0Z1G5 Taeniopygia guttata
A4 P07091 Mus musculus
A4 F6SKU1 Monodelphis domestica
A4 F6Q7T6 Ornithorhynchus anatinus
A4 XP_015743713.1 Python bivittatus
A4 JH205580.1 Pelodiscus sinensis
A4 G3W5H2 Sarcophilus harrisii
A4 H9H0S2 Meleagris gallopavo
A5 P33763 Homo sapiens
A5 P63084 Mus musculus
A5 E1B8S0 Bos taurus
A5 G3W581 Sarcophilus harrisii
A5 XP_019412310.1 Crocodylus porosus
A5 JH205580.1 Pelodiscus sinensis
A6 P06703 Homo sapiens
A6 P14069 Mus musculus
A6 F6SKR4 Monodelphis domestica
A6 F6R394 Ornithorhynchus anatinus
A6 G3W4S8 Sarcophilus harrisii
A6 H9H0S3 Meleagris gallopavo
A6 XP_019412316.1 Crocodylus porosus
A6 Q98953 Gallus gallus
A6 EOB07085.1 Anas platyrhynchos
A6 XP_015284753.1 Gekko japonicus
A6 XP_007429160.1 Python bivittatus
A6 JH205580.1 Pelodiscus sinensis
APPENDIX D
SUPPLEMENTAL MATERIAL FOR CHAPTER VI
Supplemental Figures
This section includes the supplemental figures and tables referenced in Chapter VI.
TABLE 10 Number of sequencing reads for each sample. Sample, and whether or
not competitor was added, are indicated on the left. Column groups show biological
replicates 1 and 2. “total” columns indicate reads returned by the Illumina software
pipeline. “good” columns indicate reads that passed our quality control and were
used to calculate enrichment values.
sample competitor rep1 total rep1 good rep2 total rep2 good
hA5 - 24,794,016 19,695,958 29,085,203 16,773,567
hA5 + 15,053,706 11,523,991 17,631,137 13,612,463
hA6 - 22,728,393 17,722,779 7,769,003 5,972,295
hA6 + 13,953,466 11,004,701 23,026,469 18,128,759
ancA5/A6 - 23,690,810 18,387,038 14,534,333 11,034,524
ancA5/A6 + 19,441,043 15,053,276 18,030,887 14,217,877
altAll - 34,565,905 18,387,038 17,975,086 13,343,678
altAll + 17,091,918 13,111,649 19,703,343 15,300,950
raw library 39,700,991 32,190,368 — —
FIGURE 35 Phage enrichment is reduced in the presence of competitor peptide.
Figure shows eluted plaque-forming units (PFU) (estimated from phage titer) for
two biological replicates of each condition. Enrichment is shown for biotin-only
control (gray), hA5 (purple), hA6 (orange), ancA5/A6 (dark green), and ancA5/A6
altAll (light green) with (+) and without (-) competitor peptide. Error bars show
standard error for two biological replicates. (*) hA6 without competitor is shown
for only one replicate due to failure of the titer for the other replicate.
FIGURE 36 We can identify the number of counts that reliably reports on
frequency in a sequenced phage pool. A) Using binomial sampling, we can calculate
the probability of observing exactly c_i counts in N samples from a pool in which
peptide i has actual frequency f_i. Figure shows curves for counts ranging from 1
(red) to 1,000 (pink), all using N = 2.0×10^7. B) Panel shows a histogram of
frequencies estimated from 3.9×10^7 reads taken from the input library. The black
points are experimental data. The red curve is an exponential distribution fit to
those data. C) Using the sampling from panel A and the fit curve from panel B, we
can determine P(f_i | c_i, N). The solid curve plots the number of reads for
peptide i (x-axis) against the maximum-likelihood estimate of the frequency
(y-axis). The red line highlights the cutoff we used in our experiments.
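Panels A and C can be sketched numerically. The binomial probability of c_i reads given frequency f_i and depth N is P(c_i | f_i, N) = C(N, c_i) f_i^c_i (1 − f_i)^(N − c_i). Multiplying it by an exponential prior on f_i, as fit in panel B, gives a posterior whose maximum can be located on a grid. This is a minimal sketch: the prior rate `LAM` is a hypothetical value, not the fitted one, and the log-spaced grid is an implementation choice. Log-gamma is used so that N = 2.0×10^7 does not overflow.

```python
import math


def log_binom_pmf(c, n, f):
    """log P(c | f, n): probability of c reads of a peptide at frequency f among n reads."""
    log_choose = math.lgamma(n + 1) - math.lgamma(c + 1) - math.lgamma(n - c + 1)
    return log_choose + c * math.log(f) + (n - c) * math.log1p(-f)


def map_frequency(c, n, lam, grid_size=4000):
    """Frequency maximizing P(c | f, n) * lam * exp(-lam * f) on a log grid from 1e-10 to 1e-2."""
    best_f, best_lp = None, -math.inf
    for k in range(grid_size + 1):
        f = 10.0 ** (-10.0 + 8.0 * k / grid_size)
        lp = log_binom_pmf(c, n, f) + math.log(lam) - lam * f  # log posterior up to a constant
        if lp > best_lp:
            best_f, best_lp = f, lp
    return best_f


N = 20_000_000   # reads, as in panel A
LAM = 1.0e5      # hypothetical exponential-prior rate, not the fitted value
for c in (1, 10, 100, 1000):
    print(c, map_frequency(c, N, LAM))
```

For small f the maximum sits near c/(N − c + λ), so the prior matters most when c is small. This is why low-count peptides report frequency unreliably.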
FIGURE 37 Enrichment distributions for all proteins. Panels show the distribution
of E for each protein (pooled biological replicates). Points are raw histograms.
Curves are a two-Gaussian fit: blue (responsive), purple (unresponsive), and yellow (sum).
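The figure fits a sum of two Gaussians to histograms of E. As a swapped-in alternative that works on raw E values rather than binned counts, a two-component Gaussian mixture can be fit by expectation-maximization (EM). The pure-Python sketch below uses synthetic data. It illustrates the idea only and is not the fitting procedure used for the figure.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, n_iter=100):
    """Fit a 1D two-component Gaussian mixture by EM; returns [(weight, mean, sd), ...]."""
    xs = sorted(xs)
    n = len(xs)
    mu = [xs[n // 4], xs[(3 * n) // 4]]  # start at the lower and upper quartiles
    sd = [statistics.pstdev(xs)] * 2
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each point
        r0 = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            total = p0 + p1
            r0.append(p0 / total if total > 0 else 0.5)
        # M-step: re-estimate weight, mean and sd of each component
        for k, resp in enumerate((r0, [1.0 - r for r in r0])):
            nk = sum(resp)
            w[k] = nk / n
            mu[k] = sum(r * x for r, x in zip(resp, xs)) / nk
            var = sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)
    return list(zip(w, mu, sd))


# synthetic E values drawn from two populations (not real data)
rng = random.Random(0)
values = [rng.gauss(-2.0, 0.5) for _ in range(600)] + [rng.gauss(0.0, 0.5) for _ in range(400)]
components = sorted(fit_two_gaussians(values), key=lambda comp: comp[1])
```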
FIGURE 38 We can estimate how addition of competitor peptide alters the
frequencies of peptides. A) Distribution of sizes of peptide clusters from hA5
experiment. Pie chart shows number of peptides placed in clusters (56,003; 53.8%)
versus not (48,032; 46.2%). B) Three example clusters taken from the clusters in
panel A. The letter height at each position indicates its frequency in the sequences
within that cluster. C) Toy example showing how enrichment is calculated for
a cluster containing peptides {A, B, C, D}. Peptides A and C were observed in
the no-competitor sample at frequencies α_A and α_C. Peptides A, B, and D were
observed in the competitor sample at frequencies β_A, β_B, and β_D. The enrichment
of the cluster is given by E_clust = −ln[(β_A + β_B + β_D)/(α_A + α_C)]. All members
of the cluster are then assigned E ≈ E_clust. D) Comparison of enrichment values
for hA5 peptides determined using a direct comparison of frequencies with and
without competitor (x-axis) versus the clustering method (y-axis). Each point is an
individual peptide. Red line is a least-squares regression line fit to the data. The
dashed line is the 1:1 line.
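The toy calculation in panel C can be written directly. Sum the frequencies of cluster members in each sample, treating a member absent from a sample as contributing zero, then take the negative log of the ratio. The frequencies below are made up for illustration and are not experimental values. The function name is hypothetical.

```python
import math


def cluster_enrichment(no_comp_freqs, comp_freqs, members):
    """E_clust = -ln(sum of competitor freqs / sum of no-competitor freqs) over members."""
    alpha = sum(no_comp_freqs.get(p, 0.0) for p in members)
    beta = sum(comp_freqs.get(p, 0.0) for p in members)
    return -math.log(beta / alpha)  # undefined if the cluster is absent from either sample


alpha = {"A": 2e-6, "C": 1e-6}            # no-competitor sample (hypothetical)
beta = {"A": 5e-7, "B": 2e-7, "D": 3e-7}  # competitor sample (hypothetical)
cluster = ["A", "B", "C", "D"]
e_clust = cluster_enrichment(alpha, beta, cluster)
assigned = {p: e_clust for p in cluster}  # every member receives the cluster value
print(round(e_clust, 4))  # 1.0986
```

Pooling over a cluster lets peptides seen in only one of the two samples (B, C, and D here) still receive an enrichment value. Comparing individual frequencies directly would leave them undefined.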
FIGURE 39 Estimating the error rates for individual models. Panels show
individual models: A) hA5, B) hA6, C) ancA5/A6, and D) altAll. For each panel,
the left graph shows the error rate for peptide binding as a function of the cutoff
in E_norm chosen for classification. Lines are fits of the modified Hill equation to
the error rates. Colors indicate the false negative rate (red) and false positive rate
(black). The dashed vertical line indicates the cutoff used for prediction of the
Venn diagrams in Fig 6 (E_norm = −1.19). The error rates associated with
E_norm = −1.19 are indicated with arrows pointing right. The distributions in each panel show
the prior distributions used for the false negative (red) and false positive (black)
error rates in the Bayesian estimator. These distributions are centered at the error
rate estimate, with standard deviations of 0.2.
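The left-hand panels plot error rates against the E_norm cutoff. The sketch below shows one way such rates could be tabulated from peptides with known binding outcomes. It assumes a peptide is predicted to bind when its E_norm is at or below the cutoff, which is a hypothetical convention; the scores and labels are invented for illustration.

```python
def error_rates(scores, labels, cutoff):
    """Return (false_negative_rate, false_positive_rate) at a given cutoff.

    scores: E_norm values; labels: True for measured binders.
    Assumed rule: predict binding when score <= cutoff.
    """
    binders = [s for s, y in zip(scores, labels) if y]
    nonbinders = [s for s, y in zip(scores, labels) if not y]
    fnr = sum(s > cutoff for s in binders) / len(binders)      # binders called non-binding
    fpr = sum(s <= cutoff for s in nonbinders) / len(nonbinders)  # non-binders called binding
    return fnr, fpr


# invented example data
scores = [-2.1, -1.5, -1.3, -0.9, -0.4, -1.6, 0.2, -0.7]
labels = [True, True, True, True, False, False, False, False]
for cutoff in (-2.0, -1.19, -0.5):
    print(cutoff, error_rates(scores, labels, cutoff))
```

Sweeping the cutoff trades one error for the other. Curves like these are what the modified Hill fits in the figure summarize.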
TABLE 11 Features used for supervised machine learning. Features
denoted (CIDER) were calculated using the CIDER software package [315].
Other features were calculated using our own software package (HOPS:
https://github.com/harmslab/hops).
feature ref
num. hbond acceptors —
num. hbond donors —
κ (CIDER) [315]
∆ (CIDER) [315]
Ω (CIDER) [315]
FER (CIDER) [315]
Σ (CIDER) [315]
dmax (CIDER) [315]
∆max [315]
NCPR [315]
F+ (CIDER) [315]
F− (CIDER) [315]
FCR (CIDER) [315]
mean hydropathy (CIDER) [315]
White Interface scale [333]
Engleman scale [334]
% buried in structures [335]
Kyte/Doolittle scale [336]
Octanol scale [337]
Hopp-Woods scale [338]
Uversky scale [315]
cumulative mean hydropathy [315]
side chain accessible area [304]
main chain accessible area [304]
Chou-Fasman, β [339]
Chou-Fasman, α [339]
Chou-Fasman, turn [339]
fraction poly-proline II [315]
predicted charge at pH 4 —
predicted charge at pH 5 —
predicted charge at pH 6 —
predicted charge at pH 7 —
predicted charge at pH 8 —
predicted charge at pH 9 —
num. positive amino acids —
num. neutral amino acids —
num. negative amino acids —
net charge —
isoelectric point —
knob main chain, b [340]
socket main chain, x [340]
socket main chain, y [340]
socket main chain, h [340]
knob side chain, b [340]
socket side chain, x [340]
socket side chain, y [340]
socket side chain, h [340]
side chain volume [341]
molecular weight —
aromatic —
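Several of the simpler features in Table 11 can be computed directly from sequence. The sketch below covers fraction of charged residues (FCR), net charge per residue (NCPR), mean Kyte-Doolittle hydropathy, and a Henderson-Hasselbalch estimate of net charge at a given pH. The pKa values are one textbook-style set and are an assumption. HOPS and CIDER may use different values, and CIDER's hydropathy uses a rescaled Kyte-Doolittle scale, so these numbers are not expected to match theirs exactly.

```python
# Kyte-Doolittle hydropathy values
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# one textbook-style pKa set (an assumption; other packages use other values)
POS_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEG_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 9.0, 2.0


def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of net charge at the given pH."""
    seq = seq.upper()
    pos = [N_TERM_PKA] + [POS_PKA[a] for a in seq if a in POS_PKA]
    neg = [C_TERM_PKA] + [NEG_PKA[a] for a in seq if a in NEG_PKA]
    charge = sum(1.0 / (1.0 + 10.0 ** (ph - pka)) for pka in pos)   # protonated fraction
    charge -= sum(1.0 / (1.0 + 10.0 ** (pka - ph)) for pka in neg)  # deprotonated fraction
    return charge


def fcr(seq):
    """Fraction of charged residues (K, R, D, E)."""
    seq = seq.upper()
    return sum(seq.count(a) for a in "KRDE") / len(seq)


def ncpr(seq):
    """Net charge per residue, counting K/R as +1 and D/E as -1."""
    seq = seq.upper()
    return (sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")) / len(seq)


def mean_kd_hydropathy(seq):
    """Mean (unscaled) Kyte-Doolittle hydropathy."""
    seq = seq.upper()
    return sum(KD_SCALE[a] for a in seq) / len(seq)


ncx1 = "RRLLFYKYVYKR"
for ph in range(4, 10):  # the pH values listed in Table 11
    print(ph, round(net_charge(ncx1, ph), 2))
print(fcr(ncx1), ncpr(ncx1), mean_kd_hydropathy(ncx1))
```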
TABLE 12 Predicted E and measured binding constants for peptides. Columns
indicate calculated E and measured KD (µM) for the peptides indicated on the left.
For KD, an entry of “> 100” indicates that we performed an ITC experiment, but that
no binding tighter than ≈100 µM was detectable. An entry of “—” indicates that no
experiment was performed.
hA5 hA6 aA5A6 altAll
name sequence E KD E KD E KD E KD
Q86UW7 AGSSQRAPPAPTREGRRD -4.03 >100 0.26 — -0.77 — 0.41 —
An1 AMVSEFLKQAWFIE -1.71 >100 -1.68 13 -1.73 — -0.37 —
O75170 DAPGAGAPPAPGKKEAPP -3.94 >100 0.50 >100 -0.74 — 0.78 >100
p3 DWSSWVYRDTQTGGSAE -1.26 >100 -1.10 >100 -1.28 — -0.11 —
p1 EPSPVSMNEGTFGGSAE -0.27 — -0.54 10 -0.34 >100 -0.45 —
Q13424 GAGGERWQRVLLSLAEDT -4.45 3 -1.11 — -1.98 — -0.10 —
B2RNZ0 KEIKTAMWRLFVKIYFLQK -3.53 >100 -2.72 >100 -2.38 >100 -1.36 >100
p6 QPELTQGRVGINGGGSAE -1.02 >100 0.30 — -0.79 — -0.08 —
NCX1 RRLLFYKYVYKR -1.20 18 -1.33 >100 -1.40 30 -0.59 47
A6cons RSHSGFDWRWGMEALTGGGSAE -1.97 3 -0.84 5 -1.12 1 -0.14 9
A5cons RSHSSSFQDWLLSRLPGGGSAE -2.75 3 -0.81 >100 -2.05 11 -0.32 24
Q14147 SEDDRAGPAPPGASDGVD -3.88 >100 0.71 >100 -1.20 — 0.47 —
SIP SEGLMNVLKKIYEDG -0.60 >100 -1.05 26 -0.64 77 -0.32 42
p4 SIGASELHVYRSGGSAE -0.76 >100 -0.21 >100 -0.18 >100 -0.40 —
p7 STTVRNGESPNCGGSAE -0.45 >100 -0.84 — -0.24 — 0.16 —
p5 STVHEILSKLSEGY -0.19 >100 -0.85 >100 -0.13 — 0.07 —
p2 TAKYLPMRPGPLGGGSAE -1.79 >100 0.20 >100 -1.04 >100 0.25 >100
REFERENCES CITED
[1] Robert A. Zierenberg, Michael W. W. Adams, and Alissa J. Arp. Life in extreme
environments: Hydrothermal vents. Proceedings of the National Academy of
Sciences, 97(24):12961–12962, November 2000. ISSN 0027-8424, 1091-6490.
[2] L. C. Bliss. Adaptations of Arctic and Alpine Plants to Environmental
Conditions. Arctic, 15(2):117–144, 1962. ISSN 0004-0843.
[3] Kyle Summers and Mark E. Clough. The evolution of coloration and toxicity in
the poison frog family (Dendrobatidae). Proceedings of the National Academy
of Sciences of the United States of America, 98(11):6227–6232, May 2001.
ISSN 0027-8424.
[4] Charles Darwin. On the origin of species by means of natural selection, or, The
preservation of favoured races in the struggle for life. John Murray, London,
1859.
[5] Madeline C. Weiss, Filipa L. Sousa, Natalia Mrnjavac, Sinje Neukirchen, Mayo
Roettger, Shijulal Nelson-Sathi, and William F. Martin. The physiology and
habitat of the last universal common ancestor. Nature Microbiology, 1(9):
nmicrobiol2016116, July 2016. ISSN 2058-5276.
[6] Nicolas Glansdorff, Ying Xu, and Bernard Labedan. The Last Universal
Common Ancestor: emergence, constitution and genetic legacy of an elusive
forerunner. Biology Direct, 3:29, July 2008. ISSN 1745-6150.
[7] Nick Lane, John F. Allen, and William Martin. How did LUCA make a living?
Chemiosmosis in the origin of life. BioEssays, 32(4):271–280, April 2010.
ISSN 1521-1878.
[8] Richard C. Lewontin. The genetic basis of evolutionary change. Columbia
University Press, New York, 1974. ISBN 0-231-03392-3, 0-231-08318-1.
[9] F. Jacob. Evolution and tinkering. Science, 196(4295):1161–1166, June 1977.
ISSN 0036-8075, 1095-9203.
[10] C. R. Woese, O. Kandler, and M. L. Wheelis. Towards a natural system of
organisms: proposal for the domains Archaea, Bacteria, and Eucarya.
Proceedings of the National Academy of Sciences, 87(12):4576–4579, June
1990. ISSN 0027-8424, 1091-6490.
[11] Masatoshi Nei and Jianzhi Zhang. Molecular Origin of Species. Science, 282
(5393):1428–1429, November 1998. ISSN 0036-8075, 1095-9203.
[12] J.H. Gillespie. Population Genetics: A Concise Guide. A Johns Hopkins
Paperback: Science. Johns Hopkins University Press, 1998. ISBN
978-0-8018-5754-6.
[13] John H. Gillespie. Molecular Evolution Over the Mutational Landscape.
Evolution, 38(5):1116–1129, 1984. ISSN 0014-3820.
[14] Ronald Aylmer Fisher. The genetical theory of natural selection. Clarendon
Press, Oxford.
[15] Sewall Wright. Evolution in Mendelian Populations. Genetics, 16(2):97–159,
March 1931. ISSN 0016-6731.
[16] Walter Fontana and Peter Schuster. Shaping Space: the Possible and the
Attainable in RNA Genotype-Phenotype Mapping. Journal of Theoretical
Biology, 194(4):491–515, October 1998. ISSN 0022-5193.
[17] Peter F. Stadler and Bärbel M. R. Stadler. Genotype-Phenotype Maps.
Biological Theory, 1(3):268–279, September 2006. ISSN 1555-5542, 1555-5550.
[18] John H. Gillespie. A simple stochastic gene substitution model. Theoretical
Population Biology, 23(2):202–215, April 1983. ISSN 0040-5809.
[19] H. Allen Orr. The population genetics of adaptation: the adaptation of DNA
sequences. Evolution; International Journal of Organic Evolution, 56(7):
1317–1330, July 2002. ISSN 0014-3820.
[20] Vanda T. K. McNiven, Hélène LeVasseur-Viens, Rachelle L. Kanippayoor,
Meghan Laturney, and Amanda J. Moehring. The genetic basis of evolution,
adaptation and speciation. Molecular Ecology, 20(24):5119–5122, December
2011. ISSN 1365-294X.
[21] R. Abbott, D. Albach, S. Ansell, J. W. Arntzen, S. J. E. Baird, N. Bierne,
J. Boughman, A. Brelsford, C. A. Buerkle, R. Buggs, R. K. Butlin,
U. Dieckmann, F. Eroukhmanoff, A. Grill, S. H. Cahan, J. S. Hermansen,
G. Hewitt, A. G. Hudson, C. Jiggins, J. Jones, B. Keller, T. Marczewski,
J. Mallet, P. Martinez-Rodriguez, M. Möst, S. Mullen, R. Nichols, A. W.
Nolte, C. Parisod, K. Pfennig, A. M. Rice, M. G. Ritchie, B. Seifert, C. M.
Smadja, R. Stelkens, J. M. Szymura, R. Väinölä, J. B. W. Wolf, and D. Zinner.
Hybridization and speciation. Journal of Evolutionary Biology, 26(2):229–246,
February 2013. ISSN 1420-9101.
[22] S. Blair Hedges, Julie Marin, Michael Suleski, Madeline Paymer, and Sudhir
Kumar. Tree of Life Reveals Clock-Like Speciation and Diversification.
Molecular Biology and Evolution, 32(4):835–845, April 2015. ISSN 0737-4038,
1537-1719.
[23] Scott P. Egan, Gregory J. Ragland, Lauren Assour, Thomas H.Q. Powell,
Glen R. Hood, Scott Emrich, Patrik Nosil, and Jeffrey L. Feder. Experimental
evidence of genome-wide impact of ecological selection during early stages of
speciation-with-gene-flow. Ecology Letters, 18(8):817–825, August 2015. ISSN
1461-0248.
[24] Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts,
and Peter Walter. Protein Function. 2002.
[25] D. Whitford. Proteins: Structure and Function. Wiley, 2005. ISBN
978-0-470-01241-3.
[26] Motoo Kimura and James F. Crow. The Number of Alleles That Can Be
Maintained in a Finite Population. Genetics, 49(4):725–738, April 1964. ISSN
0016-6731, 1943-2631.
[27] James F. Crow. The Mathematics of Heredity. Gustave Malécot. Translated
from the French edition (Paris, 1948), revised, and edited by Demetrios M.
Yermanos. Freeman, San Francisco, 1969. xx + 92 pp., illus. $4. Science, 168
(3932):721–721, May 1970. ISSN 0036-8075, 1095-9203.
[28] M. Nei. Relative Roles of Mutation and Selection in the Maintenance of Genetic
Variability. Philosophical Transactions of the Royal Society of London. Series
B, Biological Sciences, 319(1196):615–629, 1988. ISSN 0080-4622.
[29] H. Allen Orr. The Probability of Parallel Evolution. Evolution, 59(1):216–220,
January 2005. ISSN 1558-5646.
[30] Paul A. Hohenlohe, Patrick C. Phillips, and William A. Cresko. Using
Population Genomics to Detect Selection in Natural Populations: Key Concepts
and Methodological Considerations. International Journal of Plant Sciences,
171(9):1059–1071, November 2010. ISSN 1058-5893.
[31] J. Romiguier, P. Gayral, M. Ballenghien, A. Bernard, V. Cahais, A. Chenuil,
Y. Chiari, R. Dernat, L. Duret, N. Faivre, E. Loire, J. M. Lourenco,
B. Nabholz, C. Roux, G. Tsagkogeorga, A. A.-T. Weber, L. A. Weinert,
K. Belkhir, N. Bierne, S. Glémin, and N. Galtier. Comparative population
genomics in animals uncovers the determinants of genetic diversity. Nature,
515(7526):261–263, November 2014. ISSN 0028-0836.
[32] Trudy F. C. Mackay. Epistasis and quantitative traits: using model organisms
to study gene-gene interactions. Nature Reviews Genetics, 15(1):22–33,
January 2014. ISSN 1471-0056.
[33] Peter Tiffin and Jeffrey Ross-Ibarra. Advances and limits of using population
genetics to understand local adaptation. Trends in Ecology & Evolution, 29
(12):673–680, December 2014. ISSN 0169-5347.
[34] Pu Huang, Maximilian Feldman, Stephan Schroder, Bochra A. Bahri, Xianmin
Diao, Hui Zhi, Matt Estep, Ivan Baxter, Katrien M. Devos, and Elizabeth A.
Kellogg. Population genetics of Setaria viridis, a new model system. Molecular
Ecology, 23(20):4912–4925, October 2014. ISSN 1365-294X.
[35] Michael Lynch. Mutation and Human Exceptionalism: Our Future Genetic
Load. Genetics, 202(3):869–875, March 2016. ISSN 0016-6731, 1943-2631.
[36] H.C. Berg. Random Walks in Biology. Princeton paperbacks. Princeton
University Press, 1993. ISBN 978-0-691-00064-0.
[37] Guy Sella and Aaron E. Hirsh. The application of statistical physics to
evolutionary biology. Proceedings of the National Academy of Sciences of the
United States of America, 102(27):9541–9546, July 2005. ISSN 0027-8424,
1091-6490.
[38] K. Dill and S. Bromberg. Molecular Driving Forces: Statistical Thermodynamics
in Biology, Chemistry, Physics, and Nanoscience. Taylor & Francis Group,
2010. ISBN 978-1-136-67299-6.
[39] Shigeru Kondo and Takashi Miura. Reaction-Diffusion Model as a Framework
for Understanding Biological Pattern Formation. Science, 329(5999):
1616–1620, September 2010. ISSN 0036-8075, 1095-9203.
[40] Ken A. Dill, Kingshuk Ghosh, and Jeremy D. Schmit. Physical limits of cells
and proteomes. Proceedings of the National Academy of Sciences, 108(44):
17876–17882, November 2011. ISSN 0027-8424, 1091-6490.
[41] Kingshuk Ghosh, Adam M. R. de Graff, Lucas Sawle, and Ken A. Dill. Role of
Proteome Physical Chemistry in Cell Behavior. The Journal of Physical
Chemistry. B, 120(36):9549, September 2016.
[42] R.E. Feeney and R.G. Allison. Evolutionary biochemistry of proteins:
homologous and analogous proteins from avian egg whites, blood sera, milk,
and other substances. Wiley-Interscience, 1969.
[43] Michael J. Harms and Joseph W. Thornton. Evolutionary biochemistry:
revealing the historical and physical causes of protein properties. Nature
Reviews Genetics, 14(8):559–571, August 2013. ISSN 1471-0056.
[44] Michael J Harms and Joseph W Thornton. Analyzing protein structure and
function using ancestral gene reconstruction. Current Opinion in Structural
Biology, 20(3):360–366, June 2010. ISSN 0959-440X.
[45] Linus Pauling and E. Zuckerkandl. Chemical Paleogenetics. Acta Chem. Scand.,
17:S9–S16, 1963.
[46] Michael J. Harms and Joseph W. Thornton. Historical contingency and its
biophysical basis in glucocorticoid receptor evolution. Nature, 512(7513):
203–207, August 2014. ISSN 0028-0836.
[47] Lucas C Wheeler, Shion A Lim, Susan Marqusee, and Michael J Harms. The
thermostability and specificity of ancient proteins. Current Opinion in
Structural Biology, 38:37–43, June 2016. ISSN 0959-440X.
[48] D.A. Liberles. Ancestral Sequence Reconstruction. Oxford biosciences. OUP
Oxford, 2007. ISBN 978-0-19-929918-8.
[49] Victor Hanson-Smith, Bryan Kolaczkowski, and Joseph W. Thornton.
Robustness of Ancestral Sequence Reconstruction to Phylogenetic
Uncertainty. Molecular Biology and Evolution, 27(9):1988–1999, September
2010. ISSN 0737-4038.
[50] Geeta N. Eick, Jamie T. Bridgham, Douglas P. Anderson, Michael J. Harms,
and Joseph W. Thornton. Robustness of Reconstructed Ancestral Protein
Functions to Statistical Uncertainty. Molecular Biology and Evolution, 34(2):
247–261, February 2017. ISSN 0737-4038.
[51] Jamie T. Bridgham, Eric A. Ortlund, and Joseph W. Thornton. An epistatic
ratchet constrains the direction of glucocorticoid receptor evolution. Nature,
461(7263):515–519, September 2009. ISSN 0028-0836.
[52] Alesia N. McKeown, Jamie T. Bridgham, Dave W. Anderson, Michael N. Murphy,
Eric A. Ortlund, and Joseph W. Thornton. Evolution of DNA Specificity in a
Transcription Factor Family Produced a New Gene Regulatory Module. Cell,
159(1):58–68, September 2014. ISSN 0092-8674.
[53] Jeffrey I. Boucher, Joseph R. Jacobowitz, Brian C. Beckett, Scott Classen, and
Douglas L. Theobald. An atomic-resolution view of neofunctionalization in
the evolution of apicomplexan lactate dehydrogenases. eLife, 3:e02304, June
2014. ISSN 2050-084X.
[54] Dave W. Anderson, Alesia N. McKeown, and Joseph W. Thornton.
Intermolecular epistasis shaped the function and evolution of an ancient
transcription factor and its DNA binding sites. eLife, 4:e07864, July 2015.
ISSN 2050-084X.
[55] Ben E. Clifton and Colin J. Jackson. Ancestral Protein Reconstruction Yields
Insights into Adaptive Evolution of Binding Specificity in Solute-Binding
Proteins. Cell Chemical Biology, 23(2):236–245, February 2016. ISSN
2451-9456.
[56] Kathryn M. Hart, Michael J. Harms, Bryan H. Schmidt, Carolyn Elya,
Joseph W. Thornton, and Susan Marqusee. Thermodynamic System Drift in
Protein Evolution. PLoS Biol, 12(11):e1001994, November 2014.
[57] Christopher D. Aakre, Julien Herrou, Tuyen N. Phung, Barrett S. Perchuk,
Sean Crosson, and Michael T. Laub. Evolving New Protein-Protein
Interaction Specificity through Promiscuous Intermediates. Cell, 163(3):
594–606, October 2015. ISSN 0092-8674.
[58] Geeta N. Eick, Jennifer K. Colucci, Michael J. Harms, Eric A. Ortlund, and
Joseph W. Thornton. Evolution of Minimal Specificity and Promiscuity in
Steroid Hormone Receptors. PLoS Genetics, 8(11), November 2012. ISSN
1553-7390.
[59] C. Wilson, R. V. Agafonov, M. Hoemberger, S. Kutter, A. Zorba, J. Halpin,
V. Buosi, R. Otten, D. Waterman, D. L. Theobald, and D. Kern. Using
ancient protein kinases to unravel a modern cancer drug's mechanism. Science,
347(6224):882–886, February 2015. ISSN 0036-8075, 1095-9203.
[60] Valeria A. Risso, Jose A. Gavira, Diego F. Mejia-Carmona, Eric A. Gaucher,
and Jose M. Sanchez-Ruiz. Hyperstability and Substrate Promiscuity in
Laboratory Resurrections of Precambrian β-Lactamases. Journal of the
American Chemical Society, 135(8):2899–2902, February 2013. ISSN
0002-7863.
[61] Tyler N. Starr and Joseph W. Thornton. Epistasis in protein evolution. Protein
Science, 25(7):1204–1218, July 2016. ISSN 1469-896X.
[62] Zachary R. Sailer and Michael J. Harms. Detecting High-Order Epistasis in
Nonlinear Genotype-Phenotype Maps. Genetics, 205(3):1079–1088, March
2017. ISSN 0016-6731, 1943-2631.
[63] Jason B. Wolf, Daniel Pomp, Eugene J. Eisen, James M. Cheverud, and
Larry J. Leamy. The contribution of epistatic pleiotropy to the genetic
architecture of covariation among polygenic traits in mice. Evolution &
Development, 8(5):468–476, October 2006. ISSN 1520-541X.
[64] Günter P. Wagner and Vincent J. Lynch. The gene regulatory logic of
transcription factor evolution. Trends in Ecology & Evolution, 23(7):377–385,
July 2008. ISSN 0169-5347.
[65] Zachary D. Blount, Christina Z. Borland, and Richard E. Lenski. Historical
contingency and the evolution of a key innovation in an experimental
population of Escherichia coli. Proceedings of the National Academy of
Sciences, 105(23):7899–7906, June 2008. ISSN 0027-8424, 1091-6490.
[66] David L. Des Marais and Mark D. Rausher. Escape from adaptive conflict after
duplication in an anthocyanin pathway gene. Nature, 454(7205):762–765,
August 2008. ISSN 0028-0836.
[67] Stacey D. Smith and Mark D. Rausher. Gene loss and parallel evolution
contribute to species difference in flower color. Molecular Biology and
Evolution, 28(10):2799–2810, October 2011. ISSN 1537-1719.
[68] Stacey D. Smith, Shunqi Wang, and Mark D. Rausher. Functional evolution of
an anthocyanin pathway enzyme during a flower color transition. Molecular
Biology and Evolution, 30(3):602–612, March 2013. ISSN 1537-1719.
[69] Trevor R. Sorrells, Lauren N. Booth, Brian B. Tuch, and Alexander D. Johnson.
Intersecting transcription networks constrain gene regulatory evolution.
Nature, 523(7560):361–365, July 2015. ISSN 0028-0836.
[70] Matthew A. Streisfeld and Mark D. Rausher. Altered trans-regulatory control
of gene expression in multiple anthocyanin genes contributes to adaptive
flower color evolution in Mimulus aurantiacus. Molecular Biology and
Evolution, 26(2):433–444, February 2009. ISSN 1537-1719.
[71] Matthew A. Streisfeld and Mark D. Rausher. Genetic changes contributing to
the parallel evolution of red floral pigmentation among Ipomoea species. The
New Phytologist, 183(3):751–763, August 2009. ISSN 1469-8137.
[72] Carolyn A. Wessinger and Mark D. Rausher. Lessons from flower colour
evolution on targets of selection. Journal of Experimental Botany, 63(16):
5741–5749, October 2012. ISSN 0022-0957.
[73] Carolyn A. Wessinger and Mark D. Rausher. Predictability and irreversibility of
genetic changes associated with flower color evolution in Penstemon barbatus.
Evolution; International Journal of Organic Evolution, 68(4):1058–1070, April
2014. ISSN 1558-5646.
[74] Ali Zarrinpar, Sang-Hyun Park, and Wendell A. Lim. Optimization of
specificity in a cellular protein interaction network by negative selection.
Nature, 426(6967):676–680, December 2003. ISSN 0028-0836.
[75] Daniel M. Weinreich, Nigel F. Delaney, Mark A. DePristo, and Daniel L. Hartl.
Darwinian Evolution Can Follow Only Very Few Mutational Paths to Fitter
Proteins. Science, 312(5770):111–114, April 2006. ISSN 0036-8075, 1095-9203.
[76] Shelley D. Copley. Toward a Systems Biology Perspective on Enzyme
Evolution. The Journal of Biological Chemistry, 287(1):3–10, January 2012.
ISSN 0021-9258.
[77] Aaron W. Reinke, Jiyeon Baek, Orr Ashenberg, and Amy E. Keating. Networks
of bZIP Protein-Protein Interactions Diversified Over a Billion Years of
Evolution. Science, 340(6133):730–734, May 2013. ISSN 0036-8075,
1095-9203.
[78] William H. Hudson and Eric A. Ortlund. The structure, function and evolution
of proteins that bind DNA and RNA. Nature Reviews Molecular Cell Biology,
15(11):749–760, November 2014. ISSN 1471-0072.
[79] Conor J. Howard, Victor Hanson-Smith, Kristopher J. Kennedy, Chad J. Miller,
Hua Jane Lou, Alexander D. Johnson, Benjamin E. Turk, and Liam J. Holt.
Ancestral resurrection reveals evolutionary mechanisms of kinase plasticity.
eLife, 3:e04126, November 2014. ISSN 2050-084X.
[80] Steven M Yannone, Sophia Hartung, Angeli L Menon, Michael WW Adams,
and John A Tainer. Metals in biology: defining metalloproteomes. Current
Opinion in Biotechnology, 23(1):89–95, February 2012. ISSN 0958-1669.
[81] Diana Ekman, Sara Light, Åsa K. Björklund, and Arne Elofsson. What properties
characterize the hub proteins of the protein-protein interaction network of
Saccharomyces cerevisiae? Genome Biology, 7(6):R45, 2006. ISSN 1465-6906.
[82] Nobuyuki Uchikoga, Yuri Matsuzaki, Masahito Ohue, and Yutaka Akiyama.
Specificity of broad protein interaction surfaces for proteins with multiple
binding partners. Biophysics and Physicobiology, 13:105–115, July 2016. ISSN
2189-4779.
[83] Shibani Bhattacharya, Christopher G. Bunick, and Walter J. Chazin. Target
selectivity in EF-hand calcium binding proteins. Biochimica et Biophysica
Acta (BBA) - Molecular Cell Research, 1742(1–3):69–79, December 2004. ISSN
0167-4889.
[84] David Chin and Anthony R Means. Calmodulin: a prototypical calcium sensor.
Trends in Cell Biology, 10(8):322–328, August 2000. ISSN 0962-8924.
[85] Patrick S Mitchell, Michael Emerman, and Harmit S Malik. An evolutionary
perspective on the broad antiviral specificity of MxA. Current Opinion in
Microbiology, 16(4):493–499, August 2013. ISSN 1369-5274.
[86] D. Gfeller, F. Butty, M. Wierzbicka, E. Verschueren, P. Vanhee, H. Huang,
A. Ernst, N. Dar, I. Stagljar, L. Serrano, S. S. Sidhu, G. D. Bader, and P. M.
Kim. The multiple-specificity landscape of modular peptide recognition
domains. Molecular Systems Biology, 7(1):484, April 2011. ISSN
1744-4292.
[87] Gideon Schreiber and Amy E Keating. Protein binding specificity versus
promiscuity. Current Opinion in Structural Biology, 21(1):50–61, February
2011. ISSN 0959-440X.
[88] Shelley D. Copley. An evolutionary biochemist’s perspective on promiscuity.
Trends in Biochemical Sciences, 40(2):72–78, January 2015. ISSN 0968-0004.
[89] R. Donato, B.R. Cannon, G. Sorci, F. Riuzzi, K. Hsu, D.J. Weber, and C.L.
Geczy. Functions of S100 Proteins. Current Molecular Medicine, 13(1):24–57,
January 2013. ISSN 1566-5240.
[90] Rosario Donato. Intracellular and extracellular roles of S100 proteins.
Microscopy Research and Technique, 60(6):540–551, April 2003. ISSN
1097-0029.
[91] Danna B. Zimmer, Jeannine O. Eubanks, Dhivya Ramakrishnan, and
Michael F. Criscitiello. Evolution of the S100 family of calcium sensor
proteins. Cell Calcium, 53(3):170–179, March 2013. ISSN 0143-4160.
[92] Claus W. Heizmann and Jos A. Cox. New perspectives on S100 proteins: a
multi-functional Ca2+-, Zn2+- and Cu2+-binding protein family.
Biometals, 11(4):383–397, 1998. ISSN 0966-0844, 1572-8773.
[93] Walter J. Chazin. Relating Form and Function of EF-hand Calcium Binding
Proteins. Accounts of Chemical Research, 44(3):171–179, March 2011. ISSN
0001-4842.
[94] Liliana Santamaria-Kisiel, Anne C. Rintala-Dempsey, and Gary S. Shaw.
Calcium-dependent and -independent interactions of the S100 protein family.
Biochemical Journal, 396(2):201–214, June 2006. ISSN 0264-6021, 1470-8728.
[95] Andreas M. Kraemer, Luis R. Saraiva, and Sigrun I. Korsching. Structural and
functional diversification in the teleost S100 family of calcium-binding
proteins. BMC Evolutionary Biology, 8:48, 2008. ISSN 1471-2148.
[96] Lucas C. Wheeler, Micah T. Donor, James S. Prell, and Michael J. Harms.
Multiple Evolutionary Origins of Ubiquitous Cu2+ and Zn2+ Binding in the
S100 Protein Family. PLOS ONE, 11(10):e0164740, October 2016. ISSN
1932-6203.
[97] Kenji Kizawa, Hidenari Takahara, Masaki Unno, and Claus W. Heizmann. S100
and S100 fused-type protein families in epidermal maturation with special
focus on S100a3 in mammalian hair cuticles. Biochimie, 93(12):2038–2047,
December 2011. ISSN 1638-6183.
[98] Michael F. Gutknecht, Marc E. Seaman, Bo Ning, Daniel Auger Cornejo, Emily
Mugler, Patrick F. Antkowiak, Christopher A. Moskaluk, Song Hu,
Frederick H. Epstein, and Kimberly A. Kelly. Identification of the S100
fused-type protein hornerin as a regulator of tumor vascularity. Nature
Communications, 8(1):552, September 2017. ISSN 2041-1723.
[99] Romuald Contzler, Bertrand Favre, Marcel Huber, and Daniel Hohl. Cornulin, a
new member of the “fused gene” family, is expressed during epidermal
differentiation. The Journal of Investigative Dermatology, 124(5):990–997,
May 2005. ISSN 0022-202X.
[100] Ingo Marenholz, Claus W. Heizmann, and Günter Fritz. S100 proteins in mouse
and man: from evolution to function and pathology (including an update of
the nomenclature). Biochemical and Biophysical Research Communications,
322(4):1111–1122, October 2004. ISSN 0006-291X.
[101] Estelle Leclerc, Günter Fritz, Stefan W. Vetter, and Claus W. Heizmann.
Binding of S100 proteins to RAGE: An update. Biochimica et Biophysica
Acta (BBA) - Molecular Cell Research, 1793(6):993–1007, June 2009. ISSN
0167-4889.
[102] Francesca Riuzzi, Guglielmo Sorci, and Rosario Donato. S100b protein
regulates myoblast proliferation and differentiation by activating FGFR1 in a
bFGF-dependent manner. Journal of Cell Science, 124(14):2389–2400, July 2011. ISSN
0021-9533, 1477-9137.
[103] Weidong Zhu, Yi Xue, Chao Liang, Rihua Zhang, Zhihong Zhang, Hongyan Li,
Dongming Su, Xiubin Liang, Yuanyuan Zhang, Qiong Huang, Menglan Liu,
Lu Li, Dong Li, Allan Z. Zhao, and Yun Liu. S100a16 promotes cell
proliferation and metastasis via AKT and ERK cell signaling pathways in
human prostate cancer. Tumor Biology, 37(9):12241–12250, September 2016.
ISSN 1010-4283, 1423-0380.
[104] Ching Chang Cho, Ruey Hwang Chou, and Chin Yu. Pentamidine blocks the
interaction between mutant S100a5 and RAGE V domain and inhibits the
RAGE signaling pathway. Biochemical and Biophysical Research
Communications, 477(2):188–194, August 2016. ISSN 0006-291X.
[105] Steven M. Damo, Thomas E. Kehl-Fie, Norie Sugitani, Marilyn E. Holt,
Subodh Rathi, Wesley J. Murphy, Yaofang Zhang, Christine Betz, Laura
Hench, Günter Fritz, Eric P. Skaar, and Walter J. Chazin. Molecular basis for
manganese sequestration by calprotectin and roles in the innate immune
response to invading bacterial pathogens. Proceedings of the National
Academy of Sciences, 110(10):3841–3846, March 2013. ISSN 0027-8424,
1091-6490.
[106] Joshua A. Hayden, Megan Brunjes Brophy, Lisa S. Cunden, and Elizabeth M.
Nolan. High-Affinity Manganese Coordination by Human Calprotectin Is
Calcium-Dependent and Requires the Histidine-Rich Site Formed at the
Dimer Interface. Journal of the American Chemical Society, 135(2):775–787,
January 2013. ISSN 0002-7863.
[107] James N. Tsoporis, Alexander Marks, Abraham Haddad, Fayez Dawood,
Peter P. Liu, and Thomas G. Parker. S100b Expression Modulates Left
Ventricular Remodeling After Myocardial Infarction in Mice. Circulation,
111(5):598–606, February 2005. ISSN 0009-7322, 1524-4539.
[108] James N. Tsoporis, Shehla Izhar, and Thomas G. Parker. Expression of
S100a6 in Cardiac Myocytes Limits Apoptosis Induced by Tumor Necrosis
Factor-α. Journal of Biological Chemistry, 283(44):30174–30183, October 2008.
ISSN 0021-9258, 1083-351X.
[109] Lucas N. Wafer, Franco O. Tzul, Pranav P. Pandharipande, and George I.
Makhatadze. Novel Interactions of the TRTK12 Peptide with S100 Protein
Family Members: Specificity and Thermodynamic Characterization.
Biochemistry, 52(34):5844–5856, August 2013. ISSN 0006-2960.
[110] Lucas C. Wheeler, Jeremy A. Anderson, Annelise J. Morrison, Caitlyn E.
Wong, and Michael J. Harms. Conservation of specificity in two
low-specificity proteins. bioRxiv, page 207324, October 2017.
[111] Michael A. Stiffler, Jiunn R. Chen, Viara P. Grantcharova, Ying Lei, Daniel
Fuchs, John E. Allen, Lioudmila A. Zaslavskaia, and Gavin MacBeath. PDZ
Domain Binding Selectivity Is Optimized Across the Mouse Proteome.
Science, 317(5836):364–369, July 2007. ISSN 0036-8075, 1095-9203.
[112] Eric A. Gaucher, Sridhar Govindarajan, and Omjoy K. Ganesh.
Palaeotemperature trend for Precambrian life inferred from resurrected
proteins. Nature, 451(7179):704–707, February 2008. ISSN 0028-0836.
[113] Karin Voordeckers, Chris A. Brown, Kevin Vanneste, Elisa van der Zande,
Arnout Voet, Steven Maere, and Kevin J. Verstrepen. Reconstruction of
Ancestral Metabolic Enzymes Reveals Molecular Mechanisms Underlying
Evolutionary Innovation through Gene Duplication. PLoS Biol, 10(12):
e1001446, December 2012.
[114] Joanne K. Hobbs, Charis Shepherd, David J. Saul, Nicholas J. Demetras,
Svend Haaning, Colin R. Monk, Roy M. Daniel, and Vickery L. Arcus. On
the Origin and Evolution of Thermophily: Reconstruction of Functional
Precambrian Enzymes from Ancestors of Bacillus. Molecular Biology and
Evolution, 29(2):825–835, February 2012. ISSN 0737-4038, 1537-1719.
[115] Satoshi Akanuma, Yoshiki Nakajima, Shin-ichi Yokobori, Mitsuo Kimura,
Naoki Nemoto, Tomoko Mase, Ken-ichi Miyazono, Masaru Tanokura, and
Akihiko Yamagishi. Experimental evidence for the thermophilicity of
ancestral life. Proceedings of the National Academy of Sciences, 110(27):
11067–11072, July 2013. ISSN 0027-8424, 1091-6490.
[116] Satoshi Akanuma, Shoko Iwami, Tamaki Yokoi, Nana Nakamura, Hideaki
Watanabe, Shin-ichi Yokobori, and Akihiko Yamagishi. Phylogeny-Based
Design of a B-Subunit of DNA Gyrase and Its ATPase Domain Using a Small
Set of Homologous Amino Acid Sequences. Journal of Molecular Biology,
412(2):212–225, September 2011. ISSN 0022-2836.
[117] N. B. Loughran, M. J. O’Connell, B. O’Connor, and C. Ó’Fágáin. Stability
properties of an ancient plant peroxidase. Biochimie, 104:156–159, September
2014. ISSN 0300-9084.
[118] Raul Perez-Jimenez, Alvaro Inglés-Prieto, Zi-Ming Zhao, Inmaculada
Sanchez-Romero, Jorge Alegre-Cebollada, Pallav Kosuri, Sergi
Garcia-Manyes, T. Joseph Kappock, Masaru Tanokura, Arne Holmgren,
Jose M. Sanchez-Ruiz, Eric A. Gaucher, and Julio M. Fernandez.
Single-molecule paleoenzymology probes the chemistry of resurrected
enzymes. Nature Structural & Molecular Biology, 18(5):592–596, May 2011.
ISSN 1545-9993.
[119] Valeria A. Risso, Jose A. Gavira, and Jose M. Sanchez-Ruiz. Thermostable
and promiscuous Precambrian proteins. Environmental Microbiology, 16(6):
1485–1489, June 2014. ISSN 1462-2920.
[120] Megan F. Cole and Eric A. Gaucher. Utilizing natural diversity to evolve
protein function: applications towards thermostability. Current Opinion in
Chemical Biology, 15(3):399–406, June 2011. ISSN 1879-0402.
[121] Jason H. Whitfield, William H. Zhang, Michel K. Herde, Ben E. Clifton,
Johanna Radziejewski, Harald Janovjak, Christian Henneberger, and Colin J.
Jackson. Construction of a robust and sensitive arginine biosensor through
ancestral protein reconstruction. Protein Science, 24(9):1412–1422, September
2015. ISSN 1469-896X.
[122] M. M. Gromiha, M. Oobatake, and A. Sarai. Important amino acid properties
for enhanced thermostability from mesophilic to thermophilic proteins.
Biophysical Chemistry, 82(1):51–67, November 1999. ISSN 0301-4622.
[123] Darin M. Taverna and Richard A. Goldstein. Why are proteins marginally
stable? Proteins, 46(1):105–109, January 2002. ISSN 0887-3585.
[124] Fabia U. Battistuzzi, Andreia Feijao, and S. Blair Hedges. A genomic
timescale of prokaryote evolution: insights into the origin of methanogenesis,
phototrophy, and the colonization of land. BMC Evolutionary Biology, 4:44,
November 2004. ISSN 1471-2148.
[125] B. A. Malcolm, K. P. Wilson, B. W. Matthews, J. F. Kirsch, and A. C.
Wilson. Ancestral lysozymes reconstructed, neutrality tested, and
thermostability linked to hydrocarbon packing. Nature, 345(6270):86–89, May
1990. ISSN 0028-0836.
[126] Pouria Dasmeh, Adrian W. R. Serohijos, Kasper P. Kepp, and Eugene I.
Shakhnovich. Positively Selected Sites in Cetacean Myoglobins Contribute to
Protein Stability. PLOS Computational Biology, 9(3):e1002929, March 2013.
ISSN 1553-7358.
[127] Lizhi Ian Gong, Marc A. Suchard, and Jesse D. Bloom. Stability-mediated
epistasis constrains the evolution of an influenza protein. eLife, 2:e00631, May
2013. ISSN 2050-084X.
[128] Mathieu Groussin, Joanne K. Hobbs, Gergely J. Szöllősi, Simonetta Gribaldo,
Vickery L. Arcus, and Manolo Gouy. Toward more accurate ancestral protein
genotype-phenotype reconstructions with the use of species tree-aware gene
trees. Molecular Biology and Evolution, 32(1):13–22, January 2015. ISSN
1537-1719.
[129] Satoshi Akanuma, Shin-ichi Yokobori, Yoshiki Nakajima, Mizumo Bessho, and
Akihiko Yamagishi. Robustness of predictions of extremely thermally stable
proteins in ancient organisms. Evolution, 69(11):2954–2962, November 2015.
ISSN 1558-5646.
[130] Hagit Bar-Rogovsky, Adi Stern, Osnat Penn, Iris Kobl, Tal Pupko, and Dan S.
Tawfik. Assessing the prediction fidelity of ancestral reconstruction by a
library approach. Protein engineering, design & selection: PEDS, 28(11):
507–518, November 2015. ISSN 1741-0134.
[131] Paul D. Williams, David D. Pollock, Benjamin P. Blackburne, and Richard A.
Goldstein. Assessing the Accuracy of Ancestral Protein Reconstruction
Methods. PLOS Computational Biology, 2(6):e69, June 2006. ISSN
1553-7358.
[132] Shimon Bershtein, Korina Goldin, and Dan S. Tawfik. Intense neutral drifts
yield robust and evolvable consensus proteins. Journal of Molecular Biology,
379(5):1029–1044, June 2008. ISSN 1089-8638.
[133] David D. Pollock, Grant Thiltgen, and Richard A. Goldstein. Amino acid
coevolution induces an evolutionary Stokes shift. Proceedings of the National
Academy of Sciences, 109(21):E1352–E1359, May 2012. ISSN 0027-8424,
1091-6490.
[134] Richard A. Goldstein, Stephen T. Pollard, Seena D. Shah, and David D.
Pollock. Nonadaptive Amino Acid Convergence Rates Decrease over Time.
Molecular Biology and Evolution, 32(6):1373–1381, June 2015. ISSN
0737-4038.
[135] Brian Gaschen, Jesse Taylor, Karina Yusim, Brian Foley, Feng Gao, Dorothy
Lang, Vladimir Novitsky, Barton Haynes, Beatrice H. Hahn, Tanmoy
Bhattacharya, and Bette Korber. Diversity considerations in HIV-1 vaccine
selection. Science (New York, N.Y.), 296(5577):2354–2360, June 2002. ISSN
1095-9203.
[136] Denise L. Kothe, Yingying Li, Julie M. Decker, Frederic Bibollet-Ruche,
Kenneth P. Zammit, Maria G. Salazar, Yalu Chen, Zhiping Weng, Eric A.
Weaver, Feng Gao, Barton F. Haynes, George M. Shaw, Bette T. M. Korber,
and Beatrice H. Hahn. Ancestral and consensus envelope immunogens for
HIV-1 subtype C. Virology, 352(2):438–449, September 2006. ISSN 0042-6822.
[137] Valeria A. Risso, Jose A. Gavira, Eric A. Gaucher, and Jose M. Sanchez-Ruiz.
Phenotypic comparisons of consensus variants versus laboratory resurrections
of Precambrian proteins. Proteins, 82(6):887–896, June 2014. ISSN 1097-0134.
[138] R. A. Jensen. Enzyme Recruitment in Evolution of New Function. Annual
Review of Microbiology, 30(1):409–425, 1976.
[139] M Lynch and A Force. The probability of duplicate gene preservation by
subfunctionalization. Genetics, 154(1):459–473, January 2000. ISSN
0016-6731.
[140] Gavin C. Conant and Kenneth H. Wolfe. Turning a hobby into a job: How
duplicated genes find new functions. Nature Reviews Genetics, 9(12):938–950,
December 2008. ISSN 1471-0056.
[141] Sheng Ma, Jacqueline Martin-Laffon, Morgane Mininno, Océane Gigarel, Sabine
Brugière, Olivier Bastien, Marianne Tardif, Stéphane Ravanel, and Claude
Alban. Molecular Evolution of the Substrate Specificity of Chloroplastic
Aldolases/Rubisco Lysine Methyltransferases in Plants. Molecular Plant, 9(4):
569–581, April 2016. ISSN 1752-9867.
[142] Merridee A. Wouters, Ke Liu, Peter Riek, and Ahsan Husain. A
despecialization step underlying evolution of a family of serine proteases.
Molecular Cell, 12(2):343–354, August 2003. ISSN 1097-2765.
[143] Camille Sayou, Marie Monniaux, Max H. Nanao, Edwige Moyroud, Samuel F.
Brockington, Emmanuel Thévenon, Hicham Chahtane, Norman Warthmann,
Michael Melkonian, Yong Zhang, Gane Ka-Shu Wong, Detlef Weigel, François
Parcy, and Renaud Dumas. A Promiscuous Intermediate Underlies the
Evolution of LEAFY DNA Binding Specificity. Science, 343(6171):645–648,
February 2014. ISSN 0036-8075, 1095-9203.
[144] A. Chinen, Y. Naito, N. Handa, and I. Kobayashi. Evolution of sequence
recognition by restriction-modification enzymes: selective pressure for
specificity decrease. Molecular Biology and Evolution, 17(11):1610–1619,
November 2000. ISSN 0737-4038.
[145] Orit Peleg, Jeong-Mo Choi, and Eugene I. Shakhnovich. Evolution of
Specificity in Protein-Protein Interactions. Biophysical Journal, 107(7):
1686–1696, October 2014. ISSN 0006-3495.
[146] Juhan Kim and Shelley D. Copley. Inhibitory cross-talk upon introduction of a
new metabolic pathway into an existing metabolic network. Proceedings of the
National Academy of Sciences of the United States of America, 109(42):
E2856–E2864, October 2012. ISSN 0027-8424.
[147] Jungeui Hong and David Gresham. Molecular Specificity, Convergence and
Constraint Shape Adaptive Evolution in Nutrient-Poor Environments. PLoS
Genet, 10(1):e1004041, January 2014.
[148] Marjon G. J. de Vos, Alexandre Dawid, Vanda Sunderlikova, and Sander J.
Tans. Breaking evolutionary constraint with a tradeoff ratchet. Proceedings of
the National Academy of Sciences, 112(48):14906–14911, December 2015.
ISSN 0027-8424, 1091-6490.
[149] Andreas Ernst, David Gfeller, Zhengyan Kan, Somasekar Seshagiri, Philip M.
Kim, Gary D. Bader, and Sachdev S. Sidhu. Coevolution of PDZ
domain–ligand interactions analyzed by high-throughput phage display and
deep sequencing. Molecular BioSystems, 6(10):1782, 2010. ISSN 1742-206X,
1742-2051.
[150] Alexander J. Stewart and Joshua B. Plotkin. The evolution of complex gene
regulation by low-specificity binding sites. Proceedings of the Royal Society of
London B: Biological Sciences, 280(1768):20131313, October 2013. ISSN
0962-8452, 1471-2954.
[151] Ronald Wolf, O. M. Zack Howard, Hui-Fang Dong, Christopher Voscopoulos,
Karen Boeshans, Jason Winston, Rao Divi, Michele Gunsior, Paul Goldsmith,
Bijan Ahvazi, Triantafyllos Chavakis, Joost J. Oppenheim, and Stuart H.
Yuspa. Chemotactic activity of S100a7 (Psoriasin) is mediated by the
receptor for advanced glycation end products and potentiates inflammation
with highly homologous but functionally distinct S100a15. Journal of
Immunology (Baltimore, Md.: 1950), 181(2):1499–1506, July 2008. ISSN
1550-6606.
[152] Guglielmo Sorci, Gloria Giovannini, Francesca Riuzzi, Pierluigi Bonifazi,
Teresa Zelante, Silvia Zagarella, Francesco Bistoni, Rosario Donato, and
Luigina Romani. The danger signal S100b integrates pathogen- and
danger-sensing pathways to restrain inflammation. PLoS Pathogens, 7(3):
e1001315, March 2011. ISSN 1553-7374.
[153] Sean S. Shaw, Ann Marie Schmidt, Amy K. Banes, Xiaodan Wang, David M.
Stern, and Mario B. Marrero. S100b-RAGE-mediated augmentation of
angiotensin II-induced activation of JAK2 in vascular smooth muscle cells is
dependent on PLD2. Diabetes, 52(9):2381–2388, September 2003. ISSN
0012-1797.
[154] Jörg Klingelhöfer, Henrik D. Møller, Eren U. Sumer, Christian H. Berg, Maria
Poulsen, Darya Kiryushko, Vladislav Soroka, Noona Ambartsumian, Mariam
Grigorian, and Eugene M. Lukanidin. Epidermal growth factor receptor
ligands as new extracellular targets for the metastasis-promoting S100a4
protein. The FEBS Journal, 276(20):5936–5948, October 2009. ISSN
1742-4658.
[155] Xiangyu Wang, Jing Yang, Jingfeng Qian, Zhihua Liu, Hongyan Chen, and
Zhumei Cui. S100a14, a mediator of epithelial-mesenchymal transition,
regulates proliferation, migration and invasion of human cervical cancer cells.
American Journal of Cancer Research, 5(4):1484–1495, March 2015. ISSN
2156-6976.
[156] Zheng Yang, Wei Xing Yan, Hong Cai, Nicodemus Tedla, Chris Armishaw,
Nick Di Girolamo, Hong Wei Wang, Taline Hampartzoumian, Jodie L.
Simpson, Peter G. Gibson, John Hunt, Prue Hart, J. Margaret Hughes,
Michael A. Perry, Paul F. Alewood, and Carolyn L. Geczy. S100a12 provokes
mast cell activation: a potential amplification pathway in asthma and innate
immunity. The Journal of Allergy and Clinical Immunology, 119(1):106–114,
January 2007. ISSN 0091-6749.
[157] Joseph P. Zackular, Walter J. Chazin, and Eric P. Skaar. Nutritional
Immunity: S100 Proteins at the Host-Pathogen Interface. Journal of
Biological Chemistry, 290(31):18991–18998, July 2015. ISSN 0021-9258,
1083-351X.
[158] F Sedaghat and A Notopoulos. S100 protein family and its application in
clinical practice. Hippokratia, 12(4):198–204, 2008. ISSN 1108-4189.
[159] Rosario Donato. RAGE: A Single Receptor for Several Ligands and Different
Cellular Responses: The Case of Certain S100 Proteins. Current Molecular
Medicine, 7(8):711–724, December 2007.
[160] N. R. West and P. H. Watson. S100a7 (psoriasin) is induced by the
proinflammatory cytokines oncostatin-M and interleukin-6 in human breast
cancer. Oncogene, 29(14):2083–2092, April 2010. ISSN 0950-9232.
[161] Michelle M. Averill, Shelley Barnhart, Lev Becker, Xin Li, Jay W. Heinecke,
Renee C. LeBoeuf, Jessica A. Hamerman, Clemens Sorg, Claus Kerkhoff, and
Karin E. Bornfeldt. S100a9 Differentially Modifies Phenotypic States of
Neutrophils, Macrophages, and Dendritic Cells.
Circulation, 123(11):1216–1226, March 2011. ISSN 0009-7322, 1524-4539.
[162] Kjetil Boye and Gunhild M. Mælandsmo. S100a4 and Metastasis: A Small
Actor Playing Many Roles. The American Journal of Pathology, 176(2):
528–535, February 2010. ISSN 0002-9440.
[163] Masaya Yamaoka, Norikazu Maeda, Seiji Nakamura, Takuya Mori, Kana Inoue,
Keisuke Matsuda, Ryohei Sekimoto, Susumu Kashine, Yasuhiko Nakagawa,
Yu Tsushima, Yuya Fujishima, Noriyuki Komura, Ayumu Hirata, Hitoshi
Nishizawa, Yuji Matsuzawa, Ken-ichi Matsubara, Tohru Funahashi, and
Iichiro Shimomura. Gene expression levels of S100 protein family in blood
cells are associated with insulin resistance and inflammation (Peripheral blood
S100 mRNAs and metabolic syndrome). Biochemical and Biophysical
Research Communications, 433(4):450–455, April 2013. ISSN 0006-291X.
[164] Stephane R. Gross, Connie Goh Then Sin, Roger Barraclough, and Philip S.
Rudland. Joining S100 proteins and migration: for better or for worse, in
sickness and in health. Cellular and Molecular Life Sciences, 71(9):1551–1579,
June 2013. ISSN 1420-682X, 1420-9071.
[165] Anne R. Bresnick, David J. Weber, and Danna B. Zimmer. S100 proteins in
cancer. Nature Reviews Cancer, 15(2):96–109, February 2015. ISSN
1474-175X.
[166] Ivano Bertini, Valentina Borsi, Linda Cerofolini, Soumyasri Das Gupta, Marco
Fragai, and Claudio Luchinat. Solution structure and dynamics of human
S100a14. JBIC Journal of Biological Inorganic Chemistry, 18(2):183–194,
November 2012. ISSN 0949-8257, 1432-1327.
[167] Richard R. Rustandi, Alexander C. Drohat, Donna M. Baldisseri, Paul T.
Wilder, and David J. Weber. The Ca2+-Dependent Interaction of S100b(ββ)
with a Peptide Derived from p53. Biochemistry, 37(7):1951–1960, February
1998. ISSN 0006-2960.
[168] Danna B. Zimmer, Patti Wright Sadosky, and David J. Weber. Molecular
mechanisms of S100-target protein interactions. Microscopy Research and
Technique, 60(6):552–559, April 2003. ISSN 1059-910X.
[169] Danna B. Zimmer and David J. Weber. The Calcium-Dependent Interaction of
S100b with Its Protein Targets. Cardiovascular Psychiatry and Neurology,
2010, 2010. ISSN 2090-0163.
[170] Olga V. Moroz, Keith S. Wilson, and Igor B. Bronstein. The role of zinc in the
S100 proteins: insights from the X-ray structures. Amino Acids, 41(4):
761–772, March 2010. ISSN 0939-4451, 1438-2199.
[171] Benjamin A. Gilston, Eric P. Skaar, and Walter J. Chazin. Binding of
transition metals to S100 proteins. Science China Life Sciences, pages 1–10,
July 2016. ISSN 1674-7305, 1869-1889.
[172] Vaithiyalingam Sivaraja, Thallapuranam Krishnaswamy Suresh Kumar,
Dakshinamurthy Rajalingam, Irene Graziani, Igor Prudovsky, and Chin Yu.
Copper Binding Affinity of S100a13, a Key Component of the FGF-1
Nonclassical Copper-Dependent Release Complex. Biophysical Journal, 91(5):
1832–1843, September 2006. ISSN 0006-3495.
[173] Jrg Heierhorst, Richard J. Mann, and Bruce E. Kemp. Interaction of the
Recombinant S100a1 Protein with Twitchin Kinase, and Comparison with
Other Ca2+-Binding Proteins. European Journal of Biochemistry, 249(1):
127–133, October 1997. ISSN 1432-1033.
[174] Thomas V. O’Halloran and Valeria Cizewski Culotta. Metallochaperones, an
Intracellular Shuttle Service for Metal Ions. Journal of Biological Chemistry,
275(33):25057–25060, August 2000. ISSN 0021-9258, 1083-351X.
[175] Wolfgang Maret. Zinc Biochemistry: From a Single Zinc Enzyme to a Key
Element of Life. Advances in Nutrition: An International Review Journal,
4(1):82–91, January 2013. ISSN 2156-5376.
[176] Fabio Arnesano, Lucia Banci, Ivano Bertini, Adele Fantoni, Leonardo Tenori,
and Maria Silvia Viezzoli. Structural Interplay between Calcium(II) and
Copper(II) Binding to S100a13 Protein. Angewandte Chemie International
Edition, 44(39):6341–6344, October 2005. ISSN 1521-3773.
[177] Michael Koch, Shibani Bhattacharya, Torsten Kehl, Mario Gimona, Milan
Vašák, Walter Chazin, Claus W. Heizmann, Peter M. H. Kroneck, and Günter
Fritz. Implications on zinc binding to S100a2. Biochimica et Biophysica Acta
(BBA) - Molecular Cell Research, 1773(3):457–470, March 2007. ISSN
0167-4889.
[178] Timothy Ravasi, Kenneth Hsu, Jesse Goyette, Kate Schroder, Zheng Yang,
Farid Rahimi, Les P. Miranda, Paul F. Alewood, David A. Hume, and
Carolyn Geczy. Probing the S100 protein family through genomic and
functional analysis. Genomics, 84(1):10–22, July 2004. ISSN 0888-7543.
[179] Xuan Shang, Hanhua Cheng, and Rongjia Zhou. Chromosomal mapping,
differential origin and evolution of the S100 gene family. Genetics Selection
Evolution, 40:449, 2008. ISSN 1297-9686.
[180] S. Blair Hedges, Fabia U. Battistuzzi, and Jaime E. Blair. Molecular Timescale
of Evolution in the Proterozoic. In Shuhai Xiao and Alan J. Kaufman,
editors, Neoproterozoic Geobiology and Paleobiology, number 27 in Topics in
Geobiology, pages 199–229. Springer Netherlands, 2006. ISBN
978-1-4020-5201-9 978-1-4020-5202-6.
[181] R. Alexander Pyron and John J. Wiens. A large-scale phylogeny of Amphibia
including over 2800 species, and a revised classification of extant frogs,
salamanders, and caecilians. Molecular Phylogenetics and Evolution, 61(2):
543–583, November 2011. ISSN 1055-7903.
[182] Ylenia Chiari, Vincent Cahais, Nicolas Galtier, and Frédéric Delsuc.
Phylogenomic analyses support the position of turtles as the sister group of
birds and crocodiles (Archosauria). BMC Biology, 10:65, 2012. ISSN
1741-7007.
[183] Brant C. Faircloth, Laurie Sorenson, Francesco Santini, and Michael E. Alfaro.
A Phylogenomic Perspective on the Radiation of Ray-Finned Fishes Based
upon Targeted Sequencing of Ultraconserved Elements (UCEs). PLOS ONE,
8(6):e65923, June 2013. ISSN 1932-6203.
[184] Richard E. Green, Edward L. Braun, Joel Armstrong, Dent Earl, Ngan
Nguyen, Glenn Hickey, Michael W. Vandewege, John A. St John, Salvador
Capella-Gutiérrez, Todd A. Castoe, Colin Kern, Matthew K. Fujita, Juan C.
Opazo, Jerzy Jurka, Kenji K. Kojima, Juan Caballero, Robert M. Hubley,
Arian F. Smit, Roy N. Platt, Christine A. Lavoie, Meganathan P. Ramakodi,
John W. Finger, Alexander Suh, Sally R. Isberg, Lee Miles, Amanda Y.
Chong, Weerachai Jaratlerdsiri, Jaime Gongora, Christopher Moran, Andrés
Iriarte, John McCormack, Shane C. Burgess, Scott V. Edwards, Eric Lyons,
Christina Williams, Matthew Breen, Jason T. Howard, Cathy R. Gresham,
Daniel G. Peterson, Jürgen Schmitz, David D. Pollock, David Haussler,
Eric W. Triplett, Guojie Zhang, Naoki Irie, Erich D. Jarvis, Christopher A.
Brochu, Carl J. Schmidt, Fiona M. McCarthy, Brant C. Faircloth, Federico G.
Hoffmann, Travis C. Glenn, Toni Gabaldón, Benedict Paten, and David A.
Ray. Three crocodilian genomes reveal ancestral patterns of evolution among
archosaurs. Science, 346(6215):1254449, December 2014. ISSN 0036-8075,
1095-9203.
[185] N. Satoh, D. Rokhsar, and T. Nishikawa. Chordate evolution and the
three-phylum system. Proceedings of the Royal Society B: Biological Sciences,
281(1794):20141729, September 2014. ISSN 0962-8452, 1471-2954.
[186] Susanne Gallus, Axel Janke, Vikas Kumar, and Maria A. Nilsson.
Disentangling the relationship of the Australian marsupial orders using
retrotransposon and evolutionary network analyses. Genome Biology and
Evolution, 7(4):985–992, April 2015. ISSN 1759-6653.
[187] Richard O. Prum, Jacob S. Berv, Alex Dornburg, Daniel J. Field, Jeffrey P.
Townsend, Emily Moriarty Lemmon, and Alan R. Lemmon. A comprehensive
phylogeny of birds (Aves) using targeted next-generation DNA sequencing.
Nature, 526(7574):569–573, October 2015. ISSN 0028-0836.
[188] Píndaro Díaz-Jaimes, Natalia J. Bayona-Vásquez, Douglas H. Adams, and
Manuel Uribe-Alcocer. Complete mitochondrial DNA genome of bonnethead
shark, Sphyrna tiburo, and phylogenetic relationships among main
superorders of modern elasmobranchs. Meta Gene, 7:48–55, February 2016.
ISSN 2214-5400.
[189] James E. Tarver, Mario dos Reis, Siavash Mirarab, Raymond J. Moran, Sean
Parker, Joseph E. O’Reilly, Benjamin L. King, Mary J. O’Connell, Robert J.
Asher, Tandy Warnow, Kevin J. Peterson, Philip C. J. Donoghue, and Davide
Pisani. The Interrelationships of Placental Mammals and the Limits of
Phylogenetic Inference. Genome Biology and Evolution, 8(2):330–344,
February 2016. ISSN 1759-6653.
[190] J. R. Dorin, E. Emslie, and V. van Heyningen. Related calcium-binding
proteins map to the same subregion of chromosome 1q and to an extended
region of synteny on mouse chromosome 3. Genomics, 8(3):420–426,
November 1990. ISSN 0888-7543.
[191] P. O. Tsvetkov, F. Devred, and A. A. Makarov. Thermodynamics of zinc
binding to human S100a2. Molecular Biology, 44(5):832–835, October 2010.
ISSN 0026-8933, 1608-3245.
[192] H. Vorum, P. Madsen, H. H. Rasmussen, M. Etzerodt, I. Svendsen, J. E. Celis,
and B. Honoré. Expression and divalent cation binding properties of the novel
chemotactic inflammatory protein psoriasin. Electrophoresis, 17(11):
1787–1796, November 1996. ISSN 0173-0835.
[193] Jolanta Kordowska, Walter F. Stafford, and C.-L. Albert Wang. Ca2+ and
Zn2+ bind to different sites and induce different conformational changes in
human calcyclin. European Journal of Biochemistry, 253(1):57–66, April 1998.
ISSN 1432-1033.
[194] Günter Fritz, Peer R. E. Mittl, Milan Vasak, Markus G. Grütter, and Claus W.
Heizmann. The Crystal Structure of Metal-free Human EF-hand Protein
S100a3 at 1.7-Å Resolution. Journal of Biological Chemistry, 277(36):
33092–33098, September 2002. ISSN 0021-9258, 1083-351X.
[195] Beat W. Schäfer, Jean-Marc Fritschy, Petra Murmann, Heinz Troxler, Isabelle
Durussel, Claus W. Heizmann, and Jos A. Cox. Brain S100a5 Is a Novel
Calcium-, Zinc-, and Copper Ion-binding Protein of the EF-hand Superfamily.
Journal of Biological Chemistry, 275(39):30623–30630, September 2000. ISSN
0021-9258, 1083-351X.
[196] J. Baudier, N. Glasser, and D. Gerard. Ions binding to S100 proteins. I.
Calcium- and zinc-binding properties of bovine brain S100 alpha alpha, S100a
(alpha beta), and S100b (beta beta) protein: Zn2+ regulates Ca2+ binding
on S100b protein. Journal of Biological Chemistry, 261(18):8192–8203, June
1986. ISSN 0021-9258, 1083-351X.
[197] Olga V. Moroz, Will Burkitt, Helmut Wittkowski, Wei He, Anatoli Ianoul,
Vera Novitskaya, Jingjing Xie, Oxana Polyakova, Igor K. Lednev, Alexander
Shekhtman, Peter J. Derrick, Per Bjoerk, Dirk Foell, and Igor B. Bronstein.
Both Ca2+ and Zn2+ are essential for S100a12 protein oligomerization and
function. BMC Biochemistry, 10:11, 2009. ISSN 1471-2091.
[198] Dean E. Wilcox. Isothermal titration calorimetry of metal ions binding to
proteins: An overview of recent studies. Inorganica Chimica Acta, 361(4):
857–867, March 2008. ISSN 0020-1693.
[199] Emmanuel Sturchler, Jos A. Cox, Isabelle Durussel, Mirjam Weibel, and
Claus W. Heizmann. S100a16, a Novel Calcium-binding Protein of the
EF-hand Superfamily. Journal of Biological Chemistry, 281(50):38905–38917,
December 2006. ISSN 0021-9258, 1083-351X.
[200] T. Becker, V. Gerke, E. Kube, and K. Weber. S100p, a novel Ca(2+)-binding
protein from human placenta. cDNA cloning, recombinant protein expression
and Ca2+ binding properties. European journal of biochemistry / FEBS, 207
(2):541–547, July 1992. ISSN 0014-2956.
[201] S. Réty, J. Sopkova, M. Renouard, D. Osterloh, V. Gerke, S. Tabaries,
F. Russo-Marie, and A. Lewit-Bentley. The crystal structure of a complex of
p11 with the annexin II N-terminal peptide. Nature Structural Biology, 6(1):
89–95, January 1999. ISSN 1072-8368.
[202] Paul T. Wilder, Donna M. Baldisseri, Ryan Udan, Kristen M. Vallely, and
David J. Weber. Location of the Zn2+-Binding Site on S100b As Determined
by NMR Spectroscopy and Site-Directed Mutagenesis. Biochemistry, 42(46):
13410–13421, November 2003. ISSN 0006-2960.
[203] Nathan T. Wright, Kristen M. Varney, Karen C. Ellis, Joseph Markowitz,
Rossitza K. Gitti, Danna B. Zimmer, and David J. Weber. The
Three-dimensional Solution Structure of Ca2+-bound S100a1 as Determined
by NMR Spectroscopy. Journal of Molecular Biology, 353(2):410–426,
October 2005. ISSN 0022-2836.
[204] Sarah C. Garrett, Louis Hodgson, Andrew Rybin, Alexei Toutchkine, Klaus M.
Hahn, David S. Lawrence, and Anne R. Bresnick. A biosensor of S100a4
metastasis factor activation: inhibitor screening and cellular activation
dynamics. Biochemistry, 47(3):986–996, January 2008. ISSN 0006-2960.
[205] Jill I. Murray, Michelle L. Tonkin, Amanda L. Whiting, Fangni Peng, Benjamin
Farnell, Jay T. Cullen, Fraser Hof, and Martin J. Boulanger. Structural
characterization of S100a15 reveals a novel zinc coordination site among S100
proteins and altered surface chemistry with functional implications for
receptor binding. BMC Structural Biology, 12:16, July 2012. ISSN 1472-6807.
[206] Elena Babini, Ivano Bertini, Valentina Borsi, Vito Calderone, Xiaoyu Hu,
Claudio Luchinat, and Giacomo Parigi. Structural characterization of human
S100a16, a low-affinity calcium binder. Journal of biological inorganic
chemistry: JBIC: a publication of the Society of Biological Inorganic
Chemistry, 16(2):243–256, February 2011. ISSN 1432-1327.
[207] Ivano Bertini, Soumyasri Das Gupta, Xiaoyu Hu, Tilemachos Karavelas,
Claudio Luchinat, Giacomo Parigi, and Jing Yuan. Solution structure and
dynamics of S100a5 in the apo and Ca2+-bound states. JBIC Journal of
Biological Inorganic Chemistry, 14(7):1097–1107, June 2009. ISSN 0949-8257,
1432-1327.
[208] Rajam S. Mani and Cyril M. Kay. Circular dichroism studies on the
zinc-induced conformational changes in S-100a and S-100b proteins. FEBS
Letters, 163(2):282–286, November 1983. ISSN 1873-3468.
[209] Beat W. Schäfer and Claus W. Heizmann. The S100 family of EF-hand
calcium-binding proteins: functions and pathology. Trends in Biochemical
Sciences, 21(4):134–140, April 1996. ISSN 0968-0004.
[210] Helena Hernández and Carol V. Robinson. Determining the stoichiometry and
interactions of macromolecular assemblies from mass spectrometry. Nature
Protocols, 2(3):715–726, March 2007. ISSN 1754-2189.
[211] Werner W. Streicher, Maria M. Lopez, and George I. Makhatadze. Modulation
of Quaternary Structure of S100 Proteins by Calcium Ions. Biophysical
chemistry, 151(3):181–186, October 2010. ISSN 0301-4622.
[212] M. M. Yamashita, L. Wesson, G. Eisenman, and D. Eisenberg. Where metal
ions bind in proteins. Proceedings of the National Academy of Sciences, 87
(15):5648–5652, August 1990. ISSN 0027-8424, 1091-6490.
[213] Mariana Babor, Sergey Gerzon, Barak Raveh, Vladimir Sobolev, and Marvin
Edelman. Prediction of transition metal-binding sites from apo protein
structures. Proteins: Structure, Function, and Bioinformatics, 70(1):208–217,
January 2008. ISSN 1097-0134.
[214] Jeffrey T. Rubino and Katherine J. Franz. Coordination chemistry of copper
proteins: How nature handles a toxic cargo for essential function. Journal of
Inorganic Biochemistry, 107(1):129–143, February 2012. ISSN 0162-0134.
[215] Liam J. Holt, Brian B. Tuch, Judit Villén, Alexander D. Johnson, Steven P.
Gygi, and David O. Morgan. Global Analysis of Cdk1 Substrate
Phosphorylation Sites Provides Insights into Evolution. Science, 325(5948):
1682–1686, September 2009. ISSN 0036-8075, 1095-9203.
[216] Alexey V. Gribenko and George I. Makhatadze. Oligomerization and divalent
ion binding properties of the S100p protein: a Ca2+/Mg2+-switch model.
Journal of Molecular Biology, 283(3):679–694, October 1998. ISSN 0022-2836.
[217] J. S. Mills and J. D. Johnson. Metal ions as allosteric regulators of calmodulin.
Journal of Biological Chemistry, 260(28):15100–15105, December 1985. ISSN
0021-9258, 1083-351X.
[218] Zenon Grabarek. Insights into Modulation of Calcium Signaling by Magnesium
in Calmodulin, Troponin C and Related EF-hand Proteins. Biochimica et
biophysica acta, 1813(5):913–921, May 2011. ISSN 0006-3002.
[219] Hee Jung Chung, Du Young Ko, Hyo Jung Moon, and Byeongmoon Jeong.
EF-Hand Mimicking Calcium Binding Polymer. Biomacromolecules, 17(3):
1075–1082, March 2016. ISSN 1526-4602.
[220] Per Björk, Anders Björk, Thomas Vogl, Martin Stenström, David Liberg, Anders
Olsson, Johannes Roth, Fredrik Ivars, and Tomas Leanderson. Identification
of Human S100a9 as a Novel Target for Treatment of Autoimmune Disease
via Binding to Quinoline-3-Carboxamides. PLOS Biol, 7(4):e1000097, April
2009. ISSN 1545-7885.
[221] Claus Kerkhoff, Thomas Vogl, Wolfgang Nacken, Claudia Sopalla, and
Clemens Sorg. Zinc binding reverses the calcium-induced arachidonic
acid-binding capacity of the S100a8/A9 protein complex. FEBS Letters, 460
(1):134–138, October 1999. ISSN 1873-3468.
[222] Derek M. Gagnon, Megan Brunjes Brophy, Sarah E. J. Bowman, Troy A.
Stich, Catherine L. Drennan, R. David Britt, and Elizabeth M. Nolan.
Manganese binding properties of human calprotectin under conditions of high
and low calcium: X-ray crystallographic and advanced electron paramagnetic
resonance spectroscopic analysis. Journal of the American Chemical Society,
137(8):3004–3016, March 2015. ISSN 1520-5126.
[223] Alexander Hopt, Stefan Korte, Herbert Fink, Ulrich Panne, Reinhard Niessner,
Reinhard Jahn, Hans Kretzschmar, and Jochen Herms. Methods for studying
synaptosomal copper release. Journal of Neuroscience Methods, 128(1-2):
159–172, September 2003. ISSN 0165-0270.
[224] Taisun H. Hyun, Elizabeth Barrett-Connor, and David B. Milne. Zinc intakes
and plasma concentrations in men with osteoporosis: the Rancho Bernardo
Study. The American Journal of Clinical Nutrition, 80(3):715–721, September
2004. ISSN 0002-9165, 1938-3207.
[225] H. Haimoto, S. Hosoda, and K. Kato. Differential distribution of immunoreactive
S100-alpha and S100-beta proteins in normal nonnervous human tissues.
Laboratory investigation; a journal of technical methods and pathology, 57(5):
489–498, 1987. ISSN 0023-6837.
[226] D. B. Zimmer and L. J. Van Eldik. Tissue distribution of rat S100 alpha and
S100 beta and S100-binding proteins. American Journal of Physiology - Cell
Physiology, 252(3):C285–C289, March 1987. ISSN 0363-6143, 1522-1563.
[227] Jacek Kunicki, Anna Filipek, Peter Heimann, Leszek Kaczmarek, and Bożena
Kamińska. Tissue specific distribution of calcyclin – 10.5 kDa Ca2+-binding
protein. FEBS Letters, 254(1):141–144, August 1989. ISSN 0014-5793.
[228] Danna B. Zimmer, Emily H. Cornwall, Aimee Landar, and Wei Song. The
S100 protein family: History, function, and expression. Brain Research
Bulletin, 37(4):417–429, 1995. ISSN 0361-9230.
[229] Alexey V. Gribenko, James E. Hopper, and George I. Makhatadze. Molecular
Characterization and Tissue Distribution of a Novel Member of the S100
Family of EF-Hand Proteins. Biochemistry, 40(51):15538–15548, December
2001. ISSN 0006-2960.
[230] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic
local alignment search tool. Journal of Molecular Biology, 215(3):403–410,
October 1990. ISSN 0022-2836.
[231] Rasko Leinonen, Hideaki Sugawara, and Martin Shumway. The Sequence Read
Archive. Nucleic Acids Research, 39(Database issue):D19–D21, January 2011.
ISSN 0305-1048.
[232] Manfred G. Grabherr, Brian J. Haas, Moran Yassour, Joshua Z. Levin,
Dawn A. Thompson, Ido Amit, Xian Adiconis, Lin Fan, Raktima
Raychowdhury, Qiandong Zeng, Zehua Chen, Evan Mauceli, Nir Hacohen,
Andreas Gnirke, Nicholas Rhind, Federica di Palma, Bruce W. Birren, Chad
Nusbaum, Kerstin Lindblad-Toh, Nir Friedman, and Aviv Regev. Trinity:
reconstructing a full-length transcriptome without a genome from RNA-Seq
data. Nature biotechnology, 29(7):644–652, May 2011. ISSN 1087-0156.
[233] W. Li, L. Jaroszewski, and A. Godzik. Clustering of highly homologous
sequences to reduce the size of large protein databases. Bioinformatics
(Oxford, England), 17(3):282–283, March 2001. ISSN 1367-4803.
[234] Yongchao Liu, Bertil Schmidt, and Douglas L. Maskell. MSAProbs: multiple
sequence alignment based on pair hidden Markov models and partition
function posterior probabilities. Bioinformatics, 26(16):1958–1964, August
2010. ISSN 1367-4803, 1460-2059.
[235] Anders Larsson. AliView: a fast and lightweight alignment viewer and editor
for large datasets. Bioinformatics, 30(22):3276–3278, November 2014. ISSN
1367-4803, 1460-2059.
[236] Stéphane Guindon, Jean-François Dufayard, Vincent Lefort, Maria Anisimova,
Wim Hordijk, and Olivier Gascuel. New algorithms and methods to estimate
maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.
Systematic Biology, 59(3):307–321, May 2010. ISSN 1076-836X.
[237] Si Quang Le and Olivier Gascuel. An Improved General Amino Acid
Replacement Matrix. Molecular Biology and Evolution, 25(7):1307–1320, July
2008. ISSN 0737-4038, 1537-1719.
[238] Maria Anisimova, Manuel Gil, Jean-François Dufayard, Christophe Dessimoz,
and Olivier Gascuel. Survey of Branch Support Methods Demonstrates
Accuracy, Power, and Robustness of Fast Likelihood-based Approximation
Schemes. Systematic Biology, page syr041, May 2011. ISSN 1063-5157,
1076-836X.
[239] Andre J. Aberer, Kassian Kobert, and Alexandros Stamatakis. ExaBayes:
Massively Parallel Bayesian Tree Inference for the Whole-Genome Era.
Molecular Biology and Evolution, page msu236, August 2014. ISSN 0737-4038,
1537-1719.
[240] David T. Jones, William R. Taylor, and Janet M. Thornton. The rapid
generation of mutation data matrices from protein sequences. Computer
applications in the biosciences : CABIOS, 8(3):275–282, June 1992. ISSN
1367-4803, 1460-2059.
[241] Stanley C. Gill and Peter H. von Hippel. Calculation of protein extinction
coefficients from amino acid sequence data. Analytical Biochemistry, 182(2):
319–326, November 1989. ISSN 0003-2697.
[242] John M. Walker, editor. The Proteomics Protocols Handbook. Humana Press,
Totowa, NJ, 2005. ISBN 978-1-58829-343-5 978-1-59259-890-8.
[243] B. Birdsall, R. W. King, M. R. Wheeler, C. A. Lewis, S. R. Goode, R. B.
Dunlap, and G. C. Roberts. Correction for light absorption in fluorescence
studies of protein-ligand interactions. Analytical Biochemistry, 132(2):
353–361, July 1983. ISSN 0003-2697.
[244] P. Schuck. Size-distribution analysis of macromolecules by sedimentation
velocity ultracentrifugation and Lamm equation modeling. Biophysical
Journal, 78(3):1606–1619, March 2000. ISSN 0006-3495.
[245] Patrick H. Brown and Peter Schuck. Macromolecular Size-and-Shape
Distributions by Sedimentation Velocity Analytical Ultracentrifugation.
Biophysical Journal, 90(12):4651–4661, June 2006. ISSN 0006-3495.
[246] Benjamin Webb and Andrej Sali. Comparative Protein Structure Modeling
Using MODELLER. In Current Protocols in Bioinformatics. John Wiley &
Sons, Inc., 2002. ISBN 978-0-471-25095-1.
[247] Iktae Kim, Ko On Lee, Young-Joo Yun, Jea Yeon Jeong, Eun-Hee Kim,
Haekap Cheong, Kyoung-Seok Ryu, Nak-Kyoon Kim, and Jeong-Yong Suh.
Biophysical characterization of Ca2+-binding of S100a5 and Ca2+-induced
interaction with RAGE. Biochemical and Biophysical Research
Communications, 483(1):332–338, January 2017. ISSN 0006-291X.
[248] Adrian M. Fischl, Paula M. Heron, Arnold J. Stromberg, and Timothy S.
McClintock. Activity-Dependent Genes in Mouse Olfactory Sensory Neurons.
Chemical Senses, 39(5):439–449, June 2014. ISSN 0379-864X.
[249] Takumi Teratani, Takumi Watanabe, Kaori Yamahara, Hiromichi Kumagai,
Akira Ishikawa, Kazumori Arai, and Ryushi Nozawa. Restricted Expression of
Calcium-Binding Protein S100a5 in Human Kidney. Biochemical and
Biophysical Research Communications, 291(3):623–627, March 2002. ISSN
0006-291X.
[250] S. Hancq, I. Salmon, J. Brotchi, O. De Witte, H.-J. Gabius, C. W. Heizmann,
R. Kiss, and C. Decaestecker. S100a5: a marker of recurrence in WHO grade
I meningiomas. Neuropathology and Applied Neurobiology, 30(2):178–187,
April 2004. ISSN 1365-2990.
[251] WikiGenes - Collaborative Publishing.
[252] S100a5 Gene - GeneCards | S10a5 Protein | S10a5 Antibody.
[253] S100a5 - Protein S100-A5 - Homo sapiens (Human) - S100a5 gene & protein.
[254] pytc: python program for analyzing isothermal titration calorimetry data, May
2017.
[255] Huaying Zhao, Grzegorz Piszczek, and Peter Schuck. SEDPHAT – A platform
for global ITC analysis and global multi-method analysis of molecular
interactions. Methods, 76(Supplement C):137–148, April 2015. ISSN
1046-2023.
[256] Yi Zhang, Shreeram Akilesh, and Dean E. Wilcox. Isothermal Titration
Calorimetry Measurements of Ni(II) and Cu(II) Binding to His, GlyGlyHis,
HisGlyHis, and Bovine Serum Albumin: A Critical Evaluation. Inorganic
Chemistry, 39(14):3057–3064, July 2000. ISSN 0020-1669.
[257] Melissa Liriano. Protein dynamics of calcium-S100a5 in the presence and
absence of target peptide. Grantome.
[258] Johannes Reisert, Paul J. Bauer, King-Wai Yau, and Stephan Frings. The
Ca-activated Cl Channel and its Control in Rat Olfactory Receptor Neurons.
The Journal of General Physiology, 122(3):349–364, September 2003. ISSN
0022-1295, 1540-7748.
[259] Bronwen Gardner, Birger V. Dieriks, Steve Cameron, Lakshini H. S. Mendis,
Clinton Turner, Richard L. M. Faull, and Maurice A. Curtis. Metal
concentrations and distributions in the human olfactory bulb in Parkinson's
disease. Scientific Reports, 7(1):10454, September 2017. ISSN 2045-2322.
[260] J. Herms, T. Tings, S. Gall, A. Madlung, A. Giese, H. Siebert, P. Schürmann,
O. Windl, N. Brose, and H. Kretzschmar. Evidence of presynaptic location
and function of the prion protein. The Journal of Neuroscience: The Official
Journal of the Society for Neuroscience, 19(20):8866–8875, October 1999.
ISSN 1529-2401.
[261] M. S. Horning and P. Q. Trombley. Zinc and copper influence excitability of
rat olfactory bulb neurons by multiple mechanisms. Journal of
Neurophysiology, 86(4):1652–1660, October 2001. ISSN 0022-3077.
[262] Shin-Ichi Ono and M. George Cherian. Regional distribution of
metallothionein, zinc, and copper in the brain of different strains of rats.
Biological Trace Element Research, 69(2):151–159, August 1999. ISSN
0163-4984, 1559-0720.
[263] Olga V. Moroz, Elena V. Blagova, Anthony J. Wilkinson, Keith S. Wilson, and
Igor B. Bronstein. The Crystal Structures of Human S100a12 in Apo Form
and in Complex with Zinc: New Insights into S100a12 Oligomerisation.
Journal of Molecular Biology, 391(3):536–551, August 2009. ISSN 0022-2836.
[264] Sandro Keller, Carolyn Vargas, Huaying Zhao, Grzegorz Piszczek, Chad A.
Brautigam, and Peter Schuck. High-Precision Isothermal Titration
Calorimetry with Automated Peak Shape Analysis. Analytical Chemistry, 84
(11):5066–5073, June 2012. ISSN 0003-2700.
[265] T. Wiseman, S. Williston, J. F. Brandts, and L. N. Lin. Rapid measurement of
binding constants and heats of binding using a new titration calorimeter.
Analytical Biochemistry, 179(1):131–137, May 1989. ISSN 0003-2697.
[266] Ernesto Freire, Arne Schön, and Adrian Velazquez-Campoy. Chapter 5:
Isothermal Titration Calorimetry: General Formalism Using Binding
Polynomials. In Methods in Enzymology, volume 455 of Biothermodynamics,
Part A, pages 127–155. Academic Press, January 2009.
[267] Andres Kreegipuu, Nikolaj Blom, Søren Brunak, and Jaak Järv. Statistical
analysis of protein kinase specificity determinants. FEBS Letters, 430(1):
45–50, June 1998. ISSN 0014-5793.
[268] Kenji S. Nakahara, Chikara Masuta, Syouta Yamada, Hanako Shimura, Yukiko
Kashihara, Tomoko S. Wada, Ayano Meguro, Kazunori Goto, Kazuki
Tadamura, Kae Sueda, Toru Sekiguchi, Jun Shao, Noriko Itchoda, Takeshi
Matsumura, Manabu Igarashi, Kimihito Ito, Richard W. Carthew, and Ichiro
Uyeda. Tobacco calmodulin-like protein provides secondary defense by
binding to and directing degradation of virus RNA silencing suppressors.
Proceedings of the National Academy of Sciences, 109(25):10113–10118, June
2012. ISSN 0027-8424, 1091-6490.
[269] Paola Bertolazzi, Mary Ellen Bock, and Concettina Guerra. On the functional
and structural characterization of hubs in protein-protein interaction
networks. Biotechnology Advances, 31(2):274–286, April 2013. ISSN
1873-1899.
[270] So Nakagawa, Stephen S. Gisselbrecht, Julia M. Rogers, Daniel L. Hartl, and
Martha L. Bulyk. DNA-binding specificity changes in the evolution of
forkhead transcription factors. Proceedings of the National Academy of
Sciences, 110(30):12349–12354, July 2013. ISSN 0027-8424, 1091-6490.
[271] Olga Khersonsky and Dan S. Tawfik. Enzyme Promiscuity: A Mechanistic and
Evolutionary Perspective. Annual Review of Biochemistry, 79(1):471–505,
2010.
[272] William H. Hudson, Bradley R. Kossmann, Ian Mitchelle S. de Vera, Shih-Wei
Chuo, Emily R. Weikum, Geeta N. Eick, Joseph W. Thornton, Ivaylo N.
Ivanov, Douglas J. Kojetin, and Eric A. Ortlund. Distal substitutions drive
divergent DNA specificity among paralogous transcription factors through
subdivision of conformational space. Proceedings of the National Academy of
Sciences, page 201518960, December 2015. ISSN 0027-8424, 1091-6490.
[273] T. Alhindi, Z. Zhang, P. Ruelens, H. Coenen, H. Degroote, N. Iraci, and
K. Geuten. Protein interaction evolution from promiscuity to specificity with
reduced flexibility in an increasingly complex network. Scientific Reports, 7,
March 2017. ISSN 2045-2322.
[274] Carla Mouta Carreira, Theresa M. LaVallee, Francesca Tarantini, Anthony
Jackson, Julia Tait Lathrop, Brian Hampton, Wilson H. Burgess, and Thomas
Maciag. S100a13 Is Involved in the Regulation of Fibroblast Growth Factor-1
and p40 Synaptotagmin-1 Release in Vitro. Journal of Biological Chemistry,
273(35):22224–22231, August 1998. ISSN 0021-9258, 1083-351X.
[275] Werner W. Streicher, Maria M. Lopez, and George I. Makhatadze. Annexin I
and Annexin II N-Terminal Peptides Binding to S100 Protein Family
Members: Specificity and Thermodynamic Characterization. Biochemistry, 48
(12):2788–2798, March 2009. ISSN 0006-2960.
[276] S. Blair Hedges, Joel Dudley, and Sudhir Kumar. TimeTree: A Public
Knowledge-Base of Divergence Times among Organisms. Bioinformatics, 22
(23):2971–2972, December 2006. ISSN 1367-4803.
[277] Wiesława Leśniak, Łukasz P. Słomnicki, and Anna Filipek. S100a6 – New
Facts and Features. Biochemical and Biophysical Research Communications,
390(4):1087–1092, December 2009. ISSN 0006-291X.
[278] Łukasz P. Słomnicki, Barbara Nawrot, and Wiesława Leśniak. S100a6 Binds
P53 and Affects Its Activity. The International Journal of Biochemistry &
Cell Biology, 41(4):784–790, April 2009. ISSN 1357-2725.
[279] Jan van Dieck, Maria R. Fernandez-Fernandez, Dmitry B. Veprintsev, and
Alan R. Fersht. Modulation of the Oligomerization State of P53 by
Differential Binding of Proteins of the S100 Family to P53 Monomers and
Tetramers. Journal of Biological Chemistry, 284(20):13804–13811, May 2009.
ISSN 0021-9258, 1083-351X.
[280] Young-Tae Lee, Yoana N. Dimitrova, Gabriela Schneider, Whitney B.
Ridenour, Shibani Bhattacharya, Sarah E. Soss, Richard M. Caprioli, Anna
Filipek, and Walter J. Chazin. Structure of the S100a6 Complex with a
Fragment from the C-Terminal Domain of Siah-1 Interacting Protein: A Novel
Mode for S100 Protein Target Recognition. Biochemistry, 47(41):
10921–10932, October 2008. ISSN 0006-2960.
[281] Melissa A. Liriano. Structure, Dynamics and Function of S100B and S100A5
Complexes. Ph.D. thesis, University of Maryland, Baltimore, Maryland, United
States, 2012.
[282] Thomas K. Knott, Pasil A. Madany, Ashley A. Faden, Mei Xu, Jörg Strotmann,
Timothy R. Henion, and Gerald A. Schwarting. Olfactory Discrimination
Largely Persists in Mice with Defects in Odorant Receptor Expression and
Axon Guidance. Neural development, 7(1):17, 2012.
[283] Jeremy C. McIntyre, Erica E. Davis, Ariell Joiner, Corey L. Williams, I.-Chun
Tsai, Paul M. Jenkins, Dyke P. McEwen, Lian Zhang, John Escobado, Sophie
Thomas, and others. Gene Therapy Rescues Cilia Defects and Restores
Olfactory Function in a Mammalian Ciliopathy Model. Nature medicine, 18
(9):1423–1428, 2012.
[284] Tsviya Olender, Ifat Keydar, Jayant M. Pinto, Pavlo Tatarskyy, Anna Alkelai,
Ming-Shan Chien, Simon Fishilevich, Diego Restrepo, Hiroaki Matsunami,
Yoav Gilad, and Doron Lancet. The Human Olfactory Transcriptome. BMC
genomics, 17(1):619, August 2016. ISSN 1471-2164.
[285] Robert C. Edgar. MUSCLE: multiple sequence alignment with high accuracy
and high throughput. Nucleic Acids Research, 32(5):1792–1797, March 2004.
ISSN 0305-1048.
[286] Ludovic R. Otterbein, Jolanta Kordowska, Carlos Witte-Hoffmann,
C. L. Albert Wang, and Roberto Dominguez. Crystal Structures of S100a6 in
the Ca2+-Free and Ca2+-Bound States: The Calcium Sensor Mechanism of
S100 Proteins Revealed at Atomic Resolution. Structure, 10(4):557–567, April
2002. ISSN 0969-2126.
[287] Si Quang Le and Olivier Gascuel. Accounting for Solvent Accessibility and
Secondary Structure in Protein Phylogenetics Is Clearly Beneficial.
Systematic Biology, 59(3):277–287, May 2010. ISSN 1063-5157.
[288] Z. Yang, S. Kumar, and M. Nei. A New Method of Inference of Ancestral
Nucleotide and Amino Acid Sequences. Genetics, 141(4):1641–1650,
December 1995. ISSN 0016-6731, 1943-2631.
[289] P. R. Connelly and J. A. Thomson. Heat Capacity Changes and Hydrophobic
Interactions in the Binding of FK506 and Rapamycin to the FK506 Binding
Protein. Proceedings of the National Academy of Sciences of the United States
of America, 89(11):4781–4785, June 1992. ISSN 0027-8424.
[290] Misha Soskine and Dan S. Tawfik. Mutational effects and the evolution of new
protein functions. Nature Reviews Genetics, 11(8):572–582, August 2010.
ISSN 1471-0056.
[291] Taisong Zou, Valeria A. Risso, Jose A. Gavira, Jose M. Sanchez-Ruiz, and
S. Banu Ozkan. Evolution of Conformational Dynamics Determines the
Conversion of a Promiscuous Generalist into a Specialist Enzyme. Molecular
Biology and Evolution, 32(1):132–143, January 2015. ISSN 0737-4038,
1537-1719.
[292] Sean Michael Carroll, Jamie T. Bridgham, and Joseph W. Thornton.
Evolution of Hormone Signaling in Elasmobranchs by Exploitation of
Promiscuous Receptors. Molecular Biology and Evolution, 25(12):2643–2652,
December 2008. ISSN 0737-4038.
[293] Titu Devamani, Alissa M. Rauwerdink, Mark Lunzer, Bryan J. Jones,
Joanna L. Mooney, Maxilmilien Alaric O. Tan, Zhi-Jun Zhang, Jian-He Xu,
Antony M. Dean, and Romas J. Kazlauskas. Catalytic Promiscuity of
Ancestral Esterases and Hydroxynitrile Lyases. Journal of the American
Chemical Society, 138(3):1046–1056, January 2016. ISSN 0002-7863.
[294] Karin Voordeckers, Ksenia Pougach, and Kevin J. Verstrepen. How do
regulatory networks evolve and expand throughout evolution? Current
Opinion in Biotechnology, 34(Supplement C):180–188, August 2015. ISSN
0958-1669.
[295] Clayton D. Carlson, Christopher L. Warren, Karl E. Hauschild, Mary S. Ozers,
Naveeda Qadir, Devesh Bhimsaria, Youngsook Lee, Franco Cerrina, and
Aseem Z. Ansari. Specificity landscapes of DNA binding molecules elucidate
biological function. Proceedings of the National Academy of Sciences, 107(10):
4544–4549, March 2010. ISSN 0027-8424, 1091-6490.
[296] Douglas M. Fowler, Carlos L. Araya, Sarel J. Fleishman, Elizabeth H. Kellogg,
Jason J. Stephany, David Baker, and Stanley Fields. High-resolution mapping
of protein sequence-function relationships. Nature Methods, 7(9):741–746,
September 2010. ISSN 1548-7091.
[297] Joan Teyra, Sachdev S. Sidhu, and Philip M. Kim. Elucidation of the binding
preferences of peptide recognition modules: SH3 and PDZ domains. FEBS
letters, 586(17):2631–2637, August 2012. ISSN 1873-3468.
[298] Matthew Slattery, Todd Riley, Peng Liu, Namiko Abe, Pilar Gomez-Alcala,
Iris Dror, Tianyin Zhou, Remo Rohs, Barry Honig, Harmen J. Bussemaker,
and Richard S. Mann. Cofactor binding evokes latent differences in DNA
binding specificity between Hox proteins. Cell, 147(6):1270–1282, December
2011. ISSN 0092-8674.
[299] UniProt: a hub for protein information. Nucleic Acids Research, 43(D1):
D204–D212, January 2015. ISSN 0305-1048.
[300] Donna Maglott, Jim Ostell, Kim D. Pruitt, and Tatiana Tatusova. Entrez
Gene: gene-centered information at NCBI. Nucleic Acids Research, 33
(suppl 1):D54–D58, January 2005. ISSN 0305-1048.
[301] Frank Delaglio, Stephan Grzesiek, Geerten W. Vuister, Guang Zhu, John
Pfeifer, and Ad Bax. NMRPipe: A Multidimensional Spectral Processing
System Based on UNIX Pipes. Journal of Biomolecular NMR, 6(3):277–293,
November 1995. ISSN 0925-2738, 1573-5001.
[302] S. P. Skinner, B. T. Goult, R. H. Fogh, W. Boucher, T. J. Stevens, E. D. Laue,
and G. W. Vuister. Structure Calculation, Refinement and Validation Using
CcpNmr Analysis. Acta Crystallographica Section D: Biological
Crystallography, 71(1):154–161, January 2015. ISSN 1399-0047.
[303] Dmitrij Frishman and Patrick Argos. Knowledge-Based Protein Secondary
Structure Assignment. Proteins: Structure, Function, and Bioinformatics, 23
(4):566–579, December 1995. ISSN 1097-0134.
[304] Simon J. Hubbard and Janet M. Thornton. Naccess. Computer Program,
Department of Biochemistry and Molecular Biology, University College
London, 2(1), 1993.
[305] Ziheng Yang. PAML 4: Phylogenetic Analysis by Maximum Likelihood.
Molecular Biology and Evolution, 24(8):1586–1591, August 2007. ISSN
0737-4038.
[306] Hiroyuki Kanzaki, Kentaro Yoshida, Hiromasa Saitoh, Koki Fujisaki, Akiko
Hirabuchi, Ludovic Alaux, Elisabeth Fournier, Didier Tharreau, and Ryohei
Terauchi. Arms race co-evolution of Magnaporthe oryzae AVR-Pik and rice
Pik genes driven by their physical interactions. The Plant Journal, 72(6):
894–907, December 2012. ISSN 1365-313X.
[307] Miriam Kaltenbach and Nobuhiko Tokuriki. Dynamics and constraints of
enzyme evolution. Journal of Experimental Zoology Part B: Molecular and
Developmental Evolution, 322(7):468–487, November 2014. ISSN 1552-5015.
[308] Ranjan V. Mannige, Charles L. Brooks, and Eugene I. Shakhnovich. A
Universal Trend among Proteomes Indicates an Oily Last Common Ancestor.
PLoS Comput Biol, 8(12):e1002839, December 2012.
[309] Chris Todd Hittinger and Sean B. Carroll. Gene duplication and the adaptive
evolution of a classic genetic switch. Nature, 449(7163):677–681, October
2007. ISSN 0028-0836.
[310] Ksenia Pougach, Arnout Voet, Fyodor A. Kondrashov, Karin Voordeckers,
Joaquin F. Christiaens, Bianka Baying, Vladimir Benes, Ryo Sakai, Jan Aerts,
Bo Zhu, Patrick Van Dijck, and Kevin J. Verstrepen. Duplication of a
promiscuous transcription factor drives the emergence of a new regulatory
network. Nature Communications, 5, September 2014. ISSN 2041-1723.
[311] Alissa Rauwerdink, Mark Lunzer, Titu Devamani, Bryan Jones, Joanna
Mooney, Zhi-Jun Zhang, Jian-He Xu, Romas J. Kazlauskas, and Antony M.
Dean. Evolution of a Catalytic Mechanism. Molecular Biology and Evolution,
33(4):971–979, April 2016. ISSN 1537-1719.
[312] Sachdev S. Sidhu, Henry B. Lowman, Brian C. Cunningham, and James A.
Wells. Phage Display for Selection of Novel Binding Peptides. Methods in
Enzymology, 328:333–IN5, January 2000. ISSN 0076-6879.
[313] William G. T. Willats. Phage Display: Practicalities and Prospects. Plant
Molecular Biology, 50(6):837–854, December 2002. ISSN 0167-4412,
1573-5028.
[314] Shuichi Kawashima, Piotr Pokarowski, Maria Pokarowska, Andrzej Kolinski,
Toshiaki Katayama, and Minoru Kanehisa. AAindex: Amino Acid Index
Database, Progress Report 2008. Nucleic Acids Research, 36(Database issue):
D202–205, 2008. ISSN 1362-4962.
[315] Alex S. Holehouse, James Ahad, Rahul K. Das, and Rohit V. Pappu. CIDER:
Classification of Intrinsically Disordered Ensemble Regions. Biophysical
Journal, 108(2):228a, January 2015. ISSN 0006-3495.
[316] Leo Breiman. Random Forests. Machine learning, 45(1):5–32, 2001.
[317] Phillip A. Steindel, Emily H. Chen, Jacob D. Wirth, and Douglas L. Theobald.
Gradual neofunctionalization in the convergent evolution of trichomonad
lactate and malate dehydrogenases. Protein Science, 25(7):1319–1331, July
2016. ISSN 1469-896X.
[318] Rafael G. Miranda, Margarita Rojas, Michael P. Montgomery, Kyle P. Gribbin,
and Alice Barkan. RNA binding specificity landscape of the pentatricopeptide
repeat protein PPR10. RNA, page rna.059568.116, January 2017. ISSN
1355-8382, 1469-9001.
[319] Tyler N. Starr, Lora K. Picton, and Joseph W. Thornton. Alternative
evolutionary histories in the sequence space of an ancient protein. Nature, 549
(7672):409–413, September 2017. ISSN 0028-0836.
[320] André Zelanis, Pitter F. Huesgen, Ana Karina Oliveira, Alexandre K. Tashima,
Solange M. T. Serrano, and Christopher M. Overall. Snake venom serine
proteinases specificity mapping by proteomic identification of cleavage sites.
Journal of Proteomics, 113:260–267, January 2015. ISSN 1874-3919.
[321] Frances H. Arnold. Protein engineering for unusual environments. Current
Opinion in Biotechnology, 4(4):450–455, August 1993. ISSN 0958-1669.
[322] Prachi Anand, Alison O'Neil, Emily Lin, Trevor Douglas, and Mandë Holford.
Tailored delivery of analgesic ziconotide across a blood brain barrier model
using viral nanocontainers. Scientific Reports, 5:srep12497, August 2015.
ISSN 2045-2322.
[323] Benjamin Schwarz, Kaitlyn M. Morabito, Tracy J. Ruckwardt, Dustin P.
Patterson, John Avera, Heini M. Miettinen, Barney S. Graham, and Trevor
Douglas. Viruslike Particles Encapsidating Respiratory Syncytial Virus M and
M2 Proteins Induce Robust T Cell Responses. ACS Biomaterials Science &
Engineering, 2(12):2324–2332, December 2016.
[324] Frances H. Arnold. Directed Evolution: Bringing New Chemistry to Life.
Angewandte Chemie International Edition. ISSN 1521-3773.
[325] Stephan C. Hammer, Anders M. Knight, and Frances H. Arnold. Design and
evolution of enzymes for non-natural chemistry. Current Opinion in Green
and Sustainable Chemistry, 7(Supplement C):23–30, October 2017. ISSN
2452-2236.
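[326] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A
Density-Based Algorithm for Discovering Clusters in Large Spatial Databases
with Noise. In Proceedings of the Second International Conference on Knowledge
Discovery and Data Mining (KDD-96), pages 226–231. AAAI Press, 1996.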
[327] Fred J. Damerau. A Technique for Computer Detection and Correction of
Spelling Errors. Commun. ACM, 7(3):171–176, March 1964. ISSN 0001-0782.
[328] S. van der Walt, S. C. Colbert, and G. Varoquaux. The NumPy Array: A
Structure for Efficient Numerical Computation. Computing in Science &
Engineering, 13(2):22–30, March 2011. ISSN 1521-9615.
[329] Eric Jones, Travis Oliphant, Pearu Peterson, and others. SciPy: Open source
scientific tools for Python. 2001.
[330] J. D. Hunter. Matplotlib: A 2d graphics environment. Computing In Science
& Engineering, 9(3):90–95, 2007.
[331] Leo Breiman, Jerome Friedman, Charles J. Stone, and Richard A. Olshen.
Classification and Regression Trees. CRC press, 1984.
[332] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel,
Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron
Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David
Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay.
Scikit-Learn: Machine Learning in Python. Journal of Machine Learning
Research, 12:2825–2830, 2011.
[333] Tara Hessa, Hyun Kim, Karl Bihlmaier, Carolina Lundin, and others.
Recognition of Transmembrane Helices by the Endoplasmic Reticulum
Translocon. Nature, 433(7024):377, 2005.
[334] D. M. Engelman, T. A. Steitz, and A. Goldman. Identifying Nonpolar
Transbilayer Helices in Amino Acid Sequences of Membrane Proteins. Annual
Review of Biophysics and Biophysical Chemistry, 15(1):321–353, 1986.
[335] Catherine H. Schein. Solubility as a Function of Protein Structure and Solvent
Components. Nature Biotechnology, 8(4):308–317, April 1990. ISSN
1087-0156.
[336] J. Kyte and R. F. Doolittle. A Simple Method for Displaying the Hydropathic
Character of a Protein. Journal of Molecular Biology, 157(1):105–132, May
1982. ISSN 0022-2836.
[337] William C. Wimley and Stephen H. White. Experimentally Determined
Hydrophobicity Scale for Proteins at Membrane Interfaces. Nature Structural
& Molecular Biology, 3(10):842–848, October 1996.
[338] T. P. Hopp and K. R. Woods. Prediction of Protein Antigenic Determinants
from Amino Acid Sequences. Proceedings of the National Academy of Sciences
of the United States of America, 78(6):3824–3828, June 1981. ISSN 0027-8424.
[339] P. Y. Chou and G. D. Fasman. Empirical Predictions of Protein
Conformation. Annual Review of Biochemistry, 47:251–276, 1978. ISSN
0066-4154.
[340] Hyun Joo and Jerry Tsai. An Amino Acid Code for β-Sheet Packing Structure.
Proteins, 82(9):2128–2140, September 2014. ISSN 0887-3585.
[341] F. M. Richards. Areas, Volumes, Packing and Protein Structure. Annual
Review of Biophysics and Bioengineering, 6:151–176, 1977. ISSN 0084-6589.