Evolution of Innate Immune Protein Complexes, Toll-like Receptor 4 and Calprotectin, in Early 
Vertebrates and Zebrafish 
by  
Kona Nikole Orlandi 
  
A dissertation accepted and approved in partial fulfillment of the   
requirements for the degree of  
Doctor of Philosophy   
in Biology 
  
Dissertation Committee:  
David Garcia, Chair  
Michael Harms, Advisor  
Karen Guillemin, Core Member  
Laura McKnight, Core Member  
Raghuveer Parthasarathy, Institutional Representative  
 
 
University of Oregon  
 
Spring 2024 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
© 2024 Kona Nikole Orlandi  
  
  
DISSERTATION ABSTRACT 
 
Kona Nikole Orlandi 
 
Doctor of Philosophy in Biology 
 
Title: Evolution of Innate Immune Protein Complexes, Toll-like Receptor 4 and Calprotectin, in 
Early Vertebrates and Zebrafish 
 
 
The innate immune system is our first line of defense against pathogens as well as our 
interface with our commensal microbiota. Toll-like receptor 4 (TLR4) and calprotectin are two 
innate immune proteins that are closely associated with inflammatory disorders. Zebrafish (Danio
rerio) has been successfully used to model the human innate immune system, but zebrafish models
of TLR4 and calprotectin have not been developed because these proteins have diverged
substantially between humans and zebrafish. Here, we set out to reveal the evolutionary and
functional relationships between human and zebrafish TLR4 and calprotectin. We used
phylogenetic analyses to define the evolutionary relationships between homologous proteins and
characterized their immune functions in cell-based assays. We found that an antagonist of human
TLR4 is a potent agonist of zebrafish TLR4, but when it was tested in live fish, we observed no
difference in immune stimulation. We further investigated the evolutionary origin of this change
in ligand specificity and determined that TLR4 in the cyprinid order of fish likely convergently
evolved sensitivity to LPS. Our characterization of zebrafish proteins homologous to human
calprotectin also suggests that these zebrafish proteins do not share functional similarities with
calprotectin during the immune response. We conclude that although humans and zebrafish share
many immune system characteristics, the TLR4 and calprotectin immune responses are not
directly comparable.
This dissertation includes previously published and unpublished co-authored material.
The supplement includes multiple sequence alignments and phylogenetic trees for TLR4 and MD-2.
CURRICULUM VITAE 
 
 
NAME OF AUTHOR:  Kona Nikole Orlandi 
 
 
GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED: 
 
 University of Oregon, Eugene 
 University of California, Santa Cruz 
 
 
DEGREES AWARDED: 
 
 Doctor of Philosophy, Biology, 2024, University of Oregon 
 Bachelor of Science, Biochemistry and Molecular Biology, 2016, University of 
California, Santa Cruz 
 
 
AREAS OF SPECIAL INTEREST: 
 
 Protein Evolution 
 Molecular Biology 
 Biochemistry 
 Immunology 
 
 
PROFESSIONAL EXPERIENCE: 
 
Graduate Student Researcher, University of Oregon, September 2018 – June 2024 
 
Graduate Teaching Assistant, University of Oregon, September 2018 – June 2019 
 
Research Intern, J. Craig Venter Institute for Environmental Genomics, 2018 
 
Post-baccalaureate Researcher, UCSC, 2017 
 
Course Assistant in Eukaryotic Molecular Biology, UCSC, 2017 
 
 
GRANTS, AWARDS, AND HONORS: 
 
Raymond-Stevens Fellowship, University of Oregon, 2024 
 
National Institutes of Health (NIH) Molecular Biology and Biophysics Training Grant 
Appointment T32, University of Oregon, 2019-2022 
 
Institute of Molecular Biology Best Poster Award, University of Oregon, 2019 
 
Promising Scholar Award, University of Oregon, 2018 
 
B.Sc. with Honors in the Major of Biochemistry and Molecular Biology, UCSC, 2016 
 
Blue and Gold Opportunity Plan Scholar, UCSC, 2012-2015 
 
PUBLICATIONS: 
 
Chisholm LO, Orlandi KN, Phillips SR, Shavlik MJ, Harms MJ. “Ancestral Reconstruction and the
Evolution of Protein Energy Landscapes.” Annual Review of Biophysics. 22 Dec. 2023.
doi:10.1146/annurev-biophys-030722-125440.

Garza EA, Bielinski VA, Espinoza JL, Orlandi K, Rivera Alfaro J, Bolt TM, Beeri K, Weyman PD,
Dupont CL. “Validating a Promoter Library for Application in Plasmid-Based Diatom Genetic
Engineering.” ACS Synthetic Biology. 2023;12(11):3215-3228. doi:10.1021/acssynbio.3c00163.

Orlandi KN*, Phillips SR*, Sailer ZR, Harman JL, Harms MJ. “Topiary: Pruning the manual labor
from ancestral sequence reconstruction.” Protein Sci. 2023 Feb;32(2):e4551. doi:10.1002/pro.4551.
PMID: 36565302; PMCID: PMC9847077.

McKnight LE, Crandall JG, Bailey TB, Banks OG, Orlandi KN, Truong VN, Donovan DA, Waddell GL,
Wiles ET, Hansen SD, Selker EU, McKnight JN. “Rapid and inexpensive preparation of genome-wide
nucleosome footprints from model and non-model organisms.” STAR Protocols. 2021;2(2).

Orlandi K, McKnight J. “Bulky Histone Modifications May Have an Oversized Role in Nucleosome
Dynamics.” BioEssays. 2020;42(1).
 
 
ACKNOWLEDGMENTS 
 
First, I would like to thank my advisor, Dr. Mike Harms, for his unwavering support of 
my professional growth and his enthusiasm for scientific exploration. Mike’s commitment to 
facilitating student development as scientists and individuals has been an inspiration to me and 
has allowed me to gain skills and knowledge in several scientific disciplines throughout the 
course of my graduate work, even in areas in which Mike is not an expert. He continually provided
both the challenge and support I needed to become a well-rounded scientist.  
When I joined Mike’s lab, there were many reasons for him to feel uncertain
about taking me on as a trainee: I had not previously worked in his lab, he had already taken on
two students for that year, and it was the peak of a global pandemic. Without the optimism and
faith of Mike and his lab, I would not have completed my doctorate at the University of Oregon. I
will forever be grateful to them for taking me under their wing. I am very lucky to have experienced
such a supportive lab community throughout the last three and a half years. Thank you to all past
and present Harms lab members, especially those with whom I worked most closely: Corinthia
Brown, Sophia Phillips, José Sanchez-Borbón, Lauren Chisholm, and Dr. Jon Muyskens.
I want to express my gratitude to my collaborators and the university staff who made it 
possible for me to accomplish this work. For guidance on zebrafish experiments and 
interpretation of the results, I greatly appreciate Dr. Karen Guillemin, Dr. Cathy Robinson, Dr. 
Raghu Parthasarathy, Dr. Julia Ngo, Piyush Amitabh, Jonah Sokoloff, Patrick Horve, Dr. Adam 
Fries, and Rose Sockol. Thank you so much for teaching me and supporting me on this journey. I 
also greatly value the discussions and advice I received for experiments in bacteria from Dr. 
Jarrod Smith, Dr. Cathy Robinson, Dr. Karen Guillemin, and Dr. Melanie Spero. Thank you to 
Stu Johnson for keeping the institute running and for all of your spot-on song reference sign-offs!
I also extend my deepest gratitude to my past and present dissertation advisory committee
members, Dr. David Garcia, Dr. Alice Barkan, Dr. Eric Selker, Dr. Karen Guillemin, Dr. Raghu 
Parthasarathy, and Dr. Laura McKnight, for your encouragement, critical questions, and guidance 
throughout the years. 
Special recognition for my success in graduate school is due to the late Dr. Jeff McKnight 
and members of his lab, Dr. Laura McKnight, Dr. Orion Banks, Dr. Thomas Bailey, Vi Truong, 
Dr. Drake Donovan, and Abigail Vaaler. I came to the University of Oregon because Jeff 
believed in me. I had a difficult time adjusting to graduate school, but I found my confidence 
when I joined their exceptionally compassionate and quirky lab community. In addition to his 
lessons in yeast genetics, biochemistry, and chromatin remodeling, Jeff taught me how important 
and effective it is to lift up and create constructive space for others who face discrimination in 
any form, and especially in academia. Jeff was a true role model to me. I will always miss him 
and wonder what could have been. Thank you also to Dr. Hinrich Boeger, who helped me get
started in molecular biology research, connected me with Jeff, and supported my career 
development into graduate school. 
I want to acknowledge Dr. David Garcia and Dr. Alice Barkan for their resolve to support
me through the difficult transition when I needed to decide how to continue my career in
anticipation of Jeff’s passing. Without this sincere assurance, guidance, and support, no matter my
decision, I would have been lost.
Through all the highs and lows of my graduate experience, I always received love and 
reassurance from the friends I made in Eugene. Thank you to Dr. Ethan Shaw, Dr. Julia Ngo, 
Madelyn Green, Travis Heeren, Dr. Jordan Munroe, Dr. Elizabeth Vargas, Zac Bush, Dr. Michael 
Shavlik, Dr. Bryce LaFoya, Nan Provant, Zac Provant, and everyone on the Specific Heat soccer 
team for being there for me from the beginning and always creating a warm and upbeat 
atmosphere. The sense of camaraderie among these folks has brought me so much happiness. 
Our community has grown to include more dear friends who have brought abundant joy. Thank 
you, Emily Dennis, Max Horrocks, Sophia Phillips, Acadia DiNardo, Molly Shallow, Jared 
Freedman, William Crow, Sam Horst, Sofia Carlson, Hannah Wilson, and Brenden Campbell. I 
will miss our community, our soccer games, floating, camping, yard games, tailgating, brewing, 
skiing, bonfires, surfing, derby day, and even football watch parties! 
I could not have come to graduate school without the support of my parents, Arlene and 
Bruce Orlandi. They have always valued my education and my dreams and made sacrifices to 
help me continue pursuing my career aspirations, even when that career means I can’t pay my 
own bills until I’m 30. They are my biggest source of encouragement and backing, my first-
choice vacation destination, the roots that keep me grounded, and the best parents anyone could 
ever hope for. I also want to thank my sister, Makayla Orlandi, and my extended family for 
always being a phone call away. 
Finally, I owe very special acknowledgements to my sweet angel pup, Lucy, and my 
fiancé, Ethan. Lucy came into my life when I was 18 and was by my side through 11 years full of 
life’s transitions. We had a profound bond and a tremendous trust in one another that I did not 
know was possible. Her companionship throughout graduate school encouraged me to get 
outside every day, make friends, and remember to appreciate the little things. I wish she could be 
here now to start the next chapter of life with me, but she did leave me in the capable hands of 
her favorite person, Ethan. Thank you, Ethan, for caring so much about Lucy and me from the 
very first moment we met. Your kindness, generosity, and thoughtfulness have always amazed 
me. I am so grateful that you have been, and will always be, there to console me when I am
feeling down, celebrate with me when things are great, and cook food for me when I forget. It
has been so special sharing our graduate training together and I can’t wait to spend the rest of my 
life growing and learning with you. 
 
This investigation was supported in part by the National Institute of General Medical 
Sciences through Molecular Biology and Biophysics Training Grant Appointments, 
5T32GM007759-41 & 42, to me and by a grant, R01-GM146114, to Dr. Michael Harms at the 
University of Oregon. This work was also partially funded by a Raymond-Stevens fellowship 
awarded to me. 
 
  
 
DEDICATION 
 
I dedicate this work to the ones who started this journey with me but aren’t here to 
celebrate its culmination: My Sweet Little Lucy Lady, Auntie Lori Mason, Uncle Greg 
McMurrough, Grandpa Burl Bradbury, and Prof Jeff McKnight. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
TABLE OF CONTENTS
Chapter

I. INTRODUCTION
  Human microbial communities impart complex influences on our health
  The innate immune system identifies specific microbes via TLR4
  Calprotectin mediates TLR4-induced inflammation and fights infections
  Zebrafish is a powerful model organism for studies of the host-microbe interface
  Evolutionary differences confound the use of model organisms to study human biology
  Protein evolution is a framework for mapping between model organisms and humans
  Summary of contributions
  Bridge to Chapter II
II. TOPIARY: PRUNING THE MANUAL LABOR FROM ANCESTRAL SEQUENCE RECONSTRUCTION
  Abstract
  Introduction
  Overview of Ancestral Sequence Reconstruction
    Define the Problem
    Construct a Sequence Dataset
    Sequence Alignment
    Infer a Maximum Likelihood Gene Tree
    Reconcile the Gene Tree to the Species Tree
    Reconciliation: The Special Case of Microbial Genes
    Reconstruct Ancestors
    Evaluate Results
  The Topiary Pipeline
    Software Design
    Stage 1: Seed to Alignment
    Initial Dataset Construction
    Redundancy Reduction, Quality Control, and Alignment
    Alignment
    Stage 2: Alignment to Ancestors
    Infer the Evolutionary Model
    Build a Maximum Likelihood Gene Tree
    Reconcile Gene and Species Tree
    Reconstruct Ancestors
    Branch Supports
    Output
  Protocol
    Construct a Seed Dataset
    Run the Seed-to-Alignment Pipeline
    Inspect and Edit Alignment
    Perform the Ancestral Inference
    Checking Gene/Species-Tree Reconciliation
    Selecting Ancestors
    On Black Boxes
  Pipeline Validation
  Conclusion
  Bridge to Chapter III
III. TOLL-LIKE RECEPTOR 4 EVOLUTION OF LPS SPECIFICITY IN EARLY VERTEBRATES AND DIVERGENCE IN ZEBRAFISH
  Abstract
  Introduction
  Results
    Zebrafish TLR4/MD-2 is potently activated by tetra-acyl LPS in vitro
    A subset of teleost fish evolved a functionally necessary MD-2 C-terminal peptide
    Live zebrafish exhibit reduced immune response to lipid IVa compared to E. coli LPS
    Zebrafish and human TLR4 evolved from an ancestor with low LPS sensitivity
    Initial investigations into possible CD14 functional homologs in fish
  Discussion
    Zebrafish TLR4 ohnologs might play important physiological roles
    Possible functional roles of the zebrafish MD-2 C-terminal peptide
    Is TLR4 used in the zebrafish innate immune response to Gram-negative bacteria?
    Further probing ancestral complexes will help us understand TLR4 ligand responses
    Did zebrafish lose CD14 as a mechanism to avoid LPS toxicity?
  Materials and Methods
    Ancestral sequence reconstruction
    Plasmids
    Cell culture and transfection conditions
    Oral microgavage of LPS
    Brain tectum microinjection of LPS
  Bridge to Chapter IV
IV. ZEBRAFISH DO NOT HAVE CALPROTECTIN
  Abstract
  Introduction
  Results
    Zebrafish s100a10b is only distantly related to human S100A8 and S100A9
    Chromosome placement indicates a shared origin but complicated evolution of homologous human and zebrafish S100s
    Human calprotectin and zebrafish s100 protein sequences have low sequence identity
    Single cell RNA sequencing datasets mining points to candidate zebrafish s100 proteins expressed similar to calprotectin
    Recombinant zebrafish s100 proteins fold and interact with calcium
    Zebrafish s100s do not exhibit nutritional immunity like calprotectin
    Zebrafish s100s do not exhibit proinflammatory activity like S100A9
  Discussion
  Materials and Methods
    Protein purification
    Far-UV circular dichroism and fluorescence spectroscopy
    Nutritional immunity assay
    Proinflammatory activity assay
  Bridge to Chapter V
V. SUMMARY AND CLOSING REMARKS
REFERENCES CITED
SUPPLEMENTAL FILES
  PDF: JOHN WILEY AND SONS LICENSE TERMS AND CONDITIONS
  FASTA: TLR4 SEQUENCE ALIGNMENT
  PDF: TLR4 PHYLOGENETIC TREE
  FASTA: TLR4 ANCESTOR SEQUENCES
  FASTA: MD-2 SEQUENCE ALIGNMENT
  PDF: MD-2 PHYLOGENETIC TREE
  FASTA: MD-2 ANCESTOR SEQUENCES
 
  
LIST OF FIGURES

Figure

1. Figure 2.1. Define the ancestral reconstruction problem
2. Figure 2.2. Ancestral sequence reconstruction has six main steps
3. Figure 2.3. Summarized topiary ASR pipeline
4. Figure 2.4. Topiary redundancy reduction and quality control
5. Figure 2.5. Example trees at each step in the ASR calculation
6. Figure 2.6. Graphs for evaluating ancestor quality
7. Figure 2.7. Validation of the topiary pipeline
8. Figure 3.1. Current knowledge of the evolution of TLR4/MD-2 LPS specificity
9. Figure 3.2. Revealing differences in LPS specificity between human and zebrafish TLR4 complexes
10. Figure 3.3. Zebrafish ohnologs tlr4bb and tlr4al do not respond on their own to LPS variants in vitro
11. Figure 3.4. The C-terminal peptide of fish MD-2 is necessary for zebrafish TLR4 signaling
12. Figure 3.5. Zebrafish response to (L6)-LPS-EK and (L4)-lipid IVa challenge via microgavage and hindbrain injection
13. Figure 3.6. Characterization of TLR4 activity for reconstructed early vertebrate ancestors and modern fish and amphibian sequences
14. Figure 3.7. Hybridizing the human TLR4 transmembrane and TIR domains to other species’ extracellular domains does not reveal function
15. Figure 3.8. Zebrafish TLR2, CD180/MD-1, and human transferrin do not rescue the zebrafish TLR4 response to lipid IVa in the absence of CD14
16. Figure 4.1. Phylogenetic analyses reveal there is no calprotectin ortholog outside of amniotes
17. Figure 4.2. Structural comparisons of human and zebrafish S100s
18. Figure 4.3. Zebrafish s100s do not exhibit nutritional immunity activity like human calprotectin
19. Figure 4.4. Zebrafish s100s do not exhibit the pro-inflammatory characteristics of human S100A9
 
  
LIST OF TABLES

Table

1. Table 2.1. Example seed dataset
2. Table 2.2. Protein families used to validate the topiary pipeline
3. Table 4.1. Single-cell RNAseq profiles for zebrafish s100s syntenic to human calprotectin
  
  
 
  
CHAPTER I 
INTRODUCTION 
In this dissertation, I will describe my contributions to three bodies of work concerning 
protein evolution, with emphasis on the divergence between innate immune proteins in humans 
and zebrafish. Understanding the process of protein evolution is imperative for interpreting 
discoveries in biology, especially those gained from model organisms. 
Chapter II is a published co-authored manuscript describing a bioinformatic 
phylogenetics tool for ancestral protein sequence reconstruction that I helped develop and make 
available to the public. Sophia Phillips and I are co-lead authors of this manuscript, Zachary
Sailer and Joseph Harman are co-authors, and Michael Harms was the project and software
development lead and a major writing contributor.
Chapter III describes unpublished findings, including material contributions from José 
Sánchez-Borbón, Cathy Robinson, and Corinthia Brown, and important insights from Michael 
Harms, Sophia Phillips, and Karen Guillemin. This chapter covers my investigation of the
evolutionary history and fluctuating ligand specificity of Toll-like receptor 4 complexes in 
zebrafish, other modern vertebrates, and their ancestors.  
Finally, Chapter IV is a manuscript, soon to be submitted, that evaluates whether zebrafish
have convergently evolved a functional homolog of calprotectin, an important biomarker of
inflammation severity in human patients. Michael Harms contributed experimental guidance and
oversaw the writing of this work. The following introductory sections provide context and a
through line for these studies; more specific information about each topic appears in the
respective chapters.
 
Human microbial communities impart complex influences on our health 
 All animals live alongside populations of microbes. Beginning at birth, humans are colonized by
microorganisms, including bacteria, fungi, viruses, archaea, protozoa, and helminths, on every
surface exposed to the environment1–3 and even within tissues4,5. In the body, human cells are
outnumbered by microorganisms, which are most abundant in the gut.6–8 Interactions between
host and microbe may be commensal (the microbe benefits and the host is unaffected),
mutualistic (both parties benefit), or pathogenic (the microbe causes disease). The field describes
most symbiotic microbes as ‘commensals’ even though they generally benefit the host, and some
can even become opportunistic pathogens.
Symbiotic microbes contribute positively to host health by supporting host physiology,
immunological development, metabolism, resistance to infection by pathogenic microorganisms,
and other essential functions.3,9–16 Mutualistic microbiota and their collective genomes (the
microbiome) provide us with genetic and metabolic capacities that we have not had to evolve on
our own,17 and vice versa. At the same time, pathogenic microbes have continually driven the
evolution of both the human immune system and our commensal microbiota to keep harmful
microbes at bay.
 
The innate immune system identifies specific microbes via TLR4 
A major area of study is the communication between us and our microbiome.18 How
does the host influence microbial community structure? How do microbes influence our 
development, health, mood, and behaviors? How does our immune system distinguish 
commensals from pathogens? One of the primary points of contact between these microbial 
communities and animal physiology is the innate immune system. The innate immune system is 
our first line of defense against infection and disease. It consists of barriers like the skin and
mucosa, effector cells that destroy pathogens and mediate the immune response, secreted
antimicrobial molecules that inhibit pathogen growth, proinflammatory and anti-inflammatory
signaling molecules, and cellular receptors that sense microbial and infectious signals both
outside and inside of cells and subsequently activate the immune response.19
The cellular receptors of the innate immune system are known as pattern recognition 
receptors (PRRs). Many PRRs, including the Toll-like receptors, are transmembrane receptors
expressed on innate immune cells like macrophages, neutrophils, dendritic cells, natural killer
cells, mast cells, basophils, and eosinophils.20 A major function of PRRs is to sense highly conserved microbe-associated
molecular patterns (MAMPs) and then transduce this signal across cell membranes to activate 
immune responses.21 One of the most well-studied MAMPs is lipopolysaccharide. 
Lipopolysaccharide (LPS) is a major structural component of the outer membrane of Gram-negative
bacteria and acts as a permeability barrier.22 Richard Pfeiffer first described LPS in 1892 as the
causative agent of sepsis and named it ‘endotoxin’ because it was associated with the insoluble
part of bacterial cells rather than secreted like other bacterial toxins known at the time.23 Sepsis
is a life-threatening condition that arises when the body’s inflammatory response to an infection,
such as a Gram-negative bacterial infection, damages its own tissues and organs.24
A century after the discovery of endotoxin (LPS), it was determined that Toll-like 
receptor 4 (TLR4) is the binding partner that discriminates LPS from host lipids and transduces 
signals across the membrane.25–27 Ligand-induced activation of TLR4 triggers signaling cascades 
that upregulate the expression and secretion of cytokines and other proinflammatory proteins.28–31
Because of this, TLR4’s ability to recognize LPS was identified as the underpinning of
sepsis.25
LPS molecules are present in almost all Gram-negative bacteria and are structurally
diverse. LPS consists of three domains: 1) the O-antigen, a repeating hydrophilic
oligosaccharide that is structurally varied even within a single bacterium and fulfills a range of
functions depending on bacterial lifestyle; 2) the hydrophilic core oligosaccharide, which links
the other two domains; and 3) a hydrophobic lipid A moiety containing a glucosamine
disaccharide that can carry one or two phosphates and four to eight fatty acyl chains of
varying lengths.29,32,33 The lipid A portion forms the outer leaflet of the outer membrane of
Gram-negative bacteria and is therefore highly conserved across species. TLR4 recognizes the
lipid A moiety, which is what makes LPS proinflammatory.32
Structural and functional analyses show that the most proinflammatory form of lipid A
has two phosphate groups and six fatty acyl chains of 12-14 carbons, such as the lipid A of
LPS purified from Escherichia coli and Salmonella strains.34 Lipid A variants with more or
fewer acyl chains, longer chains, or a single phosphate group are typically less active and can
act as antagonists of toxic LPS.34,35
To recognize LPS, TLR4 forms a complex with the accessory protein MD-2. It is well
established that MD-2 forms the LPS-binding pocket in the TLR4/MD-2 complex and confers
LPS specificity.31,36,37 The hypo-acylated and hypo-phosphorylated LPS variants described above
bind human TLR4/MD-2 tightly but non-productively, preventing other LPS molecules from activating
the complex.37–42 However, slight variations in the LPS-binding pockets of mouse MD-2 and
TLR4 permit mouse TLR4 to signal in response to several of these LPS variants.43–46 The
structural basis for this difference is still being explored.
One way several commensal bacteria contribute to a flourishing gut microbiota relies on
their coevolution with human TLR4/MD-2. Many members of our
microbiota produce LPS, but not all of this LPS is immunogenic.47–50 For example, several
members of the order Bacteroidales, which are the dominant Gram-negative bacteria in healthy
human gut microbiomes, produce potently antagonistic tetra- and penta-acylated forms of LPS
that can silence TLR4 signaling for the entire microbial community.47 These findings raise
questions about how symbiotic relationships evolve. How does this symbiosis affect our health?
Did these bacteria adapt to exploit a blind spot in their host’s immune surveillance?51–56 Did
humans lose the ability to recognize these bacteria because doing so was advantageous? Does
this symbiosis affect our ability to fight infections by other Gram-negative bacteria? Could we
use these LPS variants to suppress TLR4-induced sepsis in human patients?57,58
 
Calprotectin mediates TLR4-induced inflammation and fights infections 
PRRs also recognize endogenously produced danger-associated molecular patterns
(DAMPs), including molecules released from dying cells and damaged tissue such as
extracellular DNA, RNA, and proteins.59 S100A8, S100A9, and their heterodimer, known as
‘calprotectin’, are all DAMPs recognized by TLR4.60,61 S100A8 and S100A9 proteins are
multifunctional regulators of the immune response. They can exist as homodimers but
predominantly form the more stable heterodimeric calprotectin complex.62–64 S100A8, S100A9,
and calprotectin have been shown to play several intracellular roles in calcium-dependent
signaling, microtubule reorganization, and arachidonic acid metabolism.65–68 During an immune
response, these proteins are released into the extracellular space, where they have several
proinflammatory and antibacterial roles.61,69–72
S100A8 and S100A9 homodimers and heterodimers released during the immune response
locally activate TLR4 and amplify inflammatory responses.73,74 This amplification is essential
for successful pathogen clearance but can be detrimental in excess. One way this
proinflammatory activity is regulated is through the proteolytic sensitivity of the homodimer and
heterodimer states.75–80 In the presence of calcium, which is expected at a site of inflammation,
two heterodimers form a heterotetramer, (S100A8/S100A9)2. This calcium-induced
tetramerization inhibits proinflammatory activity and makes the complex resistant to
proteases.76,81,82 This regulatory mechanism is essential for preventing excessive inflammatory
amplification that can lead to sepsis, autoimmune disorders, and cancer.74
Calprotectin is also antibacterial in both the heterodimer and heterotetramer states. The
hexahistidine site created at the heterodimer interface can chelate essential transition metal ions
such as zinc, manganese, and iron, and sequestering these metals inhibits bacterial growth during
an infection.83–90 Beyond its roles in immunity, calprotectin is of particular medical interest
because, as a stable complex, it can be used as a non-invasive biomarker of inflammation
severity.91–94
 
Zebrafish is a powerful model organism for studies of the host-microbe interface 
The host-microbe interface is complex and ever evolving. At any given time, each person
has a unique population of microbes whose ecological architecture is shaped by interactions
among the microbes themselves, our genetics, and everything we do: what we eat, whom and
what we encounter, our hygiene, sudden lifestyle changes, and even the buildings in which we
live and work. Because these systems are so complicated, we need model organisms to
facilitate our investigations of host-microbe relationships. Indeed, almost everything we know
about TLR4 and calprotectin comes from studies in mice.
Danio rerio, the zebrafish, is an outstanding model organism for studying vertebrate innate
immunity and host-microbe interactions. For the first few weeks of life, the zebrafish has a fully
functional innate immune system but has not yet developed adaptive immunity, which permits
investigation of the innate immune response alone without genetic modification.95 Larval
zebrafish are optically transparent until at least 5-6 days of age, which facilitates live imaging
of fluorescently tagged proteins and microbes. Also, in the first week of life zebrafish survive on
nutrients from their yolk and do not need to be fed. This makes it relatively simple to generate
gnotobiotic fish (fish with a defined microbiome) by sterilizing the chorion and the water in
which the fish develop, and then introducing only the desired microbes.96 Moreover, the
availability of genetic tools, ease of rearing, mating, and maintenance, and large clutch sizes
make zebrafish ideal for the research setting.97
A major hurdle that all model organism research faces is the confounding variable of
evolution. Not all genes, proteins, or physiological processes have direct counterparts in other
species. This often limits what we can learn about human biology from studies in zebrafish
and other model organisms. The zebrafish immune system shares extensive homology with the
mammalian immune system, including a high degree of conservation in inflammatory proteins,
effector cell types, and receptors such as the Toll-like receptors.98 However, humans and
zebrafish have experienced diverse selective pressures that drove the evolution of immune
defense strategies unique to each species, many of which are actively being explored.
 
Evolutionary differences confound the use of model organisms to study human biology 
The long, independent evolutionary divergence of humans and zebrafish complicates
efforts to compare their biology. The most recent common ancestor of Homo sapiens and Danio
rerio was a bony fish that lived roughly 430 million years ago.99 For context, Homo sapiens and
the mouse Mus musculus diverged from a common placental mammal ancestor roughly 87 million
years ago.99 Given this much deeper divergence, and the distinct evolutionary pressures of life
underwater, we can expect far greater divergence between zebrafish and human proteins than
between mouse and human proteins.
Overall, approximately 70% of human genes have at least one obvious zebrafish
ortholog.100 There are three orthologs of human TLR4 in the zebrafish and a single MD-2
gene.101 Zebrafish also share the S100 family with all other vertebrates but do not possess the
proinflammatory calgranulin genes.102 Zebrafish have a heightened tolerance to LPS challenge
compared to mammalian species, requiring much higher doses to be lethal, but exhibit similar
inflammatory responses, such as immune cell migration and transcriptional changes.103–105 It
has been proposed that this LPS tolerance might be an evolutionary advantage for organisms in
intimate contact with microbes, as in aqueous environments.105 Previous work showed that
although one of the zebrafish TLR4/MD-2 complexes can activate a low-level immune response
to LPS in vitro, fish with an MD-2 loss-of-function mutation do not exhibit the dramatic
protection against LPS toxicity observed in MD-2-deficient mice.101 These findings suggest that
zebrafish have a low-sensitivity TLR4/MD-2 complex that confers LPS responsiveness to a
specific set of immune cells, but that other pathways are likely involved in the zebrafish
immune response to LPS. In Chapter III of this dissertation, I revisit this hypothesis.
 
Protein evolution is a framework for mapping between model organisms and humans 
My work rests on the premise that an explicitly evolutionary lens can help us
understand how to map studies of the innate immune system between zebrafish and
humans. For example, the low immune response of zebrafish to LPS challenge could reflect the
behavior of TLR4/MD-2 in the last common ancestor of humans and zebrafish. If so, results in 
zebrafish provide a baseline from which to understand human innate immunity: a comparison of 
the human and zebrafish immune systems would reveal how humans gained high sensitivity.  
In contrast, if the last common ancestor of humans and zebrafish had a high response to 
LPS, it would indicate that the evolutionary change happened on the zebrafish lineage. In this 
scenario, the low response is not a baseline from which high human activity evolved; rather, low 
activity is an evolutionary innovation specific to the zebrafish. This would mean a comparison of 
human and zebrafish immunity reveals how zebrafish lost sensitivity, not how humans gained 
sensitivity.  
Distinguishing between these scenarios requires understanding the evolutionary history of the
genes encoding innate immune proteins. Key questions include: Which human innate immune
genes have orthologs in zebrafish? If some of these genes are absent, is this due to differential
gain or loss? Do the genes themselves have the same basic functions?
Answering these questions requires a phylogenetic approach, where we trace the 
evolutionary history of individual genes. Here, we use computational tools that test whether 
genes from different organisms are homologous, and whether they are orthologous (arose by 
speciation), paralogous (arose by gene duplication), or took some more complex path of 
speciation, duplication, and loss. This allows us to know whether we are comparing the same 
gene (orthologs) or different genes (paralogs, ohnologs, etc.) between humans and zebrafish. 
Another powerful evolutionary approach involves tracing how the sequences of proteins have
changed over time.106 This can reveal which sequence changes correlate with which functional
changes, allowing us to isolate specific functional transitions and resolve when and how they
happened.
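A deliberately simple rule shows how a tree can separate orthologs from paralogs. It is known as the species-overlap heuristic: if an internal node's two child clades share any species, that node must be a duplication; otherwise it is labeled a speciation. The sketch below assumes a hypothetical nested-tuple tree with `gene|species` tip labels, and its function names are ours. It is not the method used in this dissertation. Model-based reconciliation tools such as GeneRax, which topiary uses (Chapter II), also account for gene loss and topological uncertainty, both of which this rule ignores.

```python
from __future__ import annotations
from typing import Union

# A tree is either a tip label "gene|species" or a tuple of two subtrees.
Tree = Union[str, tuple]


def species_in(tree: Tree) -> set[str]:
    """Return the set of species found at the tips of a (sub)tree."""
    if isinstance(tree, str):
        return {tree.split("|")[1]}
    return set().union(*(species_in(child) for child in tree))


def label_nodes(tree: Tree, labels: list[str] | None = None) -> list[str]:
    """Label each internal node, in post-order, as 'duplication' or 'speciation'.

    Species-overlap rule: if the two child clades share a species, the split
    cannot be a speciation event, so it is called a duplication.
    """
    if labels is None:
        labels = []
    if isinstance(tree, str):
        return labels
    left, right = tree
    label_nodes(left, labels)
    label_nodes(right, labels)
    overlap = species_in(left) & species_in(right)
    labels.append("duplication" if overlap else "speciation")
    return labels


# Hypothetical family: paralogs A and B, each present in human and chicken.
family = (
    ("A1|human", "A2|chicken"),
    ("B1|human", "B2|chicken"),
)
print(label_nodes(family))  # ['speciation', 'speciation', 'duplication']
```

Here the two A genes (and the two B genes) are orthologs, because they are joined by speciation nodes. A and B are paralogs, because they are joined by the root duplication.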
 
Summary of contributions 
Phylogenetic approaches, while powerful, are also technically difficult. They require
multiple software packages working in concert, as well as a curated database of the
sequences used in the study. In Chapter II, I describe work I did with members of the Harms lab
to develop topiary, a software tool for ancestral sequence reconstruction. This tool automates a
wide variety of tasks in evolutionary inference and thus allowed me to study the evolution of
TLR4 and MD-2 in bony vertebrates.
In Chapter III, I present an example in which I use our ancestral reconstruction software
pipeline, along with careful in vitro and in vivo characterization, to better understand the
evolutionary history of the zebrafish TLR4/MD-2 complex. I found that the zebrafish TLR4/MD-2
complex has higher specificity for tetra-acylated LPS variants than for the hexa-acylated
variant typically used to assess TLR4 function. This suggested to me that previous zebrafish
experiments using LPS with six or seven acyl chains may have missed important insights into
the role of TLR4 in the zebrafish immune response. I was also curious to know which structural
differences between human and zebrafish TLR4/MD-2 cause this change in specificity.
Identifying the molecular basis for these functional differences would be quite difficult, though,
because human and zebrafish TLR4 and MD-2 share only 39% and 26% sequence identity,
respectively.
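Percent identity depends on how it is counted. The sketch below is an illustration only, not the procedure behind the 39% and 26% values. It assumes two pre-aligned sequences and one common convention: identical positions divided by the number of columns where neither sequence has a gap. The function name and the toy fragments are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Convention assumed here: identical positions / columns in which neither
    sequence has a gap. Dividing instead by alignment length or by the
    shorter sequence gives different numbers for the same pair.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    identical = sum(a == b for a, b in compared)
    return 100.0 * identical / len(compared)


# Toy aligned fragments (hypothetical, not real TLR4 or MD-2 sequence).
print(percent_identity("MKV-LAS", "MRVQLGS"))  # ~66.7: 4 identical of 6 ungapped columns
```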
With this new information, I investigated the ligand specificity of previously
uncharacterized TLR4/MD-2 complexes from modern species and from ancestors linking zebrafish
and human evolution. I used topiary to infer the sequences of ancestral TLR4 and MD-2 proteins
and resurrected them in the lab for functional characterization. I also investigated the functional
role of a C-terminal peptide unique to a subset of teleost fish MD-2s that appears to be positioned
to influence LPS binding and TLR4 activation. Importantly, I tested the hypothesis that zebrafish
would exhibit a stronger immune response in vivo when challenged with tetra-acylated LPS than
when challenged with hexa-acylated LPS. Overall, my results confirm that TLR4 complexes from
zebrafish and their close relatives have low sensitivity to LPS in our in vitro system and that the
zebrafish in vivo response to LPS is not directly comparable to human biology.
In Chapter IV, I describe a study that hinges on identifying specific innate immune
proteins in zebrafish. I investigated a suggestion in the literature that zebrafish might have
convergently evolved a calprotectin-like protein.107,108 S100A9 and S100A8 do not have
orthologs in zebrafish, so I used insights from phylogenetic studies and transcriptional response
datasets to identify homologous candidates. I found that the zebrafish protein classified as
“calprotectin”, and all other candidate proteins tested, do not exhibit the canonical antibacterial
or proinflammatory functions of calprotectin. I conclude from this study that when developing
zebrafish models of innate immunity, it is necessary and prudent to account for evolutionary
divergence and to functionally characterize the proteins considered.
 
Conclusion 
In conclusion, my work has revealed that the genes and mechanisms responsible for
innate immune recognition of and response to pathogenic bacteria have evolved independently
in different vertebrate lineages and have diverged significantly between humans and zebrafish.
These results challenge the assumption that innate immune recognition universally relies on
specific germline-encoded receptors, but they support the idea that the response to microbial
products such as LPS is an ancestral trait.109 Although the innate immune roles of TLR4 and
calprotectin cannot be directly mapped between humans and zebrafish, we can still learn much
about the evolution of protein complexes and innate immunity through further studies of how
the zebrafish defends itself from infection and injury.
 
Bridge to Chapter II 
Many studies of protein evolution use ancestral sequence reconstruction to infer the
sequences and structures of proteins from ancestral organisms and compare their functions to
those of present-day proteins. This technique is time-consuming and requires stitching together
many complex bioinformatic tools, which demands expert knowledge of coding, phylogenetics,
evolutionary models, and the tools available for a specific application. Previous graduate
students in the Harms lab, including Dr. Zach Sailer, Dr. Joseph Harman, and Dr. Andrea Loes,
had worked with Dr. Harms to develop an ancestral sequence reconstruction pipeline to study the
TLR4 complex. When Sophia Phillips and I began our own investigations into the evolution of
S100A9 and TLR4, we set out to develop this pipeline further into a widely available and
accessible tool. With Dr. Michael Harms as the software development and project administration
lead, we created topiary: a publicly available ancestral sequence reconstruction pipeline that
integrates several well-established software packages with time-saving scripts, explanations of
what the software is doing at each step, and guidance on how to tailor and interpret a study.
CHAPTER II 
TOPIARY: PRUNING THE MANUAL LABOR FROM ANCESTRAL SEQUENCE 
RECONSTRUCTION  
 
*This chapter contains previously published co-authored material. See supplement for copyright 
terms and conditions. 
 
Orlandi KN, Phillips SR, Sailer ZR, Harman JL, Harms MJ. (2022). Topiary: Pruning the manual 
labor from ancestral sequence reconstruction. Protein Science. 32:e4551. 
 
Author contributions: K. N. O.: Conceptualization (equal); data curation (equal); methodology 
(equal); software (equal); validation (equal); visualization (equal); writing – original draft (lead); 
writing – review and editing (lead). S. R. P.: Conceptualization (equal); data curation (equal); 
investigation (equal); methodology (equal); software (equal); validation (equal); visualization 
(equal); writing – original draft (lead); writing – review and editing (lead). J. L. H.: 
Conceptualization (equal); methodology (equal); software (supporting); validation (equal); 
writing – review and editing (equal). Z. R. S.: Conceptualization (equal); methodology (equal); 
software (equal); validation (equal); writing – review and editing (supporting). M. J. H.: 
Conceptualization (equal); funding acquisition (lead); investigation (equal); methodology 
(equal); project administration (lead); software (lead); visualization (equal); writing – original 
draft (equal); writing – review and editing (equal). 
  
ABSTRACT 
Ancestral sequence reconstruction (ASR) is a powerful tool to study the evolution of 
proteins and thus gain deep insight into the relationships among protein sequence, structure, and 
function. A major barrier to its broad use is the complexity of the task: it requires multiple 
software packages, complex file manipulations, and expert phylogenetic knowledge. Here we 
introduce topiary, a software pipeline that aims to overcome this barrier. To use topiary, users 
prepare a spreadsheet with a handful of sequences. Topiary then: (1) Infers the taxonomic scope 
for the ASR study and finds relevant sequences by BLAST; (2) Does taxonomically informed 
sequence quality control and redundancy reduction; (3) Constructs a multiple sequence 
alignment; (4) Generates a maximum-likelihood gene tree; (5) Reconciles the gene tree to the 
species tree; (6) Reconstructs ancestral amino acid sequences; and (7) Determines branch 
supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and 
graphical summaries of ancestor quality. This is achieved by integrating modern phylogenetics 
software (Muscle5, RAxML-NG, GeneRax, and PastML) with online databases (NCBI and the 
Open Tree of Life). In this paper, we introduce non-expert readers to the steps required for ASR, 
describe the specific design choices made in topiary, provide a detailed protocol for users, and 
then validate the pipeline using datasets from a broad collection of protein families. Topiary is 
freely available for download: https://github.com/harmslab/topiary. 
 
 
 
 
 
INTRODUCTION 
Since it was first proposed in 1963, ancestral sequence reconstruction (ASR) has become
a well-established method to study the evolutionary history of modern-day proteins.110,111 Studies
of ancestral proteins can reveal sequence features important for function and stability that
cannot readily be identified from studies of modern-day proteins alone.106 For example, ASR has
been used for crystallographic and kinetic studies of ancestral proteins when their modern-day
descendants were not amenable to crystallization112, for engineering enzymes that are both
thermally stable and catalytically active using ancestral enzymes as templates113, and in the
discovery of an ancestral coagulation factor VIII protein that is now used as a therapeutic for
people with hemophilia114. These and many other studies76,111,114–118 have established this
technique as an incredibly powerful tool in the protein scientist's toolkit.
Despite its utility, ASR has largely remained a technique for phylogenetics experts. In 
part, this is due to the complexity of the task. The individual steps of an ASR study—dataset 
construction, multiple sequence alignment, inference of a phylogenetic tree, and ancestor 
reconstruction—are usually done using separate software. This means a would-be ASR user must 
learn and intelligently select the most useful combination of software from a large pool.111 The 
problem is made worse because some often-used software is no longer maintained: for example, 
PAML4 was last updated in 2007.119 It can also be extraordinarily difficult to organize and 
convert the outputs from one program into inputs for the next. At best, this is an unproductive 
use of time; at worst, this can lead to information loss or even errors in the final reconstructed 
sequences.  
Here we introduce topiary, an ASR software pipeline that addresses these problems. Our
first goal was to streamline the tasks necessary for an ASR study by codifying existing
best-practice ASR into one convenient package. We hope achieving this goal will make ASR
accessible to non-experts. We further hope this will improve reconstruction quality generally by
removing the tedious manual file manipulations that can lead to mistakes. Our second goal is to
promote and enable high-quality reconstructions. To do so, we
built our pipeline around modern software tools and incorporated important-but-sometimes-
difficult steps directly into the pipeline: validation of protein identity by reciprocal BLAST, 
gene-species tree reconciliation, and explicit ancestral character reconstruction of gaps.  
There are two design features that set topiary apart from many other methods. The first is 
the use of spreadsheets rather than arcane text formats for inputs and to store the sequence 
database/alignment through all steps. This makes it much simpler to prepare inputs and track 
changes over the course of the pipeline. The second design feature is that topiary is species-
aware through all steps. From the first step onward, it uses the Open Tree of Life synthetic 
species tree to inform every choice120: how to focus initial BLAST queries, how to lower 
sequence redundancy while preserving taxonomic diversity, and how to construct the best 
possible evolutionary tree consistent with both the protein and organismal evolutionary signals. 
This integration greatly simplifies the user experience and ultimately yields rooted, well-resolved 
phylogenetic trees for ancestral reconstruction.  
We have broken our description of the software package into four sections. In the first 
section, we go through the process of ASR in general, describing the state-of-the-art for such a 
calculation. Our goal is to familiarize non-specialist readers with the workflow so they can 
understand what topiary does (and why), as well as interpret the output from a topiary 
calculation. In the second section, we describe the specific pipeline and design decisions within 
the topiary package. This section focuses on the automated, software-driven steps in the pipeline. 
In the third section, we briefly describe the protocol for running topiary in practical terms for the 
user, working through an example calculation. Finally, in the fourth section, we describe the 
work done to validate the pipeline.  
 
OVERVIEW OF ANCESTRAL SEQUENCE RECONSTRUCTION  
Define the problem  
The most important task in an ASR study is to define the problem. What ancestors do you 
want to reconstruct? What feature(s) of those proteins will you measure? For an evolutionary 
biochemist or protein engineer, ASR studies often involve tracing the evolution of functions 
observed in modern proteins. Figure 2.1 shows this schematically for a hypothetical protein 
family. Paralog A has some activity (denoted with a star); paralog B does not. (As a reminder, 
paralogs are homologs that arose by gene duplication; orthologs are homologs that arose by 
speciation.) If we are interested in the evolution of the star activity, we would likely be interested
in reconstructing ancAB and ancA (arrows, Figure 2.1). Because all A paralogs have the activity,
we predict ancA did as well. But because only A paralogs have the activity (not paralog B or the
fish proteins), we predict ancAB was not active. By reconstructing ancA and ancAB, we can
isolate and study the key sequence differences between the ancestors that conferred the
activity.
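This reasoning ("all A paralogs are active, so ancA probably was; B and the fish proteins are not, so ancAB probably was not") can be made explicit with Fitch parsimony, which finds the fewest state changes that explain the tips. Parsimony is used here only as an illustration. It is not how topiary or maximum-likelihood ASR infers ancestors. The tree, node names, and 0/1 activity coding are hypothetical stand-ins for Figure 2.1.

```python
def fitch_up(node, sets):
    """Bottom-up Fitch pass: store candidate state sets; return number of changes.

    A tip is (name, observed_state); an internal node is (name, left, right).
    """
    name, *rest = node
    if len(rest) == 1:  # tip
        sets[name] = {rest[0]}
        return 0
    left, right = rest
    cost = fitch_up(left, sets) + fitch_up(right, sets)
    a, b = sets[left[0]], sets[right[0]]
    if a & b:
        sets[name] = a & b
    else:  # children disagree: one change is required somewhere below
        sets[name] = a | b
        cost += 1
    return cost


def fitch_down(node, sets, parent_state, states):
    """Top-down pass: keep the parent's state when allowed, else pick min(options)."""
    name, *rest = node
    options = sets[name]
    state = parent_state if parent_state in options else min(options)
    states[name] = state
    if len(rest) == 2:
        for child in rest:
            fitch_down(child, sets, state, states)
    return states


# 1 = active (star), 0 = inactive (dash); hypothetical version of Figure 2.1.
tree = ("root",
        ("ancAB",
         ("ancA", ("ancA_amniote", ("A_human", 1), ("A_chicken", 1)), ("A_frog", 1)),
         ("ancB", ("ancB_amniote", ("B_human", 0), ("B_chicken", 0)), ("B_frog", 0))),
        ("fish_protein", 0))

sets = {}
changes = fitch_up(tree, sets)
states = fitch_down(tree, sets, None, {})
print(changes, states["ancAB"], states["ancA"])  # 1 0 1
```

A single gain of activity on the branch from ancAB to ancA explains the data, which matches the scenario in the text.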
The first step in an ASR study is to build up a picture of the functions of modern proteins
in the family through pilot studies and literature searches. Specifically, one must know (1) the
biochemical or functional features of interest and (2) which homologs exist in which organisms.
In our example, identifying ancA and ancAB as the ancestors of interest required knowing the
distribution of function across modern proteins. If we knew only the function of human paralog
A, but no other proteins in the family, we would be hard-pressed to choose the appropriate scope
for the ASR study. Likewise, if we knew that paralog A but not paralog B existed, we would not
predict the ancAB to ancA transition. The topiary package uses a list of modern proteins
covering the relevant paralogs and species (hereafter, the “seed dataset”) as the starting point for
the ASR pipeline.
 
Figure 2.1. Define the ancestral reconstruction problem. The panel shows the evolutionary 
history of a hypothetical protein family with two paralogs, A and B. The tree is rooted: ancestors 
are arranged from ancient to recent, left to right. Black circles at the tips of the tree denote 
modern protein sequences from the indicated species. Colored internal nodes indicate gene 
duplications (purple) or speciations (green). An ASR study aims to reconstruct the sequences of 
these ancestral nodes. The node annotated with a blue “x” is not reconstructable (see text). A 
biological activity of interest is indicated on the tips: active (star), inactive (black dash). The 
simplest evolutionary scenario would have activity evolving between ancAB and ancA; these 
would be good candidates for reconstruction. 
Note: the ancestral protein of interest cannot be the root of the phylogenetic tree. To
reconstruct an ancestor, one needs input from three branches: the two descendant branches and
the branch leading back to the previous ancestor. The ancFamily ancestor at the root of the tree
in Figure 2.1 has no sequence information from an ancestral branch (dashed line); thus, we
cannot reconstruct ancFamily. In contrast, ancAB can be reconstructed because it forms a
node at the intersection of three branches: a descendant branch leading to ancA, a descendant
branch leading to ancB, and an ancestral branch leading back to the fish proteins (known as the
outgroup). This sets a limit on our deepest reconstructable ancestor: our dataset must include an
outgroup that diverged one node earlier than our deepest ancestor of interest.
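The three-branch rule can be checked mechanically. In a rooted, bifurcating tree, every internal node has two descendant branches and, unless it is the root, one ancestral branch. The sketch below uses a hypothetical tuple encoding of Figure 2.1 in which a node is `(name, *children)` and a tip has no children. It lists the nodes that meet the rule and shows that removing the outgroup makes ancAB unreconstructable.

```python
def reconstructable(tree, is_root=True):
    """Return names of internal nodes that sit at the junction of three branches.

    Each internal node has one branch per child; it also has an ancestral
    branch unless it is the root of the tree.
    """
    name, *children = tree
    if not children:  # tip: a modern sequence, nothing to reconstruct
        return []
    n_branches = len(children) + (0 if is_root else 1)
    found = [name] if n_branches >= 3 else []
    for child in children:
        found += reconstructable(child, is_root=False)
    return found


# Hypothetical simplification of Figure 2.1, with fish proteins as the outgroup.
figure_tree = ("ancFamily",
               ("ancAB",
                ("ancA", ("A_human",), ("A_frog",)),
                ("ancB", ("B_human",), ("B_frog",))),
               ("fish",))

# Without the outgroup, ancAB becomes the root.
no_outgroup = ("ancAB",
               ("ancA", ("A_human",), ("A_frog",)),
               ("ancB", ("B_human",), ("B_frog",)))

print(reconstructable(figure_tree))  # ['ancAB', 'ancA', 'ancB']
print(reconstructable(no_outgroup))  # ['ancA', 'ancB']
```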
 
Construct a sequence dataset  
Once we have identified the ancestors we would like to reconstruct (Figure 2.1), we 
begin the steps of the ASR pipeline (Figure 2.2). The first step is to create a dataset of high-
quality sequences spanning the relevant species and protein family members. Continuing our 
example, we start with a handful of sequences that cover bony vertebrates (humans through fish) 
and the two paralogs (A and B) (Figure 2.2a). We then collect as many sequences from as many 
species as possible, usually by BLASTing against online databases using our starting sequences 
as queries (Figure 2.2b).  
Our confidence in our reconstructed ancestral sequences depends on the quality and 
diversity of the sequences in the alignment.121 Because of this, we perform quality control on the 
resulting sequence dataset. We want to avoid low-quality or partial sequences, keep only one 
sequence per gene per species, and maintain an even representation of proteins across species. In 
our example in Figure 2.1, the branches leading from ancAB are amniotes
(mammals/birds/reptiles), amphibians, and ray-finned fishes. To maximize reconstruction
quality, we should ensure good representation of protein sequences from each of these groups
in our dataset.
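The quality-control goals above can be written as explicit rules: drop partial sequences and keep one sequence per gene per species. The sketch below makes assumptions that are ours, not topiary's exact criteria. A sequence counts as partial if it is shorter than a fraction of the median length of its paralog group, sequences containing the unknown residue `X` are discarded, and the longest survivor is kept for each (species, paralog) pair. The `Hit` record and its fields are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass
from statistics import median


@dataclass
class Hit:
    """One BLAST hit (hypothetical fields for illustration)."""
    accession: str
    species: str
    paralog: str  # e.g. "A" or "B", as assigned by reciprocal BLAST
    sequence: str


def quality_filter(hits: list[Hit], min_frac: float = 0.8) -> list[Hit]:
    """Drop partial or ambiguous sequences, then keep one per gene per species."""
    lengths: dict[str, list[int]] = {}
    for h in hits:
        lengths.setdefault(h.paralog, []).append(len(h.sequence))
    typical = {p: median(ls) for p, ls in lengths.items()}

    best: dict[tuple[str, str], Hit] = {}
    for h in hits:
        # Assumed rules: unknown residues or much-shorter-than-typical = low quality.
        if "X" in h.sequence or len(h.sequence) < min_frac * typical[h.paralog]:
            continue
        key = (h.species, h.paralog)
        # Redundancy reduction: keep the longest sequence per (species, paralog).
        if key not in best or len(h.sequence) > len(best[key].sequence):
            best[key] = h
    return list(best.values())


hits = [
    Hit("a1", "Homo sapiens", "A", "M" * 100),
    Hit("a2", "Homo sapiens", "A", "M" * 95),          # redundant, shorter isoform
    Hit("a3", "Gallus gallus", "A", "M" * 40),         # partial sequence
    Hit("b1", "Gallus gallus", "B", "M" * 102),
    Hit("b2", "Xenopus laevis", "B", "MX" + "M" * 98),  # ambiguous residue
]
kept = quality_filter(hits)
print([h.accession for h in kept])  # ['a1', 'b1']
```

A real pipeline would also weigh taxonomic diversity when discarding sequences, so that amniotes, amphibians, and ray-finned fishes all remain represented.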
 
Sequence alignment  
The next step is to build a multiple sequence alignment (MSA) (Figure 2.2c). Alignment 
quality is critical for a successful reconstruction study.121 This is because an MSA makes 
homology statements, asserting that sites within each column arose by evolutionary descent. 
Incorrect homology statements will lead to poor reconstructions. We use alignment software to 
generate an MSA, followed by more quality control. Alignment quality is usually
assessed with computational tools122,123 and/or by manual evaluation and editing124,125. Generally,
we remove difficult-to-align termini, poorly aligned sequences, or whole regions of an alignment 
that may not be of interest for an ASR study (for example, a disordered and evolutionarily 
divergent linker region).  
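One simple form of the column-removal step can be sketched as follows: discard alignment columns in which too many sequences have a gap. The 50% threshold, the dictionary alignment format, and the function name are assumptions for illustration. Dedicated tools (for example, trimAl) implement this and more sophisticated criteria, and this is not necessarily the check topiary performs.

```python
from __future__ import annotations


def trim_gappy_columns(alignment: dict[str, str], max_gap_frac: float = 0.5) -> dict[str, str]:
    """Remove columns in which more than max_gap_frac of the sequences have a gap."""
    seqs = list(alignment.values())
    n_cols = len(seqs[0])
    if any(len(s) != n_cols for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(n_cols)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_frac
    ]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


aln = {
    "human_A":   "MK--LVS",
    "frog_A":    "MKQ-LVS",
    "fish_B":    "-KQTLIS",
    "chicken_B": "MK--LIS",
}
trimmed = trim_gappy_columns(aln)
print(trimmed)  # column 4 (gap in 3 of 4 sequences) is removed
```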
 
Infer a maximum likelihood gene tree  
The next step is to construct a phylogenetic tree describing the evolutionary relationships 
between the sequences in our alignment (Figure 2.2d, tree on the right). Most ASR studies do 
this using probabilistic models of sequence evolution. These are built around substitution 
matrices that describe the probability of specific amino acid changes over evolutionary time. (For 
example, aspartic acid to glutamic acid will have a much higher probability than aspartic acid to 
phenylalanine.) Most models combine parameters that are fixed in the model definition with
parameters estimated from the input alignment. Selecting the correct substitution matrix is critical
to high-quality ancestral reconstruction.126
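The link between a substitution matrix and substitution probabilities can be shown with a toy model. A time-reversible rate matrix Q is built from exchangeabilities and equilibrium frequencies, and the probability of each change over a branch of length t is P(t) = exp(Qt). The sketch below assumes a hypothetical three-letter alphabet (D, E, F) with made-up exchangeabilities that favor the conservative D to E change. Real matrices such as LG or WAG use 20 states and empirically estimated values. The matrix exponential is computed by a truncated Taylor series, which is adequate only for small Qt.

```python
from __future__ import annotations

STATES = ["D", "E", "F"]  # Asp, Glu, Phe: a toy three-letter alphabet

# Hypothetical exchangeabilities: D<->E (chemically similar) is much faster
# than either change to or from F. Equal equilibrium frequencies are assumed.
EXCHANGE = {("D", "E"): 2.0, ("D", "F"): 0.1, ("E", "F"): 0.1}
FREQS = {"D": 1 / 3, "E": 1 / 3, "F": 1 / 3}


def rate_matrix() -> list[list[float]]:
    """Time-reversible rate matrix: Q[i][j] = s_ij * pi_j; each row sums to zero."""
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(STATES):
        for j, b in enumerate(STATES):
            if i != j:
                q[i][j] = EXCHANGE.get((a, b), EXCHANGE.get((b, a))) * FREQS[b]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums off-diagonals
    return q


def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transition_probs(t: float, terms: int = 40) -> list[list[float]]:
    """P(t) = exp(Q t) via a truncated Taylor series: sum_k (Q t)^k / k!."""
    q = rate_matrix()
    n = len(q)
    qt = [[v * t for v in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, qt)]
        result = [[r + v for r, v in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


p = transition_probs(0.5)
d, e, f = (STATES.index(s) for s in "DEF")
print(f"P(D->E) = {p[d][e]:.3f}, P(D->F) = {p[d][f]:.3f}")  # P(D->E) = 0.239, P(D->F) = 0.016
```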
 
Figure 2.2. Ancestral sequence reconstruction has six main steps. (a) Start with a handful of 
homologous protein sequences spanning the paralogs of interest and their taxonomic distribution. 
Throughout the figure, color indicates the identity of the protein (orange: paralog A, blue: 
paralog B, and green: outgroup); the icon indicates the species (human, chicken, frog, fish). (b) 
Use these sequences as BLAST queries to construct an initial sequence dataset. Some returned 
sequences are not homologs of interest (purple); others are low quality (e.g., a partial sequence
indicated by ‘x’). (c) Select high-quality sequences and generate a multiple sequence alignment
from that dataset. (d) Infer a maximum likelihood gene tree ‘G’ for the protein sequences in the
alignment. This infers branching relationships but does not orient the tree with respect to time.
Poorly reconstructed protein relationships may exist (clade in gray box). (e) Reconcile the gene
tree with the species tree ‘S’, yielding a reconciled gene tree ‘R’. This corrects weakly supported
protein relationships and roots the tree in time. (f) Reconstruct the sequences of ancestral
proteins of interest using the reconciled tree. Sequences are selected by posterior probability 
(PP). Sequence logo depicts ancestor “ancA” with letter height proportional to amino acid PP. 
Position 5 is unambiguously “S”; position 6 is likely “L” but could be “M”; position 7 could be 
“E”, “R”, or “K”. Examples of maximum likelihood ancestral sequences are shown in brown for 
the specified nodes. (g) Assess confidence in tree topology. Branch supports for two different 
trees indicate strong support for the top tree (98) and weak support for the bottom tree (2). 
Most ASR studies use a maximum likelihood (ML) modeling framework. The goal is to 
find the substitution model and evolutionary tree that give the highest probability of observing 
the sequences in the alignment. The maximization process involves selecting a substitution 
matrix, tuning quantitative model features, inferring the tree topology (i.e., the pattern of 
branching events that gave rise to the modern sequences), and optimizing the branch lengths 
(how much evolutionary change occurs between each branching event). This is a complex, 
many-parameter optimization problem. For more details, and for discussions of alternative 
approaches including Bayesian methods, see refs. 111, 118, and 127. 
After this step, one has an ML gene tree with a branching pattern describing the 
evolutionary relationships between all sequences in the alignment (Figure 2.2d, tree G). The 
inferred tree reveals which sequences group together, but not the order in which these groupings 
evolved. In technical terms, the tree is unrooted. This is because most probabilistic evolutionary 
models are time reversible: the likelihood of the tree is the same no matter where the root (the 
starting point of the evolutionary process) is placed. In practical terms, this means we cannot 
determine which ancestors were the most ancient without outside information. 
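The root-invariance property can be checked numerically. The sketch below is a toy illustration and is not part of topiary. It builds a 20-state reversible model from random symmetric exchangeabilities and random stationary frequencies, which is an assumption and not LG or any other published matrix. It then places two sequences a fixed total branch length apart and slides the root along that branch. Because the model satisfies detailed balance, the likelihood does not change. All names and values are hypothetical.

```python
import numpy as np

def reversible_rate_matrix(exchangeabilities, freqs):
    """Build Q with Q[i, j] = s[i, j] * pi[j] off the diagonal and rows summing to 0."""
    Q = exchangeabilities * freqs[np.newaxis, :]
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

def transition_matrix(Q, freqs, t):
    """P(t) = expm(Q t), computed via the symmetric matrix D^(1/2) Q D^(-1/2), D = diag(pi)."""
    d = np.sqrt(freqs)
    S = d[:, None] * Q / d[None, :]          # symmetric because Q is reversible
    evals, evecs = np.linalg.eigh(S)
    exp_S = (evecs * np.exp(evals * t)) @ evecs.T
    return exp_S / d[:, None] * d[None, :]

# Toy model: random symmetric exchangeabilities and random stationary frequencies.
rng = np.random.default_rng(0)
s = rng.random((20, 20))
s = (s + s.T) / 2
pi = rng.random(20)
pi /= pi.sum()
Q = reversible_rate_matrix(s, pi)

def two_tip_likelihood(a, b, total_length, root_fraction):
    """Likelihood of tip states a and b with the root root_fraction of the way from a."""
    P_to_a = transition_matrix(Q, pi, root_fraction * total_length)
    P_to_b = transition_matrix(Q, pi, (1 - root_fraction) * total_length)
    return float(np.sum(pi * P_to_a[:, a] * P_to_b[:, b]))

likelihoods = [two_tip_likelihood(3, 7, 0.8, f) for f in (0.1, 0.5, 0.9)]
print(likelihoods)  # three values that agree to floating-point precision
```

All three root placements give the same value, $\pi_a P_{ab}(t)$. This is why the ML gene tree by itself cannot reveal where the root lies.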
 
Reconcile the gene tree to the species tree  
We now reconcile the inferred gene tree with the species tree to obtain our gene-species 
reconciled tree (R in Figure 2.2e). In this process, we identify nodes in the gene tree that 
correspond to speciation versus gene duplication events (green and purple nodes on tree R, 
respectively). Note that reconciliation is not always possible or desirable; we return to this point 
in the next section. 
This reconciliation process has two important outcomes. First, it roots the gene tree, 
allowing us to order the occurrence of ancestors in time. This is because the species tree is 
rooted; we know which ancestors occurred at what times based on outside information, such as 
the fossil record. By identifying speciation events in the gene tree, we learn the temporal order of 
ancestors in the gene tree.  
Second, reconciliation resolves ambiguous relationships within the gene tree. This is 
shown in the gray boxes in Figure 2.2d,e. The initial gene tree placed human and frog proteins 
together to the exclusion of the chicken protein. This does not match known species 
relationships. One might explain this through a complicated set of gene duplications and losses: 
maybe, after an early duplication, humans and frogs independently lost one copy of the gene and 
chickens lost the other. A far simpler explanation is that the gene tree incorrectly placed humans 
and frogs together. Reconciliation software uses a variety of strategies to determine whether to 
add evolutionary events or rearrange the tree topology.128 
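The sketch below makes the "simpler explanation" argument concrete. It counts the duplications and losses that a gene tree implies under classic parsimony (LCA-mapping) reconciliation. This is a deliberately simple stand-in: GeneRax, which topiary uses, scores reconciliations with a likelihood model rather than by counting events. For brevity, trees are written as nested Python tuples whose leaves are labeled by species name. All function names are hypothetical.

```python
def leaves(tree):
    """Set of leaf labels below a node; leaves are strings, internal nodes are tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clade_depths(species_tree, depth=0, depths=None):
    """Map every species-tree clade (as a leaf set) to its depth below the root."""
    if depths is None:
        depths = {}
    depths[leaves(species_tree)] = depth
    if not isinstance(species_tree, str):
        for child in species_tree:
            clade_depths(child, depth + 1, depths)
    return depths

def lca(depths, taxa):
    """Deepest (smallest) species clade containing all of the given taxa."""
    return max((clade for clade in depths if taxa <= clade), key=depths.get)

def count_events(gene_tree, depths):
    """Return (duplications, losses) implied by a gene tree under LCA mapping."""
    if isinstance(gene_tree, str):
        return 0, 0
    here = lca(depths, leaves(gene_tree))
    child_maps = [lca(depths, leaves(child)) for child in gene_tree]
    # A node is a duplication if it maps to the same species clade as a child.
    is_duplication = here in child_maps
    dups, losses = int(is_duplication), 0
    for child, child_map in zip(gene_tree, child_maps):
        # Species-tree levels skipped along this edge imply lost gene copies.
        skipped = depths[child_map] - depths[here]
        losses += skipped if is_duplication else skipped - 1
        child_dups, child_losses = count_events(child, depths)
        dups += child_dups
        losses += child_losses
    return dups, losses

species_tree = (("human", "chicken"), "frog")
depths = clade_depths(species_tree)
gene_tree_G = (("human", "frog"), "chicken")   # discordant grouping (gray box, Figure 2.2d)
gene_tree_R = (("human", "chicken"), "frog")   # rearranged to match the species tree
print(count_events(gene_tree_G, depths))  # (1, 3)
print(count_events(gene_tree_R, depths))  # (0, 0)
```

The discordant tree implies one duplication and three losses. The rearranged tree implies no events, which is why it is the more plausible history.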
For an ASR study, the key takeaway is that gene-species tree reconciliation yields a 
rooted gene tree that incorporates additional species-level information. This leads to higher 
quality reconstructed ancestral sequences and allows us to order those ancestors in time.129 
 
Reconciliation: The special case of microbial genes  
Although reconciliation should, in principle, yield a more accurate picture of the 
evolutionary history of a protein, in practice, reconciliation is not always possible. Problems are 
particularly likely for microbial genes. This is because we have relatively low confidence in the 
microbial species tree. (Indeed, some question the existence of a single microbial species tree, or 
even the concept of a microbial species.130) As a result, ASR studies of microbial proteins have 
generally relied on unreconciled gene trees.131 Reflecting this reality, topiary does not reconcile 
the gene and species trees for datasets consisting of purely microbial genes. Instead, topiary roots 
the resulting tree using the midpoint approximation method.132 Because reconciliation is not 
performed, topiary does not label nodes with evolutionary events such as duplication or 
speciation. For the rest of this walkthrough, we describe the approach assuming reconciliation 
is performed, as this is a more complex version of the pipeline than the simplified 
microbial workflow. 
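Midpoint rooting places the root halfway along the longest tip-to-tip path in the tree. The following is a minimal sketch of that idea, not topiary's implementation. It assumes an unrooted tree given as hypothetical (node, node, branch length) edges and uses a brute-force search over all tip pairs.

```python
import itertools

def build_adjacency(edges):
    """Undirected, weighted adjacency map from (node, node, branch_length) triples."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj

def tree_path(adj, start, end):
    """Nodes on the unique path between two nodes of a tree (depth-first search)."""
    stack, seen = [(start, [start])], {start}
    while stack:
        node, trail = stack.pop()
        if node == end:
            return trail
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append((neighbor, trail + [neighbor]))
    raise ValueError(f"{start} and {end} are not connected")

def path_length(adj, nodes):
    return sum(adj[a][b] for a, b in zip(nodes, nodes[1:]))

def midpoint_root(adj, tips):
    """Return (edge, distance from edge[0]) at which the midpoint root falls."""
    longest = max((tree_path(adj, a, b) for a, b in itertools.combinations(tips, 2)),
                  key=lambda nodes: path_length(adj, nodes))
    half = path_length(adj, longest) / 2
    walked = 0.0
    for u, v in zip(longest, longest[1:]):
        if walked + adj[u][v] >= half:
            return (u, v), half - walked
        walked += adj[u][v]
    raise RuntimeError("midpoint not found on the longest path")

# Hypothetical unrooted tree: tips A-D, internal nodes x and y.
edges = [("A", "x", 1.0), ("B", "x", 2.0), ("x", "y", 1.0),
         ("C", "y", 2.0), ("D", "y", 6.0)]
adj = build_adjacency(edges)
edge, offset = midpoint_root(adj, ["A", "B", "C", "D"])
print(edge, offset)  # ('y', 'D') 1.5
```

The longest path here runs from B to D (length 9), so the root falls 4.5 units from B, which is 1.5 units along the y–D branch.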
 
Reconstruct ancestors  
We can now reconstruct ancestral sequences (Figure 2.2f). We traverse the reconciled 
tree and estimate the sequences of every ancestor.133 For each ancestor, we consider sites 
individually. We calculate the likelihood of all 20 amino acids at that site given the ML 
parameters of the probabilistic model and the amino acids observed at that position in the 
alignment. From these, we determine the posterior probability (PP) for each amino acid: the 
likelihood of a given amino acid relative to the summed likelihoods of all 20 amino acids, 
$\mathrm{PP}_i = L_i / \sum_{j=1}^{20} L_j$, where $L_i$ is the likelihood of amino acid $i$. 
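Computing this ratio directly can fail in practice, because per-site likelihoods are often far too small to store as floating-point numbers. A common remedy, sketched below, is to work with log-likelihoods and subtract the largest one before exponentiating (the log-sum-exp trick). This leaves the ratio unchanged. The function name and the three log-likelihood values are hypothetical, and only three of the 20 amino acids are shown.

```python
import math

def posterior_probabilities(log_likelihoods):
    """PP_i = L_i / sum_j L_j, computed from log-likelihoods without underflow."""
    biggest = max(log_likelihoods.values())
    # exp(-1200) underflows to 0.0; exp(ll - biggest) stays representable.
    scaled = {aa: math.exp(ll - biggest) for aa, ll in log_likelihoods.items()}
    total = sum(scaled.values())
    return {aa: value / total for aa, value in scaled.items()}

# Hypothetical log-likelihoods at one site.
site = {"L": -1200.0, "M": -1201.0, "I": -1204.0}
pp = posterior_probabilities(site)
print({aa: round(p, 3) for aa, p in pp.items()})  # {'L': 0.721, 'M': 0.265, 'I': 0.013}
```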
We use these posterior probabilities to construct ML ancestors. For each site, we select 
the amino acid with the highest posterior probability. For example, at ancA site 5 in Figure 2.2f, 
we select “S” because it has a PP close to 1.0. This is an unambiguous reconstruction. Not all 
sites are this clear cut. At site 6, two amino acids are possible; we select the amino acid with the 
higher probability of the two (“L” over “M”). At site 7, there are multiple possibilities; however, 
we would still select the amino acid with the highest PP. For ancA, the sequence that maximizes 
the posterior probability at these positions is “SLE”. (Note: gaps are usually treated separately 
and reconstructed using maximum parsimony; see Section 3 for details).  
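Selecting the ML ancestor is then a per-site arg max over the posterior probabilities. The sketch below uses hypothetical PP values chosen to mimic sites 5 to 7 of ancA in Figure 2.2f. They are not taken from any real reconstruction.

```python
# Hypothetical posterior probabilities for ancA sites 5, 6 and 7 (other amino acids omitted).
ancA_sites = {
    5: {"S": 0.98, "T": 0.02},
    6: {"L": 0.60, "M": 0.35, "I": 0.05},
    7: {"E": 0.40, "R": 0.30, "K": 0.25, "Q": 0.05},
}

def ml_ancestor(sites):
    """Pick the highest-posterior amino acid at every site, in site order."""
    picks = [(pos, max(pps, key=pps.get)) for pos, pps in sorted(sites.items())]
    sequence = "".join(aa for _, aa in picks)
    support = {pos: sites[pos][aa] for pos, aa in picks}
    return sequence, support

sequence, support = ml_ancestor(ancA_sites)
print(sequence, support)  # SLE {5: 0.98, 6: 0.6, 7: 0.4}
```

Keeping the PP of each chosen residue alongside the sequence makes ambiguous sites, such as 6 and 7 here, easy to spot.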
 
Evaluate results  
Before synthesizing and characterizing ancestral proteins, we evaluate their quality. We 
look at two metrics. The first is the average posterior probability for the ML amino acid at all 
positions in the ancestor. A perfectly reconstructed ancestor would have an average PP of 1.0, 
meaning the model has complete confidence in the sequence at all sites. At the other extreme, a 
completely ambiguous ancestor would have an average PP of 1/20 (0.05), meaning all 20 amino 
acids are equally probable at every site. Generally, ancestors in published studies have an average PP > 0.85. 
To assess the effect of phylogenetic uncertainty on inferences about the functions of 
ancestors, we synthesize two versions of every ancestor. The first is the ML ancestor, as 
described above. The second is the so-called altAll ancestor.134 For the altAll ancestor, we 
replace all ambiguous ML amino acids with the next most-probable amino acid. If an ancestor 
has 10 ambiguous sites, the ML and altAll would differ at all 10 of these sites. By functionally 
characterizing both the ML and altAll versions of an ancestor, we can determine which features 
are robust to uncertainty in the reconstruction.76,115,135–138 
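Both quality checks can be computed from the same table of posterior probabilities. In this sketch, the function name and PP values are hypothetical. A site is treated as ambiguous when its second most probable amino acid has a PP above 0.25, the default cutoff topiary uses to identify ambiguous sites. Exactly how that cutoff is applied is an assumption here.

```python
# Hypothetical posterior probabilities for three sites (other amino acids omitted).
site_pps = [
    {"S": 0.98, "T": 0.02},
    {"L": 0.60, "M": 0.35, "I": 0.05},
    {"E": 0.40, "R": 0.30, "K": 0.25, "Q": 0.05},
]

def summarize_ancestor(pp_table, alt_cutoff=0.25):
    """Return the ML sequence, the altAll sequence, and the mean PP of the ML states."""
    ml, alt_all, ml_pps = [], [], []
    for site in pp_table:
        ranked = sorted(site, key=site.get, reverse=True)
        best = ranked[0]
        ml.append(best)
        ml_pps.append(site[best])
        # Assumed rule: a site is ambiguous if the runner-up PP exceeds the cutoff.
        if len(ranked) > 1 and site[ranked[1]] > alt_cutoff:
            alt_all.append(ranked[1])
        else:
            alt_all.append(best)
    return "".join(ml), "".join(alt_all), sum(ml_pps) / len(ml_pps)

ml_seq, alt_seq, mean_pp = summarize_ancestor(site_pps)
print(ml_seq, alt_seq, round(mean_pp, 2))  # SLE SMR 0.66
```

The ML and altAll sequences differ only at the two ambiguous sites. The toy mean PP of 0.66 would fall well short of the values typical of published ancestors.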
The second quality metric is the branch support for a given ancestral node. Posterior 
probabilities measure our confidence in the ancestral sequence given a particular phylogenetic 
tree, but they do not measure our confidence in the tree itself. (Put another way, we have the 
sequence of an ancestral node, but how confident are we that the node existed?) Branch supports 
measure this confidence. We discuss how these are estimated in Section 3; for now, we focus on 
interpretation.  
A branch support measures our confidence that a given group of sequences cluster 
together, typically on a 0–100 scale. Figure 2.2g shows branch supports for two possible 
arrangements of the tree: placing paralog A with B (orange with blue) or paralog B with the fish 
outgroup (blue with green). In this example we have high support (98/100) for placing paralogs 
A and B together, with contrasting low support for separating them (2/100). For an ASR study, 
we need to have high confidence that an ancestral node existed (typically branch support >85) 
prior to characterizing the ancestral protein.  
 
THE TOPIARY PIPELINE  
The steps above are relatively complex, involving multiple different software packages 
for dataset construction, sequence quality control, alignment, model selection, gene tree 
inference, gene-species tree reconciliation, and ancestral reconstruction. Further, there are places 
where expert phylogenetic knowledge might be required. How does one obtain a species tree? 
How does one select which species to include when trying to reconstruct a specific ancestor? 
How does one evaluate whether a given ancestor is well reconstructed? The topiary package 
aims to streamline this process, simplifying the workflow and helping non-experts make 
evolutionarily informed decisions.  
Only a few steps in ASR require human input: defining the problem, checking the 
alignment, and characterizing the resulting ancestors. The rest of the steps are computational, 
with different software packages typically chained together by hand. Given this 
process, we set out to build software that facilitates the few human-centric steps and then 
automates the rest of the pipeline (Figure 2.3). In this section we walk through the topiary 
pipeline, describing the design decisions and software used throughout. Here we emphasize the 
automated steps; Section 4 focuses on the human steps. Both will closely parallel 
the steps described in general terms in Figure 2.2. 
 
Software design  
One of our design goals was to use software that is state-of-the-art, up-to-date, and 
currently maintained. Topiary uses Muscle5 for alignment139; RAxML-NG for maximum 
likelihood gene tree and ancestral sequence inference140; GeneRax for gene-species tree 
reconciliation128; and PastML for gap reconstruction141. Under the hood, it uses the ETE 3 library 
for tree manipulations142; Biopython to access NCBI BLAST and the NCBI database143,144; 
python-opentree to interact with the Open Tree of Life taxonomic database120,145; and toytree for 
drawing trees146. This is implemented within a standard Python 3 scientific computing 
environment built around numpy and pandas.  
The pipeline (Figure 2.3) is broken into two stages: (1) construct an MSA from the seed 
sequences and (2) infer a phylogenetic tree and ancestral sequences from the MSA. The first computational 
stage of the pipeline can be run on a user's personal computer (Linux, macOS, Windows); the 
second stage is best run using a high-performance computing environment and requires Linux or 
macOS. Users can run the pipeline via a few command-line programs, or work through each step 
individually and interactively in a Jupyter notebook. For ease of installation, the software and all 
dependencies can be installed using the conda package manager. The software is also 
available for direct download at https://github.com/harmslab/topiary. A collection of example 
datasets and Jupyter notebooks is available at https://github.com/harmslab/topiary-examples. 
 
Figure 2.3. Summarized topiary ASR pipeline. The pipeline is a series of human and 
automatic steps (indicated on the left with brain and topiary icons, respectively). The 
approximate time, in hours, required for each step is indicated on the right. 
 
Our focus will be on topiary's algorithms and software settings; however, in passing, we 
want to note several aspects of the software. We refer users to the online documentation 
(https://topiary-asr.readthedocs.io/) for more details.  
1. Topiary has a fully documented Application Programming Interface (API), allowing 
users to run interactive analyses in a Jupyter notebook or write their own python scripts.  
2. Topiary is multithreaded, improving the speed of local BLAST queries, redundancy 
reduction, and NCBI downloads. It also takes full advantage of the parallelization support 
implemented in Muscle5, RAxML-NG, and GeneRax.  
3. Topiary allows users to restart interrupted pipelines without having to start over. This is 
particularly useful for the second stage, which can take a fair amount of time to run on a 
computing cluster.  
 
Stage 1: Seed to alignment  
As described in the Overview, the starting point for an ASR calculation is defining the 
problem. Topiary does this in a straightforward way: the user constructs a seed dataset that 
defines the paralogs of interest and the desired taxonomic distribution for the ASR study. For the 
example worked through in Figures 2.1 and 2.2, the seed might include three sequences: 
paralogs A and B from humans and a single protein from zebrafish (Figure 2.2a). The user 
prepares the seed dataset as a spreadsheet with four columns: sequence, species, name (e.g., the 
paralog identity), and aliases (what names this protein has across the various online databases). 
The species in the seed are used as “key species” in all downstream analyses. We go into further 
details on how to construct this seed dataset in Section 4. From this starting point, topiary 
downloads high-quality homologous protein sequences from public databases and then generates 
a draft multiple sequence alignment.  
 
Initial dataset construction  
Topiary uses the seed sequences to BLAST against the NCBI non-redundant protein 
sequence database. To maximize the number of productive results, topiary automatically sets the 
taxonomic scope of the BLAST search. For non-microbial proteins, the scope is given by the 
taxonomic rank that encompasses the key species from the seed dataset, plus a user-defined 
expansion. For the example above—which included humans and zebrafish—the taxonomic rank 
is Vertebrata. With an expansion of one, the scope would be Craniata; with an expansion of two, 
the scope would be Chordata (Vertebrata → Craniata → Chordata). Using the default expansion 
of two, topiary would BLAST each of the seed sequences against the NCBI non-redundant 
protein database, limiting its results to Chordata. By default, topiary pulls down up to 5000 hits 
per seed with an intentionally generous e-value cutoff of 0.001. (Users have full control over the 
BLAST search parameters.) Note that a seed dataset containing only bacterial or archaeal 
sequences would be assigned a taxonomic scope of “All Bacteria” or “All Archaea.”  
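The scope calculation amounts to finding the deepest taxon shared by every key species and then stepping a few levels toward the root. This is a minimal sketch, not topiary's code. It assumes simplified, hypothetical root-to-tip lineage lists that mirror the example above. Real taxonomic lineages contain many more levels.

```python
def blast_scope(lineages, expansion=2):
    """Deepest taxon shared by all key species, moved `expansion` levels rootward.

    lineages: one root-to-tip list of taxon names per key species.
    """
    shared = []
    for names in zip(*lineages):
        if len(set(names)) != 1:
            break
        shared.append(names[0])
    if not shared:
        raise ValueError("key species share no taxon")
    # Step toward the root, but never past the first shared taxon.
    return shared[max(len(shared) - 1 - expansion, 0)]

# Hypothetical, simplified lineages for the two key species.
key_lineages = [
    ["Eukaryota", "Metazoa", "Chordata", "Craniata", "Vertebrata", "Mammalia", "Homo sapiens"],
    ["Eukaryota", "Metazoa", "Chordata", "Craniata", "Vertebrata", "Actinopterygii", "Danio rerio"],
]
for expansion in range(3):
    print(expansion, blast_scope(key_lineages, expansion))
# 0 Vertebrata
# 1 Craniata
# 2 Chordata
```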
In addition to this default method for building a sequence dataset, users can specify other 
sources of sequences including other NCBI BLAST databases, local BLAST databases, or 
previously saved BLAST XML files. Users can also manually add sequences by appending them 
to the initial spreadsheet.  
Once the initial dataset is constructed, topiary identifies each hit by reciprocal BLAST. It 
downloads proteomes for the key species in the seed dataset and constructs a combined local 
BLAST database. It then uses the hits above as queries against the key species BLAST database, 
searching the resulting reciprocal hits for text descriptions that match the aliases specified in the 
seed dataset. (See Section 4 for details about defining aliases.) It weights each hit by $2^{s/t}$, where $s$ 
is the BLAST bit score and $t$ is a user-defined parameter (default = 1). Finally, topiary calculates 
the posterior probability that the sequence is a given paralog by summing the 
weights for all reciprocal hits that match a paralog alias and then dividing by the sum of the 
weights from all reciprocal hits (Frith, 2019).147 A sequence is assigned a paralog identity based 
on a user-defined stringency cutoff (default = 0.95). Multiple paralogs may be assigned if the 
sum of their posterior probabilities is above the cutoff. 
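The weighting and assignment steps can be sketched as follows, with several assumptions made for illustration. Aliases are matched as case-insensitive substrings of hit descriptions. Paralogs are added in order of decreasing posterior probability until the cutoff is reached. The hits and bit scores are hypothetical. Each weight is rescaled by the best bit score before exponentiation. This leaves every ratio unchanged but keeps the $2^{s/t}$ weights from overflowing for large bit scores.

```python
def paralog_posteriors(reciprocal_hits, aliases, t=1.0):
    """Posterior probability that a query belongs to each paralog.

    reciprocal_hits: (description, bit_score) pairs from the reciprocal BLAST.
    aliases: paralog name -> alias strings to look for in hit descriptions.
    """
    best = max(score for _, score in reciprocal_hits)
    # 2**((s - best)/t) is proportional to 2**(s/t) but cannot overflow.
    weights = [(desc.lower(), 2.0 ** ((score - best) / t)) for desc, score in reciprocal_hits]
    total = sum(w for _, w in weights)
    return {
        paralog: sum(w for desc, w in weights
                     if any(alias.lower() in desc for alias in names)) / total
        for paralog, names in aliases.items()
    }

def assign_paralogs(posteriors, cutoff=0.95):
    """Add paralogs, most probable first, until their summed posterior reaches the cutoff."""
    chosen, running = [], 0.0
    for paralog in sorted(posteriors, key=posteriors.get, reverse=True):
        if posteriors[paralog] == 0:
            break
        chosen.append(paralog)
        running += posteriors[paralog]
        if running >= cutoff:
            return chosen
    return []  # not enough support: leave the sequence unassigned

aliases = {"LY96": ["LY96", "lymphocyte antigen 96"], "LY86": ["LY86", "lymphocyte antigen 86"]}
hits_clear = [("lymphocyte antigen 96 [Homo sapiens]", 300.0), ("LY96 protein", 295.0),
              ("lymphocyte antigen 86", 250.0)]
hits_murky = hits_clear + [("uncharacterized protein", 299.0)]

pp_clear = paralog_posteriors(hits_clear, aliases)
pp_murky = paralog_posteriors(hits_murky, aliases)
print(assign_paralogs(pp_clear))  # ['LY96']
print(assign_paralogs(pp_murky))  # []
```

A single well-scoring hit without a matching alias pulls the posterior for LY96 down to about 0.67, so the second sequence stays unassigned at the default cutoff.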
 
Redundancy reduction, quality control, and alignment  
This BLAST approach typically finds many more sequences than are necessary or 
practical for a standard phylogenetic analysis. We must therefore select sequences that sample 
the diversity in the dataset without compromising our ability to infer ancestors (the step from Figure 
2.2b to 2.2c). Topiary selects a subset of sequences using a combination of taxonomy, sequence 
identity, and sequence quality. By default, topiary aims to build an alignment with approximately 
as many sequences as there are sites in the average seed sequence. If our seed sequences were 100 
amino acids long, topiary would try to build an alignment with 100 sequences. This prevents 
over-fitting and makes later computational steps faster. (Users can change the target alignment 
size if desired.)  
Topiary uses four strategies to decrease the size of the dataset while maintaining dataset 
quality. First, sequences defined in the initial seed dataset (Figure 2.2a) are kept, regardless of 
their quality score or redundancy. This means users can pre-specify sequences they need in their 
final alignment.  
Second, for datasets containing non-microbial genes, topiary selects sequences based on 
their placement on the species tree rather than solely based on their identity. (For microbial 
datasets, topiary lowers redundancy based on sequence identity alone because microbial species 
trees are poorly resolved.) When lowering redundancy in a species-aware fashion, topiary takes 
the desired alignment size and then divides this “budget” across the species seen in the dataset. 
The algorithm is shown in Figure 2.4 for a hypothetical dataset with seven orthologous proteins 
and a target alignment size of five. Topiary starts by downloading the species tree from the Open 
Tree of Life for all represented species. It then assigns the deepest ancestral node on the tree a 
budget of five sequences. Topiary traverses the tree, from ancestor to tips, splitting the sequence 
budget as evenly as possible among descendant lineages at each step. In the example, it assigns 
two sequences to the ancestor of bony fishes and three sequences to the ancestor of tetrapods. On 
the bony fish lineage, it assigns one sequence each to the zebrafish and salmon, meaning these 
sequences will be kept in the final dataset. On the tetrapod branch, the algorithm continues, 
assigning one sequence to the frog and two sequences to the ancestor of amniotes. It then gives 
one sequence to the bird/reptile ancestor (dark green clade) and the other sequence to the 
mammal ancestor (light green clade).  
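The budget-splitting walk lends itself to a short recursive sketch. The rule for splitting a budget that does not divide evenly is an assumption. Sequences are handed out one at a time, starting with the clade that has the most species, and no clade receives more sequences than it has species. The toy tree reproduces the Figure 2.4 example with one sequence per species. This is not topiary's code.

```python
def tip_names(tree):
    """Leaf labels of a nested-tuple tree."""
    return [tree] if isinstance(tree, str) else [n for child in tree for n in tip_names(child)]

def allocate(tree, budget, groups=None):
    """Split a sequence budget down a species tree.

    Returns a list of tip groups; one sequence is later kept from each group.
    """
    if groups is None:
        groups = []
    tips = tip_names(tree)
    budget = min(budget, len(tips))
    if budget == 0:
        return groups
    if budget == 1 or isinstance(tree, str):
        groups.append(frozenset(tips))
        return groups
    # Larger clades come first, so they receive any leftover sequences.
    children = sorted(tree, key=lambda child: len(tip_names(child)), reverse=True)
    shares, remaining = [0] * len(children), budget
    while remaining:
        for i, child in enumerate(children):
            if remaining and shares[i] < len(tip_names(child)):
                shares[i] += 1
                remaining -= 1
    for child, share in zip(children, shares):
        allocate(child, share, groups)
    return groups

species_tree = (("zebrafish", "salmon"),
                ("frog", (("human", "mouse"), ("lizard", "chicken"))))
groups = allocate(species_tree, 5)
for group in groups:
    print(sorted(group))
# ['human', 'mouse']
# ['chicken', 'lizard']
# ['frog']
# ['zebrafish']
# ['salmon']
```

Clades that end up with a budget of one (the mammal and bird/reptile blocks here) are where alignment quality decides which sequence survives.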
Because of this explicitly taxonomic strategy, sequences that are taxonomically important 
are not removed from the dataset, even if their quality is lower than other, taxonomically 
redundant, sequences. The frog sequence in Figure 2.4, for example, has a long lineage-specific 
insertion. But because it is the only amphibian representative in this (toy) alignment, it is 
preserved. We leave the decision of whether or not to keep this sequence up to the user when 
they review the alignment. We also note that, in practice, there is enough sequence and 
taxonomic diversity in current databases that we rarely need to trade alignment quality for 
taxonomic diversity.  
Third, when lowering sequence redundancy, topiary preferentially keeps sequences that align 
well to the seed sequences. We take this alignment-focused approach because ASR can only 
reconstruct ancestral states for columns shared by many modern proteins. Lineage-specific 
insertions and deletions do not contribute to the ancestral inference and, further, may interfere 
with MSA construction. To calculate alignment quality, topiary aligns clusters of sequences from 
closely related organisms to the whole seed sequence dataset using Muscle5. It identifies “dense” 
columns in which most sequences have non-gap characters (the gray shaded boxes in Figure 
2.4). It then calculates two quality scores for each sequence. First, it calculates the proportion of 
dense columns with non-gap characters in the sequence. Lower proportions indicate truncated 
sequences. Second, it looks for long stretches of non-gap characters that are not in “dense” 
columns, indicating a lineage-specific insertion. In our example dataset, topiary would select the 
human and chicken sequences over mouse and lizard, as these have the best alignment scores 
(Figure 2.4). 
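The two alignment-quality scores can be sketched on a toy alignment. The 50% non-gap threshold for a "dense" column is an assumption standing in for "most sequences", and the alignment is hypothetical. The text does not specify how topiary combines the two scores, so the sketch only reports them.

```python
def alignment_quality(alignment, dense_fraction=0.5):
    """Per-sequence (dense-column coverage, longest insertion run) for an aligned dict."""
    names = list(alignment)
    n_cols = len(alignment[names[0]])
    # A column is "dense" if at least dense_fraction of sequences have a residue there.
    dense = [
        sum(alignment[name][col] != "-" for name in names) / len(names) >= dense_fraction
        for col in range(n_cols)
    ]
    n_dense = sum(dense)
    scores = {}
    for name, seq in alignment.items():
        # Score 1: fraction of dense columns covered (low values flag truncation).
        coverage = sum(1 for col, ch in enumerate(seq) if dense[col] and ch != "-") / n_dense
        # Score 2: longest run of residues in non-dense columns (flags insertions).
        longest = run = 0
        for col, ch in enumerate(seq):
            run = run + 1 if (not dense[col] and ch != "-") else 0
            longest = max(longest, run)
        scores[name] = (coverage, longest)
    return scores

toy_alignment = {
    "human":   "MKVL---AAGLL",
    "mouse":   "MKVL---AA---",   # truncated C-terminus
    "lizard":  "MKVLQWERAGLL",   # lineage-specific insertion QWE
    "chicken": "MKVL---AAGLL",
    "frog":    "MKIL---ACGLL",
}
scores = alignment_quality(toy_alignment)
print(scores["mouse"], scores["lizard"])
```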
 
Figure 2.4: Topiary redundancy reduction and quality control. This analysis starts with seven 
sequences (taken from seven organisms) with the goal of retaining five for the downstream 
analysis. The numbers next to the ancestral nodes on the tree are the budget allocated for all 
descendants: five for all organisms, two for the fishes, three for tetrapods, etc. The “keep” 
column indicates which sequences are kept for further analysis after the redundancy reduction 
step. A schematic alignment is shown on the right, with poorly aligned and missing regions 
labeled. The alignment quality is used to select which sequences to keep within taxonomic 
blocks (human/mouse and lizard/chicken, in this example). 
 
Fourth and finally, there are a few steps where topiary lowers redundancy based on 
shared sequence identity. Whenever this is done, topiary chooses the sequence to keep based on 
its relative quality. It calculates an identity score by performing a pairwise alignment with the 
Biopython pairwise2.align.localxx function and dividing the score by the length of the shorter 
sequence. If this number is above a specified identity cutoff, topiary selects which of the two 
sequences to discard based on a rank ordered vector of sequence features. These features are: 
“Sequence length deviates from median sequence length by more than 25%” > “Low quality” > 
“Partial” > “Predicted” > “Precursor” > “Hypothetical” > “Isoform” > “Structure” > “shorter 
sequence” > “random choice”. Some of these features are calculated by topiary (e.g., sequence 
length); others are extracted from NCBI sequence descriptions (e.g., Partial, Hypothetical). This 
process enriches the final dataset for higher-quality protein sequences.  
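A sketch of the identity score and the discard rule follows. With a match score of 1 and no mismatch or gap penalties, which is the scoring `localxx` uses, the best local alignment score equals the length of the longest common subsequence. The sketch therefore computes that length directly instead of calling Biopython. Matching feature names as case-insensitive substrings of the sequence description is an assumption, and the records are hypothetical.

```python
import random

def lcs_length(a, b):
    """Longest common subsequence length (dynamic programming, two rows)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def identity_score(a, b):
    """Match-only alignment score divided by the length of the shorter sequence."""
    return lcs_length(a, b) / min(len(a), len(b))

# Description flags in priority order, after the length-outlier check.
FLAGS = ["low quality", "partial", "predicted", "precursor",
         "hypothetical", "isoform", "structure"]

def features(record, median_length):
    desc = record["description"].lower()
    outlier = abs(len(record["sequence"]) - median_length) > 0.25 * median_length
    return [outlier] + [flag in desc for flag in FLAGS]

def choose_discard(a, b, median_length, rng=random):
    """Return the record to drop: first differing flag, then shorter sequence, then random."""
    for fa, fb in zip(features(a, median_length), features(b, median_length)):
        if fa != fb:
            return a if fa else b
    if len(a["sequence"]) != len(b["sequence"]):
        return a if len(a["sequence"]) < len(b["sequence"]) else b
    return rng.choice([a, b])

rec_a = {"description": "PREDICTED: lymphocyte antigen 96", "sequence": "MKVLAAGLLC"}
rec_b = {"description": "lymphocyte antigen 96 precursor", "sequence": "MKVLAAGLLC"}
print(identity_score("MKVLAAGLLC", "MKVLSAGLL"))          # 8 of 9 residues match
print(choose_discard(rec_a, rec_b, median_length=10) is rec_a)  # True
```

"Predicted" outranks "Precursor" in the priority list, so the predicted record is the one discarded.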
This protocol yields a relatively clean dataset with 5% more sequences than our target 
alignment number. We leave these extra sequences in place so we can manually delete the worst-aligned 
sequences upon visual inspection and still have approximately our target number of sequences. 
 
Alignment  
Topiary uses Muscle5 with its default parameters to generate the MSA (Figure 2.2c).139 
We selected this algorithm due to its demonstrated high performance, as well as the extremely 
fast “super5” algorithm that is useful for generating draft alignments for large datasets. Advanced 
users can set all Muscle5 options via the API.  
There are differing views about whether to manually edit alignments or not.124,125 The 
topiary pipeline leaves this decision in the hands of the user. The goal for topiary is to make the 
task of finalizing an alignment relatively painless by carefully filtering for well-aligned 
sequences and by using state-of-the-art alignment software: most of the sequences should already 
be well aligned. Over the years, we have settled on a 5% approach: automate up to the point 
where the alignment is 95% done, and then finalize the alignment with a human brain. This has 
proven much more practical than designing a complicated (and thus fragile and unpredictable) 
heuristic to completely automate alignment construction.122 In practice, most of our manual work 
consists of deleting a handful of problematic sequences, followed by global realignment in 
Muscle5. (See Section 4 for details.)  
 
Stage 2: Alignment to ancestors  
In stage 2, we go from our alignment to ancestral sequences (Figures 2.2c–g and 2.3). 
We selected RAxML-NG140 as our primary phylogenetic package. One key reason for this choice 
was that RAxML-NG integrates well with GeneRax, a clear choice for reconciling gene and 
species trees. Both GeneRax and RAxML-NG use the same underlying computational 
phylogenetics library—libpll148—thus ensuring internally consistent implementations of 
evolutionary models. Further, GeneRax was explicitly tested with RAxML-NG, making this the 
most conservative choice of software combinations. Finally, we wanted to calculate branch 
supports for our species-reconciled gene tree (Figure 2.2g). Because GeneRax does not 
implement any fast branch-support methods, we estimate branch support by non-parametric 
bootstrap.149 RAxML-NG can return pseudoreplicate alignments matched to pseudoreplicate 
trees. This allows us to feed bootstrap pseudoreplicates into GeneRax as separate, parallel 
calculations and thus conveniently determine branch supports on our species-reconciled gene 
trees.  
 
Infer the evolutionary model  
The first step in a maximum likelihood phylogenetic analysis is determining the 
maximum likelihood model of sequence evolution. This includes the matrix for amino acid 
substitution (i.e., LG, JTT, WAG, etc.), the stationary frequencies for that model, rate variation 
parameters (Γ distribution, rate categories, etc.), and the proportion of invariant sites. Topiary 
uses a conventional method to find the best model.150 It uses RAxML-NG to generate a maximum 
parsimony tree from the alignment. It then optimizes branch lengths and other parameters on that tree 
for each of the 360 combinations of model parameters implemented in the computational library that 
underlies RAxML-NG and GeneRax. Finally, it ranks these models based on a corrected Akaike 
Information Criterion, which penalizes models with excess parameters to prevent overfitting.  
Although this protocol is done automatically, topiary returns a variety of statistics 
including AIC (Akaike Information Criterion), AICc (Corrected Akaike Information Criterion), 
and BIC (Bayesian Information Criterion) to help users who want more control over model 
selection. Via the API, users can also specify a custom input tree or a subset of the models to test. 
(Note: as of the current version, topiary excludes the LG4M and LG4X models, as these cause 
GeneRax to crash during gene-species tree reconciliation.)  
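For reference, the three criteria are $\mathrm{AIC} = 2k - 2\ln L$, $\mathrm{AICc} = \mathrm{AIC} + 2k(k+1)/(n-k-1)$, and $\mathrm{BIC} = k\ln n - 2\ln L$. Here $k$ is the number of free parameters and $n$ is the sample size; lower values are better. The sketch ranks hypothetical model fits. Counting $n$ as alignment columns and including branch lengths in $k$ are assumptions, since programs differ in these conventions. The example shows how the AICc penalty can reorder models relative to AIC.

```python
import math

def information_criteria(log_likelihood, k, n):
    """AIC, AICc and BIC for a model with k free parameters fit to n sites (needs n > k + 1)."""
    aic = 2 * k - 2 * log_likelihood
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

# Hypothetical fits: model -> (log-likelihood, free parameters including branch lengths).
fits = {"LG+G4": (-10234.5, 200), "JTT+G4": (-10260.1, 200), "LG+G4+F": (-10220.3, 219)}
n_sites = 350
scores = {model: information_criteria(lnl, k, n_sites) for model, (lnl, k) in fits.items()}
for criterion in ("AIC", "AICc", "BIC"):
    print(criterion, sorted(scores, key=lambda m: scores[m][criterion]))
# AIC ['LG+G4', 'LG+G4+F', 'JTT+G4']
# AICc ['LG+G4', 'JTT+G4', 'LG+G4+F']
# BIC ['LG+G4', 'JTT+G4', 'LG+G4+F']
```

The model with empirical frequencies (+F) fits better but spends 19 extra parameters, so AICc and BIC demote it to last place.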
 
Build a maximum likelihood gene tree  
Topiary next infers an ML gene tree using the inferred phylogenetic model with the 
default RAxML-NG settings for the “--search” protocol. This starts the inference from 10 
random trees and 10 different parsimony trees. It then optimizes the tree topology using a subtree 
pruning and regrafting (SPR) subtree cutoff of 1, with an automatically selected fast versus slow 
SPR radius. Branch lengths are optimized using the NR-FAST algorithm. The tree with the 
highest likelihood is selected and used for downstream analyses (Figure 2.2d, tree G). Advanced 
users have full access to all RAxML-NG options via the topiary API.  
 
Reconcile gene and species tree  
The next step in the pipeline is to reconcile the gene tree with the species tree (Figure 
2.2e). (Note, this reconciliation step is skipped for datasets containing only microbial genes.) 
Reconciliation automatically roots the tree and has been shown to improve the quality of 
reconstructed sequences.129 For this purpose, we use GeneRax, a new high-performance program 
for reconciling gene and species trees. Unlike heuristic reconciliation methods, GeneRax explicitly 
models evolutionary events (speciation, duplication, loss, and lateral gene transfer) as well as 
sequence evolution (e.g., the LG model).128 If the gene and species trees are discordant, GeneRax 
can either rearrange the gene tree to follow the species tree or incorporate an evolutionary event 
(such as duplication) to account for the discordance. GeneRax finds the maximum likelihood 
reconciled tree that balances the signal from the aligned sequences against the plausibility of the 
evolutionary events required to generate that signal.  
Topiary uses the ML evolutionary model and ML gene tree inferred previously as inputs 
to GeneRax. For the rooted species tree, topiary automatically downloads the most recent 
synthetic tree from the Open Tree of Life (OTL) database.120,145 (Previous steps in the pipeline 
ensure that all sequences that have made it to this step come from species that are present in the 
OTL database.) Any polytomies in this tree are resolved arbitrarily prior to the reconciliation 
inference. Topiary runs GeneRax with the default parameters128: topology optimization using 
rounds of SPR with increasing radius (from 1 to 5) using the UndatedDL reconciliation model. 
The UndatedDL model accounts for duplication and loss events. Topiary users can select the 
UndatedDTL model, which allows lateral transfer, if they expect lateral gene transfer for their 
genes of interest.  
The resulting tree is a maximum likelihood species-reconciled gene tree with optimized 
branch lengths and nodes labeled with inferred evolutionary events (speciation, duplication, or 
transfer). GeneRax returns a variety of other outputs that are made accessible to topiary users, 
but only the reconciled tree is used further in the pipeline.  
 
Reconstruct ancestors  
The next step is to infer sequences of ancestral nodes on the species-reconciled gene tree 
(Figure 2.2f). For this, we use RAxML-NG, which implements a standard marginal ancestral 
reconstruction method.133 (This differs from previous versions of RAxML, which used a non-
standard reconstruction method that was not comparable to other approaches.) RAxML-NG finds 
the amino acid at each site in each ancestor that maximizes the likelihood of observing the 
sequence alignment given the tree, branch lengths, and phylogenetic model. This returns a matrix 
of posterior probabilities for each amino acid at each site in the alignment for each ancestral 
node. Topiary extracts the sequence of the maximum likelihood ancestor, as well as the so-called 
altAll version of the ancestor that incorporates alternate reconstructed amino acids at ambiguous 
positions. It uses a default cutoff of 0.25 to identify ambiguous sites134; this can be set by the 
user.  
The evolutionary models used by RAxML-NG do not explicitly treat gaps; therefore, the 
first draft of the reconstructed ancestor will be ungapped. Topiary assigns gaps by treating them 
as characters during ancestral character reconstruction. For this purpose, topiary uses the 
DOWNPASS151 algorithm as implemented by the PastML package141. The final output for this 
step consists of the gapped sequences of both maximum likelihood and altAll ancestors for each 
node. These have associated statistical supports: posterior probabilities for each reconstructed 
amino acid and support for gaps. Topiary also outputs a variety of summary graphs to help select 
high-quality ancestral sequences (see Section 4). 
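Treating gaps as a two-state character (residue present or absent) lets a parsimony method place insertions and deletions on the tree. The sketch below runs a simple two-pass Fitch parsimony on one hypothetical alignment column. It is close in spirit to DOWNPASS but does not reproduce PastML's implementation, and breaking ties toward "residue present" is an arbitrary choice made here.

```python
GAP, RESIDUE = 1, 0

def downpass(tree, tip_states, candidates):
    """Bottom-up pass: intersect child state sets if possible, otherwise take the union."""
    # Node tuples serve as dict keys, so tip names must be unique.
    if isinstance(tree, str):
        candidates[tree] = {tip_states[tree]}
    else:
        child_sets = [downpass(child, tip_states, candidates) for child in tree]
        shared = set.intersection(*child_sets)
        candidates[tree] = shared if shared else set.union(*child_sets)
    return candidates[tree]

def uppass(tree, candidates, parent_state=None, assigned=None):
    """Top-down pass: keep the parent's state when allowed, else the smallest option."""
    if assigned is None:
        assigned = {}
    options = candidates[tree]
    state = parent_state if parent_state in options else min(options)
    assigned[tree] = state
    if not isinstance(tree, str):
        for child in tree:
            uppass(child, candidates, state, assigned)
    return assigned

mammals = ("human", "mouse")
amniotes = (mammals, "chicken")
tree = (amniotes, ("frog", "zebrafish"))
# One alignment column in which only the mammal sequences have a residue.
column = {"human": RESIDUE, "mouse": RESIDUE, "chicken": GAP, "frog": GAP, "zebrafish": GAP}
candidates = {}
downpass(tree, column, candidates)
states = uppass(tree, candidates)
print(states[tree], states[amniotes], states[mammals])  # 1 1 0
```

The root and the amniote ancestor are reconstructed as gapped, and the mammal ancestor carries a residue. This places the insertion on the mammal lineage.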
 
Branch supports  
To determine branch supports (Figure 2.2g), topiary uses non-parametric 
bootstrapping.149 Briefly, RAxML-NG generates pseudoreplicate alignments by sampling 
columns, with replacement, from the input alignment. RAxML-NG then infers an evolutionary 
tree for each of these alignments. Topiary generates up to 1000 bootstrap pseudoreplicates, using 
RAxML-NG's automatic Extended Majority Rules (autoMRE) method with a cutoff of 0.03 to 
determine the exact number. The output from RAxML-NG is a collection of pseudoreplicate 
alignments and pseudoreplicate gene trees. Because we are reconstructing ancestors on the 
reconciled tree, we pass each pseudoreplicate alignment and gene tree into GeneRax for gene-
species tree reconciliation, yielding a final collection of pseudoreplicate reconciled trees. Topiary 
then uses RAxML-NG to map these pseudoreplicate reconciled trees onto the ML reconciled tree 
as branch supports. Topiary also assesses convergence for the branch support estimate using the 
“--bsconverge” option.  
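The column-resampling step that RAxML-NG performs can be sketched in a few lines of Python. This illustrates non-parametric bootstrapping and assumes the alignment is held as a dictionary of equal-length strings. The function name is hypothetical, and RAxML-NG's own implementation differs.

```python
import random
from typing import Dict

def bootstrap_pseudoreplicate(alignment: Dict[str, str],
                              rng: random.Random) -> Dict[str, str]:
    """Build one pseudoreplicate by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw n_cols column indices with replacement; a column may appear
    # several times or not at all in the pseudoreplicate.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Whole columns are copied, so each site keeps its cross-sequence pattern.
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}
```

Calling this function repeatedly with a seeded `random.Random` yields a set of pseudoreplicate alignments. A replicate tree would then be inferred from each one.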
 
Output  
Topiary generates a single directory containing all ancestors, all trees, and an html file 
that allows users to browse their results. This directory can be shared with others without 
requiring the recipient to have installed topiary. The html file can be opened in any web browser 
and includes information to help users assess the quality of each reconstructed ancestor. In 
addition to this html output, topiary also writes the output for each step into individual 
directories, allowing users to access the intermediate steps and log files from each software 
package employed in the pipeline.  
 
PROTOCOL  
This section complements the previous section, which focused mostly on the 
computational steps in the pipeline (Figure 2.3). We will expand on the steps that require human 
intervention using the LY86/LY96 protein family to help demonstrate specific considerations and 
features. More detailed instructions are available in the topiary online documentation 
(https://topiary-asr.readthedocs.io).  
 
Construct a seed dataset  
The first step in a topiary ASR calculation is constructing a seed dataset (Figure 2.2a). This 
dataset defines protein family members of interest and the distribution of these proteins across 
species. Topiary uses this seed dataset to automatically find and download sequences to put into 
the alignment and, ultimately, the evolutionary tree. As discussed in the previous sections and in 
the documentation, selecting the proteins of interest for an ASR study and determining the 
taxonomic distribution of the protein family both require careful thought before key species are 
chosen for the seed dataset. An example for the LY86/LY96 protein family, a pair of closely 
related innate immune proteins, is shown in Table 2.1.  
 
Run the seed-to-alignment pipeline  
At this point the seed dataset is ready to be passed to the topiary-seed-to-alignment script. 
This script uses BLAST to build a dataset of thousands of protein sequences (Figure 2.2b), does 
quality control, lowers redundancy, and then generates an alignment of sequences (Figure 2.2c). 
This generally takes less than an hour on a modern laptop. The final output consists of a single 
spreadsheet and a single FASTA file holding the alignment. 
 
Table 2.1: Example seed dataset. 
name | species | sequence | aliases 
LY96 | Homo sapiens | MLPFLFF... | ESOP1;Myeloid Differentiation Protein-2;MD-2;lymphocyte antigen 96;LY-96 
LY96 | Danio rerio | MALWCPS... | ESOP1;Myeloid Differentiation Protein-2;MD-2;lymphocyte antigen 96;LY-96 
LY86 | Homo sapiens | MKGFTAT... | Lymphocyte Antigen 86;LY86;Myeloid Differentiation Protein-1;MD-1;RP105-associated 3;MMD-1 
LY86 | Danio rerio | MKTYFNM... | Lymphocyte Antigen 86;LY86;Myeloid Differentiation Protein-1;MD-1;RP105-associated 3;MMD-1 
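A seed dataset like Table 2.1 is a spreadsheet with one row per key protein. The minimal Python sketch below writes two of its rows to CSV using the standard `csv` module, then reads the semicolon-separated aliases back. The helper names are hypothetical. Table 2.1 shows only truncated sequences, so the `sequence` values here are placeholders; a real seed file would need the complete sequences.

```python
import csv
import io
from typing import Dict, List, Tuple

# Column names taken from Table 2.1.
SEED_COLUMNS = ["name", "species", "sequence", "aliases"]

# NOTE: sequences are the truncated prefixes shown in Table 2.1, used only as
# placeholders; a real seed file needs the complete protein sequences.
seed_rows = [
    {"name": "LY96", "species": "Homo sapiens", "sequence": "MLPFLFF",
     "aliases": "ESOP1;Myeloid Differentiation Protein-2;MD-2;lymphocyte antigen 96;LY-96"},
    {"name": "LY86", "species": "Homo sapiens", "sequence": "MKGFTAT",
     "aliases": "Lymphocyte Antigen 86;LY86;Myeloid Differentiation Protein-1;MD-1;RP105-associated 3;MMD-1"},
]

def write_seed_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize seed rows to CSV text with a fixed column order (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SEED_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def parse_aliases(csv_text: str) -> Dict[Tuple[str, str], List[str]]:
    """Map (name, species) to its list of semicolon-separated aliases."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(r["name"], r["species"]): r["aliases"].split(";") for r in reader}
```

Rows are keyed by name and species together, because the same protein name (for example, LY96) appears once for each species.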
 
Inspect and edit alignment  
Before reconstructing a phylogenetic tree and ancestors, we strongly recommend 
inspecting and possibly editing the alignment (Figure 2.2c). Several software packages are available 
for visualizing alignments, including AliView152, Jalview153, and MEGA154. We 
generally use AliView because of its balance of utility and simplicity.  
There are differing views on whether to manually edit an alignment124,125; the topiary 
package allows a user to manually edit their alignment but does not require it. We generally 
recommend making a few adjustments to alignments. We describe our approach to editing 
alignments in detail in the topiary documentation 
(https://topiary-asr.readthedocs.io/en/latest/protocol.html). Importantly, if we edit an alignment, we publish the 
alignment as supplemental material in the resulting manuscript so others can reproduce our work. 
Once the alignment is finalized, it can be read back into the topiary spreadsheet with the 
command line script topiary-fasta-into-dataframe.  
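Before an edited alignment is read back in, a quick sanity check can be worthwhile. It confirms that every sequence still has the same aligned length and that no names were duplicated during editing. The following is a hypothetical helper written for illustration, not part of topiary. It assumes that the sequence name is the first whitespace-delimited token of each FASTA header.

```python
from typing import Dict, Optional

def read_aligned_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text and verify that all sequences share one aligned length."""
    seqs: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("empty FASTA header")
            # Assumption: the name is the first whitespace-delimited token.
            name = parts[0]
            if name in seqs:
                raise ValueError(f"duplicate sequence name: {name}")
            seqs[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            seqs[name] += line  # sequences may wrap over several lines
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:
        raise ValueError(f"unequal aligned lengths: {sorted(lengths)}")
    return seqs
```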
 
Perform the ancestral inference  
We recommend performing the ancestral inference in a high-performance computing 
environment. Because of different parallelization requirements, the ancestral inference step uses 
two scripts run in sequence (alignment-to-ancestors and bootstrap-reconcile). The first script 
infers the evolutionary model, builds the ML gene tree, reconciles the gene and species trees, 
reconstructs ancestors, and generates bootstrap pseudoreplicate gene trees (Figure 2.2d–g). It 
writes out a summary tree at each step (Figure 2.5a–d). Alignment-to-ancestors should take 
about a day for a reasonable alignment (~1000 columns, ~500 sequences) running on a 
reasonable compute node (~30 cores). The second script reconciles each pseudoreplicate gene 
tree to the species tree and constructs the final branch supports (Figure 2.5e). Bootstrap 
sampling the gene-species reconciliation is computationally intensive but can be readily 
parallelized. It will likely take approximately a week spread across several cores. As discussed in 
the next section, if one is using a reconciled gene/species tree, it is important to check the validity 
of the reconciliation before moving on to the bootstrap-reconcile step. If the analysis is being 
done without gene/species tree reconciliation (that is, for microbial genes), only the steps 
shown in Figure 2.5a,d are performed. 
 
 
 
Checking gene/species-tree reconciliation  
Before selecting ancestors to characterize, it is important to make sure the phylogenetic 
tree is reasonable. The probabilistic models used in ASR are powerful, but do not capture all 
possible evolutionary events. One common problem is incomplete lineage sorting (ILS), where a 
gene duplicates but exists as several variants in a population when speciation occurs.155 Different 
duplicates are preserved along the descendant lineages, meaning this cannot be classified as a 
simple duplication or speciation event. ILS is a general problem with all ASR methods and is 
specifically noted as being outside the scope of GeneRax.128 Another problem is gene fusion, 
where different parts of a single gene have different evolutionary histories. The methods used by 
topiary all assume a single genetic history for each protein sequence. If we force such a model to 
fit a fused alignment, we will likely end up with a nonsensical evolutionary tree and meaningless 
ancestral sequences.  
In the worst case, ILS and gene fusion can lead to nonsensical ancestors that still have 
high branch supports and high posterior probabilities. Looking at the reconciled tree (Figure 
2.5b) can help you decide if this might apply to your family. A standard signal for both ILS and 
gene fusion is high discordance between the inferred gene and species trees. This will manifest 
as an unexpectedly high number of duplication and/or transfer events in the reconciled tree. If, 
for example, you are studying a protein family where you expect two paralogs, but you observe 
20 duplication events scattered throughout the tree, there is a good chance that the evolutionary 
models used for ASR are not appropriate for your protein family. Topiary warns users in its 
summary output if there is an anomalous number of duplication events, which suggests a model 
violation.  
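The discordance check described above compares the number of inferred non-speciation events with the number the biology leads you to expect. A minimal sketch follows. It assumes each node of the reconciled tree has already been labeled `speciation`, `duplication`, or `transfer`. The labels, the function name, and the default ratio of 3 are all hypothetical choices, not topiary's actual warning criterion.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def check_event_counts(node_events: Iterable[str], expected_dups: int,
                       max_ratio: float = 3.0) -> Tuple[Dict[str, int], bool]:
    """Tally reconciliation events and flag a suspicious excess.

    node_events: one label per ancestral node (hypothetical labels).
    expected_dups: duplications expected from prior knowledge of the family.
    max_ratio: hypothetical tolerance before the tree is flagged.
    """
    counts = Counter(node_events)
    # Counter returns 0 for labels that never occur.
    non_speciation = counts["duplication"] + counts["transfer"]
    # Guard against expected_dups == 0 so the comparison stays meaningful.
    suspicious = non_speciation > max_ratio * max(expected_dups, 1)
    return dict(counts), suspicious
```

Consider a family with two paralogs, which implies one expected duplication. A reconciled tree with 20 duplications would be flagged, while a tree with a single duplication would not.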
 
Figure 2.5. Example trees at each step in the ASR calculation. Summary trees from an ASR 
inference using a toy alignment with seven LY96 sequences (orange) and seven LY86 sequences 
(blue). Black arrows indicate steps done by the first script (alignment-to-ancestors); gray arrows 
indicate steps done by the second script (bootstrap-reconcile). G, S, and R indicate gene, species, 
and species-reconciled gene trees throughout the pipeline, respectively. (a) The ML gene tree 
inferred by RAxML-NG. Branch lengths are proportional to substitutions/site. This tree has 
several inferred relationships that are discordant with the species tree (yellow exclamation 
points). (b) Topiary uses the gene tree from panel A and the Open Tree of Life species tree (S) as 
inputs to GeneRax, constructing the reconciled tree (R). The discordant species relationships are 
resolved (green check marks) and each node is now labeled as either a duplication or speciation 
event (purple and green, respectively). (c) Tree with posterior probabilities for ML ancestors 
mapped onto nodes as an orange color gradient. (d) Topiary generates 1000 pseudoreplicate gene 
trees and maps the resulting branch supports onto nodes as a black color gradient. (e) The final 
output of topiary is the reconciled tree with evolutionary events, ancestor posterior probabilities, 
and branch supports mapped onto all ancestral nodes. In this figure, the labeled speciation events 
have been dropped for clarity. 
 
If your protein has more than one domain, one option would be to try to reconstruct each 
domain independently. If the discordance disappears, it is good evidence for a gene fusion event. 
If the discordance remains, proceed with extreme caution.  
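Reconstructing domains independently starts with splitting the alignment into per-domain sub-alignments. The sketch below assumes domain boundaries have already been mapped to alignment columns (0-based, end-exclusive). The boundaries in the example and the function name are hypothetical. Sequences that are entirely gaps within a domain are dropped, because they carry no information about that domain's history.

```python
from typing import Dict, Tuple

def split_by_domain(alignment: Dict[str, str],
                    domains: Dict[str, Tuple[int, int]]) -> Dict[str, Dict[str, str]]:
    """Slice an alignment into per-domain sub-alignments.

    domains maps a domain label to (start, end) alignment columns,
    0-based and end-exclusive (hypothetical boundaries supplied by the user).
    """
    out: Dict[str, Dict[str, str]] = {}
    for domain, (start, end) in domains.items():
        sub = {name: seq[start:end] for name, seq in alignment.items()}
        # Drop sequences that contain only gaps within this domain.
        out[domain] = {name: s for name, s in sub.items() if s.strip("-")}
    return out
```

Each sub-alignment could then be run through the pipeline separately. Comparing the resulting trees with one another shows whether the discordance disappears.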
One way forward in the face of discordance is to compare the sequences—and functional 
characteristics—for any ancestors of interest reconstructed using either the gene tree alone or the 
reconciled gene tree. (Topiary returns ancestors inferred on both trees.) If the results for 
ancestors reconstructed on the two trees differ dramatically, one cannot infer the ancestral 
sequence with confidence given standard ASR methods. ILS and gene fusion are longstanding 
problems in phylogenetics; treating them requires expert input. 
 
 
Figure 2.6. Graphs for evaluating ancestor quality. (a) The final bootstrap supported gene-
species reconciled tree built from an example set of 14 sequences. Reconstructed ancestral 
sequences at each node are labeled with a unique name. Duplication events are marked in purple. 
Each node is labeled with a circle whose inner color represents the sequence's average posterior 
probability (orange color gradient). The level of branch support from bootstrapping analysis is 
denoted by the ring around each node circle (black color gradient). Branch lengths represent the 
average number of amino acid substitutions per site and can be estimated using the scale bar. (b, 
c) Ancestor summary plots written out by topiary. The black points show the probability of the 
most likely amino acid at each position. The distribution of these probabilities is given by the 
histogram on the right. The average posterior probability is the mean of these values. The red 
points show the probability of the second most likely amino acid at each position, with its 
distribution on the right. The horizontal dashed line shows the minimum PP cutoff for the altAll 
reconstruction. Shaded gray regions indicate gaps; vertical purple dashed lines represent 
ambiguously gapped positions. (b) Summary for anc4 (tetrapod LY86 ancestor) for the 14-
sequence alignment (see arrow in a). (c) Summary for the equivalent ancestor from a 188-
sequence alignment and phylogenetic tree for LY86/LY96. 
 
Table 2.2: Protein families used to validate the topiary pipeline. 
 
Protein | Taxonomic distribution | Average seed sequence length | Number of seqs in alignment | ML substitution model 
Islet Amyloid Polypeptide/Calcitonin gene-related peptide | Vertebrates | 37 | 39 | JTT+G8 
S100A5 & S100A6 | Amniotes | 94 | 104 | JTT+G8 
Cytochrome C | All life | 109 | 121 | WAG+G8 
Ribonuclease HI | Bacterial | 163 | 181 | LG+G8 
LY86 & LY96 | Vertebrates | 164 | 188 | VT 
Micrococcal nuclease | Bacterial | 200 | 182 | LG+G8 
Chalcone Synthase | Plants | 390 | 107 | DEN+G8 
tight junction protein 1 | Vertebrates | 1705 | 121 | JTT+G8+FO+IO 
 
 
 
 
Figure 2.7. Validation of the topiary pipeline. Panels show topiary results generated for the 
eight protein families from Table 2.2. Colors indicate the family in question (see panel e for color 
legend). Panels a–c show topiary alignment quality as measured by three metrics: (a) Relative 
alignment length (number of columns in alignment divided by the average length of seed 
sequences); (b) The fraction of seed sequences lost during redundancy 
reduction; (c) Species tree imbalance (measured by the Colless Index of the species tree for the 
sequences in the alignment). (d) Number of pseudoreplicates required for converged branch 
supports for the gene tree (G) versus the reconciled tree (R) for the LY86/LY96 family. (e) 
Average posterior probabilities for all ML ancestors plotted against the total branch length 
between that ancestor and the nearest modern sequence on the tree. More negative values on the 
x-axis are deeper in the tree. Posterior probability starts at 1.0 near the tips of the tree and decays 
for more ancient ancestors. The dashed line indicates a “rule of thumb” of 0.85 for usable 
ancestral sequences. 
 
Selecting ancestors  
After checking for a reasonable reconciled tree and running the bootstrap-reconcile 
script, one can identify ancestors that are amenable to reconstruction based on their average 
posterior probability (Figure 2.2f) and branch supports (Figure 2.2g). As shown in Figure 2.6a, 
topiary maps these values onto the final tree as color gradients. One typically wants ancestors 
with average posterior probabilities and branch supports above 0.85 and 85, respectively. Note 
that the posterior probabilities and branch supports are independent of one another. For example, 
ancestor 11 has high branch support (dark black circle exterior) but a low ancestral posterior 
probability (light orange circle interior); ancestor 4, on the other hand, has low branch support 
but high posterior probability. As noted in the overview section, it is important to select ancestors 
with both high branch support and high posterior probabilities. (Note that this tree has low 
supports overall because it was built from a demonstration alignment with only 14 sequences.)  
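Once per-ancestor summaries are in hand, the selection rule above is easy to apply programmatically. The minimal sketch below assumes the average posterior probability (0 to 1) and bootstrap support (0 to 100) for each ancestor have already been extracted into simple records. The record type, function name, and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AncestorSummary:
    name: str
    avg_pp: float          # average posterior probability, 0 to 1
    branch_support: float  # bootstrap support, 0 to 100

def select_ancestors(ancestors: List[AncestorSummary], min_pp: float = 0.85,
                     min_support: float = 85.0) -> List[str]:
    """Keep ancestors that clear BOTH thresholds; the two metrics are independent."""
    return [a.name for a in ancestors
            if a.avg_pp > min_pp and a.branch_support > min_support]
```

In the test values, `ancA` has high posterior probability but low support, and `ancB` shows the reverse. Only `ancC`, which is high on both, passes.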
In addition to summary statistics on the tree, topiary provides more detailed information 
about each ancestor. Figure 2.6b,c show minimally modified versions of graphs that topiary 
automatically writes out for each ancestor. Figure 2.6b shows site-specific posterior probabilities 
for the reconstructed LY86 protein from the ancestor of tetrapods, anc4 (see arrow in Figure 
2.6a). The average posterior probability (0.825) is the mean of the black points. Some sites have 
unambiguous reconstructions (black points have PP = 1.0), but many other sites have plausible 
alternate reconstructions with similar PP to the ML reconstruction (red). This ancestor has 31 
sites that topiary classifies as ambiguous, meaning that there are 31 positions where the alternate 
reconstruction has a posterior probability above 0.25 (graphically, the number of red points 
above the dashed horizontal line). Finally, topiary reports sites for which it is ambiguous whether 
the position should be reconstructed as an amino acid or as a gap (site 27, for example).  
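The two summaries discussed above are the average posterior probability and the count of ambiguous sites. This sketch computes both from per-site posterior probabilities. It assumes gapped positions have already been removed and that each site is given as a list of posterior probabilities. The function name is hypothetical, and topiary's own plotting code may compute these values slightly differently.

```python
from statistics import mean
from typing import List, Tuple

def ancestor_summary(site_pps: List[List[float]],
                     alt_cutoff: float = 0.25) -> Tuple[float, int]:
    """Return (average ML posterior probability, number of ambiguous sites)."""
    top: List[float] = []
    n_ambiguous = 0
    for pps in site_pps:
        ranked = sorted(pps, reverse=True)
        top.append(ranked[0])  # PP of the ML amino acid (a black point)
        # Ambiguous: the runner-up (a red point) lies above the cutoff.
        if len(ranked) > 1 and ranked[1] > alt_cutoff:
            n_ambiguous += 1
    return mean(top), n_ambiguous
```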
We can compare the results in Figure 2.6b to the tetrapod LY86 ancestor returned by the 
pipeline for a 188-sequence alignment of LY86/LY96 sequences without manual MSA edits 
(Figure 2.6c). Upon increasing our number of sequences from 14 to 188 in the alignment, the 
average posterior probability for this ancestor increases substantially, from 0.825 to 0.952. We 
also see fewer ambiguous sites and no ambiguous gaps. Overall, this is a much higher-quality 
ancestor that is likely amenable to experimental characterization.  
We note, however, that there are still 21 ambiguous positions with alternate 
reconstructions whose posterior probabilities are above 0.25. This is real phylogenetic 
uncertainty that is unlikely to be resolved with the addition of more protein sequences. To 
account for this uncertainty, we recommend experimentally characterizing both the ML protein 
and the “altAll” version of the same protein.134 Topiary automatically generates both versions of 
every ancestor. The altAll ancestor reconstruction is made up of the ML sequence with every 
ambiguous ML amino acid replaced with its next most likely alternate. In other words, it selects 
the second-most-likely amino acid at every site where the red point is above the horizontal 
dashed line. For the ancestor shown in Figure 2.6c, the ML and altAll versions of the ancestor 
will differ at 21 positions.  
The altAll ancestor can be thought of as a “worst case” for the reconstruction, allowing one to ask 
what the consequences would be if the reconstruction got every ambiguous site wrong. The true, 
historical ancestral sequence is likely somewhere between the ML and altAll ancestors, but more 
like the ML than altAll sequence. If, upon synthesis and characterization, both the ML and altAll 
ancestors have the same measured property, that property is robust to uncertainty in the 
reconstruction and likely reflects the ancestral state of the protein. In previous experiments, the 
altAll ancestor has behaved similarly to the ML ancestor.76,115,135–138 
 
On black boxes  
Topiary automates much of the drudgery of an ASR study, going from a seed dataset to 
reconstructed ancestors with minimal input. One of our goals is to make the technique accessible 
for non-experts. It should not, however, be treated as a black box. To help users better understand 
what topiary does at each step, we have provided Jupyter notebooks, which can be run locally 
or via Google Colab, that break the topiary pipelines into individual steps 
(https://github.com/harmslab/topiaryexamples). This also provides a framework for users to 
modify or extend the pipelines to fit their specific needs.  
One final note. Generating ancestors is relatively easy, but experimentally characterizing 
them can take years, so it is worth exercising some caution up front. Specifically, if the species-reconciled gene 
tree has a huge excess of non-speciation events, pause. Do not trust results from ancestors with 
low branch supports or low posterior probabilities. And, finally, characterize the robustness of 
experimental results to phylogenetic uncertainty using altAll versions of ancestors. Following 
these rules will ensure the quality of your reconstructed ancestors and thus evolutionary 
conclusions.  
 
 
PIPELINE VALIDATION  
In this final section, we describe how we validated the topiary pipeline itself. Our first 
level of validation is part of the software package. We developed topiary using a test-driven 
development framework, meaning we write test code in parallel with our functional code. As of 
this writing, 87% of the lines in the topiary codebase are automatically tested for correct inputs, 
outputs, and logic every time we update any part of the code. We paid special attention to core 
functions in our test development. For example, the module that interfaces with RAxML-NG has 
100% test coverage. Such efforts give us confidence that the software should behave as expected.  
We also validated that topiary is useful for realistic ASR studies. We solicited seed 
datasets from scientists studying a wide variety of proteins from different species (Table 2.2). 
This allowed us to test the pipeline on real inputs from different classes of proteins, protein sizes, 
and taxonomic distributions. We then ran these eight seed datasets through both stages of the 
pipeline. We made no manual corrections to the alignments, so these represent fully automatic 
outputs with no human input beyond initial seed dataset construction.  
Much of what topiary does is to connect existing pieces of software. Rather than 
attempting to test each component, we focused our validation on the connections between 
components. The first step we checked was that of going from BLAST to alignment. Our 
BLAST/reciprocal BLAST strategy is standard; however, topiary reduces dataset size in a novel 
way (Figure 2.4). We therefore compared topiary to a strategy that lowered redundancy using 
sequence identity alone. We performed BLAST/reciprocal BLAST on all eight datasets, reduced 
redundancy using either topiary or CD-HIT156, and then aligned the resulting datasets using 
Muscle5. For each dataset, we selected a CD-HIT redundancy cutoff that yielded the same 
number of sequences as the topiary dataset. We then compared the resulting sequence-identity-
alone versus topiary datasets with three quality metrics (Figure 2.7a–c).  
The first metric was alignment length relative to average seed sequence length. A higher 
value indicates the presence of long, potentially poorly aligned, sequences in the alignment. We 
found that topiary significantly outperformed a sequence-identity-alone approach using this 
metric (Figure 2.7a). While the sequence-identity-alone approach gave alignments up to 35 
times longer than the seed sequences, the longest alignment from the topiary pipeline was 
only 5 times longer than the seed sequences. We next measured retention of key sequences. As 
expected, topiary never dropped key sequences from the dataset, while the simple redundancy 
cutoff was highly variable in this metric (Figure 2.7b). As a third comparison, we characterized 
the imbalance of the species tree corresponding to the final sequence dataset using the Colless 
Index157 as calculated by DendroPy158 (Figure 2.7c). Because topiary uses a taxonomically 
informed sampling strategy, we predicted the topiary trees would be more balanced than those 
from the dataset reduced by simple sequence identity. This was not true; both approaches gave 
similarly balanced trees for each dataset. This suggests that the tree imbalance reflects the real 
taxonomic diversity in the sequence databases for these proteins, rather than a problem with how 
that diversity is sampled to make tractably sized datasets.  
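Two of the alignment-quality metrics are simple to state in code. The sketch below computes relative alignment length and an unnormalized Colless index for a rooted binary tree written as nested tuples. The tuple encoding and function names are hypothetical. DendroPy, which was used for the actual analysis, may apply a normalization, so absolute values from this sketch need not match Figure 2.7c.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf name or (left, right) subtree

def colless_index(tree: Tree) -> int:
    """Unnormalized Colless index: sum over internal nodes of |n_left - n_right|."""
    def walk(node: Tree) -> Tuple[int, int]:
        # Returns (number of leaves below node, Colless sum within subtree).
        if isinstance(node, str):
            return 1, 0
        left, right = node
        n_l, c_l = walk(left)
        n_r, c_r = walk(right)
        return n_l + n_r, c_l + c_r + abs(n_l - n_r)
    return walk(tree)[1]

def relative_alignment_length(n_columns: int, seed_lengths: List[int]) -> float:
    """Alignment columns divided by the average seed sequence length."""
    return n_columns / (sum(seed_lengths) / len(seed_lengths))
```

A perfectly balanced four-leaf tree scores 0. A fully pectinate ("caterpillar") four-leaf tree scores 3, the maximum for four leaves, so larger values indicate a more imbalanced species tree.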
We also validated the reliability of the branch supports generated by topiary. Topiary 
calculates branch supports by generating pseudoreplicate gene trees in RAxML-NG, then passing 
them into GeneRax for reconciliation. By default, RAxML-NG generates bootstrap replicates 
until the supports converge on the gene tree. We wanted to verify that the branch supports on the 
reconciled tree converged reliably, even though the number of pseudoreplicates required was 
determined by convergence on the gene tree. To do this, we performed an a posteriori 
convergence test on the bootstrap replicate trees generated for either the gene tree alone or the 
reconciled gene trees. For this, we used the RAxML-NG “--bsconverge” analysis mode with a 
default cutoff of 0.03.159 The results for the LY86/LY96 family are shown in Figure 2.7d. The 
gene tree required over 600 bootstrap replicates for converged branch supports; the reconciled 
tree required fewer than 300. We observed similar results for all eight families, with the gene tree taking 
more replicates to converge than the reconciled tree. This indicates that the species tree is indeed 
constraining the gene tree and that the bootstrap supports converge with our standard protocol. 
As a final validation of the pipeline, we reconstructed all ML ancestors for the eight 
protein families (1027 ancestors in total). We then calculated the average posterior probability of 
each ML ancestor and plotted this against the branch length between that ancestor and the nearest 
modern protein sequence (Figure 2.7e). An ancestor identical to a modern protein would be 
plotted at zero on the x-axis; a more negative value corresponds to more substitutions per site 
between that ancestor and the most similar modern protein. In this plot, we observed that 
ancestral sequences close to the tips of the tree were better reconstructed than earlier ancestors. 
This is expected: more recent ancestors require less evolutionary extrapolation than more ancient 
ancestors. Despite the drop in quality for our deepest ancestors, however, we found that most 
reconstructed sequences are likely usable for reconstruction studies. Only 13 of the 1027 
ancestors had average posterior probabilities below 0.90. This demonstrates that the pipeline—
even without manual inspection and editing of the sequence alignment—generally yields high 
quality ancestral sequences.  
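The x-axis of Figure 2.7e is the branch length from an ancestor to its nearest modern sequence, which is a shortest-path distance on the tree. The nearest tip need not descend from the ancestor. The sketch below therefore treats the tree as an undirected weighted graph and uses Dijkstra's algorithm. The graph encoding and function names are hypothetical, and the figure plots this distance as a negative number.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, branch length)]

def add_branch(graph: Graph, a: str, b: str, length: float) -> None:
    """Add an undirected branch of the given length between nodes a and b."""
    graph.setdefault(a, []).append((b, length))
    graph.setdefault(b, []).append((a, length))

def nearest_tip_distance(graph: Graph, start: str, tips: Set[str]) -> float:
    """Shortest path length (sum of branch lengths) from start to any tip."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        if node in tips:
            return d  # the first tip popped is the nearest one
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    raise ValueError("no tip reachable from start")
```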
 
 
 
CONCLUSION  
The resources for performing high-quality ancestral sequence reconstruction already 
exist, but the complexity of the process and the importance of expert knowledge create a barrier 
to wider adoption; the topiary pipeline overcomes this barrier. It requires only that scientists 
define an evolutionary question and scope, and then lets computers do the rest, integrating 
powerful existing software to give users useful output for reconstructing and evaluating ancestral 
sequences. We hope this will improve the quality of ASR studies by codifying best practices and 
will increase the accessibility of the technique for protein scientists from a wide variety of 
backgrounds. 
 
BRIDGE TO CHAPTER III 
 With this topiary ancestral sequence reconstruction tool in hand, we were able to 
reconstruct and characterize the bony vertebrate, tetrapod, and teleost ancestral Toll-like receptor 
4 complexes used in the next chapter. Being able to resurrect and functionally characterize these 
key ancestral states was essential to revealing the origin of the difference in ligand specificity 
between the human and zebrafish TLR4 complexes. 
 
 
 
  
CHAPTER III 
TOLL-LIKE RECEPTOR 4 EVOLUTION OF LPS SPECIFICITY IN EARLY VERTEBRATES 
AND DIVERGENCE IN ZEBRAFISH 
 
*This chapter contains unpublished co-authored material. 
 
Author contributions: Orlandi KN and Harms MJ designed the study. Orlandi KN designed and 
performed experiments, analyzed the data, and wrote the text. Harms MJ was the funding 
acquisition lead and oversaw the experiments and writing. Sánchez-Borbón J constructed the 
CD14 ancestors used in the study and contributed input on experiments. Brown C performed 
site-directed mutagenesis on several plasmids and executed one of the experiments included in 
this text. Robinson C helped to execute the zebrafish experiments. Guillemin K contributed 
experimental guidance. 
 
  
ABSTRACT 
 Toll-like receptor 4 (TLR4) plays a pivotal role in the innate immune system in humans by 
activating the inflammatory response to lipopolysaccharide (LPS) from Gram-negative bacteria. 
Dysregulation of TLR4 activation can cause excessive inflammation resulting in myriad health 
problems including sepsis, heart disease, chronic arthritis, and other conditions. Much of our 
understanding about inflammation comes from careful studies of model organisms. One 
powerful model is the zebrafish, Danio rerio, which is often used to study mechanisms of human 
disease and host-microbe interactions. It was recently discovered that zebrafish express a 
functional TLR4. This could allow zebrafish to be a valuable, tractable model for the human 
innate immune response, provided that we can relate zebrafish and human receptor 
functions. Here, we explored the function of zebrafish TLR4 in vivo and in vitro. We discovered 
that zebrafish TLR4 is activated in vitro by a class of LPS molecules that antagonize the human 
receptor, and that a unique structural feature is necessary for its activity. The mechanism of 
TLR4-induced inflammation in vivo also appears to differ from that in humans. To understand 
the evolutionary context for the disparity between human and zebrafish TLR4 specificity, we 
used ancestral sequence reconstruction to infer and resurrect ancestral vertebrate TLR4 proteins 
and functionally compared them to several modern species. The results suggest complicated, 
species-dependent evolutionary trajectories originating from a low-sensitivity ancestral TLR4. 
Overall, this work will help guide future investigations using the zebrafish model of innate 
immunity by providing insight into the divergent functional roles of zebrafish and human TLR4. 
  
INTRODUCTION 
Toll-like receptor 4 (TLR4) is known to play a central role in the human immune defense 
against pathogens. TLR4 is a member of the Toll-like receptor (TLR) family of pattern 
recognition receptors. TLRs are type I transmembrane proteins expressed on innate immune cells 
and conserved across vertebrates. They recognize evolutionarily conserved molecules associated 
with danger and stimulate intracellular signaling cascades that activate the host immune 
response. TLR4 is responsible for discriminating host lipids from lipopolysaccharide (LPS), a 
component of the Gram-negative bacterial outer membrane.25–27 TLR4 was discovered because of its role 
mediating sepsis; sepsis occurs when the body produces excessive inflammation that damages 
host tissues in response to an infection. In 2017, sepsis accounted for almost 20% of all global 
deaths.160 TLR4 also contributes to the onset and progression of other illnesses such as cancer, 
atherosclerosis, osteoarthritis, and Alzheimer’s disease.161,162 Because of its importance to human 
health, TLR4 is an immune protein of major interest. 
TLR4 can sense LPS only by forming a heterodimer with the cofactor protein 
myeloid differentiation factor-2 (MD-2).36 In the absence of LPS, MD-2 forms a stable 
heterodimer with the extracellular domain of TLR4 and its expression is important for the correct 
distribution of TLR4 to the cell membrane.163 The beta-cup structure of MD-2 creates a 
hydrophobic pocket which accommodates the hydrophobic fatty acyl chains of LPS in crystal 
structures.41,42,46 MD-2 also forms the interface for dimerization required for TLR4 activation 
and its positioning of LPS inside the binding pocket is critical42,164 (Fig 3.1A-B). 
 
Figure 3.1. Current knowledge of the evolution of TLR4/MD-2 LPS specificity. A) TLR4 is 
shown in gray/white; MD-2 in cyan/blue. LPS (to the right of arrow) induces dimerization of 
TLR4/MD-2, triggering inflammation. B) LPS binds in a deep pocket in MD-2, creating a new 
dimerization surface (yellow). Right panel makes front TLR4 transparent to reveal interface. C) 
Structure of LPS from E. coli. D) Phylogenetic tree showing the evolution of TLR4/MD-2 with 
known activities of extant species. We aimed to characterize the agonist specificity of TLR4 from 
three ancestors (gray circles) and from extant species that would provide insight into 
species-specific TLR4 ligand specificity (sharks have no TLR4). 
The transfer of LPS into the binding pocket of MD-2 is catalyzed by the presence of LPS-
binding protein (LBP) and cluster of differentiation 14 protein (CD14). LBP is able to bind LPS-
rich surfaces like bacterial membranes, somehow altering the membrane, which permits CD14 to 
bind monomeric LPS.165–167 CD14 shields the hydrophobic acyl chains as it chaperones the LPS 
molecule to the binding pocket of MD-2.168 CD14 seems to be important in TLR4’s detection of 
LPS, but it may not be necessary. CD14-deficient mice are resistant to doses of LPS that are 
lethal to wild-type mice or would induce cytokine expression, but still respond to high doses of 
LPS.169 CD14 likely serves a more complex role in LPS signaling than simply LPS 
chaperone.166,167,170 
Ligand-binding drives the dimerization of two TLR4/MD-2 complexes, which brings the 
two TLR4 intracellular TIR domains together to act as a scaffold for adaptor proteins involved in 
MyD88 and TRIF signaling170,171 (Fig 3.1A). MyD88-mediated TLR4 signaling occurs mainly at 
the plasma membrane and results in the activation of transcription factor NF-κB and induction of 
proinflammatory cytokines like TNFα and IL-6. TRIF-mediated signaling occurs at the 
endosomal membrane after internalization of TLR4, which further activates IRF3 and the 
production of type I interferons and other IRF3-dependent genes, as well as delayed NF-κB 
activation.170,172,173 Structural modifications to LPS have been demonstrated to differentially 
activate these pathways.174–176 
LPS is an essential structural component of the outer membrane of Gram-negative bacteria.22 It
exhibits a high degree of structural diversity but generally consists of three components: a highly
variable O-antigen, a less variable core oligosaccharide, and a highly conserved lipid A (Fig
3.1C). The lipid A moiety of LPS is the structural feature recognized by TLR4/MD-2 and
therefore accounts for most of the immunostimulatory effects of LPS.177 Structural and
functional analyses show that the most proinflammatory forms of lipid A, which are generally
purified from Escherichia coli and Salmonella strains, have two phosphate groups and six fatty
acyl chains, each 12-14 carbons long.34 Lipid A with more or fewer fatty acyl chains, longer chains,
or a single phosphate group is typically less active and can act as an antagonist of toxic LPS.34,35
Bacteria have built-in pathways to modify their lipid A structures in response to changing
environmental conditions. Modifications occur through constitutive and regulated processes in
response to external stimuli, including changes in growth conditions (temperature, nutrients,
osmolarity), host detection (conversion of a host immune agonist to an antagonist), and antimicrobial
molecules (modulation of surface-exposed negative charge, deacylation of the outer
membrane).178 Several studies of pathogens converge on the theme that alteration of lipid A is a
common virulence strategy adopted by bacterial pathogens to evade host innate immune
detection.49,51–56,170 In addition, members of the order Bacteroidales, which are common human gut
commensals, produce tetra- and penta-acylated LPS that can silence TLR4 signaling for the
whole microbial community, potentially facilitating host tolerance of a healthy adult
microbiome.47,48
Understanding the implications of this mutable host-microbe interface, and developing
treatment strategies for infections and disease, requires studies in model systems.
Zebrafish (Danio rerio) provide many inflammatory disease models, which have been widely
employed in studies of the immune system, host-microbe interactions, and drug discovery. Zebrafish are
uniquely advantageous for research in these fields because their larvae are optically
transparent, enabling imaging in live organisms, and their microbiota can be easily manipulated.
TLR4-induced inflammation has not yet been included in these models. Zebrafish do mount an
inflammatory response to LPS, but this response is
much weaker than that of humans or mice. This was widely attributed to loss of the genes encoding
the TLR4 cofactors MD-2 and CD14.179
Recently, the zebrafish MD-2 gene was discovered and shown to be expressed in immune
cells.101 Furthermore, the zebrafish TLR4/MD-2 complex can be activated by LPS in vitro,
although this requires CD14. MD-2 mutant zebrafish exhibited perturbed transcriptional responses
to LPS challenge but, unlike MD-2-deficient mice, did not show improved tolerance to LPS-induced
death.101 These critical findings suggest that TLR4/MD-2 could play a role in LPS sensing in
zebrafish, but that other pathways are likely also involved. This work paved the way toward
developing a zebrafish model of TLR4-mediated inflammation.
Here, we use an evolutionary lens to better understand the role of zebrafish TLR4
in innate immunity. Mammalian TLR4 has been shown to respond poorly to “foreign”
LPS from the deep-sea bacterial genus Moritella, potentially because of its acyl chain length.180 The
authors posit that pattern recognition strategies may be defined by the local environment rather than
by universal threats. We therefore hypothesized that differences in evolutionary pressures, such as
the distinct pathogenic bacteria present in terrestrial versus tropical freshwater
environments, could have led to divergent LPS specificities in zebrafish and mammals. To test
this, we challenged zebrafish TLR4/MD-2 with a broad range of LPS variants in a functional assay
and discovered that the complex is uniquely sensitive to tetra-acylated lipid A. We
also found that a lineage of teleost fish that includes zebrafish evolved a novel MD-2 C-terminal
peptide that is essential for TLR4 signaling in functional assays.
Next, we tested whether the heightened sensitivity of zebrafish TLR4/MD-2 to tetra-acyl lipid A
in vitro would translate into a stronger in vivo immune response than reported in previous
studies. On the contrary, we found no elevated immune response to tetra-acyl lipid A relative to
either the vehicle control or fish pretreated with a TLR4-specific inhibitor. We conclude that
although zebrafish maintain functional copies of TLR4 and MD-2 with the ability to recognize and
respond to various LPS structures, the human and zebrafish immune responses to LPS differ
substantially. More studies will be needed to define what role TLR4 might play in zebrafish
innate immunity.
We remained curious about the evolutionary origin of the difference in human and 
zebrafish TLR4/MD-2 specificity observed in vitro. Does the zebrafish state represent an 
ancestral state that was modified along the tetrapod lineage? Or did teleost fish lose high 
ancestral activity? To explore the specificity of evolutionary intermediates between human and 
zebrafish, we selected key modern species and reconstructed ancestors to compare in a functional 
assay (Fig 3.1D). We used ancestral sequence reconstruction to infer the ancestral states for 
TLR4 and MD-2 protein sequences and then resurrected these proteins for our assay. We find 
that other fish, amphibians, and early vertebrate ancestral complexes exhibit low sensitivity to all 
ligands tested. We infer that the zebrafish TLR4 lineage evolved heightened LPS sensitivity with 
unique specificity for tetra-acyl LPS. Interestingly, the tetrapod ancestor is highly active in the 
absence of ligand and can be stimulated with all ligands tested. This suggests that tetrapods may
show increased TLR4 stimulation relative to early-diverging vertebrates because of sequence
changes that accumulated along the lineage from the ancestor of bony vertebrates to the ancestor of tetrapods.
 
RESULTS 
Zebrafish TLR4/MD-2 is potently activated by tetra-acyl LPS in vitro 
We started by assessing differences between human and zebrafish TLR4 specificities for
structural variants of the lipid A portion of LPS. Depending on the bacterial species, the lipid A
moiety can have between four and eight acyl chains, and these variants differ in their ability to
activate TLR4/MD-2.181 Loes et al. reported mild activity when zebrafish TLR4 was challenged with
hexa-acylated LPS in vitro, and similar results for hexa- and hepta-acylated LPS delivered
in vivo by immersion or cardiac ventricular injection.101
We challenged human and zebrafish TLR4/MD-2 complexes in an in vitro functional
assay with commercially available LPS variants. These are generally complex mixtures of LPS
structures, so we report our findings based on the most abundant LPS structure stated by the
supplier. We used Salmonella enterica serovar Typhimurium LPS ((L7)-LPS-ST) as our hepta-acylated
variant, Escherichia coli K12 LPS ((L6)-LPS-EK) as our hexa-acylated variant with
an O-antigen, Rhodobacter sphaeroides LPS ((L5)-LPS-RS) as our penta-acylated variant, and
synthetic lipid IVa ((L4)-lipid IVa) as our tetra-acylated variant (Fig 3.2).
For the TLR4 functional assay, we transfected HEK293T cells with plasmids encoding
either the human or zebrafish TLR4 complex components under constitutive promoters, as well
as a luciferase reporter gene under the control of the NF-κB transcription factor. Because zebrafish do
not have CD14, we included a mouse CD14 plasmid, which confers the greatest LPS sensitivity on
zebrafish TLR4.101 HEK293T cells do not endogenously express the TLR4 complex, but they
can mount an NF-κB-mediated response to TLR4 activation. The day after transfection,
we treated cells with each of the LPS variants described, incubated the cells in treatment
media for four hours to allow a robust transcriptional response, and then measured
luciferase activity for each condition. The amount of luciferase
should be directly proportional to the level of NF-κB activation initiated by TLR4. To account
for differences in receptor surface expression and maximal activation capacity between receptor
complexes, we normalized the observed signal to that of (L6)-LPS-EK for human TLR4 and
(L4)-lipid IVa for zebrafish TLR4, unless otherwise noted.
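The per-species normalization can be sketched as follows, assuming (as the Figure 3.3 caption states) that readings are first buffer-subtracted and then divided by the buffer-subtracted signal of the reference agonist. The dictionary layout, the function name normalize_nfkb, and the luminescence numbers are hypothetical and are not measured values.

```python
from statistics import mean


def normalize_nfkb(raw, reference, buffer="buffer"):
    """Buffer-subtract the mean luminescence of each treatment for one receptor
    complex, then divide by the buffer-subtracted signal of the reference agonist.

    raw: dict mapping treatment name -> list of replicate luminescence readings
    (a hypothetical data layout chosen for illustration).
    """
    background = mean(raw[buffer])
    subtracted = {t: mean(values) - background for t, values in raw.items()}
    if subtracted[reference] <= 0:
        raise ValueError("reference agonist signal must exceed buffer background")
    return {t: s / subtracted[reference] for t, s in subtracted.items()}


# Hypothetical technical triplicates (arbitrary luminescence units), not real data.
zebrafish_wells = {
    "buffer": [100.0, 110.0, 90.0],
    "lipid IVa": [1100.0, 1000.0, 1200.0],
    "LPS-EK": [300.0, 280.0, 320.0],
}
relative = normalize_nfkb(zebrafish_wells, reference="lipid IVa")
print(relative)  # buffer -> 0.0, lipid IVa -> 1.0, LPS-EK -> 0.2
```

For a human complex, the same function would be called with reference="LPS-EK". Dividing by a within-species reference removes differences in overall signal magnitude between complexes, so only the shape of each specificity profile is compared across species.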
Our in vitro experiments (Fig 3.2) show that zebrafish TLR4/MD-2 exhibits a robust
response to tetra-acyl lipid IVa, a low-level response to hexa-acyl LPS-EK, and
little to no activity in the presence of hepta- and penta-acylated LPS variants. This specificity
contrasts with that of human TLR4, which is inhibited by lipid IVa and strongly activated by both
hexa- and hepta-acylated LPS.
 
Figure 3.2. Revealing differences in LPS specificity between human and zebrafish TLR4
complexes. In vitro NF-κB activation by human TLR4 (blue) and zebrafish TLR4 (orange)
challenged with LPS variants. LPS variants span a gradient of lipid A acyl chain number, from
7 acyl chains (left) to 4 acyl chains (right). NF-κB activity is shown relative to a reference agonist
for each species: LPS-EK for human, lipid IVa for zebrafish. These four LPS variants cover a range of
chemical features and are readily available commercially. Yellow indicates additional acyl chains
relative to lipid IVa. Human TLR4 was treated with 0.1 ng/μL LPS-ST and LPS-EK. All other
conditions show treatments at 1 ng/μL LPS. Error bars indicate standard error of the mean
across several experiments.
Zebrafish have three TLR4 ohnologs (gene duplicates originating from whole-genome
duplication), tlr4ba, tlr4bb, and tlr4al, all of which are expressed in immune cells of larval fish.101
The experiment above used the tlr4ba protein because Loes et al. had determined it was the
only ohnolog that could respond to (L6)-LPS-EK. Having found a more potent agonist
of tlr4ba, we investigated whether the other zebrafish TLR4s or heterocomplexes of the ohnologs
could be activated by LPS variants (Fig 3.3A-B). Slight differences in transfection conditions and
LPS treatment concentrations between the experiments shown in Fig 3.3A and B are noted in the
figure caption. Our preliminary experiments did not identify agonists for homocomplexes of
tlr4bb or tlr4al. They did, however, suggest that tlr4bb may enhance the signal of tlr4ba when
co-expressed in vitro. Although this may play a physiological role in the fish, we did not
further characterize tlr4ba/bb complexes because tlr4bb appeared to enhance sensitivity
without altering the specificity of tlr4ba. Further analysis is needed to determine whether this
initial observation is physiologically relevant, or whether other heterocomplexes could be informative
of zebrafish TLR4 agonist specificity. In the rest of the chapter, we use zebrafish tlr4ba and
TLR4 interchangeably.
Discovering an agonist of zebrafish TLR4 brought up several questions including: What 
structural features of zebrafish TLR4/MD-2 confer this altered specificity? Is this specificity 
reminiscent of the vertebrate ancestral TLR4, or did it evolve on the zebrafish lineage? Can we 
use this agonist to further probe ambiguity in the role of zebrafish TLR4 in innate immunity? 
Can this difference in specificity explain previous observations of low-level immune responses 
of zebrafish to challenge with LPS-ST and LPS-EK?101 
Figure 3.3. Zebrafish ohnologs tlr4bb and tlr4al do not respond on their own to LPS 
variants in vitro. NF-κB activation of human and zebrafish TLR4 paralogs in response to LPS 
variant challenge. LPS variants include a gradient of lipid A acyl chain number, from 7- to 4-acyl 
chains, indicated by colors in the legends. HEK293T cells were transfected with human TLR4 or 
zebrafish TLR4 ohnologs: tlr4ba, tlr4bb, or tlr4al. A) Zebrafish complexes were made by 
transfecting TLR4:MD-2:mouse CD14 with 25:20:1 ng plasmid/well. All LPS concentrations 
were 2 ng/μL, except human TLR4 treated with LPS-EK was at 0.2 ng/μL. B) Similar to panel A 
but including a co-transfection with tlr4ba and tlr4bb (far right) and a 7-acyl chain LPS (purple). 
For this experiment, zebrafish complexes were made by transfecting TLR4:MD-2:human CD14 
with 25:25:1 ng plasmid/well. All LPS variants were used at 0.2 ng/μL. For both panels, NF-κB 
activity level was buffer-subtracted and is shown normalized by species: human TLR4 signal is 
normalized to human TLR4 treated with LPS-EK; zebrafish TLR4 signal is normalized to 
zebrafish tlr4ba treated with lipid IVa. Error bars indicate standard deviation for technical 
triplicates of a single experiment. 
 
A subset of teleost fish evolved a functionally necessary MD-2 C-terminal peptide 
We first sought to understand the structural origin of these divergent LPS specificities.
We used topiary182, the bioinformatic phylogenetics pipeline discussed in Chapter II, to gather
TLR4, MD-2, and paralogous amino acid sequences from the NCBI database, sampling broadly
across taxa, and then aligned these sequences. In the multiple sequence alignment
generated for MD-2, there was an anomaly in Cyprinidae, a family of ray-finned fish that includes
zebrafish: at the C-terminus, there was an extension of roughly ten amino acids (Fig 3.4C).
We used the AlphaFold2 structure prediction tool183–185 to predict the structure of the
zebrafish TLR4 extracellular domain in complex with MD-2, with and without this C-terminal
peptide (Fig 3.4A). At least in the absence of LPS, the C-terminal peptide was predicted to form
a flexible linker connected to a small amphipathic alpha helix that fits snugly inside the
hydrophobic LPS-binding pocket. The crystal structure of human TLR4/MD-2 bound to E. coli
LPS-Ra, which is similar to LPS-EK but has no O-antigen, is shown on the left in Fig 3.4A.
TLR4 is in cyan, MD-2 is in yellow, the LPS core oligosaccharide and lipid A diglucosamine group
are in orange, and the hydrophobic acyl chains are in pink. Comparing this structure with the
predicted zebrafish TLR4/MD-2 structure on the right shows the peptide occupying the LPS-binding
pocket of MD-2.
The predicted zebrafish TLR4/MD-2 structure is colored by pLDDT to show per-residue
confidence in the structure, with high confidence in red and low confidence in blue. Most of the
TLR4 leucine-rich repeat (LRR) domain was predicted with high confidence, which is reasonable
given the many available crystal structures of LRR domains. MD-2 was also predicted with high
confidence except at the C-terminus, which comprises the full C-terminal extension (amino acids
GGNKSFFSPQIGRL). We cannot be certain that this peptide sits within the binding pocket, but
zooming in on the peptide shows nonpolar groups that could associate with the
inside of the MD-2 beta-cup while exposing hydrophilic residues to the solvent (Fig 3.4B).
Because the C-terminal peptide is at the opening of the MD-2 binding pocket, we
hypothesized that it could be a structural feature of zebrafish MD-2 that defines specificity for
LPS acyl chain number. To test this hypothesis, we made a series of MD-2 C-terminal truncation
mutants (Fig 3.4C). We chose two cut sites based on the MD-2 sequences of other species in the
alignment and on the predicted structure. We made a cut after residue 139 (139Δ) to mimic the frog
sequence in the dataset and eliminate the amino acids associated with low structural confidence.
The second truncation was after position 142 (142Δ) to match the well-characterized chicken,
mouse, and human proteins. We made three additional, more distal truncations, each retaining two
more amino acids than the previous one, to evaluate the hydrophobicity and length requirements of
the peptide. Nomenclature refers to the amino acid position after the signal peptide is cleaved.
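To make the truncation series concrete, the sketch below computes how much of the C-terminal extension each construct keeps. It rests on two assumptions that are not stated explicitly in the text: that GGNKSFFSPQIGRL forms the last 14 residues of the 154-residue mature protein (positions 141-154), and that the three distal cut sites are 144, 146, and 148 (inferred from "two more amino acids" per step). Both are labeled as hypothetical in the code.

```python
FULL_LENGTH = 154               # mature zebrafish MD-2 length given in the text
EXTENSION = "GGNKSFFSPQIGRL"    # C-terminal extension seen in the predicted structure
# Assumption: the extension forms the last 14 residues, i.e. mature positions 141-154.
EXT_START = FULL_LENGTH - len(EXTENSION) + 1


def retained_extension(cut_after):
    """Part of the extension still present when MD-2 ends at mature residue `cut_after`."""
    kept = max(0, cut_after - EXT_START + 1)
    return EXTENSION[:kept]


# 139 and 142 are stated in the text; 144, 146 and 148 are inferred from
# "two more amino acids in each step" and are therefore hypothetical.
for cut in [139, 142, 144, 146, 148, FULL_LENGTH]:
    removed = FULL_LENGTH - cut
    print(f"cut after {cut}: {removed:2d} residues removed, keeps '{retained_extension(cut)}'")
```

Under these assumptions, 139Δ and 142Δ remove essentially the whole extension, while the distal constructs progressively restore the hydrophobic FF pair, which is the kind of length and hydrophobicity series the text describes.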
We transfected cells with zebrafish TLR4 and either full-length (FL; 154 amino acids)
MD-2 or one of the truncation mutants and challenged them with (L7)-LPS-ST, (L6)-LPS-EK, or
(L4)-lipid IVa. We included a wild-type human complex as a positive control for
the LPS variant treatments. Our results show that the entire zebrafish MD-2 C-terminus is necessary
for zebrafish TLR4 activation (Fig 3.4D). None of the truncation mutants restores the signal
observed with full-length zebrafish MD-2 (TLR4/MD-2_154), and no new specificity emerges.
We were therefore curious whether this loss of function is specific to zebrafish TLR4.
We tested whether human TLR4 would respond differently to LPS challenge when
expressed with either full-length or 142Δ zebrafish MD-2. We transfected
cells with human TLR4, wild-type or truncated zebrafish MD-2, and either human or mouse
CD14 and treated them with LPS variants (Fig 3.4E). The human TLR4/wild-type zebrafish MD-2
complex had not previously been characterized in our assay. Human TLR4
expressed with full-length zebrafish MD-2 and human CD14 produces little to no NF-κB
response to LPS-EK or lipid IVa (Fig 3.4E). However, in the presence of mouse CD14 there is some
signal in response to LPS-EK. Intriguingly, zebrafish MD-2_142Δ conferred a slight increase in the
human TLR4 response in the presence of both human and mouse CD14. This suggests that the peptide is
not necessary for zebrafish MD-2 to bind LPS for TLR4 signaling but is specifically required by
zebrafish TLR4. We infer that cyprinid TLR4 and MD-2 have co-evolved such that this
C-terminal extension is functionally required for TLR4 activation in this family of fish. We hope to
further investigate the role of the cyprinid C-terminal peptide in TLR4/MD-2/LPS dimerization.
 
Figure 3.4. The C-terminal peptide of fish MD-2 is necessary for zebrafish TLR4 signaling. 
A) The crystal structure of human TLR4 (cyan)/MD-2 (yellow) bound to E. coli LPS-Ra (PDB: 
3FXI; left) compared to the AlphaFold2 predicted structure of zebrafish TLR4/MD-2 (right). The 
predicted structure is colored by pLDDT, the confidence of the prediction per residue, from high 
to low confidence (red to blue, respectively). B) The zebrafish MD-2 predicted structure from 
panel A, zoomed in and twisted to look down into the LPS-binding pocket. Residues of the C-
terminal extension are shown as sticks with polar groups labeled. C) A set of taxonomic 
representatives (tree to the left) from the MD-2 alignment showing the zebrafish C-terminal 
peptide and where we made mutant truncations (scissors). D) NF-κB activated by zebrafish 
TLR4 co-expressed with MD-2 truncation mutants in response to LPS challenge (see legend). 
Truncations are ordered from the fewest amino acids removed to the most severe truncation (left to
right). FL indicates full-length MD-2; the human TLR4 complex is included for comparison
(left). E) NF-κB activated by human TLR4 co-expressed with full-length or 142Δ mutant
zebrafish MD-2 and either human or mouse CD14, then challenged with LPS (see legend). The
full human complex is included as a control on the left. To the right, zebrafish TLR4 with
full-length MD-2 shows ligand sensitivity in the presence of either human or mouse CD14. All NF-κB
activity is normalized by species: human TLR4 signal is normalized to the wild-type complex
treated with LPS-EK; zebrafish TLR4 is normalized to zebrafish TLR4/MD-2 with mouse CD14 
treated with lipid IVa. Error bars show standard deviation for technical triplicates. 
 
  
Live zebrafish exhibit reduced immune response to lipid IVa compared to E. coli LPS 
Zebrafish have been shown to tolerate challenge with hexa- and hepta-acylated LPS well,
succumbing only to high doses.101,186 We suspected that such high doses are required either
because the hexa- and hepta-acylated LPS molecules are poor zebrafish TLR4 agonists, or
because commercially available LPS preparations contain low-level lipoprotein and peptidoglycan
contaminants that can activate the MyD88-mediated inflammatory response via TLR2. Based on our in
vitro findings, we hypothesized that if zebrafish TLR4 played a major role in the immune
response to LPS in these previous experiments, then we would see a stronger immune response to
challenge with lipid IVa, an ultrapure and potent zebrafish TLR4 agonist in vitro.
We used two different LPS delivery routes to probe this question: microgavage into the
gut and hindbrain injection into the circulatory system. Because synthetic lipid IVa is expensive,
we did not test delivery by submersion, which is often used to test survival after high-dose LPS
exposure. To test whether the immune response depends on TLR4, we used the TLR4-specific
inhibitor TAK-242 (resatorvid), which has been used in clinical trials.187,188 We had also hoped to
compare standard and ultrapure E. coli LPS preparations to test whether they differ in immune
stimulation, but we have not yet had time to do so. We used fish at 6 days
post-fertilization (dpf) because TLR4 was shown to be upregulated in immune cells starting at 5
dpf.101
We first challenged 6 dpf fish by oral microgavage, in which a blunt needle tip is guided
through the mouth and down the esophagus to the top of the gut bulb, where we dispensed 4.6 nL
of 1 mg/mL standard-purification E. coli O111:B4 LPS ((L6)-LPS-0111:B4), (L4)-lipid IVa, or
vehicle (Fig 3.5A). For this experiment, we tracked the immune response using a fish line with GFP
under the control of the TNFα promoter (tnfα:GFP) and mCherry-marked macrophages (mpeg:mCherry).
We imaged live fish 6 hours post-gavage using fluorescence stereomicroscopy to assess the
location and abundance of mCherry and GFP. After imaging, fish were immediately sacrificed
and fixed in paraformaldehyde for subsequent staining and quantification of neutrophils
associated with gut tissue.
Figure 3.5A shows a representative image of GFP signal in the distal gut region of a fish
treated with LPS-0111:B4. For each fish, we quantified the total GFP intensity and the number
of GFP-positive (GFP+) pixels in the distal gut and divided the total GFP intensity by the number of
GFP+ pixels. Figure 3.5B shows this mean intensity per GFP+ pixel, normalized to vehicle-treated
fish, for two experiments. Unexpectedly, there was a significant decrease in tnfα:GFP expression
in fish gavaged with LPS-0111:B4 relative to vehicle-treated fish (p = 0.043). Lipid IVa-treated
fish did not appear different from control or LPS-0111:B4-treated fish. The mCherry signal was
too low for quantification. Figure 3.5C shows neutrophil counts in dissected gut tissue from one
experiment. Because this counting involves subjective judgment, two individuals counted
neutrophils blind to treatment. Counter 1 (blue) did not observe any difference between
treatments. Counter 2 (orange) observed significant increases for both LPS-0111:B4- and lipid
IVa-treated fish relative to vehicle-treated fish (p = 0.029 and 0.008, respectively). No
difference was observed between (L6)-LPS-0111:B4 and (L4)-lipid IVa conditions.
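The per-fish GFP metric and its normalization can be sketched as below. This assumes that a pixel counts as GFP+ when it exceeds a fixed intensity threshold and that "total GFP intensity" is summed over those GFP+ pixels; both the threshold value and the toy pixel values are hypothetical, since the segmentation criterion is not specified in the text.

```python
from statistics import mean


def intensity_per_positive_pixel(pixels, threshold):
    """Summed intensity of GFP+ pixels divided by the number of GFP+ pixels.
    A pixel counts as GFP+ if it exceeds `threshold` (hypothetical criterion)."""
    positive = [p for p in pixels if p > threshold]
    return sum(positive) / len(positive) if positive else 0.0


def normalize_to_vehicle(per_fish, vehicle="vehicle"):
    """Divide every fish's value by the mean of the vehicle-treated fish."""
    baseline = mean(per_fish[vehicle])
    return {t: [v / baseline for v in values] for t, values in per_fish.items()}


# Toy distal-gut pixel intensities for three fish (not real data).
THRESHOLD = 50
per_fish = {
    "vehicle": [intensity_per_positive_pixel([0, 10, 200, 300], THRESHOLD),
                intensity_per_positive_pixel([5, 150, 250], THRESHOLD)],
    "LPS": [intensity_per_positive_pixel([0, 120, 150], THRESHOLD)],
}
normalized = normalize_to_vehicle(per_fish)
print(normalized["LPS"])  # [0.6]
```

Because the metric is an average over GFP+ pixels, it reports how bright tnfα:GFP-expressing tissue is, not how much of the distal gut expresses GFP; a treatment that recruits many dim cells could lower this value even while increasing the GFP+ area.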
Our microgavage experiments indicate that, relative to vehicle treatment,
hexa-acylated and tetra-acylated LPS delivered directly to the gut do not increase TNFα
expression in the distal gut but may promote neutrophil infiltration of gut tissue (Figure
3.5B-C). We observed no differences between the two LPS variants. Because
these results did not match previous observations of increased inflammatory responses to LPS-EK,
we turned to a different LPS delivery method.
Injection of 5 ng E. coli LPS into the brain tectum of 4 dpf larval fish has been shown to
result in systemic distribution of LPS and induction of liver-associated immune responses.189
Neutrophil and macrophage infiltration into the liver after LPS injection was driven by a
MyD88-dependent inflammatory response.189 An experimental system with systemic LPS
distribution through the bloodstream seemed likely to better model the role of TLR4 in
sepsis than LPS delivery to the gut, which is primed with mechanisms for detoxifying LPS.186
As before, we hypothesized that if TLR4 plays a role in the zebrafish immune response to LPS,
then injecting larval fish with lipid IVa would cause a stronger immune response than injecting
LPS-EK.
For our injection experiment, we tested TLR4-specific activity with an orthogonal
approach: pretreating fish with the TLR4-specific inhibitor TAK-242 (resatorvid).190 To our
knowledge, the effects of TAK-242 on zebrafish or zebrafish TLR4 have not previously been
reported. We first confirmed that TAK-242 knocks down zebrafish TLR4 signaling in vitro (Fig
3.5F). We then tested the effects of a range of TAK-242 concentrations on larval zebrafish. We
did not anticipate detrimental health effects because none have been reported in mice or
humans. We found, however, that submersion in 83 μM or more TAK-242
in embryonic media was toxic to 5 dpf fish. These fish showed signs of yolk sac edema and
tissue degradation. Fish treated with 8.3 μM TAK-242 or less showed no signs of toxicity when
assessed for heart rate, behavior, and general appearance. Our in vitro work showed that short
treatments with 1 μM TAK-242 are sufficient to block subsequent immune stimulation of
zebrafish TLR4 by lipid IVa. We therefore inferred that 24 hours of submersion in 8.3 μM TAK-242
would allow fish to take up enough inhibitor to block zebrafish TLR4 signaling.
 
Figure 3.5. Zebrafish response to (L6)-LPS-EK and (L4)-lipid IVa challenge via 
microgavage and hindbrain injection. A-C) Immune response 6 hours after challenge with 
LPS via microgavage in 6 dpf live fish with transgenes tnfα:GFP and mpeg1:mCherry 
(macrophages). A) A representative fluorescence microscopy image of GFP signal in the distal 
gut after challenge with LPS-0111:B4. B) Results of two independent experiments.
Each datapoint represents the total GFP signal divided by the quantity of GFP+ pixels in the 
distal gut region of a single fish and normalized to the average for vehicle-treated controls. C) 
Gut-associated neutrophils counted by two individuals (blue and orange) in a single experiment. 
An “X” indicates the mean value for each treatment. The asterisk denotes a significant difference 
determined by a t test; p values indicated on graphs. D-E) Hepatic immune response 3-5 hours 
after challenge with LPS via hindbrain injection in 6 dpf live fish with transgenes tnfα:GFP and 
mpx:mCherry (neutrophils). D) A representative maximum intensity projection of a mock-treated 
fish (vehicle) 4 hours post-injection (hpi) with the liver region used for analysis outlined with 
white dashes and neutrophils labeled with 1 μm spheres. E) Neutrophil counts in the liver of fish 
across two experiments 3-5 hpi with LPS-EK, lipid IVa, or vehicle (PBS). All fish were pre-
treated for 24 hours either with TLR4-specific inhibitor (TAK-242) or with volume-matched 
mock treatment (DMSO). F) HEK293T cells expressing either human or zebrafish TLR4 
complexes and treated with agonist (LPS-EK for human; lipid IVa for zebrafish) with or without 
pre-treatment with TLR4-specific inhibitor (TAK-242).  
 
We pretreated 5 dpf fish by submersion in either 8.3 μM TAK-242 in DMSO or an equal
volume of DMSO alone (final concentration of 0.003% DMSO) in embryonic media for 24
hours. The following day, 6 dpf fish were anesthetized and then injected with ~4.2 nL of 2.17 ng/nL
LPS-EK, lipid IVa, or ultrapure PBS control into the brain tectum. After injection, fish were
rinsed and recovered in fresh embryonic media until imaging. We imaged fish 3-6 hours
post-injection (hpi) using light sheet fluorescence microscopy. The fish strain used carried the
transgenes tnfα:GFP to report TNFα expression and mpx:mCherry to mark neutrophils. We
collected five fluorescent z-stacks over the liver region, spanning the full width of each
fish, to evaluate neutrophil infiltration of the liver. Fig 3.5D shows the maximum intensity
projection in GFP and mCherry channels for one z-stack of a mock-treated fish
imaged 4 hpi.
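The delivered LPS masses follow from volume times concentration, using 1 mg/mL = 1 ng/nL. The sketch below applies this to the 4.6 nL gavage of 1 mg/mL LPS and the ~4.2 nL injection of 2.17 ng/nL LPS, and compares the injection with the 5 ng tectal dose from the earlier study cited above. Because the injection volume is approximate, the injected mass is an estimate; the function name is hypothetical.

```python
def delivered_mass_ng(volume_nl, concentration_ng_per_nl):
    """Mass of LPS delivered (ng) = volume (nL) x concentration (ng/nL)."""
    return volume_nl * concentration_ng_per_nl


MG_PER_ML_IN_NG_PER_NL = 1.0  # 1 mg/mL = 10**6 ng per 10**6 nL = 1 ng/nL

gavage_ng = delivered_mass_ng(4.6, 1.0 * MG_PER_ML_IN_NG_PER_NL)  # 4.6 nL of 1 mg/mL
injection_ng = delivered_mass_ng(4.2, 2.17)                        # ~4.2 nL of 2.17 ng/nL
reference_ng = 5.0  # tectal injection dose used in the earlier study (ref. 189)

print(f"gavage: {gavage_ng:.1f} ng per fish")
print(f"injection: {injection_ng:.1f} ng per fish, "
      f"{injection_ng / reference_ng:.1f}x the {reference_ng:.0f} ng reference")
```

This gives about 4.6 ng per fish by gavage and about 9.1 ng per fish by injection, roughly 1.8 times the 5 ng dose previously shown to distribute systemically.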
We are currently analyzing these data and plan to compare the number of neutrophils
(pink) associated with the liver (white dashed line) between conditions. Our preliminary
observations in Fig 3.5E indicate no statistically significant differences between any
treatments. Visual inspection of data not yet analyzed suggests there might be increased neutrophil
infiltration of the liver in fish treated with standard-purification E. coli LPS ((L6)-LPS-EK) and
decreased infiltration in fish treated with (L4)-lipid IVa compared to other conditions. If these
observations are supported by our analysis, we would be interested in investigating whether
TAK-242 inhibits immune stimulation by standard-purification E. coli LPS to determine whether this
response is TLR4-dependent. We would also like to test whether ultrapure hexa-acylated LPS,
which lacks potential agonists of other immune receptors, is an agonist or antagonist of the
zebrafish immune response.191 This would help reconcile our conflicting in vitro and in vivo
observations of lipid IVa-induced TLR4 activation.
These investigations of the in vivo role of zebrafish TLR4 support the impression that the
human and zebrafish innate immune responses to LPS are quite different. However, we find it
striking that zebrafish and other species with low LPS sensitivity have retained the TLR4
complex genes for hundreds of millions of years if they do not use them. If
the lineage leading to zebrafish has not used TLR4 throughout evolution, then it is astonishing to
find that the zebrafish TLR4 and MD-2 sequences have not diverged enough to abolish
their ability to be activated by LPS. To understand this evolutionary history better, we
investigated whether there are evolutionary trends in TLR4 ligand specificity or sensitivity
across previously uncharacterized early-diverging vertebrate species.
 
Zebrafish and human TLR4 evolved from an ancestor with low LPS sensitivity 
To explore the sensitivity and specificity of TLR4 evolutionary intermediates between 
human and zebrafish, we selected key modern species and reconstructed ancestors to compare in 
a functional assay (Fig 3.6A). We used topiary182, an ancestral sequence reconstruction pipeline, 
to infer the most probable amino acid sequences of TLR4, MD-2 and CD14 from the ancestors of 
tetrapods (ancTetrapod), bony vertebrates (ancBonyVert), and teleost fish (ancTeleost) based on 
the protein sequences from hundreds of modern species. José Sánchez-Borbón, a fellow graduate
student in the Harms lab, has made significant progress in identifying candidate teleost CD14
proteins and performed the CD14 ancestral sequence reconstruction with these protein
sequences. I will not elaborate on his findings here, but we plan to co-author a manuscript
reporting our reconstructed ancestral data.
 
Figure 3.6. Characterization of TLR4 activity for reconstructed early vertebrate ancestors 
and modern fish and amphibian sequences. The phylogenetic tree on the left demonstrates 
evolutionary relationships between modern species (black) and ancestors (red) with their TLR4 
complex activities compared in the graph to the right. NF-κB activity measurements of the 
response to LPS-EK (dark blue), LPS-RS (orange), and lipid IVa (green) are shown stacked for 
each complex. Because pike and zebrafish are not known to have CD14, we included CD14 from 
human (h), mouse (m), chicken (c), or frog (f) to assess TLR4/MD-2 function. NF-κB activities 
from the teleost fish and the bony vertebrate ancestor complexes are shown relative to zebrafish 
TLR4/MD-2 with mouse CD14 in the presence of lipid IVa. Activities from all tetrapod 
complexes are shown normalized to the human TLR4 complex treated with LPS-EK. Human and
mouse complexes were treated with 0.1 ng/μL LPS-EK (!). All other treatments were done with 1
ng/μL of each LPS variant. No data were collected for caecilian TLR4 challenged with LPS-RS.
 
We provide a brief overview of our selection of modern species: The zebrafish is a teleost
species within the Otocephala clade. We selected the northern pike (Esox lucius) to represent
the sister teleost clade, Euteleostei. Teleost fish make up most of the modern ray-finned fish
(Actinopterygii). The sister lineage of ray-finned fish is lobe-finned fish (Sarcopterygii), and
together they form the clade of bony vertebrates (Euteleostomi). The descendants of lobe-finned
fish include Tetrapoda, which comprises Amphibia and Amniota. We selected two
amphibians, the African clawed frog (Xenopus laevis) and the two-lined caecilian (Rhinatrema
bivittatum), to represent two distinct clades of Amphibia: the Salientia and the Gymnophiona,
respectively. In the functional assay, we directly compare our findings to those of the human,
mouse, opossum, and chicken complexes, representative amniote complexes that have been well
studied elsewhere.192 Sequences for these species were used in the TLR4 and MD-2 ancestral sequence
reconstructions. For zebrafish and pike, which do not have CD14, we supplemented these 
complexes with a CD14 from a different organism in the functional assay. 
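To keep the relationships among the sampled species and the three reconstructed nodes straight, the tree can be encoded as a child-to-parent map and queried for the most recent common ancestor of any two taxa. This is only a minimal sketch for orientation, not the topiary tree: intermediate ranks are collapsed, and every internal label other than ancTeleost, ancTetrapod, and ancBonyVert (Amphibia, Theria, Eutheria, Amniota) is an ordinary clade name added here for illustration.

```python
# Simplified child -> parent map of the sampled taxa (intermediate ranks collapsed).
# Only ancTeleost, ancTetrapod, and ancBonyVert are reconstructed ancestors; the other
# internal labels are illustrative clade names.
PARENT = {
    "zebrafish": "ancTeleost",
    "pike": "ancTeleost",
    "frog": "Amphibia",
    "caecilian": "Amphibia",
    "human": "Eutheria",
    "mouse": "Eutheria",
    "Eutheria": "Theria",
    "opossum": "Theria",
    "Theria": "Amniota",
    "chicken": "Amniota",
    "Amphibia": "ancTetrapod",
    "Amniota": "ancTetrapod",
    "ancTeleost": "ancBonyVert",
    "ancTetrapod": "ancBonyVert",
}


def lineage(taxon: str) -> list[str]:
    """Return the path from a taxon up to the root (ancBonyVert)."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(a: str, b: str) -> str:
    """Most recent common ancestor: the first node on b's path that is also on a's path."""
    on_path_a = set(lineage(a))
    for node in lineage(b):
        if node in on_path_a:
            return node
    raise ValueError(f"{a} and {b} share no ancestor in this tree")


print(mrca("zebrafish", "pike"))  # ancTeleost
print(mrca("frog", "human"))      # ancTetrapod
print(mrca("pike", "chicken"))    # ancBonyVert
```

In this encoding, the comparison of any teleost with any tetrapod resolves to ancBonyVert, which is why that node is the natural starting point for tracing the human and zebrafish trajectories.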
The functional assay results indicate that low-level LPS sensitivity was present in the 
bony vertebrate and teleost ancestors and high sensitivity to LPS evolved before the tetrapod 
ancestor (Fig 3.6; red labels). All ancestral complexes show slight specificity for hexa- and 
penta-acylated LPS over tetra-acylated LPS. The pike TLR4/MD-2 complex did not show 
activity under any condition. This does not match what we found for zebrafish or ancTeleost, 
suggesting that the lineage leading to pike TLR4/MD-2 lost LPS recognition. For the amphibians, the
caecilian TLR4/MD-2/CD14 complex responds moderately to (L6)-LPS-EK and not at all to 
(L4)-lipid IVa. We have not yet tested its response to (L5)-LPS-RS. The frog TLR4/MD-2/CD14 
complex does not show activity in the presence of any ligand tested. These data suggest that 
amphibians have also experienced lineage-specific loss of function in TLR4 complexes.  
We wondered if complexes with low signal could alternatively be explained by a problem 
with our heterologous expression system. In addition to low NF-κB signal, we noticed that the 
HEK cells expressing these complexes often had low overall expression as measured by a 
reporter of constitutive protein expression. We considered that the structural interface of the TIR 
domain and immune signaling adaptors of early branching vertebrates may have diverged 
significantly from the human proteins expressed in HEK cells, and this mismatch could result in 
low immune stimulation as well as altered levels of protein expression. We tested whether 
creating hybrid TLR4s with human intracellular domains would allow us to better interrogate the 
ligand sensitivity and specificity of other species’ extracellular domains. We designed a vector 
with the human TLR4 signal peptide, transmembrane domain, and TIR domain into which we
could scarlessly insert the extracellular domain of any TLR4 between the signal peptide and
transmembrane domain.
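At the protein level, the hybrid design amounts to a simple splice: the human signal peptide, then the foreign ectodomain, then the human transmembrane and TIR domains. The sketch below assumes the domain boundaries are supplied as 0-based indices. The function name and the toy sequence are hypothetical placeholders, not the actual construct boundaries or the cloning procedure.

```python
def build_hybrid(host_tlr4: str, sp_end: int, tm_start: int, foreign_ecto: str) -> str:
    """Splice a foreign ectodomain between the host signal peptide and TM domain.

    host_tlr4:    full-length host (human) TLR4 protein sequence
    sp_end:       index just past the last signal-peptide residue (0-based, exclusive)
    tm_start:     index of the first transmembrane residue (0-based)
    foreign_ecto: mature ectodomain of the other species, without its own signal peptide
    """
    if not 0 < sp_end <= tm_start <= len(host_tlr4):
        raise ValueError("domain boundaries are out of order")
    signal_peptide = host_tlr4[:sp_end]   # kept from the host
    tm_and_tir = host_tlr4[tm_start:]     # kept from the host
    return signal_peptide + foreign_ecto + tm_and_tir


# Toy placeholder sequence: 4-residue "signal peptide", 6-residue "ectodomain", 8-residue "TM/TIR".
host = "MKSS" + "QQQQQQ" + "LLLLIWKK"
print(build_hybrid(host, 4, 10, "NNNN"))  # MKSSNNNNLLLLIWKK
```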
This strategy also lent itself to the investigation of the TLR4 and MD-2 paralogs,
CD180 and MD-1. CD180 (also known as RP105) does not have a TIR domain but forms a 
complex with MD-1 that is able to bind lipid A and induce a distinct TLR heterodimer 
complex.193 CD180/MD-1 is thought to regulate LPS-induced TLR4 signaling.163,194,195 
Exploring the LPS specificity of CD180/MD-1 by providing a measurable output for ligand 
recognition would give us insight into the early evolution of TLR4/MD-2. 
We generated plasmids with the extracellular domains (ectodomains) of two zebrafish 
TLR4 ohnologs, tlr4ba and tlr4al, and zebrafish CD180 attached to the transmembrane and TIR 
domains of human TLR4. We transfected these hybrid plasmids with their corresponding 
coreceptors, MD-2 for TLR4s and MD-1 for CD180, as well as mouse CD14. We measured the 
relative NF-κB activity induced by hybrid zebrafish receptors in response to LPS-EK and lipid 
IVa (Fig 3.7A). Our results show similar LPS specificity of zebrafish tlr4ba in the context of 
zebrafish and human transmembrane and TIR domains, perhaps with slightly decreased 
sensitivity to LPS-EK when in the human context. Zebrafish tlr4al and CD180 hybrid proteins 
show minimal, if any, activation above background.  
 
Figure 3.7. Hybridizing the human TLR4 transmembrane and TIR domains to other 
species’ extracellular domains does not reveal function. NF-κB activity of hybrid TLR4 and
CD180 proteins with the human transmembrane and TIR domains (hum-TM/TIR) in response to 
LPS-EK and lipid IVa. A) On the right, the ectodomains of zebrafish TLR4 ohnologs tlr4ba and 
tlr4al (zf tlr4ba-ecto and zf tlr4al-ecto, respectively) as well as the TLR4 paralog, CD180, from 
zebrafish (zf CD180-ecto) were expressed with the human transmembrane and TIR domains. B)
The transmembrane helical register was shifted for human TLR4 and the frog hybrid proteins by 
adding or removing amino acids from the interface of the transmembrane and ectodomain. These 
are displayed from left to right as the longest to shortest helix (TM(+/- # amino acids)). The frog 
helical register mutants were expressed with frog CD14. Wild-type zebrafish, human, and frog 
TLR4s are included as controls. Error bars indicate standard deviation. 
 
We also created a hybrid frog TLR4 ectodomain/human transmembrane and TIR domain 
protein and found it did not show activity in the presence of frog or human CD14 (Fig 3.7B). To 
test if our hybrid proteins were nonfunctional because of an improper relative orientation of the 
ectodomain to TIR domain, Corinthia Brown made mutants of both human and hybrid frog 
TLR4s to shift the helical register of the transmembrane domains. There are 3.6 amino acid
residues per turn of an alpha helix, so each additional residue confers a 100° rotation around
the helical axis. We constructed mutants to sample each step of at least one full rotation of the
helical transmembrane domain, allowing all possible relative orientations of the extracellular to
intracellular domains. When we tested these proteins in the functional assay, we found that
human TLR4 can accommodate a rotation of at least 200° when amino acids are added at the
interface between the transmembrane and extracellular domains (Fig 3.7B). It can only tolerate a
100° rotation into the membrane, as seen by a loss of function when 2 amino acids are removed.
Overall, the human TLR4 helical register mutant data indicate that human TLR4 can assume any
relative orientation of the extracellular and intracellular domains without losing function. No
mutant of the frog TLR4 hybrid induced NF-κB activity (Fig 3.7B). We conclude that the frog
TLR4/MD-2 complex is unable to mount an immune response to LPS.
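The register arithmetic above (3.6 residues per turn, so 360° / 3.6 = 100° per residue) can be tabulated for each TM(±n) mutant. This is a minimal sketch that assumes positive n means residues added at the ectodomain/TM interface. The sign convention is arbitrary, and the tolerance results from Fig 3.7B are not encoded here.

```python
RESIDUES_PER_TURN = 3.6
DEGREES_PER_RESIDUE = 360 / RESIDUES_PER_TURN  # 100 degrees per residue


def register_rotation(residues_added: int) -> float:
    """Net rotation (degrees) from adding (+) or removing (-) residues at the junction."""
    return residues_added * DEGREES_PER_RESIDUE


def relative_orientation(residues_added: int) -> float:
    """Position on the 0-360 degree circle; rounding absorbs floating-point error."""
    return round(register_rotation(residues_added) % 360, 6)


for n in range(-2, 3):
    print(f"TM({n:+d}): rotation {register_rotation(n):+.0f} deg, "
          f"orientation {relative_orientation(n):.0f} deg")
```

For example, removing two residues (−200°) lands at the same point on the circle as a +160° rotation. This is why a set of small insertions and deletions is enough to sample roughly one full turn.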
Overall, these hybrid TLR4 experiments indicated that expressing non-human TLR4
complexes, with their native intracellular domains, in human HEK cells does not drastically
dampen NF-κB signaling. We did note some improvement in the overall level of protein
expression in cells expressing the hybrid forms of TLR4 (data not shown). However, this
improvement only increased our confidence in the measurements of low NF-κB signal.
Therefore, we did not further investigate other low-sensitivity complexes under these conditions.
 
Initial investigations into possible CD14 functional homologs in fish 
We worked to identify a molecule that could serve an LPS-transport role like CD14. 
CD14 is known to catalyze the delivery of LPS to the TLR4/MD-2 complex in mammals. It 
retrieves monomeric LPS from the extracellular space, usually with the help of the LPS-binding 
protein (LBP) that extracts LPS from the outer membranes of Gram-negative bacteria, and then 
transfers LPS to the TLR4/MD-2 complex. At the time of these experiments, CD14 was believed 
not to be present in fish; phylogenetic studies suggested CD14 evolved in tetrapods before the 
divergence of amphibians and amniotes. In this section of the chapter, I present several
preliminary investigations, which I led, into possible LPS chaperone candidates in fish.
These experiments paved the way for current research being conducted by José Sánchez-Borbón, 
a fellow graduate student in the Harms lab. José and I are preparing a manuscript that consists of 
findings previously reported in this chapter as well as José’s discoveries from digging further 
into the evolutionary history of fish CD14.  
The following preliminary experiments were done while still optimizing our in vitro 
assay for studying the ligand specificity of zebrafish TLR4. Unfortunately, this has made direct 
comparisons between experiments difficult, but I will highlight important experimental 
modifications. Critical details that were altered include: 1) The amount of TLR4, MD-2, and 
CD14 plasmids transfected into cells. We noticed this contributed significantly to signal strength. 
The transfection amount will always be presented as TLR4:MD-2:CD14 with “1” equal to 1 ng 
of plasmid transfected per well. Transfections with human TLR4 were consistently at 10:0.5:1, as 
established previously.102 2) The concentration of LPS used to treat cells. Most
experiments include a control in which human TLR4 was treated with 0.2 or 0.1 ng/μL
(L6)-LPS-EK; in a few instances, this control was not included. All other treatments were
applied with 2 or 1 ng/μL LPS. 3) The CD14 used to catalyze the zebrafish TLR4 response to
LPS. As I will describe below, no CD14 molecule has been found in zebrafish, but including
CD14 drastically improves in vitro zebrafish TLR4 signaling.101
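Because transfection amounts vary between experiments, it helps to decode the TLR4:MD-2:CD14 notation, where "1" equals 1 ng of plasmid per well, the same way every time. The sketch below is a hypothetical helper. It assumes the default component order given above and accepts an explicit order for other pairs such as CD180:MD-1. It counts only the plasmids named in the notation, not any reporter plasmids.

```python
DEFAULT_ORDER = ("TLR4", "MD-2", "CD14")


def parse_transfection_ratio(ratio: str, order=DEFAULT_ORDER) -> dict[str, float]:
    """Convert notation such as '10:0.5:1' into ng of each named plasmid per well."""
    amounts = [float(x) for x in ratio.split(":")]
    if len(amounts) > len(order):
        raise ValueError("more amounts than plasmid labels")
    return dict(zip(order, amounts))


def total_ng_per_well(ratio: str, order=DEFAULT_ORDER) -> float:
    """Total ng per well of the plasmids named in the notation."""
    return sum(parse_transfection_ratio(ratio, order).values())


print(parse_transfection_ratio("10:0.5:1"))  # {'TLR4': 10.0, 'MD-2': 0.5, 'CD14': 1.0}
print(parse_transfection_ratio("15:15", order=("CD180", "MD-1")))
```

A two-field ratio such as the 25:20 used for zebrafish TLR4/MD-2 without CD14 maps to only the first two labels.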
Our previous experiments used human or mouse CD14 as these were reported to allow 
zebrafish tlr4ba signaling in vitro.101 Now that we had a potent agonist of zebrafish tlr4ba, we 
wondered if we could see activation in the absence of CD14. Our preliminary investigations 
showed no signal from zebrafish tlr4ba/MD-2 in the absence of CD14 (Fig 3.8A). This was 
probably due to using low and variable concentrations of lipid IVa around 0.2 ng/μL. Signal 
amplitude and reproducibility were greatly improved in later experiments by transfecting cells 
with a plasmid ratio of 10:20:1 (mouse CD14) and treating with 1 ng/µL lipid IVa that was 
ultrasonicated to disrupt micelles prior to the experiment (Fig 3.8B). José Sánchez-Borbón first 
noticed that zebrafish tlr4ba/MD-2 exerts low-level NF-κB activation independent of CD14 and 
is currently following up on it. 
We considered the possibility that the chaperone role of CD14 could be achieved through 
some other zebrafish protein. CD14 is a gene duplicate of vertebrate TLR2, which is present in 
fish. Like TLR4, TLR2 is an innate immune pattern recognition receptor. In mammals, TLR2 
binds a wide variety of bacterially associated ligands including acylated lipopeptides and 
peptidoglycan but does not initiate the inflammatory response to LPS.196 It can also 
heterodimerize with other TLRs to initiate immune responses. We hypothesized that CD14’s 
ability to chaperone LPS could stem from an ancestral trait of the TLR2-CD14 ancestor, and 
perhaps the fish lineage maintained this function in their TLR2. We did not find this to be the 
case (Fig 3.8A). We transfected HEK293T cells with increasing doses of human CD14 or 
zebrafish TLR2 plasmid and found that all concentrations of CD14 promoted tlr4ba activation, 
but no amount of zebrafish TLR2 plasmid conferred LPS sensitivity. We did not test whether
TLR2 was expressed on the cell surface; therefore, we have not shown here that zebrafish
TLR2 lacks LPS-chaperoning ability, although our data suggest that such a role is unlikely.
We next tested whether CD180 with or without MD-1 could act as an LPS chaperone. 
The CD180/MD-1 complex is able to bind lipid A and is implicated in LPS-induced TLR4 
signaling.163,194,195 We transfected cells with combinations of human or zebrafish TLR4, MD-2, 
CD180, MD-1 and CD14 in the presence of LPS-EK and lipid IVa (Fig 3.8C). Our data indicate 
a possible role for human CD180/MD-1 to activate TLR4 in the absence of MD-2, but zebrafish 
TLR4 is not activated in any condition without both MD-2 and CD14. 
 
Figure 3.8. Zebrafish TLR2, CD180/MD-1, and human transferrin do not rescue the 
zebrafish TLR4 response to lipid IVa in the absence of CD14. A) NF-κB activation of 
zebrafish TLR4 in response to lipid IVa or LPS-EK in the presence of candidate LPS chaperone
molecules. We tested zebrafish TLR2’s ability to catalyze LPS delivery to zebrafish TLR4. 
Zebrafish TLR4/MD-2 was transfected at 25:20 ng plasmid/well with human CD14 or zebrafish 
TLR2 at plasmid concentrations of 0, 1, 5, or 10 ng per well. Cells were challenged with 0.2 
ng/μL lipid IVa or LPS-EK. For comparison, human TLR4 with and without CD14 is shown on 
the far right. Human TLR4 was treated with 0.002 ng/μL LPS variants. B) Zebrafish TLR4/MD-
2 challenged with high dose (1 ng/μL) lipid IVa in the presence or absence of CD14. In addition 
to increasing concentration, lipid IVa was ultrasonicated before treatment and cells were 
transfected with 10:20 TLR4:MD-2 with or without 1 ng mouse CD14 per well. C) Testing 
human and zebrafish CD180/MD-1 for LPS chaperone abilities. Zebrafish complex transfected at 
25:20:1, zebrafish CD180:MD-1 transfected at 15:15, and human CD180:MD-1 at 10:10. Human 
proteins were treated with 0.2 ng/μL LPS variants, whereas zebrafish proteins were treated with
2 ng/μL LPS. For experiments in panels A-C, NF-κB activity level was buffer-subtracted and is 
shown normalized by species: human TLR4 signal is relative to human TLR4 treated with LPS-
EK; zebrafish TLR4 signal is relative to its signal in the presence of lipid IVa. D) Testing if 
transferrin can act as an LPS chaperone for zebrafish TLR4. We treated zebrafish tlr4ba/MD-2 
with 1 ng/μL lipid IVa in the presence of full length and proteinase K-digested transferrin 
peptides (green). Digestion products are displayed in increasing concentration from left to right 
(gradient). A value of “1” is equal to 71.5 nM of transferrin starting material that was 
subsequently proteolyzed. For comparison, human TLR4/MD-2/CD14 and zebrafish TLR4/MD-
2 with mouse CD14 were treated with LPS-EK and lipid IVa, respectively. For this experiment, 
the plasmid ratio for zebrafish tlr4ba:MD-2:mouse CD14 was 10:20:1 ng/well. NF-κB activity 
level was buffer-subtracted and all data normalized to human TLR4 signal in the presence of 0.1 
ng/μL LPS-EK. 
  
The final candidate we probed for an LPS delivery role was transferrin. Transferrin is an 
iron-binding molecule that can bind the lipid A moiety of LPS and is implicated in sequestering 
iron from microbes. Macrophages from goldfish, a teleost closely related to zebrafish, express
transferrin and transferrin proteases in response to the presence of microbes.197 Cleavage
products of transferrin have been shown to enable the activation of goldfish macrophage immune
responses in the presence of LPS.198,199 One of these cleavage products shares sequence
similarity with the LPS-binding site of mouse CD14. In addition, transferrin, likely when bound
to the transferrin receptor, colocalizes with TLR4 during LPS-induced TLR4 endocytosis, which
is essential for robust immune activation by TLR4.200 Together, this evidence suggested that
transferrin could serve, at least incidentally, the LPS-transport role in which CD14 has become
specialized. We assessed this possibility using full-length or proteinase K-digested human
transferrin and tested for CD14-like functions in vitro.
Our results suggest that full-length transferrin has some capacity to help lipid IVa
get to TLR4 (Fig 3.8D). This can be seen by comparing the lipid IVa-induced activity of 
zebrafish TLR4 in the presence of full-length transferrin (second green bar) to the control 
without transferrin (third green bar: “0”). This is roughly a 3-fold increase (0.09 to 0.28 units). 
Yet CD14 enhances signaling ~5.5 times more strongly. While working on this, it seemed
that transferrin and/or proteinase K would sometimes cause the cells to form liquid-like droplets 
on the plate surface. This was more extreme at higher concentrations. Although perhaps 
interesting, we did not pursue this further. 
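The values quoted above are buffer-subtracted, normalized NF-κB readings, and the ~3-fold figure is simply their ratio. The sketch below shows one way such numbers could be produced from replicate wells. The details are assumptions, not the protocol actually used: the buffer control is averaged before subtraction, the reference condition (for example, human TLR4 with LPS-EK) is buffer-subtracted in the same way, and the error is reported as the standard deviation across replicates.

```python
from statistics import mean, stdev


def normalized_activity(sample, buffer, reference, reference_buffer):
    """Buffer-subtract replicate wells and scale them to a reference condition.

    sample, buffer:              replicate readings for the test complex (ligand / buffer only)
    reference, reference_buffer: replicate readings for the normalization condition
    Returns (mean, standard deviation) of the normalized replicates.
    """
    background = mean(buffer)
    reference_signal = mean(reference) - mean(reference_buffer)
    values = [(x - background) / reference_signal for x in sample]
    return mean(values), stdev(values)


def fold_change(treated: float, untreated: float) -> float:
    return treated / untreated


# Normalized values reported above: zebrafish TLR4 + lipid IVa with vs. without transferrin.
print(round(fold_change(0.28, 0.09), 2))  # 3.11
```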
Overall, we have learned from these experiments that zebrafish TLR2 and transferrin 
probably do not serve a role analogous to that of CD14 in zebrafish. However, this work established a
basis for José’s future functional characterizations of other candidate fish CD14-like proteins.  
  
DISCUSSION 
 Extrapolating findings from studies in model organisms to human biology hinges on our 
ability to define homology between species. We have shown here that the zebrafish innate 
immune response to LPS differs from that of humans. We are optimistic that the root of these
differences can be revealed by the functional characterization of homologous proteins in vitro 
and in vivo. We have shown that ancestral sequence reconstruction is an excellent tool that can be 
applied to the interrogation of evolutionary trajectories between zebrafish and human proteins. 
We are left with several future directions to explore the zebrafish immune response to LPS, the 
role of zebrafish TLR4, and the origin of LPS recognition and specificity of TLR4/MD-2/CD14 
complexes. 
  
Zebrafish TLR4 ohnologs might play important physiological roles 
 Our study began with the observation that one of the three zebrafish TLR4 ohnologs, 
tlr4ba, can induce a robust inflammatory response when challenged with tetra-acylated LPS in a 
human cell-based assay (Fig 3.2 & 3.3). We observed that it may be possible for 
heterocomplexes of TLR4 ohnologs to amplify the immune response to LPS (Fig 3.3B). 
Heterocomplexes may play an important role in ligand sensitivity in the zebrafish. The three 
ohnologs of TLR4 likely arose from unequal gene gain and loss throughout multiple rounds of 
whole genome duplications in the lineage leading to zebrafish.201-202 Gene duplications can lead 
to functional diversification of genes, including gain or loss of function, or subfunctionalization 
of the original function between the duplicate proteins.203 These three genes have been 
maintained in the zebrafish genome for hundreds of millions of years and are all expressed at 
least in developing larval fish tissues.101 We think it would be an interesting avenue of 
investigation to uncover what these proteins do for zebrafish physiology and health.202 
 
Possible functional roles of the zebrafish MD-2 C-terminal peptide 
We initially predicted that the zebrafish C-terminal peptide played a role in determining 
the number of acyl chains that fit into the MD-2 binding pocket. Our data show that it is more
complicated than this: the peptide is specifically necessary for zebrafish TLR4 to mount
an immune response to LPS (Fig 3.4D-E). We consider that the peptide might stabilize the 
dimerization interface of zebrafish TLR4 in a way that is unnecessary for human TLR4. It has 
been reported that when (L6)-LPS packs its acyl chains into the human MD-2 binding pocket, 
the lipid A diglucosamine backbone is displaced upwards by ~5Å relative to lipid IVa-bound 
structures. This positions the phosphate groups such that they can interact with the positively 
charged residues of two dimerizing TLR4s (Park et al., 2009).42 Perhaps there is also some 
unique structural feature of the zebrafish TLR4 dimerization interface that requires the C-
terminal peptide to correctly position the diglucosamine backbone and phosphates for productive 
dimerization. This interaction could be between TLR4 and LPS or between TLR4 and the LPS-
bound peptide.  
A similar but alternative hypothesis is that the peptide serves as part of the hydrophobic 
core with LPS acyl chains to form the dimerization interface. Ohto et al. proposed from their 
crystal structures that for productive TLR4 signaling, amino acids at the opening of the MD-2 
beta-cup must interact with at least a partial LPS acyl chain to form the dimerization interface 
between two TLR4/MD-2/LPS complexes.45 For dimerizing mouse TLR4/MD-2/lipid IVa 
complexes, the MD-2 Phe126 side chain shifts more than 4 Å toward the MD-2 cavity, where the
Phe126 loop interacts with another MD-2 loop and a partially exposed chain of LPS to form the 
hydrophobic core of the dimerization interface. This core interacts with the hydrophobic patch on 
the dimerizing TLR4. Perhaps the zebrafish MD-2 peptide has evolved to serve this role at the 
dimerization interface. The peptide is positioned at the outer edge of the MD-2 beta-cup opening 
and has two phenylalanines that could potentially be involved in dimerization like the mouse 
MD-2 Phe126 loop. 
Understanding the functional role of the cyprinid MD-2 C-terminal extension would help 
clarify how this protein has evolved, and potentially why in vitro zebrafish TLR4/MD-2 exhibits 
contrasting specificity to the human complex. Our next steps would be to gain more detailed 
structural information about zebrafish and other cyprinid species TLR4/MD-2s. Did other 
cyprinid species evolve to recognize LPS, and if so, are their function and their ancestral states
also dependent on the C-terminal peptide? We could alternatively make mutations to zebrafish 
TLR4 at sites that are predicted to interface with the peptide during dimerization. This kind of 
mutational analysis would benefit from structural simulations of dimerization or ligand binding. 
In conclusion, there are many ways to continue probing the difference between human and 
zebrafish TLR4 in vitro. 
 
Is TLR4 used in the zebrafish innate immune response to Gram-negative bacteria? 
Our in vivo results suggest that larval 6 dpf fish do not recognize or respond to TLR4 
agonists. It is unclear whether previously reported immune responses to LPS prepared by
standard purification methods are mediated by TLR4, TLR2, or a redundant LPS-sensing
pathway. RNAseq data from 5 dpf fish showed that a small subset of immune cells upregulates
expression of zebrafish TLR4 ohnologs and MD-2. It is possible, then, that our 6 dpf studies were too early in fish
development to assess functional consequences of TLR4 in immunity if this defense mechanism 
is still maturing. Or perhaps lipid IVa is not a strong agonist of zebrafish TLR4 in vivo. 
Alternatively, maybe zebrafish TLR4 can bind lipid IVa in vivo but there are other regulatory 
mechanisms to dampen the immune response. Zebrafish have been shown to express intestinal 
alkaline phosphatase which detoxifies LPS in their gut and prevents intestinal inflammation in 
response to the gut microbiota.186 We considered that our microgavage experiment may have 
been affected by the dephosphorylation of LPS in the fish intestine with an intact microbiota. 
The inflammasome might serve as an alternate LPS-sensing pathway in zebrafish. 
Inflammasomes are cytosolic multiprotein complexes that regulate the immune response to 
intracellular danger signals much like TLRs at the extracellular surface. Caspase-11-deficient 
mice are resistant to LPS-induced sepsis, implying that caspase-11 participates in host response 
to LPS.204 Caspase-4 and caspase-5 in humans and the ortholog in mice, caspase-11, are 
activated by direct binding to intracellular LPS.205 Much like human TLR4/MD-2 activation, 
underacylated lipid IVa and (L5)-LPS-RS were shown to bind to caspase-4/11 but could not 
induce oligomerization and activation of the inflammasome.204–206 This LPS-binding was 
mediated by the CARD domain of the caspases.205 Zebrafish have homologs but not direct 
orthologs of mammalian caspases. One of these homologs, caspy2 (or Casb), when 
overexpressed in HEK293T cells can directly bind LPS via the N-terminal pyrin death domain, 
resulting in caspy2 oligomerization which is necessary for pyroptosis.207 Knockdown of caspy2 
expression protects larvae from lethal sepsis.207 However, LPS needs to be delivered to the 
cytosol in order for caspy2 to bind to it. This happens naturally during an infection due to 
bacterial effector proteins that can inject LPS into host cells. It is possible that this LPS-sensing 
role of the inflammasome protein caspy2 has taken over the responsibility of Gram-negative 
bacterial detection in zebrafish. It would be interesting to investigate whether TLR4 plays a role 
in priming zebrafish immune cells to upregulate the expression of caspy2 for LPS recognition, or 
in regulating the delivery of LPS to the intracellular space. 
 
Further probing ancestral complexes will help us understand TLR4 ligand responses 
The reconstructed bony vertebrate and teleost ancestors showed low-level LPS 
sensitivity. This makes sense given the lack of signal we see for the pike, frog, and zebrafish 
tlr4bb and tlr4al complexes, as well as the low sensitivity of zebrafish tlr4ba to LPS variants with 
more than four acyl chains and the caecilian TLR4 to hexa-acylated LPS. However, a lack of
signal is weaker evidence for functional similarity than a positive signal.
Ancestral sequence reconstruction relies on several assumptions about how the sequences in a
dataset evolved. We cannot access every sequence change along an evolutionary trajectory,
and therefore we rely on complex models to infer the most likely series of mutational events
connecting modern sequences. Because of this, it is likely that the reconstructed sequences
do not reflect the exact ancestral states of the protein. Posterior probability is a
useful statistical measurement used to indicate our uncertainty in the amino acid call at each 
position along a reconstructed protein. If we were completely confident in our reconstruction, 
meaning there was only one residue at each position with high probability, the average posterior 
probability across the entire sequence would be 1.0. Our reconstructed sequences for the tetrapod 
ancestor (ancTetrapod) TLR4, MD-2, and CD14 have average posterior probabilities of 0.881, 
0.854, and 0.955, respectively. The average posterior probability for the ancestor of bony 
vertebrates (ancBonyVert) complex was 0.806, 0.798, and 0.929, respectively. And the teleost 
ancestor (ancTeleost) yielded average posterior probabilities of 0.885, 0.774, and 0.926 for 
TLR4, MD-2 and CD14, respectively. Typically, ancestral proteins have been shown to exhibit 
function when they have posterior probabilities > 0.85.  
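The average posterior probability is the mean, across sites, of the posterior probability of the residue chosen at each site. For the maximum likelihood sequence, that residue is the most probable state. The sketch below uses toy per-site distributions. It assumes the posteriors are already available as one dictionary per site and does not attempt to read topiary's actual output files. How gapped or indel sites are treated in the reported averages is not modeled.

```python
def ml_reconstruction(site_posteriors: list[dict[str, float]]) -> tuple[str, float]:
    """Return the maximum likelihood sequence and its average posterior probability."""
    ml_residues, best_pps = [], []
    for site in site_posteriors:
        residue, pp = max(site.items(), key=lambda item: item[1])  # most probable state
        ml_residues.append(residue)
        best_pps.append(pp)
    return "".join(ml_residues), sum(best_pps) / len(best_pps)


# Toy three-site ancestor (not real data).
toy_sites = [
    {"M": 1.00},
    {"K": 0.60, "R": 0.40},
    {"L": 0.90, "I": 0.10},
]
seq, avg_pp = ml_reconstruction(toy_sites)
print(seq, round(avg_pp, 3))  # MKL 0.833
```

By the > 0.85 rule of thumb quoted above, this toy ancestor would fall in the same borderline range as the reconstructed MD-2 sequences.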
It is possible that the ancestral complexes show low activity due to borderline poor-
quality reconstructions of TLR4 and MD-2 sequences. However, we argue that there are far more 
ways to break a protein’s function than there are to maintain it. A good next step would be to 
resurrect and characterize the altAll ancestors—ancestral sequences with every ambiguous amino 
acid substituted with the next most probable alternate. These proteins would serve as the “worst 
case” scenario if the reconstruction chose the wrong amino acid at every ambiguous site. If the 
maximum likelihood and altAll ancestors have the same function, then the function is robust to 
uncertainty in the reconstruction and likely reflects the protein’s ancestral state.76,115,134–138 
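An altAll sequence can be derived from the same per-site posteriors by swapping in the second most probable residue wherever that alternate is plausible. This minimal sketch assumes a plausibility cutoff of 0.20 for the alternate state, a commonly used convention that is labeled here as an assumption rather than the threshold applied in this work.

```python
ALT_THRESHOLD = 0.20  # assumption: minimum posterior for an alternate state to count


def altall_sequence(site_posteriors: list[dict[str, float]],
                    threshold: float = ALT_THRESHOLD) -> str:
    """Swap in the second most probable residue wherever it clears the threshold."""
    residues = []
    for site in site_posteriors:
        ranked = sorted(site.items(), key=lambda item: item[1], reverse=True)
        if len(ranked) > 1 and ranked[1][1] >= threshold:
            residues.append(ranked[1][0])  # ambiguous site: take the alternate
        else:
            residues.append(ranked[0][0])  # unambiguous site: keep the ML residue
    return "".join(residues)


# Toy three-site ancestor (not real data); the ML sequence would be "MKL".
toy_sites = [{"M": 1.00}, {"K": 0.60, "R": 0.40}, {"L": 0.90, "I": 0.10}]
print(altall_sequence(toy_sites))  # MRL
```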
We would also be interested in doing further investigation into the biochemical changes 
that have occurred throughout these evolutionary intermediates that confer different LPS 
sensitivity and specificity. We would better understand the mechanism of TLR4/MD-2/CD14 
ligand responses if we were to compare these ancestors at a deeper level. 
 
Did zebrafish lose CD14 as a mechanism to avoid LPS toxicity? 
 A major hindrance to our investigations of zebrafish TLR4 function has been the elusive 
CD14-like protein. We have investigated several candidate molecules (TLR2, CD180/MD-1, and 
transferrin) that exist in zebrafish, can bind LPS, and have been shown to be involved in immune 
responses. However, none of these proteins could catalyze TLR4/MD-2 ligand-induced activity 
like CD14 in our functional assays. It is possible that zebrafish do not have a CD14-like protein 
and that could explain their low sensitivity to LPS even if zebrafish TLR4/MD-2 can mount an 
immune response. Currently, José Sánchez-Borbón is taking point on investigating the 
evolutionary history of CD14 and has begun to reveal when it became a functional part of the 
TLR4 complex. José has identified that most fish species either have TLR4/MD-2 or a proto-
CD14 molecule, but not all three proteins. If this is true, then perhaps fish lost the ability to sense 
LPS via TLR4 by selective pressure to evade sepsis-like diseases, or alternatively they have not 
needed the full complex to deal with infections by Gram-negative bacteria. 
 In conclusion, we have leveraged ancestral sequence reconstruction, cell-based and 
organismal functional assays, and mutational analyses to probe the evolutionary divergence 
between human and zebrafish TLR4 complex structure, function, and role in the innate immune 
response to LPS. We find that although human and zebrafish TLR4/MD-2 share many properties, 
like the ability to bind LPS and stimulate an inflammatory response, there are many 
dissimilarities after 430 million years of evolutionary separation in the way these homologous 
proteins are activated by their ligand and how they are involved in the immune response. Much 
more work remains to understand the role of zebrafish TLR4, whether the zebrafish can be used
to model TLR4-induced inflammation, and why the tlr4ba ohnolog has evolved the unique 
ability to specifically recognize tetra-acylated LPS. 
 
MATERIALS AND METHODS 
Ancestral sequence reconstruction 
We reconstructed ancestral sequences using the topiary pipeline available on GitHub 
(https://github.com/harmslab/topiary).182 The multiple sequence alignments generated by the first 
stage of the topiary script were manually edited to remove ambiguous sequences and gene 
duplicates using AliView software.152 Sequences from key species were added to the alignment 
to increase taxonomic sampling when it was lacking. At this point, the signal peptide for every 
sequence in the alignment was predicted by SignalP 6.0 (ref. 208) and then removed from the
alignment before being fed back into stage 2 of topiary to perform the ancestral inference. TLR4
and MD-2 sequence alignments, reconstructed ancestral sequences, and phylogenetic trees built 
during ASR are available in the supplement files. The Jones-Taylor-Thornton (JTT) substitution 
model was used in the MD-2/MD-1 ancestral inference. The maximum likelihood species-
reconciled gene tree for MD-2/MD-1 aligned fairly well with the species tree except that it 
placed two duplication events within the early branches of the MD-1 clade and placed several 
amphibians outside of fish. This was probably due to high variability in amphibian MD-1 
sequences in the alignment. The MD-2 clade, however, aligns well with the species tree. The 
MD-2 ancestors we characterized, ancBonyVert (anc88), ancTeleost (anc87), and
ancTetrapod (anc76), had average sequence posterior probabilities of 0.798, 0.774, and 0.854,
respectively. Bootstrap sampling of the gene-species tree resulted in high branch supports for 
these nodes. The TLR4/CD180 ancestral inference used the JTT general amino acid exchange 
rate matrix and discrete Gamma model with 8 rate categories (JTT+G8). The reconciled gene-species
tree agrees well with the species tree. Duplication events were also inferred early
in both the TLR4 and CD180 clades, suggesting that fish TLR4 and CD180 have significantly diverged
from their tetrapod homologs. The TLR4 ancBonyVert (anc345), ancTeleost (anc57), and 
ancTetrapod (anc334) reconstructions had posterior probabilities of 0.806, 0.885, and 0.881, 
respectively. Bootstrap analysis for the TLR4/CD180 tree is not complete, so we do not yet know 
the branch supports at ancestral nodes. Both maximum likelihood and altAll sequences were 
reconstructed for every ancestor. We used the maximum likelihood ancestral sequences for 
functional characterization. 
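The average sequence posterior probabilities reported above summarize how confidently each ancestor is reconstructed, and the altAll sequences test whether conclusions are robust to that uncertainty. The sketch below shows one way such summaries can be computed from per-site posterior distributions. It assumes a simplified, hypothetical input (one dictionary per alignment column mapping amino acids to posterior probabilities), not topiary's actual output format, and the 0.2 cutoff for swapping in an alternative state is an assumed value that is not stated in this chapter.

```python
"""Summarize a reconstructed ancestor from per-site posterior probabilities.

Hypothetical input format: a list with one dict per alignment column,
mapping amino acid -> posterior probability. This is not topiary's file format.
"""
from statistics import mean


def ml_sequence(site_posteriors):
    """Maximum likelihood ancestor: the most probable state at every site."""
    return "".join(max(site, key=site.get) for site in site_posteriors)


def average_posterior(site_posteriors):
    """Posterior probability of the ML state, averaged over all sites."""
    return mean(max(site.values()) for site in site_posteriors)


def alt_all_sequence(site_posteriors, threshold=0.2):
    """altAll-style sequence: at each site, use the second most probable state
    if its posterior exceeds `threshold` (0.2 is an assumed cutoff)."""
    residues = []
    for site in site_posteriors:
        ranked = sorted(site, key=site.get, reverse=True)
        if len(ranked) > 1 and site[ranked[1]] > threshold:
            residues.append(ranked[1])
        else:
            residues.append(ranked[0])
    return "".join(residues)


# Toy three-site ancestor (illustrative values only).
sites = [
    {"M": 0.99, "L": 0.01},
    {"K": 0.55, "R": 0.40, "Q": 0.05},
    {"A": 0.90, "S": 0.10},
]
print(ml_sequence(sites), alt_all_sequence(sites), round(average_posterior(sites), 3))
# prints: MKA MRA 0.813
```

Only the second site is ambiguous enough to change in the altAll-style sequence, which is the intended behavior: well-supported sites keep their maximum likelihood state.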
 
Plasmids 
 Ancestral gene sequences were human codon optimized and synthesized by GeneWiz 
(Azenta) in a pcDNA3.1(+) backbone without the T7 promoter. TLR4, MD-2, and CD14 
ancestral sequences without native signal peptides were flanked upstream by the CMV enhancer 
and promoter, a Kozak sequence, the human signal peptide, and a FLAG tag, and then flanked 
downstream by two stop codons. 
 Most mammalian expression plasmids were already available in house. Pike TLR4 and MD-2
genes were human codon optimized and then synthesized by GenScript in the pcDNA3.1(+)
vector with a Kozak sequence upstream of the start codon. Zebrafish tlr4al (Accession No.:
NM_001328605.1), cd180 (Accession No.: NM_001310490.1), ly86 (md-1) (Accession No.:
NM_001310488.1), and human CD180 (Accession No.: NM_005582.3) and LY86 (MD-1)
(Accession No.: NM_004271.4) were cloned by GenScript into the pcDNA3.1(+) expression
vector with an upstream Kozak sequence. In-house cloning was performed using SLIC and KLD
mutagenesis kits. 
 
Cell culture and transfection conditions 
We followed well-established protocols for transient transfection using the Dual-Glo 
Luciferase Assay System (Promega).102 Human embryonic kidney cells (HEK293T/17,
American Type Culture Collection CRL-11268) were maintained for up to 30 passages in DMEM
supplemented with 10% FBS at 37 °C with 5% CO2. For each transfection, a confluent 100 mm
plate of cells was treated at room temperature with 0.25% Trypsin-EDTA in HBSS, and the cells
were resuspended in DMEM + 10% FBS. This suspension was diluted four-fold into fresh
medium, and 135 µL aliquots of cells were transferred to a 96-well tissue culture-treated
plate. All transfection mixes contained 1 ng of Renilla luciferase plasmid and 20 ng of ELAM-Luc.
Transfection mixes for human TLR4 complexes were made with an additional 10 ng of TLR4, 
0.5 ng of MD-2, 1 ng of CD14, and 67.5 ng of pcDNA3 per well for a total of 100 ng of DNA. 
Transfection mixes for other species TLR4 complexes, unless otherwise noted, were made with 
10 ng of TLR4, 20 ng of MD-2, 1 ng of CD14, and 48 ng of pcDNA3 per well for a total of 100 
ng of DNA. All plasmids contained human codon optimized genes and were in mammalian 
expression vectors. Transfection mixes were diluted in OptiMEM to a volume of 10 µL/well. To 
the DNA mix, 0.5 µL per well of PLUS reagent was added and thoroughly mixed followed by a 
10 min incubation at room temperature. For each well, 0.5 µL of Lipofectamine was diluted into
9.5 µL of OptiMEM. This was added to the DNA + PLUS mix, mixed well, and incubated at room
temperature for 15 min. The transfection mix was diluted to 65 μL/well in OptiMEM and 
aliquoted onto the cells in the plate. Cells were incubated with transfection mix overnight (20–24 
h). For cells that received a pre-treatment with the TLR4 inhibitor TAK-242 (synonyms: 
Resatorvid; CLI-095) (HY-11109, MedChemExpress), 2 μL of 100 μM TAK-242 in cell culture-grade
DMSO was added per well and the plate was incubated for 5 min at 37 °C before the medium
was removed and cells were treated as usual with 100 μL of LPS mixtures prepared in 25%
PBS, 75% DMEM. E. coli K-12 LPS (tlrl-eklps, Invivogen) and R. sphaeroides LPS (tlrl-rslps, 
Invivogen) were dissolved at 5 mg/mL in endotoxin free water, aliquots were stored at −20°C. S. 
enterica serotype typhimurium LPS (L6511, Sigma-Aldrich) was dissolved at 5 mg/mL in 
endotoxin free water and stored at 4°C. Lipid IVa (CLP-24006-S, Biosynth) was dissolved at 0.1 
mg/mL in endotoxin free water, aliquots were stored at −20°C. To avoid freeze-thaw cycles, 
working stocks of LPS were prepared at 10 μg/mL and stored at 4°C. To disrupt micelle
formation and evenly distribute LPS in solution, LPS stocks were placed in a room temperature 
jewelry ultrasonicator for 15 min prior to use in treatments. Cells were incubated with treatments 
for 4 h. The Dual-Glo Luciferase Assay System (Promega) was used to assay Firefly and Renilla
luciferase activity in individual wells. Each NF-κB induction value shown is the ratio of buffer-subtracted
Firefly luciferase activity to vehicle-blanked Renilla luciferase activity, normalized to
LPS-treated transfection controls for each species to allow comparison between plates. For cells
treated with transferrin proteolysis products, 100 μL of 100 μM human transferrin (T8158, 
Sigma-Aldrich) in endotoxin-free water was digested with 30 μL of 20 μM proteinase K 
overnight at room temperature before being mixed at indicated concentrations with treatment 
mixes. 
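The per-well arithmetic in this protocol, and the reporter normalization, can be checked with a short sketch. Plasmid amounts and volumes are taken from the text. Two points are assumptions rather than stated procedure: that the well volume when TAK-242 is added is the 135 µL of cells plus 65 µL of transfection mix, and one interpretation of the blanking steps (mean background subtracted from each luciferase channel, then each Firefly/Renilla ratio divided by the mean ratio of the plate's LPS-treated transfection controls). The helper names are illustrative only.

```python
"""Per-well arithmetic and NF-kB reporter normalization (illustrative sketch)."""
from statistics import mean

# Plasmid DNA per well (ng), as listed in the text.
HUMAN_MIX_NG = {"Renilla": 1, "ELAM-Luc": 20, "TLR4": 10, "MD-2": 0.5,
                "CD14": 1, "pcDNA3": 67.5}
OTHER_SPECIES_MIX_NG = {"Renilla": 1, "ELAM-Luc": 20, "TLR4": 10, "MD-2": 20,
                        "CD14": 1, "pcDNA3": 48}


def final_concentration(stock, added_volume, base_volume):
    """C1*V1 = C2*V2 when `added_volume` of stock joins `base_volume` (same units)."""
    return stock * added_volume / (added_volume + base_volume)


# Assumed well volume before TAK-242 addition: 135 uL cells + 65 uL transfection mix.
WELL_UL = 135 + 65
tak242_final_um = final_concentration(100, 2, WELL_UL)   # about 0.99 uM
# Transferrin digest: 100 uL of 100 uM transferrin + 30 uL of 20 uM proteinase K.
transferrin_um = final_concentration(100, 100, 30)       # about 76.9 uM
proteinase_k_um = final_concentration(20, 30, 100)       # about 4.6 uM


def blanked(readings, blanks):
    """Subtract mean background luminescence (assumed: wells without cells)."""
    background = mean(blanks)
    return [r - background for r in readings]


def reporter_ratios(firefly, renilla, firefly_blanks, renilla_blanks):
    """Per-well ratio of blanked Firefly (NF-kB) to blanked Renilla (transfection control)."""
    return [f / r for f, r in zip(blanked(firefly, firefly_blanks),
                                  blanked(renilla, renilla_blanks))]


def normalize_to_control(ratios, control_ratios):
    """Scale wells so that the plate's LPS-treated control averages 1.0."""
    reference = mean(control_ratios)
    return [x / reference for x in ratios]


# Toy luminescence readings, for illustration only.
samples = reporter_ratios([5100, 3100], [1100, 1100], [100, 100], [100, 100])
controls = reporter_ratios([10100], [1100], [100, 100], [100, 100])
print(sum(HUMAN_MIX_NG.values()), sum(OTHER_SPECIES_MIX_NG.values()))
print(round(tak242_final_um, 2), normalize_to_control(samples, controls))
```

Both mixes sum to 100 ng of DNA per well, which is why the pcDNA3 filler differs between the human (0.5 ng MD-2) and other-species (20 ng MD-2) mixes.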
 
Oral microgavage of LPS 
Zebrafish experiments were approved by the University of Oregon Institutional Animal 
Care and Use Committee. Larval 6 dpf fish were anesthetized in 168 µg/mL tricaine methane
sulfonate in embryo medium (EM) and microgavaged with 4.6 nL of 1 mg/mL LPS purified from
E. coli O111:B4 (L2630; Sigma-Aldrich) dissolved in EM. After gavage, fish were transferred to
fresh EM. Fish were imaged 6 h post-gavage using a fluorescence stereo microscope. Fish were
anesthetized in 168 µg/mL tricaine methane sulfonate in EM before being mounted on a glass
slide, and images were taken over the distal gut region in bright-field, GFP, and mCherry
channels. The fish used in this experiment were tg(tnfα:GFP; mpeg:mCherry). After imaging,
fish were immediately euthanized and fixed in paraformaldehyde for subsequent staining and
quantification of neutrophils associated with gut tissue. Fixed zebrafish were dissected to isolate
the gut bulb and intestinal tract. Stained neutrophils were counted by two individuals blinded to
treatment. 

Brain tectum microinjection of LPS 
We used a modified version of a previously published protocol.209 Larval 5 dpf fish were
immersed in either 8.3 μM TAK-242 (synonyms: Resatorvid; CLI-095) (HY-11109,
MedChemExpress) or an equal volume of cell culture-grade DMSO (final concentration of
0.003%) in EM for 24 hours. At 6 dpf, fish were anesthetized in 168 µg/mL tricaine methane
sulfonate in EM and microinjected into the brain tectum with ~4.2 nL of
2.17 ng/nL E. coli LPS-EK (tlrl-eklps, InvivoGen), lipid IVa (CLP-24006-S, Biosynth), or
ultrapure PBS. After injection, fish were rinsed and allowed to recover in fresh EM until imaging. Fish
were imaged 3–6 hours post-injection by fluorescence light sheet microscopy. For imaging, fish
were anesthetized in 168 µg/mL tricaine methane sulfonate in EM, mixed with
0.7% low-melt agarose at 40 °C in EM, and mounted in a capillary tube. We collected 5 sets of
fluorescent z-stack images over the liver region through the full width of each fish to track
immune cells near the liver. The fish strain used in this experiment was tg(tnfα:GFP;
mpx:mCherry). Images were analyzed using Imaris software. 
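The amount of material delivered in each larval protocol follows directly from the stated volumes and concentrations, because 1 mg/mL is equivalent to 1 ng/nL. The sketch below computes these doses. The TAK-242 stock concentration it derives is only an inference, assuming the DMSO vehicle (0.003% final) was added as the undiluted TAK-242 stock, which the text does not state.

```python
"""Dose arithmetic for the larval LPS gavage and brain tectum injection protocols."""


def delivered_mass_ng(volume_nl, concentration_ng_per_nl):
    """Mass in a nanoliter bolus; note that 1 mg/mL == 1 ng/nL."""
    return volume_nl * concentration_ng_per_nl


def implied_stock_mm(final_um, final_dmso_percent):
    """Stock concentration (mM) implied if a pure-DMSO drug stock was diluted until
    DMSO reached `final_dmso_percent` (v/v) and the drug reached `final_um`.
    Assumption: the vehicle percentage reflects the drug stock dilution."""
    dilution_factor = 100 / final_dmso_percent
    return final_um * dilution_factor / 1000


gavage_ng = delivered_mass_ng(4.6, 1.0)     # 4.6 nL of 1 mg/mL E. coli LPS
tectum_ng = delivered_mass_ng(4.2, 2.17)    # ~4.2 nL of 2.17 ng/nL LPS-EK or lipid IVa
tak242_stock_mm = implied_stock_mm(8.3, 0.003)

print(f"gavage: {gavage_ng:.1f} ng LPS per fish")
print(f"tectum: {tectum_ng:.1f} ng per fish")
print(f"implied TAK-242 stock: {tak242_stock_mm:.0f} mM")
```

Under these numbers, the tectum injection delivers roughly twice the mass of LPS given by gavage, although the two routes expose very different tissues.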
 
BRIDGE TO CHAPTER IV 
There have been many discoveries of host-microbe interactions, drug developments, and 
disease pathologies using the zebrafish model of innate immunity. In Chapter III, we showed 
ample evidence of dissimilarities between homologous innate immune proteins in humans and 
zebrafish. In Chapter IV, we present an investigation of zebrafish proteins that are homologous, 
but not orthologous, to the mammalian innate immune protein calprotectin. This work was
motivated by a recent publication describing the zebrafish s100a10b protein as the functional
homolog of human calprotectin in the zebrafish innate immune response to bacterial infection. 
 
  
CHAPTER IV 
ZEBRAFISH DO NOT HAVE CALPROTECTIN 
 
*This chapter contains unpublished co-authored material. 
 
Author contributions: Orlandi KN and Harms MJ designed the study. Orlandi KN designed and 
performed experiments, analyzed data, and wrote the manuscript. Harms MJ obtained funding 
and oversaw the project and writing. 
 
  
ABSTRACT 
 The protein heterodimer calprotectin and its component proteins play important 
antibacterial and proinflammatory roles in the mammalian innate immune response. Calprotectin 
is also a well-validated, non-invasive biomarker of inflammation. Gaining mechanistic insights 
into the regulation and biological function of calprotectin will help facilitate patient diagnostics 
and therapy. Recent literature proposed that the zebrafish S100A10b protein is analogous to 
human calprotectin based on sequence similarity and genomic context. The field would benefit 
from expanding the breadth of calprotectin studies into a zebrafish innate immunity model. 
However, thus far there is neither phylogenetic nor functional evidence demonstrating the existence
of calprotectin in fish. Here, we evaluate the possibility that a zebrafish S100 protein could have
convergently evolved a calprotectin-like role in the zebrafish innate immune response. We show
the phylogenetic and syntenic relationships of human and zebrafish S100s. We identify
zebrafish S100s that are expressed in immune cells and upregulated during the immune response.
We then recombinantly express and purify four candidate proteins and evaluate them for
antimicrobial and proinflammatory characteristics. We find that none of the four most promising
candidates behaves like calprotectin or its component proteins.
INTRODUCTION 
Calprotectin plays critical roles in the innate immune response.210 This protein is a 
complex formed by two calcium-binding proteins, S100A8 and S100A9. Calprotectin is found in
heterodimeric and heterotetrameric states, both of which play biological roles.63,64,82,211 Its 
individual components, S100A8 and S100A9, are also both found as homodimers with biological 
functions distinct from the heterocomplexes.  
S100A8 and S100A9 are highly expressed in the cytoplasm of immune cells,212
comprising up to 45% of soluble cytosolic protein in neutrophils.62 Intracellular S100A8 and
S100A9 are implicated in the calcium-dependent microtubule reorganization of phagocytes that
allows migration to sites of infection.65,66,68 Upon release from cells after damage or during an
immune response, calprotectin exerts antimicrobial activity by sequestering transition metals
essential for microbial growth in the extracellular matrix.86–88,213–220 Extracellular S100A8 and
S100A9 homodimers can amplify the immune response by activating Toll-like receptor 4 (TLR4)
and the Receptor for Advanced Glycation End-products (RAGE), promoting cytokine expression
and immune cell migration, respectively.72 Several other important functions are associated with
S100A8 and S100A9.69–71,221 Dysregulated expression of these proteins is linked to ailments such 
as Alzheimer’s disease, Parkinson’s disease, cerebral ischemia, obesity and cardiovascular 
disease.222 Correlated with its role in the immune response, high levels of calprotectin in tissues, 
serum, or stool are indicative of inflammation associated with severe infections, cystic fibrosis, 
digestive tract disorders, autoimmune diseases, rheumatoid arthritis and cancer.91–94  
Given the importance of calprotectin, there is interest in developing new models to study 
its function. One attractive model is the zebrafish, which is increasingly being used to understand 
the molecular mechanisms of immune functions. Recent work has begun to characterize the role 
of a calprotectin homolog in the zebrafish response to infection.107,108 As vertebrates, zebrafish 
share much of their physiology and molecular components with humans. They also have 
exceptional experimental advantages: well-established genetic tools, optically transparent larvae 
(making it possible to visualize tagged molecules in real-time in live fish), and rapid generation 
times.223 They are particularly useful for studying innate immunity because they rely on innate
immune responses alone until 4–6 weeks post-fertilization, when their adaptive immune
system becomes morphologically and functionally mature.224–227  
Despite the power of the zebrafish model system, it can be challenging to map zebrafish 
biology to human biology. More than 400 million years of independent evolution have allowed the
divergence, emergence, and loss of proteins and protein functions between species, often making
comparisons difficult. One of the most important considerations is whether the genes being
compared between species are, in fact, the same genes. Did they arise by speciation
(orthologs), which often have very similar functions, or by gene duplication
(paralogs), which often have very different functions? Establishing gene orthology is particularly
challenging for S100 proteins, as they form the largest subgroup within the superfamily of 
proteins carrying the Ca2+-binding EF-hand motif. Humans have 24 S100 genes228,229; zebrafish 
have 14.230,231 
There is no annotated s100a8 or s100a9 in the zebrafish genome; however, there are
several zebrafish S100 genes in a genomic location similar to that of human S100A8 and S100A9.
One of the zebrafish genes annotated in this genomic location is s100a10b. A BLAST query of
the zebrafish proteome with human S100A8 returns zebrafish s100a10b as a top hit. On this
basis, zebrafish s100a10b has been classified as “calprotectin.” This
was followed by experimental studies reporting its transcriptional response to pathogenic
bacteria.107,108 Commercially available “fish calprotectin” ELISA antibodies also imply
calprotectin is present in fish. However, these antibodies were raised against the highly 
conserved N-terminal helix of S100A8 and likely bind several S100 proteins. No rigorous 
investigations have yet been employed to demonstrate the presence of a calprotectin ortholog, or 
even a convergently-evolved paralog, in zebrafish.  
We set out to find phylogenetic, biochemical, or biological evidence of calprotectin (or 
calprotectin-like) activity in zebrafish s100 proteins. Through a careful review of existing 
phylogenetic literature, we confirm that fish do not have a calprotectin ortholog: both S100A8 
and S100A9 evolved in mammals 250 million years after the divergence of tetrapods and ray-
finned fishes. We support this phylogenetic result through a comparative synteny analysis of 
s100 genes in zebrafish and human genomes. We also investigate the possibility that fish 
convergently evolved a calprotectin-like s100 protein using single-cell RNAseq data to identify 
zebrafish s100 proteins expressed in immune cells. We recombinantly expressed and purified
four of these proteins, including zebrafish s100a10b (the protein previously identified as fish
calprotectin in the literature), and experimentally tested their antimicrobial and
proinflammatory activities. None of the proteins showed measurable activity.  
We conclude that zebrafish have neither a vertically inherited ortholog of calprotectin, 
nor an obvious candidate protein that convergently evolved a similar function. Our results
highlight the danger of relying on sequence similarity and genomic placement to identify genes. 
We demonstrate it is necessary and prudent to use an explicitly evolutionary lens with careful 
functional analyses when mapping results from model organisms to human biology. 
 
 
RESULTS 
Zebrafish s100a10b is only distantly related to human S100A8 and S100A9 
 We started by looking for phylogenetic evidence that fish have a protein orthologous to 
mammalian S100A8 or S100A9. Orthologous proteins are ones that arose by speciation and are 
thus the same gene in the species being compared. Paralogous proteins arose by gene duplication 
and often exhibit gain or loss of function relative to the ancestral state,203 establishing themselves as
new proteins. Fig 4.1A summarizes the evolutionary history of S100s. This tree was built
by referencing several published phylogenetic analyses of the family, including two from our
group.102,232 The phylogeny at the top shows the current best estimate of the S100 gene tree; the
phylogeny on the left shows the evolutionary history of bony vertebrates. Each circle denotes an
S100 gene (top) observed in at least one member of the taxonomic group (left).  
This evolutionary tree indicates that S100A8 and S100A9 arose by gene duplication from a
single calgranulin gene that was present in the ancestor of amniotes. Reptiles and birds preserve a single calgranulin
protein (MRP-126), while mammals expanded it into three proteins (S100A8, S100A9, and
S100A12). The closest evolutionary relatives of these proteins are S100A7, S100A7A, and S100A15.
Like the calgranulins, these arose by duplication of a single gene in the ancestor of amniotes. The 
reptile/bird protein MRP-126 is the earliest diverging protein known to exhibit nutritional 
immunity and/or Toll-like receptor 4 activation in functional assays.102,233 These observations 
indicate that calprotectin evolved in amniotes ~320 million years ago.   
In contrast, zebrafish s100a10b (the putative zebrafish calprotectin) falls into a clade with 
the proteins S100A10 and S100A11. This is one of the earliest S100 protein subfamilies to 
evolve, with orthologs present in species ranging from tetrapods to jawless fishes. This group of 
S100 proteins thus diverged from the lineage that led to mammalian S100A8 and S100A9 at
least 563 million years ago, in the last common ancestor of humans and lampreys. Further, after
this divergence, there were at least two more gene duplications on the lineage leading to
S100A8 and S100A9. Zebrafish s100a10b is therefore a different gene from S100A8 and S100A9.
 
Chromosome placement indicates a shared origin but complicated evolution of homologous 
human and zebrafish S100s 
To cross-validate the lack of evidence for vertical inheritance from published 
phylogenies, we used syntenic analysis to identify zebrafish s100 genes in a similar genomic 
location to human S100A8 and S100A9. We used ENSEMBL to identify the zebrafish genomic
region most similar to human Chr 1:152–155 Mb, which encodes 19 of the 24 human
S100 proteins, including S100A8 and S100A9. This region corresponded to zebrafish
chromosome 16. Specifically, human Chr 1:154.6–154.7 Mb and zebrafish Chr 16:23.5–23.7 Mb
both contain the KCNN3 and ADAR genes adjacent to tandem repeats of S100 genes
(Fig 4.1B).  
The existence of this shared cluster indicates that a handful of s100 genes were already in this
genomic context in the bony vertebrate ancestor ~430 million years ago, as established in
previous work.230 The syntenic relationships, however, also give evidence for extensive evolution 
after the divergence of bony fishes and tetrapods: the orientation and placement of genes are 
different, and orthologs to the human S100s are missing from this genomic location but present 
on other chromosomes. Further, most of the zebrafish s100s in this region appear to be teleost-
specific duplicates.230 This includes ictacalcin (icn), icn2, s100t, s100s, and s100w. The only 
clear orthologs to human proteins are s100a10b and s100a1 (Fig 4.1A).  
 
Human calprotectin and zebrafish s100 protein sequences have low sequence identity 
Previous workers identified zebrafish s100a10b as calprotectin using human S100A8 as a 
query in a BLAST search against the zebrafish proteome.108 Via ENSEMBL, s100a10b is the
top hit; however, this hit has a weak e-value of 0.045 and a percent identity of only 36.17%. Via
NCBI, this hit has an e-value of 8E-17 and a percent identity of 36.36%. We
assessed the quality of the hit by reciprocal BLAST, meaning we used the zebrafish s100a10b 
protein sequence as a query against the human proteome on NCBI. This yielded human S100A1 
(9E-34; 55.9%) as the top hit, not S100A8. In fact, S100A8 (5E-16; 36.4%) was the 13th hit 
(after S100A1, S100A10, S100Z, S100P, S100B, S100A4, S100A12, S100A5, S100A6, 
S100A2, S100A4, and S100A9). This is consistent with the previous phylogenetic analyses that 
place S100A8 and S100A9 as relatively distant paralogs to zebrafish s100a10b (Fig 4.1A).  
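The reciprocal BLAST logic used above reduces to a simple check: a query's best hit in the target proteome should return the query as its own best hit. The sketch below assumes ranked hit lists have already been parsed from BLAST output (the parsing step is not shown). The example lists are toy, truncated values that only echo the top hits reported above; they are not the full NCBI hit tables.

```python
"""Reciprocal best-hit check on pre-ranked BLAST hit lists (best hit first)."""


def best_hit(ranked_hits):
    """Top-ranked subject, or None if there were no hits."""
    return ranked_hits[0] if ranked_hits else None


def is_reciprocal_best_hit(query, forward_hits, reverse_hits):
    """True if the query's best hit returns the query as its own best hit.

    forward_hits: ranked hits for `query` against the target proteome.
    reverse_hits: dict mapping each target protein to its ranked hits
                  against the query's proteome.
    """
    target = best_hit(forward_hits)
    return target is not None and best_hit(reverse_hits.get(target, [])) == query


def rank_of(gene, ranked_hits):
    """1-based rank of `gene` in a hit list, or None if absent."""
    return ranked_hits.index(gene) + 1 if gene in ranked_hits else None


# Toy, truncated hit lists (not the complete BLAST results).
forward = ["s100a10b"]                                   # human S100A8 vs zebrafish
reverse = {"s100a10b": ["S100A1", "S100A10", "S100A8"]}  # zebrafish s100a10b vs human
print(is_reciprocal_best_hit("S100A8", forward, reverse))  # S100A1 is the reverse best hit
print(rank_of("S100A8", reverse["s100a10b"]))
```

A one-way top hit that fails this reciprocal check is weak evidence of orthology, which is the situation described for S100A8 and s100a10b.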
To evaluate sequence similarity and identity, we aligned zebrafish s100 protein sequences 
from the syntenic region to human S100A8, S100A9, and S100A10 sequences (Figure 4.1C). As 
expected, there is high conservation at sites that form the EF-hand and pseudo-EF-hand calcium-
binding domains of the s100 proteins. There is low conservation in the region connecting the two 
EF-hands and at the termini. We used Clustal Omega to determine the sequence identity shared 
between zebrafish s100s and human S100A8, S100A9, and S100A10 (Fig 4.1D). We find 
similarly low levels of shared identity between human S100A8 and S100A9 and the zebrafish 
s100s. Overall, there is no obvious candidate zebrafish s100 that is like calprotectin by sequence 
similarity or identity. 
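Pairwise percent identity can be computed directly from an alignment, and summary statistics like those in the Figure 4.1D legend (range, mean, median) follow from the matrix. The sketch below counts identities over columns where neither sequence has a gap. This is one common convention; Clustal Omega's own identity calculation may use a different denominator, so the sketch is illustrative rather than a reproduction of Fig 4.1D. The three aligned sequences are toy data.

```python
"""Pairwise percent identity from a multiple sequence alignment (sketch)."""
from itertools import combinations
from statistics import mean, median


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues / columns where neither sequence has a gap, x 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        identical += x == y
    return 100 * identical / compared if compared else 0.0


def pairwise_summary(alignment):
    """Min, max, mean and median identity over all unique sequence pairs."""
    values = [percent_identity(alignment[a], alignment[b])
              for a, b in combinations(alignment, 2)]
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "median": median(values)}


toy_alignment = {
    "seq1": "MTELEKA-LN",
    "seq2": "MSELEKAMLN",
    "seq3": "MSEIEKAMLQ",
}
print(pairwise_summary(toy_alignment))
```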
 
Figure 4.1: Phylogenetic analyses reveal there is no calprotectin ortholog outside of 
amniotes. A) S100 gene tree adapted from Wheeler et al., 2017 shows the evolutionary 
relationships determined for S100s across vertebrates. The phylogeny on the left shows the 
relationships between species, with branch point times noted in millions of years ago; the 
phylogeny on the top shows the estimated S100 gene tree. Circles denote S100 genes from the 
phylogeny at the top found in at least one member of the taxonomic group from the left. A 
horizontal line through a circle indicates a single gene that is co-orthologous to multiple S100 
genes found in mammals. S100A8, S100A9, and S100A12 form a clade specific to amniotes 
(orange). Zebrafish genes in the syntenic region shown in panel C are shown in blue. Zebrafish 
s100a10b, which has been treated as a calprotectin ortholog, is denoted with a white star. B) 
Syntenic regions of human chromosome 1 (top) and zebrafish chromosome 16 (bottom) 
identified by ENSEMBL. Arrows denote relative gene length and orientation. Human S100A8, 
S100A9, and S100A12 are shown in orange; zebrafish s100s and their human orthologs are 
shown in blue. The non-S100 genes adar/ADAR and kcnn3/KCNN3 are diagnostic for the 
syntenic region. Not all genes in the region are depicted. C) A multiple sequence alignment of 
the S100 proteins in this dataset displays the amino acid similarity at each position of human 
S100A8, S100A9 and S100A10 compared to the zebrafish s100s from a
similar genomic context. At the top, secondary structure features of human S100A8 are shown. 
Under this, the bar is colored by sequence conservation (blue=low, red=high). The consensus 
sequence from all sequences in the alignment is shown above the individual protein sequences. 
Amino acids found in at least 50% of the sequences shown are shaded. The antigen for the “Fish 
Calprotectin” antibody was raised against the peptide boxed in yellow. D) Percent identity matrix 
comparing S100 proteins in this dataset (darker box indicates higher identity). S100 pair identity 
values range from 25.81% to 87.37%, with a mean of 39.35% and a median of 35.25%.
Identity values comparing human S100A8 and S100A9 to zebrafish S100s are highlighted in the 
blue boxes.    
 
This alignment also allowed us to ask what zebrafish s100 protein(s) might be recognized 
by the commercially available “Fish Calprotectin” ELISA Kit from MyBioSource. This kit was 
made with antibodies raised against a 20 amino acid partial peptide of a human calprotectin-like 
protein (GenBank: AAB33355.1), which forms the N-terminal helix and the beginning of the
EF-hand 1 domain of human S100A8 (Fig 4.1C, yellow box). An NCBI BLAST search reveals that
eight to nine residues in this helix are highly conserved in several zebrafish s100s, including
s100b, s100a10b, s100a10a, s100a1, s100z, s100s, and s100w, as well as in several unrelated
proteins. These residues lie on the inner face of the amphipathic helix, are involved in stabilizing
secondary and tertiary structure, and may coordinate metal ions. Because the N-terminal
helix of S100 proteins is at the surface of dimeric and tetrameric complexes and this peptide is
conserved across many S100s, an antibody raised against it would likely recognize several
zebrafish s100s; the antibody in this kit is therefore likely non-specific. 
 
Single cell RNA sequencing dataset mining points to candidate zebrafish s100 proteins 
expressed similar to calprotectin 
Our bioinformatic analyses revealed that no zebrafish S100 protein is orthologous to 
human calprotectin; however, it is possible that an S100 protein convergently evolved 
calprotectin-like activity. To investigate this possibility, we identified zebrafish s100s that share a 
similar expression profile with calprotectin. Calprotectin is expressed constitutively in mammalian
neutrophils, monocytes, and several epithelial cell types and is upregulated upon infection and
injury.62,212 We queried existing zebrafish single-cell RNA sequencing (scRNAseq) datasets for
zebrafish s100s expressed in immune cells that are upregulated in response to injury. (We could 
not find a cell browser for zebrafish infection models.) 
We used the UCSC cell browser to visualize a zebrafish development dataset deposited by
Farnsworth et al., 2019,234 and assessed constitutive immune cell expression in whole fish at 1, 2,
and 5 days post-fertilization. Of the genes that share a genomic context with human S100A9 (Fig 4.1B),
we found that s100a10b, icn, icn2, s100w, and s100a1 are expressed in immune cells of
developing zebrafish (Table 4.1). The gene s100t, also in this genomic region, shows very low expression
in immune cells. We found that five s100 genes from other zebrafish chromosome locations
show some expression in immune cells: s100v1, s100v2, s100u, s100z, and s100a11. Finally, the 
remaining annotated zebrafish s100 genes—s100s, s100b, and s100a10a—appear in very few 
cells within these clusters.  
To evaluate whether these proteins are expressed during the innate immune response to 
injury, we used the fin clip and tissue regeneration scRNAseq dataset and cell browser provided 
by Hou et al., 2020.235 In this dataset, cells were isolated from adult zebrafish caudal fins at 1, 2,
and 4 days post-amputation. The macrophage marker mpeg1.1 is highly expressed in
hematopoietic cells during the response to fin clip injury and is also expressed in other cell types.
The neutrophil marker mpx was detected only at very low levels, in four basal epithelial cells. We
see that s100a10b, icn, icn2, s100w, and s100a1 are expressed in hematopoietic cell clusters,
although s100a1 is expressed to a lesser degree. The gene s100t appears in only two of the
hematopoietic cells sequenced but shows more expression in epithelial cells. Zebrafish s100s from
other regions of the zebrafish genome show varying levels of expression in hematopoietic cells.
Notably, all s100 genes from outside zebrafish chromosome 16 were expressed to a lesser degree
than s100a10b, icn, icn2, and s100w. The injury dataset is consistent with the development
dataset, although it lacks data for s100a10a and s100b; perhaps these genes were not
detectable in regenerating fin tissues.  
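The cell browsers used here display per-cluster expression visually. As a hedged illustration of how statements such as "expressed in immune cells" or "appears in very few cells" could be quantified, the sketch below computes, for one gene, the fraction of cells in each cluster with nonzero counts and the mean count. The input layout (a per-cell dictionary of counts plus a cell-to-cluster map) and the toy values are hypothetical; this is not the analysis performed with the UCSC cell browser.

```python
"""Per-cluster detection rate and mean expression for one gene (toy scRNAseq sketch)."""
from collections import defaultdict


def cluster_expression(counts, clusters, gene):
    """Return {cluster: (fraction of cells with nonzero counts, mean count)}.

    counts:   {cell_id: {gene_name: count}}; missing genes are treated as 0
    clusters: {cell_id: cluster_label}
    """
    n_cells = defaultdict(int)
    n_detected = defaultdict(int)
    total = defaultdict(float)
    for cell, label in clusters.items():
        value = counts.get(cell, {}).get(gene, 0)
        n_cells[label] += 1
        total[label] += value
        if value > 0:
            n_detected[label] += 1
    return {label: (n_detected[label] / n_cells[label], total[label] / n_cells[label])
            for label in n_cells}


# Hypothetical toy data: four cells in two clusters.
clusters = {"c1": "hematopoietic", "c2": "hematopoietic",
            "c3": "epithelial", "c4": "epithelial"}
counts = {"c1": {"s100a10b": 3}, "c2": {},
          "c3": {"s100t": 2}, "c4": {"s100t": 1}}
print(cluster_expression(counts, clusters, "s100t"))
```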
 
Table 4.1: Single-cell RNAseq profiles for zebrafish s100s syntenic to human calprotectin 
Gene | Linkage group; synteny call | Human ortholog(s) | Developmental dataset | Injury dataset
s100a1 | Chr 16; ~syntenic | S100Z, S100A1 | Immune cells | Widely upregulated, low-level expression, hematopoietic cells
s100a10b | Chr 16; syntenic | S100A10 | Immune cells | Widely upregulated, high expression, hematopoietic cells
s100w | Chr 16; syntenic | S100A13, S100A14, S100A16 | Immune cells | Widely upregulated, moderate expression, hematopoietic cells
icn | Chr 16; syntenic | S100A1 | Immune cells | Widely upregulated, high expression, hematopoietic cells
icn2 | Chr 16; syntenic | S100A1 | Immune cells | Widely upregulated, high expression, hematopoietic cells
s100t | Chr 16; ~syntenic | S100A1 | Low in immune cells | Epithelial cells
 
Recombinant zebrafish s100 proteins fold and interact with calcium 
We chose to functionally characterize a subset of the zebrafish s100s which seemed most 
promising to behave like calprotectin based on genomic context and gene expression profiles: 
s100a10b, s100a1, s100w, and icn. We left out icn2 because it is very similar to icn by all metrics,
including sequence identity (87.37%; they differ by 14 amino acids, 8 of these at the termini; Fig
4.1D).  
We started by structurally characterizing the four selected zebrafish proteins. We used 
AlphaFold2 to predict structures for all four proteins.183–185 Overlaying the predicted structures 
with the crystal structure of human calprotectin shows high predicted structural similarity (Fig 
4.2A). High α-helical content is a shared feature of all known S100 proteins and of the
predicted zebrafish s100 structures. We tested whether this held experimentally for the zebrafish
s100 proteins by heterologously expressing and purifying the proteins from E. coli and then measuring their
secondary structure content by far-UV circular dichroism (CD). This revealed signal minima at
208 and 222 nm, consistent with primarily α-helical structures (Figure 4.2B).  
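Far-UV CD readings are usually converted from observed ellipticity to mean residue ellipticity before spectra are compared, and the signal at 222 nm is sometimes used for a rough helix-content estimate. The sketch below shows both calculations. The helix estimate uses the empirical Chen, Yang and Chau (1974) relation, [θ]222 = −39,500 (1 − 2.57/n) × f_helix, which is an assumption introduced here and not an analysis reported in this chapter. Note also that Figure 4.2 reports molar ellipticity, which differs from mean residue ellipticity by a factor of the number of residues. The example inputs are hypothetical.

```python
"""CD unit conversion and a rough helix-fraction estimate at 222 nm (sketch)."""


def mean_residue_ellipticity(theta_mdeg, path_cm, protein_molar, n_residues):
    """Observed ellipticity (mdeg) -> mean residue ellipticity (deg cm^2 dmol^-1)."""
    return theta_mdeg / (10 * path_cm * protein_molar * n_residues)


def helix_fraction_222(mre_222, n_residues, theta_helix_inf=-39500.0, k=2.57):
    """Empirical helix fraction from [theta]222 = theta_helix_inf * (1 - k/n) * f_helix."""
    return mre_222 / (theta_helix_inf * (1 - k / n_residues))


# Hypothetical reading: -20 mdeg at 222 nm, 0.1 cm cuvette,
# 10 uM protein chains of 100 residues each.
mre = mean_residue_ellipticity(-20.0, 0.1, 10e-6, 100)
print(round(mre), round(helix_fraction_222(mre, 100), 2))
```

Such single-wavelength estimates are rough; they are most useful for comparing the same protein with and without calcium rather than for absolute helix content.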
Most S100 proteins also bind calcium and undergo a conformational change exposing a 
hydrophobic binding surface.236,237 We tested whether this held for the four zebrafish S100 
proteins by measuring calcium-induced changes in protein secondary and tertiary structure by 
far-UV CD and intrinsic fluorescence, respectively. We found that all four recombinantly 
expressed zebrafish proteins exhibited evidence of calcium-induced conformational change (Fig 
4.2B-C). 
Upon addition of saturating calcium, zebrafish s100a10b and s100a1 exhibited an
increase in helical content, whereas s100w and icn showed little change in secondary
structure (Fig 4.2B). The intrinsic fluorescence of all four proteins, however, responded to
calcium (Fig 4.2C). Intrinsic fluorescence reports on changes in the local chemical environments
of tyrosine and tryptophan residues, suggesting that calcium binding induces a change in the
tertiary structure of all four s100 proteins. This is consistent with the canonical calcium-induced
rotation of the third helix relative to the other helices of S100s.236 
Taken together, these results show that these four zebrafish S100 proteins are folded, bind 
to calcium, and undergo the calcium-induced conformational changes expected for members of 
the family. 
 
Figure 4.2: Structural comparisons of human and zebrafish S100s. A) Overlaid AlphaFold 
structure predictions for all S100 homodimers in this dataset, as well as the human 
S100A8/S100A9 heterodimer (PDB ID: 5W1F). Different chains of each homodimer are shown
in black or white. B) Far-UV circular dichroism spectra for each protein in the presence of 2 mM
Ca2+ (green) and after adding 5 mM EDTA (blue). Units are molar ellipticity (deg·cm2·dmol−1)
over wavelength (nm). C) Fluorescence excitation and emission spectra for each protein in the 
presence or absence of calcium (green and blue, respectively). The fluorescence units are 
arbitrary; the x-axis is wavelength in nanometers. Excitation spectra were collected while 
observing fluorescence at the maximum emission wavelength; emission spectra were collected 
while exciting at the maximum excitation wavelength.   
 
Zebrafish s100s do not exhibit nutritional immunity like calprotectin 
One of the most important biological functions of human calprotectin is antimicrobial 
activity via nutritional immunity. We evaluated the antimicrobial activity of each of the four 
zebrafish s100 proteins against human-derived Staphylococcus epidermidis and the zebrafish-derived 
strains Vibrio ZWU0020 and Aeromonas ZOR001. S. epidermidis was previously shown 
to be susceptible to human calprotectin,238,239 whereas the response of the zebrafish-derived strains 
had not been characterized. Fig 4.3A shows the dose-dependent antimicrobial activity of human calprotectin 
against each strain over 13 hours in nutrient-rich medium across three biological replicates. For all 
three strains, increasing amounts of calprotectin (from blue to green) led to lower final 
OD600 values (indicated by black arrows). We quantified this response as the difference in the 
area under the OD600 curve from 0 to 13 h between cultures with and without calprotectin (ΔAUC), 
and plotted ΔAUC as a function of protein concentration. A negative ΔAUC value indicates growth 
inhibition at the indicated protein concentration, while a zero or positive value indicates no 
antimicrobial activity. This analysis revealed a calprotectin-dependent decrease in growth for all three 
bacterial strains (Fig 4.3B, yellow curves). 
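As a minimal sketch of the ΔAUC metric, the Python below integrates each OD600 curve with the trapezoidal rule and takes the difference between treated and untreated cultures. It assumes evenly spaced, background-subtracted readings every 15 min from 0 to 13 h; the function names and toy growth curves are hypothetical and are not the Prism analysis used for Fig 4.3.

```python
def auc_trapezoid(times_h, od600):
    """Area under an OD600 growth curve (OD600 x hours), trapezoidal rule."""
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need matching time and OD600 lists with at least 2 points")
    area = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        area += 0.5 * dt * (od600[i] + od600[i - 1])
    return area


def delta_auc(times_h, od_treated, od_untreated):
    """dAUC = AUC(with protein) - AUC(without protein).

    Negative values indicate growth inhibition; zero or positive values
    indicate no antimicrobial activity.
    """
    return auc_trapezoid(times_h, od_treated) - auc_trapezoid(times_h, od_untreated)


# Hypothetical example: 53 reads, one every 15 min from 0 to 13 h
times = [0.25 * i for i in range(53)]
untreated = [0.01 + 0.05 * t for t in times]  # toy linear growth
treated = [0.01 + 0.03 * t for t in times]    # toy slower growth
print(f"dAUC = {delta_auc(times, treated, untreated):.2f}")
```

For these linear toy curves the trapezoidal rule is exact, so the result equals the integral of the 0.02 OD600/h growth-rate difference over 13 h, about -1.69.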
To assess the nutritional immunity capacity of the zebrafish s100s, we performed 
identical experiments using each of the four proteins and calculated ΔAUC curves for each 
bacterial strain under increasing concentrations of each s100 (Fig 4.3B). Unlike human 
calprotectin (yellow curve), none of the four zebrafish s100 proteins exhibited nutritional 
immunity under these conditions (Fig 4.3B). Bacteria treated with zebrafish s100a1 (brown curve) 
grew better than bacteria without s100. Zebrafish s100a10b (orange) increased growth 
of S. epidermidis and had no effect on growth of the zebrafish-derived strains. 
Zebrafish s100w (purple) improved S. epidermidis growth, showed a possible slight inhibitory 
effect on Aeromonas ZOR001 at concentrations of 50 μM and above, and did not affect Vibrio 
ZWU0020 growth at any concentration. Similarly, zebrafish icn (blue) improved S. epidermidis 
growth, had a slight inhibitory effect on Aeromonas ZOR001 at 50 μM and above, and 
increased Vibrio ZWU0020 growth at all concentrations.
 
 
Figure 4.3: Zebrafish s100s do not exhibit nutritional immunity activity like human 
calprotectin. All data were collected in biological and technical triplicate. Error bars indicate 
standard error. A) Dose dependence of human calprotectin challenge on human and zebrafish 
commensal bacteria. Each column specifies which bacterial strain was used for the set of nutritional 
immunity assays: a human-derived Gram-positive Staphylococcus epidermidis and two 
zebrafish-derived Gram-negative bacteria, Aeromonas ZOR001 and Vibrio 
ZWU0020. Bacterial growth was measured by OD600 over 13 hours after challenge with 
increasing doses of human calprotectin noted in the legend on the right. Concentration increases 
from dark blue to dark green and black arrows indicate how growth is affected as calprotectin 
increases. B) Zebrafish s100 dose effects on human and zebrafish commensal bacterial growth 
compared to human calprotectin. The dotted line at zero represents no effect of s100 challenge 
on bacterial growth. Each datapoint shows the change in the area under the curve from the 
absence of s100 protein to the indicated s100 concentration, measured from growth curves like 
those shown in panel A.  
 
Zebrafish s100s do not exhibit proinflammatory activity like S100A9 
Antimicrobial activity is not the only function of mammalian calprotectin and its subunits. 
Calprotectin is a heterodimer of S100A8 and S100A9, and the mammalian S100A9 homodimer, for 
example, can activate an innate immune response through the Toll-like receptor 4 (TLR4) complex, 
inducing nuclear localization of NF-κB and transcription of a wide variety of pro-inflammatory 
genes. This activity can be reproduced in an in vitro functional assay by transfecting HEK293T 
cells with plasmids encoding the proteins of the TLR4 complex (TLR4, MD-2, and CD14), as 
well as a reporter plasmid placing luciferase under the control of an NF-κB-responsive promoter. 
The cells can then be treated with exogenous S100A9 and the luciferase response measured. 
To determine whether the purified zebrafish s100 proteins could play a similar role, we tested 
their ability to activate TLR4 in this assay (Fig 4.4). 
Zebrafish have three ohnologs of tetrapod TLR4: TLR4ba, TLR4bb, and TLR4al. Zebrafish 
TLR4ba has been shown to induce an inflammatory response to endotoxin, the bacterial glycolipid 
lipopolysaccharide (LPS),101 whereas neither TLR4bb nor TLR4al has shown activity. We validated 
the assay by testing the response of each complex to the canonical agonist for 
the receptor, endotoxin derived from Gram-negative bacterial outer membranes (green). As 
expected, the human and TLR4ba complexes responded strongly to endotoxin, whereas TLR4bb and 
TLR4al did not show signal above vehicle treatment (light blue). We next challenged all four 
complexes with human S100A9 (yellow). We found that 2 μM human S100A9 activated human 
TLR4, as expected, but that none of the zebrafish TLR4 complexes showed signal above 
background. This suggests that recognition of S100A9 by TLR4 is not conserved in 
zebrafish. Finally, we tested the ability of zebrafish s100s to activate each TLR4 complex at 2 
μM and observed no statistically significant agonist activity for any protein. Although we 
detected a small amount of signal from zebrafish s100a10b (orange) and s100w (brown), this signal 
was not statistically significant (Bonferroni-corrected one-sample t-test). We repeated the same 
experiment with 10 μM zebrafish s100s and again observed no convincing agonist activity. 
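The significance test named above can be sketched in Python under stated assumptions. This is a minimal sketch, not the analysis software used for Fig 4.4: it assumes exactly three biological replicates of background-subtracted, normalized signal per condition, so the t distribution has 2 degrees of freedom and the two-sided p-value has the closed form 1 − |t|/√(2 + t²). Each condition is tested against a mean of zero, and the Bonferroni correction multiplies each p-value by the number of comparisons. The condition names and values are hypothetical.

```python
import math
import statistics


def one_sample_t_df2(values, popmean=0.0):
    """Two-sided one-sample t-test for exactly three replicates (df = 2).

    For df = 2 the Student t CDF is F(t) = 1/2 + t / (2 * sqrt(2 + t**2)),
    so the two-sided p-value is 1 - |t| / sqrt(2 + t**2).
    """
    if len(values) != 3:
        raise ValueError("closed-form p-value is only valid for n = 3 (df = 2)")
    sem = statistics.stdev(values) / math.sqrt(len(values))
    if sem == 0:
        raise ValueError("replicates are identical; t is undefined")
    t = (statistics.mean(values) - popmean) / sem
    p = 1.0 - abs(t) / math.sqrt(2.0 + t * t)
    return t, p


def bonferroni(p_values):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical normalized signals (fraction of human TLR4 + endotoxin)
signals = {
    "protein_A on receptor_1": [0.04, 0.01, 0.07],
    "protein_B on receptor_1": [0.02, -0.01, 0.03],
}
raw_p = [one_sample_t_df2(v)[1] for v in signals.values()]
for name, p_adj in zip(signals, bonferroni(raw_p)):
    print(f"{name}: adjusted p = {p_adj:.3f}, significant = {p_adj < 0.05}")
```

The closed form avoids a statistics library but only holds for n = 3; other replicate numbers would need a general t-distribution CDF, such as the one in SciPy.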
These zebrafish s100 proteins show no evidence of activity toward either human TLR4 or 
the three zebrafish tlr4 ohnologs. Given the potent response of human TLR4 and zebrafish TLR4ba 
to their positive controls (endotoxin for both, and human S100A9 for human TLR4), this strongly 
suggests that the activity is absent from these proteins. Further, false positives are common in 
this assay because heterologously expressed proteins are often contaminated with endotoxin. The 
lack of signal thus gives strong evidence that these zebrafish proteins cannot activate TLR4 in 
the same fashion as human S100A9. 
 
 
Figure 4.4: Zebrafish s100s do not exhibit the pro-inflammatory characteristics of human 
S100A9. Activation of human and zebrafish TLR4 complexes in the presence of zebrafish s100 
proteins. Bars show the average signal across three biological replicates, with error bars 
indicating standard error. The positive controls for this experiment included human TLR4 and 
zebrafish TLR4ba treated with endotoxin (green), and human TLR4 treated with human S100A9 
(yellow). There is no known agonist for the zebrafish TLR4bb and TLR4al complexes. All data were 
background subtracted and normalized to the signal from human TLR4 treated with endotoxin. 
 
DISCUSSION 
Using zebrafish as a model for studies of innate immunity is a promising line of 
work. As vertebrates, zebrafish share much of their physiology and immune defense mechanisms 
with humans, enabling mechanistic insight into health, disease, and host-microbe 
interactions. However, when approaching this research, we must remain mindful that more than 
400 million years have passed since our most recent common ancestor, allowing species-specific 
differences to evolve in response to diverse environments and other pressures. 
We assert here that recent studies and commercial products have incorrectly assumed 
that calprotectin is part of the fish innate immune response. We re-evaluated the 
phylogenetic evidence for a calprotectin homolog in fish and confirmed that, although 
the evolutionary history of S100 proteins is complex, zebrafish have no s100 protein within 
the clade containing mammalian calprotectin. 
We also tested the possibility that a fish s100 protein from a similar genomic context to 
calprotectin might have evolved calprotectin-like innate immune functions. We characterized the 
nutritional immunity and proinflammatory activity of four zebrafish s100s, including the 
previously studied zebrafish “calprotectin” s100a10b, in assays commonly used to test 
calprotectin function. None of the zebrafish proteins behaved like human calprotectin. 
We cannot prove a negative: this work does not rule out the possibility that some zebrafish s100 
protein performs a subset of the functions of human calprotectin. Our work does, however, put the burden of proof 
on researchers who would claim such functions exist. If a zebrafish s100 has antimicrobial or 
pro-inflammatory activity, it must have evolved that activity convergently and independently of 
those activities from mammalian calprotectin. Further, and importantly, such a protein does not 
shed direct light on mammalian biology. A convergent zebrafish s100 calprotectin-like protein 
would help us understand zebrafish biology; it would also be intriguing from the perspective of 
protein evolution. It would almost certainly have a different regulatory scheme and would only 
have a subset of human calprotectin functions.  
We also did not test every zebrafish s100 protein; instead, we focused on those that 
seemed most likely to be expressed in immune cells during the immune response and that share a 
genomic origin with calprotectin. Zebrafish s100 proteins represent a largely uncharacterized group 
of proteins, many of which are specific to the ray-finned fishes.230 We are excited to see how investigations 
of their functional roles continue. We remain intrigued by the idea that convergent evolution may 
have occurred between mammalian calprotectin and some other protein(s) in zebrafish. In addition to 
studying zebrafish s100 proteins individually, one important line of work will be to investigate 
various s100 heterocomplexes. Mammalian calprotectin functions as a heterodimer and a heterotetramer, 
but we only performed experiments with homodimeric zebrafish s100s. Other heterodimeric human 
S100 complexes have been shown to have altered functions.240 In 
the future, this type of analysis could be done with zebrafish proteins to explore whether a 
heterodimeric state confers nutritional immunity or proinflammatory activity. However, there is 
currently no evidence suggesting that this is likely. 
We conclude that it is crucial that we use an evolutionary lens and careful biochemical 
analyses to probe homology between zebrafish and human proteins so that we can make accurate 
extrapolations of findings from zebrafish models of human biology.  
 
MATERIALS AND METHODS 
Protein Purification 
We purchased all zebrafish s100 genes from GenScript in the pET-28a(+)-TEV vector 
with an N-terminal 6x-Histidine tag and TEV protease cleavage site. All genes were 
codon-optimized for expression in E. coli. We expressed human calprotectin (with S100A9 containing 
the C3S mutation) and human S100A9/C3S in a pET-Duet vector without purification tags. We 
transformed Rosetta2(DE3)pLysS E. coli cells with these plasmids. We used transformant glycerol 
stocks to inoculate cultures in 15 mL Luria broth (LB) with 50 μg/mL kanamycin and 34 μg/mL 
chloramphenicol. We incubated cultures overnight at 37 °C, shaking at 250 rpm. The following 
day, we diluted 15 mL saturated cultures into 1.5 L of LB with antibiotics. When the OD600 
reached 0.6-1.0, we induced recombinant protein expression with 1 mM IPTG and 0.2% glucose 
and then grew cultures overnight at 16 °C, shaking at 250 rpm. We pelleted cells at 3,000 rpm for at 
least 15 minutes in an F6B rotor in a Beckman Coulter preparative centrifuge. We stored pellets at 
-20 °C for up to one month. 
We prepared protein lysates for purification as follows. We vortexed 
pellets (6-9 g) in 45 mL of buffer from the first chromatography step (see below) until cells were 
resuspended, added 15 µL each of DNase I and lysozyme (ThermoFisher Scientific), and 
incubated at room temperature with gentle shaking for at least 10 minutes. We sonicated the 
resuspended cells at 55% amplitude with 0.3 s on and 0.7 s off pulses for 3-5 
minutes. We pelleted cell debris by centrifugation at 15,000 rpm at 4 °C for at least 20 minutes in 
a JA-20 rotor in a Beckman Coulter preparative centrifuge and collected the supernatant. To 
remove remaining large debris, we filtered the lysate supernatant through a 0.2 µm pore syringe 
filter immediately prior to chromatography. 
We purified all proteins using an Äkta PrimePlus Fast Protein Liquid Chromatography 
system with two stacked 5 mL HiTrap columns at each step. We used HisTrap FF columns for 
Ni-affinity and Q HP columns for anion exchange (GE Healthcare). All chromatography was 
performed at 4 °C. At the end of purification, we confirmed protein purity was >95% by SDS-PAGE. 
Then, we dialyzed each protein overnight into 4 L of 25 mM Tris, 100 mM NaCl, pH 7.4 
at 4 °C. We added 2 g/L Chelex 100 resin (Bio-Rad) to the dialysis buffer to remove divalent 
metal ions. We concentrated each protein to roughly 2 mg/mL and syringe-filtered it through a 0.22 
µm pore filter directly into liquid nitrogen to sterilize and flash freeze before storing at -80 °C. 
We purified TEV-cleavable 6xHis-tagged zebrafish S100 proteins with the following 
scheme. We used 25 mM Tris, 100 mM NaCl, pH 7.4 buffer as the base for all chromatography 
buffers. We ran our protein lysate over a Ni-affinity column with a 50 mL wash and eluted over a 
75 mL gradient from 25-1000 mM imidazole to collect proteins with strong Ni binding capacity. 
We determined which fractions contained our desired protein by SDS-PAGE and pooled these 
fractions. To separate our recombinant proteins from their Ni-binding His-tag, we added 5 mM 
DTT and 6xHis-tagged TEV protease to the pooled fractions and incubated the reaction at room 
temperature with gentle shaking for at least 5 hours. We then dialyzed the protein solution 
overnight into 4 L buffer with 25 mM imidazole and 5 mM DTT to allow cleavage to come to 
completion and to remove excess imidazole from the sample. We performed a second round of 
Ni-affinity chromatography. Without the His-tag, the zebrafish S100s have low affinity for Ni. 
Therefore, we isolated pure, non-tagged zebrafish S100 proteins at this step during a 50 mL wash 
in 25 mM imidazole and then used a step gradient to 1 M imidazole to elute His-tagged and other 
contaminant proteins that had higher affinity for the Ni column. Purified zebrafish S100s were 
prepared for storage as described above. 
We purified human calprotectin using Ni-affinity chromatography at pH 7.4 and anion 
exchange at pH 8. When expressing calprotectin, S100A8 and S100A9 homodimers are also 
expressed and must be removed during chromatography. In the presence of calcium, S100A9 and 
calprotectin bind divalent metal ions like Ni, but S100A8 and most other lysate proteins do not. 
We loaded our Ni-affinity column with calprotectin lysate, washed away most S100A8 and other 
contaminants in a 50 mL wash, and then eluted calprotectin and S100A9 over a 75 mL gradient 
from 0 to 1000 mM imidazole and 1 to 0 mM CaCl2 in 25 mM Tris, 100 mM NaCl, pH 7.4 buffer. We 
pooled elution peak fractions containing calprotectin and contaminant S100A9 homodimers, as 
determined by SDS-PAGE, and dialyzed overnight in 4 L of 25 mM Tris, 100 mM NaCl at pH 8. 
We loaded our sample onto an anion exchange chromatography column in 25 mM Tris, 100 mM 
NaCl, pH 8. Because S100A9 has a lower pI than calprotectin, it binds the anion exchange 
column more strongly at pH 8. We used a 50 mL wash in 100 mM NaCl to isolate calprotectin 
and then used a step gradient increasing the salt to 1 M NaCl to remove S100A9 and other 
contaminants from the column. At this point, calprotectin was pure and prepared for storage as 
described, and fractions with S100A9 and other contaminants were discarded. 
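The pI-based reasoning in this purification (a protein with a lower pI carries more net negative charge at pH 8 and therefore binds the positively charged quaternary-ammonium groups of a Q anion exchange resin more tightly) can be illustrated with a Henderson–Hasselbalch net-charge estimate. This is a minimal sketch: the pKa values are approximate textbook values and are an assumption (published scales differ), the peptide sequences are hypothetical stand-ins rather than the S100A9 or calprotectin sequences, and the estimate ignores structural effects on pKa.

```python
# Approximate ionizable-group pKa values (assumption; scales vary by source)
BASIC_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
ACIDIC_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Estimated net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Basic groups contribute +1 times their protonated fraction
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    # Acidic groups contribute -1 times their deprotonated fraction
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in sequence.upper():
        if residue in BASIC_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - BASIC_PKA[residue]))
        elif residue in ACIDIC_PKA:
            charge -= 1.0 / (1.0 + 10 ** (ACIDIC_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, tol=1e-4):
    """pH at which the estimated net charge is zero, found by bisection."""
    lo, hi = 0.0, 14.0  # net charge is positive at pH 0 and negative at pH 14
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical acidic and less acidic peptides
for name, seq in [("acidic", "MDEEDLEKG"), ("less acidic", "MKDLEKHGK")]:
    print(f"{name}: pI ~ {isoelectric_point(seq):.2f}, "
          f"charge at pH 8 ~ {net_charge(seq, 8.0):+.2f}, "
          f"at pH 6 ~ {net_charge(seq, 6.0):+.2f}")
```

Bisection works here because the estimated net charge decreases monotonically with pH. Comparing pH 8 with pH 6 shows the same logic used in the S100A9 polishing step below: lowering the pH reduces the net negative charge and weakens binding to the anion exchanger.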
We used a similar protocol to purify human S100A9. We performed Ni-affinity 
chromatography as described for calprotectin. We then performed anion exchange 
chromatography using a 50 mL wash and collected fractions over a 70 mL gradient elution from 
100-1000 mM NaCl in 25 mM Tris, pH 8 buffer to isolate S100A9 from contaminant proteins 
that also bind the anion column. We used SDS-PAGE to confirm fractions with S100A9, pooled 
and dialyzed these fractions overnight into 4 L of 25 mM Tris, 100 mM NaCl at pH 6. As a final 
step, we loaded the S100A9 sample onto an anion exchange column in 25 mM Tris, 100 mM NaCl 
buffer at pH 6. S100A9 binds weakly to the anion column at pH 6. Therefore, we collected 
S100A9 in a 50 mL wash in 100 mM NaCl, and then removed contaminants from the column 
with a step elution at 1 M NaCl. Pure S100A9 was prepared for storage as described. 
 
Far-UV Circular Dichroism and Fluorescence Spectroscopy 
Prior to biophysical measurements, we thawed all proteins and exchanged them into 25 mM 
Tris, 100 mM NaCl, pH 7.4 via overnight dialysis in 4 L buffer at 4 °C. We determined protein 
concentrations by Bradford assay using bovine serum albumin (BSA) standards and the 
molecular weight of each dimeric structure, then diluted to ~10 µM in dialysis buffer. 
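Because the Bradford assay reports mass concentration, converting to a molar concentration of dimer requires the dimer molecular weight. A minimal sketch of that arithmetic and of the C1V1 = C2V2 dilution to ~10 µM follows; the 20 kDa dimer mass, the stock concentration, and the 300 µL final volume are hypothetical values chosen for illustration.

```python
def mg_per_ml_to_um(conc_mg_per_ml, mw_g_per_mol):
    """Convert mg/mL (equal to g/L) to µM for a species of molecular weight MW."""
    return conc_mg_per_ml / mw_g_per_mol * 1e6


def dilution_volumes(stock_um, target_um, final_volume_ul):
    """Volumes of stock and buffer giving target_um in final_volume_ul (C1V1 = C2V2)."""
    if target_um > stock_um:
        raise ValueError("stock is more dilute than the target concentration")
    v_stock = target_um * final_volume_ul / stock_um
    return v_stock, final_volume_ul - v_stock


# Hypothetical: 1.2 mg/mL by Bradford, dimer MW 20,000 g/mol (about 60 µM dimer)
stock = mg_per_ml_to_um(1.2, 20_000)
v_protein, v_buffer = dilution_volumes(stock, 10.0, 300.0)
print(f"stock = {stock:.1f} µM; mix {v_protein:.1f} µL protein + {v_buffer:.1f} µL buffer")
```

Using the dimer molecular weight gives the concentration of dimers; using the monomer molecular weight would give a concentration of monomers twice as large for the same sample.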
For all spectroscopic measurements, we assessed metal-induced changes to the spectra by 
measuring the spectrum in the presence of 2 mM CaCl2 and then adding excess EDTA at 5 mM 
and re-measuring the spectrum. We collected far-UV circular dichroism data between 200–250 
nm using a J-815 CD spectrometer (Jasco) with a 1 mm quartz spectrophotometer cell (Starna 
Cells, Inc. Catalog No. 1-Q-1). We collected 3 scans for each condition, and then averaged the 
spectra and subtracted a blank buffer spectrum using the Jasco spectra analysis software suite. 
We converted raw ellipticity into mean residue molar ellipticity using the concentration and number of 
residues in each protein. We collected intrinsic tyrosine and/or tryptophan fluorescence using a J-815 
CD spectrometer (Jasco) with an attached model FDT-455 fluorescence detector (Jasco) 
and a 1 cm quartz cuvette (Starna Cells, Inc.). We collected a single excitation and emission 
scan at 10 nm/min with a 10 nm bandwidth, 1 nm data pitch, and 1 s digital integration time (D.I.T.) 
for each condition and then subtracted a blank buffer spectrum using the Jasco spectra analysis 
software suite. Depending on the sample signal, we set the detector sensitivity to either 630 or 800 V. 
For excitation scans, we monitored emission at 305 nm for all zebrafish proteins and at 
345 nm for human S100A9 while scanning excitation wavelengths from 200-295 nm. For 
emission scans, we excited zebrafish proteins at 280 nm and human S100A9 at 288 nm 
and recorded emission from 285-425 nm. 
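The conversion from raw ellipticity to mean residue molar ellipticity is commonly written as [θ] = θobs / (10 · c · l · n), with θobs in millidegrees, c the molar protein concentration, l the path length in cm, and n the number of residues, giving units of deg·cm²·dmol⁻¹. The Python below is a minimal sketch of that convention together with the scan averaging and blank subtraction performed here in the Jasco software; it is not the Jasco implementation, and the numbers in the example are hypothetical. It assumes c and n refer to the same species (for example, dimer concentration with residues per dimer).

```python
def average_scans_minus_blank(scans, blank):
    """Average replicate scans point by point and subtract a buffer blank."""
    n_points = len(blank)
    if not scans or any(len(scan) != n_points for scan in scans):
        raise ValueError("all scans must match the blank in length")
    return [sum(scan[i] for scan in scans) / len(scans) - blank[i]
            for i in range(n_points)]


def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """[theta] in deg*cm^2*dmol^-1 from observed ellipticity in millidegrees."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


# Hypothetical 222 nm readings (mdeg): three scans and a buffer blank
scans_222 = [[-21.0], [-20.0], [-19.0]]
blank_222 = [0.5]
corrected = average_scans_minus_blank(scans_222, blank_222)  # [-20.5]
# 10 µM dimer, 1 mm (0.1 cm) cell, hypothetical 180 residues per dimer
print(round(mean_residue_ellipticity(corrected[0], 10e-6, 0.1, 180)))
```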
 
Nutritional Immunity Assay 
We measured the antimicrobial activity of zebrafish S100s and human calprotectin 
against human- and zebrafish-derived bacterial strains using a modified version of a well-established 
assay,86,216,219,238 described here. Bacterial strains used in this assay 
include 1) Staphylococcus epidermidis, a human commensal strain previously shown to respond 
to calprotectin219,238; 2) Aeromonas ZOR001, isolated from zebrafish and not previously 
characterized for response to calprotectin; and 3) Vibrio ZWU0020, isolated from zebrafish and 
not previously characterized for response to calprotectin but related to human-derived Vibrio 
cholerae shown to respond to calprotectin.107 We obtained both zebrafish-derived strains from 
the Guillemin lab at the University of Oregon.  
Each week, we plated bacterial strains from glycerol stocks onto antibiotic-free LB agar, 
grew them at 30 °C overnight, and then stored the plates at 4 °C. The day before an experiment, we 
inoculated a 5 mL culture in liquid LB with a single colony of each strain and grew it 
overnight at 30 °C with shaking. The following day, we diluted cultures 1:100 in 5 mL LB and 
grew them to an OD600 of approximately 0.8 by the time of the experiment. Aeromonas ZOR001 and 
Vibrio ZWU0020 were diluted 2 hours before the experiment; S. epidermidis grew more slowly and 
was therefore diluted 4 hours before the experiment. 
The day before each experiment, we thawed a single S100 protein from -80 °C, 
concentrated it to at least 200 μM using a Nanosep 3K Omega spin concentrator (Pall 
Corporation), and dialyzed it overnight at 4 °C into 4 L of Experimental Buffer (25 mM Tris, 100 
mM NaCl, pH 7.4) with 2 g/L Chelex 100 resin (Bio-Rad) to chelate residual transition metal 
ions. After dialysis, we filter-sterilized the protein through an Ultrafree-MC-VV centrifugal filter 
(Durapore PVDF, 0.1 µm) and kept it at 4 °C until the time of the experiment. 
To start the experiment, we made a protein dilution series by mixing the required volume of 
protein in sterile Experimental Buffer with LB at a 62:38 (buffer:LB) ratio. We then brought the 
volume of these protein solutions up to 1.7 mL with Experimental Media (EM), which we made by 
mixing Experimental Buffer and LB at 62:38 and filter-sterilizing. We distributed each sample in 
160 µL aliquots across ten wells of a clear Falcon 96-Well, Cell Culture-Treated, Flat-Bottom 
Microplate. At this time, we diluted each bacterial strain to an estimated OD600 of 0.008 in 5 mL 
of Experimental Media with calcium (EMC), which we made by adding 10.2 μM CaCl2 to EM and 
sterile-filtering. Then, we added 40 µL of diluted bacteria or EMC without bacteria (contamination 
control) to each well, giving technical triplicates for each bacterial strain. To counteract sample 
evaporation, the outermost wells of the plate contained only 160 μL EM and 40 μL EMC, and we 
wrapped the plate in a single layer of parafilm. 
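The volumes in this plate setup follow from simple dilution arithmetic. The minimal sketch below computes, for one hypothetical target concentration, how much protein stock, LB, and EM go into a 1.7 mL dilution and how much overnight culture goes into 5 mL of EMC to reach OD600 0.008. It assumes that reported protein concentrations refer to the final 200 µL well (160 µL sample plus 40 µL bacteria), which is an assumption, and that the protein stock is in Experimental Buffer; the function names are hypothetical.

```python
SAMPLE_UL = 160.0     # protein dilution per well
BACTERIA_UL = 40.0    # diluted bacteria (or EMC) per well
MIX_ML = 1.7          # volume of each protein dilution
BUFFER_TO_LB = 62.0 / 38.0


def protein_mix_ml(stock_um, final_well_um):
    """Volumes (mL) of protein stock, LB, and EM for one 1.7 mL dilution.

    Assumption: final_well_um is the concentration after adding bacteria,
    so the 160 µL sample must be (160 + 40) / 160 times more concentrated.
    """
    mix_um = final_well_um * (SAMPLE_UL + BACTERIA_UL) / SAMPLE_UL
    v_protein = mix_um * MIX_ML / stock_um   # stock in Experimental Buffer
    v_lb = v_protein / BUFFER_TO_LB          # LB keeping buffer:LB at 62:38
    v_em = MIX_ML - v_protein - v_lb         # top up with EM (already 62:38)
    if v_em < 0:
        raise ValueError("stock too dilute for this final concentration")
    return v_protein, v_lb, v_em


def inoculum_ml(culture_od600, target_od600=0.008, final_ml=5.0):
    """Culture volume to dilute into EMC to reach the target OD600."""
    return target_od600 * final_ml / culture_od600


v_p, v_lb, v_em = protein_mix_ml(stock_um=200.0, final_well_um=50.0)
print(f"protein {v_p:.3f} mL, LB {v_lb:.3f} mL, EM {v_em:.3f} mL")
print(f"culture at OD600 0.8: {inoculum_ml(0.8) * 1000:.0f} µL into 5 mL EMC")
```

Keeping the buffer:LB ratio fixed at 62:38 in every dilution means that only the protein concentration differs between wells, not the nutrient content of the medium.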
We measured bacterial growth by OD600 every 15 minutes over 13 hours in a Molecular 
Devices SpectraMax i3. The plate was shaken for 5 seconds before the first read, then for 10 
minutes between each subsequent read. Although we set the plate reader temperature to 25 °C, 
the actual temperature reached 37 °C over the course of overnight growth. We measured the final 
concentrations of metals in the medium without bacteria by ICP-MS. The 
measured concentrations were Ni: 45.4 μM, Ca: 107.3 μM, Cu: 157.4 μM, Mg: 160.5 μM, Mn: 
216.6 μM, Fe: 1.1 mM, and Zn: 5.9 mM. 
For the analysis, we background-subtracted each experimental condition using the OD600 
values of wells containing the matching S100 protein concentration but no bacteria. 
We used Prism to average the replicates by condition, determine the standard error, and 
graph the results. 
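The background subtraction and replicate averaging described above were performed in Prism. As a minimal sketch of the same arithmetic, assuming each replicate is a list of OD600 readings at the same time points and that the no-bacteria control was read at the same times, the Python below subtracts the matching cell-free control and reports the pointwise mean and standard error of the mean across replicates. Function names and data are hypothetical.

```python
import math
import statistics


def background_subtract(od_with_bacteria, od_no_bacteria):
    """Subtract the matching no-bacteria control, time point by time point."""
    if len(od_with_bacteria) != len(od_no_bacteria):
        raise ValueError("curves must have the same number of reads")
    return [a - b for a, b in zip(od_with_bacteria, od_no_bacteria)]


def mean_and_sem(replicate_curves):
    """Pointwise mean and standard error (SD / sqrt(n)) across replicates."""
    n = len(replicate_curves)
    if n < 2:
        raise ValueError("need at least two replicates for a standard error")
    means, sems = [], []
    for values in zip(*replicate_curves):
        means.append(statistics.mean(values))
        sems.append(statistics.stdev(values) / math.sqrt(n))
    return means, sems


# Hypothetical three replicates of three reads at one protein concentration
control = [0.05, 0.05, 0.06]
replicates = [[0.06, 0.20, 0.40], [0.07, 0.22, 0.43], [0.06, 0.19, 0.41]]
corrected = [background_subtract(r, control) for r in replicates]
means, sems = mean_and_sem(corrected)
print([round(m, 3) for m in means], [round(s, 3) for s in sems])
```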
 
Proinflammatory Activity Assay 
We tested the S100A9-like proinflammatory activity of zebrafish S100s using a well-established 
assay.101,102,238 This assay measures relative activation of the TLR4-mediated immune 
response through NF-κB. For each experiment, we thawed all zebrafish S100 proteins and 
human S100A9 from -80 °C, buffer-exchanged them into endotoxin-free PBS, and then treated them 
with endotoxin removal spin columns (ThermoFisher Scientific) to remove residual LPS from the 
purification process. 
 We performed each experiment in technical triplicate and followed the Dual-Glo 
Luciferase Assay System protocol (Promega). We transiently transfected adherent HEK293T 
cells in a Falcon 96-Well, Cell Culture-Treated, Flat-Bottom Microplate with pcDNA vector 
plasmids using PLUS and Lipofectamine Reagents (ThermoFisher Scientific). Plasmids 
contained genes for human or zebrafish TLR4 complex components and Renilla luciferase 
enzyme under constitutively active promoters, and the firefly luciferase gene controlled by an 
NF-κB promoter. For human TLR4 complex transfections, we transfected 10 ng human TLR4, 
0.5 ng human MD-2, and 1 ng human CD14 plasmids per well. For zebrafish TLR4 complex 
transfections, we used 10 ng zebrafish TLR4, 20 ng zebrafish MD-2, and 1 ng mouse CD14 
plasmids per well, because this ratio gave the best signal-to-noise ratio. Zebrafish do not have an 
annotated CD14, but previous studies have shown that zebrafish TLR4ba can be activated in the 
presence of either mouse or human CD14, with a stronger response with mouse CD14. We also transfected all 
wells with 1 ng Renilla plasmid, 20 ng elam-Luc (firefly), and brought the total DNA mass per 
well to 100 ng with empty pcDNA vector in a total media volume of 200 μL per well. 
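The per-well DNA amounts above leave different amounts of empty vector for the human and zebrafish complexes. A minimal sketch of that bookkeeping follows, with a hypothetical helper that also scales a per-well recipe to a master mix; the 10% overage is an assumption, not part of the protocol described here.

```python
TOTAL_DNA_NG = 100.0


def empty_vector_ng(components_ng, total_ng=TOTAL_DNA_NG):
    """Empty pcDNA needed to bring the per-well DNA mass to total_ng."""
    used = sum(components_ng.values())
    if used > total_ng:
        raise ValueError(f"components already exceed {total_ng} ng")
    return total_ng - used


def master_mix_ng(components_ng, n_wells, overage=0.10):
    """Scale a per-well recipe (including filler vector) to n_wells plus overage."""
    recipe = dict(components_ng)
    recipe["empty pcDNA"] = empty_vector_ng(components_ng)
    factor = n_wells * (1.0 + overage)
    return {name: ng * factor for name, ng in recipe.items()}


human = {"TLR4": 10.0, "MD-2": 0.5, "CD14": 1.0, "Renilla": 1.0, "elam-Luc": 20.0}
zebrafish = {"TLR4": 10.0, "MD-2": 20.0, "mouse CD14": 1.0,
             "Renilla": 1.0, "elam-Luc": 20.0}
print(empty_vector_ng(human), empty_vector_ng(zebrafish))  # 67.5 48.0
```

Holding total DNA constant with empty vector keeps the transfection reagent-to-DNA ratio the same across wells, so differences in signal reflect the receptor components rather than transfection conditions.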
After 20-24 hours of incubation at 37 °C in 5% CO2, we removed all 200 µL of transfection 
mix from each well. We then treated the transfected HEK293T cells with 100 µL of one of the 
following treatment mixes: 1) 2 µM S100 protein and 200 ng/μL Polymyxin B to sequester any LPS 
in the medium, 2) 0.2 ng/µL LPS-R (tlrl-eklps; Invivogen) as a positive control for human TLR4, or 
3) 2 ng/µL lipid IVa as a positive control for zebrafish TLR4ba activation. Because there is no known 
activator of zebrafish TLR4bb and TLR4al complexes (Loes et al., 2021), we treated these 
transfections with 2 ng/µL lipid IVa for consistency. After incubating again at 37 °C in 5% CO2 
for 3-4 hours, we removed and discarded 60 µL of treatment mix from each well. We chemically 
lysed the cells by adding 30 µL Dual Glo lysis reagent containing firefly luciferin and incubated 
in the dark for 7 minutes. We then mechanically lysed the cells by scraping the bottom of each 
well with a pipet tip and transferring 60 µL of cell solution to an opaque 96-well plate. After a 7-
minute incubation in the dark at room temperature, we measured luminescence per well 
produced by firefly luciferase activity using a Molecular Devices SpectraMax i3. Then we added 
30 μL of Dual-Glo Stop & Glo buffer containing firefly luciferase quencher and Renilla 
luciferase reagent, incubated for 7 more minutes, and measured luminescence.  
For the analysis, we took the firefly signal for each experimental condition and 
subtracted the average firefly signal of wells transfected with the corresponding 
complex but treated with buffer without agonist. We did the same for the Renilla signal, using as 
background the average signal from wells given the same treatment 
but transfected only with vector. We divided the background-subtracted firefly signal for each 
well by the background-subtracted Renilla signal for that same well. To simplify comparisons 
across experiments, we normalized the firefly/Renilla value for each well to the triplicate average 
of the firefly/Renilla values for the human TLR4 complex treated with 0.2 ng/μL LPS-R. 
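The luciferase analysis above reduces to a background-subtracted firefly/Renilla ratio scaled to a reference condition. A minimal sketch follows, assuming per-well raw luminescence values and precomputed background averages; the function names and readings are hypothetical.

```python
import statistics


def luciferase_ratio(firefly, renilla, firefly_bg, renilla_bg):
    """Background-subtracted firefly signal divided by background-subtracted Renilla."""
    renilla_corrected = renilla - renilla_bg
    if renilla_corrected <= 0:
        raise ValueError("Renilla signal at or below background; ratio undefined")
    return (firefly - firefly_bg) / renilla_corrected


def normalize_to_reference(ratios, reference_ratios):
    """Scale each well's ratio to the mean of the reference triplicate
    (here, human TLR4 treated with 0.2 ng/µL LPS-R)."""
    ref_mean = statistics.mean(reference_ratios)
    return [r / ref_mean for r in ratios]


# Hypothetical raw luminescence for one treatment in triplicate
wells = [(1100.0, 600.0), (1300.0, 700.0), (900.0, 500.0)]  # (firefly, Renilla)
ratios = [luciferase_ratio(f, r, firefly_bg=100.0, renilla_bg=100.0) for f, r in wells]
reference = [4.0, 4.2, 3.8]  # hypothetical LPS-R reference ratios
print([round(x, 3) for x in normalize_to_reference(ratios, reference)])
```

Dividing by Renilla corrects for well-to-well differences in transfection efficiency and cell number, and normalizing to the LPS-R reference puts experiments run on different days on a common scale.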
 
BRIDGE TO CHAPTER V 
We conclude from this chapter that zebrafish do not have an ortholog of human 
calprotectin and likely do not possess an s100 protein that convergently evolved similar 
antibacterial and proinflammatory activities. Because S100 proteins generally have the capacity 
to bind metal ions, further investigation into zebrafish s100 complexes might yield 
discoveries of species-specific host-microbe interactions in the competition for metal ions. 
As with our conclusions about zebrafish TLR4/MD-2, it is possible that the functions of these 
important human immune proteins are paralleled in some way in zebrafish. However, 
~430 million years of separate selective pressures seem to have changed the players and strategies 
at the host-microbe interface. 
 
 
  
CHAPTER V 
SUMMARY AND CLOSING REMARKS 
 
Our functional characterizations of homologous human and zebrafish immune proteins 
have shown us remarkable differences in the way these two species respond to microbes. We find 
that zebrafish TLR4 is highly specific for tetra-acylated LPS molecules, which inhibit 
human TLR4 signaling. But even though the zebrafish complex can activate an inflammatory 
response to tetra-acyl LPS in vitro, injecting live fish with this potent TLR4 activator does not 
stimulate an immune response. This contrasts sharply with the human and mouse systems, in which 
a stronger TLR4 agonist induces a stronger immune response, which can lead to death from 
an overactive immune system. Resurrected ancestral proteins from early vertebrates suggest that 
this hyperactive TLR4 response to LPS evolved at some point between the ancestor of bony 
vertebrates and tetrapods but is not maintained by all tetrapod species. The reconstructed bony 
vertebrate and teleost ancestor proteins show low-level stimulation by LPS and suggest that 
zebrafish have evolved a unique sensitivity to tetra-acyl LPS. What could explain this evolution 
of ligand specificity in the absence of functional consequence? Perhaps we have yet to reveal the 
true role of zebrafish TLR4. 
Our calprotectin work tells a similar story to that of TLR4: zebrafish do not have an ortholog of 
human calprotectin. Calprotectin plays several roles in the human defense against pathogens. At 
sites of inflammation and wounding, calprotectin chelates transition metal ions that are essential for 
microbial growth and thereby inhibits bacterial growth. Calprotectin can also amplify the immune 
response by activating TLR4 and other damage-sensing immune receptors. Our studies suggest that 
none of the zebrafish proteins that share homology with human calprotectin can serve either this 
antibacterial or proinflammatory role. Have zebrafish evolved alternative compensatory mechanisms 
to deal with infections? What new immune strategies can we learn from studying the zebrafish 
immune response in the absence of these proteins on which humans rely so heavily? 
Our work demonstrates that it is necessary to consider the long evolutionary divergence 
between human and zebrafish when extrapolating findings from model systems. 
  
REFERENCES CITED 
(1) Bäckhed, F.; Roswall, J.; Peng, Y.; Feng, Q.; Jia, H.; Kovatcheva-Datchary, P.; Li, Y.; 
Xia, Y.; Xie, H.; Zhong, H.; Khan, M. T.; Zhang, J.; Li, J.; Xiao, L.; Al-Aama, J.; Zhang, 
D.; Lee, Y. S.; Kotowska, D.; Colding, C.; Tremaroli, V.; Yin, Y.; Bergman, S.; Xu, X.; 
Madsen, L.; Kristiansen, K.; Dahlgren, J.; Wang, J. Dynamics and Stabilization of the 
Human Gut Microbiome during the First Year of Life. Cell Host & Microbe 2015, 17 (5), 
690–703. https://doi.org/10.1016/j.chom.2015.04.004. 
(2) Houghteling, P. D.; Walker, W. A. Why Is Initial Bacterial Colonization of the Intestine 
Important to Infants’ and Children’s Health? J. Pediatr. Gastroenterol. Nutr. 2015, 60 (3), 
294–307. https://doi.org/10.1097/MPG.0000000000000597. 
(3) Lopez, L. R.; Bleich, R. M.; Arthur, J. C. Microbiota Effects on Carcinogenesis: Initiation, 
Promotion, and Progression. Annu. Rev. Med. 2021, 72 (1), 243–261. 
https://doi.org/10.1146/annurev-med-080719-091604. 
(4) Michán‐Doña, A.; Vázquez‐Borrego, M. C.; Michán, C. Are There Any Completely 
Sterile Organs or Tissues in the Human Body? Is There Any Sacred Place? Microbial 
Biotechnology 2024, 17 (3), e14442. https://doi.org/10.1111/1751-7915.14442. 
(5) Pawelek, J. M.; Low, K. B.; Bermudes, D. Bacteria as Tumour-Targeting Vectors. The 
Lancet Oncology 2003, 4 (9), 548–556. https://doi.org/10.1016/S1470-2045(03)01194-X. 
(6) Limon, J. J.; Skalski, J. H.; Underhill, D. M. Commensal Fungi in Health and Disease. 
Cell Host & Microbe 2017, 22 (2), 156–165. https://doi.org/10.1016/j.chom.2017.07.002. 
(7) Sender, R.; Fuchs, S.; Milo, R. Are We Really Vastly Outnumbered? Revisiting the Ratio 
of Bacterial to Host Cells in Humans. Cell 2016, 164 (3), 337–340. 
https://doi.org/10.1016/j.cell.2016.01.013. 
(8) Luckey, T. D. Introduction to Intestinal Microecology. The American Journal of Clinical 
Nutrition 1972, 25 (12), 1292–1294. https://doi.org/10.1093/ajcn/25.12.1292. 
(9) Ogunrinola, G. A.; Oyewale, J. O.; Oshamika, O. O.; Olasehinde, G. I. The Human 
Microbiome and Its Impacts on Health. International Journal of Microbiology 2020, 2020, 
1–7. https://doi.org/10.1155/2020/8045646. 
(10) McFall-Ngai, M. J. Unseen Forces: The Influence of Bacteria on Animal Development. 
Developmental Biology 2002, 242 (1), 1–14. https://doi.org/10.1006/dbio.2001.0522. 
(11) Bry, L.; Falk, P. G.; Midtvedt, T.; Gordon, J. I. A Model of Host-Microbial Interactions in 
an Open Mammalian Ecosystem. Science 1996, 273 (5280), 1380–1383. 
https://doi.org/10.1126/science.273.5280.1380. 
(12) Umesaki, Y.; Setoyama, H.; Matsumoto, S.; Imaoka, A.; Itoh, K. Differential Roles of 
Segmented Filamentous Bacteria and Clostridia in Development of the Intestinal Immune 
System. Infect Immun 1999, 67 (7), 3504–3511. https://doi.org/10.1128/IAI.67.7.3504-3511.1999. 
(13) Stappenbeck, T. S.; Hooper, L. V.; Gordon, J. I. Developmental Regulation of Intestinal 
Angiogenesis by Indigenous Microbes via Paneth Cells. Proc. Natl. Acad. Sci. U.S.A. 
2002, 99 (24), 15451–15455. https://doi.org/10.1073/pnas.202604299. 
(14) Bates, J. M.; Mittge, E.; Kuhlman, J.; Baden, K. N.; Cheesman, S. E.; Guillemin, K. 
Distinct Signals from the Microbiota Promote Different Aspects of Zebrafish Gut 
Differentiation. Developmental Biology 2006, 297 (2), 374–386. 
https://doi.org/10.1016/j.ydbio.2006.05.006. 
(15) Whiteside, S. A.; Razvi, H.; Dave, S.; Reid, G.; Burton, J. P. The Microbiome of the 
Urinary Tract—a Role beyond Infection. Nat Rev Urol 2015, 12 (2), 81–90. 
https://doi.org/10.1038/nrurol.2014.361. 
(16) Li, C.; Stražar, M.; Mohamed, A. M. T.; Pacheco, J. A.; Walker, R. L.; Lebar, T.; Zhao, 
S.; Lockart, J.; Dame, A.; Thurimella, K.; Jeanfavre, S.; Brown, E. M.; Ang, Q. Y.; Berdy, 
B.; Sergio, D.; Invernizzi, R.; Tinoco, A.; Pishchany, G.; Vasan, R. S.; Balskus, E.; 
Huttenhower, C.; Vlamakis, H.; Clish, C.; Shaw, S. Y.; Plichta, D. R.; Xavier, R. J. Gut 
Microbiome and Metabolome Profiling in Framingham Heart Study Reveals Cholesterol-
Metabolizing Bacteria. Cell 2024, 187 (8), 1834–1852.e19. 
https://doi.org/10.1016/j.cell.2024.03.014. 
(17) Bäckhed, F.; Ley, R. E.; Sonnenburg, J. L.; Peterson, D. A.; Gordon, J. I. Host-Bacterial 
Mutualism in the Human Intestine. Science 2005, 307 (5717), 1915–1920. 
https://doi.org/10.1126/science.1104816. 
(18) Perry, F.; Arsenault, R. J. The Study of Microbe–Host Two-Way Communication. 
Microorganisms 2022, 10 (2), 408. https://doi.org/10.3390/microorganisms10020408. 
(19) Chaplin, D. D. 1. Overview of the Immune Response. Journal of Allergy and Clinical 
Immunology 2003, 111 (2), S442–S459. https://doi.org/10.1067/mai.2003.125. 
(20) Delneste, Y.; Beauvillain, C.; Jeannin, P. Immunité Naturelle: Structure et Fonction Des 
Toll-like Receptors. Med Sci (Paris) 2007, 23 (1), 67–74. 
https://doi.org/10.1051/medsci/200723167. 
(21) Akira, S.; Uematsu, S.; Takeuchi, O. Pathogen Recognition and Innate Immunity. Cell 
2006, 124 (4), 783–801. https://doi.org/10.1016/j.cell.2006.02.015. 
(22) Bertani, B.; Ruiz, N. Function and Biogenesis of Lipopolysaccharides. EcoSal Plus 2018, 
8 (1), 10.1128/ecosalplus.ESP-0001-2018. https://doi.org/10.1128/ecosalplus.esp-0001-2018. 
(23) Pfeiffer, R. Untersuchungen über das Choleragift. Zeitschr. f. Hygiene. 1892, 11 (1), 393–
412. https://doi.org/10.1007/BF02284303. 
(24) Singer, M.; Deutschman, C. S.; Seymour, C. W.; Shankar-Hari, M.; Annane, D.; Bauer, 
M.; Bellomo, R.; Bernard, G. R.; Chiche, J.-D.; Coopersmith, C. M.; Hotchkiss, R. S.; 
Levy, M. M.; Marshall, J. C.; Martin, G. S.; Opal, S. M.; Rubenfeld, G. D.; Van Der Poll, 
T.; Vincent, J.-L.; Angus, D. C. The Third International Consensus Definitions for Sepsis 
and Septic Shock (Sepsis-3). JAMA 2016, 315 (8), 801. 
https://doi.org/10.1001/jama.2016.0287. 
(25) Poltorak, A.; He, X.; Smirnova, I.; Liu, M.-Y.; Huffel, C. V.; Du, X.; Birdwell, D.; Alejos, 
E.; Silva, M.; Galanos, C.; Freudenberg, M.; Ricciardi-Castagnoli, P.; Layton, B.; Beutler, 
B. Defective LPS Signaling in C3H/HeJ and C57BL/10ScCr Mice: Mutations in Tlr4 
Gene. Science 1998, 282 (5396), 2085–2088. 
https://doi.org/10.1126/science.282.5396.2085. 
(26) Qureshi, S. T.; Larivière, L.; Leveque, G.; Clermont, S.; Moore, K. J.; Gros, P.; Malo, D. 
Endotoxin-Tolerant Mice Have Mutations in Toll-like Receptor 4 (Tlr4). The Journal of 
Experimental Medicine 1999, 189 (4), 615–625. https://doi.org/10.1084/jem.189.4.615. 
(27) Wright, S. D. Toll, A New Piece in the Puzzle of Innate Immunity. The Journal of 
Experimental Medicine 1999, 189 (4), 605–609. https://doi.org/10.1084/jem.189.4.605. 
(28) Chow, J. C.; Young, D. W.; Golenbock, D. T.; Christ, W. J.; Gusovsky, F. Toll-like 
Receptor-4 Mediates Lipopolysaccharide-Induced Signal Transduction. Journal of 
Biological Chemistry 1999, 274 (16), 10689–10692. 
https://doi.org/10.1074/jbc.274.16.10689. 
(29) Raetz, C. R. H.; Whitfield, C. Lipopolysaccharide Endotoxins. Annu. Rev. Biochem. 2002, 
71 (1), 635–700. https://doi.org/10.1146/annurev.biochem.71.110601.135414. 
(30) Cohen, J. The Immunopathogenesis of Sepsis. Nature 2002, 420 (6917), 885–891. 
https://doi.org/10.1038/nature01326. 
(31) Park, B. S.; Lee, J.-O. Recognition of Lipopolysaccharide Pattern by TLR4 Complexes. 
Exp Mol Med 2013, 45 (12), e66–e66. https://doi.org/10.1038/emm.2013.97. 
(32) Luderitz, O.; Galanos, C.; Lehmann, V.; Nurminen, M.; Rietschel, E. T.; Rosenfelder, G.; 
Simon, M.; Westphal, O. Lipid A: Chemical Structure and Biological Activity. Journal of 
Infectious Diseases 1973, 128 (Supplement 1), S17–S29. 
https://doi.org/10.1093/infdis/128.Supplement_1.S17. 
(33) Erridge, C.; Bennett-Guerrero, E.; Poxton, I. R. Structure and Function of 
Lipopolysaccharides. Microbes and Infection 2002, 4 (8), 837–851. 
https://doi.org/10.1016/S1286-4579(02)01604-0. 
(34) Takayama, K.; Qureshi, N.; Ribi, E.; Cantrell, J. L. Separation and Characterization of 
Toxic and Nontoxic Forms of Lipid A. Clinical Infectious Diseases 1984, 6 (4), 439–443. 
https://doi.org/10.1093/clinids/6.4.439. 
(35) Qureshi, N.; Takayama, K.; Kurtz, R. Diphosphoryl Lipid A Obtained from the Nontoxic 
Lipopolysaccharide of Rhodopseudomonas sphaeroides Is an Endotoxin Antagonist in 
Mice. Infect Immun 1991, 59 (1), 441–444. https://doi.org/10.1128/iai.59.1.441-444.1991. 
(36) Shimazu, R.; Akashi, S.; Ogata, H.; Nagai, Y.; Fukudome, K.; Miyake, K.; Kimoto, M. 
MD-2, a Molecule That Confers Lipopolysaccharide Responsiveness on Toll-like 
Receptor 4. The Journal of Experimental Medicine 1999, 189 (11), 1777–1782. 
https://doi.org/10.1084/jem.189.11.1777. 
(37) Visintin, A.; Halmen, K. A.; Latz, E.; Monks, B. G.; Golenbock, D. T. Pharmacological 
Inhibition of Endotoxin Responses Is Achieved by Targeting the TLR4 Coreceptor, MD-2. 
The Journal of Immunology 2005, 175 (10), 6465–6472. 
https://doi.org/10.4049/jimmunol.175.10.6465. 
(38) Coats, S. R.; Pham, T.-T. T.; Bainbridge, B. W.; Reife, R. A.; Darveau, R. P. MD-2 
Mediates the Ability of Tetra-Acylated and Penta-Acylated Lipopolysaccharides to 
Antagonize Escherichia coli Lipopolysaccharide at the TLR4 Signaling Complex. The 
Journal of Immunology 2005, 175 (7), 4490–4498. 
https://doi.org/10.4049/jimmunol.175.7.4490. 
(39) Teghanemt, A.; Zhang, D.; Levis, E. N.; Weiss, J. P.; Gioannini, T. L. Molecular Basis of 
Reduced Potency of Underacylated Endotoxins. The Journal of Immunology 2005, 175 
(7), 4669–4676. https://doi.org/10.4049/jimmunol.175.7.4669. 
(40) Saitoh, S.-i. Lipid A Antagonist, Lipid IVa, Is Distinct from Lipid A in Interaction with 
Toll-like Receptor 4 (TLR4)-MD-2 and Ligand-Induced TLR4 Oligomerization. 
International Immunology 2004, 16 (7), 961–969. https://doi.org/10.1093/intimm/dxh097. 
(41) Ohto, U.; Fukase, K.; Miyake, K.; Satow, Y. Crystal Structures of Human MD-2 and Its 
Complex with Antiendotoxic Lipid IVa. Science 2007, 316 (5831), 1632–1634. 
https://doi.org/10.1126/science.1139111. 
(42) Park, B. S.; Song, D. H.; Kim, H. M.; Choi, B.-S.; Lee, H.; Lee, J.-O. The Structural Basis 
of Lipopolysaccharide Recognition by the TLR4–MD-2 Complex. Nature 2009, 458 
(7242), 1191–1195. https://doi.org/10.1038/nature07830. 
(43) Anderson, J. A.; Loes, A. N.; Waddell, G. L.; Harms, M. J. Tracing the Evolution of 
Novel Features of Human Toll‐like Receptor 4. Protein Science 2019, pro.3644. 
https://doi.org/10.1002/pro.3644. 
(44) Oblak, A.; Jerala, R. The Molecular Mechanism of Species-Specific Recognition of 
Lipopolysaccharides by the MD-2/TLR4 Receptor Complex. Molecular Immunology 
2015, 63 (2), 134–142. https://doi.org/10.1016/j.molimm.2014.06.034. 
(45) Ohto, U.; Fukase, K.; Miyake, K.; Shimizu, T. Structural Basis of Species-Specific 
Endotoxin Sensing by Innate Immune Receptor TLR4/MD-2. Proc. Natl. Acad. Sci. 
U.S.A. 2012, 109 (19), 7421–7426. https://doi.org/10.1073/pnas.1201193109. 
(46) Kim, H. M.; Park, B. S.; Kim, J.-I.; Kim, S. E.; Lee, J.; Oh, S. C.; Enkhbayar, P.; 
Matsushima, N.; Lee, H.; Yoo, O. J.; Lee, J.-O. Crystal Structure of the TLR4-MD-2 
Complex with Bound Endotoxin Antagonist Eritoran. Cell 2007, 130 (5), 906–917. 
https://doi.org/10.1016/j.cell.2007.08.002. 
(47) d’Hennezel, E.; Abubucker, S.; Murphy, L. O.; Cullen, T. W. Total Lipopolysaccharide 
from the Human Gut Microbiome Silences Toll-Like Receptor Signaling. mSystems 2017, 
2 (6). https://doi.org/10.1128/mSystems.00046-17. 
(48) Vatanen, T.; Kostic, A. D.; d’Hennezel, E.; Siljander, H.; Franzosa, E. A.; Yassour, M.; 
Kolde, R.; Vlamakis, H.; Arthur, T. D.; Hämäläinen, A.-M.; Peet, A.; Tillmann, V.; Uibo, 
R.; Mokurov, S.; Dorshakova, N.; Ilonen, J.; Virtanen, S. M.; Szabo, S. J.; Porter, J. A.; 
Lähdesmäki, H.; Huttenhower, C.; Gevers, D.; Cullen, T. W.; Knip, M.; Xavier, R. J. 
Variation in Microbiome LPS Immunogenicity Contributes to Autoimmunity in Humans. 
Cell 2016, 165 (4), 842–853. https://doi.org/10.1016/j.cell.2016.04.007. 
(49) Curtis, M. A.; Percival, R. S.; Devine, D.; Darveau, R. P.; Coats, S. R.; Rangarajan, M.; 
Tarelli, E.; Marsh, P. D. Temperature-Dependent Modulation of Porphyromonas 
gingivalis Lipid A Structure and Interaction with the Innate Host Defenses. Infect Immun 
2011, 79 (3), 1187–1193. https://doi.org/10.1128/IAI.00900-10. 
(50) Tan, Y.; Zanoni, I.; Cullen, T. W.; Goodman, A. L.; Kagan, J. C. Mechanisms of Toll-like 
Receptor 4 Endocytosis Reveal a Common Immune-Evasion Strategy Used by Pathogenic 
and Commensal Bacteria. Immunity 2015, 43 (5), 909–922. 
https://doi.org/10.1016/j.immuni.2015.10.008. 
(51) Montminy, S. W.; Khan, N.; McGrath, S.; Walkowicz, M. J.; Sharp, F.; Conlon, J. E.; 
Fukase, K.; Kusumoto, S.; Sweet, C.; Miyake, K.; Akira, S.; Cotter, R. J.; Goguen, J. D.; 
Lien, E. Virulence Factors of Yersinia pestis Are Overcome by a Strong 
Lipopolysaccharide Response. Nat Immunol 2006, 7 (10), 1066–1073. 
https://doi.org/10.1038/ni1386. 
(52) Coats, S. R.; Jones, J. W.; Do, C. T.; Braham, P. H.; Bainbridge, B. W.; To, T. T.; 
Goodlett, D. R.; Ernst, R. K.; Darveau, R. P. Human Toll-like Receptor 4 Responses to P. 
gingivalis Are Regulated by Lipid A 1- and 4′-Phosphatase Activities. Cellular 
Microbiology 2009, 11 (11), 1587–1599. https://doi.org/10.1111/j.1462-5822.2009.01349.x. 
(53) Rangarajan, M.; Aduse-Opoku, J.; Paramonov, N.; Hashim, A.; Bostanci, N.; Fraser, O. 
P.; Tarelli, E.; Curtis, M. A. Identification of a Second Lipopolysaccharide in 
Porphyromonas gingivalis W50. J Bacteriol 2008, 190 (8), 2920–2932. 
https://doi.org/10.1128/JB.01868-07. 
(54) Moran, A. P.; Lindner, B.; Walsh, E. J. Structural Characterization of the Lipid A 
Component of Helicobacter pylori Rough- and Smooth-Form Lipopolysaccharides. J 
Bacteriol 1997, 179 (20), 6453–6463. https://doi.org/10.1128/jb.179.20.6453-6463.1997. 
(55) Guo, L.; Lim, K. B.; Gunn, J. S.; Bainbridge, B.; Darveau, R. P.; Hackett, M.; Miller, S. I. 
Regulation of Lipid A Modifications by Salmonella Typhimurium Virulence Genes phoP-
phoQ. Science 1997, 276 (5310), 250–253. https://doi.org/10.1126/science.276.5310.250. 
(56) Paciello, I.; Silipo, A.; Lembo-Fazio, L.; Curcurù, L.; Zumsteg, A.; Noël, G.; Ciancarella, 
V.; Sturiale, L.; Molinaro, A.; Bernardini, M. L. Intracellular Shigella Remodels Its LPS 
to Dampen the Innate Immune Recognition and Evade Inflammasome Activation. Proc. 
Natl. Acad. Sci. U.S.A. 2013, 110 (46). https://doi.org/10.1073/pnas.1303641110. 
(57) Chen, F.; Zou, L.; Williams, B.; Chao, W. Targeting Toll-Like Receptors in Sepsis: From 
Bench to Clinical Trials. Antioxidants & Redox Signaling 2021, 35 (15), 1324–1339. 
https://doi.org/10.1089/ars.2021.0005. 
(58) Opal, S. M.; Laterre, P.-F.; Francois, B.; LaRosa, S. P.; Angus, D. C.; Mira, J.-P.; 
Wittebole, X.; Dugernier, T.; Perrotin, D.; Tidswell, M.; Jauregui, L.; Krell, K.; Pachl, J.; 
Takahashi, T.; Peckelsen, C.; Cordasco, E.; Chang, C.-S.; Oeyen, S.; Aikawa, N.; 
Maruyama, T.; Schein, R.; Kalil, A. C.; Van Nuffelen, M.; Lynn, M.; Rossignol, D. P.; 
Gogate, J.; Roberts, M. B.; Wheeler, J. L.; Vincent, J.-L.; for the ACCESS Study Group. 
Effect of Eritoran, an Antagonist of MD2-TLR4, on Mortality in Patients With Severe 
Sepsis: The ACCESS Randomized Trial. JAMA 2013, 309 (11), 1154. 
https://doi.org/10.1001/jama.2013.2194. 
(59) Amarante-Mendes, G. P.; Adjemian, S.; Branco, L. M.; Zanetti, L. C.; Weinlich, R.; 
Bortoluci, K. R. Pattern Recognition Receptors and the Host Cell Death Molecular 
Machinery. Front. Immunol. 2018, 9, 2379. https://doi.org/10.3389/fimmu.2018.02379. 
(60) Foell, D.; Wittkowski, H.; Vogl, T.; Roth, J. S100 Proteins Expressed in Phagocytes: A 
Novel Group of Damage-Associated Molecular Pattern Molecules. Journal of Leukocyte 
Biology 2007, 81 (1), 28–37. https://doi.org/10.1189/jlb.0306170. 
(61) Björk, P.; Björk, A.; Vogl, T.; Stenström, M.; Liberg, D.; Olsson, A.; Roth, J.; Ivars, F.; 
Leanderson, T. Identification of Human S100A9 as a Novel Target for Treatment of 
Autoimmune Disease via Binding to Quinoline-3-Carboxamides. PLoS Biol 2009, 7 (4), 
e1000097. https://doi.org/10.1371/journal.pbio.1000097. 
(62) Edgeworth, J.; Gorman, M.; Bennett, R.; Freemont, P.; Hogg, N. Identification of P8,14 as 
a Highly Abundant Heterodimeric Calcium Binding Protein Complex of Myeloid Cells. 
Journal of Biological Chemistry 1991, 266 (12), 7706–7713. 
https://doi.org/10.1016/S0021-9258(20)89506-4. 
(63) Vogl, T.; Roth, J.; Sorg, C.; Hillenkamp, F.; Strupat, K. Calcium-Induced Noncovalently 
Linked Tetramers of MRP8 and MRP14 Detected by Ultraviolet Matrix-Assisted Laser 
Desorption/Ionization Mass Spectrometry. J. Am. Soc. Mass Spectrom. 1999, 10 (11), 
1124–1130. https://doi.org/10.1016/S1044-0305(99)00085-9. 
(64) Strupat, K.; Rogniaux, H.; Van Dorsselaer, A.; Roth, J.; Vogl, T. Calcium-Induced 
Noncovalently Linked Tetramers of MRP8 and MRP14 Are Confirmed by Electrospray 
Ionization-Mass Analysis. J. Am. Soc. Mass Spectrom. 2000, 11 (9), 780–788. 
https://doi.org/10.1016/S1044-0305(00)00150-1. 
(65) Roth, J.; Burwinkel, F.; Van Den Bos, C.; Goebeler, M.; Vollmer, E.; Sorg, C. MRP8 and 
MRP14, S-100-like Proteins Associated with Myeloid Differentiation, Are Translocated to 
Plasma Membrane and Intermediate Filaments in a Calcium-Dependent Manner. Blood 
1993, 82 (6), 1875–1883. https://doi.org/10.1182/blood.V82.6.1875.1875. 
(66) Goebeler, M.; Roth, J.; Van Den Bos, C.; Ader, G.; Sorg, C. Increase of Calcium Levels in 
Epithelial Cells Induces Translocation of Calcium-Binding Proteins Migration Inhibitory 
Factor-Related Protein 8 (MRP8) and MRP14 to Keratin Intermediate Filaments. 
Biochemical Journal 1995, 309 (2), 419–424. https://doi.org/10.1042/bj3090419. 
(67) Kerkhoff, C.; Klempt, M.; Kaever, V.; Sorg, C. The Two Calcium-Binding Proteins, 
S100A8 and S100A9, Are Involved in the Metabolism of Arachidonic Acid in Human 
Neutrophils. Journal of Biological Chemistry 1999, 274 (46), 32672–32679. 
https://doi.org/10.1074/jbc.274.46.32672. 
(68) Vogl, T.; Ludwig, S.; Goebeler, M.; Strey, A.; Thorey, I. S.; Reichelt, R.; Foell, D.; Gerke, 
V.; Manitz, M. P.; Nacken, W.; Werner, S.; Sorg, C.; Roth, J. MRP8 and MRP14 Control 
Microtubule Reorganization during Transendothelial Migration of Phagocytes. Blood 
2004, 104 (13), 4260–4268. https://doi.org/10.1182/blood-2004-02-0446. 
(69) Lackmann, M.; Cornish, C. J.; Simpson, R. J.; Moritz, R. L.; Geczy, C. L. Purification and 
Structural Analysis of a Murine Chemotactic Cytokine (CP-10) with Sequence Homology 
to S100 Proteins. Journal of Biological Chemistry 1992, 267 (11), 7499–7504. 
https://doi.org/10.1016/S0021-9258(18)42545-8. 
(70) Passey, R. J.; Williams, E.; Lichanska, A. M.; Wells, C.; Hu, S.; Geczy, C. L.; Little, M. 
H.; Hume, D. A. A Null Mutation in the Inflammation-Associated S100 Protein S100A8 
Causes Early Resorption of the Mouse Embryo. The Journal of Immunology 1999, 163 
(4), 2209–2216. https://doi.org/10.4049/jimmunol.163.4.2209. 
(71) Ryckman, C.; Vandal, K.; Rouleau, P.; Talbot, M.; Tessier, P. A. Proinflammatory 
Activities of S100: Proteins S100A8, S100A9, and S100A8/A9 Induce Neutrophil 
Chemotaxis and Adhesion. The Journal of Immunology 2003, 170 (6), 3233–3242. 
https://doi.org/10.4049/jimmunol.170.6.3233. 
(72) Chen, B.; Miller, A. L.; Rebelatto, M.; Brewah, Y.; Rowe, D. C.; Clarke, L.; Czapiga, M.; 
Rosenthal, K.; Imamichi, T.; Chen, Y.; Chang, C.-S.; Chowdhury, P. S.; Naiman, B.; 
Wang, Y.; Yang, D.; Humbles, A. A.; Herbst, R.; Sims, G. P. S100A9 Induced 
Inflammatory Responses Are Mediated by Distinct Damage Associated Molecular 
Patterns (DAMP) Receptors In Vitro and In Vivo. PLoS ONE 2015, 10 (2), e0115828. 
https://doi.org/10.1371/journal.pone.0115828. 
(73) Riva, M.; Källberg, E.; Björk, P.; Hancz, D.; Vogl, T.; Roth, J.; Ivars, F.; Leanderson, T. 
Induction of Nuclear Factor‐κB Responses by the S100A9 Protein Is Toll‐like 
Receptor‐4‐dependent. Immunology 2012, 137 (2), 172–182. 
https://doi.org/10.1111/j.1365-2567.2012.03619.x. 
(74) Ehrchen, J. M.; Sunderkötter, C.; Foell, D.; Vogl, T.; Roth, J. The Endogenous Toll-like 
Receptor 4 Agonist S100A8/S100A9 (Calprotectin) as Innate Amplifier of Infection, 
Autoimmunity, and Cancer. Journal of Leukocyte Biology 2009, 86 (3), 557–566. 
https://doi.org/10.1189/jlb.1008647. 
(75) Nacken, W.; Kerkhoff, C. The Hetero‐oligomeric Complex of the S100A8/S100A9 
Protein Is Extremely Protease Resistant. FEBS Letters 2007, 581 (26), 5127–5130. 
https://doi.org/10.1016/j.febslet.2007.09.060. 
(76) Harman, J. L.; Loes, A. N.; Warren, G. D.; Heaphy, M. C.; Lampi, K. J.; Harms, M. J. 
Evolution of Multifunctionality through a Pleiotropic Substitution in the Innate Immune 
Protein S100A9. eLife 2020, 9, e54100. https://doi.org/10.7554/eLife.54100. 
(77) Kessenbrock, K.; Dau, T.; Jenne, D. E. Tailor-Made Inflammation: How Neutrophil 
Serine Proteases Modulate the Inflammatory Response. J Mol Med 2011, 89 (1), 23–28. 
https://doi.org/10.1007/s00109-010-0677-3. 
(78) Heutinck, K. M.; Ten Berge, I. J. M.; Hack, C. E.; Hamann, J.; Rowshani, A. T. Serine 
Proteases of the Human Immune System in Health and Disease. Molecular Immunology 
2010, 47 (11–12), 1943–1955. https://doi.org/10.1016/j.molimm.2010.04.020. 
(79) Janoff, A. Neutrophil Proteases in Inflammation. Annu. Rev. Med. 1972, 23 (1), 177–190. 
https://doi.org/10.1146/annurev.me.23.020172.001141. 
(80) Jerke, U.; Hernandez, D. P.; Beaudette, P.; Korkmaz, B.; Dittmar, G.; Kettritz, R. 
Neutrophil Serine Proteases Exert Proteolytic Activity on Endothelial Cells. Kidney 
International 2015, 88 (4), 764–775. https://doi.org/10.1038/ki.2015.159. 
(81) Vogl, T.; Stratis, A.; Wixler, V.; Völler, T.; Thurainayagam, S.; Jorch, S. K.; Zenker, S.; 
Dreiling, A.; Chakraborty, D.; Fröhling, M.; Paruzel, P.; Wehmeyer, C.; Hermann, S.; 
Papantonopoulou, O.; Geyer, C.; Loser, K.; Schäfers, M.; Ludwig, S.; Stoll, M.; 
Leanderson, T.; Schultze, J. L.; König, S.; Pap, T.; Roth, J. Autoinhibitory Regulation of 
S100A8/S100A9 Alarmin Activity Locally Restricts Sterile Inflammation. Journal of 
Clinical Investigation 2018, 128 (5), 1852–1866. https://doi.org/10.1172/JCI89867. 
(82) Stephan, J. R.; Nolan, E. M. Calcium-Induced Tetramerization and Zinc Chelation Shield 
Human Calprotectin from Degradation by Host and Bacterial Extracellular Proteases. 
Chem. Sci. 2016, 7 (3), 1962–1975. https://doi.org/10.1039/C5SC03287C. 
(83) Steinbakk, M.; Naess-Andresen, C.-F.; Fagerhol, M. K.; Lingaas, E.; Dale, I.; Brandtzaeg, 
P. Antimicrobial Actions of Calcium Binding Leucocyte L1 Protein, Calprotectin. The 
Lancet 1990, 336 (8718), 763–765. https://doi.org/10.1016/0140-6736(90)93237-J. 
(84) Corbin, B. D.; Seeley, E. H.; Raab, A.; Feldmann, J.; Miller, M. R.; Torres, V. J.; 
Anderson, K. L.; Dattilo, B. M.; Dunman, P. M.; Gerads, R.; Caprioli, R. M.; Nacken, W.; 
Chazin, W. J.; Skaar, E. P. Metal Chelation and Inhibition of Bacterial Growth in Tissue 
Abscesses. Science 2008, 319 (5865), 962–965. https://doi.org/10.1126/science.1152449. 
(85) Kehl-Fie, T. E.; Chitayat, S.; Hood, M. I.; Damo, S.; Restrepo, N.; Garcia, C.; Munro, K. 
A.; Chazin, W. J.; Skaar, E. P. Nutrient Metal Sequestration by Calprotectin Inhibits 
Bacterial Superoxide Defense, Enhancing Neutrophil Killing of Staphylococcus aureus. 
Cell Host & Microbe 2011, 10 (2), 158–164. https://doi.org/10.1016/j.chom.2011.07.004. 
(86) Brophy, M. B.; Hayden, J. A.; Nolan, E. M. Calcium Ion Gradients Modulate the Zinc 
Affinity and Antibacterial Activity of Human Calprotectin. J. Am. Chem. Soc. 2012, 134 
(43), 18089–18100. https://doi.org/10.1021/ja307974e. 
(87) Damo, S. M.; Kehl-Fie, T. E.; Sugitani, N.; Holt, M. E.; Rathi, S.; Murphy, W. J.; Zhang, 
Y.; Betz, C.; Hench, L.; Fritz, G.; Skaar, E. P.; Chazin, W. J. Molecular Basis for 
Manganese Sequestration by Calprotectin and Roles in the Innate Immune Response to 
Invading Bacterial Pathogens. Proc. Natl. Acad. Sci. U.S.A. 2013, 110 (10), 3841–3846. 
https://doi.org/10.1073/pnas.1220341110. 
(88) Nakashige, T. G.; Zhang, B.; Krebs, C.; Nolan, E. M. Human Calprotectin Is an Iron-
Sequestering Host-Defense Protein. Nat Chem Biol 2015, 11 (10), 765–771. 
https://doi.org/10.1038/nchembio.1891. 
(89) Zygiel, E. M.; Nolan, E. M. Transition Metal Sequestration by the Host-Defense Protein 
Calprotectin. Annu. Rev. Biochem. 2018, 87 (1), 621–643. 
https://doi.org/10.1146/annurev-biochem-062917-012312. 
(90) Rosen, T.; Nolan, E. M. Metal Sequestration and Antimicrobial Activity of Human 
Calprotectin Are pH-Dependent. Biochemistry 2020, 59 (26), 2468–2478. 
https://doi.org/10.1021/acs.biochem.0c00359. 
(91) Carnazzo, V.; Redi, S.; Basile, V.; Natali, P.; Gulli, F.; Equitani, F.; Marino, M.; Basile, 
U. Calprotectin: Two Sides of the Same Coin. Rheumatology 2024, 63 (1), 26–33. 
https://doi.org/10.1093/rheumatology/kead405. 
(92) Romand, X.; Bernardy, C.; Nguyen, M. V. C.; Courtier, A.; Trocme, C.; Clapasson, M.; 
Paclet, M.-H.; Toussaint, B.; Gaudin, P.; Baillet, A. Systemic Calprotectin and Chronic 
Inflammatory Rheumatic Diseases. Joint Bone Spine 2019, 86 (6), 691–698. 
https://doi.org/10.1016/j.jbspin.2019.01.003. 
(93) Shabani, F.; Farasat, A.; Mahdavi, M.; Gheibi, N. Calprotectin (S100A8/S100A9): A Key 
Protein between Inflammation and Cancer. Inflamm. Res. 2018, 67 (10), 801–812. 
https://doi.org/10.1007/s00011-018-1173-4. 
(94) Ometto, F.; Friso, L.; Astorri, D.; Botsios, C.; Raffeiner, B.; Punzi, L.; Doria, A. 
Calprotectin in Rheumatic Diseases. Exp Biol Med (Maywood) 2017, 242 (8), 859–873. 
https://doi.org/10.1177/1535370216681551. 
(95) Lam, S. H.; Chua, H. L.; Gong, Z.; Lam, T. J.; Sin, Y. M. Development and Maturation of 
the Immune System in Zebrafish, Danio rerio: A Gene Expression Profiling, in Situ 
Hybridization and Immunological Study. Developmental & Comparative Immunology 
2004, 28 (1), 9–28. https://doi.org/10.1016/S0145-305X(03)00103-4. 
(96) Melancon, E.; Gomez De La Torre Canny, S.; Sichel, S.; Kelly, M.; Wiles, T. J.; Rawls, J. 
F.; Eisen, J. S.; Guillemin, K. Best Practices for Germ-Free Derivation and Gnotobiotic 
Zebrafish Husbandry. In Methods in Cell Biology; Elsevier, 2017; Vol. 138, pp 61–100. 
https://doi.org/10.1016/bs.mcb.2016.11.005. 
(97) Detrich, H. W.; Westerfield, M.; Zon, L. I. Preface. In Methods in Cell Biology; 
Elsevier, 2017; Vol. 138, pp xxiii–xxiv. https://doi.org/10.1016/S0091-679X(17)30010-9. 
(98) Meeker, N. D.; Trede, N. S. Immunology and Zebrafish: Spawning New Models of 
Human Disease. Developmental & Comparative Immunology 2008, 32 (7), 745–757. 
https://doi.org/10.1016/j.dci.2007.11.011. 
(99) Kumar, S.; Suleski, M.; Craig, J. M.; Kasprowicz, A. E.; Sanderford, M.; Li, M.; Stecher, 
G.; Hedges, S. B. TimeTree 5: An Expanded Resource for Species Divergence Times. 
Molecular Biology and Evolution 2022, 39 (8), msac174. 
https://doi.org/10.1093/molbev/msac174. 
(100) Howe, K.; Clark, M. D.; Torroja, C. F.; Torrance, J.; Berthelot, C.; Muffato, M.; Collins, 
J. E.; Humphray, S.; McLaren, K.; Matthews, L.; McLaren, S.; Sealy, I.; Caccamo, M.; 
Churcher, C.; Scott, C.; Barrett, J. C.; Koch, R.; Rauch, G.-J.; White, S.; Chow, W.; 
Kilian, B.; Quintais, L. T.; Guerra-Assunção, J. A.; Zhou, Y.; Gu, Y.; Yen, J.; Vogel, J.-
H.; Eyre, T.; Redmond, S.; Banerjee, R.; Chi, J.; Fu, B.; Langley, E.; Maguire, S. F.; 
Laird, G. K.; Lloyd, D.; Kenyon, E.; Donaldson, S.; Sehra, H.; Almeida-King, J.; 
Loveland, J.; Trevanion, S.; Jones, M.; Quail, M.; Willey, D.; Hunt, A.; Burton, J.; Sims, 
S.; McLay, K.; Plumb, B.; Davis, J.; Clee, C.; Oliver, K.; Clark, R.; Riddle, C.; Elliott, D.; 
Threadgold, G.; Harden, G.; Ware, D.; Begum, S.; Mortimore, B.; Kerry, G.; Heath, P.; 
Phillimore, B.; Tracey, A.; Corby, N.; Dunn, M.; Johnson, C.; Wood, J.; Clark, S.; Pelan, 
S.; Griffiths, G.; Smith, M.; Glithero, R.; Howden, P.; Barker, N.; Lloyd, C.; Stevens, C.; 
Harley, J.; Holt, K.; Panagiotidis, G.; Lovell, J.; Beasley, H.; Henderson, C.; Gordon, D.; 
Auger, K.; Wright, D.; Collins, J.; Raisen, C.; Dyer, L.; Leung, K.; Robertson, L.; 
Ambridge, K.; Leongamornlert, D.; McGuire, S.; Gilderthorp, R.; Griffiths, C.; 
Manthravadi, D.; Nichol, S.; Barker, G.; Whitehead, S.; Kay, M.; Brown, J.; Murnane, C.; 
Gray, E.; Humphries, M.; Sycamore, N.; Barker, D.; Saunders, D.; Wallis, J.; Babbage, 
A.; Hammond, S.; Mashreghi-Mohammadi, M.; Barr, L.; Martin, S.; Wray, P.; Ellington, 
A.; Matthews, N.; Ellwood, M.; Woodmansey, R.; Clark, G.; Cooper, J. D.; Tromans, A.; 
Grafham, D.; Skuce, C.; Pandian, R.; Andrews, R.; Harrison, E.; Kimberley, A.; Garnett, 
J.; Fosker, N.; Hall, R.; Garner, P.; Kelly, D.; Bird, C.; Palmer, S.; Gehring, I.; Berger, A.; 
Dooley, C. M.; Ersan-Ürün, Z.; Eser, C.; Geiger, H.; Geisler, M.; Karotki, L.; Kirn, A.; 
Konantz, J.; Konantz, M.; Oberländer, M.; Rudolph-Geiger, S.; Teucke, M.; Lanz, C.; 
Raddatz, G.; Osoegawa, K.; Zhu, B.; Rapp, A.; Widaa, S.; Langford, C.; Yang, F.; 
Schuster, S. C.; Carter, N. P.; Harrow, J.; Ning, Z.; Herrero, J.; Searle, S. M. J.; Enright, 
A.; Geisler, R.; Plasterk, R. H. A.; Lee, C.; Westerfield, M.; De Jong, P. J.; Zon, L. I.; 
Postlethwait, J. H.; Nüsslein-Volhard, C.; Hubbard, T. J. P.; Crollius, H. R.; Rogers, J.; 
Stemple, D. L. The Zebrafish Reference Genome Sequence and Its Relationship to the 
Human Genome. Nature 2013, 496 (7446), 498–503. https://doi.org/10.1038/nature12111. 
(101) Loes, A. N.; Hinman, M. N.; Farnsworth, D. R.; Miller, A. C.; Guillemin, K.; Harms, M. 
J. Identification and Characterization of Zebrafish Tlr4 Coreceptor Md-2. The Journal of 
Immunology 2021, 206 (5), 1046–1057. https://doi.org/10.4049/jimmunol.1901288. 
(102) Loes, A. N.; Bridgham, J. T.; Harms, M. J. Coevolution of the Toll-Like Receptor 4 
Complex with Calgranulins and Lipopolysaccharide. Front Immunol 2018, 9, 304. 
https://doi.org/10.3389/fimmu.2018.00304. 
(103) Yang, L.-L.; Wang, G.-Q.; Yang, L.-M.; Huang, Z.-B.; Zhang, W.-Q.; Yu, L.-Z. 
Endotoxin Molecule Lipopolysaccharide-Induced Zebrafish Inflammation Model: A 
Novel Screening Method for Anti-Inflammatory Drugs. Molecules 2014, 19 (2), 2390–
2409. https://doi.org/10.3390/molecules19022390. 
(104) Watzke, J.; Schirmer, K.; Scholz, S. Bacterial Lipopolysaccharides Induce Genes Involved 
in the Innate Immune Response in Embryos of the Zebrafish (Danio rerio). Fish & 
Shellfish Immunology 2007, 23 (4), 901–905. https://doi.org/10.1016/j.fsi.2007.03.004. 
(105) Novoa, B.; Bowman, T. V.; Zon, L.; Figueras, A. LPS Response and Tolerance in the 
Zebrafish (Danio rerio). Fish & Shellfish Immunology 2009, 26 (2), 326–331. 
https://doi.org/10.1016/j.fsi.2008.12.004. 
(106) Harms, M. J.; Thornton, J. W. Analyzing Protein Structure and Function Using Ancestral 
Gene Reconstruction. Current Opinion in Structural Biology 2010, 20 (3), 360–366. 
https://doi.org/10.1016/j.sbi.2010.03.005. 
(107) Farr, D.; Nag, D.; Chazin, W. J.; Harrison, S.; Thummel, R.; Luo, X.; Raychaudhuri, S.; 
Withey, J. H. Neutrophil-Associated Responses to Vibrio cholerae Infection in a Natural 
Host Model. Infect Immun 2022, 90 (3), e00466-21. https://doi.org/10.1128/iai.00466-21. 
(108) Nag, D.; Farr, D.; Raychaudhuri, S.; Withey, J. H. An Adult Zebrafish Model for 
Adherent-Invasive Escherichia coli Indicates Protection from AIEC Infection by Probiotic 
E. coli Nissle. iScience 2022, 25 (7), 104572. https://doi.org/10.1016/j.isci.2022.104572. 
(109) Janeway, C. A.; Medzhitov, R. Innate Immune Recognition. Annu. Rev. Immunol. 2002, 
20 (1), 197–216. https://doi.org/10.1146/annurev.immunol.20.083001.084359. 
(110) Pauling, L.; Zuckerkandl, E.; Henriksen, T.; Lövstad, R. Chemical Paleogenetics. 
Molecular “Restoration Studies” of Extinct Forms of Life. Acta Chem. Scand. 1963, 17 
suppl., 9–16. https://doi.org/10.3891/acta.chem.scand.17s-0009. 
(111) Spence, M. A.; Kaczmarski, J. A.; Saunders, J. W.; Jackson, C. J. Ancestral Sequence 
Reconstruction for Protein Engineers. Current Opinion in Structural Biology 2021, 69, 
131–141. https://doi.org/10.1016/j.sbi.2021.04.001. 
(112) Nicoll, C. R.; Bailleul, G.; Fiorentini, F.; Mascotti, M. L.; Fraaije, M. W.; Mattevi, A. 
Ancestral-Sequence Reconstruction Unveils the Structural Basis of Function in 
Mammalian FMOs. Nat Struct Mol Biol 2020, 27 (1), 14–24. 
https://doi.org/10.1038/s41594-019-0347-2. 
(113) Furukawa, R.; Toma, W.; Yamazaki, K.; Akanuma, S. Ancestral Sequence Reconstruction 
Produces Thermally Stable Enzymes with Mesophilic Enzyme-like Catalytic Properties. 
Sci Rep 2020, 10 (1), 15493. https://doi.org/10.1038/s41598-020-72418-4. 
(114) Zakas, P. M.; Brown, H. C.; Knight, K.; Meeks, S. L.; Spencer, H. T.; Gaucher, E. A.; 
Doering, C. B. Enhancing the Pharmaceutical Properties of Protein Drugs by Ancestral 
Sequence Reconstruction. Nat Biotechnol 2017, 35 (1), 35–37. 
https://doi.org/10.1038/nbt.3677. 
(115) Anderson, D. P.; Whitney, D. S.; Hanson-Smith, V.; Woznica, A.; Campodonico-Burnett, 
W.; Volkman, B. F.; King, N.; Thornton, J. W.; Prehoda, K. E. Evolution of an Ancient 
Protein Function Involved in Organized Multicellularity in Animals. eLife 2016, 5, 
e10147. https://doi.org/10.7554/eLife.10147. 
(116) Diez-Hermano, S.; Ganfornina, M. D.; Skerra, A.; Gutiérrez, G.; Sanchez, D. An 
Evolutionary Perspective of the Lipocalin Protein Family. Front. Physiol. 2021, 12, 
718983. https://doi.org/10.3389/fphys.2021.718983. 
(117) Mascotti, M. L. Resurrecting Enzymes by Ancestral Sequence Reconstruction. In Enzyme 
Engineering; Magnani, F., Marabelli, C., Paradisi, F., Eds.; Methods in Molecular 
Biology; Springer US: New York, NY, 2022; Vol. 2397, pp 111–136. 
https://doi.org/10.1007/978-1-0716-1826-4_7. 
(118) Merkl, R.; Sterner, R. Ancestral Protein Reconstruction: Techniques and Applications. 
Biological Chemistry 2016, 397 (1), 1–21. https://doi.org/10.1515/hsz-2015-0158. 
(119) Yang, Z. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Molecular Biology 
and Evolution 2007, 24 (8), 1586–1591. https://doi.org/10.1093/molbev/msm088. 
(120) Rees, J.; Cranston, K. Automated Assembly of a Reference Taxonomy for Phylogenetic 
Data Synthesis. BDJ 2017, 5, e12581. https://doi.org/10.3897/BDJ.5.e12581. 
(121) Vialle, R. A.; Tamuri, A. U.; Goldman, N. Alignment Modulates Ancestral Sequence 
Reconstruction Accuracy. Molecular Biology and Evolution 2018, 35 (7), 1783–1797. 
https://doi.org/10.1093/molbev/msy055. 
(122) Tan, G.; Muffato, M.; Ledergerber, C.; Herrero, J.; Goldman, N.; Gil, M.; Dessimoz, C. 
Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently 
Worsen Single-Gene Phylogenetic Inference. Syst Biol 2015, 64 (5), 778–791. 
https://doi.org/10.1093/sysbio/syv033. 
(123) Tumescheit, C.; Firth, A. E.; Brown, K. CIAlign: A Highly Customisable Command Line 
Tool to Clean, Interpret and Visualise Multiple Sequence Alignments. PeerJ 2022, 10, 
e12983. https://doi.org/10.7717/peerj.12983. 
(124) Catanach, T. A.; Sweet, A. D.; Nguyen, N. D.; Peery, R. M.; Debevec, A. H.; Thomer, A. 
K.; Owings, A. C.; Boyd, B. M.; Katz, A. D.; Soto-Adames, F. N.; Allen, J. M. Fully 
Automated Sequence Alignment Methods Are Comparable to, and Much Faster than, 
Traditional Methods in Large Data Sets: An Example with Hepatitis B Virus. PeerJ 2019, 
7, e6142. https://doi.org/10.7717/peerj.6142. 
(125) Morrison, D. A. Multiple Sequence Alignment for Phylogenetic Purposes. Aust. 
Systematic Bot. 2006, 19 (6), 479. https://doi.org/10.1071/SB06020. 
(126) Del Amparo, R.; Arenas, M. Consequences of Substitution Model Selection on Protein 
Ancestral Sequence Reconstruction. Molecular Biology and Evolution 2022, 39 (7), 
msac144. https://doi.org/10.1093/molbev/msac144. 
(127) Joy, J. B.; Liang, R. H.; McCloskey, R. M.; Nguyen, T.; Poon, A. F. Y. Ancestral 
Reconstruction. PLoS Comput Biol 2016, 12 (7), e1004763. 
https://doi.org/10.1371/journal.pcbi.1004763. 
(128) Morel, B.; Kozlov, A. M.; Stamatakis, A.; Szöllősi, G. J. GeneRax: A Tool for Species-
Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene 
Duplication, Transfer, and Loss. Molecular Biology and Evolution 2020, 37 (9), 2763–
2774. https://doi.org/10.1093/molbev/msaa141. 
(129) Groussin, M.; Hobbs, J. K.; Szöllősi, G. J.; Gribaldo, S.; Arcus, V. L.; Gouy, M. Toward 
More Accurate Ancestral Protein Genotype–Phenotype Reconstructions with the Use of 
Species Tree-Aware Gene Trees. Molecular Biology and Evolution 2015, 32 (1), 13–22. 
https://doi.org/10.1093/molbev/msu305. 
(130) Gogarten, J. P.; Doolittle, W. F.; Lawrence, J. G. Prokaryotic Evolution in Light of Gene 
Transfer. Molecular Biology and Evolution 2002, 19 (12), 2226–2238. 
https://doi.org/10.1093/oxfordjournals.molbev.a004046. 
(131) Parks, D. H.; Chuvochina, M.; Waite, D. W.; Rinke, C.; Skarshewski, A.; Chaumeil, P.-
A.; Hugenholtz, P. A Standardized Bacterial Taxonomy Based on Genome Phylogeny 
Substantially Revises the Tree of Life. Nat Biotechnol 2018, 36 (10), 996–1004. 
https://doi.org/10.1038/nbt.4229. 
(132) Kinene, T.; Wainaina, J.; Maina, S.; Boykin, L. M. Rooting Trees, Methods For. In 
Encyclopedia of Evolutionary Biology; Elsevier, 2016; pp 489–493. 
https://doi.org/10.1016/B978-0-12-800049-6.00215-8. 
(133) Yang, Z.; Kumar, S.; Nei, M. A New Method of Inference of Ancestral Nucleotide and 
Amino Acid Sequences. Genetics 1995, 141 (4), 1641–1650. 
https://doi.org/10.1093/genetics/141.4.1641. 
(134) Eick, G. N.; Bridgham, J. T.; Anderson, D. P.; Harms, M. J.; Thornton, J. W. Robustness 
of Reconstructed Ancestral Protein Functions to Statistical Uncertainty. Mol Biol Evol 
2016, msw223. https://doi.org/10.1093/molbev/msw223. 
(135) Akanuma, S.; Nakajima, Y.; Yokobori, S.; Kimura, M.; Nemoto, N.; Mase, T.; Miyazono, 
K.; Tanokura, M.; Yamagishi, A. Experimental Evidence for the Thermophilicity of 
Ancestral Life. Proc. Natl. Acad. Sci. U.S.A. 2013, 110 (27), 11067–11072. 
https://doi.org/10.1073/pnas.1308215110. 
(136) Bridgham, J. T.; Keay, J.; Ortlund, E. A.; Thornton, J. W. Vestigialization of an Allosteric 
Switch: Genetic and Structural Mechanisms for the Evolution of Constitutive Activity in a 
Steroid Hormone Receptor. PLoS Genet 2014, 10 (1), e1004058. 
https://doi.org/10.1371/journal.pgen.1004058. 
(137) McKeown, A. N.; Bridgham, J. T.; Anderson, D. W.; Murphy, M. N.; Ortlund, E. A.; 
Thornton, J. W. Evolution of DNA Specificity in a Transcription Factor Family Produced 
a New Gene Regulatory Module. Cell 2014, 159 (1), 58–68. 
https://doi.org/10.1016/j.cell.2014.09.003. 
(138) Wheeler, L. C.; Anderson, J. A.; Morrison, A. J.; Wong, C. E.; Harms, M. J. Conservation 
of Specificity in Two Low-Specificity Proteins. Biochemistry 2018, 57 (5), 684–695. 
https://doi.org/10.1021/acs.biochem.7b01086. 
(139) Edgar, R. C. Muscle5: High-Accuracy Alignment Ensembles Enable Unbiased 
Assessments of Sequence Homology and Phylogeny. Nat Commun 2022, 13 (1), 6968. 
https://doi.org/10.1038/s41467-022-34630-w. 
(140) Kozlov, A. M.; Darriba, D.; Flouri, T.; Morel, B.; Stamatakis, A. RAxML-NG: A Fast, 
Scalable and User-Friendly Tool for Maximum Likelihood Phylogenetic Inference. 
Bioinformatics 2019, 35 (21), 4453–4455. https://doi.org/10.1093/bioinformatics/btz305. 
(141) Ishikawa, S. A.; Zhukova, A.; Iwasaki, W.; Gascuel, O. A Fast Likelihood Method to 
Reconstruct and Visualize Ancestral Scenarios. Molecular Biology and Evolution 2019, 
36 (9), 2069–2085. https://doi.org/10.1093/molbev/msz131. 
(142) Huerta-Cepas, J.; Serra, F.; Bork, P. ETE 3: Reconstruction, Analysis, and Visualization 
of Phylogenomic Data. Mol Biol Evol 2016, 33 (6), 1635–1638. 
https://doi.org/10.1093/molbev/msw046. 
(143) Altschul, S. Gapped BLAST and PSI-BLAST: A New Generation of Protein Database 
Search Programs. Nucleic Acids Research 1997, 25 (17), 3389–3402. 
https://doi.org/10.1093/nar/25.17.3389. 
(144) Cock, P. J. A.; Antao, T.; Chang, J. T.; Chapman, B. A.; Cox, C. J.; Dalke, A.; Friedberg, 
I.; Hamelryck, T.; Kauff, F.; Wilczynski, B.; de Hoon, M. J. L. Biopython: Freely 
Available Python Tools for Computational Molecular Biology and Bioinformatics. 
Bioinformatics 2009, 25 (11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163. 
(145) McTavish, E. J.; Sánchez-Reyes, L. L.; Holder, M. T. OpenTree: A Python Package for
Accessing and Analyzing Data from the Open Tree of Life. Systematic Biology 2021, 70 
(6), 1295–1301. https://doi.org/10.1093/sysbio/syab033. 
(146) Eaton, D. A. R. Toytree: A Minimalist Tree Visualization and Manipulation Library for 
Python. Methods Ecol Evol 2020, 11 (1), 187–191. https://doi.org/10.1111/2041-
210X.13313. 
(147) Frith, M. C. How Sequence Alignment Scores Correspond to Probability Models. 
Bioinformatics 2019, btz576. https://doi.org/10.1093/bioinformatics/btz576. 
(148) Flouri, T.; Izquierdo-Carrasco, F.; Darriba, D.; Aberer, A. J.; Nguyen, L.-T.; Minh, B. Q.; 
Von Haeseler, A.; Stamatakis, A. The Phylogenetic Likelihood Library. Systematic 
Biology 2015, 64 (2), 356–362. https://doi.org/10.1093/sysbio/syu084. 
(149) Felsenstein, J. Confidence Limits on Phylogenies: An Approach Using the Bootstrap.
Evolution 1985, 39 (4), 783–791. https://doi.org/10.1111/j.1558-5646.1985.tb00420.x.
(150) Abascal, F.; Zardoya, R.; Posada, D. ProtTest: Selection of Best-Fit Models of Protein 
Evolution. Bioinformatics 2005, 21 (9), 2104–2105. 
https://doi.org/10.1093/bioinformatics/bti263. 
(151) Maddison, D. R.; Maddison, W. P. MacClade 4, 2000. 
http://ib.berkeley.edu/courses/ib200/readings/MacClade%204%20Manual.pdf. 
(152) Larsson, A. AliView: A Fast and Lightweight Alignment Viewer and Editor for Large 
Datasets. Bioinformatics 2014, 30 (22), 3276–3278. 
https://doi.org/10.1093/bioinformatics/btu531. 
(153) Waterhouse, A. M.; Procter, J. B.; Martin, D. M. A.; Clamp, M.; Barton, G. J. Jalview 
Version 2--a Multiple Sequence Alignment Editor and Analysis Workbench. 
Bioinformatics 2009, 25 (9), 1189–1191. https://doi.org/10.1093/bioinformatics/btp033. 
(154) Tamura, K.; Stecher, G.; Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis 
Version 11. Molecular Biology and Evolution 2021, 38 (7), 3022–3027. 
https://doi.org/10.1093/molbev/msab120. 
(155) Zheng, Y.; Zhang, L. Effect of Incomplete Lineage Sorting On Tree-Reconciliation-Based 
Inference of Gene Duplication. IEEE/ACM Trans. Comput. Biol. and Bioinf. 2014, 11 (3), 
477–485. https://doi.org/10.1109/TCBB.2013.2297913. 
(156) Fu, L.; Niu, B.; Zhu, Z.; Wu, S.; Li, W. CD-HIT: Accelerated for Clustering the next-
Generation Sequencing Data. Bioinformatics 2012, 28 (23), 3150–3152. 
https://doi.org/10.1093/bioinformatics/bts565. 
(157) Colless, D. H.; Wiley, E. O. Phylogenetics: The Theory and Practice of Phylogenetic 
Systematics. Systematic Zoology 1982, 31 (1), 100. https://doi.org/10.2307/2413420. 
(158) Sukumaran, J.; Holder, M. T. DendroPy: A Python Library for Phylogenetic Computing. 
Bioinformatics 2010, 26 (12), 1569–1571. https://doi.org/10.1093/bioinformatics/btq228. 
(159) Pattengale, N. D.; Alipour, M.; Bininda-Emonds, O. R. P.; Moret, B. M. E.; Stamatakis, 
A. How Many Bootstrap Replicates Are Necessary? Journal of Computational Biology 
2010, 17 (3), 337–354. https://doi.org/10.1089/cmb.2009.0179. 
(160) Rudd, K. E.; Johnson, S. C.; Agesa, K. M.; Shackelford, K. A.; Tsoi, D.; Kievlan, D. R.; 
Colombara, D. V.; Ikuta, K. S.; Kissoon, N.; Finfer, S.; Fleischmann-Struzek, C.; 
Machado, F. R.; Reinhart, K. K.; Rowan, K.; Seymour, C. W.; Watson, R. S.; West, T. E.; 
Marinho, F.; Hay, S. I.; Lozano, R.; Lopez, A. D.; Angus, D. C.; Murray, C. J. L.; 
Naghavi, M. Global, Regional, and National Sepsis Incidence and Mortality, 1990–2017: 
Analysis for the Global Burden of Disease Study. The Lancet 2020, 395 (10219), 200–
211. https://doi.org/10.1016/S0140-6736(19)32989-7. 
(161) Kim, H.-J.; Kim, H.; Lee, J.-H.; Hwangbo, C. Toll-like Receptor 4 (TLR4): New Insight 
Immune and Aging. Immun Ageing 2023, 20 (1), 67. https://doi.org/10.1186/s12979-023-
00383-3. 
(162) Yan, B.; Yu, X.; Cai, X.; Huang, X.; Xie, B.; Lian, D.; Chen, J.; Li, W.; Lin, Y.; Ye, J.; Li, 
J. A Review: The Significance of Toll-Like Receptors 2 and 4, and NF-κB Signaling in 
Endothelial Cells during Atherosclerosis. Front. Biosci. (Landmark Ed) 2024, 29 (4), 161. 
https://doi.org/10.31083/j.fbl2904161. 
(163) Nagai, Y.; Akashi, S.; Nagafuku, M.; Ogata, M.; Iwakura, Y.; Akira, S.; Kitamura, T.; 
Kosugi, A.; Kimoto, M.; Miyake, K. Essential Role of MD-2 in LPS Responsiveness and 
TLR4 Distribution. Nat Immunol 2002, 3 (7), 667–672. https://doi.org/10.1038/ni809. 
(164) Viriyakosol, S.; Tobias, P. S.; Kitchens, R. L.; Kirkland, T. N. MD-2 Binds to Bacterial 
Lipopolysaccharide. Journal of Biological Chemistry 2001, 276 (41), 38044–38051. 
https://doi.org/10.1074/jbc.M105228200. 
(165) Hailman, E.; Lichenstein, H. S.; Wurfel, M. M.; Miller, D. S.; Johnson, D. A.; Kelley, M.; 
Busse, L. A.; Zukowski, M. M.; Wright, S. D. Lipopolysaccharide (LPS)-Binding Protein 
Accelerates the Binding of LPS to CD14. The Journal of Experimental Medicine 1994, 179
(1), 269–277. https://doi.org/10.1084/jem.179.1.269. 
(166) Gioannini, T. L.; Weiss, J. P. Regulation of Interactions of Gram-Negative Bacterial 
Endotoxins with Mammalian Cells. Immunol Res 2007, 39 (1–3), 249–260. 
https://doi.org/10.1007/s12026-007-0069-0. 
(167) Prohinar, P.; Re, F.; Widstrom, R.; Zhang, D.; Teghanemt, A.; Weiss, J. P.; Gioannini, T. 
L. Specific High Affinity Interactions of Monomeric Endotoxin·Protein Complexes with 
Toll-like Receptor 4 Ectodomain. Journal of Biological Chemistry 2007, 282 (2), 1010–
1017. https://doi.org/10.1074/jbc.M609400200. 
(168) Ryu, J.-K.; Kim, S. J.; Rah, S.-H.; Kang, J. I.; Jung, H. E.; Lee, D.; Lee, H. K.; Lee, J.-O.; 
Park, B. S.; Yoon, T.-Y.; Kim, H. M. Reconstruction of LPS Transfer Cascade Reveals 
Structural Determinants within LBP, CD14, and TLR4-MD2 for Efficient LPS 
Recognition and Transfer. Immunity 2017, 46 (1), 38–50. 
https://doi.org/10.1016/j.immuni.2016.11.007. 
(169) Haziot, A.; Ferrero, E.; Köntgen, F.; Hijiya, N.; Yamamoto, S.; Silver, J.; Stewart, C. L.; 
Goyert, S. M. Resistance to Endotoxin Shock and Reduced Dissemination of Gram-
Negative Bacteria in CD14-Deficient Mice. Immunity 1996, 4 (4), 407–414. 
https://doi.org/10.1016/S1074-7613(00)80254-X. 
(170) Tan, Y.; Kagan, J. C. A Cross-Disciplinary Perspective on the Innate Immune Responses 
to Bacterial Lipopolysaccharide. Molecular Cell 2014, 54 (2), 212–223. 
https://doi.org/10.1016/j.molcel.2014.03.012. 
(171) O’Neill, L. A. J.; Bowie, A. G. The Family of Five: TIR-Domain-Containing Adaptors in 
Toll-like Receptor Signalling. Nat Rev Immunol 2007, 7 (5), 353–364. 
https://doi.org/10.1038/nri2079. 
(172) Watters, T. M.; Kenny, E. F.; O’Neill, L. A. J. Structure, Function and Regulation of the 
Toll/IL‐1 Receptor Adaptor Proteins. Immunol Cell Biol 2007, 85 (6), 411–419. 
https://doi.org/10.1038/sj.icb.7100095. 
(173) Ve, T.; Gay, N. J.; Mansell, A.; Kobe, B.; Kellie, S. Adaptors in Toll-Like Receptor
Signaling and Their Potential as Therapeutic Targets. Curr Drug Targets 2012, 13 (11), 1360–1374.
https://doi.org/10.2174/138945012803530260. 
(174) Mata-Haro, V.; Cekic, C.; Martin, M.; Chilton, P. M.; Casella, C. R.; Mitchell, T. C. The 
Vaccine Adjuvant Monophosphoryl Lipid A as a TRIF-Biased Agonist of TLR4. Science 
2007, 316 (5831), 1628–1632. https://doi.org/10.1126/science.1138963. 
(175) Li, Y.; Wang, Z.; Chen, J.; Ernst, R.; Wang, X. Influence of Lipid A Acylation Pattern on 
Membrane Permeability and Innate Immune Stimulation. Marine Drugs 2013, 11 (9), 
3197–3208. https://doi.org/10.3390/md11093197. 
(176) Needham, B. D.; Carroll, S. M.; Giles, D. K.; Georgiou, G.; Whiteley, M.; Trent, M. S. 
Modulating the Innate Immune Response by Combinatorial Engineering of Endotoxin. 
Proc. Natl. Acad. Sci. U.S.A. 2013, 110 (4), 1464–1469. 
https://doi.org/10.1073/pnas.1218080110. 
(177) Rietschel, E. T.; Kirikae, T.; Schade, F. U.; Mamat, U.; Schmidt, G.; Loppnow, H.; Ulmer, 
A. J.; Zähringer, U.; Seydel, U.; Di Padova, F.; Schreier, M.; Brade, H. Bacterial 
Endotoxin: Molecular Relationships of Structure to Activity and Function. FASEB J. 1994,
8 (2), 217–225. https://doi.org/10.1096/fasebj.8.2.8119492. 
(178) Scott, A. J.; Oyler, B. L.; Goodlett, D. R.; Ernst, R. K. Lipid A Structural Modifications in 
Extreme Conditions and Identification of Unique Modifying Enzymes to Define the Toll-
like Receptor 4 Structure-Activity Relationship. Biochimica et Biophysica Acta (BBA) - 
Molecular and Cell Biology of Lipids 2017, 1862 (11), 1439–1450. 
https://doi.org/10.1016/j.bbalip.2017.01.004. 
(179) Xie, Y.; Meijer, A. H.; Schaaf, M. J. M. Modeling Inflammation in Zebrafish for the 
Development of Anti-Inflammatory Drugs. Front. Cell Dev. Biol. 2021, 8, 620984. 
https://doi.org/10.3389/fcell.2020.620984. 
(180) Gauthier, A. E.; Chandler, C. E.; Poli, V.; Gardner, F. M.; Tekiau, A.; Smith, R.; Bonham, 
K. S.; Cordes, E. E.; Shank, T. M.; Zanoni, I.; Goodlett, D. R.; Biller, S. J.; Ernst, R. K.; 
Rotjan, R. D.; Kagan, J. C. Deep-Sea Microbes as Tools to Refine the Rules of Innate 
Immune Pattern Recognition. Sci. Immunol. 2021, 6 (57), eabe0531. 
https://doi.org/10.1126/sciimmunol.abe0531. 
(181) Chilton, P. M.; Embry, C. A.; Mitchell, T. C. Effects of Differences in Lipid A Structure 
on TLR4 Pro-Inflammatory Signaling and Inflammasome Activation. Front. Immun. 
2012, 3. https://doi.org/10.3389/fimmu.2012.00154. 
(182) Orlandi, K. N.; Phillips, S. R.; Sailer, Z. R.; Harman, J. L.; Harms, M. J. Topiary: Pruning 
the Manual Labor from Ancestral Sequence Reconstruction. Protein Science 2023, 32 (2), 
e4551. https://doi.org/10.1002/pro.4551. 
(183) Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; 
Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; Bridgland, A.; Meyer, C.; 
Kohl, S. A. A.; Ballard, A. J.; Cowie, A.; Romera-Paredes, B.; Nikolov, S.; Jain, R.; 
Adler, J.; Back, T.; Petersen, S.; Reiman, D.; Clancy, E.; Zielinski, M.; Steinegger, M.; 
Pacholska, M.; Berghammer, T.; Bodenstein, S.; Silver, D.; Vinyals, O.; Senior, A. W.; 
Kavukcuoglu, K.; Kohli, P.; Hassabis, D. Highly Accurate Protein Structure Prediction 
with AlphaFold. Nature 2021, 596 (7873), 583–589. https://doi.org/10.1038/s41586-021-
03819-2. 
(184) Evans, R.; O’Neill, M.; Pritzel, A.; Antropova, N.; Senior, A.; Green, T.; Žídek, A.; Bates, 
R.; Blackwell, S.; Yim, J.; Ronneberger, O.; Bodenstein, S.; Zielinski, M.; Bridgland, A.; 
Potapenko, A.; Cowie, A.; Tunyasuvunakool, K.; Jain, R.; Clancy, E.; Kohli, P.; Jumper, 
J.; Hassabis, D. Protein Complex Prediction with AlphaFold-Multimer. bioRxiv October 4, 2021.
https://doi.org/10.1101/2021.10.04.463034. 
(185) Mirdita, M.; Schütze, K.; Moriwaki, Y.; Heo, L.; Ovchinnikov, S.; Steinegger, M. 
ColabFold: Making Protein Folding Accessible to All. Nat Methods 2022, 19 (6), 679–
682. https://doi.org/10.1038/s41592-022-01488-1. 
(186) Bates, J. M.; Akerlund, J.; Mittge, E.; Guillemin, K. Intestinal Alkaline Phosphatase 
Detoxifies Lipopolysaccharide and Prevents Inflammation in Zebrafish in Response to the 
Gut Microbiota. Cell Host & Microbe 2007, 2 (6), 371–382. 
https://doi.org/10.1016/j.chom.2007.10.010. 
(187) Rice, T. W.; Wheeler, A. P.; Bernard, G. R.; Vincent, J.-L.; Angus, D. C.; Aikawa, N.; 
Demeyer, I.; Sainati, S.; Amlot, N.; Cao, C.; Ii, M.; Matsuda, H.; Mouri, K.; Cohen, J. A 
Randomized, Double-Blind, Placebo-Controlled Trial of TAK-242 for the Treatment of 
Severe Sepsis. Critical Care Medicine 2010, 38 (8), 1685–1694.
https://doi.org/10.1097/CCM.0b013e3181e7c5c9. 
(188) Ono, Y.; Maejima, Y.; Saito, M.; Sakamoto, K.; Horita, S.; Shimomura, K.; Inoue, S.; 
Kotani, J. TAK-242, a Specific Inhibitor of Toll-like Receptor 4 Signalling, Prevents 
Endotoxemia-Induced Skeletal Muscle Wasting in Mice. Sci Rep 2020, 10 (1), 694. 
https://doi.org/10.1038/s41598-020-57714-3. 
(189) Yang, L.; Jiménez, J. A.; Earley, A. M.; Hamlin, V.; Kwon, V.; Dixon, C. T.; Shiau, C. E. 
Drainage of Inflammatory Macromolecules from the Brain to Periphery Targets the Liver 
for Macrophage Infiltration. eLife 2020, 9, e58191. https://doi.org/10.7554/eLife.58191. 
(190) Matsunaga, N.; Tsuchimori, N.; Matsumoto, T.; Ii, M. TAK-242 (Resatorvid), a Small-
Molecule Inhibitor of Toll-Like Receptor (TLR) 4 Signaling, Binds Selectively to TLR4 
and Interferes with Interactions between TLR4 and Its Adaptor Molecules. Mol 
Pharmacol 2011, 79 (1), 34–41. https://doi.org/10.1124/mol.110.068064. 
(191) MacKenzie, S. A.; Roher, N.; Boltaña, S.; Goetz, F. W. Peptidoglycan, Not Endotoxin, Is 
the Key Mediator of Cytokine Gene Expression Induced in Rainbow Trout Macrophages 
by Crude LPS. Molecular Immunology 2010, 47 (7–8), 1450–1457.
https://doi.org/10.1016/j.molimm.2010.02.009. 
(192) Anderson, J. A.; Loes, A. N.; Waddell, G. L.; Harms, M. J. Tracing the Evolution of 
Novel Features of Human Toll‐like Receptor 4. Protein Science 2019, 28 (7), 1350–1358. 
https://doi.org/10.1002/pro.3644. 
(193) Ohto, U.; Miyake, K.; Shimizu, T. Crystal Structures of Mouse and Human RP105/MD-1 
Complexes Reveal Unique Dimer Organization of the Toll-Like Receptor Family. Journal 
of Molecular Biology 2011, 413 (4), 815–825. https://doi.org/10.1016/j.jmb.2011.09.020. 
(194) Ogata, H.; Su, I.; Miyake, K.; Nagai, Y.; Akashi, S.; Mecklenbräuker, I.; Rajewsky, K.; 
Kimoto, M.; Tarakhovsky, A. The Toll-like Receptor Protein Rp105 Regulates 
Lipopolysaccharide Signaling in B Cells. The Journal of Experimental Medicine 2000, 
192 (1), 23–30. https://doi.org/10.1084/jem.192.1.23. 
(195) Divanovic, S.; Trompette, A.; Atabani, S. F.; Madan, R.; Golenbock, D. T.; Visintin, A.; 
Finberg, R. W.; Tarakhovsky, A.; Vogel, S. N.; Belkaid, Y.; Kurt-Jones, E. A.; Karp, C. L. 
Inhibition of TLR-4/MD-2 Signaling by RP105/MD-1. J. Endotoxin Res. 2005, 11 (6), 
363–368. https://doi.org/10.1179/096805105X67300. 
(196) Oliveira-Nascimento, L.; Massari, P.; Wetzler, L. M. The Role of TLR2 in Infection and 
Immunity. Front. Immun. 2012, 3. https://doi.org/10.3389/fimmu.2012.00079. 
(197) Stafford, J. L.; Neumann, N. F.; Belosevic, M. Products of Proteolytic Cleavage of 
Transferrin Induce Nitric Oxide Response of Goldfish Macrophages. Developmental & 
Comparative Immunology 2001, 25 (2), 101–115. https://doi.org/10.1016/S0145-
305X(00)00048-3. 
(198) Haddad, G.; Belosevic, M. Transferrin-Derived Synthetic Peptide Induces Highly 
Conserved pro-Inflammatory Responses of Macrophages. Molecular Immunology 2009, 
46 (4), 576–586. https://doi.org/10.1016/j.molimm.2008.07.030. 
(199) Trites, M. J.; Barreda, D. R. Contributions of Transferrin to Acute Inflammation in the 
Goldfish, C. auratus. Developmental & Comparative Immunology 2017, 67, 300–309.
https://doi.org/10.1016/j.dci.2016.09.004. 
(200) Husebye, H.; Halaas, Ø.; Stenmark, H.; Tunheim, G.; Sandanger, Ø.; Bogen, B.; Brech, 
A.; Latz, E.; Espevik, T. Endocytic Pathways Regulate Toll-like Receptor 4 Signaling and 
Link Innate and Adaptive Immunity. EMBO J 2006, 25 (4), 683–692. 
https://doi.org/10.1038/sj.emboj.7600991. 
(201) Meyer, A.; Schartl, M. Gene and Genome Duplications in Vertebrates: The One-to-Four (-
to-Eight in Fish) Rule and the Evolution of Novel Gene Functions. Current Opinion in 
Cell Biology 1999, 11 (6), 699–704. https://doi.org/10.1016/S0955-0674(99)00039-3. 
(202) Sullivan, C.; Charette, J.; Catchen, J.; Lage, C. R.; Giasson, G.; Postlethwait, J. H.; 
Millard, P. J.; Kim, C. H. The Gene History of Zebrafish Tlr4a and Tlr4b Is Predictive of 
Their Divergent Functions. The Journal of Immunology 2009, 183 (9), 5896–5908. 
https://doi.org/10.4049/jimmunol.0803285. 
(203) Ohno, S. Evolution by Gene Duplication; Springer Berlin Heidelberg: Berlin, Heidelberg, 
1970. https://doi.org/10.1007/978-3-642-86659-3. 
(204) Kayagaki, N.; Warming, S.; Lamkanfi, M.; Walle, L. V.; Louie, S.; Dong, J.; Newton, K.; 
Qu, Y.; Liu, J.; Heldens, S.; Zhang, J.; Lee, W. P.; Roose-Girma, M.; Dixit, V. M. Non-
Canonical Inflammasome Activation Targets Caspase-11. Nature 2011, 479 (7371), 117–
121. https://doi.org/10.1038/nature10558. 
(205) Shi, J.; Zhao, Y.; Wang, Y.; Gao, W.; Ding, J.; Li, P.; Hu, L.; Shao, F. Inflammatory 
Caspases Are Innate Immune Receptors for Intracellular LPS. Nature 2014, 514 (7521), 
187–192. https://doi.org/10.1038/nature13683. 
(206) Hagar, J. A.; Powell, D. A.; Aachoui, Y.; Ernst, R. K.; Miao, E. A. Cytoplasmic LPS 
Activates Caspase-11: Implications in TLR4-Independent Endotoxic Shock. Science 2013, 
341 (6151), 1250–1253. https://doi.org/10.1126/science.1240988. 
(207) Yang, D.; Zheng, X.; Chen, S.; Wang, Z.; Xu, W.; Tan, J.; Hu, T.; Hou, M.; Wang, W.; 
Gu, Z.; Wang, Q.; Zhang, R.; Zhang, Y.; Liu, Q. Sensing of Cytosolic LPS through 
Caspy2 Pyrin Domain Mediates Noncanonical Inflammasome Activation in Zebrafish. 
Nat Commun 2018, 9 (1), 3052. https://doi.org/10.1038/s41467-018-04984-1. 
(208) Teufel, F.; Almagro Armenteros, J. J.; Johansen, A. R.; Gíslason, M. H.; Pihl, S. I.; 
Tsirigos, K. D.; Winther, O.; Brunak, S.; Von Heijne, G.; Nielsen, H. SignalP 6.0 Predicts 
All Five Types of Signal Peptides Using Protein Language Models. Nat Biotechnol 2022, 
40 (7), 1023–1025. https://doi.org/10.1038/s41587-021-01156-3. 
(209) Rojas, A.; Shiau, C. Brain-Localized and Intravenous Microinjections in the Larval 
Zebrafish to Assess Innate Immune Response. BIO-PROTOCOL 2021, 11 (7). 
https://doi.org/10.21769/BioProtoc.3978. 
(210) Wang, S.; Song, R.; Wang, Z.; Jing, Z.; Wang, S.; Ma, J. S100A8/A9 in Inflammation. 
Front. Immunol. 2018, 9, 1298. https://doi.org/10.3389/fimmu.2018.01298. 
(211) Leukert, N.; Vogl, T.; Strupat, K.; Reichelt, R.; Sorg, C.; Roth, J. Calcium-Dependent 
Tetramer Formation of S100A8 and S100A9 Is Essential for Biological Activity. Journal 
of Molecular Biology 2006, 359 (4), 961–972. https://doi.org/10.1016/j.jmb.2006.04.009. 
(212) Odink, K.; Cerletti, N.; Brüggen, J.; Clerc, R. G.; Tarcsay, L.; Zwadlo, G.; Gerhards, G.; 
Schlegel, R.; Sorg, C. Two Calcium-Binding Proteins in Infiltrate Macrophages of 
Rheumatoid Arthritis. Nature 1987, 330 (6143), 80–82. https://doi.org/10.1038/330080a0. 
(213) Urban, C. F.; Ermert, D.; Schmid, M.; Abu-Abed, U.; Goosmann, C.; Nacken, W.; 
Brinkmann, V.; Jungblut, P. R.; Zychlinsky, A. Neutrophil Extracellular Traps Contain 
Calprotectin, a Cytosolic Protein Complex Involved in Host Defense against Candida
albicans. PLoS Pathog 2009, 5 (10), e1000639.
https://doi.org/10.1371/journal.ppat.1000639. 
(214) Hayden, J. A.; Brophy, M. B.; Cunden, L. S.; Nolan, E. M. High-Affinity Manganese 
Coordination by Human Calprotectin Is Calcium-Dependent and Requires the Histidine-
Rich Site Formed at the Dimer Interface. J. Am. Chem. Soc. 2013, 135 (2), 775–787. 
https://doi.org/10.1021/ja3096416. 
(215) Gagnon, D. M.; Brophy, M. B.; Bowman, S. E. J.; Stich, T. A.; Drennan, C. L.; Britt, R. 
D.; Nolan, E. M. Manganese Binding Properties of Human Calprotectin under Conditions 
of High and Low Calcium: X-Ray Crystallographic and Advanced Electron Paramagnetic 
Resonance Spectroscopic Analysis. J. Am. Chem. Soc. 2015, 137 (8), 3004–3016. 
https://doi.org/10.1021/ja512204s. 
(216) Nakashige, T. G.; Stephan, J. R.; Cunden, L. S.; Brophy, M. B.; Wommack, A. J.; Keegan, 
B. C.; Shearer, J. M.; Nolan, E. M. The Hexahistidine Motif of Host-Defense Protein 
Human Calprotectin Contributes to Zinc Withholding and Its Functional Versatility. J. 
Am. Chem. Soc. 2016, 138 (37), 12243–12251. https://doi.org/10.1021/jacs.6b06845. 
(217) Clark, H. L.; Jhingran, A.; Sun, Y.; Vareechon, C.; De Jesus Carrion, S.; Skaar, E. P.; 
Chazin, W. J.; Calera, J. A.; Hohl, T. M.; Pearlman, E. Zinc and Manganese Chelation by 
Neutrophil S100A8/A9 (Calprotectin) Limits Extracellular Aspergillus fumigatus Hyphal
Growth and Corneal Infection. The Journal of Immunology 2016, 196 (1), 336–344. 
https://doi.org/10.4049/jimmunol.1502037. 
(218) Baker, T. M.; Nakashige, T. G.; Nolan, E. M.; Neidig, M. L. Magnetic Circular Dichroism 
Studies of Iron(II) Binding to Human Calprotectin. Chem. Sci. 2017, 8 (2), 1369–1377.
https://doi.org/10.1039/C6SC03487J. 
(219) Hadley, R. C.; Gagnon, D. M.; Brophy, M. B.; Gu, Y.; Nakashige, T. G.; Britt, R. D.; 
Nolan, E. M. Biochemical and Spectroscopic Observation of Mn(II) Sequestration from 
Bacterial Mn(II) Transport Machinery by Calprotectin. J. Am. Chem. Soc. 2018, 140 (1), 
110–113. https://doi.org/10.1021/jacs.7b11207. 
(220) Besold, A. N.; Gilston, B. A.; Radin, J. N.; Ramsoomair, C.; Culbertson, E. M.; Li, C. X.; 
Cormack, B. P.; Chazin, W. J.; Kehl-Fie, T. E.; Culotta, V. C. Role of Calprotectin in 
Withholding Zinc and Copper from Candida albicans. Infect Immun 2018, 86 (2),
e00779-17. https://doi.org/10.1128/IAI.00779-17.
(221) Manitz, M.-P.; Horst, B.; Seeliger, S.; Strey, A.; Skryabin, B. V.; Gunzer, M.; Frings, W.; 
Schünlau, F.; Roth, J.; Sorg, C.; Nacken, W. Loss of S100A9 (MRP14) Results in 
Reduced Interleukin-8-Induced CD11b Surface Expression, a Polarized Microfilament 
System, and Diminished Responsiveness to Chemoattractants In Vitro. Molecular and 
Cellular Biology 2003, 23 (3), 1034–1043. https://doi.org/10.1128/MCB.23.3.1034-
1043.2003. 
(222) Wang, C.; Iashchishyn, I. A.; Pansieri, J.; Nyström, S.; Klementieva, O.; Kara, J.; 
Horvath, I.; Moskalenko, R.; Rofougaran, R.; Gouras, G.; Kovacs, G. G.; Shankar, S. K.; 
Morozova-Roche, L. A. S100A9-Driven Amyloid-Neuroinflammatory Cascade in 
Traumatic Brain Injury as a Precursor State for Alzheimer’s Disease. Sci Rep 2018, 8 (1), 
12836. https://doi.org/10.1038/s41598-018-31141-x. 
(223) Grunwald, D. J.; Eisen, J. S. Headwaters of the Zebrafish — Emergence of a New Model 
Vertebrate. Nat Rev Genet 2002, 3 (9), 717–724. https://doi.org/10.1038/nrg892. 
(224) Willett, C. E.; Cortes, A.; Zuasti, A.; Zapata, A. G. Early Hematopoiesis and Developing 
Lymphoid Organs in the Zebrafish. Dev Dyn 1999, 214 (4), 323–336. 
https://doi.org/10.1002/(SICI)1097-0177(199904)214:4<323::AID-AJA5>3.0.CO;2-3. 
(225) Davidson, A. J.; Zon, L. I. The ‘Definitive’ (and ‘Primitive’) Guide to Zebrafish 
Hematopoiesis. Oncogene 2004, 23 (43), 7233–7246. 
https://doi.org/10.1038/sj.onc.1207943. 
(226) Trede, N. S.; Langenau, D. M.; Traver, D.; Look, A. T.; Zon, L. I. The Use of Zebrafish to 
Understand Immunity. Immunity 2004, 20 (4), 367–379. https://doi.org/10.1016/S1074-
7613(04)00084-6. 
(227) Lieschke, G. J.; Currie, P. D. Animal Models of Human Disease: Zebrafish Swim into 
View. Nat Rev Genet 2007, 8 (5), 353–367. https://doi.org/10.1038/nrg2091. 
(228) Chen, H.; Xu, C.; Jin, Q.; Liu, Z. S100 Protein Family in Human Cancer. Am J Cancer 
Res 2014, 4 (2), 89–115. 
(229) Marenholz, I.; Heizmann, C. W.; Fritz, G. S100 Proteins in Mouse and Man: From 
Evolution to Function and Pathology (Including an Update of the Nomenclature). 
Biochemical and Biophysical Research Communications 2004, 322 (4), 1111–1122. 
https://doi.org/10.1016/j.bbrc.2004.07.096. 
(230) Kraemer, A. M.; Saraiva, L. R.; Korsching, S. I. Structural and Functional Diversification 
in the Teleost S100 Family of Calcium-Binding Proteins. BMC Evol Biol 2008, 8 (1), 48. 
https://doi.org/10.1186/1471-2148-8-48. 
(231) Zhang, C.; Zhang, Q.; Wang, J.; Tian, J.; Song, Y.; Xie, H.; Chang, M.; Nie, P.; Gao, Q.; 
Zou, J. Transcriptomic Responses of S100 Family to Bacterial and Viral Infection in 
Zebrafish. Fish & Shellfish Immunology 2019, 94, 685–696. 
https://doi.org/10.1016/j.fsi.2019.09.051. 
(232) Wheeler, L. C.; Donor, M. T.; Prell, J. S.; Harms, M. J. Multiple Evolutionary Origins of 
Ubiquitous Cu²⁺ and Zn²⁺ Binding in the S100 Protein Family. PLoS ONE 2016, 11
(10), e0164740. https://doi.org/10.1371/journal.pone.0164740. 
(233) Bozzi, A. T.; Nolan, E. M. Avian MRP126 Restricts Microbial Growth through Ca(II)-
Dependent Zn(II) Sequestration. Biochemistry 2020, 59 (6), 802–817. 
https://doi.org/10.1021/acs.biochem.9b01012. 
(234) Farnsworth, D. R.; Saunders, L. M.; Miller, A. C. A Single-Cell Transcriptome Atlas for 
Zebrafish Development. Developmental Biology 2020, 459 (2), 100–108. 
https://doi.org/10.1016/j.ydbio.2019.11.008. 
(235) Hou, Y.; Lee, H. J.; Chen, Y.; Ge, J.; Osman, F. O. I.; McAdow, A. R.; Mokalled, M. H.; 
Johnson, S. L.; Zhao, G.; Wang, T. Cellular Diversity of the Regenerating Caudal Fin. Sci. 
Adv. 2020, 6 (33), eaba2084. https://doi.org/10.1126/sciadv.aba2084. 
(236) Bhattacharya, S.; Chazin, W. J. Calcium-Driven Changes in S100A11 Structure Revealed. 
Structure 2003, 11 (7), 738–740. https://doi.org/10.1016/S0969-2126(03)00132-1. 
(237) Santamaria-Kisiel, L.; Rintala-Dempsey, A. C.; Shaw, G. S. Calcium-Dependent and -
Independent Interactions of the S100 Protein Family. Biochemical Journal 2006, 396 (2), 
201–214. https://doi.org/10.1042/BJ20060195. 
(238) Harman, J. L.; Loes, A. N.; Warren, G. D.; Heaphy, M. C.; Lampi, K. J.; Harms, M. J. 
Evolution of Multifunctionality through a Pleiotropic Substitution in the Innate Immune 
Protein S100A9. eLife 2020, 9, e54100. https://doi.org/10.7554/eLife.54100. 
(239) Hadley, R. C.; Gu, Y.; Nolan, E. M. Initial Biochemical and Functional Evaluation of 
Murine Calprotectin Reveals Ca(II)-Dependence and Its Ability to Chelate Multiple 
Nutrient Transition Metal Ions. Biochemistry 2018, 57 (19), 2846–2856. 
https://doi.org/10.1021/acs.biochem.8b00309. 
(240) Spratt, D. E.; Barber, K. R.; Marlatt, N. M.; Ngo, V.; Macklin, J. A.; Xiao, Y.; 
Konermann, L.; Duennwald, M. L.; Shaw, G. S. A Subset of Calcium‐binding S100 
Proteins Show Preferential Heterodimerization. The FEBS Journal 2019, 286 (10), 1859–
1876. https://doi.org/10.1111/febs.14775. 
 