Harnessing Single Cell RNA Sequencing to Uncover Developmental Genetic Underpinnings of 

Syngnathid Fishes’ Derived Traits  

by  

Hope Meredith Healey 

  
A dissertation accepted and approved in partial fulfillment of the   

requirements for the degree of  

Doctor of Philosophy   

in Biology 

  
Dissertation Committee:  

Nadia Singh, Chair  

William Cresko, Advisor  

Matthew Barber, Core Member  

John Postlethwait, Core Member  

Kirstin Sterner, Institutional Representative  

 
University of Oregon  
 

Fall 2024 
 
 
2 
 

© 2024 Hope Meredith Healey 
This work is openly licensed via CC BY.  

  
3 
 

DISSERTATION ABSTRACT 
 
Hope Meredith Healey 
 
Doctor of Philosophy in Biology 
 
Fall 2024 
 
Title: Harnessing Single Cell RNA Sequencing to Uncover Developmental Genetic 

Underpinnings of Syngnathid Fishes’ Derived Traits  

 
Syngnathid fishes are extraordinary creatures with highly derived features. Notably, these 

fishes have elongated, toothless snouts, male pregnancy, and exoskeletons. In addition to their 

highly altered morphologies, syngnathids have lost highly conserved signaling genes (fgf3 and 

4), transcription factors (eve1), and tooth mineralization genes (scpp genes). Although these 

genomic changes are predicted to contribute to the development, or lack thereof, of syngnathid’s 

derived traits, it is unknown whether this is the case. To investigate developmental origins of 

syngnathid’s highly altered traits, I harnessed single cell RNA sequencing (scRNAseq) methods 

for my dissertation research. First, I identified limitations for completing scRNAseq analysis in 

non-traditional model organisms, then overcame these challenges using scISOseq to improve 

gene annotations. Then, I used these methods to study the unique craniofacial morphology 

(elongate head and loss of teeth) of syngnathids. I identified the cell types present and signaling 

genes expressed at the start of craniofacial elongation in Gulf pipefish. Additionally, I found 

unique changes in signaling gene expression patterns during early craniofacial development of 

Gulf pipefish. This thesis discovered local modifications (select Fgf, Wnt, and Bmp genes) in 

pipefish craniofacial development, but found no evidence for global gene network re-wiring. 

Overall, this study suggests that syngnathid’s highly derived heads evolved from local network 

re-wiring. 

 
This dissertation includes previously published and unpublished co-authored material. 

 
4 
 

ACKNOWLEDGMENTS 
 

I am extraordinarily lucky to have worked with and been supported by amazing people 

during my dissertation. My advisor, Dr. Bill Cresko, has been instrumental for my scientific 

growth and achievement; he has always supported my scientific goals and pushed me to grow as 

a researcher. My advancement would not have been possible without my role models, Dr. Susie 

Bassham and Dr. Emily Beck, who showed me how to be a strong, successful woman in science, 

provided scientific guidance, and extended opportunities to me.  I am also lucky to have worked 

with Dr. Clay Small, who has always provided key scientific input. Additionally, I thank Mark 

Currey and Tiffany Thornton (and many undergraduate assistants) for taking excellent care of all 

the fish used in this dissertation.  I was fortunate to ford through graduate school with other 

Cresko Lab graduate students: Sophia Frantz, Shannon Snyder, Carla Campos, and Amberly 

Buer. Additionally, I am grateful for all the undergraduates who supported my work: Starla 

Chambrose, Thomas Skates, Micah Woods, Vithika Goyal, and Hayden Penn. Lastly, I want to 

thank rotation students who have inspired and assisted with the work from these chapters: 

LyAndra Lujan, Clara Rehmann, and Aubrey Mayer. 

I am grateful for the support and guidance of my committee: Dr. Kirstin Sterner, Dr. 

Nadia Singh, Dr. Matt Barber, and Dr. John Postlethwait. Additionally, I am lucky to have had a 

second home in the Bioinformatics and Genomics Master’s Program, and thank Dr. Stacey 

Wagner, Dr. Leslie Coonrod, Pete Batzel, and Jason Sydes. I am also grateful for the BGMP 

cohort from 2019-2020, especially Wyatt Eng, who grew with me as bioinformaticians. I also 

want to thank our wonderful collaborators from University of Idaho, Dr. Adam Jones and Dr. 

Balan Ramesh, for truly making this work possible. I am grateful for additional collaborators and 

scientific mentors including Dr. Thomas Desvignes, Dr. Angel Amores, Dr. Martin Stervander, 

Dr. Ingo Braasch, Dr. Andrew Thompson, and Dr. Emily Rose. 

I would further like to thank my community here in the Institute of Ecology and 

Evolution and the Department of Biology. I am grateful for support from the Genetics Training 

Program, the GrEBES student group, and zebrafish groupie. But most importantly, I am grateful 

for my IE2 cohort: Victoria Caudill, Gabrielle Coffing, Aidan Short, and Caitlin Smith. I am also 

thankful for crucial administrative support from Sara Nash, Arlene Deyo, and Leah Frazier.  


5 
 

Finally, I am thankful for Women in Graduate Sciences, which brought together a supportive 

community ready to make positive impacts on our local community.  

I am grateful for the funding that made this research possible. Specifically, funding from 

the Genetics Training Program (NIH	T32GM007413),	F31	funding	from	NIDCR	

(5F31DE032559-02),	University	of	Oregon	Research	Excellence	funds	(Bill	Cresko),	and	a	

National	Science	Foundation	Grant	(OPP-2015301;	Bill	Cresko,	Susie Bassham, and Clay 

Small). 

Finally, I want to thank all my loved ones for supporting and making this dream possible.  

  
6 
 

DEDICATION 
 

For my grandfather, who gave me a love for learning  
 
 
7 
 

TABLE OF CONTENTS 

Chapter Page 
 
 
I. INTRODUCTION ....................................................................................................  14 

 Introduction ...........................................................................................................  17 

 References .............................................................................................................  17 

II. SINGLE-CELL ISO-SEQUENCING ENABLES RAPID GENOME ANNOTATION FOR 

SCRNASEQ ANALYSIS ...........................................................................................  18 

 Introduction ...........................................................................................................  18 

 Materials and Methods ..........................................................................................  19 

 Tissue Dissociation to Generate a Pool of Single Cells ..................................  19 

 Library Preparation ..........................................................................................  20 

 PacBio Data Processing ...................................................................................  20 

 Annotation Analysis ........................................................................................  22 

 ScRNAseq Analysis ........................................................................................  22 

 Results and Discussion ..........................................................................................  23 

 ScISOr-Seq Captured Novel Isoforms and Improved 3’ UTR ........................  23 

 ScISOr-Seq Improved the Number of Reads Retained for ScRNAseq ...........  25 

 ScISOr-Seq Enhances Biological Interpretation of Cell Clusters ...................  28 

 Conclusion .............................................................................................................  30  

 References .............................................................................................................  32 

 Bridge ....................................................................................................................  36 

 
8 
 

Chapter Page 
 
 
III. SINGLE CELL RNA SEQUENCING PROVIDES CLUES FOR THE DEVELOPMENTAL 

GENETIC BASIS OF SYNGNATHID FISHES EVOLUTIONARY ADAPTATIONS 

 .....................................................................................................................................  37 

 Introduction ...........................................................................................................  37 

 Materials and Methods ..........................................................................................  40 

 Single Cell RNA Sequencing Libraries Preparation .......................................  40 

 Single Cell Atlas Construction ........................................................................  41 

 Single Cell Atlas Cluster Identification ...........................................................  42 

 Differentiation State Analysis .........................................................................  44 

 Single Cell KEGG Analysis ............................................................................  44 

 Single Cell Atlas Network Analysis ................................................................  44 

 In situ Hybridization ........................................................................................  45 

 Bone and Cartilage Staining ............................................................................  46 

 Gene Cluster Analysis .....................................................................................  46 

 Results ...................................................................................................................  47 

 Valuable cRNAseq Atlas for Studying Syngnathid Development ..................  47 

 Discovery of Cell Cluster Function and State Using KEGG Analysis ...........  49 

Commonalities of Cell Clusters, Unique Networks, and Elusive Cell Types Identified in 

Network Analysis ............................................................................................  51 

Conserved Signaling Pathways Are Active during Syngnathid Craniofacial  

Development.. ..................................................................................................  53 


9 
 

Gulf Pipefish Retain Tooth Development Genes but Likely Lack Onset of Tooth 

Development ....................................................................................................  55 

 Tooth and Skeletal Genes Are Expressed during Dermal Armor Development  58 

Epithelial Expression of Immune and Nutrient Processing Genes May Facilitate Embryo-

Paternal Interactions in the Brood Pouch ........................................................  59 

 Discussion ..............................................................................................................  61 

 Our atlas Represents a Novel Resource for Evo-Devo Research ....................  62 

 Conserved Pathways May Contribute to Derived Syngnathid Heads .............  62 

No Evidence of Primordia Suggests Changes in Early, Not Late, Tooth Development Are 

Likely at the Root of Evolutionary Tooth Loss ...............................................  64 

 Redeployment of the Bone Gene Network to Build Dermal Armor ...............  65 

 Signatures of Embryonic Interactions within the Novel Pouch Environment  65 

 Conclusion .............................................................................................................  67 

 References .............................................................................................................  68 

 Bridge ....................................................................................................................  81 

IV. REDUNDACY AND PATHWAY EVOLUTION IN SYNGNATHID FISH 

CRANIOFACIAL GENE NETWORKS ....................................................................  82 

 Introduction ...........................................................................................................  82 

 Materials and Methods ..........................................................................................  84 

       Production of ScRNAseq Sequencing Libraries .............................................  84 

       Production of ATACseq Sequencing Libraries ...............................................  85 

       ScRNAseq Library Initial Processing .............................................................  85 

       Ortholog Identification ....................................................................................  85 


10 
 

       ScRNAseq Atlas Construction ........................................................................  86 

       ScRNAseq Trajectory Analysis .......................................................................  87 

       ScRNAseq Differentially Expressed Gene Analysis .......................................  87 

       ScRNAseq Gene Network Analysis ................................................................  87 

       ScRNAseq Cross-Species Correlation Analysis .............................................  87 

       ATACseq Read Processing and Analysis ........................................................  88 

       Whole Genome Alignment and Identification of Conserved Elements ..........  88 

       In situ Hybridization ........................................................................................  89 

 Results ...................................................................................................................  89 

       Successful Integration of Gulf Pipefish, Stickleback, and Zebrafish Single Cell Dataset…

  ............................................................................................................................... 89 

 Cranial Neural Crest Pharyngeal Arch Core Gene Regulatory Network is Largely   

Conserved in Gulf Pipefish .............................................................................  92 

Gene Losses and Potential Novel Genes in Gulf Pipefish Pharyngeal Arch Cells Alter  

Gene Regulatory Networks .............................................................................  94 

       GENESPACE Analysis Identifies Candidates for Broader Syngnathid Evolution…

  ............................................................................................................................... 95 

Numerous Candidate Genes Identified from Global Analyses, but few are Promising for  

Craniofacial Evolution .....................................................................................  96 

       Gulf Pipefish has Novel Fgf Expression in Pharyngeal Arch Cells ................  96 

Potential Changes in Gene Sequence and Regulation may Lead to Gulf Pipefish      

Expression Differences ....................................................................................  98 

 Discussion ..............................................................................................................  100 


11 
 

Extensive Teleost Genome Duplication Paralog Evolution but no Evidence for Global 

Re-wiring in Pipefish CNC .............................................................................  100 

       Local Gene Regulatory Network Re-wiring through Co-option of Fgf23 in Pipefish…

  ............................................................................................................................... 101 

       Evolution of Redundancy in Fgf pathway may Enable Genome Evolution ...  102 

 References .............................................................................................................  104 

V. CONCLUSION ......................................................................................................  112 

APPENDICES .............................................................................................................  115 

 A. SUPPORTING INFORMATION FOR CHAPTER II .....................................  115 

 B. SUPPORTING INFORMATION FOR CHAPTER III ....................................  117 

 C. SUPPORTING INFORMATION FOR CHAPTER IV ....................................  198 


12 
 

LIST OF FIGURES 
 
Figure Page 
 
 
2.1 SQANTI3 Classification of ScISOr-Seq Data Revealed that the Majority of Isoforms were 

Previously Unannotated in the Stickleback Genome .................................................. 24 

2.2 ScISOr-Seq and Bulk Iso-Seq Captured a Limited Number of Annotated Transcripts, but 

Both Resulted in an Increase in Reads Mapping to the Transcriptome ....................... 25 

2.3 ScISOr-Seq Improved Gene Models by Extending the 3’UTRs Illustrated Here by the Two 

Models for Nkx2.3 ....................................................................................................... 26 

2.4 Annotation improvements using Iso-Seq (bulk Iso-Seq or ScISOr-Seq) data extended genes 

both 5’ and 3’, with 3’ expansions leading to greater numbers of reads counted. ...... 27 

2.5 Using the Updated Gene Models from Our Merged and Pooled Dataset, We Identified 

Expected Cell Types in 70hpf Stickleback ScRNAseq Data and Illustrated How Annotations 

Improvements Are Relevant for Analysis ................................................................... 29  

3.1 Gulf Pipefish Exemplify Syngnathid Derived Traits ............................................ 40 

3.2 Gulf Pipefish Single Cell Atlas Contains Cells from the Entire Embryo and Identifies Genetic 

Pathways Active in Different Cell Types .................................................................... 48 

3.3 Weighted Gene Network Analysis (WGCNA) Identifies Gene Modules that Define and Unite 

Cell Clusters ................................................................................................................ 50 

3.4 Conserved Cell Types and Gene Pathways Build Unique Faces of Syngnathids. 52 

3.5 Pipefish Do Not Possess Identifiable Tooth Primordium Cells, but Continue to Express Tooth 

Development Genes in Other Contexts ....................................................................... 57 

3.6 Tooth and Bone Development Genes Expressed during Exoskeleton Development  59 

3.7 Gene Expression Signatures Suggest Embryonic Interactions within Brood Pouch 

Environment ................................................................................................................ 60 

4.1 Two Hypotheses for Syngnathid Craniofacial Development and Evolution ........ 83 


13 
 

4.2 Integrated Atlas Encompasses Cells from the Entire Embryo .............................. 91 

4.3 Number of Cells per Cluster Varied by Species .................................................... 92 

4.4 Pharyngeal Arch Cells Express Highly Conserved Core Genetic Program .......... 94 

4.5 Novel Fgf and Wnt Pathway Expression Changes in Gulf Pipefish CNC Pharyngeal Arch 

Cells.  ........................................................................................................................... 98 

4.6 Potential Changes in Gene Sequence and Regulation may Lead to Gulf Pipefish Expression 

Differences .................................................................................................................. 99 

 
14 
 

CHAPTER I 

INTRODUCTION 

 
The beauty of life is one of evolution’s greatest mysteries. Solving life’s riddles may be 

impossible, but uncovering the underlying mechanisms that create the incredible is worth it.  In 

this thesis, I attempted to understand the extraordinary: the development and evolution of highly 

derived traits.  

Highly derived traits appear across the diversity of life and often allow species to adapt to 

new niches. Although the ecological and evolutionary consequences of these traits is 

occasionally clear, their developmental genetic underpinnings are often challenging to discern. In 

practice, finding these needles in the haystack requires comparisons to close outgroups, 

hypothesis driven sampling, and different modalities of data. New genome assemblies have 

recently enabled these types of analyses in the charismatic clade of fishes known as syngnathids. 

These popular fishes include seahorses, pipefishes, and seadragons. They are known for their 

derived traits including male pregnancy, dermal armor, elongated snouts, and jaws without teeth. 

However, the developmental basis of these traits is unknown. 

Syngnathids’ genomes have been extensively studied to identify candidates for their trait 

evolution, and numerous candidates have been uncovered through these approaches. Notably, 

syngnathids have lost fgf3 and fgf4, two deeply conserved signaling genes involved in 

craniofacial development (Small et al. 2022). The loss of fgf3 is particularly intriguing given that 

the absence of fgf3 in zebrafish is lethal (Herzog et al. 2004). Additionally, syngnathids have lost 

genes related to tooth development including eve1 and most scpp genes (Small et al. 2016; Lin et 

al. 2016; Zhang et al. 2020). Beyond these protein coding losses, conserved non-coding elements 

(including in dlx1a and near dlx2a) and miRNAs (including mir130 clusters and mir196b) are 

also missing in syngnathids (Small et al. 2015; Small et al. 2022). Connecting these genomic 

losses with syngnathid craniofacial evolution requires transcriptomic data to uncover whether 

these losses led to shifts in developmental networks. 

To probe developmental transcription patterns in syngnathid fishes, I used a sequencing 

method called single cell RNA sequencing (scRNAseq). This technique determines the RNA 

expression profile of individual cells from a sample. Using these data, cell identities can be 

fleshed out post-hoc. Therefore, using this approach in non-traditional systems, including 


15 
 

syngnathids, is powerful because expression patterns of cell types can be deduced without 

dissection or genetic modifications. These datasets can be used to determine the cell types 

present at different stages of development, expression patterns of signaling pathway genes, and 

detect genetic networks.  

In Chapter II, I identified a limitation of scRNAseq in non-traditional organisms and 

solved this problem using single cell ISO-sequencing (scISOseq). Although numerous 

developmental scRNAseq atlases exist for model systems, there are few atlases in non-model 

systems. Due to the 3’ bias of scRNAseq reads, many sequencing reads land in the 3’ UTR of 

gene models which are often poorly annotated in non-model organisms. I addressed this 

limitation using scISOseq, a sequencing method that used leftover cells from the scRNAseq 

library to produce full length RNA reads. Using this method, I lengthened 3’ UTR annotations in 

the stickleback genome assembly leading to a 26.1% improvement in the number of scRNAseq 

reads counted. These findings suggest that scRNAseq is feasible in non-model systems but 

requires careful planning to ensure proper read quantification. This chapter has been published 

and is co-authored with Dr. Susan Bassham and Dr. William Cresko (Healey et al. 2022).  

In Chapter III, I explored the developmental genetic basis of numerous syngnathid 

derived traits using scRNAseq. Although genomic candidates have been identified for their 

derived traits, there are few studies that examine how these changes relate to development. In 

this study, I developed a scRNAseq atlas that probed the elongate snout, loss of teeth, dermal 

armor, and male pregnancy (from the embryonic lens). Using Gulf pipefish embryos from early 

craniofacial elongation, I identified candidate genes (sfrp1a, bmp4, and prdm16) expressed in 

mesenchyme above the lengthening ethmoid plate in the face. Next, I searched for tooth 

primordial cells to determine whether tooth development initiates in syngnathids. Although I did 

not find these tooth primordial cells, expression of select tooth genes (dlx3b and scpp1) was 

observed in the developing dermal armor. Finally, I identified an enrichment of immune and 

metabolism genes in the embryo epidermis, potentially indicating nutrient uptake from the brood 

pouch. These findings suggest that syngnathid highly derived traits emerge from re-deployment 

of signaling pathways. This chapter has been published and is co-authored with Hayden Penn, 

Dr. Clay Small, Dr. Susan Bassham, Micah Woods, Vithika Goyal, and Dr. William Cresko 

(Healey et al. 2024). 


16 
 

In Chapter IV, I asked whether syngnathids have changes in the pharyngeal arch cells 

during early craniofacial development that could relate to their unique morphology. In laboratory 

and evolutionary models, changes in gene expression or numbers of cranial neural crest cells that 

build the pharyngeal arches can lead to changes in head shapes. Since syngnathids have an 

evolutionary knockout of fgf3, a gene essential during pharyngeal arch development, I probed the 

extent of genetic network changes in syngnathid pharyngeal arch cells. Using comparative 

scRNAseq, I identified that Gulf pipefish have largely conserved pharyngeal arch gene networks. 

Furthermore, I discovered that fgf22 from the fgf3 sub-family had evolved pharyngeal arch 

expression prior to the loss of fgf3 in syngnathids. These findings suggest that redundancy 

permitted the loss of fgf3 in syngnathids and, furthermore, there has not been extensive 

craniofacial gene regulatory network re-wiring in these fishes. This chapter is co-authored with 

Clara Rehmann, Dr. Susan Bassham, and Dr. William Cresko. 

Together, these chapters form the foundation for successful developmental scRNAseq 

atlases in non-traditional systems. Chapters III and IV contribute significantly to the syngnathid 

literature by providing the first transcriptomic insights into their unique development. 

 
17 
 

References 

Healey, Bassham, Cresko. 2022. Single-cell Iso-Sequencing enables rapid genome annotation for 

scRNAseq analysis. Genetics 220 (3): iyac017. 

Healey, Penn, Small, Bassham, Goyal, Woods, Cresko. 2024. Single Cell RNA Sequencing 

Provides Clues for the Developmental Genetic Basis of Synganthidae’s Evolutionary 

Adaptations. eLIFE. 

Herzog, W., Sonntag, C., von der Hardt, S., Roehl, H. H., Varga, Z. M., & Hammerschmidt, M. 

(2004). Fgf3 signaling from the ventral diencephalon is required for early specification and 

subsequent survival of the zebrafish adenohypophysis. Development, 131(15), 3681–3692.  

Lin, Q., Fan, S., Zhang, Y., Xu, M., Zhang, H., Yang, Y., Lee, A. P., Woltering, J. M., Ravi, V., 

Gunter, H. M., Luo, W., Gao, Z., Lim, Z. W., Qin, G., Schneider, R. F., Wang, X., Xiong, 

P., Li, G., Wang, K., … Venkatesh, B. (2016a). The seahorse genome and the evolution 

of its specialized morphology. Nature, 540(7633), 395–399.  

Small, C. M., Bassham, S., Catchen, J., Amores, A., Fuiten, A. M., Brown, R. S., Jones, A. G., & 

Cresko, W. A. (2016). The genome of the Gulf pipefish enables understanding of 

evolutionary innovations. Genome Biology, 17(1), 258. https://doi.org/10.1186/s13059-016-

1126-6 

Small, Clayton M., Healey, H. M., Currey, M. C., Beck, E. A., Catchen, J., Lin, A. S. P., Cresko, 

W. A., & Bassham, S. (2022). Leafy and weedy seadragon genomes connect genic and 

repetitive DNA features to the extravagant biology of syngnathid fishes. Proceedings of the 

National Academy of Sciences of the United States of America, 119(26), e2119602119.  

Zhang, Y., Ravi, V., Qin, G., Dai, H., Zhang, H., Han, F., Wang, X., & Liu, Y. (2020a). 

Comparative genomics reveal shared genomic changes in syngnathid fishes and 

signatures of genetic convergence with placental mammals ! Phylogenomics and 

evolutionary rate Transposable elements in syngnathids Expansion and contraction of 

gene families in t. National Science Review, 7(6), 964–977.  

 
18 
 

CHAPTER II 

SINGLE-CELL ISO-SEQUENCING ENABLES RAPID GENOME ANNOTATION FOR 

SCRNASEQ ANALYSIS 

 
From Healey, Bassham, Cresko. 2022. Single-cell Iso-Sequencing enables rapid genome 

annotation for scRNAseq analysis. Genetics 220 (3): iyac017.  

 
This chapter was published in Genetics. Susan Bassham and William Cresko are co-authors on 

this publication. The project originated from conversations between William Cresko and me. I 

prepared the sample for sequencing, performed the scRNAseq and scISOr-Seq analysis, and took 

the lead on writing the manuscript. Susan Bassham contributed to the analysis and made writing 

contributions. William Cresko oversaw the analysis, contributed to writing, and acquired 

funding. 

 
The full supplemental material for this publication can be found at 

https://doi.org/10.25386/genetics.16811614.v1 

 
Introduction 

Single cell RNA sequencing (scRNAseq) is a revolutionary technique in biology that 

provides expression information from tissues and embryos (Shapiro et al. 2013). By barcoding 

RNA from individual cells directly from dissociated samples, scRNAseq allows for post hoc 

analysis of cell types and can be used to ascertain novel cell populations, explore developmental 

trajectories, and define gene regulatory networks (Luecken and Theis 2019).  

To maximize the utility of scRNAseq datasets, however, 3’ UTRs must be annotated for 

several reasons. ScRNAseq captures transcripts through poly(A) tails leading to a 3’ bias in 

coverage (Hwang et al. 2018). Partially annotated genes may be represented in a dataset with 

lower read counts, leading to erroneous conclusions regarding their magnitude of expression 

across cell types. In addition, downstream scRNAseq analysis clusters cell types through the 

determination of a covariance structure across highly variable genes (Stuart et al. 2019). The 

systematic absence of such genes can lead to inferential errors in multivariate analyses and 

obscure biological reality.   


19 
 

The annotations of even intensively studied models, such as mice and zebrafish, continue 

to be improved (Gupta et al. 2018; Lawson et al. 2020). Most other organismal genome 

assemblies are even less well annotated. Facilitating a broader utility of scRNAseq requires 

more efficient methods for 3’ UTR annotation. Recently, full length, single-molecule isoform 

sequencing (Iso-Seq) has been used to improve genome annotations (Kuo et al. 2017; Beiki et 

al. 2019; Kuo et al. 2020; Ali et al. 2021). PacBio’s Iso-seq has been further adapted to use the 

10x Genomics platform for scRNA barcoding (ScISOr-Seq) to track cell type specific isoform 

expression (Gupta et al. 2018; Zheng et al. 2020).   

Here we show that ScISOr-Seq in the context of a scRNAseq experiment allows rapid 

3’UTR annotation in threespine stickleback fish (Gasterosteus aculeatus). This fish has long 

been a focus of study in behavior, ecology and evolution (Bell and Foster 1994; Colosimo et al. 

2004; Cresko et al. 2004a; Shapiro et al. 2004; Cresko et al. 2007; Hohenlohe et al. 2010; Reid 

et al. 2021), and is now a nascent system for biomedical research (Miller et al. 2007; Gardell et 

al. 2017; Small et al. 2017; Beck et al. 2020; Beck et al. 2021; Fuess et al. 2021). Although 

stickleback have a well assembled genome, its 3’ UTR annotations are incomplete which limits 

scRNAseq’s utility. We demonstrate that a single PacBio SMRT cell of ScISOr-Seq data is 

sufficient to significantly improve the stickleback annotations to an extent on par with zebrafish 

for the purpose of scRNAseq analysis at this stage. Our findings demonstrate that ScISOr-Seq 

will be a useful tool to efficiently improve genome annotations for scRNAseq in many 

organisms.  

 
Materials and Methods 

Tissue Dissociation to Generate a Pool of Single Cells  

We crossed a laboratory line of stickleback originally isolated from Cushman Slough 

(Oregon) and raised embryos to 70 hours post fertilization (hpf) at 20°C using standard 

procedures from the Cresko Laboratory Stickleback Facility (Cresko et al. 2004b). 70hpf 

stickleback embryos are developmentally at an equivalent stage to 24 hpf zebrafish (Cresko et al. 

2003). We euthanized 36 embryos in MS-222 following IACUC approved procedures then 

dechorionated and deyolked them at room temperature. We limited our embryo dissection to 20 

minutes based on (Farrell et al. 2018). Following protocols from (Bresciani et al. 2018), we 

dissociated the cells for 6 minutes in 0.25% trypsin in PBS at 30°C, pipetting up and down every 


20 
 

30 seconds, then, stopped the dissociation using 10% FBS DMEM and spun down cells at 400xG 

for 3min at 4°C. We resuspended cells in 1mL of 0.1% BSA PBS, centrifuged at 400xG for 3min 

at 4°C, and resuspended in 100ul 0.04% BSA in PBS. At room temperature, we filtered cells 

through a 40uM cell strainer (Thomas scientific 1181X52) then washed the original tube twice 

with 100ul of 0.04% BSA in PBS and poured over the same cell strainer.   

Library Preparation  

The scRNAseq and ScISOr-Seq libraries were prepared and sequenced by the University 

of Oregon Genomics and Cell Characterization Core Facility (https://gc3f.uoregon.edu). The 

dissociated cells were diluted to 800 cells/ul using 0.04% BSA in PBS to target 10,000 cells for 

the 10X Genomics Single Cell 3' Genome Expression (GEX) mRNA-Seq prep with v3.1 

NextGem chemistry. The single 10X preparation was split to create the scRNAseq and ScISOr-

Seq libraries. The scRNAseq sample was sequenced on 1/7th of a single S4 lane on a NovaSeq 

6000 platform (Illumina). For the ScISOr-Seq library, 400ng of the amplified 10X cDNA was 

used as input for the SMRTbell Express Template Prep Kit 2.0 (P/N 100-938-900) without 

reamplification. Sample specific barcode (ATATAGCGCGCGTGTG) was added using the 

Barcoded Adapter Kit 8B—OVERHANG (P/N 101-628-500). The ScISOr-Seq library was 

sequenced on a single SMRT Cell 8M on the PacBio Sequel II platform using the v4 primer, 

v2.1 polymerase, 1 hour binding, 30-hour movie, and 2-hour pre-extension time at a loading 

concentration of 100pM.   

PacBio Data Processing  

The University of Oregon Genomics and Cell Characterization Core Facility 

(https://gc3f.uoregon.edu) generated “circular consensus” reads for our ScISOr-Seq dataset (ccs -

j 39 --min-passes 3 --min-snr 2.5 --min-length 10 --max-length 50000 --min-rq 0.99; v6.6.0) and 

used lima (-j 39 –isoseq; v2.2.0) to remove a sample specific barcode 

(ATATAGCGCGCGTGTG) as a part of the PacBio SMRT Analysis software (Supplemental file 

6). After barcode clipping, we used a custom ScISOr-Seq processing script 

(scISOr_Seq_processing.py) that removed the sample primers (3p: 

CTACACGACGCTCTTCCGATCT; 5p: CCCATGTACTCTGCGTTGATACCACTGCTT), 

removed and saved cell and UMI barcodes, removed poly(A) tails, then filtered out duplicated 

reads. This script outputs reads that had the expected presence and orientation of primers and 


21 
 

polyA tails for downstream analysis. The script additionally outputs reads without the expected 

primers in a separate file; however, these other reads were not used for further analysis.  

We aligned reads to the stickleback genome (BROAD S1, 104.1 database version) using 

minimap2 (v2.7; parameters: -ax splice, -uf, -06,24, -B4; (Li 2018). We clustered for unique 

transcripts with collapse_isoforms_by_sam.py script (c=.99, i=.95; (Tseng 2021) from cDNA 

Cupcake (v27.0.0), classified transcripts with the sqanti3_qc.py script from SQANTI3 (v4.2), 

and refined transcripts with sqanti3_RulesFilter.py script (Tardaguila et al. 2018). We used the 

sqanti_classification.filtered_lite.gtf that resulted from the refining step (Supplemental file 2) to 

run Cell Ranger from 10x Genomics (v3.2.0) as described below.  

We downloaded the raw data from (Naftaly et al. 2021) from NCBI. This adult Iso-Seq 

dataset contains gonad, pronephros, brain, and liver reads from both sexes, a total of 16 SMRT 

cells. We processed reads from this dataset with the following steps: we generated circular 

consensus reads with ccs (--min-rq=.9; v6.0.0), clipped barcodes listed in Naftaly, Pau, and 

White (2021) with lima (--isoseq –dump-clips; v2.2.0), removed polyA tails with isoseq3 refine 

(--require-polyA; v3.4.0-0), and clustered reads with isoseq3 cluster (v3.4.0-0) from the PacBio 

SMRT Analysis software (Supplemental file 6). For the analysis with solely these data, we 

aligned reads to the stickleback genome (BROAD S1, 104.1 database version), collapsed 

transcripts with cDNA Cupcake (v28.0.0), and filtered and classified with SQANTI3 (v4.2) as 

done with the ScISOr-Seq data. Again, we used the sqanti_classification.filtered_lite.gtf from 

SQANTI3 (Supplemental file 3) to run Cell Ranger as described below. Due to reduced novel 

and increased annotated genes relative to (Naftaly et al. 2021), we additionally tried using 

minimap2-2.15 with -ax splice -uf -C5 -secondary=no to match their parameters; however, we 

observed negligible changes. We propose that differences in our analyses arise from different 

earlier processing steps, different versions of stickleback gene models, and an updated version of 

SQANTI.  

We pooled both sets of Iso-Seq reads and merged it with existing Ensembl annotations to 

create an improved version of the Ensembl annotations. Previously, we created a modified 

version of the Ensembl annotations where we extended the 3’ UTRs (Supplemental files 7,8) for 

several marker genes (tbx16, sox10, sox32, and eya1) and fgf/fgfr genes (fgfr1a, fgfr1b, fgfrl1a, 

FGFRL1, fgfr2, fgfr3, fgfr4, fgf3, fgf4, fgf16, fgf17, fgf6, fgf6a, fgf8a, and fgf8b). Because we 

lacked Iso-Seq reads for fgf4, we used this modified gtf for the merging (Supplemental file 8). 


22 
 

We completed the processing of the ScISOr-Seq and bulk Iso-Seq data separately as described 

above then combined the files prior to alignment. We aligned the combined files with minimap2, 

collapsed transcripts with cDNA cupcake, and refined and classified isoforms with SQANTI3 as 

explained above.   

We merged the sqanti_classification.filtered_lite.gtf resulting from SQANTI3 with the 

Ensembl annotation using TAMA (Kuo et al. 2020). To prepare the gtf files for TAMA Merge, 

we converted them to bed files using bedparse gtf2bed (--extraFields gene_id). Then, modified 

the output with awk to rearrange the columns of the bed file such that the gene id was separated 

from the transcript id by a semicolon (awk -v OFS=’\t’ ‘{print $1,$2,$3, $13 “;” $4, $5, 

$6,$7,$8,$9,$10,$11,$12}’). We merged with TAMA’s script tama_merge.py (-s ensembl -cds 

ensembl -d merge_dup). We applied the ensembl gene names to genes with corresponding IDs 

using a custom script (tama_associating_ensembl_ids_with_genes.py). For our final gtf file, we 

converted the output bed file back to a gtf with TAMA’s 

tama_convert_bed_gtf_ensembl_no_cds.py (Supplemental file 4). We also generated a gtf file 

specifically for scRNAseq analysis where mitochondrial genes started with “MT” by 

transforming the bed file with a custom script 

(tama_associating_ensembl_ids_with_genes_for_scRNAseq.py) and then converting the output 

back to a gtf in the same fashion as above (Supplemental file 5).  

Annotation analysis  

We assessed the Iso-Seq improvements to Ensembl gene models in two ways: 1) 

differences in starting and ending positions and 2) differences in reads captured. We compared 

the 5’ start and 3’ end of Ensembl and Iso-Seq generated gene models with a custom script 

written by H.M.H. (tama_observations_bed.py). Using the same custom script, we also 

compared the reads present in the first and last exons between Ensembl and Iso-Seq gene 

models.   

scRNAseq analysis  

We quantified gene counts using 10x Genomics Cell Ranger v3.0.2 (Supplemental file 1). 

We generated a reference using mkgtf and mkref, retaining protein coding and non-protein 

coding genes, and then aligned and counted reads using the stickleback genome (BROAD S1, 

104.1). We analyzed the counts using the Seurat package (v3.2.3; (Stuart et al. 2019) on R 

(v4.0.2). We retained all cells for the analysis. We normalized the counts using SCTransform and 


23 
 

regressed out the mitochondrial genes using the glmGamPoi method (Hafemeister and Satija 

2019). Based on the inflection point in the elbow plot of the PCA results, we chose 38 

dimensions for generating the UMAP and identifying clusters. We identified cluster identities 

using three marker gene approaches.   

First, we searched for marker genes identified in Farnsworth et al. (2020) for specific cell 

types found in a similar developmental stage from zebrafish (24hpf in zebrafish). Next, we 

identified markers with Seurat’s FindAllMarkers searching for genes that were positively 

upregulated in each cluster compared to the other identities, in a minimum of 25% of cells, and a 

log fold change threshold of 0.25. Finally, we identified markers with Seurat’s FindMarkers 

searching for genes that were positively upregulated in each cluster versus all the other cells. For 

the second and third approaches, we searched for the homologous gene in zebrafish and used 

expression data from zfin to predict which cell types expressed the gene. To quantify frizzled 

gene counts, we used FetchData to isolate raw counts for each gene and the number of cells 

expressing the gene. To determine the number of cell clusters each frizzled gene was expressed 

in, we saved a dotplot, subsetted for clusters where the percent of cells expressing the gene was 

greater than 10%, and then counted the number of clusters left for each gene.  

Results and Discussion 

ScISOr-Seq captured novel isoforms and improved 3’ UTRs  


24 
 

A ScISOr-Seq library was produced using dissociated cells from 70 hpf stickleback 

embryos. The ScISOr-Seq reads were classified with SQANTI3 (Tardaguila et al. 2018) using 

Ensembl gene models to describe isoforms based on how well they match splice variants (Figure 

2.1A). The most common structural category (45.15% of isoforms) was “novel not in catalog” 

containing at least one new splicing site relative to the existing annotation (Figure 2.1A and 

2.1B). 17.42% of collapsed isoforms were “full splice matches” (FSM) that contained splicing 

sites and exons present in the Ensembl annotation but allowed differences in 5’ or 3’ ends. 

Notably, only 3% of the unique isoforms matched the existing gene annotations, while 97% 

improved existing or added gene models. Confirming that 3’ UTRs were poorly annotated, 

46.2% FSM isoforms had alternative 3’ ends and 27.4% had alternative 3’ and 5’ ends relative to 

 
Figure 2.1: SQANTI3 classification of ScISOr-Seq data revealed that the majority of isoforms 
were previously unannotated in the stickleback genome. A) SQANTI3 categorizes isoforms based 
on their match to the reference, the major categories are shown here. B) the distribution of 
ScISOr-Seq isoforms in the major SQANTI3 classes illustrating that the bulk of isoforms are 
novel not in catalog. C) within the novel not in catalog category, isoforms are largely coming from 
cases where there is at least one new splicing donor or acceptor with some coming from cases of 
intron retention (for example, nkx2.3 isoforms from Figure 2). D) the majority of isoforms that are 
full splice matches have at least an alternative 3’ site while few are complete reference matches. 

A

B

C

Full Splice Match

Novel not in catalog

Annotated transcript

Incomplete splice match

Novel in catalog

D

Genic genomic

Antisense


25 
 

Ensembl gene models (Figure 2.1C). Therefore, the existing stickleback annotation - in addition 

to having incomplete 3’ UTRs - is also missing many splice variants. Our work indicates that 

additional ScISOr-Seq or bulk Iso-Seq is necessary to capture these variants as well as prune 

erroneous transcript models from the Ensembl annotation.  

ScISOr-Seq improved the number of reads retained for scRNAseq  

  We initially tested how well the existing stickleback gene models from Ensembl 

(BROAD S1, 104.1 database version) would capture scRNAseq reads. Strikingly, less than 50% 

of reads were retained for downstream scRNAseq analysis (Figure 2.2B; Supplemental file 1). 

Using such an incomplete annotation for scRNAseq would likely result in erroneous 

interpretation of gene expression patterns for genes lacking 3’ UTR annotations.   

Next, we used ScISOr-sequencing data to generate new gene models and tested how well 

these new models would capture 10X based Illumina scRNAseq reads (Supplemental file 2). 

Using the ScISOr-Seq dataset, 13,028 of 22,456 previously annotated genes and 2,942 novel 

genes were identified (Figure 2A). The ScISOr-Seq annotations lead to a notable 26.1% increase 

 
Figure 2.2: ScISOr-Seq and bulk Iso-Seq captured a limited number of annotated 
transcripts, but both resulted in an increase in reads mapping to the transcriptome. 
Ultimately, pooling this data and merging with the ensembl annotation resulted in the largest 
gains in reads mapped. A) the number of genes in common and unique to each Iso-Seq read 
category is compared to the ensembl genome. B) results from cell ranger runs using the 
different genome annotation files illustrating the overall improvements by using Iso-Seq data 
particularly the ScISOr-Seq gene models. 
 

26 
 

in reads retained from scRNAseq compared to the Ensembl gene models alone (Figure 2B; 

Supplemental file 1).  The alignment of scRNAseq reads with existing Ensembl and new 

ScISOr-Seq gene models illustrates that ScISOr-Seq models retain greater numbers of reads due 

to improved annotation of 3’ UTRs (Figure 2.3; Figure S2.1). This result compares favorably 

with current research in the much better studied zebrafish model. Farnsworth, Saunders, and 

Miller (2020) completed their zebrafish atlas with 80% of reads retained in 5dpf (days post 

fertilization) fish. After (Lawson et al. 2020) improved the zebrafish transcriptome using 

RNAseq data, they noted a 4% increase in reads retained that lead to an increase of 2,257 cells 

and 8 clusters using the same 5 dpf atlas (Farnsworth et al. 2020). The similarity in number of 

retained reads in our dataset with those published in zebrafish improves the validity of 

stickleback scRNAseq investigations.  

 
Figure 2.3: ScISOr-Seq improved gene models by extending the 3’ UTRs illustrated here by 
the two models for nkx2.3. scRNAseq reads are shown at the top in grey. The existing ensembl 
model is shown in dark blue. Notably, this model does not overlap with the majority of the 
scRNAseq reads. The ScISOr-Seq reads (in grey) are shown above the respective gene models 
(in red) that they generated. Colored lines on the ScISOr-Seq and scRNAseq reads indicate a 
different base pair in the read than the stickleback reference genome (BROAD S1, 104.1 
database version). These ScISOr-Seq gene models capture all of the scRNAseq reads. 
 

Ensembl

ScISOr-Seq

!"!#$"%%%&'( !")))"*%%&'(

+%,-..&/01234

A 


27 
 

Recently, (Naftaly et al. 2021) published a large bulk analysis of adult stickleback Iso-

Seq data from 16 PacBio SMRT cells and 4 tissue types (gonad, brain, pronephros, and liver). 

We reanalyzed this dataset to test whether we would see comparable improvements in our 

scRNAseq analysis (Supplemental file 3). Although we identified similar overall annotated and 

novel genes, the bulk Iso-Seq annotation from adult stickleback captured 11% fewer reads than 

our embryonic ScISOr-Seq annotation (Figure 2.2A and 2.2B; Supplemental file 1). These 

results are likely due to differential expression of transcripts between the adult and embryonic 

 
Figure 2.4: Annotation improvements using Iso-Seq (bulk Iso-Seq or ScISOr-Seq) data 
extended genes both 5’ and 3’, with 3’ expansions leading to greater numbers of reads 
counted. Comparing the median difference in starting or ending position between the 
Ensembl annotations and pooled Iso-seq annotations shows that bulk Iso-Seq and ScISOr-
Seq reads overall lead to earlier starting positions (median=26, a) and extended ending 
positions (median=94, b). These Iso-Seq differences increased the number of reads counted 
in the final exon of the gene (median=341, d); however, the alterations led to negligible 
changes of reads counted in the first exon (median=0, c), which highlights the 30 bias of 
scRNAseq reads. The x-axis is plotted on a pseudo log scale to account for the negative 
values. 

0

5000

10000

15000

20000

−1
0,0

00
,00

0
0

Final exon read count difference (Iso−Seq minus Ensembl) 
 on psuedo−log scale

 10
,00

0,0
00

−1
0,0

00
,00

0
0

 10
,00

0,0
00

−1
0,0

00
,00

0
   −

10
,00

0

    
 10

,00
0

0

2500

5000

7500

10000

-2,
00

0,0
00 0

2,0
00

,00
0

Ending position difference (Iso−Seq minus Ensembl) 
 on psuedo−log scale

-2,
00

0
 2,

00
0

A B

C D

0

2500

5000

7500

10000

0

Starting position difference (Iso−Seq minus Ensembl) 
 on psuedo−log scale

C
ou

nt
s

-2,
00

0,0
00

2,0
00

,00
0

-2,
00

0
  2

,00
0

!"#$"%&'()$*$("&
+$,-.-"/-

0*1.*$"%&'()$*$("
&+$,-.-"/-

0

5000

10000

15000

20000

-10
,00

0,0
00 0

  1
0,0

00
,00

0

First exon read count difference (Iso−Seq minus Ensembl) 
 on psuedo−log scale

C
ou

nt
s

    
 10
,00

0

    
 -1
0,0

00

2$"13&!4("&5-1#&6(7"*&
+$,-.-"/-

2$.)*&!4("&5-1#&6(7"*&
+$,-.-"/-


28 
 

samples and highlight that even for species with previous Iso-Seq libraries, additional 

sequencing may be needed for scRNAseq analysis in other developmental stages or tissue types. 

Because both our ScISOr-Seq and the bulk Iso-Seq gtf files improved the number of 

scRNAseq reads retained, we pooled them to create a comprehensive annotation and also 

merged this pooled dataset with the Ensembl annotations. We retained all transcripts and united 

them under a single gene model for scRNAseq analysis (Supplemental file 4). The Iso-Seq 

generated gene models extended 3’ ends of transcripts and increased reads counted in the final 

exon (Figure 2.4). This new annotation contained 6,992 genes unique to Ensembl, 15,464 genes 

containing transcripts from Ensembl and Iso-Seq, and 5,206 genes unique to the pooled Iso-Seq 

dataset (Figure 2.2A). Since reads are filtered during analysis, the 6,992 genes unique to 

Ensembl might still be expressed in 70hpf embryos or the adult tissues. For instance, 2,152 

Ensembl annotated genes were removed from the pooled Iso-Seq dataset by SQANTI3’s intra-

priming and template switching filter. Although pooling and merging added 16 SMRT cells of 

Iso-Seq data as well as the full suite of Ensembl annotations, this effort only improved the 

proportion of reads retained by +0.8% beyond the annotation created by just our single ScISOr-

Seq sample (Figure 2.2B; Supplemental file 1). Therefore, the gene models originating from a 

matching ScISOr-Seq library alone could be used for scRNAseq analysis if a system lacks a 

prior annotation.  

ScISOr-Seq enhances biological interpretation of cell clusters  

Using the pooled (ScISOr-Seq and bulk Iso-Seq) annotation that was merged with 

Ensembl annotation, we proceeded with scRNAseq analysis (Supplemental file 5) to test whether 

biologically meaningful cell clusters would be created. We clustered cells by their transcriptional 

profiles into cell identities and analyzed the results with Seurat (Stuart et al. 2019). We identified 

30 clusters of cells with 38 principal components (Figure 2.5A). We putatively annotated cell 

types via distinguishing sets of marker genes based on zebrafish literature (Howe et al. 2013; 

Wagner et al. 2018; Farnsworth et al. 2020).   

Although zebrafish and stickleback diverged ~ 229.9 m.y.a. (Howe et al. 2021), we 

identified similar cell types as observed in existing zebrafish atlases for similar developmental 

stages as we analyzed in stickleback (Wagner et al. 2018; Farnsworth et al. 2020). Additionally, 

we compared the expression of sox9a and sox9b in our atlas to published descriptions of 

stickleback in situ expression at the same stage (Cresko et al. 2003). Supporting our cluster 


29 
 

annotations, we observed expression of both SOX genes in roughly the same cell types as 

defined by in situ hybridization (Figure S2.2A and S2.2B; Cresko et al. 2003). Despite our 

success using a single SMRT cell for ScISOr-Seq for isoform annotation, we would need to 

sequence additional SMRT cells to have enough data to correlate cell types and specific 

isoforms.  

Since the pooled and merged dataset has higher numbers of cells, genes per cell, and total 

genes detected than the Ensembl scRNAseq dataset (Figure 2.2B), assessing the degree to which 

pooled and merged gene models improved the scRNAseq analysis is complex. For instance, a 

higher overall expression of a gene in a cluster might be the result of different clustering 

patterns, more cells retained, and/or more reads counted. We compared the raw counts of 

specific genes, number of cells expressing these genes, and number of clusters where at least 

 
Figure 2.5:  Using the updated gene models from our merged and pooled dataset, we identified 
expected cell types in 70hpf stickleback scRNAseq data and illustrated how annotations 
improvements are relevant for analysis. A) the clustering and cell identities of the scRNAseq 
data is illustrated on the UMAP. B) frizzled gene family expression compared between the 
ensembl annotation scRNAseq analysis and the pooled and merged analysis using total raw 
counts, number of cells expressing each gene, and the number of clusters where over 10% of 
cells are expressing each gene. C) fzd1 gene model is greatly extended in the pooled and 
merged annotation (purple) compared to the ensembl model (blue) which allows for much 
greater counting of reads (grey). D) fzd9b had minor changes in the pooled and merged model 
(purple), a longer 5’ UTR, in comparison with the ensembl annotation (blue) which led to 
minimal increases in reads (grey) counted. The scRNAseq read pileups in C and D have 
colored lines on base pairs that are different from the stickleback reference genome (BROAD 
S1, 104.1 database version). 
 

0 Brain progenitors
1 Unknown cells
2 Brain 1
3 Neural 1
4 Neural 2
5 Neural 3
6 Somites 1
7 Brain 2
8 Developing muscle
9 Somites 2
10 Retinal progenitors
11 Neural 4
12 Epidermis
13 Neural 5
14 Pharyngeal arch endoderm

15 Developing mesenchyme/cartilage
16 Di!erentiating retinal cells
17 Blood
18 Pharyngeal arch neural crest
19 Skeletal muscle
20 Retina
21 Pigment cells
22 Hematopoietic cells
23 Heart
24 Neural 6
25 Digestive system precursors 
26 Endothelial
27 Neural crest
28 Cranial neural crest derived neural cells
29 Putative hatching gland

A

B C

D

Genes

To
tal c

ounts 

 (E
NS) 

To
tal c

ounts 

 (P
&M) # ce

lls
 

expre
ssi

ng 

gene

 (P
&M)# ce

lls
 

expre
ssi

ng 

gene

 (E
NS) # cl

uste
rs 

expre
sse

d    
    

(P&M)
# cl

uste
rs 

expre
sse

d 

 (E
NS)

!"# $%&" !"% #%'& % '!"#$

()!) $#"( !""( !)%& !# !%!"#%&

103 3238 101 1625 0 9!"#$

13 736 13 539 0 2!"#%

157335 3164 35 80!"#&'

689 693 420 424 2 1!"#()

!"#$%"&&&'() !"#$*"&&&'()

ENS

P&M

+#"##%"!&&'() +#"#*&"&&&'()

ENS

P&M

0

1

2

34

56

78

9

10

11

12

1314
15

16

17

18

19

20

21

22

23

24

25
26

27
28

29

!"#* 132 2327 129 1594 0 8
!"#+ 0 200 0 192 0 0

!"#,- 20 20 20 20 0 0
!"#,' 82 685 77 477 0 1

!"#. 1352 1962 753 3 5591

!"#$
*+,

!"#%&
!"#


30 
 

10% of cells expressed them (Figure 2.5B). Since Cell Ranger provided a subset of these values 

on a global scale (Figure 2.2B), we chose to examine changes in gene counts, cell number, and 

number of clusters with a case study: the frizzled genes, a family of Wnt-pathway signaling 

molecules expressed in a wide range of tissues (Huang and Klein 2004; Wang et al. 2016). For 

10/11 frizzled genes, we observed increases in raw counts and number of cells expressing them 

in the pooled and merged annotation (Figure 5B). Some genes, however, exhibited dramatic 

increases (fzd1, fzd2, fzd3b, fzd4, fzd6, fzd7, and fzd8b) while others exhibited minor increases 

(fzd5, fzd9b, and fzd10). Comparing the gene models of fzd1 and fzd9b, we determined which 

factors influenced expression detection. Importantly, the 5’ and 3’ UTRs expanded for fzd1 in the 

merged and pooled annotation, however only the 5’ end of fzd9b was extended (Figure 2.5C and 

2.5D). Based on the read alignments (Figure 2.5C and 2.5D), the increase in counted reads for 

fzd1 was due solely to 3’ UTR changes, while a slight increase in fzd9b reads was due to 5’ end 

extension. In addition to increased read counts, 7/11 genes were expressed in more cell clusters 

in the pooled and merged dataset. Two genes, fzd9b and fzd10, were found in one less cluster. 

We hypothesized that two clusters (containing fzd9b and fzd10) were combined in the merged 

and pooled dataset relative to Ensembl. Overall, this gene family comparison illustrated the 

problem of using an annotation with incomplete 3’UTRs. If the Ensembl dataset’s fzd counts 

were accepted as their true qualitative and quantitative expression, flawed conclusions would 

have been made regarding the contributions of each gene in the family.  

  
Conclusion 

Incomplete 3’UTR annotations can hinder single cell transcriptional profiling studies. In 

addition to reducing the overall number of genes included in an analysis, systematic differences 

in preliminary UTR annotations could lead to significant inferential errors. We illustrated in 

stickleback fish that a minimal ScISOr-Seq dataset, generated concurrently with scRNAseq data, 

was capable of dramatic improvements in retained read counts (+26.1%). Additionally, we 

showed that pooling additional Iso-Seq reads from adult bulk samples and merging with existing 

Ensembl annotations improved scRNAseq reads retained negligibly beyond models solely from 

embryonic ScISOr-Seq (+0.8%). Using the improved annotation for scRNAseq permitted 

identification of cell types and increased the observed expression of numerous genes. Overall, 


31 
 

our work illustrates that ScISOr-Seq is a rapid and cost-effective method to annotate genomes of 

various organisms for scRNAseq and can improve efficacy of biological inferences. 

   
32 
 

References 

Ali A, Thorgaard GH, Salem M. 2021. PacBio Iso-Seq improves the rainbow trout genome    

annotation and identifies alternative splicing associated with economically important 

phenotypes.       Front. Genet. 12.   

Beck EA, Currey MC, Small CM, Cresko WA. 2020. QTL mapping of intestinal neutrophil 

variation in threespine stickleback reveals possible gene targets connecting intestinal 

inflammation and systemic health. G3. 10(2):613–622.   

Beck EA, Healey HM, Small CM, Currey MC, Desvignes T, Cresko WA, Postlethwait JH. 2021 

Jul 29. Advancing human disease research with fish evolutionary mutant models. Trends 

Genet. 38(1):22-44.  

Beiki H, Liu H, Huang J, Manchanda N, Nonneman D, Smith TPL, Reecy JM, Tuggle CK. 2019. 

Improved annotation of the domestic pig genome through integration of Iso-Seq and 

RNA-seq data. BMC Genomics. 20(1):1–19.   

Bell MA, Foster SA. 1994. The evolutionary biology of the threespine. 1st ed. New York: 

Oxford University Press.  

Bresciani E, Broadbridge E, Liu PP. 2018. An efficient dissociation protocol for generation of 

single cell suspension from zebrafish embryos and larvae. MethodsX. 5:1287.   

Colosimo P, Summers B, Kingsley D, Hosemann K. 2004. A simple and efficient microinjection 

protocol for making transgenic sticklebacks. Behaviour. 141(11–12):1345–1355.   

Cresko WA, Amores A, Wilson C, Murphy J, Currey M, Phillips P, Bell MA, Kimmel CB, 

Postlethwait JH. 2004a. Parallel genetic basis for repeated evolution of armor loss in 

Alaskan threespine stickleback populations. Proc. Natl. Acad. Sci. U. S. A. 

101(16):6050–6055.   

Cresko WA, Amores A, Wilson C, Murphy J, Currey M, Phillips P, Bell MA, Kimmel CB, 

Postlethwait JH. 2004b. Parallel genetic basis for repeated evolution of armor loss in 

Alaskan threespine stickleback populations. Proc. Natl. Acad. Sci. 101(16):6050–6055.   

Cresko WA, McGuigan KL, Phillips PC, Postlethwait JH. 2007. Studies of threespine 

stickleback developmental evolution: Progress and promise. Genetica. 129(1):105–126.   

Cresko WA, Yan YL, Baltrus DA, Amores A, Singer A, Rodríguez-Marí A, Postlethwait JH. 

2003. Genome duplication, subfunction partitioning, and lineage divergence: Sox9 in 

stickleback and zebrafish. Dev. Dyn. 228(3):480–489.   


33 
 

Farnsworth DR, Saunders LM, Miller AC. 2020. A single-cell transcriptome atlas for zebrafish 

development. Dev. Biol. 459(2):100–108.   

Farrell JA, Wang Y, Riesenfeld SJ, Shekhar K, Regev A, Schier AF. 2018. Single-cell 

reconstruction of developmental trajectories during zebrafish embryogenesis. Science. 

360(6392).   

Fuess LE, den Haan S, Ling F, Weber JN, Steinel NC, Bolnick DI. 2021. Immune gene 

expression covaries with gut microbiome composition in stickleback. MBio. 12(3).   

Gardell AM, von Hippel FA, Adams EM, Dillon DM, Petersen AM, Postlethwait JH, Cresko 

WA, Buck CL. 2017. Exogenous iodide ameliorates perchlorate-induced thyroid 

phenotypes in threespine stickleback. Gen. Comp. Endocrinol. 243:60–69.   

Gupta I, Collier PG, Haase B, Mahfouz A, Joglekar A, Floyd T, Koopmans F, Barres B, Smit 

AB, Sloan SA, et al. 2018. Single-cell isoform RNA sequencing characterizes isoforms in 

thousands of cerebellar cells. Nat. Biotechnol. 36(12):1197–1202.   

Hafemeister C, Satija R. 2019. Normalization and variance stabilization of single-cell RNA-seq 

data using regularized negative binomial regression. Genome Biol. 20(1):296.   

Hohenlohe PA, Bassham S, Etter PD, Stiffler N, Johnson EA, Cresko WA. 2010. Population 

genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. 

PLoS Genet. 6(2).   

Howe DG, Bradford YM, Conlin T, Eagle AE, Fashena D, Frazer K, Knight J, Mani P, Martin R, 

Moxon SAT, et al. 2013. ZFIN, the Zebrafish Model Organism Database: increased 

support for mutants and transgenics. Nucleic Acids Res. 41(D1):D854–D860.   

Howe KL, Achuthan P, Allen James, Allen Jamie, Alvarez-Jarreta J, Amode MR, Armean IM, 

Azov AG, Bennett R, Bhai J, et al. 2021. Ensembl 2021. Nucleic Acids Research. 

49(D1):D884–D891.   

Huang H-C, Klein PS. 2004. The Frizzled family: receptors for multiple signal transduction 

pathways. Genome Biol. 5(7):1–7.   

Hwang B, Lee JH, Bang D. 2018. Single-cell RNA sequencing technologies and bioinformatics 

pipelines. Exp. and Mol. Med. 50(8).   

Kuo RI, Cheng Y, Zhang R, Brown JWS, Smith J, Archibald AL, Burt DW. 2020. Illuminating 

the dark side of the human transcriptome with long read transcript sequencing. BMC 

Genomics. 21(1):1–22.   


34 
 

Kuo RI, Tseng E, Eory L, Paton IR, Archibald AL, Burt DW. 2017. Normalized long read RNA 

sequencing in chicken reveals transcriptome complexity similar to human. BMC 

Genomics. 18(1):1–19.   

Lawson ND, Li R, Shin M, Grosse A, Onur YMS, Stone OA, Kucukural A, Zhu LJ. 2020. An 

improved zebrafish transcriptome annotation for sensitive and comprehensive detection 

of cell type-specific genes. eLife. 9:1–76.   

Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 

34(18):3094–3100.   

Luecken MD, Theis FJ. 2019. Current best practices in single‐cell RNA‐seq analysis: a tutorial. 

Mol. Syst. Biol. 15(6).   

Miller CT, Beleza S, Pollen AA, Schluter D, Kittles RA, Shriver MD, Kingsley DM. 2007. cis-

Regulatory changes in Kit ligand expression and parallel evolution of pigmentation in 

sticklebacks and humans. Cell. 131(6):1179–1189.   

Naftaly AS, Pau S, White MA. 2021. Long-read RNA sequencing reveals widespread sex-

specific alternative splicing in threespine stickleback fish. Genome Res. 31(8):1486–

1497.   

Reid K, Bell MA, Veeramah KR. 2021. Threespine stickleback: a model system for evolutionary 

genomics. Annu Rev Genomics Hum Genet. 22:357–383.   

Shapiro E, Biezuner T, Linnarsson S. 2013. Single-cell sequencing-based technologies will 

revolutionize whole-organism science. Nat. Rev. Gen. 14(9):618–630.   

Shapiro MD, Marks ME, Peichel CL, Blackman BK, Nereng KS, Jonsson B, Schluter D, 

Kingsley DM. 2004. Genetic and developmental basis of evolutionary pelvic reduction in 

threespine sticklebacks. Nature. 428(6984):717–723.   

Small CM, Milligan-Myhre K, Bassham S, Guillemin K, Cresko WA. 2017. Host genotype and 

microbiota contribute asymmetrically to transcriptional variation in the threespine 

stickleback gut. Genome Biol. Evol. 9(3):504–520.   

Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, Hao Y, Stoeckius M, 

Smibert P, Satija R. 2019. Comprehensive integration of single-cell data. Cell. 

177(7):1888-1902.e21.   

Tardaguila M, Fuente L de la, Marti C, Pereira C, Pardo-Palacios FJ, Risco H del, Ferrell M, 

Mellado M, Macchietto M, Verheggen K, et al. 2018. SQANTI: extensive 


35 
 

characterization of long-read transcript sequences for quality control in full-length 

transcriptome identification and quantification. Genome Res. 28(3):396.   

Tseng E. 2021. cDNA cupcake.  

Wagner DE, Weinreb C, Collins ZM, Briggs JA, Megason SG, Klein AM. 2018. Single-cell 

mapping of gene expression landscapes and lineage in the zebrafish embryo. Science. 

360(6392):981–987.   

Wang Y, Chang H, Rattner A, Nathans J. 2016. Frizzled receptors in development and 

disease. Curr. Top. Dev. Biol. 117:113.   

Zheng Y-F, Chen Z-C, Shi Z-X, Hu K-H, Zhong J-Y, Wang C-X, Shi W, Chen Y, Xie S-Q, Luo 

F, et al. 2020 Jul 28. HIT-scISOseq: high-throughput and high-accuracy single-cell full-

length isoform sequencing for corneal epithelium. bioRxiv.:2020.07.27.222349.   

 
36 
 

BRIDGE 

Chapter II detailed challenges to using single cell RNA sequencing (scRNAseq) in less 

traditional model organisms. This technique captures reads at the 3’ end of mRNA transcripts, 

therefore it requires high quality 3’ UTR annotations that are often incomplete in non-model 

systems. I harnessed single cell ISO-sequencing data to improve 3’ UTR annotations in 

threespine stickleback. Through lengthening 3’ UTR sequences, I increased the percentage of 

scRNAseq reads retained by 26.5%.  

In Chapter III, I demonstrated the utility of these techniques for analyzing the development 

of highly derived traits in Gulf pipefish. I created a single cell atlas for the Gulf pipefish, using 

single cell ISO-sequencing to improve gene models and single cell RNA-sequencing to quantify 

gene expression. Using this atlas, I studied the development of pipefish’s elongate snout and 

exoskeleton, explored developmental origins of their loss of teeth, and asked whether epidermal 

cells express genes related to development within the brood pouch. This study provided 

preliminary evidence that pipefish’s derived traits may originate from changes within 

developmental gene networks. 
 

37 
 

CHAPTER III 

SINGLE CELL RNA SEQUENCING PROVIDES CLUES FOR THE DEVELOPMENTAL 

GENETIC BASIS OF SYNGNATHID FISHES EVOLUTIONARY ADAPTATIONS 

 
From Healey, Penn, Small, Bassham, Goyal, Woods, Cresko. 2024. Single Cell RNA 

Sequencing Provides Clues for the Developmental Genetic Basis of Synganthidae’s Evolutionary 

Adaptations. eLIFE. 

The full supplemental material for thesis chapter can be found at 

https://doi.org/10.7554/eLife.97764.1	

This project was conceptualized by William Cresko and I. Sequencing data was generated by 

Clay Small and me. I completed the scRNAseq analysis. The in situ hybridizations were largely 

completed by Hayden Penn and myself with assistance from Susan Bassham, Vithika Goyal, and 

Micah Woods. I took the lead on writing the manuscript. The paper was reviewed and revised by 

all authors.  

Introduction 

Seahorses, pipefishes, and seadragons are extraordinary fishes in the family Syngnathidae 

with diverse body plans, coloration, and elaborate structures for paternal brooding. The 

syngnathid clade comprises over 300 diverse species that vary in conservation status, 

distribution, ecology, and morphology (Leysen et al., 2011; Manning et al., 2019; Schneider, 

Woltering, et al., 2023; Stiller et al., 2022). Syngnathids have numerous highly altered traits, trait 

losses, and evolutionary novelties. They have elongated snouts bearing small, toothless jaws 

(Leysen et al., 2011) specialized for capture of small zooplankton (Van Wassenbergh et al., 

2009). Additionally, syngnathids are distinctly protected by a bony dermal armor rather than 

scales (Jungerson, 1910). Other skeletal differences include a lack of ribs and pelvic fins, and an 

expansion in the number of vertebrae (Schneider, Woltering, et al., 2023).  Finally, syngnathids 

exhibit male pregnancy, which has involved the evolution of specialized brooding tissues and 

structures (Whittington & Friesen, 2020). Paternal investment varies among lineages; for 

example, seadragons tether embryos externally to their tails while seahorses and some pipefishes 


38 
 

have enclosed brood pouches proposed to support embryos through nutrient transfer and osmotic 

regulation (Carcupino, 2002; Melamed et al., 2005; Ripley & Foran, 2006).  

Despite advances in understanding the ecology and evolution of syngnathid novelties, the 

developmental genetic basis for these traits is largely unknown. The recent production of high 

quality syngnathid genome assemblies (Qu et al., 2021; Ramesh et al., 2023; Small et al., 2022; 

Wolf et al., 2024) provides initial clues for the developmental genetic basis of some evolutionary 

changes. Studies have found that syngnathids lack several genes with deeply conserved roles in 

vertebrate development, including pharyngeal arch development (fgf3), tooth development (fgf3, 

fgf4, and eve1, and most scpp genes), fin development (tbx4), and immune function (MHC 

pathway components) (Lin et al., 2016; Qu et al., 2021;  Small et al., 2016; Small et al., 2022; 

Zhang et al., 2020). Though these gene losses are highly suggestive of leading to unique changes, 

exploration of the actual developmental consequences of their losses is needed. 

To fill this gap in knowledge, we used single cell RNA sequencing (scRNAseq) to 

investigate how these striking genomic changes have affected the developmental genetic and 

cellular basis of syngnathids’ evolutionary derived traits. The Gulf pipefish (Syngnathus scovelli) 

is an attractive model for this study (Figure 3.1A). This species has a high-quality reference 

genome assembly annotated by NCBI and is amenable to laboratory culture (Anderson & Jones, 

2019; Ramesh et al., 2023). Furthermore, species from the Syngnathus genus are used worldwide 

to address questions about syngnathid evolution in microbial, developmental, histological, 

transcriptomic, ecotoxicological, and genomic studies (Berglund et al., 1986; Carcupino, 2002; 

Fuiten & Cresko, 2021; Harada et al., 2022; Harlin-Cognato et al., 2006; Partridge et al., 2007; 

Ripley & Foran, 2006a; Rose et al., 2023; Roth et al., 2012; Small et al., 2016; Small et al., 2013). 

In this paper, we focus on a subset of unique traits, including their elongated head, toothlessness, 

dermal armor, and development of embryos inside the brood pouch. These traits represent the 

diversity of evolutionary changes observed in the syngnathid clade (highly altered, lost, and novel 

traits) and hypotheses from studies of model organisms suggest developmental pathways involved 

in their evolution (Lin et al., 2016b; Roth et al., 2020; Small et al., 2016; Small et al., 2022). 

scRNAseq atlases are a powerful complement to genomic analyses (Shema et al., 2019; 

Ton et al., 2020). They can provide crucial insights into types of cells present, genes that 

distinguish cell types (marker genes), active gene networks, and a means to identify expression of 


39 
 

genes of interest within predicted cell types (Farnsworth et al., 2020; Farrell et al., 2018; Williams 

et al., 2019). Specifically, scRNAseq captures RNA expression profiles from individual cells 

allowing cell types to be inferred post-hoc. scRNAseq has successfully been applied to syngnathid 

adult kidneys (Parker et al., 2022), but there are no published syngnathid developmental atlases. 

Here we report the first developmental scRNAseq atlas for syngnathids from late 

embryogenesis staged Gulf pipefish. We delineate the overall structure of this atlas, which 

describes 38 cell clusters composed of 35,785 cells, and use these data to make inferences about 

the morphological evolution of several syngnathid innovations. In addition to inferring present 

cell types and their underlying genetic networks, we detail in situ spatial expression patterns of 

select marker and other candidate genes in pipefish embryos and juveniles. We found conserved 

signaling pathways expressed during craniofacial development but did not detect evidence of 

tooth primordia. The embryonic dermis and epidermis, respectively, expressed genes for the 

dermal armor development (bone development pathways) and genes potentially involved in 

interaction with the male brood pouch (e.g., nutrient acquisition genes). Overall, this atlas 

provides a deeper understanding of the development of Gulf pipefish and identifies gene 

candidates for understanding the development of syngnathid evolutionary innovations. In 


40 
 

addition to these discoveries, this atlas provides a significant resource for researchers studying 

syngnathid evolution and development. 

 
Materials and Methods 

Single Cell RNA Sequencing Libraries Preparation 

We created scRNAseq atlases from embryos of wild caught Gulf pipefish (Syngnathus 

scovelli, acquired from our collaborator Emily Rose using collection permit SAL-21-0182-E), 

and all work was performed under an approved IACUC protocol. We harvested 20 embryos per 

pouch from two wild caught male pipefish. The embryos were at a stage before the tubular face 

was fully elongated, and while the head skeleton is cartilaginous with minimal signs of 

mineralization of superficial intramembranous bones. This corresponds to a stage termed “frontal 

jaws” in a recent description of pipefish development (Sommer et al., 2012).  

We dissociated the embryos using 460ul of .25% trypsin in water and 40ul 100 mg/mL 

collagenase I (Sigma C0130-200mg) for 16 minutes. We filtered cells using a 40uM cell strainer 

 
Figure 3.1: Gulf pipefish exemplify syngnathid derived traits. Gulf pipefish have elongate 
snouts, have lost teeth on their oral and pharyngeal jaws, possess dermal armor, and have 
brood pouches in males (A-B). Cartilage (alcian) and bone (alizarin) stained clutch siblings of 
embryos from the two scRNAseq samples are shown in panels C-H. Embryos have 
cartilaginous craniofacial skeletons (panels B, C, F; E marks the (Mes)ethmoid cartilage, C 
indicates the Ceratohyal, H shows the Hyosymplectic cartilage, M marks the Meckel's 
cartilage, Q indicates the quadrate, B shows the Basihyal, and P marks the palatoquadrate) 
with the onset of ossification in the jaw. They have cartilaginous fin radials in the dorsal fin 
(D and G; F indicates fin radials and N denotes the notochord). The embryos do not have 
signs of ossification in the trunk where the exoskeleton will form later (panels E and H; PD 
shows the region where dermal armor primordia will arise). Panels C, D, F, and G, scale bar  
200µm; E, H scale bar 100 µm. 

A

B

C D E

F G

E

E

M

M

C

C

H

H

Elongated 
ethmoid 

region

No teeth

Dermal armor

Male brooding

N

F

N

PD

F

N

PD

N

E
H

C

B

Q

M

P

H

P


41 
 

(Thomas Scientific #1181X52). We quantified cell concentrations using the TC20 Automated 

Cell Counter (Biorad) then diluted the samples to 800 cells/ul in 4% BSA in PBS. The 

University of Oregon Genomics and Cell Characterization Core (GC3F; 

https://gc3f.uoregon.edu) prepared single cell libraries for each sample using 10X Genomics 

Single Cell 3’ Genome Expression mRNAseq kit with NextGEM v3.1 chemistry. We sequenced 

these libraries on an S4 lane on the NovaSeq 6000 at the GC3F. To improve the 3’ UTR genome 

annotations, we also prepared scISOrSeq libraries from the first embryonic sample and from 

dissociated pouch cells from pregnant and nonpregnant males. These libraries were produced in 

accordance with (Healey et al., 2022). Embryonic, pregnant pouch and non-pregnant pouch 

libraries were sequenced separately on PacBio Sequel II - SMRT Cells 8M. 

To turn the scISOrSeq reads into gene models, we followed the pipeline from Healey et al. 

(2022). We ran our custom script (scISOr_Seq_processing.py) to remove barcodes, identify cell 

barcodes, and demultiplex with the single cell flag and appropriate barcodes (5’ 

CCCATGTACTCTGCGTTGATACCACTGCT and 3’ CTACACGACGCTCTTCCGATCT). We 

aligned the reads to the 2022 Gulf pipefish genome GenBank: GCA_024217435.2) using minimap 

v2.9 (cite: Li 2018). We filtered the reads using cDNA cupcake to remove duplicate transcripts 

(Tseng, 2021). We used SQANTI3 to identify gene models and filter them (Tardaguila et al., 

2018). We merged the SQANTI3 annotations with the Gulf pipefish genome assembly (NCBI: 

ROL_Ssco_1.1) using TAMA merge (Kuo et al., 2017). Since the Gulf pipefish genome assembly 

does not contain mitochondrial genes, we appended the genome files with the Gulf pipefish 

mitochondrial genome (NCBI RefSeq: NC_065499.1). 

Single Cell Atlas Construction 

We ran Cell Ranger (10X Genomics 3.0.2) using our scRNAseq reads, the Gulf pipefish 

genome assembly with mitochondrial genome, and the modified gene annotations. Cell Ranger 

estimated 20,733 cells for sample one, 23,682 genes expressed, and 21,039 mean reads per cell. 

For sample two, Cell Ranger predicted 17,626 cells, 23,740 genes expressed, and 29,804 mean 

reads per cell. We analyzed Cell Ranger’s output using Seurat (v4.1.0) on R (v4.0.2; Butler et al., 

2018; Hafemeister & Satija, 2019).  


42 
 

To remove extraneous RNA counts from the dataset, we used SoupX (v1.5.2; Young & 

Behjati, 2020). We identified doublet scores for our dataset using scrublet (v0.2.3). The doublet 

removal step reduced the first sample by 114 cells (from 20,733 cells to 20,619 cells) and second 

sample by 167 cells (from 17,626 cells to 17,459 cells). We finally removed cells with less than 

500 features, greater than 9000 features, greater than 1E5 RNA counts, with a scrublet score 

greater than the detected threshold (.76 for sample 2 and .21 for sample 2), or greater than 10% 

mitochondrial reads. The second filtering step removed 727 cells from sample one (20,619 cells 

to 19,892 cells) and 1,566 cells from sample two (from 17,459 cells to 15,893 cells).  

We normalized the datasets with SCTransform (v0.3.3). We used Seurat’s integration 

tools, SelectIntegrationFeatures using 3,000 feature genes, FindIntegrationAnchors using SCT 

normalization, and IntegrateData using SCT normalization, to integrate the two datasets. After 

integration, our combined atlas had 35,785 cells (Supplementary File 1,2). We then used the 

integrated dataset to complete PCA analysis. We tested using a variety of principle components 

for further analysis and chose 30 PCs for our analysis based on the clear delineation of major cell 

types. We next clustered the cells using 30 PCs and plotted the data on a UMAP with Seurat. 

Single Cell Atlas Cluster Identification 

To identify cluster identities, we used the RNA assay of the scRNAseq data to find 

cluster markers with Seurat’s FindAllMarkers command with the parameters only.pos = TRUE 

and logfc.threshold = 0.25, requiring markers to be upregulated in the cluster and have a log fold 

change of at least .25. We found a second set of cluster markers through our custom function 

which searched through all genes and identified genes uniquely expressed in greater than 60% of 

cells in the cluster and in less than 10% of cells in every other cluster using Seurat’s DotPlots. 

We searched for our identified markers in available zebrafish datasets (Fabian et al., 2022; 

Farnsworth et al., 2020; Lange et al., 2023), ZFIN (Howe et al., 2013), NCBI, medlineplus.gov, 

and genecards to give the clusters initial annotations. For each cluster, we examined multiple 

genes using DotPlots and FeaturePlots to propose the cluster identity. 

Next, we identified one gene for each cluster which marked the cluster best (expressed in 

the most cells in the focal cluster and expressed in as few of the other clusters as possible) through 

consulting the two marker gene lists and examining markers with Dot Plots (Supplementary File 


43 
 

3). Using these markers, we completed a set of in situ hybridizations to hone our cluster 

annotations. Due to challenges culturing Gulf pipefish, we used both embryos and larvae from 

Syngnathus leptorhynchus, a local pipefish from the same genus, and from cultured Gulf pipefish 

for the cluster annotation in situ experiments. We caught pregnant male Syngnathus leptorhynchus 

using a beach seine near Coos Bay, Oregon under Oregon Department of Fisheries and Wildlife 

permit number 26987. 

 Syngnathus scovelli used for in situ experiments were purchased from Alyssa’s Seahorse 

Saavy and Gulf Specimens Marine Lab then reared in our facility at 25 degrees C water and 25-28 

PPT Salinity. Since Syngnathus leptorhynchus does not have a published genome assembly, we 

designed probes using NCBI Primer Blast with the Gulf pipefish genome and produced these 

probes using Gulf pipefish embryonic cDNA pools. To create the probes, we completed two 

rounds of PCR. The first round used a gene specific forward primer with a reverse gene specific 

primer that had 10 nucleotides of T7 promotor sequence attached. PCR products were cleaned with 

Zymo clean and concentrator DNA kit and eluted in 15 ul of elution buffer. Round two of PCR 

used the same gene specific forward primer with a modified T7 promoter sequence 

(TGGACTAATACGACTCACTATAGGG) as the reverse primer, and finally the product was 

cleaned again with Zymo clean and concentrator DNA kit and eluted in 15 ul of elution buffer.  

Round one PCR conditions are as followed: 95 degrees Celsius for 3:00 minutes, 40 cycles 

of denaturation (95 degrees for 30 seconds), annealing (30 seconds, annealing temperature varied 

by probe), and extension (72 degrees for 1:00 minute), and a final extension step of 3:00 minutes. 

Round two PCR conditions are as followed: 98 degrees Celsius for 30 seconds, 30 degrees for 10 

seconds, 72 degrees for 50 seconds, then 35 rounds of denaturation (98 degrees for 10 seconds), 

annealing (50 degrees for 10 seconds), and extension (72 degrees for 50 seconds), and a final 

extension step of 10 minutes at 72 degrees. For round two PCRs with multiple bands (specifically, 

ifitm5), the band of the expected size was excised with a razorblade and DNA was extracted with 

Zymoclean gel DNA recovery kit.  

The round two PCR product was Sanger sequenced to confirm identity. All the marker 

genes primers and successful PCR conditions are in Supplementary File 4. The probes were 

transcribed with T7 polymerase for 2-6 hours then cleaned with Zymo RNA clean and concentrator 

and eluted into 30 ul of water. For the in situ hybridizations, we selected embryos and newly 

spawned larvae close to the developmental stage used in the atlas. We completed in situ 


44 
 

hybridizations in keeping with Thisse & Thisse, 2007, leaving the embryos in stain until 

background was observed. For spawned larvae, we completed a bleaching step (1% H2O2 and .5% 

KOH for 8 minutes) prior to the proteinase K digest. After imaging, we used the levels tool in 

Adobe Photoshop (v23.4.2) to white balance the pictures. 

Differentiation State Analysis 

To assess whether proposed primordial cell clusters were composed of undifferentiated 

cells relative to other clusters from their lineages, we completed a differentiation analysis using 

CytoTRACE (v0.3.3). Clusters from similar lineages (neural: 0, 3, 7, 8, 12, 22, 25, 33, and 35; 

muscle: 2, 10, 17, and 37; connective: 4, 5, 6, 9, 15, 16, 18, 24, 27, 28, and 29) were isolated using 

subset. Cell counts were gathered using as.Matrix(GetAssayData), then CytoTRACE was run on 

these counts. Data was plotted on a VlnPlot to show the variability. 

Single Cell KEGG Analysis 

To identify pathways upregulated in cell clusters, we completed a KEGG analysis. We 

downloaded gulf pipefish KEGG pathways (https://www.kegg.jp). For the KEGG analysis, we 

used the marker genes identified from our FindAllMarkers list as the input genes. We converted 

these gene ids using keggConv to KEGG ids. We used a Wilcoxon enrichment test to ask 

whether cluster marker genes were enriched for each KEGG pathway. 

Single Cell Atlas Network Analysis 

To identify genetic networks present in our atlas, we completed a weighted gene network 

correlation analysis using WGCNA (v1.72-1,Langfelder & Horvath, 2008). We selected 3,000 

variable features from the integrated assay of the single cell dataset for the WGCNA. We created 

an adjacency matrix from the data using bicor with a maxPOutliers of .05. To decide on a beta 

value or the soft threshold power, we created an adjacency matrix plot using pickSoftThreshold 

and picked the threshold where the scale free topology model fit leveled off. We selected the 

value of two. Then, we raised the adjacency matrix to the power of two.  

We calculated the dissimilarity matrix by calculating TOM similarity of the adjacency 

matrix and subtracting it from one. We then created a gene tree through hclust of the 

dissimilarity matrix with the average method. From the gene tree, we created modules using 


45 
 

cutreeDynamic with deepSplit of two and the minClusterSize of 15 (setting the smallest cluster 

size to 15). We calculated module eigengenes using moduleEigengenes of the adjacency matrix 

then calculated the dissimilarity of the module eigengenes with cor of the module eigengenes 

subtracted from one and clustered the module eigengenes with hclust. We then chose a 

dissimilarity of 35% as the cut off for merging modules; however, no modules had dissimilarity 

scores with each other below 40%. Since the module eigengenes are calculated for each cell in 

the dataset, we used these values to calculate the t-statistic for each cell cluster (considered each 

sample) in the module (all of the module eigengene scores were used as the population).  

We next tested the hypothesis that certain cell clusters were strongly associated with 

specific gene modules through a two-way permutation test (1000 permutations) and corrected p-

values to control the False Discovery Rate (FDR). Additionally, we tested whether specific cell 

clusters drove underlying gene network structure by measuring network connectivity. The 

network connectivity was first calculated using the entire dataset, then each cell cluster was 

progressively dropped from the dataset and the connectivity was remeasured. This resulted in 

changes in connectivity scores for each module-cluster pair. To assess whether these changes 

were significant, we completed 1000 permutations whereby cells were randomly dropped (the 

number of cells dropped was equal to the cell cluster size of the focal cluster), connectivity 

measured, and the change in connectivity recorded. The p-value is the number of instances where 

the change in connectivity is greater in the permutations than in the focal cell cluster run. P-

values were corrected to control the FDR.  

Using the module gene lists, we identified the number of genes from each module that 

were found in each KEGG pathway. Since the KEGG modules do not have p-values associated 

with the genes, we could not complete a Wilcoxon enrichment test. We instead removed any 

pathways where there were fewer than three genes present for any pathway and note that there is 

no statistical test run on these KEGG results. We visualized the networks using Cytoscape 

(v3.10.0). 

In situ Hybridization 

For genes chosen for further follow up analysis, we completed in situ hybridizations in 

Gulf pipefish (Syngnathus scovelli) or bay pipefish (Syngnathus leptorhynchus). Primer 

sequences were designed using NCBI Primer Blast with the Gulf pipefish genome assembly and 


46 
 

synthesized using Gulf pipefish embryonic cDNA. The sfrp1a probe was synthesized using the 

PCR based probe preparation protocol described in the Single Cell Identification section. All 

other probes (bmp4, dlx2a, dlx3b, fgf22, lhx6a, pitx2, and scpp1) were prepared using TOPO 

cloning. Probe primer sequences as well as the species used for the in situ experiments are 

described in supplementary file In_Situ_Probe_and_Samples_Used.xlsx. Pregnant male bay 

pipefish were caught as described above. For Gulf pipefish embryos, we allowed the fish to mate 

in our facility then harvested embryos once they reached the appropriate stages. Fish were reared 

in 25 degrees C water with 25-28 ppt salinity. Embryos and larvae were fixed in 4% PFA, 

dehydrated through a series of PBT/MeOH washes, and stored in MeOH at -20C. 5-12 embryos 

were used for each probe. We completed in situ hybridizations in keeping with  Thisse & Thisse, 

2007, leaving the embryos in stain until background was observed. After imaging, we used the 

levels tool in Adobe Photoshop (v23.4.2) to white balance the pictures. 

Bone and Cartilage Staining 

We used alcian and alizarin stains to mark cartilage and bones. Specifically, we assayed 

cartilage and bone development in siblings of the scRNAseq samples. These embryos were fixed 

in 4% PFA then stored at -20C in MeOH. We followed the protocol from Walker and Kimmel 

(2007) with minor alterations. We stored samples in 50% glycerol/0.1% KOH at 4C and imaged 

them in 100% glycerol. After imaging, we white balanced the photographs using Photoshop 

(v23.4.2) levels tool. 

Gene Cluster Analysis 

To examine close syngnathid outgroups, we downloaded the 2023 mandarin dragonet 

genome assembly from NCBI (GenBank assembly accession: GCA_027744825.1) and used an 

unpublished highly contiguous cornetfish genome assembly. Since these genome assemblies were 

unannotated, we manually identified scpp genes. We searched for scpp genes using BLASTN with 

medaka and additional fish sequences as the query. We also searched for these genes with mVISTA 

plots across conserved gene synteny regions (LAGAN alignment using translated anchoring) with 

medaka as the focal species. To identify scpp genes in additional species, we gathered cluster 

information from NCBI and ensembl. Additionally, we used mVISTA plots to further search for 

unannotated scpp genes. 


47 
 

Results 

Valuable scRNAseq atlas for studying syngnathid development 

We produced the first developmental scRNAseq atlas for a syngnathid from two samples 

comprising 20 similarly staged embryos from pregnant, wild-caught Gulf pipefish (Syngnathus 

scovelli) males. The samples represent a late organogenesis developmental stage (Figure 1B-H). 

These embryos had a primarily cartilaginous skeleton with minimal mineralization, including 

jaw cartilages that were at the onset of mineralization and ethmoid elongation. The embryos also 

possessed cartilaginous dorsal fin pterygiophores but had no signs of dermal armor 

mineralization. This stage is referred to as “frontal jaws” in syngnathid literature (Sommer et al., 

2012).  


48 
 

The atlas included 35,785 cells (19,892 and 15,893 cells from each sample; S3.1:S3.2), 

which formed 38 cell clusters (Figure 3.2A, Supplementary File 2). We classified cells into 4 

different broad tissue types – epithelial, connective, neural, and muscle – using Seurat identified 

marker genes and published model organism resources (S3.3:S3.57). We next used Seurat 

identified marker genes to pinpoint single marker genes that were most unique to each cluster 

(Figure S3.58, Supplementary File 3). We completed in situ hybridization using Gulf and bay 

pipefish embryos for cell clusters for which examining gene expression would help hone and 

 
Figure 3.2: Gulf pipefish single cell atlas contains cells from the entire embryo and identifies 
genetic pathways active in different cell types. The UMAP plot (panel A) shows all of the cell 
clusters and their identities reduced to the first two UMAP dimensions. The graph in panel B 
displays results of the KEGG pathway analysis in cell clusters identified as connective tissue 
(excluding blood, pigment, digestive, and immune cells). The number of Seurat identified 
marker genes for each cluster that was a part of each pathway is displayed on the y axis. Bars 
are colored and labeled by cell cluster. 

!
"
#$

_%

−&'

−&(

−'

(

'

&(

−&( −' ( ' &( &'
!"#$_&

!
"
#$

 %

(
&
%
)
*
'
+
,
-
.
&(
&&
&%
&)
&*
&'
&+
&,
&-

&.
%(
%&
%%
%)
%*
%'
%+
%,
%-
%.
)(
)&
)%
))
)*
)'
)+
),

(/01234567809263597
&/0:5992;<=4280>?55@
%/0"67;?2
)/01234567809263597

+/0:5992;<=428057<25;A59@350B2729;ACB2
,/0123456780>3D=909263597
-/0123456780E577=>?20>3D=909263590E35F29=<537
./0:5992;<=4280<29@590D9@0?=FDB29<0;2??7
&(/0"67;?280E577=>?20E35F29=<537
&&/0!9G95H9
&%/01234567809263597

&*/0:5992;<=428029@5<A2?=D?0

&+/0:5992;<=4280E5<29<=D?0E35F29=<537
&,/0"67;?280;D3@=D;
&-/0:5992;<=428057<25>?D7<7

&./0:5992;<=4280>?55@
%(/0:5992;<=4280E=FB29<
%&/0:5992;<=4280=BB692
%%/0123456780F?=D
%)/0IE=<A2?=D?802E=@23B=7
%*/0:5992;<=4280B2729;ACB2
%'/01234567809263597
%+/0:5992;<=4280B2729;ACB2
%,/0:5992;<=4280;A59@35>?D7<7
%-/0:5992;<=4280J9
%./0:5992;<=4280B2729;ACB2
)(/0:5992;<=4280=BB692
)&/012345678092635970K32<=9DL
)%/0:5992;<=4280>?55@
))/0123456780AD=30;2??7

)'/01234567809263597
)+/0"67;?2
),/0"67;?2

)&
.

&'

&+
%+

&*
&&

%-

&-%,

'
%*
+

%.

&) *

)% &.

&

%&

)(
)+

%( %%
&( %

&,

%)

),)*
&%

-%'

))

,
)

(

)'

'/0:5992;<=428057<25;A59@350B2729;ACB2
*/0:5992;<=428057<25;A59@350B2729;ACB2

&)/0:5992;<=4280J>35>?D7<7

&'/0:5992;<=428095<5;A53@ )*/0:5992;<=4280F6<0;2??7

A

B

14

14
16

5
14

27
28 14

28

16 16

13
14

18 2627
28
29 16

13

14
2628
29

14

16

14

26

28 28

5

14

13

14

16

27

14
26

29

28 26

14

16

27

28

13

14
14

14

13

14

0
30

60
90

ABC tra
nsp

orte
rs 

   
Adherens ju
ncti

on

ATPdependent c
hromatin

 re
modelin

g    

Calci
um sig

nalin
g pathway    

Cell a
dhesio

n m
olecu

les

Cell c
ycle

Cellu
lar s

enesce
nce

Cytokine-cy
tokine re

ce
ptor in

teracti
on    

DNA re
plic

atio
n

ECMrece
ptor in

teracti
on    

Endocy
tosis

Fa
nco

ni a
nemia pathway    

Foca
l a

dhesio
n    

Hedgehog sig
nalin

g pathway    

MAPK sig
nalin

g pathway    

Melanogenesis

Motor p
roteins

Neuroacti
ve lig

andrece
ptor in

teracti
on    

Phagoso
me

PPA
R sig

nalin
g pathway    

Regulatio
n of a

cti
n cy

toskeleton    

TGFbeta sig
nalin

g pathway    

Tight ju
ncti

on

Vascu
lar s

mooth m
uscl

e co
ntra

cti
on    

Wnt s
ignalin

g pathway    

An
no

ta
te

d 
G

en
es

 in
 P

at
hw

ay

factor(cluster)

5

13

14

16

18

26

27

28

29


49 
 

validate cluster annotations (Figure S3.2, S3.59:S3.71).  

In total, our atlas contained 13,027 connective tissue cells (excluding cells from blood, 

immune, and the digestive system) from 14 clusters, 10,112 nervous system cells from ten clusters, 

4,363 muscle cells from five clusters, 4,133 blood cells from three clusters, 650 immune cells from 

two clusters, 432 pigment cells from one cluster, 370 epidermal cells from one cluster, and 137 gut 

cells from one cluster. Within the connective tissue cell types, we also identified cartilage (302 

cells), developing bone (442 cells), fins (253 cells), and notochord (693 cells). The number of 

recovered cells per identity may not necessarily represent organismal cellular proportions because 

of potential variability in dissociation success for different cell types (Denisenko et al., 2020; 

Venema et al., 2022). 

Discovery of cell cluster function and state using KEGG analysis 

To affirm identities and discover potential properties of each cluster, we completed a 

KEGG pathway analysis for each cluster using Seurat’s marker genes (Figure S3.4, Figure 3.2B). 

For eight of the clusters (1, 4, 6, 9, 11, 15, 19, and 24), we did not find any significantly enriched 

pathways, possibly due to similar gene expression profiles across cell types that reduced the 

number of identified markers. However, we found one or more significantly enriched pathways for 

the other 29 cell clusters. We observed enriched pathway terms that supported cluster annotations. 

For example, phototransduction in the retina cluster, melanogenesis in the pigment cluster, cardiac 

muscle contraction in muscle clusters, and neuroactive ligand receptor interaction in neuronal 

clusters. 

The inferred KEGG pathways demonstrated some commonalities across the different tissue 

types, including in signaling pathways and cell states. Notably, our identified KEGG terms 

delineated progenitor and differentiated cell clusters. Based on their KEGG terms, we classified 

clusters 8, 10, and 16 as possible neural, muscle, and connective tissue progenitor cells, 

respectively. We also detected expression of pax3a and pax3b, muscle primordia markers, in 

cluster 10, supporting this annotation. These clusters had enriched KEGG terms associated with 

cell division (‘cell cycle’, ‘DNA replication’, ‘nucleotide excision repair’, and ‘homologous 

recombination’), and lacked enrichment for KEGG pathways present with differentiated cell types 

of their lineage. Specifically, cluster 8 lacked the neural KEGG term ‘neuroactive ligand receptor 


50 
 

interaction’, cluster 10 lacked muscle KEGG terms ‘adrenergic signaling in cardiomyocytes’, 

‘calcium signaling pathways’, and ‘cardiac muscle contraction’, and cluster 16 lacked connective 

 
Figure 2.3: Weighted Gene Network Analysis (WGCNA) identifies gene modules that define 
and unite cell clusters. A) The strength of association between the gene modules and cell 
clusters is shown in panel A with dendrogram clustering illustrating the distance between 
modules and cell clusters. Gene modules are represented by rows and cell clusters by columns. 
The modules and clusters are clustered using the Pearson distance method. The number of genes 
in each gene module are shown in the right hand bar plots. Cell clusters are colored based on 
their identity. The asterisks indicate the module-cluster relationships that have a p-value less 
than .05 from a two-sided permutation test after correction for multiple tests (FDR). T