A GENOMIC INVESTIGATION OF BONOBO (PAN PANISCUS) AND 
CHIMPANZEE (PAN TROGLODYTES) DIVERGENCE 
 
 
 
 
 
 
by 
 
COLIN M. BRAND 
 
 
 
 
 
A DISSERTATION 
 
Presented to the Department of Anthropology 
and the Division of Graduate Studies of the University of Oregon 
in partial fulfillment of the requirements  
for the degree of 
Doctor of Philosophy 
 
June 2021 
 
  
DISSERTATION APPROVAL PAGE 
Student: Colin M. Brand 
Title: A Genomic Investigation of Bonobo (Pan paniscus) and Chimpanzee (Pan 
troglodytes) Divergence 
 
The dissertation has been accepted and approved in partial fulfillment of the 
requirements for the Doctor of Philosophy degree in the Department of Anthropology 
by: 
 
Frances J. White  Chair 
Nelson Ting   Core Member 
Larry R. Ulibarri  Core Member 
Timothy H. Webster  Core Member 
Andrew D. Kern  Institutional Representative 
and 
Andrew Karduna  Interim Vice Provost for Graduate Studies 
Original approval signatures are on file with the University of Oregon Division of 
Graduate Studies.  
 
Degree awarded June 2021. 
  
 
 
 
 
 
 
 
 
 
 
© 2021 Colin M. Brand 
This work is licensed under a Creative Commons 
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 
4.0) License. 
 
 
  
 
DISSERTATION ABSTRACT 
Colin M. Brand 
Doctor of Philosophy 
Department of Anthropology 
June 2021 
Title: A Genomic Investigation of Bonobo (Pan paniscus) and Chimpanzee (Pan 
troglodytes) Divergence 
 
 
 Our closest living relatives are two species in the genus Pan: bonobos and 
chimpanzees. Chimpanzees are further divided into four subspecies. While there are a 
number of phenotypic similarities between bonobos and chimpanzees, there are also a 
number of differences, particularly in social behavior. Additionally, some phenotypes 
are highly variable among chimpanzees and within each of the five lineages. The 
absence of an extensive bonobo and chimpanzee fossil record means that genomic 
data provide the best window into their evolutionary past. This dissertation uses 
reassembled and remapped autosomal genomic data from all five Pan lineages to 
answer questions about adaptation and demography in the time following lineage 
divergence, ~ 1.88 Ma. We find evidence for positive selection in deep time within 
genes related to the brain, immune system, musculature, reproduction, and skeletal 
system. Most of these patterns are lineage specific: only one candidate gene was 
shared across all chimpanzee subspecies, and another two were shared across all five 
taxa. We also observe that recent positive selection is largely the result of variable 
environmental conditions acting on standing genetic variation rather than de novo 
mutation in the four Pan lineages we could analyze. Finally, we consider previous 
models for the demographic history of these taxa. The best fit model includes a single 
introgression event from bonobos and central chimpanzees. We also find that the 
common ancestor of chimpanzees is older than previously estimated. Our results 
collectively broaden our understanding of the complex evolutionary history of the 
Pan genus. The identification of positively selected genes both recently and earlier 
during lineage divergence as well as understanding the processes that drove recent 
positive selection in these taxa contributes to better estimating the timing of lineage-
specific adaptations, reconstructing the behavior and genetics of the Pan common 
ancestor, and recognizing potential selective pressures for these adaptations during 
key time periods in chimpanzee evolution. Estimates of demographic parameters can 
also offer further insight into adaptation and other evolutionary processes in these 
species and more broadly. This dissertation includes previously unpublished co-
authored material.  
 
 
CURRICULUM VITAE 
NAME OF AUTHOR: Colin M. Brand 
GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED: 
 University of Oregon, Eugene, OR, USA 
 Miami University, Oxford, OH, USA 
 
DEGREES AWARDED: 
 Doctor of Philosophy, Anthropology, 2021, University of Oregon 
 Master of Science, Anthropology, 2015, University of Oregon 
 Bachelor of Arts, Anthropology, Botany, Environmental Science, Zoology, 
  2014, Miami University 
 
AREAS OF SPECIAL INTEREST: 
 Behavioral Ecology  
 Biological Anthropology 
 Evolutionary Anthropology 
 Molecular Anthropology 
 Population Genetics 
 
PROFESSIONAL EXPERIENCE: 
 Sample Collector, COVID Monitoring and Assessment Program, University of 
  Oregon, Eugene, OR, 2020-2021 
 
 Tutor, Lane Community College, Eugene, OR, 2019-2021 
 Instructor of Record, Department of Anthropology, University of Oregon,  
  Eugene, OR, 2016-2021 
 
 Graduate Teaching Fellow, Department of Anthropology, University of  
  Oregon, Eugene, OR, 2014-2021 
 
 Animal Care Intern, Cincinnati Zoo and Botanical Garden, Cincinnati, OH, 
  2014 
 
 
GRANTS, AWARDS, AND HONORS: 
 William S. Pollitzer Student Travel Award, American Association of Physical 
  Anthropologists, 2021 
 
 Health Education Award, Anthropology, University of Oregon, 2020 
  
 Risa Palm Graduate Fellowship, University of Oregon, 2018 
 
 Young Explorer’s Grant, National Geographic Society, 2017 
 
 Pauline Wollenberg Juda Memorial Endowment Fund Award, Anthropology, 
  University of Oregon, 2017 
 
 Malcom McFee Award, Anthropology, University of Oregon, 2016 
 
 International Research Award, University of Oregon Global Studies Institute, 
  2016 
 
 Undergraduate Presentation Award, Office for the Advancement of  
  Scholarship and Research, Miami University, 2014 
 
 Senior Service Leadership Award, Miami University, 2014 
 
 Senior Service Award, Anthropology, Miami University, 2014 
 
 President’s Distinguished Service Award, Miami University, 2014 
 
 Employee Service Leadership Award, Miami University, 2014 
 
 Best Paper in Archaeology Award, Anthropology, Miami University, 2014 
 
 Rebecca Jeanne Andrew Memorial Award, Miami University Department of 
  Anthropology, 2013 
 
 Provost’s Student Academic Achievement Award, Miami University, 2013 
 
 Dean’s Scholar Award, Miami University College of Arts and Science, 2013 
 
 Cambridge Junior Visiting Fellows Fund, College of Arts and Science, Miami 
  University, 2013 
 
 Undergraduate Presentation Award, Office for the Advancement of  
  Scholarship and Research, Miami University, 2012 
  
 Rebecca Jeanne Andrew Memorial Award, Miami University Department of 
  Anthropology, 2012 
 
 Employee Service Leadership Award, Miami University, 2012 
 
 
  
 
PUBLICATIONS: 
 
 White FJ, Brand CM, Hickmott AJ, Minton IR. 2020. Sex differences in  
  bonobo (Pan paniscus) terrestriality: implications for human evolution. 
  J Anthropol Sci. 98:5-14.  
 
 Gartland KN, Brand CM, Ulibarri LR, White FJ. 2020. Variation in adult  
  male-juvenile affiliative behavior in Japanese macaques (Macaca  
  fuscata). Fol Primatol. 91:610-621. 
 
 Brand CM, Johnson MB, Parker LD, Maldonado JE, Korte L, Vanthomme H, 
  Alonso A, Ruiz-Lopez MJ, Wells CP, Ting N. 2020. Abundance,  
  density, and social structure of African forest elephants (Loxodonta 
  cyclotis) in a human-modified landscape in southwestern Gabon. PLoS 
  ONE. 15:e0231832. 
 
 Brand CM, Marchant LF. 2019. Social hair plucking is a grooming convention 
  in a group of captive bonobos (Pan paniscus). Primates. 60:487-491.  
 
 Wakefield ML, Hickmott AJ, Brand CM, Takaoka IY, Meador LM, Waller 
  MT, White FJ. 2019. New observations of meat eating and sharing in 
  wild bonobos (Pan paniscus) at Iyema, Lomako Forest Reserve,  
  Democratic Republic of Congo. Fol Primatol. 90:179-189. 
 
 Boose KJ, White FJ, Brand CM, Meinelt A, Snodgrass JJ. 2018. Infant  
  handling in bonobos (Pan paniscus): exploring functional hypotheses 
  and the relationship to oxytocin. Phys Behav. 193:154-166.  
 
 Brand CM, Marchant LF. 2018. Prevalence and characteristics of hair  
  plucking in captive bonobos (Pan paniscus) in North American zoos. 
  Am J Primatol. 80:e22751. 
 
 Brand CM, Marchant LF, Boose KJ, Rood TM, White FJ, Meinelt A. 2017. 
  Laterality of grooming and tool use in a group of captive bonobos (Pan 
  paniscus). Fol Primatol. 88:210-222. 
 
 Brand CM, Boose KJ, Squires EC, Marchant LF, White FJ, Meinelt A,  
  Snodgrass JJ. 2016. Hair plucking, stress, and urinary cortisol among 
  captive bonobos (Pan paniscus). Zoo Biol. 35:415-422. 
 
 Brand CM, White FJ, Wakefield ML, Waller MT, Ruiz-Lopez MJ, Ting N. 
  2016. Initiation of genetic demographic monitoring of bonobos (Pan 
  paniscus) at Iyema, Lomako Forest, DRC. Primate Conservation.  
  30:103-111. 
 
 Brand CM, Marchant LF. 2015. Hair plucking in captive bonobos (Pan  
  paniscus). App Anim Behav Sci. 171:192-196. 
 
  
 
ACKNOWLEDGEMENTS 
 This dissertation would not have been possible without the incredible love and 
support of a large social network. First, I thank my dissertation committee: Frances 
White, Nelson Ting, Larry Ulibarri, Tim Webster, and Andy Kern. I am so grateful 
for my incredible colleagues and chosen family including Monya Anderson, Klaree 
Boose, Diana Christie, Elisabeth Goldman, Alex Hickmott, Cam Johnson, Josh 
Schrock, Evan Simons, Noah Simons, Jessica Stone, Nicky Ulrich, and Hannah 
Wellman. I thank the UO Anthropology faculty who have taught me so much over the 
past seven years both in and out of the classroom including Diana Baxter, Alison 
Carter, Steve Frost, Ana Lara, Madonna Moss, Carol Silverman, Uli Streicher, 
Michelle Sugiyama, and Larry Sugiyama. I am proud to call myself an anthropologist 
and am appreciative of the Miami University faculty and staff who taught me so 
much: Jeb Card, Kathy Erbaugh, Cameron Hay-Rollins, Neringa Klumbyte, Leighton 
Peterson, Mark Peterson, and Scott Suarez. The amazing four years I spent in Oxford also 
resulted in many friendships for which I am so grateful including Andrea Blackburn, 
Alex Cowper, Amanda Friend, Alex Intorcio, Jacob Negrey, Jordan Martin, Rob 
O’Malley, and Ashley Skolits. My experiences in the Division of Biological 
Anthropology at the University of Cambridge were fundamental to my undergraduate 
training and created lifelong memories. I thank Bill McGrew, Lia Betti, Jake Dunn, 
Leslie Knapp, Frank Marlowe, Alex Piel, Fiona Stewart, and Peter Walsh. I am also 
very grateful for Andrea Blackburn, Julianne Joswiak, Tina Lasisi, Dan Schofield, 
and many others for their friendship and helping make Cambridge feel like home. I 
feel so lucky to have an incredible network of colleagues beyond UO who make my 
life so joyful. These include the PEGL crew, especially Hazel Byrne, Tina Lasisi, Liz 
Tapanes, and Andrew Zamora, as well as Alexander Baxter, Melanie Beasley, Joel Brown, 
Morgan Chaney, Ashley Edes, Kelsey Ellis, Drew Enigk, Stephanie Fox, Brett Frye, 
Katie Gerstner, Luke Larter, Kaedan O’Brien, Ian Takaoka, Monica Wakefield, and 
Shasta Webb. Also thank you to Alan Rogers for his incredible mentorship and 
support. Many of these friendships stem from the various professional societies and 
committees of which I have been fortunate to be a part, including the ASP Student 
Committee, the UO Graduate Student Association, and the UO Committee on Courses. I 
am very thankful for mentorship and support over the years from Tori Byington. 
Many of the ideas presented here stem from my wonderful time in the field. I thank 
our amazing field staff: Abdulay, Augustin, Beken, Bellevie, Christian, Dipon, 
Gedeon, Isaac, Mathieu, Papa Siri, and Teddy, as well as Alfred Simba, Hugues 
Akpona, Jef Dupain, Christelle Ilanga, Moïse Amisi Luenga, the African Wildlife 
Foundation, the Institut Congolais pour la Conservation de la Nature, and many 
others. This research was generously funded by National Geographic and the UO 
Global Studies Institute. The work presented in this dissertation would not be possible 
without the assistance of Mark Allen, Mike Coleman, and Rob Yelle at UO RACS as 
well as Talapas itself and the Utah CHPC. I thank my incredible work family in the 
TRiO department at Lane Community College including Lynn, Gwen, Jane, Rose, 
Shijo, Giulia, Alex, Bailey, Dru, Dustin, Aimee, and countless students for their 
friendship and support. Thank you also to my work family at the UO COVID 
Monitoring and Assessment Program including Hannah, Jaimyn, Katelyn, Josh, 
Matthew, Sam, Tanner, Katie, Clara, and Shuhao. All of you helped make this 
difficult year so much more bearable. I am privileged to have long-lasting friendships 
and am especially grateful for Megan Jackson, Katie Tela, and many others. Finally, I 
thank my amazing family who has supported my every endeavor from the very 
beginning.  
 
TABLE OF CONTENTS 
Chapter                    Page   
    
I. INTRODUCTION ...................................................................................................... 1 
Brief Overview of Bonobos and Chimpanzees .......................................................... 1 
Genetic and Genomic Perspectives on Pan Evolutionary History .......................... 15 
Project Overview ..................................................................................................... 22 
II. ADAPTATION DURING DIVERGENCE IN BONOBOS (PAN PANISCUS) 
AND CHIMPANZEES (PAN TROGLODYTES) ........................................................ 25 
Introduction .............................................................................................................. 25 
Methods.................................................................................................................... 30 
Results ...................................................................................................................... 36 
Discussion ................................................................................................................ 43 
III. SOFT SWEEPS PREDOMINATE RECENT POSITIVE SELECTION IN 
BONOBOS (PAN PANISCUS) AND CHIMPANZEES (PAN TROGLODYTES) ..... 52 
Introduction .............................................................................................................. 52 
Methods.................................................................................................................... 57 
Results ...................................................................................................................... 66 
Discussion ................................................................................................................ 70 
IV. ESTIMATION OF PAN DEMOGRAPHY FROM SITE PATTERNS ................ 80 
Introduction .............................................................................................................. 80 
Methods.................................................................................................................... 84 
Results ...................................................................................................................... 92 
Discussion ................................................................................................................ 96 
V. CONCLUSION ..................................................................................................... 100 
REFERENCES CITED .............................................................................................. 104 
  
 
LIST OF FIGURES 
 
Figure                    Page 
 
1. Unique and shared candidate genes for positive selection ....................................... 40 
2. Unique and shared hard selective sweep windows .................................................. 69 
3. Unique and shared soft selective sweep windows ................................................... 69 
4. Demographic model and introgression events considered ....................................... 89 
5. Observed site patterns .............................................................................................. 93 
6. Parameter estimate bias ............................................................................................ 95  
 
LIST OF TABLES 
 
Table                    Page 
 
1. Number of MK candidate genes under selection and tested .................................... 38 
2. Candidate genes under positive selection ................................................................ 39 
3. Candidate exons under positive selection ................................................................ 42 
4. Candidate genes under positive selection via SnIPRE ............................................ 42 
5. Number and proportion of sweep, linked, and neutral windows ............................. 68 
6. Demography parameter estimates ............................................................................ 94  
 
CHAPTER I 
INTRODUCTION 
  
Brief Overview of Bonobos and Chimpanzees 
 The genus Pan consists of two species: bonobos and chimpanzees. 
Additionally, there are four subspecies of chimpanzee: central (Pan troglodytes 
troglodytes), eastern (P. t. schweinfurthii), Nigeria-Cameroon (P. t. ellioti, formerly P. 
t. vellerosus), and western (P. t. verus). These four subspecies are supported by both 
morphological (Pilbrow 2006) and genetic evidence although Groves (2005) has 
argued for a fifth chimpanzee subspecies (P. t. marungensis) based on skull 
morphology. Such an additional subspecies would include chimpanzees living in the 
southern portion of the P. t. schweinfurthii range. No subspecies for bonobos have been 
formally recognized, although variation in craniodental morphology has been 
described (Pilbrow and Groves 2013) and there is some genetic evidence for 
population structure (Kawamoto et al. 2013, but see Eriksson et al. 2004). All 
currently recognized lineages are allopatric and all are presently separated by rivers, 
except for P. t. ellioti and P. t. verus, which are separated by the Dahomey Gap. 
Bonobo and chimpanzee populations are declining, and both 
species are currently listed as endangered by the IUCN (Fruth et al. 2016; Humle et 
al. 2016). The largest recognized threats include habitat loss, disease transmission, 
and hunting (Fruth et al. 2016; Humle et al. 2016). 
 Scientists have recognized chimpanzees as a species for centuries. The 
scientific name for chimpanzees, Pan troglodytes, was coined by Johann Friedrich 
Blumenbach in a work variably cited as Blumenbach 1775, Blumenbach 1779, and 
Blumenbach 1799. Bonobos were first classified as a subspecies of chimpanzee by 
Ernst Schwarz based on a skin and skull collected near Befale, DRC in 1927 
(Schwarz 1929). He coined the new subspecies, Pan satyrus paniscus, believing it to 
be a dwarf of the right bank apes (chimpanzees), Pan satyrus satyrus (Schwarz 1929). 
Indeed, the subspecific name “paniscus” translates to “a little Pan”, in reference to the 
Greek god, reflecting the perceived size difference between bonobos and 
chimpanzees. Two specimens were sent to the American Museum of Natural History 
in December 1930 (Thompson 2001) and in 1933 American anatomist Harold 
Coolidge elevated the taxon to species status, Pan paniscus, based on his analysis of 
an adult female (Coolidge 1933). Coolidge had, in fact, compared this specimen to a 
central chimpanzee (Thompson 2001), which is the largest subspecies of chimpanzee 
(Smith and Jungers 1997) (see below). Later, Jungers and Susman (1984) would 
describe Coolidge’s specimen as the smallest adult bonobo they had ever encountered. 
This belief in a species size difference resulted in the use of the common names 
“dwarf chimpanzee” and “pygmy chimpanzee”, which were routinely used until the 
1980s and 1990s. 
 The uniqueness of bonobos and the differences between bonobos and 
chimpanzees were described before the taxonomic difference was officially 
designated. Anton Portielje, a Dutch naturalist, wondered if a popular ape housed at 
the Amsterdam Zoo, named Mafuca, was a new species of ape (de Waal and Lanting 
1997). Years later, the individual’s stuffed remains would be recognized as a bonobo. 
In August 1923, Robert Yerkes purchased two young apes from a dealer in New York 
(Susman 1984). He named the male “Prince Chim” and the female “Panzee” and 
would later reflect on the differences between the two in his book Almost Human 
(Yerkes 1925). Photographs of Prince Chim and Panzee would later confirm that they 
were a bonobo and chimpanzee, respectively. Despite early speculation of species’ 
differences, it would be decades until the first systematic study comparing the two 
was carried out.  
 In the aftermath of World War II, Tratz and Heck (1954) published their 
findings collected before the war at the Hellabrunn Zoo in Munich. This publication 
also represents the first use of the term “bonobo” although this term was proposed as 
a new genus distinct from chimpanzees. This study identified eight differences 
between bonobos and chimpanzees and included comparisons of these taxa beyond 
morphology. Not long after this study, the first scientific studies of wild chimpanzees 
began in Tanzania. In 1960 Jane Goodall began research at Gombe Stream Reserve 
and Toshisada Nishida began research ~ 200 km south of Gombe at Mahale 
Mountains National Park (Goodall 1986; Nishida 2011). Fieldwork on bonobos 
followed just over a decade later. Takayoshi Kano surveyed bonobos across the 
Democratic Republic of Congo in 1973 establishing two sites: Wamba and Yalosidi 
(Kano 1992). That same year Noel and Alison Badrian established a field site at 
Lomako (Susman 1984). The late 1970s saw the start of research on wild western 
chimpanzees at Bossou, Guinea (Matsuzawa and Humle 2011) and Taï, Ivory Coast 
(Boesch and Boesch-Achermann 2000). In the following decades, many other long-
term and more recent research sites have been established for chimpanzees, totaling > 
43 to date (van Leeuwen et al. 2020). Yet, our understanding of chimpanzees is 
heavily biased toward eastern and western chimpanzees, with only a few sites at 
which central chimpanzees are studied and no long-term data are available for 
Nigeria-Cameroon chimpanzees. Conflict in the 1990s and 2000s in central Africa 
impacted studies of wild bonobos for over a decade (Furuichi et al. 2012; Waller and 
White 2016). In the time since the conflict’s end, research has resumed at Wamba and 
intermittently at Lomako. Additionally, new sites have been established or a long-
term research presence has resumed: Iyondji (Sakamaki et al. 2012), 
Lonoa/Kokolopori (Surbeck, Coxe, et al. 2017), LuiKotale (Hohmann and Fruth 
2003a), and Lake Tumba/Malebo (Inogwabini et al. 2007; Serckx et al. 2014). While 
some have argued that bonobos should no longer be considered “scientifically 
endangered” (Hare and Yamamoto 2015), there likely remains a gulf between our 
understanding of bonobos and that of chimpanzees. 
 Central to understanding the evolution of Pan is assessing similarities and 
differences in their behavior and biology. There are several reviews and volumes that 
explicitly cover such similarities and differences (Stanford 1998; Boesch et al. 2001; 
Stumpf 2011; Gruber and Clay 2016) and other volumes that focus on either bonobos 
or chimpanzees. Recapitulating an exhaustive review is beyond the scope of this 
introduction. Therefore, I describe the more relevant similarities and differences in the 
paragraphs below, focusing on morphology, ecology, social behavior, and 
reproduction.  
 The common names for the bonobo clearly reflect that earlier analyses were 
performed on relatively small adult individuals and the view that bonobos are 
specialized dwarf chimpanzees (Johnson et al. 1981) or exhibit paedomorphic 
characteristics (Shea 1983). Such species dimorphism is not reflected in mean body 
mass, where bonobos and chimpanzees overlap: male weight averages 42.7 to 45 kg 
in bonobos and 42.7 to 59.7 kg in chimpanzees, and female weight averages 33.2 to 
34.3 kg and 33.7 to 45.8 kg, respectively (Smith and Jungers 1997; Zihlman and 
Bolter 2015). The larger range of body mass in chimpanzees highlights the 
considerable variation across the subspecies in which both eastern and western 
chimpanzee females and males are much smaller than central chimpanzees (Smith and 
Jungers 1997). Such data are lacking for Nigeria-Cameroon chimpanzees although it 
appears males weigh < 70 kg (Hof and Sommer 2010).  
 Multiple post-cranial skeletal differences have been described. These include 
differences in clavicles, scapulae, and pelves as well as the humerus/femur and 
femoral head/length ratios (Zihlman and Cramer 1978). However, long bone lengths 
and talar breadths are similar between species (Zihlman and Cramer 1978). More 
recently, Turley and Frost (2014) reported that the appositional articular morphology 
of the talo-crural joint in adult bonobos differs from that of chimpanzees: bonobos 
are more similar to highly arboreal hylobatids, whereas chimpanzees exhibit a more 
terrestrial pattern. This morphology is consistent with locomotor patterns for these 
taxa. While females across all African apes exhibit a higher degree of terrestriality, 
bonobos are more arboreal than chimpanzees or gorillas (Doran 1993).  
 Other aspects of morphology are more difficult to assess. For example, there 
remains considerable debate regarding whether or not the bonobo cranium is 
paedomorphic relative to chimpanzees. While some have maintained that 
heterochrony explains skull shape differences (Shea 1983), other analyses have 
yielded partial support (Lieberman et al. 2007) or rejected this hypothesis altogether 
(Mitteroecker et al. 2005; Simons and Frost 2020).  
 In addition to osteological differences, bonobos and chimpanzees differ in 
their overall appearance. Bonobos have hair that parts down the middle of the head 
whereas chimpanzees do not (Stumpf 2011). The hair around bonobos’ cheeks is also quite 
long (Kano 1992). The lips of bonobos are depigmented and pink and they are born 
with dark faces whereas most chimpanzees are born with lighter faces (Kano 1992; 
Stumpf 2011). While both bonobos and chimpanzees are born with a white tail tuft 
only bonobos maintain this trait into adulthood (Kano 1992; Stumpf 2011). In bonobo 
females, the vulva is more anterior than in chimpanzees, which may be related to 
female-female sexual behavior (see below) (de Waal and Lanting 1997).  
 Both species live in multi-male/multi-female groups known as communities 
and exhibit a fission-fusion social structure (Goodall 1986; Kano 1992). Community 
sizes vary greatly across both species, averaging between 20 and 40 individuals in 
most bonobo communities (Stumpf 2011) and ranging from ~ 20 at Bossou 
(Sugiyama and Fujita 2011) to > 200 individuals at Ngogo, Uganda (Sandel and 
Watts 2021) in chimpanzees. While there is considerable intraspecific variation and 
complications with how to quantify party size, bonobo parties appear to be larger, on 
average, than those of chimpanzees (Furuichi 2009). Unlike many other primates, 
apes do not exhibit female residence, and both bonobos and chimpanzees are typically 
male philopatric, with females emigrating upon sexual maturation (Goodall 1986; 
Kano 1992). There are some exceptions: only ~ 50% of females at Gombe 
emigrate (Pusey et al. 1997) and both males and females are thought to emigrate at 
Bossou (Sugiyama 1999). Additionally, male transfer has been documented in the 
Eyengo community of bonobos at Lomako (Hohmann 2001). Immigration itself 
appears to be different between bonobos and chimpanzees. Among chimpanzees, 
newly immigrant females form bonds with males as resident females are largely 
intolerant of new females (Kahlenberg, Thompson, et al. 2008; Kahlenberg, Emery 
Thompson, et al. 2008). This starkly contrasts with bonobos, where new females form 
bonds with resident females (Furuichi 1989; Idani 1991).   
 Some of the most striking differences between the two Pan species are those 
related to intracommunity and intercommunity social relationships. Chimpanzees 
exhibit a primate-typical pattern of male dominance (Goodall 1986; Boesch and 
Boesch-Achermann 2000). Additionally, the strongest relationships in chimpanzee 
communities are between adult males. Given that both Pan species are male 
philopatric, this observation is consistent with kin selection theory (Hamilton 1964). 
Despite this, affiliation between kin among males is complex. For example, Nishida 
(1983) noted that the dynamic nature of alliances between chimpanzee males 
contradicts the prediction from kin selection theory. Further, male chimpanzees at 
Ngogo, Uganda preferentially associate with maternal kin rather than paternal kin 
(Langergraber et al. 2007; Mitani 2009).  
 In contrast, bonobos are not male dominant and the strongest bonds do not 
occur between adult males. While some authors have described bonobos as female-
dominant (Parish 1994; Parish 1996; Parish et al. 2000), others have described their 
dominance patterns as more equivocal (Furuichi 1989; Kano 1992; White 1996) 
because while the highest ranking individual in bonobo communities can be female, 
not all adult females outrank all adult males. It is possible that captive conditions 
facilitate full female dominance in bonobos as those studies that describe female 
dominance were of zoo-housed individuals. However, other zoo-housed bonobo 
groups do not show full female dominance (Paoli et al. 2006; Stevens et al. 2010; 
Brand and Marchant 2019). Another important perspective is the consideration of 
female feeding priority rather than only social dominance. For example, adult and 
subadult males at Lomako were socially dominant to females; however, females had 
feeding priority (White and Wood 2007). Unlike chimpanzees, the strongest bonds in 
bonobos occur among females and between females and males (White 1988; White 
and Burgman 1990; Kano 1992; Parish 1996). Female-female bonds are dynamic: 
grooming relationships are stable over time, while preferences for proximity and for 
genito-genital (GG) rubbing, a female-female sexual behavior, are more flexible 
(Moscovice et al. 2017). This behavioral coordination may function to enable 
cooperation with a wider range of individuals (Moscovice et al. 2017).  
 Aggression is common in primates and other animals; however, lethal 
aggression is rare. When lethal aggression occurs, it is most often in the form of male 
infanticide although female infanticide does occur in primates, including chimpanzees 
(Townsend et al. 2007; Pusey et al. 2008). Lethal aggression directed towards non-
infants does occur in a number of primate species in addition to chimpanzees 
including capuchins (Scarry and Tujague 2012), spider monkeys (Campbell 2006; 
Valero et al. 2006), muriqui (Talebi et al. 2009), and orangutans (Marzec et al. 2016). 
Intracommunity lethal aggression has been documented in multiple eastern 
chimpanzee communities (Fawcett and Muhumuza 2000; Watts 2004; Kaburu et al. 
2013; Sandel and Watts 2021), at least one central chimpanzee community (Boesch et 
al. 2007) and one western chimpanzee community (Pruetz et al. 2017). This behavior 
may function to reduce mating competition (Watts 2004). Lethal intercommunity 
aggression also occurs (Wilson and Wrangham 2003) and may be explained by the 
imbalance of power hypothesis (Wrangham 1999). In contrast, neither lethal 
aggression nor infanticide has ever been directly observed in bonobos (Stanford 
1998; Wrangham 1999), although one potential case has been reported (Hohmann and 
Fruth 2011, but see White 2012). Lethal aggression in Pan appears to be adaptive and 
is not the result of anthropogenic effects (M.L. Wilson et al. 2014). While the 
presence/absence of lethal aggression (and infanticide) differentiates bonobos and 
chimpanzees, there is considerable variation in the rate of this behavior not only 
across subspecies but within subspecies as well (M.L. Wilson et al. 2014). The 
absence of lethal aggression and infanticide in bonobos has been suggested to stem 
from the frequency of copulation and an alleged extended period of receptivity (de 
Waal and Lanting 1997) or the high status of females and initiative in various social 
behaviors (Furuichi 2011).  
 Bonobos and chimpanzees exhibit such notable differences in sexual behavior 
that some of these were described as early as the 1950s (Tratz and Heck 1954). The 
females of both species exhibit a sexual swelling (Stumpf 2011). Chimpanzees have a 
sexual cycle that averages 35 days; estrus lasts 10-15 days, and ovulation occurs near 
the end of maximal swelling (Stumpf and Boesch 2005). Bonobos have a slightly 
longer sexual cycle (~ 40 days) than chimpanzees and ovulation does not always 
occur during maximum tumescence (Reichert et al. 2002). Both species copulate 
dorsoventrally; however, ventroventral mating is common in bonobos (Thompson-
Handler et al. 1984; Kano 1992; de Waal and Lanting 1997). Bonobos are also well 
known for socio-sexual behavior, such that sex may serve non-reproductive functions 
(Hashimoto 1997; de Waal and Lanting 1997). Males may engage in rump-rubbing 
(Kano 1992), whereas GG rubbing, in which two individuals rub their sexual swellings 
together, is common among females (Thompson-Handler et al. 1984; Kano 1992; 
Hohmann and Fruth 2000). Chimpanzees sometimes engage in socio-sexual behavior 
and GG rubbing; however, it is much less common than what is observed in bonobos 
(Anestis 2004; Sandel and Reddy 2021).  
 Sexual strategies may also differ between species. Tutin (1979) described four 
types of mating among chimpanzees: opportunistic, consortship, possessive, and 
extragroup. Both consortship and possessive mating seem rare in bonobos (Kano 
1992). While dominance rank is often invoked as a predictor of mating success, there 
is mixed evidence in primates (Fedigan 1983). This holds true in chimpanzees. Such a 
relationship is sometimes absent (Tutin 1979; Boesch and Boesch-Achermann 2000) 
or present (Goodall 1986). High rank in chimpanzees can also result in higher 
reproductive success (Vigilant et al. 2001). Dominance is also positively related to 
both mating and reproductive success in bonobos (Gerloff et al. 1999). One notable 
species difference lies in the relationship between dominant males and females such 
that high ranking male bonobos tend to be the sons of high ranking females (Kano 
1992; Surbeck et al. 2011). This mother-son relationship only affects reproductive 
success in bonobos and not chimpanzees (Surbeck et al. 2019). Finally, it is worth 
noting that aggression in the context of mating in both bonobos and chimpanzees may 
constrain female choice. Among eastern chimpanzees, sexual coercion is a well 
described male reproductive strategy (Muller et al. 2007; Muller et al. 2011; Feldblum 
et al. 2014) and can result in higher reproductive success (Feldblum et al. 2014). Male 
aggression also occurs in bonobo mating (Kano 1992; Surbeck et al. 2011), of which 
some events are coercive (White and Wood 2007).  
 Despite the complexity of differences between bonobos and chimpanzees that 
are further complicated by intraspecific variation in both species, multiple models 
have been suggested to explain species differences. Below, I describe some of these 
models and consider the evidence for and against each model.  
 One early ecological model proposed to explain differences in female sociality 
between bonobos and chimpanzees has been coined the terrestrial herbaceous 
vegetation or THV hypothesis (Wrangham 1986). THV is both ubiquitous and non-
seasonal in both bonobo and chimpanzee habitat (Yamakoshi 2004). However, 
chimpanzees can occur sympatrically with gorillas whereas bonobos do not. Thus, the 
THV hypothesis posits that the feeding competition female chimpanzees experience 
from gorillas, who consume large amounts of THV, prevents the formation of larger 
parties. As bonobos are not subjected to such competition, THV is used to 
compensate during periods of fruit scarcity and maintain larger parties (Wrangham 
1986). In this paper, Wrangham (1986) reports THV represents 7% of the monthly 
food intake for chimpanzees at Gombe and 33% of monthly food intake for bonobos 
at Wamba. However, the difference is less pronounced when considering other Pan 
sites. For example, the Lomako bonobos spend 2% of total feeding time consuming 
THV (White 1992). Yamakoshi (2004) notes that while there are no data from 
chimpanzees sympatric with gorillas, chimpanzees at sites outside the range of 
gorillas consume THV that ranges from 3% at Taï (Boesch 1996) to 17% at Kibale 
(Wrangham et al. 1996). However, averaging across months may obscure seasonal 
patterns, which are a prediction of the model. Chimpanzees at Kahuzi-Biega, Lopé, 
and Ndoki are sympatric with gorillas and fibrous content in their feces is greater 
during non-fruiting periods (Tutin et al. 1991; Kuroda et al. 1996; Basabose 2002). At 
non-gorilla sites, there is mixed evidence for seasonality. Fibrous content in Kibale 
chimpanzee feces was higher during fruit scarcity (Wrangham et al. 1991) but party 
size also decreased during these periods (Wrangham et al. 1992). At Bossou, party 
size is stable regardless of fruit scarcity and THV is consumed consistently 
(Yamakoshi 1998). Furuichi et al. (2001) also reported that THV consumption was 
unrelated to fruit scarcity at Kalinzu. Among the Lomako bonobos, THV 
consumption is unrelated to fruit scarcity and does not exhibit a seasonal pattern 
(Malenky and Wrangham 1994; White 1998). Rates of THV consumption could be 
driven simply by its density. Malenky et al. (1994) compared THV density at Kibale, 
Lomako, and Ndoki. While there was a significant difference between Kibale and 
Lomako, neither of the other pairs of sites differed significantly in THV 
density. A further complication lies in possible nutritional differences across sites. 
THV at Kibale was reported to have lower protein compared to Lomako, suggesting 
THV may act as a fruit substitute in the Kibale chimpanzees, which is consistent with 
its increased consumption during periods of low fruit availability (Malenky and 
Wrangham 1994). Collectively, these results do not support the THV hypothesis. 
Indeed, Wrangham et al. (1996) proposed a revised hypothesis in which they divide 
THV into low and high quality, L-THV and H-THV respectively. These authors argue 
that H-THV is protein-rich, has relatively high nutritional value, is more preferred 
than “typical fig fruits”, and occurs at low density prompting consumption upon 
encounter. H-THV is said to occur at Kahuzi, Lomako, and Wamba, while L-THV occurs 
at Kibale (Wrangham et al. 1996). As such, the occurrence of H-THV at Kahuzi 
should result in increased gregariousness among those female chimpanzees compared 
to Kibale; however, there is presently no evidence for such a difference. In addition to 
the issues with this revised hypothesis, there are few data presently available to 
evaluate its predictions (Yamakoshi 2004).  
 Another potential model for Pan differences considers the observations of 
female coalitionary behavior reported in both captive and wild bonobos (Kano 1992; 
White and Wood 2007; Furuichi 2011; Tokuyama and Furuichi 2016). While female 
coalitions are generally thought to function in the context of female-female 
competition (Sterck et al. 1997), they may also deter males from aggressing against 
females due to the threat of female coalitionary counteraggression. However, female 
coalitions have also been reported in chimpanzees at Budongo and Taï (Boesch and 
Boesch-Achermann 2000). Thus, Tokuyama and Furuichi (2016) suggest that this 
may be a shared Pan trait. However, the context of coalitionary behavior may be 
important. Given that male deference is evident in feeding contexts, female 
coalitionary action may function to gain feeding priority through male deference 
(White and Wood 2007). 
 Recently, similarity between some bonobo phenotypes and those of 
domesticated mammals, largely canids, prompted the introduction of the self-
domestication hypothesis (SDH) (Wrangham and Pilbeam 2001; Hare et al. 2012). 
This model invokes sexual selection theory and argues that female bonobos selected 
for less aggressive males, producing the bonobo phenotype. Despite the simplicity of 
the argument, the model is complex and contains a multitude of predictions for 
various phenotypes including behavior, morphology, and psychology. While some 
bonobo morphological traits, such as depigmented lips and white tail tufts (Stumpf 
2011), support the SDH, other morphological predictions are less well supported. The 
most recent studies of bonobo crania find that they are, at best, only partially 
paedomorphic or not paedomorphic at all (Mitteroecker et al. 2005; Lieberman et al. 
2007; Simons and Frost 2020). This prediction from the model also supposes dogs are 
paedomorphic wolves, which is not supported by three-dimensional geometric 
morphometric analyses of dog and wolf crania (Drake 2011). Additionally, few 
morphological characteristics are even shared across domesticated mammals, aside 
from canids (Sánchez-Villagra et al. 2016).    
 As with morphology, the behavioral and psychological data from Pan offer 
mixed support for the SDH. Bonobos may exhibit some delay in psychological 
development (Wobber et al. 2010). Yet, there is conflicting evidence based on 
behavioral experiments for the core of the SDH: tolerance. In one set of experiments, 
bonobos were more tolerant and more cooperative when food was monopolizable 
(Hare et al. 2007). Bonobos have also been observed to share food, even with 
unfamiliar conspecifics (Hare and Kwetuenda 2010; Tan and Hare 2013). Yet, one 
attempt to replicate Hare and Kwetuenda’s finding in bonobos found no evidence of 
this behavior (Bullinger et al. 2013). These authors speculate that this discrepancy 
may be related to the rearing of each study’s subjects as Hare and Kwetuenda (2010) 
studied sanctuary-housed bonobos whereas Bullinger et al. (2013) studied zoo-housed 
bonobos. Bonobos have also been described as equally or less tolerant than 
chimpanzees (Jaeggi et al. 2010). In one study, chimpanzees not only shared more 
frequently but also more actively (Jaeggi et al. 2010). A follow-up to this study 
highlighted how bonobos received more aggression and were less successful at 
acquiring food from conspecifics than chimpanzees (Jaeggi et al. 2013). Further, 
social tolerance, as measured by the proportion of a group at a resource (an artificial 
termite mound) or inside a resource zone (scattered food), was found to be lower in 
bonobos when compared to chimpanzees (Cronin et al. 2015) and both chimpanzees 
and gorillas (Boose et al. 2013). 
 There are notable species differences with respect to play such that adult 
bonobos play more frequently than adult chimpanzees and rough play is common in 
bonobos, which may reflect higher tolerance (Palagi 2006). Bonobo socio-sexual 
behavior lends support to the SDH; however, the assertion that bonobo males compete 
less intensely than chimpanzee males is not immediately obvious given the high reproductive 
skew in bonobos (Surbeck, Langergraber, et al. 2017; Ishizuka et al. 2018), 
aggression in mating contexts (White and Wood 2007; Surbeck et al. 2011), and mate 
defense behavior during intergroup encounters or IGEs (Tokuyama et al. 2019). 
 Further evaluation of these models requires considerable behavioral, 
ecological, morphological, and physiological data that are both cross-sectional and 
longitudinal. Further insight could be gained from data on fossil panins, as such data are widely 
used for many other taxonomic groups. However, to date, the Pan fossil record is 
limited to two central incisors and a first molar, recovered from the Kapthurin 
Formation, Kenya (McBrearty and Jablonski 2005). A second molar was reported but 
is not discussed in detail. These fossils are likely from the same individual and are 
estimated to be near 545 ka in age (McBrearty and Jablonski 2005). Overlap in the 
dental variation in bonobos and chimpanzees does not permit any insight on whether 
this individual was more bonobo- or chimpanzee-like. However, these fossils do 
highlight that Pan lived beyond its current range around 545 ka.  
 In the absence of such a fossil record, we must turn to genetic and genomic 
data to gain additional insight on the evolutionary history of the genus Pan. The 
section below provides a review of the previous research on this topic.  
 
Genetic and Genomic Perspectives on Pan Evolutionary History 
 The phylogenetic proximity of Pan to humans meant that these taxa were the 
focus of some of the earliest studies on non-human primate genetics. The first draft of 
the chimpanzee genome became available in 2005 (The Chimpanzee Sequencing and 
Analysis Consortium 2005) and the bonobo genome in 2012 (Prüfer et al. 2012). 
Following the publication of these genomes, additional genomes were sequenced 
across all great ape species as part of the Great Ape Genome Project (GAGP) (Prado-
Martinez et al. 2013). To date, this remains the largest genomic dataset for non-human 
hominids. While some additional data have been subsequently generated, these data 
form the foundation from which the majority of our understanding of great ape 
genomes stems, including this dissertation. In the following paragraphs, I briefly 
review this body of literature focusing solely on Pan.   
 The first analysis for admixture using whole genome sequences calculated D 
statistics from two western, seven eastern, and seven central chimpanzees and three 
bonobos and found no evidence of interspecies gene flow (Prüfer et al. 2012). 
However, de Manuel et al. (2016) used a larger sample size from the GAGP and 
found evidence of gene flow within chimpanzee lineages and at least two episodes of 
introgression from bonobos into central chimpanzees. Further evidence of these 
events comes from an analysis examining the potential adaptiveness of the putatively 
introgressed regions (Nye et al. 2018). More recently, introgression from an extinct 
Pan species into bonobos was reported (Kuhlwilm et al. 2019). 
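 As a rough illustration of the D statistic used in these admixture tests, the sketch below counts ABBA and BABA site patterns for a four-taxon arrangement (((P1, P2), P3), outgroup) and computes D = (ABBA - BABA) / (ABBA + BABA). The function name, the 0/1 allele coding, and the toy sites are assumptions made for illustration; this is not the pipeline used by Prüfer et al. (2012) or de Manuel et al. (2016), and published analyses typically work from allele frequencies and assess significance with a block jackknife.

```python
def d_statistic(sites):
    """Patterson's D from per-site alleles ordered (P1, P2, P3, outgroup).

    Alleles are coded 0 = ancestral and 1 = derived (hypothetical coding).
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if out != 0:
            continue  # assumption: the outgroup defines the ancestral state
        if (p1, p2, p3) == (0, 1, 1):
            abba += 1  # P2 and P3 share the derived allele
        elif (p1, p2, p3) == (1, 0, 1):
            baba += 1  # P1 and P3 share the derived allele
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / total, abba, baba


# Toy data only: three ABBA sites, one BABA site, two uninformative sites.
toy_sites = [
    (0, 1, 1, 0), (0, 1, 1, 0), (0, 1, 1, 0),
    (1, 0, 1, 0),
    (1, 1, 0, 0), (0, 0, 1, 0),
]
d, abba, baba = d_statistic(toy_sites)
print(d, abba, baba)
```

 With no gene flow, incomplete lineage sorting is expected to produce ABBA and BABA sites in equal numbers, so D should be near zero; a consistent excess of one pattern is what these studies interpret as evidence of gene flow.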
 Central to understanding the evolution of bonobos and chimpanzees is 
assessing the potential nature of mutations that have occurred in these lineages over 
time following their divergence. This range of possible effects is captured by the 
distribution of fitness effects or DFE (Eyre-Walker and Keightley 2007). Two 
important parameters of the DFE are the shape parameter (b) and the mean selection 
coefficient for deleterious mutations (Sd). One recent analysis of great ape genomes, 
including bonobos and chimpanzees, found that the model with a shared b across all 
species and a lineage-specific Sd fit the genomic data better than other models 
(Castellano et al. 2019). This suggests a strong effect of effective population size, or 
Ne, on purifying selection, which is consistent with nearly neutral theory (Ohta 1992). 
Another analysis found that lineages with the smallest historical Ne had low levels of 
genetic diversity, larger numbers of deleterious homozygous alleles, and an increased 
proportion of deleterious variants at low frequency (Han et al. 2019). However, the 
efficacy of purifying selection may be less constrained by Ne for more deleterious 
variants. Analysis of loss-of-function variants indicated that the number of variants 
was related to Ne, but this number was more similar across lineages with different Ne 
for variants that had drastic phenotypic effects (de Valles-Ibáñez et al. 2016). 
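 The link between the DFE and Ne can be sketched numerically. The example below assumes, for illustration only, that deleterious selection coefficients follow a gamma distribution with shape b and mean Sd, and asks what share of new mutations falls below the conventional nearly neutral threshold Ne * s < 1 at two population sizes. The parameter values are hypothetical and are not estimates from Castellano et al. (2019) or any other study cited here.

```python
import random


def sample_deleterious_s(shape_b, mean_sd, n, seed=0):
    """Draw n selection coefficients (magnitudes) from a gamma DFE.

    A gamma distribution with shape b and mean Sd has scale Sd / b.
    """
    rng = random.Random(seed)
    scale = mean_sd / shape_b
    return [rng.gammavariate(shape_b, scale) for _ in range(n)]


def fraction_effectively_neutral(s_values, ne):
    """Share of mutations with Ne * s < 1, where drift outweighs selection."""
    return sum(1 for s in s_values if ne * s < 1.0) / len(s_values)


# Hypothetical values, chosen only to show the direction of the effect.
s_values = sample_deleterious_s(shape_b=0.2, mean_sd=0.01, n=20_000)
small_ne = fraction_effectively_neutral(s_values, ne=10_000)
large_ne = fraction_effectively_neutral(s_values, ne=50_000)
print(small_ne, large_ne)
```

 Under these assumptions, the same set of mutations contains a larger effectively neutral fraction in the smaller population, which is the sense in which purifying selection is less efficient at low Ne.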
 Genomic data from bonobos, chimpanzees, gorillas, and orangutans reveal a 
more complex relationship between Ne and adaptation in these taxa. Both the 
proportion of nonsynonymous substitutions and the ratio of adaptive to neutral 
divergence were positively correlated with long-term Ne (Cagan et al. 2016). Assuming 
that the targets for most selective sweeps are near or in genes, Nam et al. (2017) found 
that the relative amount of genetic diversity in great apes was more reduced in species 
with higher Ne. Simulations suggested that background selection alone could not 
explain this pattern. This reduction in diversity could be explained by either stronger 
sweeps or a higher frequency of selective sweeps in larger populations. The authors of 
this study suggest the latter, which is consistent with the theoretical prediction that 
larger populations wait “less” than smaller populations for beneficial mutations to 
occur, per a hard sweep model in which a beneficial allele arises de novo and rapidly 
sweeps to fixation (Maynard Smith and Haigh 1974). A recent application of machine 
learning to the GAGP data yielded partial support for this perspective. Nye et al. 
(2020) used a random forest algorithm and employed 15 different statistics to detect 
selective sweeps including a demographic model from Schmidt et al. (2019). While 
central chimpanzees had the most selective sweeps, the highest genomic proportion of 
putative sweeps, and the highest total number of genes, this linear relationship was not upheld for 
the remaining three chimpanzee subspecies with smaller Ne. Further, Castellano et al. 
(2019) did not find a relationship between the proportion of beneficial alleles and Ne 
based on zerofold nonsynonymous and fourfold synonymous sites, although bonobos 
had a notably high number of beneficial mutations, despite their low estimated 
Ne. 
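 The waiting-time argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a diploid population with 2 * Ne gene copies, a per-copy beneficial mutation rate, and the classic approximation that a new beneficial mutation with selective advantage s escapes stochastic loss with probability of about 2s. The function name and all parameter values are hypothetical and are not taken from Nam et al. (2017).

```python
def expected_wait_generations(n_e, mu_beneficial, s):
    """Expected generations until a beneficial mutation arises and escapes loss.

    Assumes 2 * n_e gene copies, a per-copy beneficial mutation rate
    mu_beneficial, and an establishment probability of roughly 2 * s.
    """
    rate_per_generation = 2 * n_e * mu_beneficial * 2 * s
    return 1.0 / rate_per_generation


# Hypothetical values: only the ratio between the two results matters here.
wait_small = expected_wait_generations(n_e=10_000, mu_beneficial=1e-8, s=0.01)
wait_large = expected_wait_generations(n_e=50_000, mu_beneficial=1e-8, s=0.01)
print(wait_small, wait_large)
```

 Under these assumptions the expected wait scales inversely with Ne, so a fivefold larger population waits one fifth as long for the next hard sweep to begin.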
 Studies of Pan genomics are useful for testing hypotheses and predictions from 
population genetic theory, and these results inform our understanding of evolutionary 
processes in these lineages. Of equal interest is the identification of genomic regions 
that exhibit various selection signatures that may inform the genomic underpinnings 
of phenotypes involved in lineage divergence.  
 Considerable attention has focused on host evolutionary responses to disease 
in apes, especially bonobos and chimpanzees. The ongoing COVID-19 pandemic 
highlights the need for understanding zoonoses for human health as well as 
anthroponoses given the endangered status of Pan and other primates, some of whom 
are susceptible to COVID-19 (Melin et al. 2020; Melin et al. 2021). One example of an 
infection that may have shaped Pan immune systems is simian immunodeficiency 
virus (SIV). SIV has been known to occur in chimpanzees (SIVcpz) for over two 
decades (Gao et al. 1999) although it is curiously absent from bonobos (Inogwabini 
2020). Indeed, HIV-1 is partially the result of a zoonotic event from an SIV-infected 
chimpanzee (Sharp and Hahn 2011). Until recently, SIVcpz was thought to be non-
pathogenic in chimpanzees although this is no longer the case (Keele et al. 2009; 
Etienne et al. 2011; Terio et al. 2011). Genomic data point to several regions that may 
reflect the potential selective pressure. There is evidence for at least one selective 
sweep near the major histocompatibility complex in chimpanzees (de Groot et al. 
2010; Prüfer et al. 2012). Cagan et al. (2016) used Fay and Wu’s H statistic (Fay and 
Wu 2000) to identify IDO2 as a candidate for recent positive selection in all four 
chimpanzee subspecies and bonobos. McDonald-Kreitman tests (McDonald and 
Kreitman 1991) were used by Cagan et al. (2016) and revealed HIVEP1 as a positive 
selection candidate in bonobos and eastern chimpanzees. While SIV is absent from 
bonobos, a number of potentially zoonotic diseases were recently reported in this 
species (Medkour et al. 2021). HKA tests (Hudson et al. 1987) also identified genes 
related to the activation of the innate immune system (GO category: complement 
activation) to be significantly enriched in bonobos (Cagan et al. 2016). Schmidt et al. 
(2019) used a modification of population-branch statistics to examine recent 
adaptation in central and eastern chimpanzees. These authors did not find evidence for 
enrichment in immune genes in central chimpanzees. In eastern chimpanzees, however, 
multiple immune-related GO categories as well as genes in three different sets of 
viral-interacting proteins were significantly enriched. This signature was so strong that the 
removal of genes in these categories greatly reduced the selection signature (Schmidt et al. 2019). 
 Beyond the immune system, other Pan phenotypes show evidence of 
selection. Kovalaskas et al. (2020) reported two candidate SNPs subject to recent 
positive selection near AMY2A using XP-EHH, a test that detects recent selection. 
AMY2A codes for the production of pancreatic amylase and the authors suggest their 
findings offer support that bonobos are adapted to the consumption of starchy 
resources compared to chimpanzees (Kovalaskas et al. 2020). 
 Positive selection may have also shaped phenotypes related to the SDH. 
Kovalaskas et al. (2020) report a strong signal near DIO2. This gene provides the 
brain with triiodothyronine (T3). Interestingly, bonobos exhibit higher levels of 
circulating T3 than chimpanzees and humans (Behringer et al. 
2014). Both SOX5 and SOX14 were identified as under recent positive selection 
(Kovalaskas et al. 2020). SOX5 organizes the production of cartilage cells (Lefebvre 
et al. 2001) and is involved in nervous system development, which may have 
consequences on skeletal morphology, particularly in the cranium. SOX14 has been 
associated with nervous system development and several disorders that impact the 
face (Arsic et al. 1998).  
 Genes that underlie social behavior may also have been the targets of adaptation. 
Kovalaskas et al. (2020) identified variants in CD38, DRD1, OT, and OXTR as well as 
AVPR1A in bonobos. These genes are well characterized in modulating social 
behavior and genetic variation in AVPR1A has been previously linked to sociality in 
Pan. Staes et al. (2014) originally reported no polymorphism for a regulatory element 
(RS3) of AVPR1A; however, both bonobos and chimpanzees appear to be 
polymorphic for this locus and this variation has been linked to differences in 
personality (Anestis et al. 2014; Staes et al. 2015; Staes et al. 2016). Yet, Staes et al. 
(2015) did not find an association between OXTR variation and sociality in 
chimpanzees. Finally, humans and bonobos share a single amino-acid change in 
TAAR8, which encodes a G protein-coupled receptor that may provide social cues 
(Prüfer et al. 2012). 
 Positive selection may have also shaped Pan brains. Cagan et al. (2016) 
described an enrichment for genes under recent positive selection in the GO 
categories “dendrite” and “neuron spine” in central chimpanzees. NRXN3 exhibited a 
strong signature of recent positive selection using Fay and Wu’s H statistic in central, 
eastern, and Nigeria-Cameroon chimpanzees. This gene is largely expressed in the 
brain and related to synaptic plasticity and transmission. This test also detected 
CSMD1, a gene of unknown function that is highly expressed in the nervous system, 
in bonobos, eastern chimpanzees, and Nigeria-Cameroon chimpanzees.  
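 For readers unfamiliar with Fay and Wu's H, the statistic contrasts two estimators of diversity computed from derived-allele counts: H = theta_pi - theta_H, where theta_H gives extra weight to derived alleles at high frequency. The sketch below is a minimal version for a single window; the function name and the toy counts are assumptions for illustration and do not reproduce the analysis of Cagan et al. (2016).

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H = theta_pi - theta_H for one genomic window.

    derived_counts: for each segregating site, the number of sampled
    chromosomes carrying the derived allele (between 1 and n - 1).
    n: number of sampled chromosomes.
    """
    if any(k < 1 or k > n - 1 for k in derived_counts):
        raise ValueError("derived counts must lie between 1 and n - 1")
    pairs = n * (n - 1)
    theta_pi = sum(2 * k * (n - k) for k in derived_counts) / pairs
    theta_h = sum(2 * k * k for k in derived_counts) / pairs
    return theta_pi - theta_h


# Toy windows (hypothetical): mostly rare vs mostly common derived alleles.
h_low = fay_wu_h([1, 1, 2, 1, 3], n=10)
h_high = fay_wu_h([9, 9, 8, 9, 7], n=10)
print(h_low, h_high)
```

 A strongly negative H reflects an excess of high-frequency derived alleles, the pattern expected near a site that has recently hitchhiked with a positively selected variant.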
 Signals of balancing selection appear to be shared to a higher degree than 
adaptive signals in great apes, including Pan (Cagan et al. 2016). It is not surprising 
that many immunity-related genes were found to be under balancing selection in 
bonobos and chimpanzees (and the other apes) (Cagan et al. 2016). This study also 
found evidence for enrichment in genes involved in keratinocyte differentiation 
(LCE3D, LCE3E, SCEL, SPRR2B, SPRR2G) in western chimpanzees and cornified 
envelope development in central, Nigeria-Cameroon, and western chimpanzees. 
CDSN was also identified as a putative balancing selection candidate in bonobos and 
western chimpanzees. Cagan et al. (2016) note that balancing selection on these genes 
may enable low levels of pathogen penetrance into a host potentially resulting in 
immunity to such pathogens.  
 Other immunity-related candidates for balancing selection have been 
identified. Cheng and DeGiorgio (2020) developed a suite of statistics, called B 
statistics, and applied these to both humans and bonobos. MHC-DQ and MHC-DP 
were identified as well as KLRD1, which encodes a cell-surface antigen, and GPNMB, 
which encodes osteoactivin (a transmembrane glycoprotein found on several cells) 
(Cheng and DeGiorgio 2020). Balancing selection may act on innate immune genes as 
an intergenic region between BPIFA2 and BPIFB4 exhibited a strong selection signal. 
Balancing selection may also act on non-immunity related phenotypes. Cheng and 
DeGiorgio (2020) describe potential selection on genes related to pain and 
neurodevelopment including EPHA6, HPCAL1, SCN9A, and SUSD2. This study also 
noted that such a signal may arise because of conflicting functions, which may 
explain the observed signatures in CAMK4, GPNMB, and PDE1A.   
 The studies above primarily focus on allelic variation; however, many other 
changes and/or interactions can occur that impact bonobo and chimpanzee 
phenotypes. Inversions can play an important role in disease but they are notoriously 
difficult to characterize. Porubsky et al. (2020) recently identified novel simple 
inversions and inverted duplications in the great apes, including bonobos and 
chimpanzees, which may contribute to differences in Pan phenotypes. Soto et al. 
(2020) described new structural variation in chimpanzee genomes, including variants 
in 56 genes that may underlie chimpanzee phenotypes. A recent high-quality bonobo 
genome assembly also revealed novel structural variants (Mao et al. 2021). This study 
identified gene family expansion in EIF4A3, a translation initiation factor subunit, 
that began ~2.9 Ma and resulted in six and five copies in bonobos and chimpanzees, 
respectively. These authors also described 15,786 bonobo-specific insertions and 
7,082 deletions. These deletions are enriched in membrane-associated genes with 
extracellular domains and two structural variants ablate LYPD8 and SAMD9 (Mao et 
al. 2021). As with allelic variation, structural variation may be related to Ne. Sudmant 
et al. (2013) found that western chimpanzees and bonobos (and Sumatran orangutans) 
exhibited an excess of segregating duplications > 30 kb. Further, western 
chimpanzees also exhibited an excess of segregating deletions > 30 kb. As these 
populations are estimated to have experienced recent bottlenecks, it appears that Ne 
may affect the extent of structural variation in great apes and other species. While the 
project described in this dissertation does not focus on structural variation, this area is 
a key future avenue for understanding the genomic architecture of Pan phenotypes.   
 
Project Overview  
  This dissertation uses genomic data on all five Pan lineages to answer 
questions about their evolutionary history following divergence, specifically related to 
adaptation and demography. The second chapter of this dissertation, which includes 
unpublished but co-authored material with Frances White and Timothy Webster, 
focuses on signatures of positive selection that reflect adaptation in deeper time using 
two approaches. We find that most genes with sufficient statistical power to evaluate 
for selection have been subject to purifying selection. Candidates for positive 
selection are largely unique to each lineage and include genes related to the brain, 
immune system, musculature, reproduction, and the skeletal system. We did not find 
evidence of a shared pattern among chimpanzee lineages except for one gene, which 
may reflect the deep divergence and variation within the species. 
 The third chapter of this dissertation, which includes co-authored unpublished 
material with Frances White, Nelson Ting, and Timothy Webster, considers some of 
the most recent evolutionary processes in Pan evolution. We use supervised machine 
learning to identify genomic regions that are evolving neutrally, are linked to selective 
sweeps, or are subject to a recent hard or soft sweep. In the four lineages we could 
analyze, we find that soft sweeps are overwhelmingly more common than hard sweeps, 
although most of the genome is linked to these sweeps or is evolving neutrally. Most 
sweep windows are unique to each lineage although there are some shared windows, 
particularly for soft sweeps and especially between central and eastern chimpanzees. 
We find evidence of enrichment for genes related to the nervous system in central 
chimpanzees and identify candidates that may drive phenotypic differences in these 
taxa.  
 The fourth chapter of this dissertation addresses the evolutionary history of 
these lineages and includes unpublished but co-authored material with Frances White, 
Alan Rogers, and Timothy Webster. This topic has been the subject of many analyses 
resulting in increased agreement on some demographic parameter estimates, 
whereas others remain less well known. Some currently used demographic methods 
produce biased parameters. We build and compare various demographic models by 
analyzing the site patterns of derived alleles and find that a model simpler than those 
previously proposed best fits the data. This model includes an episode of 
introgression from bonobos into central chimpanzees and also points to a deeper 
divergence in the chimpanzee common ancestor than formerly estimated.   
 These results not only shed light on different facets of Pan evolutionary 
history at various points following speciation, they also offer important insight on 
evolutionary processes broadly and, more specifically, processes that occurred 
in western and central Africa during a critical time period for the 
evolution of other species in this region, including humans.  
 
CHAPTER II 
ADAPTATION DURING DIVERGENCE IN BONOBOS (PAN PANISCUS) AND 
CHIMPANZEES (PAN TROGLODYTES) 
 
 Frances White, Timothy Webster, and I conceived of this analysis. The 
assembly and mapping of the genomic data in this analysis were conducted by 
Timothy Webster and he provided some code for the preparation of the SnIPRE 
analysis. I performed the other data analyses and wrote the initial draft of the 
manuscript. Frances White, Timothy Webster, and I edited the manuscript.  
 
Introduction 
 Genomic data can provide an important window into the evolutionary past of a 
population, particularly when paleontological and archaeological data are lacking. 
Considerable emphasis has been placed on positive selection and the identification of 
adaptive traits that may differentiate a lineage from others. However, methods for 
detecting positive selection are dependent on genetic variation, which can be impacted 
by a population’s demographic history (Nielsen 2001; Przeworski 2002). Mitigating 
such effects requires either models robust to demography or the specification of a 
demographic model, when such information is known or can be inferred. Further, 
different metrics are informative for specific timescales such that some selection tests 
are better suited for more recent events, whereas others speak to the distant past 
(Weigand and Leese 2018). In particular, two tests are especially useful for detecting 
older signatures of selection: Hudson-Kreitman-Aguadé (HKA) tests (Hudson et al. 
1987) and McDonald-Kreitman (MK) tests (McDonald and Kreitman 1991). 
However, HKA tests can result in false positives under certain migration rates 
(Nielsen 2001). 
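 For reference, the MK test named above reduces each gene to a 2 x 2 table: nonsynonymous and synonymous counts for polymorphic sites within a lineage (Pn, Ps) and for fixed differences from an outgroup (Dn, Ds). The sketch below is an illustration with made-up counts. It uses Fisher's exact test as one common way to test the table and also reports the neutrality index, NI = (Pn / Ps) / (Dn / Ds). The function names are hypothetical, and this is not the SnIPRE analysis used in this chapter.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


def mk_test(pn, ps, dn, ds):
    """Return (neutrality index, p-value) for one gene's MK table."""
    if ps == 0 or dn == 0 or ds == 0:
        raise ValueError("neutrality index undefined when Ps, Dn, or Ds is 0")
    ni = (pn / ps) / (dn / ds)
    return ni, fisher_exact_two_sided(dn, ds, pn, ps)


# Made-up counts for a single gene, for illustration only.
ni, p = mk_test(pn=2, ps=40, dn=10, ds=15)
print(ni, p)
```

 A neutrality index below 1 reflects an excess of nonsynonymous divergence, whereas a value above 1 reflects an excess of nonsynonymous polymorphism.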
 MK tests, on the other hand, are more robust to certain aspects of demography 
because the rates of polymorphism and divergence for two site categories are 
compared within a gene in a single lineage and an outgroup is used to determine 
divergence. These rates are compared because neutral theory predicts that the ratio of 
polymorphic to divergent sites should be equal for both synonymous and 
nonsynonymous sites (McDonald and Kreitman 1991). A significant result from this test 
does not offer any information about the type of selection, only that a neutral model 
can be rejected (Nielsen 2001). Yet, a significant difference can arise in a population 
of constant size and under an additive model of selection in one of two ways. An 
excess of divergent nonsynonymous mutations would suggest that different amino 
acids are being selected for, and thus positive selection, whereas fewer 
than expected divergent nonsynonymous mutations indicate that the locus is under 
negative or purifying selection, which removes mutations that would alter the resulting 
amino acids. This also assumes that mutations are strongly deleterious. However, if 
weakly deleterious mutations are present at a locus, they are unlikely to become fixed 
and can inflate the number of polymorphic sites, reducing the power to detect positive 
selection. This issue may be ameliorated by excluding rare alleles (i.e., biallelic sites 
whose minor allele frequency (MAF) is < 0.1). The other major consideration for 
MK tests is change in effective population size, hereafter referred to as Ne 
(McDonald and Kreitman 1991; Eyre-Walker 2002). When Ne increases, a larger 
number of mutations shift from nearly neutral to deleterious, thus increasing the 
constraint on a gene and decreasing the effectively neutral mutation rate (Wright and 
Andolfatto 2008). Therefore, sufficient differences in Ne between the time period that 
is captured by polymorphisms vs the time period reflected in substitutions may result 
in different effectively neutral mutation rates. For example, slightly deleterious alleles 
may have been fixed during a population’s divergence but may not affect 
polymorphisms following a population size increase, resulting in false signatures of 
adaptation (McDonald and Kreitman 1991). This suggests that, given particular 
population histories, MK tests may not be able to distinguish between positive 
selection and reduced constraint during divergence. Awareness of such caveats is key 
to applying this critical test and its extensions to detect positive selection.  
 The genus Pan provides an intriguing model for understanding positive 
selection at deeper time scales. The two extant species, bonobos (Pan paniscus) and 
chimpanzees (P. troglodytes), diverged ~ 1.88 Ma (de Manuel et al. 2016) and 
chimpanzees subsequently split into four subspecies (Stumpf 2011). While both 
species exhibit a considerable and often overlooked number of similarities, 
phenotypic differences, particularly evidenced in behavior, are well documented. 
Despite sharing a male philopatric fission-fusion social structure with similar 
community sizes, bonobos and chimpanzees, on the whole, differ in patterns of power, 
adult sex-based bondedness, and gregariousness, such that adult male chimpanzees 
exhibit the strongest bonds with other males and are typically aggressively dominant 
to females (Goodall 1986; Wrangham 1986; Boesch and Boesch-Achermann 2000; 
Mitani 2009; Nishida 2011), whereas relationships among females and between males 
and females are strongest in bonobos, females can hold high ranking positions, and 
aggression is less intense and less frequent than in chimpanzees (Kano 1992; White 
1996; White and Wood 2007; Furuichi 2011; Tokuyama and Furuichi 2016; 
Moscovice et al. 2017). Additionally, some chimpanzees engage in lethal aggression 
both within and between communities (Watts 2004; Kaburu et al. 2013; M.L. Wilson 
et al. 2014). While lethal aggression is not a defining characteristic of chimpanzees 
because it is so variable and has not been seen in all communities, the behavior has 
never been observed in bonobos, with only one potential suspected case (Hohmann 
and Fruth 2011, but see White 2012). These patterns appear unlinked to 
anthropogenic influence and lethal aggression (or lack thereof) may be adaptive (M.L. 
Wilson et al. 2014). Indeed, the typical nature of intergroup encounters (IGEs) 
appears fundamentally different between the two Pan species (Kano 1992; Boesch et 
al. 2008; Mitani et al. 2010; Furuichi 2011; Fruth and Hohmann 2018; Sakamaki et al. 
2018; Lucchesi et al. 2020).   
 A number of hypotheses have been proposed to explain these differences, 
either socio-ecological or behavioral. Socio-ecological hypotheses point to differences 
in available terrestrial herbaceous vegetation (THV) that would allow grouping by reducing 
competition (Wrangham 1986; Wrangham et al. 1996) or qualitative and/or 
quantitative differences in food patches that would actively select for female 
cooperation (White 1986). Behavioral hypotheses focus on such factors as the 
importance of mothers on male reproductive success (e.g., Kano 1992), the role of 
tension regulation in social contexts (de Waal 1989), and the impact of female 
coalitions (Tokuyama and Furuichi 2016). Additionally, sexual selection may drive 
phenotypic differences in Pan as suggested by the self-domestication hypothesis that 
posits female bonobos selected for less aggressive males resulting in some phenotypes 
that are similar to some domesticated mammals (Wrangham and Pilbeam 2001; Hare 
et al. 2012). Females may specifically select for less aggressive males to reduce 
infanticide rather than monopolize resources, the “baby dominance hypothesis” 
(Walker and Hare 2017).  
 Although testing of these hypotheses typically requires considerable 
behavioral and ecological data, a new, complementary approach uses genomic data to 
identify signatures of adaptation that address various explicit predictions of these 
hypotheses. These include, for example, looking at the potential impact of the thyroid 
on morphology and behavior, examining digestive enzymes to test hypotheses on the 
importance of different foods, and considering the proximate mechanisms for species 
differences in reproduction-related traits. Variation in the ontogenetic patterns of 
circulating thyroid hormone (triiodothyronine or T3) has been suggested as an 
explanation for differences in bonobos and chimpanzees (Verena Behringer et al. 
2014). Recently, a single nucleotide polymorphism, or SNP, near DIO2, a gene whose 
product catalyzes the conversion of thyroxine (T4) to T3, has been reported to exhibit a 
signature of positive selection in bonobos (Kovalaskas et al. 2020). This study also 
described an adaptive signature near the AMY2 locus in bonobos. As AMY2 codes for 
pancreatic amylase, the authors interpreted this result as support for the THV 
hypothesis. Embedded in several of the hypotheses above are differences in female 
reproduction between bonobos and chimpanzees (Stumpf 2011). Han et al. (2019) 
previously noted enrichment for bonobo-specific nonsynonymous changes at loci 
associated with menarche in humans. Thus, we predicted that genes related to 
reproduction would exhibit signatures of positive selection.  
 In addition to testing these and other candidate genes, genome-wide selection 
scans can also shed light on previously underappreciated unique or shared phenotypes 
between extant lineages (i.e., “reverse ecology” (Li et al. 2008)), that have not yet 
been built into a hypothesis. 
 The present study investigates adaptation in Pan that occurred in the distant 
past, closer to the speciation of the extant members of this genus, using MK tests. We 
build on a previous analysis (Cagan et al. 2016) using reassembled data and a 
different and improved chimpanzee reference genome, addressing contamination 
issues, and including all five, rather than four, Pan lineages. We also apply a Bayesian 
implementation of a generalized linear mixed model to identify putative candidates 
for positive selection that leverages genome-wide averages to increase statistical 
power. 
 
Methods 
Genomic Data 
 We retrieved raw short read data on bonobos and all four chimpanzee 
subspecies from the Great Ape Genome Project (GAGP) (Prado-Martinez et al. 2013). 
This dataset contained high coverage genomes 
(https://github.com/brandcm/Dissertation: File S0: Figures S1, S2) from 13 bonobos 
(P. paniscus), 18 central chimpanzees (P. troglodytes troglodytes), 19 eastern 
chimpanzees (P. t. schweinfurthii), 10 Nigerian chimpanzees (P. t. ellioti), and 11 
western chimpanzees (P. t. verus). See https://github.com/brandcm/Dissertation: Files 
S0 and S1 for more information on these samples. MK tests require an outgroup to 
determine whether substitutions are unique or shared. We retrieved short read data on 
a high-coverage human female, HG00513, collected as part of the 1000 Genomes 
Project (Auton et al. 2015) to use as the outgroup sequence (Biosample ID: 
SAME123526).  
 
Read Mapping and Variant Calling 
 Initial quality assessments in fastqc (Andrews 2010) and multiqc (Ewels et al. 
2016) indicated a number of quality issues, including failed runs, problematic tiles, 
and substantial variation in base quality. We removed adapters and trimmed all reads 
with BBduk (https://sourceforge.net/projects/bbmap/). For trimming, we used the 
parameters “ktrim=r k=21 mink=11 hdist=2 qtrim=rl trimq=15 minlen=50 maq=20” 
for all reads and added “tpo and tpe” for paired reads.  
 We used XYalign (Webster et al. 2019) to create versions of the chimpanzee 
reference genome, panTro6 (Kronenberg et al. 2018), for male- and female-specific 
mapping. Specifically, the version of the reference for female mapping has the Y 
chromosome completely masked, as its presence can lead to mismapping (Webster et 
al. 2019). We then mapped reads with BWA MEM (Li 2013) and used SAMtools (Li 
et al. 2009) to fix mate pairs, sort BAM files, merge BAM files per individual, and 
index BAM files. We used Picard (Broad Institute 2018) to mark duplicates with 
default parameters, before calculating BAM statistics with SAMtools. We next 
measured depth of coverage with mosdepth (Pedersen and Quinlan 2018), removing 
duplicates and reads with a mapping quality less than 30 for calculations. 
 We used GATK4 (Poplin et al. 2018) for joint variant calling across all 
samples. We used default settings for all steps—HaplotypeCaller, CombineGVCFs, 
and GenotypeGVCFs—with three exceptions. First, we turned off physical phasing 
for computational efficiency and downstream VCF compatibility with filtering tools. 
Second, because multiple samples in this dataset suffer from contamination from 
other samples both within and across taxa (Prado-Martinez et al. 2013), we employed 
a contamination filter to randomly remove 10% of reads during variant calling. This 
should have the effect of reducing confidence in contaminant alleles. Finally, we 
output non-variant sites to allow equivalent filtering of all sites in the genome and 
more accurate assessments of callability. 
 The above quality control, assembly, and variant calling steps are all contained 
in an automated Snakemake (Köster and Rahmann 2012) pipeline available on GitHub 
(https://github.com/thw17/Pan_reassembly). The repository also contains a Conda 
environment with all software versions and origins, most of which are available 
through Bioconda (Grüning et al. 2018). 
 
Variant Filtration  
 We considered only autosomes for this analysis as the X and Y chromosomes 
violate many of the assumptions for the following methods (Webster and Wilson 
Sayres 2016). We also excluded unlocalized scaffolds (N = 4), unplaced contigs (N = 
4,316), and the mitochondrial genome from any downstream analyses. Additional 
filtration steps were completed using bcftools (Li 2011) and command line inputs are 
provided in parentheses. MK tests rely on accurate assessments of whether a SNP is 
synonymous or nonsynonymous. We first normalized variants by joining biallelic 
sites and merging indels and SNPs into a single record (“norm -m +any”) using the 
panTro6 FASTA. Next, we filtered to retain only coding sequence (“-R 
CDS_autosomes.bed”) as designated by the panTro6 GFF (retrieved from: 
https://www.ncbi.nlm.nih.gov/genome/202?genome_assembly_id=380228). Further, 
we only included single nucleotide polymorphisms (SNPs) (“-v snps”) that were 
biallelic (“-m2 -M2”). On a per sample basis within each site, we marked genotypes 
where sample read depth was less than 10 and/or genotype quality was less than 30 as 
uncalled (“-S . -i FMT/DP ≥ 10 && FMT/GQ ≥ 30”). To ensure that missing data did 
not bias our results, we further excluded any sites where fewer than ~ 80% of 
individuals (N = 56) were confidently genotyped (“AN ≥ 112”). We also removed any 
positions that were monomorphic for either the reference or alternate allele (“AC > 0 
&& AC ≠ AN”). While lack of or low coverage at a locus is problematic, loci with 
excessive coverage are also of concern. These sites may yield false heterozygotes that 
are usually the result of copy number variation or paralogous sequences (Li 2014). As 
our data exhibit a high degree of inter-individual and inter-chromosomal variation in 
mean coverage (Brand et al. 2021), we applied Li’s (2014) recommendation for a 
maximum depth filter (d + 4√d) to the mean chromosomal coverage of the individual 
in our sample (Pan or Homo) with the highest coverage and excluded any loci that 
exceeded this value (“filter -e FMT/DP > d + 4√d”) 
(https://github.com/brandcm/Dissertation: File S2). These filtration steps 
yielded 291,782 SNPs for our downstream analyses 
(https://github.com/brandcm/Dissertation: File S0: Table S1). 
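
The site-level filters described above can be sketched in Python as follows. This is a minimal illustration rather than the bcftools commands used in the study: the function names and the example coverage value are hypothetical, while the thresholds (AN ≥ 112, AC > 0 and AC ≠ AN, and the d + 4√d depth ceiling) are taken from the text.

```python
from math import sqrt


def max_depth_threshold(mean_depth):
    """Depth ceiling d + 4*sqrt(d) (Li 2014) for a mean coverage d."""
    return mean_depth + 4 * sqrt(mean_depth)


def site_passes(ac, an, min_an=112):
    """Keep a site if enough alleles were called and it is not monomorphic.

    ac: alternate allele count; an: number of called alleles
    (two per confidently genotyped diploid individual).
    """
    return an >= min_an and 0 < ac < an


# Hypothetical mean coverage of 36x: ceiling is 36 + 4 * 6 reads.
print(max_depth_threshold(36))
# A site with 120 called alleles, 10 of them alternate, is retained.
print(site_passes(ac=10, an=120))
```

Expressing the monomorphic check as `0 < ac < an` covers both cases named in the text: sites fixed for the reference allele (AC = 0) and sites fixed for the alternate allele (AC = AN).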
 
Analysis   
 We built a custom database in snpEff (Cingolani et al. 2012) using only the 
assembled autosomes from the panTro6 FASTA and the panTro6 GFF (“java -jar 
snpEff/snpEff.jar build -gff3 -v chimp”). We included only rows for coding sequences 
(‘CDS’) for assembled autosomes in the GFF and used one transcript per gene (N = 
20,265). In cases where genes had multiple transcripts, we determined the longest 
transcript (based on CDS bp) using a custom R script (R Core Team 2020), 
one_transcript_per_gene_filtered_gff.R, and used that transcript. This database was 
then used to annotate VCFs for each autosome via snpEff. Allele frequencies per SNP 
per Pan population were calculated via VCFtools (Danecek et al. 2011). 
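
The choice of one transcript per gene can be sketched as below. This is a hedged Python illustration of the rule stated above (longest transcript by total CDS bp), not a port of one_transcript_per_gene_filtered_gff.R; the row layout, gene and transcript names, and the tie-breaking rule (first transcript seen) are assumptions.

```python
from collections import defaultdict


def longest_transcript_per_gene(cds_rows):
    """Return {gene: transcript}, choosing the transcript with the most CDS bp.

    cds_rows: iterable of (gene, transcript, start, end) with 1-based,
    inclusive GFF coordinates. Ties keep the transcript seen first.
    """
    cds_bp = defaultdict(int)
    for gene, transcript, start, end in cds_rows:
        cds_bp[(gene, transcript)] += end - start + 1

    best = {}
    for (gene, transcript), length in cds_bp.items():
        if gene not in best or length > cds_bp[(gene, best[gene])]:
            best[gene] = transcript
    return best


# Hypothetical CDS rows: geneA has two transcripts, geneB has one.
rows = [
    ("geneA", "tx1", 100, 199),  # 100 bp
    ("geneA", "tx2", 100, 199),  # 100 bp
    ("geneA", "tx2", 300, 349),  # plus 50 bp, 150 bp in total
    ("geneB", "tx3", 10, 39),    # 30 bp
]
print(longest_transcript_per_gene(rows))
```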
 For each autosome in each Pan lineage, we used a modified R script, 
command_line_mk_script.R, to run MK tests. This script was based on an existing 
script (https://github.com/thomasblankers/popgen/blob/master/MKTtest) and uses the 
stringr, version 1.4.0 (Wickham 2019) and tidyverse, version 1.3.1 (Wickham et al. 
2019) packages. Our script first filtered for SNPs identified by snpEff as either 
synonymous or missense (nonsynonymous) and subsequently categorized each SNP 
as 1) divergent (i.e., fixed for different alleles) and synonymous (Ds), 2) divergent and 
nonsynonymous (Dn), 3) polymorphic and synonymous (Ps), or 4) polymorphic and 
nonsynonymous (Pn) via the Pan allele frequencies calculated in VCFtools above (for 
all four categories: Ds, Dn, Ps, Pn) and using the human sample as the outgroup (for Ds 
and Dn). We summed the number of SNPs per category per gene and then ran Fisher’s 
exact test on the contingency table for each gene using a significance threshold of α = 0.05.  
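
The categorization step can be sketched as follows. This Python illustration is not the R script used in the study; the function names, the simplified effect labels, and the treatment of the outgroup as carrying a single allele (a homozygous genotype) are assumptions made for the example.

```python
from collections import Counter


def classify_site(alt_freq, outgroup_is_alt, effect):
    """Assign one SNP to Dn, Ds, Pn, or Ps, or return None if uninformative.

    alt_freq: alternate allele frequency in the focal lineage (0.0 to 1.0).
    outgroup_is_alt: True if the outgroup carries the alternate allele
        (assumes a homozygous outgroup genotype).
    effect: "synonymous" or "missense".
    """
    suffix = "s" if effect == "synonymous" else "n"
    if 0.0 < alt_freq < 1.0:
        return "P" + suffix          # segregating within the lineage
    lineage_is_alt = alt_freq == 1.0
    if lineage_is_alt != outgroup_is_alt:
        return "D" + suffix          # fixed difference from the outgroup
    return None                      # fixed, but identical to the outgroup


def count_table(sites):
    """Tally Dn, Ds, Pn, Ps for one gene from (alt_freq, outgroup_is_alt, effect) tuples."""
    counts = Counter({"Dn": 0, "Ds": 0, "Pn": 0, "Ps": 0})
    for site in sites:
        label = classify_site(*site)
        if label is not None:
            counts[label] += 1
    return dict(counts)


# Hypothetical sites for a single gene.
sites = [
    (1.0, False, "missense"),    # fixed alternate, outgroup reference: Dn
    (0.0, True, "synonymous"),   # fixed reference, outgroup alternate: Ds
    (0.3, False, "missense"),    # polymorphic: Pn
    (0.5, True, "synonymous"),   # polymorphic: Ps
    (0.0, False, "missense"),    # same allele as the outgroup: ignored
]
print(count_table(sites))
```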
 Our script also calculated the neutrality index (Rand and Kann 1996), 
$NI = \frac{P_n / D_n}{P_s / D_s} = \frac{P_n D_s}{P_s D_n}$, per gene. Values greater than one reflect more polymorphic 
nonsynonymous sites than expected, or an abundance of weakly deleterious alleles, 
whereas values less than one suggest more fixed nonsynonymous mutations than 
expected, i.e., adaptive mutations. This statistic is informative when Ps and Dn are 
nonzero; however, this is not always the case (Stoletzki and Eyre-Walker 2011). NI 
can also be biased when either or both Ps and Dn are small (Stoletzki and Eyre-Walker 
2011). Therefore, we also calculated the direction of selection, DoS, statistic 
(Stoletzki and Eyre-Walker 2011), $DoS = \frac{D_n}{D_n + D_s} - \frac{P_n}{P_n + P_s}$, and used this metric to 
classify genes as subject to either positive (DoS > 0) or purifying (DoS < 0) selection.   
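
As a worked illustration of the two statistics, the following Python sketch computes NI and DoS from the four counts. The function names are hypothetical, and returning None for undefined cases is a choice made for this example; the input counts are those reported for ANKS4B exon 2 in eastern chimpanzees in Table 3.

```python
def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn * Ds) / (Ps * Dn); None when Ps or Dn is zero (undefined)."""
    if ps == 0 or dn == 0:
        return None
    return (pn * ds) / (ps * dn)


def direction_of_selection(dn, ds, pn, ps):
    """DoS = Dn / (Dn + Ds) - Pn / (Pn + Ps); None if either class has no sites."""
    if dn + ds == 0 or pn + ps == 0:
        return None
    return dn / (dn + ds) - pn / (pn + ps)


# ANKS4B exon 2, eastern chimpanzees (Table 3): pN = 1, pS = 4, dN = 7, dS = 1.
print(neutrality_index(dn=7, ds=1, pn=1, ps=4))
print(direction_of_selection(dn=7, ds=1, pn=1, ps=4))
```

For these counts DoS is 7/8 - 1/5, or approximately 0.675, which agrees with the value tabulated for this exon, and NI is 1/28, well below one. Both point toward an excess of fixed nonsynonymous differences.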
 We immediately discarded genes where the contingency table was incomplete 
and Fisher’s test could not be performed (i.e., there were either no fixed SNPs or no 
polymorphic SNPs). We further removed genes (N = 6,892) for which < 50% of the 
coding sequence exhibited adequate coverage across the entire Pan sample (N = 71) 
(Brand et al. 2021) and retained 13,228 genes 
(https://github.com/brandcm/Dissertation: File S3). Fisher’s exact test is 
underpowered when the overall observations 
in a contingency table are low (Begun et al. 2007; Holloway et al. 2007; Andolfatto 
2008; Darolti et al. 2018). We followed Holloway et al. (2007) and excluded genes 
for which the sum of each row and column in the 2 x 2 table was < 5. We designated a 
gene as a candidate for being previously subjected to natural selection when both the 
p-value for Fisher’s exact test was < 0.05 and the sum of each row and column in the 
contingency table was ≥ 5. We further categorized these genes as subject to positive 
selection where DoS was > 0 and purifying selection where DoS was < 0.  
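
The candidate criteria can be sketched as a single decision function. This is a minimal Python illustration, not the R code used in the study: the function names and return labels are hypothetical, and Fisher's exact test is implemented directly from the hypergeometric distribution using the common two-sided convention of summing all tables no more probable than the observed one. Because DoS ranges from -1 to 1, the sketch splits candidates on its sign.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Unnormalised hypergeometric weight of the table whose top-left cell is x.
        return comb(col1, x) * comb(n - col1, row1 - x)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    observed = weight(a)
    # Sum the tables that are no more probable than the observed one.
    extreme = sum(w for w in map(weight, range(lowest, highest + 1)) if w <= observed)
    return extreme / comb(n, row1)


def call_candidate(dn, ds, pn, ps, alpha=0.05, min_margin=5):
    """Label a locus 'positive', 'purifying', 'not significant', or 'untested'."""
    margins = (dn + ds, pn + ps, dn + pn, ds + ps)
    if min(margins) < min_margin:
        return "untested"            # too few observations in a row or column
    if fisher_exact_two_sided(dn, ds, pn, ps) >= alpha:
        return "not significant"
    dos = dn / (dn + ds) - pn / (pn + ps)
    return "positive" if dos > 0 else "purifying"


# FSIP1 exon 2, central chimpanzees (Table 3): pN = 1, pS = 6, dN = 5, dS = 0.
print(fisher_exact_two_sided(5, 0, 1, 6))
print(call_candidate(dn=5, ds=0, pn=1, ps=6))
```

Working with integer weights avoids floating-point ties when deciding which tables count as at least as extreme as the observed one.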
 We repeated the analysis above two times. In our second analysis, we removed 
SNPs whose minor allele frequency (MAF) was < 0.1 in order to assess the effects of 
weakly deleterious mutations on our results. We also considered within gene 
heterogeneity, i.e., differences between exons of the same gene, by constructing 
contingency tables and running the aforementioned analyses per exon rather than per 
gene. To ensure that bias in CDS length did not affect our analyses, we visualized the 
distribution of SNPs/bp for all genes/exons per lineage that passed our initial filter (a 
complete contingency table) and the distribution of SNPs/bp for candidate 
genes/exons under positive selection.  
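
The rare-allele filter used in the second analysis can be sketched as below. This Python illustration assumes that the filter applies only to sites that are polymorphic within a lineage, so that fixed differences are retained; that assumption, the function names, and the example counts are not taken from the study's scripts.

```python
def minor_allele_frequency(ac, an):
    """MAF from an alternate allele count (ac) and the number of called alleles (an)."""
    freq = ac / an
    return min(freq, 1.0 - freq)


def keep_for_maf_analysis(ac, an, min_maf=0.1):
    """Keep fixed sites; drop polymorphic sites whose MAF is below min_maf."""
    if ac == 0 or ac == an:
        return True                  # fixed in the lineage, not a rare polymorphism
    return minor_allele_frequency(ac, an) >= min_maf


# Hypothetical lineage with 26 called alleles (13 diploid individuals).
print(keep_for_maf_analysis(ac=1, an=26))    # singleton, removed
print(keep_for_maf_analysis(ac=13, an=26))   # common polymorphism, kept
```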
 Additionally, we performed another set of selection analyses implemented 
using SnIPRE (Eilertson et al. 2012). This non-parametric approach uses the same 
input data as MK tests (i.e., the contingency table) as well as genome-wide 
information on the average and variance in polymorphism to divergence, thereby 
increasing power. Additionally, if the assumptions of a neutral demographic model 
are met, the resulting parameters can be used to estimate the strength, directionality, 
and timing of selection. We applied the SnIPRE method to our data where the 
contingency table was complete and did not set a row/column filter for the number of 
SNPs. SnIPRE also requires the fraction of time a site can mutate synonymously or 
nonsynonymously. We generated these data per gene using the panTro6 FASTA and a 
custom script, collect_snipre_data.py. We used the Bayesian implementation of 
SnIPRE, using the default MCMC sampling settings by discarding the first 10,000 
values, retaining every fourth value, and running 15,000 iterations per chain 
(“BSnIPRE.run(data, burnin = 10000, thin = 4, iter = 15000)”). 
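
One way to obtain per-gene synonymous and nonsynonymous mutation fractions from a coding sequence is to enumerate every single-base change in every codon. The Python sketch below illustrates that idea under stated assumptions: it uses the standard genetic code, weights all base changes equally, counts changes to stop codons as nonsynonymous, and skips stop codons and codons with ambiguous bases. It is not the collect_snipre_data.py script, which may compute these values differently.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order at each position.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_change_counts(codon):
    """Count synonymous and nonsynonymous single-base changes for one codon."""
    syn = nonsyn = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                syn += 1
            else:
                nonsyn += 1          # changes to stop codons are counted here too
    return syn, nonsyn


def mutation_fractions(cds):
    """Fractions of possible single-base changes in a CDS that are synonymous and nonsynonymous."""
    syn = nonsyn = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3].upper()
        if codon not in CODON_TABLE or CODON_TABLE[codon] == "*":
            continue                 # skip ambiguous bases and stop codons
        s, n = codon_change_counts(codon)
        syn += s
        nonsyn += n
    total = syn + nonsyn
    return (syn / total, nonsyn / total) if total else (0.0, 0.0)


# Hypothetical two-codon sequence: ATG (Met) followed by GGG (Gly).
print(mutation_fractions("ATGGGG"))
```

In this example no change to ATG is synonymous, while the three third-position changes to GGG are, so 3 of the 18 possible changes are synonymous.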
 All scripts used in our analysis can be found at: 
https://github.com/brandcm/Pan_MK. Figures were generated using ggplot2, version 
3.3.3 (Wickham 2016). Additionally, some scripts to build figures used gridExtra, 
version 2.3 (Baptiste 2015). Unique candidate genes for positive selection and those 
shared between two or more lineages were visualized using Upset plots created with 
the ComplexUpset package, version 1.2.1 (Lex et al. 2014; Krassowski 2020). 
 
Data Availability 
 The raw data underlying this article are previously published (Prado-Martinez 
et al. 2013; de Manuel et al. 2016) and are available from the Sequence Read Archive 
(PRJNA189439 and SRP018689) and the European Nucleotide Archive 
(PRJEB15086). 
   
Results 
 The distribution for the number of annotations per SNP ranged from 1 to 5 
(https://github.com/brandcm/Dissertation: File S0: Figure S3), with approximately 
90% having one annotation. Therefore, disagreement between variant effects across 
multiple annotations as determined by snpEff is unlikely to bias these results. We 
found that the number of SNPs per autosome available for use in our MK analysis 
was similar across all five populations after filtering 
(https://github.com/brandcm/Dissertation: File S4). After removing loci whose 
MAF was < 0.1, the remaining number of SNPs was strongly related to estimated Ne 
and partially related to sample size (https://github.com/brandcm/Dissertation: File 
S4). Thus, ~ 20% of SNPs were excluded for central chimpanzees, ~ 10% for eastern 
chimpanzees, and < ~ 6% for the other lineages.  
 
Gene Analysis 
 The results for all assessable genes can be found at 
https://github.com/brandcm/Dissertation: Files S5 and S6. Based on the DoS statistic, 
the majority of significant candidate genes for each population were found to be under 
purifying selection (Table 1). The number of candidate genes for both positive and 
purifying selection was variable across lineages and somewhat mirrored both Ne as 
well as the sample size used in the analysis. As predicted, the number of statistically 
significant genes with a positive selection signature changed when the MAF filter was 
applied. These lists were not only shorter but also included a number of genes not 
previously identified in the analysis without the MAF filter. This was particularly true 
for eastern chimpanzees where 14/23 (60.9%) of the list included different genes as 
well as central chimpanzees for which 11/21 (52.4%) were new. The presence of 
slightly deleterious mutations reduces the power to detect positive selection because 
such mutations will disproportionately affect polymorphic sites. Therefore, we 
combined the list of genes exhibiting a signature of positive selection generated both 
with and without the MAF filter per lineage and consider every gene in these collated 
lists to be a candidate for positive selection. The results of these analyses with and 
without the MAF filter can be found in https://github.com/brandcm/Dissertation: Files 
S5 and S6 and the subset of positive selection candidates are provided in 
https://github.com/brandcm/Dissertation: File S7. 
 
Table 1. Number of candidate genes under positive selection, purifying selection, and 
the number of total genes/exons tested per lineage after all filtration steps. 
 
Analysis                        Type          P. paniscus  P. t. ellioti  P. t. schweinfurthii  P. t. troglodytes  P. t. verus
per gene                        Positive      11           24             31                    41                 5
per gene                        Purifying     55           51             74                    89                 34
per gene                        Total Tested  1399         1679           2433                  3180               906
per gene, without rare alleles  Positive      6            14             23                    21                 4
per gene, without rare alleles  Purifying     14           27             19                    19                 17
per gene, without rare alleles  Total Tested  477          1091           1101                  1170               474
per exon                        Positive      5            5              8                     17                 3
per exon                        Purifying     5            4              15                    18                 9
per exon                        Total Tested  378          482            705                   1028               231
 
 
We found that the distributions of SNPs per bp for candidates under positive 
selection per lineage overlapped the distribution for all assessable genes/exons 
(https://github.com/brandcm/Dissertation: File S0: Figures S4-S6). The candidate 
genes/exons that fell within the right tail of these distributions were almost exclusively 
short sequences (< 1,000 bp). Only one gene, RNF213, exhibited a signature of positive 
selection in all five lineages (Figure 1, https://github.com/brandcm/Dissertation: File 
S8). Contrary to our prediction, we did not find any genes under positive selection 
that were shared by all four chimpanzee lineages but absent in bonobos (Figure 1). Consistent with 
phylogenetic expectations, the largest number of shared selection signals was between eastern and 
central chimpanzees (N = 10) (Figure 1). Further, candidate genes for bonobos were 
unique to their lineage except for one gene, NPAP1, which was also detected in 
eastern chimpanzees (Figure 1). Figures S7 and S8 
(https://github.com/brandcm/Dissertation: File S0) show candidate gene overlap for 
our datasets including and excluding rare polymorphisms, respectively.  
 
Table 2. Candidate genes under positive selection per lineage from the MK analyses. 
Results from the analyses with and without rare polymorphisms (MAF < 0.1) are 
combined here. 
 
Lineage Genes 
P. paniscus ALPK2, C2AH2orf78, CC2D2B, EFCAB8, KCNU1, NPAP1, OR5J2, PIK3C2G, 
RNF213, SCAPER, TSHR, WDR49, ZNF135 
P. t. ellioti  ABCC2, ALMS1, ANKRD30A, CXCR1, DNAH14, DNAH6, FAN1, FARP2, 
HASPIN, HEATR5A, IL1RL2, LOC100608661, LOC100613827, LOC466407, 
LOC739832, LOC741747, MIA3, OR51G1, PARP14, RNF213, SHPRH, SLC17A3, 
SLC26A3, SNTG1, SOHLH2, TDRD15, TGM3, XRN1, ZFAND4 
P. t. schweinfurthii ADAM2, ADGRV1, ANKS4B, BDP1, C7, CAGE1, CCPG1, CEACAM5, CX3CR1, 
DDX60L, DNAH14, DNAH6, DOCK8, EFCAB5, FAM111A, FAN1, FGA, FLT3, 
GCNT2, HAVCR1, HEATR5A, IFI16, JHY, KIAA1257, KIAA2026, LOC100608661, 
LOC107972003, LOC451494, LOC469634, MROH8, MYH2, NPAP1, NXPE2, 
OR6X1, RNF213, SAMD7, SLC17A3, SLC26A3, SLC6A16, TMCO5A, TRIM5, 
UBR2, VWA8, XRN1, ZFAND4 
P. t. troglodytes ABCB5, ADGRV1, ANKS4B, ANP32C, BUB1, CMYA5, COL24A1, CX3CR1, 
DHTKD1, DNAH6, FAM111A, FAM209A, FAM71A, FGA, FLT3, GEMIN4, 
HASPIN, HEATR5A, HERC5, HMCN1, IQCA1L, JHY, LOC100608047, 
LOC107966998, LOC452946, LOC456268, LOC466407, LOC468520, 
LOC469634, LRRC53, M1AP, MROH8, MYH2, NLRP11, PCDHB10, PKDREJ, 
PPP1R15A, RNF213, SCUBE2, SLC17A3, SLC26A3, TANC1, TGM3, TMPRSS2, 
TOGARAM1, TTC6, TTLL6, TULP2, XRN1, ZFAND4, ZGRF1, ZNF480, ZNF518A, 
ZNF649 
P. t. verus HASPIN, KIAA1257, PRAME, RNF213, SHPRH, SLC6A16, ZNF473 
  
The function of some candidate loci (N = 13) in our analysis is unknown 
(genes beginning with “LOC”), in addition to a few other genes that have been 
assigned an identifier but whose function remains unknown or poorly understood. 
However, some interesting patterns emerge for the remaining loci. The 
following counts should be treated with caution: the full function of many genes is 
unknown, so the counts may be underestimates. Conversely, some of these genes are 
only associated with particular phenotypes and the causality is not fully determined. 
 
Figure 1. Upset plot of unique and shared candidate genes for positive selection.  
  
We note that a number of genes associated with the brain and the central 
nervous system were found to be positive selection candidates (N = 18), including 
JHY in eastern and central chimpanzees. Consistent with the hypothesis that disease 
has strongly shaped recent hominid evolution, many candidates (N = 12), particularly 
for eastern and central chimpanzees, are known or are speculated to play a role in 
immune function. We identified a few other functional categories including genes 
related to sensory systems (N = 10), muscle development, function, and maintenance 
(N = 4), the skeletal system (N = 4), and reproduction (N = 4). As predicted, TSHR 
emerged as a candidate for positive selection in bonobos.  
 We found some overlap in genes exhibiting signatures of positive selection at 
the gene-level when compared to a previous MK analysis in four of the same lineages 
(Cagan et al. 2016). This includes 4 genes in P. paniscus, 6 in P. t. ellioti, 10 in P. t. 
schweinfurthii, and 4 in P. t. verus (https://github.com/brandcm/Dissertation: File S9). 
 
Exon Analysis 
 Despite reduced power to detect positive selection at the exon-level, we found 
a number of specific exons (N = 16) across all lineages that exhibited a significantly 
different ratio of polymorphisms to substitutions relative to the entire gene, when 
excluding genes with unknown function and those with only one exon (Table 3) 
(https://github.com/brandcm/Dissertation: Files S10 and S11). Six of the genes 
(FSIP1, KIAA1755, LRRC63, MROH7, NLRP8, VPS13D) were only discovered using 
this exon-level analysis as they were not significant when the entire gene was 
considered, both with and without rare alleles. 
 
SnIPRE Analysis 
 The SnIPRE approach yielded a largely new set of candidate genes for 
positive selection as compared to the MK tests (Table 4) 
(https://github.com/brandcm/Dissertation: Files S12-16). The majority of the genes 
are unknown or poorly characterized. One gene, MUC17, was detected in all five Pan 
lineages. This analysis did yield a shared gene among chimpanzees, RFPL4B, and a 
gene shared by all chimpanzees except for P. t. verus, MS4A3.  
 
  
Table 3. Exons with a statistically significant signature of positive selection excluding 
genes whose function is unknown and genes with only one exon. ppn = P. paniscus, 
pte = P. t. ellioti, pts = P. t. schweinfurthii, ptt = P. t. troglodytes, ptv = P. t. verus. 
 
Gene      Exon     Chromosome  Lineage  pN  pS  dN  dS  p-value     DoS
ANKS4B    exon 2   chr16       pts      1   4   7   1   0.03185703  0.675
BDP1      exon 24  chr5        ptt      13  7   14  0   0.02621228  0.35
CMYA5     exon 12  chr5        ptt      49  34  43  12  0.02650057  0.19145674
FAM111A   exon 2   chr11       pts      2   5   14  0   0.00103199  0.71428571
FAM111A   exon 2   chr11       ptt      5   6   13  0   0.00343249  0.54545455
FGA       exon 2   chr4        pts      2   4   24  1   0.00224235  0.62666667
FGA       exon 2   chr4        ptt      3   5   24  1   0.00128931  0.585
FSIP1     exon 2   chr15       ptt      1   6   5   0   0.01515152  0.85714286
KIAA1755  exon 12  chr20       ptt      1   4   7   1   0.03185703  0.675
LRRC63    exon 8   chr13       pts      2   5   5   0   0.02777778  0.71428571
MROH7     exon 1   chr1        pts      1   5   7   2   0.04055944  0.61111111
NLRP8     exon 3   chr19       ptt      6   12  8   2   0.0460717   0.46666667
PPP1R15A  exon 1   chr19       ptt      3   4   11  1   0.0379257   0.48809524
SLC6A16   exon 11  chr19       ptv      1   4   8   1   0.02297702  0.68888889
TSHR      exon 10  chr14       ppn      0   5   6   4   0.04395604  0.6
VPS13D    exon 18  chr1        pts      1   8   5   3   0.04977376  0.51388889
VPS13D    exon 18  chr1        ptv      0   6   5   3   0.03096903  0.625
ZNF135    exon 4   chr19       ppn      0   5   5   2   0.02777778  0.71428571
ZNF480    exon 4   chr19       ptt      0   6   5   2   0.02097902  0.71428571
 
 
 
Table 4. Candidate genes under positive selection per lineage from the SnIPRE 
analyses. 
 
Lineage Genes 
P. paniscus LOC104001091, LOC107972003, MUC17 
P. t. ellioti  C2AH2orf16, C6H6orf201, C9H9orf131, CD244, FGA, LOC101058953, 
LOC104001091, LOC107967308, LOC107969623, LOC107970484, 
LOC107972003, LOC470467, LOC739951, MS4A3, MUC17, NBPF7, RFPL4B 
P. t. schweinfurthii C6H6orf201, LOC100608661, LOC101058953, LOC104001091, LOC107970484, 
LOC107972003, LOC739951, MS4A3, MUC17, RFPL4B 
P. t. troglodytes C6H6orf201, FAM208B, LOC101057029, LOC104001091, LOC107972003, 
LOC470467, MS4A3, MUC17, RFPL4B, TXNDC2 
P. t. verus C2AH2orf16, LOC104001091, LOC107970484, LOC107972003, MUC17, RFPL4B 
 
 
Discussion 
 The limited number of methods appropriate for detecting adaptation at deeper 
time scales underscores the importance of their appropriate application to 
understanding the evolutionary history of a lineage. Here, we confirm and expand 
upon genomic insights into the past of the genus Pan. Using the direction of selection 
statistic, we found that the majority of candidates appear subjected to purifying 
selection. Curiously, the proportion of SNPs removed when excluding rare alleles, as a means to 
correct for slightly deleterious alleles, was related to sample size and estimated Ne. As these variables are 
correlated in this dataset, additional samples are needed to tease apart whether this 
result could be driven by sample size, Ne, or another unknown factor.     
 Only one gene, RNF213, emerged as a candidate for positive selection across 
all lineages based on the traditional MK test. This gene is relatively large (CDS = 
15,771 bp) and encodes a protein with a RING finger motif that functions as an E3 ubiquitin ligase (Wu et 
al. 2012). The locus has been associated with Moyamoya disease, an uncommon 
cerebrovascular progressive disease that results in the narrowing and blockage of 
blood vessels (Kamada et al. 2011; Liu et al. 2011; Wu et al. 2012). The phenotypic 
effects of this gene may partially explain some of the morphological and 
physiological differences in the brain and resulting cognitive differences between 
Homo and Pan. This shared gene is striking given the estimated divergence for the 
genus. Thus, convergence in bonobos and chimpanzees may be more plausible as an 
explanation for a shared adaptive signature at this locus.   
 As predicted, we identified one thyroid-associated gene, a receptor, as a 
positive selection candidate in bonobos: TSHR. This is consistent with the hypothesis 
that thyroid-related differences may contribute to species differences between 
bonobos and chimpanzees. Additionally, our results align with the currently available 
data on ontogenetic changes in circulating T3 in Pan and Homo (Behringer et 
al. 2014). That study found that chimpanzees exhibited a decline in T3 at 
approximately ten years of age, falling within the variation observed in humans, 
whereas T3 in bonobos did not decline until about 20 years of age. While differences 
in thyroid hormone “rhythms” may facilitate speciation (Crockford 2003), it is unclear 
whether sequence variation would result in such ontogenetic differences. Future work 
at and near this locus and DIO2, as reported by Kovalaskas et al. (2020), may help 
shed light on how the thyroid contributes to differences in Pan. 
 We did not find any evidence supporting positive selection at AMY2A or 
AMY2B in bonobos. Based on our sample, these genes did not exhibit enough variants 
for robust statistical testing; however, the data suggest that both genes may be subject 
to purifying selection (https://github.com/brandcm/Dissertation: File S5). One caveat 
is that starch digestion differences in bonobos vs. chimpanzees could be shaped by 
only one or a few SNPs. MK tests would fail to detect such a signal because multiple 
genic changes increase power to identify selection signatures. This also does not 
preclude structural variation or differences in gene expression that may enable these 
loci to shape bonobo feeding ecology. For example, the SNPs identified by 
Kovalaskas et al. (2020) that occurred near AMY2A may be related to gene 
expression. Additional study of these loci, as well as AMY1, may yield support for the 
THV hypothesis. This is particularly needed as it has been proposed that, despite a 
gain in copy number relative to chimpanzees, AMY1 may be non-functional in 
bonobos (Perry et al. 2007). However, the current behavioral and ecological data do 
not offer more than sparse support (Yamakoshi 2004). Further, if the SNPs identified 
by Kovalaskas et al. (2020) at AMY2A are responsible for such phenotypic effects, 
this would imply that differences in Pan feeding ecology are ~ 100 ka old, which 
seems unlikely.  
 Differences in the reproductive and sexual cycles of female bonobos and 
chimpanzees have been well documented. Indeed, Han et al. (2019) described a 
number of nonsynonymous changes in bonobo genes that have been associated with 
menarche in humans. Additionally, sperm competition is not evenly distributed 
among hominoids and one would predict selection to act on genes related to this 
phenotype. While one study found little evidence for adaptation in these genes in Pan 
compared to Gorilla and Homo (Good et al. 2013), we found at least one 
reproduction-related candidate gene in every Pan lineage, including genes related to male 
reproduction. A few genes appear to affect reproduction broadly: KIAA1257 in P. t. 
verus, which may affect expression of NR5A1, a transcriptional 
activator involved in sex determination (Sakai et al. 2008); and SOHLH2 in P. t. 
ellioti, which affects both oogenesis and spermatogenesis (Toyoda et al. 2009). Some 
remaining candidates appear to primarily act on male gametes: PKDREJ in P. t. 
troglodytes (Hamm et al. 2007), and PRAME in P. t. verus (Chang et al. 2011). Two 
other candidates have been implicated in male murine reproduction but remain 
unknown in humans: ADAM2 in P. t. schweinfurthii (Choi et al. 2016) and KCNU1 in 
P. paniscus (Vyklicka and Lishko 2020).  
 Physical differences between bonobos and chimpanzees are well described. 
Additionally, variation and the distribution of such variation between different 
chimpanzee populations have also been reported (Groves 2001), although data are somewhat 
lacking for P. t. ellioti and the full extent of variation may not yet be realized in the 
other lineages. Four of the positive selection candidates are related to skeletal 
variation and another four seem to affect the development and maintenance of muscle 
tissue. Collectively, these candidates covered all chimpanzee subspecies except for 
western chimpanzees and we did not find any such candidates in bonobos. For 
example, FAM111A was detected as a candidate in both eastern and central 
chimpanzees. While this gene appears to be highly pleiotropic, its link to skeletal 
development is well established (Unger et al. 2013). MYH2 also appeared as a 
candidate in these lineages. This gene codes for a protein that is an essential 
component of myosin (Tajsharghi et al. 2005) and variants at this locus or the other 
skeletal/muscle related candidate genes may contribute to physical differences in Pan.   
 Infectious disease has long been thought to play an important role in human 
and hominid evolution. Concordant with this perspective, we found multiple genes 
involved in antiviral activity and immune function in the non-western chimpanzees. 
These include DDX60L, HERC5, and TMPRSS2. We also note that balancing 
selection likely plays an equally, if not more, important role in the evolution of 
immune systems (Andrés et al. 2009). A number of important studies (Ferguson et al. 
2012; Cagan et al. 2016; Cheng and DeGiorgio 2020) have provided valuable insight 
on such roles and work in this arena is very clearly just beginning. 
 Positive or purifying selection may not uniformly impact the entire coding 
sequence of a gene. Therefore, the consideration of selection per exon can identify 
previously unknown candidates and better pinpoint the region(s) under selection. We 
identified a number of specific exons that exhibited signatures of positive selection, 
some of which were detected in our gene-level analyses. The functions of three of the 
novel positive selection candidates are not well understood beyond some associations 
with cancer: FSIP1, KIAA1755, and LRRC63. One candidate gene revealed by our 
exon-level analysis, MROH7, was detected in eastern chimpanzees and may be related 
to reproduction (Kenigsberg et al. 2017). This class of genes, Maestro Heat-Like 
Repeat Family Members, is generally not well understood; however, another family 
member, MROH8, was detected in both eastern and central chimpanzees by our gene-level 
analysis and has been associated with Alzheimer’s, suggesting brain-related 
functions for that locus and potentially other MROH genes (Potkin et al. 2009). We 
found another signature of positive selection at a reproduction-related gene (Tian et 
al. 2009) in the third exon of NLRP8 in central chimpanzees. Finally, the eighteenth 
exon of VPS13D emerged as a positive selection candidate in eastern and western 
chimpanzees. Variants in this gene have been associated with neurological and 
movement effects and the locus appears involved in mitochondrial clearance (Anding 
et al. 2018; Gauthier et al. 2018; Seong et al. 2018). 
 Application of a complementary method, SnIPRE (Eilertson et al. 2012), to 
detect signatures of positive selection from MK data also yielded a few additional 
candidate genes, most of which do not even have a gene symbol identifier and are 
thus poorly characterized. MUC17 was categorized as a candidate in all five lineages. 
This gene codes for mucins that protect epithelial cells (Moniaux et al. 2006) and 
variants at this locus were recently associated with endometriosis and infertility in 
Taiwanese women (Yang et al. 2015). Similar to RNF213, the deep divergence of Pan 
makes finding another gene with a shared selective signature across all descendant 
lineages quite puzzling. RFPL4B was identified across all four chimpanzee subspecies 
but the function of this gene and its associated phenotypes are not well understood.  
While the absence of multiple shared genes across all four chimpanzee 
subspecies based on the traditional MK test and the identification of only one shared 
gene by SnIPRE is initially puzzling, this pattern can be explained by a number of 
factors. The neutral theory of molecular evolution posits that genetic diversity is a 
balance between mutation and drift (Kimura 1983). Accordingly, genetic drift is 
inversely proportional to Ne. As such, the power to detect selection, particularly 
positive selection, seems to be particularly reduced for bonobos and western 
chimpanzees. Chimpanzees may share more genes under positive selection than 
reported here but low polymorphism in P. t. verus does not allow for such patterns to be 
detected. This is also exacerbated by our strict filtering to reduce the number of false 
positives. Excluding genes whose row or column sums were less than five SNPs immediately 
decreases the number of genes that can be evaluated in each lineage. Yet, we favor 
the more conservative approach used here and suggest that potential positive 
selection candidates present in other lineages but missing from P. t. verus be 
examined using a larger sample and with different methods.  
Alternatively, these data may suggest that the selection signatures 
described here are older than signatures from selective sweeps, yet far younger than 
the time period immediately following the estimated Pan divergence (~ 1.88 Ma). 
This may point to rapid and strong divergence in the chimpanzee subspecies. A recent 
analysis of selective sweeps in these lineages revealed that soft sweeps make up a 
substantial proportion of recent positive selection (Brand et al. 2021). It is quite 
possible that if these evolutionary processes operated similarly for the past ~2 Ma, 
then rapid adaptation could be more likely.  
As with other genome-wide analyses, MK tests may yield a higher 
number of false positives due to multiple hypothesis testing. That is, at α = 0.05, each test of a 
true null hypothesis has a 5% chance of producing a statistically significant result by chance 
alone, and the probability of at least one such false positive increases with the number of 
tests run. Therefore, the significant results from a set of MK tests may be composed 
of both true positives and false positives. The false discovery rate reflects the 
proportion of false positives among all significant results. Typically, one adjusts (or 
corrects) α to address false positives; however, given the nature of the data used in 
MK tests, this is not typically done (e.g., Begun et al. 2007; Cagan et al. 2016). This is 
because such a correction might result in true positives (i.e., genes under positive 
selection) being missed based on an adjusted α. For example, application of a 
Benjamini-Hochberg procedure (Benjamini and Hochberg 1995) to these data using a 
25% false discovery rate results in zero significant results for both negative and 
positive selection across all lineages. A critical second step is to subsequently 
examine candidate gene sequences for those genes deemed statistically significant. In 
addition, because significant MK test results are used only to identify possible 
candidate genes that are then further evaluated, this second step reduces the 
potential impact of false discoveries while minimizing the damaging effect of 
excluding true positives through a conservative correction. Subsequent examination of 
gene sequences, therefore, allows for a confident assessment of true positives in the 
candidate gene list. 
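
The multiple testing argument above can be made concrete with a short Python sketch. It shows how the chance of at least one false positive grows with the number of independent tests, and how the Benjamini-Hochberg step-up procedure decides which p-values remain significant at a chosen false discovery rate. The function names and the p-values in the usage lines are hypothetical and are not results from this study.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests


def benjamini_hochberg(p_values, fdr=0.25):
    """Return one boolean per test: True if significant at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    significant = [False] * m
    # Every test ranked at or below max_rank is declared significant
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


# Hypothetical example: two nominally significant genes among 100 tested
p_values = [0.03, 0.04] + [0.9] * 98
print(family_wise_error_rate(0.05, len(p_values)))
print(sum(benjamini_hochberg(p_values, fdr=0.25)))
```

In this hypothetical case, both genes pass the nominal 0.05 threshold but neither survives correction, which mirrors the trade-off between excluding false positives and missing true positives described above.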
 As described above, the MK test is generally robust to specific aspects of 
demography when compared to other methods. However, differences in more recent Ne, 
reflected in the polymorphism data, vs. Ne during divergence, reflected in the 
substitution data, violate the premise of equivalent neutral mutation rates because the 
effective neutral mutation rate is related to Ne. Eyre-Walker (2002) examined the 
conditions under which population size differences yielded false signals of adaptation. 
When there is no selection on synonymous codon use and the population size changed 
recently, a 3-fold increase can generate a false signature, whereas size differences 
further back in evolutionary time require changes of even larger magnitude (Eyre-Walker 
2002). Curiously, when there is selection on synonymous codon use, it 
becomes more difficult to generate a false signal of positive selection; however, this 
does not appear to be the case in humans. In light of these findings and recent 
estimates for Pan population history, we conclude that differences in Ne were not 
sufficient to generate widespread false positives in this study.  
 Another MK test model assumption is that selection coefficients do not vary 
over time at a locus. The environmental variability of the Plio-Pleistocene in Africa 
(deMenocal 2004) would suggest that this assumption is unlikely for many taxa, 
including Pan. Yet, there does not appear to be a consensus on how fluctuating 
selection coefficients would shape MK tests. Huerta-Sanchez et al. (2008) found that 
variation in s increased the ratio of substitutions to polymorphisms, mimicking the 
signature of positive, directional selection. Gossmann et al. (2014) replicated this 
finding, but they also reported that mutations that contribute to divergence and 
polymorphism tend to be net positive over their lifetimes. Those authors conclude that 
such adaptive signatures are genuine but rates of adaptation are likely underestimated 
when s fluctuates. 
 Finally, we note that a number of genes were not analyzed in the present study 
due to suboptimal read depth across the entire sample. As this issue is inherent to 
genome-wide analyses, complementary subsequent candidate gene analyses for loci of 
particular interest are warranted (e.g., AVPR1A). Additionally, genes located on the 
sex chromosomes could not be tested here without violating model assumptions. 
Different methods are needed to identify signatures of positive selection at deeper 
time scales on X and Y chromosome genes.  
 This analysis highlights candidate genes that may be involved in lineage 
divergence in Pan, including genes related to the brain, immunity, musculature, 
reproduction, and skeletal system. Further analysis of TSHR in bonobos is warranted, 
particularly given the developmental differences in T3 between bonobos and 
chimpanzees. The absence of multiple genes unique to chimpanzees may point to 
deep divergence in common chimpanzees and may be supported by phenotypic 
variation observed within chimpanzees.  
 
CHAPTER III 
SOFT SWEEPS PREDOMINATE RECENT POSITIVE SELECTION IN 
BONOBOS (PAN PANISCUS) AND CHIMPANZEES (PAN TROGLODYTES) 
 
 Frances White, Nelson Ting, Timothy Webster, and I conceived of this 
analysis. The assembly and mapping of the genomic data in this analysis was 
conducted by Timothy Webster. I performed the data analyses and wrote the initial 
draft of the manuscript. Frances White, Nelson Ting, Timothy Webster, and I edited 
the manuscript. 
 
Introduction 
 The identification of adaptive traits and their genetic basis is one of the 
central goals of evolutionary biology. Two approaches, top-down and bottom-up, 
have been used to accomplish this goal, the latter of which leverages population-level 
data to recognize the genomic signatures of positive selection (Barrett and Hoekstra 
2011). At the genomic level, the process of adaptation results in a window of reduced 
variation that erodes over time. As these signatures do not persist, they can only be 
used to infer selection over a particular time scale in a population. In most species, 
this time frame is restricted to a few thousand generations, and roughly 200,000 
years in humans (Oleksyk et al. 2010). The classic model for positive selection for a 
given locus proposes that a single, novel mutation that confers a fitness advantage 
(i.e., a beneficial allele) will rapidly spread in a population and eventually reach 
fixation (Maynard Smith and Haigh 1974). Neutral polymorphism adjacent to the 
novel allele will ‘hitchhike’, resulting in a distinct pattern of reduced genomic 
diversity at the locus and surrounding sites. The term ‘hard sweep’ has been used to 
identify this pattern and process.  
 ‘Soft sweeps’ describe the presence of two or more haplotypes that occur at 
intermediate frequencies (Hermisson and Pennings 2005). Thus, the signature of a 
soft sweep is intermediate to those of neutral or ‘background’ genomic variation and 
the signature of a hard sweep. This pattern can result from recurrent de novo 
mutations followed by positive selection. Alternatively, soft sweeps can also result 
from positive selection on standing genetic variation where alleles were already 
present in a population before selection. This variation may be the result of 
independent mutations (multiple origin soft sweep) or of a single adaptive allele that arose 
before selection, with multiple copies subsequently sweeping through the population 
(single origin soft sweep). Soft sweeps are often incorrectly viewed as synonymous 
with selection on standing genetic variation; hard sweeps can also emerge from standing genetic 
variation if a single copy of the beneficial allele was the ancestor of all beneficial 
alleles in a sample (Hermisson and Pennings 2017). 
 Hard and soft sweeps are locus-specific and, thus, not mutually exclusive 
across a genome. Unsurprisingly, soft sweeps are also much more difficult to 
recognize than hard sweeps because their genomic patterns are intermediate. 
Additionally, the identification of selective sweeps, hard or soft, is further 
complicated by the possibility that neutral loci linked to either soft or hard sweeps 
may produce a false signature similar to that of a sweep (Schrider et al. 2015; Kern 
and Schrider 2018). 
 With these challenges in mind, a considerable amount of work has been 
dedicated to both developing robust methods to identify selective sweeps and also 
understanding the evolutionary parameters that determine hard or soft sweeps. 
Mutation-limited scenarios are expected to exclusively produce hard sweeps because 
beneficial alleles rarely occur (Hermisson and Pennings 2017). Thus, the most 
important parameter for estimating the likelihood of hard vs soft sweeps is the 
population-scaled mutation rate: 𝛩 = 4Neμ, where Ne is the effective population size 
and μ is the mutation rate. However, this single parameter can vary widely depending 
on the advantage of the beneficial allele, the effective population size, the size of the 
mutational target, and the timescale for adaptation (Messer and Petrov 2013; 
Hermisson and Pennings 2017). Therefore, adaptation across the genome for a given 
population can be simultaneously mutation-limited and non-mutation-limited (B.A. 
Wilson et al. 2014). While it has become clear that most populations will likely 
exhibit a mosaic of hard and soft sweeps (Hermisson and Pennings 2017), additional 
data on sweep type frequencies in various species are sorely needed to better tease 
apart which parameters may determine each of those frequencies.  
 Both species of the Pan genus represent important evolutionary models due to 
their phylogenetic proximity to humans. Homo and Pan diverged ~ 5 to 7 Ma (Sarich 
and Wilson 1967; Bradley 2008; Scally et al. 2012; Besenbacher et al. 2019) and the 
most recent estimates for the divergence of bonobos and chimpanzees range between 
1 and 2 Ma (Prüfer et al. 2012; de Manuel et al. 2016). Four extant chimpanzee 
subspecies evolved from a chimpanzee common ancestor that split ~ 600 ka with both 
subsequent lineages further splitting: one ~ 250 ka and the other ~ 160 ka (de Manuel 
et al. 2016). These two species exhibit stark differences in aspects of their 
morphology, physiology, behavior, and ecology (Susman 1984; Goodall 1986; 
Wrangham 1986; Kano 1992; White 1996; Furuichi 2011; Nishida 2011; Stumpf 
2011; Behringer et al. 2014; Turley and Frost 2014; M.L. Wilson et al. 2014). 
Many of these distinguishing traits are inferred to have occurred shortly after 
divergence, while much less is known about recent evolutionary processes in these 
lineages.  
 Understanding recent positive selection in Pan is intriguing because of the 
dynamic physical and social environments in which they evolved. Climatic variation 
across Africa is well-documented for the Pleistocene (deMenocal 2004) and has been 
proposed to drive the evolution of early Homo (Potts 1998; Antón et al. 2014), and 
such variation probably impacted other taxa throughout the Pleistocene, including the 
genus Pan. Chimpanzee populations living in more stable environments that were 
closer to Pleistocene refugia were recently described to exhibit less behavioral 
diversity than chimpanzees living in more seasonal habitats that are more distant to 
forest refugia (Kalan et al. 2020). While the formation of these refugia may have 
resulted in periods of habitat stability for some bonobo and chimpanzee populations 
during glacial periods (Takemoto et al. 2017; Barratt et al. 2020), climatic fluctuations 
throughout the Pleistocene likely affected both the physical environment—via 
changes in habitat structure and type—and the social environment—via changes in 
the frequency of dispersal and intergroup encounters. Further, evidence of admixture 
within extant and between extant and extinct members of the Pan genus adds even 
more variation to the social environments in which these apes evolved (Hey 2010; 
Wegmann and Excoffier 2010; de Manuel et al. 2016; Kuhlwilm et al. 2019). A 
dynamic environment may result in selection for multiple existing alleles, resulting in 
a greater frequency of soft sweeps than in a more stable environment where one 
would expect a greater frequency of hard sweeps.  
 In this study, we apply a recently developed supervised machine-learning 
approach to population-level genomic data for bonobos (Pan paniscus) and 
chimpanzees (Pan troglodytes) to assess the extent of different completed sweep 
types in these species. While a few studies have examined recent positive selection in 
bonobos and chimpanzees (e.g., Cagan et al. 2016; Han et al. 2019; Schmidt et al. 
2019; Kovalaskas et al. 2020; Nye et al. 2020), the role of hard and soft sweeps in 
shaping their adaptations is currently unknown. We sought to categorize genomic 
regions as subject to recent hard or soft sweeps, as linked to recent hard or soft 
selective sweeps, or as evolving neutrally. Data from simulations have predicted that 
hard sweeps would be common in humans because of our overall low mutation rate 
(Hermisson and Pennings 2017). Under this “mutation limitation hypothesis” and 
given the similarity in mutation rate between Homo and Pan, one could predict that 
bonobos and chimpanzees should also exhibit a high degree of hard sweeps. However, 
hard sweeps have been thought and observed to be quite rare in recent human 
evolution (Hernandez et al. 2011; Schrider and Kern 2017), although this perspective 
is debated (Jensen 2014; Harris et al. 2018). This could be explained by several non-
mutually exclusive alternatives including demographic effects. Larger populations can 
have more standing variation for selection to act on (Hermisson and Pennings 2005) 
which may result in more soft sweeps, whereas bottlenecks can result in drift and thus 
potentially more hard sweeps if intermediate frequency haplotypes are lost (B.A. 
Wilson et al. 2014). For example, some human populations experienced recent 
demographic changes (e.g., Schiffels and Durbin 2014), such as a bottleneck upon 
leaving Africa (e.g., Henn et al. 2012). Indeed, Schrider and Kern (2017) found that 
hard sweeps were more frequent in non-African than African populations. 
Chimpanzees and bonobos have also experienced recent demographic changes, 
including in effective population size, within the time frame (< 200 ka) for selective 
sweeps, based on PSMC analyses (Prado-Martinez et al. 2013; de Manuel et al. 2016). 
Three of the five lineages appear to have declined, whereas the other two have 
increased and then decreased. Under such changes in population size, the strength of 
selection plays a strong role in the likelihood of soft sweeps (B.A. Wilson et al. 2014). 
We therefore predicted that we would observe a higher frequency of soft sweeps in 
Pan, but that lineage-specific population histories might affect the degree to which 
soft sweeps dominate. 
 
Methods 
Genomic Data 
 We retrieved raw short read data on bonobos and all four chimpanzee 
subspecies from the Great Ape Genome Project (GAGP) (Prado-Martinez et al. 2013). 
This dataset contained high coverage genomes 
(https://github.com/brandcm/Dissertation: File S0: Figures S1, S2) from 13 bonobos 
(P. paniscus), 18 central chimpanzees (P. troglodytes troglodytes), 19 eastern 
chimpanzees (P. t. schweinfurthii), 10 Nigeria-Cameroon chimpanzees (P. t. ellioti), 
and 11 western chimpanzees (P. t. verus). 
 
Read Mapping and Variant Calling 
 Initial quality assessments in fastqc (Andrews 2010) and multiqc (Ewels et al. 
2016) indicated a number of quality issues, including failed runs, problematic tiles, 
and substantial variation in base quality. We removed adapters and trimmed all reads 
for quality with BBduk (https://sourceforge.net/projects/bbmap/). For trimming, we 
used the parameters “ktrim=r k=21 mink=11 hdist=2 qtrim=rl trimq=15 minlen=50 
maq=20” for all reads and added “tbo” and “tpe” for paired reads. 
 We used XYalign (Webster et al. 2019) to create versions of the chimpanzee 
reference genome, panTro6 (Kronenberg et al. 2018), for male- and female-specific 
mapping. Specifically, the version of the reference for female mapping has the Y 
chromosome completely masked, as its presence can lead to mismapping (Webster et 
al. 2019). We then mapped reads with BWA MEM (Li 2013) and used SAMtools (Li 
et al. 2009) to fix mate pairs, sort BAM files, merge BAM files per individual, and 
index BAM files. We used Picard (Broad Institute 2018) to mark duplicates with 
default parameters, before calculating BAM statistics with SAMtools. We next 
measured depth of coverage with mosdepth (Pedersen and Quinlan 2018), removing 
duplicates and reads with a mapping quality less than 30 for calculations. 
Visualizations for coverage and demography (see Generation of Simulated 
Chromosomes below) were created in R, version 3.6.3 (R Core Team 2020), using 
ggplot2, version 3.3.3 (Wickham 2016). 
 We used GATK4 (Poplin et al., unpublished data) for joint variant calling 
across all samples. We used default settings for all steps—HaplotypeCaller, 
CombineGVCFs, and GenotypeGVCFs—with three exceptions. First, we turned off 
physical phasing for computational efficiency and downstream VCF compatibility 
with filtering tools. Second, because multiple samples in this dataset suffer from 
contamination from other samples both within and across taxa (Prado-Martinez et al. 
2013), we employed a contamination filter to randomly remove 10% of reads during 
variant calling. This should have the effect of reducing confidence in contaminant 
alleles. Finally, we output non-variant sites to allow equivalent filtering of all sites in 
the genome and more accurate assessments of callability. 
 The above quality control, assembly, and variant calling steps are all contained 
in an automated Snakemake pipeline (Köster and Rahmann 2012) available on GitHub 
(https://github.com/thw17/Pan_reassembly). The repository also contains a Conda 
environment with all software versions and origins, most of which are available 
through Bioconda (Grüning et al. 2018). 
 
Variant Filtration and Genome Accessibility 
 We considered only autosomes for this analysis as the X and Y chromosomes 
violate many of the assumptions for the following methods (Webster and Wilson 
Sayres 2016). We also excluded unlocalized scaffolds (N = 4), unplaced contigs (N = 
4,316), and the mitochondrial genome from any downstream analyses. Additional 
filtration steps were completed using bcftools (Li 2011); command line inputs are 
provided in parentheses. Given our focus on selective sweeps, we only included single 
nucleotide variants (SNPs) (“-v snps”) that were biallelic (“-m2 -M2”). On a per 
sample basis within each site, we marked genotypes where sample read depth was less 
than 10 and/or genotype quality was less than 30 as uncalled (“-S . -i FMT/DP ≥ 10 
&& FMT/GQ ≥ 30”). To ensure that missing data did not bias our results, we further 
excluded any sites where fewer than ~ 80% of individuals (N = 56) were confidently 
genotyped (“AN ≥ 112”). We also removed any positions that were monomorphic for 
either the reference or alternate allele (“AC > 0 && AC ≠ AN”). These filtration steps 
yielded 41,869,892 SNPs for our downstream analyses 
(https://github.com/brandcm/Dissertation: File S0: Table S2).  
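
The logic of these per-genotype and per-site filters can be sketched in plain Python on toy data. This is an illustration of the criteria described above, not the bcftools commands themselves. It assumes biallelic sites with diploid genotypes encoded as pairs of 0 (reference) and 1 (alternate), and all function and variable names are hypothetical.

```python
MIN_DEPTH = 10            # per-sample read depth threshold from the text
MIN_GQ = 30               # per-sample genotype quality threshold from the text
MIN_ALLELE_NUMBER = 112   # 56 diploid individuals x 2 alleles


def mask_low_confidence(genotypes):
    """Mark genotypes failing the depth or quality threshold as uncalled (None).

    Each input record is a tuple: (genotype, depth, genotype_quality).
    """
    return [
        gt if (dp >= MIN_DEPTH and gq >= MIN_GQ) else None
        for gt, dp, gq in genotypes
    ]


def keep_site(called_genotypes, min_an=MIN_ALLELE_NUMBER):
    """Keep a site only if enough alleles are called and it is polymorphic."""
    alleles = [a for gt in called_genotypes if gt is not None for a in gt]
    an = len(alleles)                        # AN: number of called alleles
    ac = sum(1 for a in alleles if a == 1)   # AC: number of alternate alleles
    return an >= min_an and 0 < ac < an


# Toy site: 60 confidently genotyped heterozygotes and 11 low-depth samples
site = [((0, 1), 20, 99)] * 60 + [((0, 1), 5, 99)] * 11
print(keep_site(mask_low_confidence(site)))
```

Masking genotypes before counting alleles means that the allele number threshold reflects only confidently genotyped individuals.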
 We considered sites in our sample with low to no coverage to be ‘inaccessible’ 
in the reference genome. Using the output of mosdepth (see Read Mapping and 
Variant Calling above), we identified and filtered sites exhibiting low coverage as 
defined above. We used the ‘maskfasta’ function in bedtools (Quinlan and Hall 2010) 
to mark these sites (N) in the panTro6 FASTA, featuring only the autosomes, for use 
in downstream analyses. This resulted in 86.3% of the assembled autosomes as 
accessible (https://github.com/brandcm/Dissertation: File S17).  
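
As a minimal sketch of what masking accomplishes, the following Python functions replace low-coverage intervals in a sequence with N and report the proportion of positions that remain accessible. The example assumes BED-style intervals (0-based, half-open). The function names and the toy sequence are hypothetical, and the sketch stands in for, rather than reproduces, the bedtools step.

```python
def mask_sequence(sequence, low_coverage_intervals):
    """Replace bases inside 0-based, half-open intervals with 'N'."""
    bases = list(sequence)
    for start, end in low_coverage_intervals:
        # Clip each interval to the sequence bounds before masking
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


def accessible_fraction(masked_sequence):
    """Proportion of positions that are not marked 'N'."""
    if not masked_sequence:
        raise ValueError("empty sequence")
    return 1 - masked_sequence.count("N") / len(masked_sequence)


# Toy 10 bp sequence with two low-coverage intervals
masked = mask_sequence("ACGTACGTAC", [(2, 4), (8, 12)])
print(masked, accessible_fraction(masked))
```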
 
Generation of Simulated Chromosomes 
 We used the software ‘discoal’ to generate simulated chromosomes on which 
we trained a classifier per lineage (Kern and Schrider 2016). We generated a matching 
number of simulated haploid chromosomes for the sample size of each Pan lineage 
(i.e., 26 chromosomes for 13 P. paniscus, 20 chromosomes for 10 P. t. ellioti, etc.). 
Simulated chromosomes were set to 1.1 Mb in length and divided into 0.1 Mb 
subwindows for a total of 11 subwindows. These simulations included a population-
scaled mutation rate (4NμL), where N is the effective population size, μ is the per base 
pair per generation mutation rate, and L is the length of the simulated chromosome. 
We used the median of the previously reported effective population size range per 
lineage (Prado-Martinez et al. 2013). As estimates of genome-wide mutation rates 
vary considerably and are complicated in that mutation rates vary across individual 
genomes, we based our parameter on a mutation rate of 1.6 x 10^-8, which falls 
between estimates from genome-wide data and phylogenetic estimates (Narasimhan et 
al. 2017). We introduced some variation in this rate by setting lower and upper 
bounds of 1.5 x 10^-8 and 1.7 x 10^-8 and sampled a new mutation rate per simulation, drawing 
from this uniform prior. All simulations also included a population-scaled 
recombination rate (4NrL), where r is the recombination rate per base pair per 
generation, again calculated from the median effective population size for each 
lineage from Prado-Martinez et al. (2013) and a recombination rate drawn from a 
uniform prior of 1.1 x 10^-8 to 1.3 x 10^-8, based on the mean genome-wide rate (1.2 x 10^-8) 
reported for bonobos, chimpanzees, and gorillas (Stevison et al. 2015). Recent results 
from a different selective sweep classifier, Trendsetter, suggest that including a range 
of recombination rates is important to reducing misclassification (Mughal and 
DeGiorgio 2019). We note that while some of the estimated recombination rates in 
bonobos and chimpanzees are beyond the uniform distribution used in our 
simulations, many of these values are the high rates present in the telomeres, regions 
that generally exhibit lower or no coverage and thus will be largely if not entirely 
masked from this analysis (see Variant Filtration and Genome Accessibility above). 
We also included a demographic string reflecting approximate changes in population 
size for each lineage between ~ 0.05 and 2 Ma. Changes in population size were set in 
units of 4N0 generations, where N0 was set to the approximate median effective population 
size from Prado-Martinez et al. (2013), and we used a generation time of 25 years 
(Langergraber et al. 2012). Population size changes for this time period were drawn 
from a previous PSMC analysis (de Manuel et al. 2016) (Figure S3). While this is 
only one study from which to draw demographic information and reconstructions of 
Pan demography vary widely across studies, the downstream program used to classify 
genomic windows, diploS/HIC, is robust to demographic misspecification (Kern and 
Schrider 2018). We generated 2 x 10^3 simulations using these parameters as a set of 
simulations under neutral evolution per lineage.  
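
The parameterization above can be illustrated with a minimal Python sketch. It is not the script used in this analysis. It draws one mutation rate and one recombination rate from the stated uniform priors, converts them to population-scaled values (4NμL and 4NrL), and converts a time in years to units of 4N0 generations using the 25-year generation time. The effective population size (20,000), the chromosome length (1.1 Mb), and all function names are assumptions chosen for illustration only.

```python
import random

MU_RANGE = (1.5e-8, 1.7e-8)   # per-bp, per-generation mutation rate prior
R_RANGE = (1.1e-8, 1.3e-8)    # per-bp, per-generation recombination rate prior
GENERATION_TIME = 25          # years per generation


def draw_scaled_rates(n_e, length, rng):
    """Return (theta, rho) = (4*N*mu*L, 4*N*r*L) for a single simulation."""
    mu = rng.uniform(*MU_RANGE)
    r = rng.uniform(*R_RANGE)
    return 4 * n_e * mu * length, 4 * n_e * r * length


def years_to_scaled_time(years, n_0):
    """Convert a time in years to units of 4*N0 generations."""
    return years / GENERATION_TIME / (4 * n_0)


rng = random.Random(1)
n_e = 20_000          # hypothetical effective population size
length = 1_100_000    # hypothetical simulated chromosome length in bp
theta, rho = draw_scaled_rates(n_e, length, rng)
t_scaled = years_to_scaled_time(50_000, n_e)   # 0.05 Ma in scaled units
```

Because a new rate is drawn for every simulation, the scaled values differ among replicates even though N and L are held fixed within a lineage.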
 Hard and soft selective sweeps were simulated with all of the aforementioned 
parameters and using a uniform prior of population-scaled selection coefficients (α = 
2Ns) derived from each lineage’s median effective population size (Prado-Martinez et 
al. 2013) and moderately weak to moderately strong selection coefficients between 
0.02 and 0.05. Sweeps also included a parameter (τ) for the time to fixation of the 
beneficial allele over a uniform range in units of 4N generations. This value ranged 
from 0 to 0.001 for all lineages. Linked-hard and linked-soft sweeps were generated 
by placing the selected site at the center of each of the 10 subwindows flanking the 
center (6th) subwindow. Additionally, we included a uniform prior on the frequency at 
which a mutation is segregating at the time it becomes beneficial for soft and linked-
soft sweeps, setting this range from 0 to 0.2. We generated 1 x 10^3 simulations per 
subwindow for linked-hard and linked-soft sweeps (N = 10) and 2 x 10^3 simulations 
for hard and soft sweeps. This resulted in a total of 2 x 10^3 hard, 1 x 10^4 hard-linked, 
2 x 10^3 soft, and 1 x 10^4 soft-linked simulated sweeps. Parameters for these 
simulations can be found here: https://github.com/brandcm/Dissertation: File S18.  
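
As a rough illustration of the sweep priors and simulation counts described above, the following Python sketch draws one set of sweep parameters and tallies the number of simulations per class. The effective population size and every name in the code are hypothetical. The actual parameter values used are those in File S18.

```python
import random

S_RANGE = (0.02, 0.05)      # selection coefficient (s) prior
TAU_RANGE = (0.0, 0.001)    # tau prior, in units of 4N generations
F0_RANGE = (0.0, 0.2)       # starting frequency prior for soft sweeps
N_FLANKING = 10             # subwindows flanking the central (6th) subwindow


def draw_sweep_params(n_e, soft, rng):
    """Draw one set of sweep parameters, with alpha = 2*N*s."""
    s = rng.uniform(*S_RANGE)
    params = {"alpha": 2 * n_e * s, "tau": rng.uniform(*TAU_RANGE)}
    if soft:
        # Only soft (and linked-soft) sweeps start from a segregating variant.
        params["f0"] = rng.uniform(*F0_RANGE)
    return params


def simulations_per_class(per_sweep=2_000, per_flank=1_000):
    """Total simulations per class given the per-subwindow counts."""
    linked = N_FLANKING * per_flank
    return {"hard": per_sweep, "linked-hard": linked,
            "soft": per_sweep, "linked-soft": linked}


rng = random.Random(2)
n_e = 20_000  # hypothetical effective population size
hard = draw_sweep_params(n_e, soft=False, rng=rng)
soft = draw_sweep_params(n_e, soft=True, rng=rng)
counts = simulations_per_class()
```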
 
Calculation of Simulation Feature Vectors and Classifier Training 
 We calculated feature vectors from these simulated chromosomes using the 
‘fvecSim’ function in the program diploS/HIC (Kern and Schrider 2018). Briefly, 
diploS/HIC calculates 12 summary statistics for all 11 subwindows: π, Watterson’s θ, 
Tajima’s D, the variance, skew, and kurtosis of genotype distance (gkl), the number of 
multilocus genotypes, J1, J12, J2/J1, unphased Zns, and the maximum value of 
unphased ω. Collectively, these summary statistics capture information about the site 
frequency spectrum (SFS), haplotype structure, and linkage disequilibrium (LD). 
diploS/HIC uses a convolutional neural network (CNN), which extracts informative 
patterns from the feature vector (treated as an image) by sliding a receptive field 
over that image and computing the dot product between each image patch and a 
convolutional filter. The diploS/HIC CNN has three branches, each of which has 
two-dimensional convolutional layers with ReLU activations followed by max pooling 
and then a dropout layer to control for model overfitting. Outputs from all three 
branches are fed into two fully connected dense layers, which also use dropout 
layers, before arriving at a 
softmax activation that outputs the probability for each categorical class (hard, hard-
linked, neutral, soft-linked, or soft). Complete details for this procedure can be found 
in Kern and Schrider (2018). 
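
The building blocks named above (convolution as a sliding dot product, ReLU, max pooling, and softmax) can be sketched in plain Python. This is a toy illustration and not the diploS/HIC network: the 12 x 11 input mimics 12 summary statistics across 11 subwindows, while the filter values, the class scores, and all function names are made up for the example.

```python
import math


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    Each output value is the dot product of the kernel with one image patch.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Set negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; incomplete edge blocks are dropped."""
    h = len(feature_map) // size
    w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(w)
        ]
        for i in range(h)
    ]


def softmax(scores):
    """Convert raw class scores to probabilities that sum to one."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy image: 12 summary statistics (rows) by 11 subwindows (columns).
image = [[float((r + c) % 3) for c in range(11)] for r in range(12)]
kernel = [[1.0, -1.0], [-1.0, 1.0]]          # arbitrary filter, not a trained weight
pooled = max_pool(relu(conv2d_valid(image, kernel)))
probs = softmax([2.0, 0.5, 0.1, 0.5, 1.0])   # five made-up class scores
```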
 When calculating feature vectors for the simulated chromosomes, we used the 
optional arguments for the ‘fvecSim’ function to mask each simulation with a 110,000 
bp segment randomly drawn from our masked FASTA where > 0.25 of SNPs in a 
subwindow were accessible (i.e., not marked by Ns). This enabled us to train our 
classifiers on simulated data featuring the same patterns of inaccessible genomic 
regions that the classifier would encounter in the empirical data.   
 We created a balanced set with equal representation (2 x 10^3) of all five 
classes via sampling without replacement on which to train the classifier using 
diploS/HIC’s ‘makeTrainingSets’ function. These were divided into 8,000 training 
examples, 1,000 validation examples, and 1,000 testing examples to test the accuracy 
of the classifier via the ‘train’ function in diploS/HIC. We built ten classifiers per 
lineage and selected the one with the highest accuracy to apply to the empirical data 
(https://github.com/brandcm/Dissertation: File S19).  
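
The balancing and partitioning step can be illustrated as follows. This sketch is an assumption about the general procedure, not a description of the internals of ‘makeTrainingSets’ or ‘train’: it samples 2,000 examples per class without replacement, shuffles the pooled set, and splits it into 8,000 training, 1,000 validation, and 1,000 testing examples. Integers stand in for feature vectors and all names are hypothetical.

```python
import random

CLASSES = ["hard", "linked-hard", "neutral", "linked-soft", "soft"]


def balanced_split(examples_by_class, per_class, n_val, n_test, rng):
    """Sample per_class examples from each class, then shuffle and split."""
    pooled = []
    for label in CLASSES:
        chosen = rng.sample(examples_by_class[label], per_class)  # no replacement
        pooled.extend((label, example) for example in chosen)
    rng.shuffle(pooled)
    test = pooled[:n_test]
    val = pooled[n_test:n_test + n_val]
    train = pooled[n_test + n_val:]
    return train, val, test


rng = random.Random(0)
# Linked classes start with 10,000 simulations; the others with 2,000.
sims = {c: list(range(10_000 if c.startswith("linked") else 2_000))
        for c in CLASSES}
train, val, test = balanced_split(sims, per_class=2_000,
                                  n_val=1_000, n_test=1_000, rng=rng)
```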
 A second, independent set of simulated chromosomes was generated per 
lineage using the same parameters. We then calculated feature vectors and created 
another balanced training set with 2 x 10^3 chromosomes per class (hard, linked-hard, 
neutral, linked-soft, and soft). We used diploS/HIC’s ‘predict’ function by applying 
each trained classifier to all five classes separately per lineage. In other words, we ran 
each classifier on 2,000 simulated hard sweeps, 2,000 simulated linked-hard sweeps, 
2,000 simulated neutral regions, 2,000 simulated linked-soft sweeps, and 2,000 
simulated soft sweeps for each lineage. We used a binary classification scheme, 
where the identification of a sweep (hard or soft) was considered to be positive and 
linked or neutral regions were negative, to assess the true positive rate, false positive 
rate, and obtain a second estimate of accuracy for each trained classifier 
(https://github.com/brandcm/Dissertation: File S0: Tables S3-S6). We also calculated 
class-specific accuracy as the number of instances per lineage where the 
predicted class matched the simulated class divided by the total (1 x 10^4) 
(https://github.com/brandcm/Dissertation: File S0: Tables S3-S6). 
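
The binary scheme and the class-specific accuracy described above can be written out as a short Python sketch. The function names and the six toy labels are illustrative assumptions. Under the binary scheme, a hard sweep predicted as soft still counts as a true positive, whereas class-specific accuracy requires an exact match.

```python
SWEEP_CLASSES = {"hard", "soft"}


def binary_metrics(truth, predicted):
    """Treat hard/soft as positive and linked/neutral as negative."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        is_sweep, called_sweep = t in SWEEP_CLASSES, p in SWEEP_CLASSES
        if is_sweep and called_sweep:
            tp += 1
        elif is_sweep:
            fn += 1
        elif called_sweep:
            fp += 1
        else:
            tn += 1
    return {
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def class_specific_accuracy(truth, predicted):
    """Fraction of examples whose predicted class equals the simulated class."""
    matches = sum(t == p for t, p in zip(truth, predicted))
    return matches / len(truth)


truth = ["hard", "hard", "soft", "neutral", "linked-soft", "linked-hard"]
predicted = ["hard", "soft", "linked-soft", "neutral", "soft", "linked-hard"]
metrics = binary_metrics(truth, predicted)
exact = class_specific_accuracy(truth, predicted)
```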
 
Empirical Data Feature Vectors and Prediction 
 Upon achieving > 0.8 accuracy, each trained classifier was applied to its 
respective Pan lineage. Each autosome was analyzed separately, and feature vectors 
were calculated using diploS/HIC’s ‘fvecVcf’ function. We supplied this function with the 
masked FASTA for that chromosome and discarded windows where any subwindow 
had < 0.25 unmasked sites following Schrider and Kern (2017) 
(https://github.com/brandcm/Dissertation: File S20). This step reduces the potential 
effect of the number of SNPs in a given window on sweep classification. Finally, the 
trained classifier was applied to the feature vector files using the ‘predict’ function.  
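
The accessibility filter can be illustrated with a small Python sketch, assuming a window is supplied as a masked sequence string in which inaccessible sites are marked by Ns. The function names and the toy sequences are hypothetical, and the sketch does not reproduce how diploS/HIC implements this filter internally.

```python
def unmasked_fraction(seq):
    """Fraction of sites in seq that are not masked (not N)."""
    return sum(base not in "Nn" for base in seq) / len(seq)


def window_passes(window_seq, n_subwindows=11, min_fraction=0.25):
    """True only if every subwindow has at least min_fraction unmasked sites."""
    sub_len = len(window_seq) // n_subwindows
    for k in range(n_subwindows):
        sub = window_seq[k * sub_len:(k + 1) * sub_len]
        if unmasked_fraction(sub) < min_fraction:
            return False
    return True


# Toy windows with 11 subwindows of 8 bp each.
kept = "ACGTNNNN" * 11                    # every subwindow is 50% unmasked
dropped = "ACGTNNNN" * 10 + "ANNNNNNN"    # last subwindow is 12.5% unmasked
```

Here `window_passes(kept)` is true and `window_passes(dropped)` is false, because a single poorly covered subwindow is enough to discard the whole window.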
 
Sweep Identification, Potential Target Genes, and Gene Ontology 
 As diploS/HIC outputs the probability for each sweep class, we first report the 
class inferred to be the most likely. However, as the difference between the most 
likely class and the next most likely may be small, we further report windows where 
the sweep class probability is > 0.5, > 0.75, and > 0.9 
(https://github.com/brandcm/Dissertation: File S21). We also examined our data for 
spatial patterns. Windows classified as immediately abutting other windows with the 
same sweep type for hard and soft sweeps were considered to be a single sweep. 
Unique sweep windows and those shared between two or more lineages were 
visualized using UpSet plots created with the ComplexUpset package, version 1.2.1 
(Lex et al. 2014; Krassowski 2020) in R, version 3.6.3 (R Core Team 2020).  
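
The two post-processing steps described above, counting windows above each probability cutoff and merging abutting windows of the same sweep type, can be sketched in Python. The tuple layout, the half-open coordinates, and the function names are assumptions made for this illustration.

```python
def count_above(probabilities, cutoffs=(0.5, 0.75, 0.9)):
    """Number of windows whose class probability exceeds each cutoff."""
    return {c: sum(p > c for p in probabilities) for c in cutoffs}


def merge_abutting(calls, sweep_classes=("hard", "soft")):
    """Merge consecutive sweep windows of the same class into one sweep.

    calls: iterable of (chrom, start, end, label) with half-open coordinates.
    Two windows abut when the end of one equals the start of the next.
    """
    merged = []
    for chrom, start, end, label in sorted(calls):
        if label not in sweep_classes:
            continue
        last = merged[-1] if merged else None
        if last and last[0] == chrom and last[3] == label and last[2] == start:
            merged[-1] = (chrom, last[1], end, label)   # extend the current sweep
        else:
            merged.append((chrom, start, end, label))
    return merged


calls = [
    ("chr1", 0, 100, "soft"), ("chr1", 100, 200, "soft"),
    ("chr1", 200, 300, "neutral"), ("chr1", 300, 400, "soft"),
    ("chr1", 400, 500, "hard"), ("chr2", 0, 100, "hard"),
]
sweeps = merge_abutting(calls)
tallies = count_above([0.4, 0.6, 0.8, 0.95])
```

In this toy example the first two soft windows collapse into a single sweep, while the soft window at 300 stays separate because a neutral window lies between them.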
 We examined what genes lie in the windows identified as being subject to a 
recent selective sweep by extracting the genomic coordinates of all autosomal coding 
regions for the longest transcript per gene (N = 20,119 genes) in the panTro6 genome 
via the panTro6 GFF (retrieved from: 
https://www.ncbi.nlm.nih.gov/genome/202?genome_assembly_id=380228). We used 
the bedtools ‘intersect’ function (Quinlan and Hall 2010) to identify overlap between 
coding regions and candidate sweep windows after converting both CDS and sweep 
window coordinates to 0-start, half-open format. As some coding sequences may have 
been masked (see Variant Filtration and Genome Accessibility above), we extracted 
FASTAs for each coding sequence using bedtools ‘getfasta’ function (Quinlan and 
Hall 2010) and used a custom R script to calculate the percent of each gene that was 
masked. Overall, 66.2% of all coding sequence was unmasked. We did not list 
genes for candidate sweep regions if > 50% of the total coding sequence per gene was 
masked. Thus, we considered 13,228 genes as potential targets for selective sweeps 
(https://github.com/brandcm/Dissertation: File S3).  
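
The coordinate conversion and masking calculation can be illustrated with a brief Python sketch. It is a stand-in for the general logic only: the analysis itself used bedtools and a custom R script, and the function names and toy sequences here are hypothetical.

```python
def gff_to_bed(start, end):
    """Convert 1-based, inclusive GFF coordinates to 0-start, half-open."""
    return start - 1, end


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def percent_masked(cds_sequences):
    """Percent of a gene's total coding sequence that is masked (N)."""
    total = sum(len(seq) for seq in cds_sequences)
    masked = sum(seq.upper().count("N") for seq in cds_sequences)
    return 100.0 * masked / total


cds = gff_to_bed(101, 200)                       # one CDS segment from a GFF record
in_window = overlaps(cds[0], cds[1], 150, 250)   # hypothetical sweep window
gene_masked = percent_masked(["ACGTNN", "NNNN"])
keep_gene = gene_masked <= 50.0                  # exclude genes > 50% masked
```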
 We investigated the enrichment of particular pathways by performing a gene 
ontology analysis using the Functional Annotation Tool in DAVID (Huang et al. 
2008; Huang et al. 2009). We used the custom background described above (genes 
whose total coding sequence was > 50% unmasked) rather than all panTro6 genes to 
ensure our analysis was not underpowered. DAVID does not allow for official gene 
symbols to be used in a background list, so we converted gene symbols to Entrez gene 
IDs. As not all gene symbols have a corresponding Entrez gene ID, we removed genes 
for which there was no Entrez gene ID (N = 98 in background list). We collated genes 
for both hard and soft sweeps into a single input per lineage. We evaluated statistical 
significance for biological process gene ontology terms via p-values adjusted using 
the Benjamini-Hochberg method (Benjamini and Hochberg 1995). 
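
For reference, the Benjamini-Hochberg adjustment can be sketched in a few lines of Python. This is a generic implementation of the standard step-up procedure, written for illustration. It is not the code run by DAVID, and the four p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in adjusted]
```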
 Scripts for all data analyses are available on GitHub 
(https://github.com/brandcm/Pan_Selective_Sweeps). 
 
Data Availability 
 The raw data underlying this article are previously published (Prado-Martinez 
et al. 2013; de Manuel et al. 2016) and are available from the Sequence Read Archive 
(PRJNA189439 and SRP018689) and the European Nucleotide Archive 
(PRJEB15086). 
 
Results 
  We generated four classifiers that reached an acceptable level of accuracy for 
bonobos (P. paniscus), central chimpanzees (P. t. troglodytes), eastern chimpanzees 
(P. t. schweinfurthii), and Nigeria-Cameroon (P. t. ellioti) chimpanzees. These 
classifiers ranged in accuracy from 85.6% (Nigeria-Cameroon chimpanzees) to 93.9% 
(central chimpanzees) (https://github.com/brandcm/Dissertation: File S19). We could 
not produce a sufficiently accurate classifier using realistic parameters for western 
chimpanzees (P. t. verus); therefore, they were excluded from downstream analyses. 
Following Kern and Schrider (2018), we calculated false positive rates by testing our 
classifiers on a second, independent set of simulated chromosomes per lineage. We 
used a binary classification, considering the identification of either sweep type as a 
positive and identification of a linked or neutral region to be negative. Our trained 
classifiers had considerable statistical power (the true positive rate, i.e., 1 - false negative rate) ranging from 96.6 to 
99.2% and a low false positive rate (false positives / (false positives + true negatives)) 
that ranged from 1.4 to 4.3% across all four classifiers (Tables S2 - S5). When 
considered separately—i.e., true positives only included one sweep type (hard or soft) 
rather than both—we had greater power to detect hard sweeps than soft sweeps, 
averaging 99% and 96.9% across lineages, respectively (Tables S2 - S5). Accuracy 
((true positives + true negatives) / total) for identifying sweep regions vs non-sweep 
regions ranged from 94.1 to 98.3%. In addition to the initial class-specific accuracy 
generated during classifier training, a second estimate of class-specific accuracy 
ranged from 81.6 to 92.1% (Tables S2 - S5). 
 We classified ~ 91.6% of the assembled autosomes in each lineage (Table 5, 
https://github.com/brandcm/Dissertation: File S0: Tables S3-S7), even after masking 
for inaccessible regions and excluding windows with few SNPs. We found that soft 
sweeps were abundant in all four lineages, accounting for > 73% of all individual 
sweeps, whereas hard sweeps were relatively rare (Table 5, 
https://github.com/brandcm/Dissertation: File S22). This pattern held true even when 
more stringent posterior probabilities were required to consider a region a sweep; at 
least 30% of hard sweep windows and 76% of soft sweep windows were called with 
50% or greater posterior probability (https://github.com/brandcm/Dissertation: File 
S21). Genomic regions linked to sweeps were also quite pervasive in all four lineages 
(Table 5); particularly among eastern chimpanzees, where roughly 86% of the 
genome was classified as linked to selective sweeps.  
 
  
 
Table 5. Selective sweep summary per population.

Number and percent of windows per class type
Lineage               Hard         Linked-hard     Neutral         Linked-soft      Soft            Total
P. paniscus           85 (0.4%)    1,576 (6.5%)    7,488 (30.8%)   13,168 (54.1%)   2,002 (8.2%)    24,319
P. t. ellioti         573 (2.4%)   6,358 (26.1%)   1,389 (5.7%)    14,498 (59.6%)   1,505 (6.2%)    24,323
P. t. schweinfurthii  32 (0.1%)    696 (2.9%)      1,835 (7.5%)    20,179 (83.0%)   1,581 (6.5%)    24,323
P. t. troglodytes     224 (0.9%)   1,746 (7.2%)    5,483 (22.5%)   15,121 (62.2%)   1,749 (7.2%)    24,323

Number and percent of sweeps per sweep type
Lineage               Hard          Soft             Total
P. paniscus           81 (4.9%)     1,585 (95.1%)    1,666
P. t. ellioti         488 (26.9%)   1,323 (73.1%)    1,811
P. t. schweinfurthii  32 (2.3%)     1,376 (97.7%)    1,408
P. t. troglodytes     184 (10.6%)   1,557 (89.4%)    1,741
 
 We examined overlap in windows classified as either a hard or soft sweep 
across lineages, which may reflect either ancestral or parallel adaptation. Most hard 
sweep windows were unique to each lineage; however, we did find some shared 
windows across lineages (Figure 2). Central and Nigeria-Cameroon chimpanzees 
shared the highest number of sweep windows (N = 33) but when weighted by the total 
possible number of windows, the highest overlap for hard sweeps was between 
eastern and Nigeria-Cameroon chimpanzees (7/32 or ~ 0.21). No hard sweep 
windows were shared across all lineages. Like hard sweeps, most soft sweep windows 
were also unique to each lineage (Figure 3). Among pairs of lineages there was 
remarkable consistency in the number of shared soft sweep windows (N = 111-147), 
even when the total possible number of shared windows is considered. One exception 
is eastern and central chimpanzees, which shared nearly twice the number of soft sweep 
windows (N = 267). The highest number of shared soft sweep windows between three 
lineages occurred in the three chimpanzee subspecies (N = 80). Only 19 windows 
were shared across all four lineages.  
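
The weighting used above, in which the number of shared windows is divided by the smaller of the two set sizes, can be made explicit with a short Python sketch. The window identifiers are arbitrary integers invented for the example. Only the set sizes (32 eastern and 573 Nigeria-Cameroon hard sweep windows, 7 shared) come from the results reported here.

```python
def overlap_fraction(windows_a, windows_b):
    """Shared windows divided by the size of the smaller set."""
    shared = len(windows_a & windows_b)
    return shared / min(len(windows_a), len(windows_b))


# Arbitrary window identifiers arranged so that exactly 7 are shared.
eastern = set(range(32))                     # 32 hard sweep windows
nigeria_cameroon = set(range(25, 598))       # 573 hard sweep windows
fraction = overlap_fraction(eastern, nigeria_cameroon)   # 7 / 32
```

Dividing by the smaller set size expresses the overlap relative to its maximum possible value, which is why a pair with few shared windows in absolute terms can still show the highest weighted overlap.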
 
 
 
 
Figure 2. Unique and shared hard sweep windows. The frequency of windows shared 
by two or more lineages should be considered relative to the total possible number of 
shared windows (i.e., the set size of the lineage with the smallest set size). 
 
 
  
 
 
Figure 3. Unique and shared soft sweep windows. The frequency of windows shared 
by two or more lineages should be considered relative to the total possible number of 
shared windows (i.e., the set size of the lineage with the smallest set size). 
 
 
 
After excluding genes that were > 50% masked, we identified 1,671 candidate genes 
in bonobo hard and soft sweeps, 1,761 genes in central chimpanzee sweeps, 1,372 
genes in eastern chimpanzee sweeps, and 1,844 genes in Nigeria-Cameroon 
chimpanzee sweeps (https://github.com/brandcm/Dissertation: File S23). After 
correcting for multiple testing using the Benjamini-Hochberg method across all 
lineages, we identified only two significantly enriched pathways in central 
chimpanzees: nervous system development and central nervous system development 
(https://github.com/brandcm/Dissertation: File S24).  
  
Discussion 
 Our study contributes to the emerging picture of recent evolution in Pan and 
adaptation more broadly. Contrary to the predictions of a mutation-limitation 
hypothesis, yet concordant with recent results for humans (e.g., Hernandez et al. 
2011; Schrider and Kern 2017) and flies (Garud et al. 2015), we find that soft sweeps 
overwhelmingly predominate among regions of the genome experiencing selective sweeps in 
both bonobos and the three chimpanzee subspecies we could analyze. These results 
confirm the prediction from Schmidt et al. (2019) who speculated that soft sweeps 
played a major role in the evolution of eastern and central chimpanzees. Those 
authors also posit that hard sweeps should be more frequent in western chimpanzees 
relative to other subspecies because of their low effective population size. While 
western chimpanzees are estimated to have the lowest effective population size, it is 
estimated to be only slightly lower than that of bonobos, for which we found a high 
proportion (95.1%) of soft sweeps (e.g., Prado-Martinez et al. 2013; de Manuel et al. 
2016). It is curious that Nigeria-Cameroon chimpanzees exhibit the most hard sweeps 
in this analysis. While this could be the result of a multitude of factors, it is 
particularly interesting because this lineage has experienced a rather stable effective 
population size in recent evolutionary time as estimated by PSMC (Prado-Martinez et 
al. 2013; de Manuel et al. 2016), whereas a scenario with dramatic population decline 
would be expected to “harden” soft sweeps as haplotypes are stochastically lost, 
resulting in more hard sweeps (B.A. Wilson et al. 2014). 
 Our analysis of shared hard and soft sweeps found that most sweeps of both 
types were unique to each lineage. However, there was a high number of hard sweep 
windows shared between central and Nigeria-Cameroon chimpanzees as well as 
between eastern and Nigeria-Cameroon chimpanzees when the total possible number 
of shared sweeps was considered. Further, nearly twice as many soft sweep 
windows were shared between eastern and central chimpanzees as between other pairs of lineages. These 
results are similar to another recent study that found a large number of candidate 
sweep regions to be shared between those taxa (Nye et al. 2020). It is impossible to 
discern whether or not the overlap in hard sweeps between central and Nigeria-
Cameroon chimpanzees and the overlap in soft sweeps for eastern and central 
chimpanzees is the result of shared ancestry and/or similar environmental conditions 
because both pairs of lineages share a geographic boundary: the Ubangi River for 
eastern and central chimpanzees and the Sanaga River for central and Nigeria-Cameroon 
chimpanzees. The overlap in hard sweeps between eastern and Nigeria-Cameroon 
chimpanzees is more puzzling because they are not sister taxa and share a common 
ancestor ~ 600 ka (de Manuel et al. 2016). Therefore, parallel adaptation via similar 
physical and/or social environments may serve as a more likely hypothesis. While the 
lowest in overall frequency, we also identified a number of soft sweep windows that 
were shared across three lineages as well as 19 windows that occurred in all four. 
Future work should further investigate these shared sweep windows.  
 As mentioned above, soft sweeps are not exclusively the result of selection on 
standing genetic variation (Pennings and Hermisson 2006a; Pennings and Hermisson 
2006b). However, given the estimated mutation rate for bonobos and chimpanzees, it 
appears unlikely that recurrent de novo mutations explain the majority of these soft 
sweeps. We did not explicitly model different types of soft sweeps in our analysis. 
However, while soft sweeps from standing genetic variation and de novo mutations 
may exhibit similar genomic signatures, this must be tested before any additional 
conclusions are drawn. Hartfield and Bataillon (2020) recently suggested differences 
in diversity (as measured by π) at the selected locus may be used to differentiate soft 
sweep types, although this may be more difficult to accomplish in outcrossing species. 
Nonetheless, our results reveal a major role of standing genetic variation, and thus 
changes in the physical and social environment, in driving recent adaptations in Pan. 
 A few recent studies have considered the impact of effective population size 
on adaptive evolution in the great apes (Cagan et al. 2016; Nam et al. 2017). Theory 
predicts that the rate of adaptive evolution should be positively correlated with 
effective population size when Nes is >> 1 (Gossmann et al. 2012). Both Cagan et al. 
(2016) and Nam et al. (2017) found a positive association between effective 
population size and the rate of adaptive evolution, measured by proportion of adaptive 
substitutions and the number of selective sweeps, respectively. However, we observed 
no clear linear relationship between the number of sweeps (hard, soft, or both) 
estimated from this analysis and the estimated effective population sizes for these four 
lineages (see https://github.com/brandcm/Dissertation: File S18 for population sizes). 
This descriptive result should be considered cautiously because of the limited number 
of lineages analyzed here and the potential confounding effect of phylogeny. It is 
possible that this relationship may not be driven by the number of sweeps, but rather 
the strength of sweeps a population experiences (Nam et al. 2017). Estimates of 
selection strength are generally lacking for the great apes so this relationship remains 
a question for further study.  
 In addition to characterizing broad patterns in the genomic landscape for 
bonobos and chimpanzees, the results of this study also highlight thousands of 
candidate regions and genes for further analysis. We also find additional support for 
previous selection candidates. For example, disease has been long thought to shape 
evolution in primates (Nakajima et al. 2008; van der Lee et al. 2017). The potential 
for disease transmission between non-human primates and humans has also prompted 
much research, particularly focusing on the genomic underpinnings of host responses 
to lentiviruses, which include HIV and SIV (Gao et al. 1999; Van Heuverswyn et al. 
2006; Compton et al. 2013; Nakano et al. 2020). Cagan and colleagues (2016) found 
evidence of recent positive selection within IDO2, a T-cell regulatory gene, among all 
four chimpanzee subspecies and bonobos. We identified a candidate soft sweep 
region for eastern chimpanzees that overlaps this gene. However, this window had 
one of the lowest posterior probabilities in this lineage (49.7%) and there was a nearly 
equally high probability that this window was linked to a soft sweep (43.8%). Clearly, 
additional work is needed to understand the potential role of IDO2 in Pan evolution. 
Schmidt et al. (2019) recently reported that three chemokine receptor genes (CCR3, 
CCR9, and CXCR6) had a significant number of highly differentiated SNPs in central 
chimpanzees. We could evaluate all three of these genes in our analysis but only one 
fell within a candidate sweep window: CXCR6. The window containing this gene was 
confidently called as a soft sweep with a posterior probability of 85.5%. It is not 
known whether SIVcpz uses CXCR6 to enter chimpanzee host cells 
(Wetzel et al. 2018). However, multiple lines of evidence for selection either at this 
locus or within the window overlapping this gene prompt a closer examination of this 
genomic region. Finally, TRIM5 fell within a hard sweep window in central 
chimpanzees. TRIM5 is a well-known retrovirus restriction factor that appears subject 
to ancient, multi-episodic positive selection in primates (Sawyer et al. 2005).    
 Recent attention has focused on admixture between lineages in the genus Pan 
and the potential adaptiveness of introgressed genomic elements. de Manuel and 
colleagues (2016) identified 221 genes that fell within putatively introgressed 
elements in central chimpanzees from admixture with bonobos. Some of this 
admixture is estimated to have occurred < 200 ka, thus within the timeframe that the present 
analysis can detect selective sweeps. While we could not evaluate six of these 221 
genes, five fell within candidate sweep regions in central chimpanzees from our 
study: CDK8, EIF4E3, GRID2, PTPRM, and TRIM5. As described above, TRIM5 was 
unique to central chimpanzees. We found CDK8 in sweep windows for bonobos, 
eastern chimpanzees, and Nigeria-Cameroon chimpanzees. In humans, CDK8 
mutations have been associated with multiple phenotypic effects including hypotonia, 
behavioral disorders, and facial dysmorphism (Calpena et al. 2019). We also 
identified EIF4E3 in candidate sweeps for bonobos whereas GRID2 and PTPRM were 
found in eastern chimpanzees. EIF4E3 is a translation initiation factor (Osborne et al. 
2013) while PTPRM is a member of the protein phosphatase family (PTP) and has 
multiple functions including cell proliferation and differentiation (Sun et al. 2012). 
GRID2 generates ionotropic glutamate receptors and mutations have been associated 
with abnormalities of the cerebellum (Lalouette et al. 1998). 
 The gene ontology analysis produced only two statistically significant terms, 
nervous system development and central nervous system development, for a single 
Pan lineage: central chimpanzees. While cognitive and neurological differences are 
widely considered to differentiate bonobos and chimpanzees (e.g., Rilling et al. 2012; 
Stimpson et al. 2016; Staes et al. 2019), we are unaware of any studies that investigate 
variation among chimpanzee subspecies that may explain enrichment for nervous 
system and central nervous system development related genes specifically in central 
chimpanzees. We note that compared to other gene ontology analyses, our level of 
enrichment is quite low. While we excluded a large number of genes from our 
analysis due to poor coverage, our use of a custom background should increase, rather 
than decrease, statistical power.  
 The results from our analysis should be interpreted with some caution. First, 
while our classifiers achieved a high degree of accuracy, it is possible that some 
selective sweeps in each lineage were not detected or regions were incorrectly 
identified as such (Tables S2 - S5). We also note that we did not model small 
selection coefficients (s < 0.02) as we could not accurately classify sweeps under 
weak selection, which may be the result of the large window size (1.1 Mb) used here. 
One consequence may be that if weakly beneficial hard sweeps are present in the 
empirical data, they may have been sometimes classified as soft (Harris et al. 2018). 
Nonetheless, our classifiers were overall quite good at identifying moderately selected 
hard and linked-hard sweeps with both at approximately 95% accuracy across all 
lineages. Neutral and linked-soft regions were the most difficult to recognize with 
neutral regions typically being classed as soft-linked when they did not appear 
neutral. This suggests that the neutral portion of the genome for each lineage is 
slightly underestimated here. Finally, some moderately selected soft sweeps were 
identified as hard sweeps in each of our classifiers, suggesting that some portion of 
identified hard sweeps in each lineage are, in fact, soft sweeps. The low false positive 
rates demonstrate the overall accuracy of the observed genomic patterns (i.e., the 
proportion of hard and soft sweeps) for these taxa. However, this point underscores 
the need to conduct subsequent analyses of the candidate regions and genes to 
confirm the proposed mode of adaptation and investigate any functional 
consequences of that adaptation. In the ‘era of -omics’, the generation of candidate 
regions for any type of selection across populations and species appears to 
overwhelmingly outpace the confirmation of such patterns. Avenues of research that 
investigate these candidate genes in more detail are thus well poised to provide a 
deeper and more accurate understanding of lineage-specific adaptations.  
 Second, background selection, the loss of variation at linked neutral sites due to purifying 
selection against deleterious alleles, can potentially mimic patterns of selective sweeps 
and thus may impact the results of this study (Charlesworth et al. 1993). We did not 
explicitly model background selection in our analysis; however, evidence from 
simulations in various taxa demonstrates that this form of selection does not 
substantially increase the rate of false positives in selective sweep analyses (Schrider 
and Kern 2017; Schrider 2020). Further, Nam et al. (2017) considered the effect of 
background selection on genomic diversity in extant apes, including all five Pan 
lineages, and note that background selection alone does not produce the observed 
diversity reduction near genic regions in these lineages. While background selection 
may not largely affect certain selective sweep analyses, it may impact estimations of 
demography that are inferred using PSMC/MSMC approaches (Johri, Riall, et al. 
2020; Johri, Charlesworth, et al. 2020). The demographic strings calculated from 
PSMC used in this analysis also broadly agree in population size shape with other 
demographic estimates generated using other methods (e.g., Becquet and Przeworski 
2007; Hey 2010); therefore, background selection is unlikely to affect the demographic 
models used in this analysis. Yet, this issue should be strongly considered in future studies 
where demography is only inferred from PSMC/MSMC.      
 Further, sampling bias can reduce the accuracy of identifying selective 
sweeps. If multiple haplotypes are present in a population but only individuals sharing 
one haplotype are sampled, then the sweep would be classified as a hard sweep when 
it is a soft sweep. However, this scenario would only underestimate the degree of 
recent adaptation from soft sweeps. Therefore, if this sampling bias is present in this 
analysis, then soft sweeps may predominate in recent Pan evolution to an even larger 
degree than described here. Population structure adds further complications to the 
classification of hard sweeps. Parallel adaptation produces multi-origin soft sweeps at 
the global population level that would appear to be hard in local populations, although 
even local samples may sometimes appear to be soft sweeps (Ralph and Coop 2010). 
Thus, if samples stemmed from one or few local populations then global soft sweeps 
may be misclassified as hard. A previous analysis estimated the geographic origin of 
individuals used in this analysis (de Manuel et al. 2016). These authors found that 
individuals from both eastern and central chimpanzee populations were sampled from 
multiple countries across the geographic range for both subspecies. Therefore, any 
hard sweeps detected in these populations are likely accurate at the subspecies level. 
The precise geographic origin could not be assessed for any of the bonobos or all of the 
Nigeria-Cameroon chimpanzees used in this analysis (de Manuel et al. 2016). As 
such, sampling or geographic bias may partially explain the high degree of hard 
sweeps observed in Nigeria-Cameroon chimpanzees, if they were sampled from a 
smaller geographic area than the other subspecies. We encourage future studies to 
consider this potential bias when hard sweeps are encountered in existing data and 
during study design.  
 This analysis focuses on signatures of positive selection at single loci. 
However, there is theoretical and empirical evidence that a number of adaptive traits 
have a complex, multilocus architecture (Pritchard et al. 2010; Yang et al. 2017; 
Bergey et al. 2018). For these polygenic traits, shifts in the physical or social 
environment might result in allele frequency changes at many loci, few to none of which, 
according to models, would reach fixation (Pritchard et al. 
2010). This may, in part, explain why hard sweeps appear to be rare in humans and 
other species if it represents a dominant mode of adaptation in these taxa. 
Unfortunately, at this point, we lack the data and methods to investigate the extent of 
polygenic selection across the genome in many non-model taxa such as Pan. Another 
factor to consider is dominance. Here, we assumed advantageous alleles were 
codominant; however, there is evidence that dominance may influence patterns of 
selective sweeps when variants occur via de novo mutation or recurrent mutation 
(Hartfield and Bataillon 2020). It is also worth noting that this analysis 
explicitly focused on modelling very recent completed selective sweeps. Another 
future avenue of study in these lineages is the identification of incomplete or partial 
sweeps using existing approaches (Ferrer-Admetlla et al. 2014; Vy and Kim 2015) as 
well as explicitly modelling both incomplete and complete sweeps to address 
potential “temporal misclassification” (Zheng and Wiehe 2019). 
 Finally, while our approach to identifying hard and soft sweeps is a logical 
first step, future work should consider sweeps within subspecies to assess population-
level (i.e., local), rather than lineage-specific (i.e., global) adaptations. This is 
underscored by the extensive phenotypic variation among chimpanzees, particularly 
that of behavioral variation, which includes key characteristics that are often used to 
dichotomize bonobos and chimpanzees (Wilson et al. 2014). Further investigation is 
also clearly warranted in bonobos, whose overall phenotypic variation is likely 
underappreciated compared to chimpanzees (Hohmann and Fruth 2003b; Sakamaki et 
al. 2016; Beaune et al. 2017; Wakefield et al. 2019).  
 This study highlights the importance of changes in physical and/or social 
environment via soft selective sweeps in the recent evolution of our closest living 
relatives, chimpanzees and bonobos. Our results also yield further support for the 
ubiquity of soft, rather than hard, sweeps in adaptation. We contribute candidate 
regions and genes that may help identify unique phenotypes in each Pan lineage. Our 
findings also prompt many new questions including the estimation of selection 
strength coefficients and the degree of haplotypic diversity in candidate sweep 
regions. While our study focuses on these lineages broadly, this point also 
underscores the need for high-coverage genomic data collected using non-invasive 
methods at more local geographies.   
 
CHAPTER IV 
ESTIMATION OF PAN DEMOGRAPHY FROM SITE PATTERNS  
 
 Frances White, Alan Rogers, Timothy Webster, and I conceived of this 
analysis. The assembly and mapping of the genomic data in this analysis was 
conducted by Timothy Webster. Alan Rogers provided some code for this analysis. I 
performed the data analyses and wrote the initial draft of the manuscript. Frances 
White, Alan Rogers, Timothy Webster, and I edited the manuscript. 
 
Introduction 
 The study of hybridization and admixture has a deep history, particularly for 
plants. This research not only contributes to our broader understanding of 
evolutionary processes but can shed light on past environmental conditions or 
population ranges that facilitated such admixture. Further, introgression can introduce 
novel advantageous alleles into a population on which positive selection can act 
(Hedrick 2013). This adaptive introgression is potentially faster than positive 
selection acting on de novo mutations, although it may be slower than adaptation from 
standing genetic variation (Hedrick 2013). Recent work using whole genome 
sequencing data points to the increasing ubiquity of introgression in the evolutionary 
history of large mammals, including hominins (Wall and Hammer 2006; Browning et 
al. 2018; Villanea and Schraiber 2019; Gokcumen 2020; Rogers et al. 2020), 
elephants (Palkopoulou et al. 2018), and bears (Cahill et al. 2015).  
 Our closest living relatives, bonobos (Pan paniscus) and chimpanzees (P. 
troglodytes), have long been examined for genomic signatures of admixture. These 
species can hybridize in captivity (Vervaecke and Van Elsacker 1992) but wild 
populations are completely separated by the Congo River. Chimpanzees are also poor 
swimmers (Angus 1971) and are afraid of water (Kano 1992). Interestingly, bonobos 
do not share this fear of water (Kano 1992) and are known to forage in swamps 
(Uehara 1990; Hohmann et al. 2019). The geographic distribution of Pan prompted 
early speculation that the formation of this river, which was dated at the time to ~ 1.5 
- 3.5 Ma, coincided with or prompted speciation in this genus (Horn 1979; Beadle 
1981; Myers Thompson 2003). However, some early genetic studies of Pan 
speciation consistently dated their divergence to younger than 1.5 Ma (Won and Hey 
2005; Becquet and Przeworski 2007; Caswell et al. 2008, but see Stone et al. 2002; 
Yu et al. 2003; Wegmann and Excoffier 2010). Further, it now appears that the Congo 
River is considerably older than previously thought, possibly up to 34 Ma (Leturmy et 
al. 2003; Lucazeau et al. 2003; Anka et al. 2010). Admixture between bonobos and 
chimpanzees would thus require a connection between the north and south banks of 
the Congo River unless a Pan population ranged south enough to travel around the 
headwaters of the Congo River, although the distance makes this scenario less likely 
assuming the historical ranges of Pan are identical to their current ranges (Takemoto 
et al. 2015). We note that the impermeability of this geographic barrier is partially a 
function of river discharge, which can vary widely in both space and time. There is 
some evidence that river discharge has varied in the recent past, which could create an 
opportunity for bonobos and chimpanzees to diverge and subsequent opportunities for 
gene flow (Takemoto et al. 2015). This scenario is the most plausible given the 
current evidence. Such riverine barriers also separate three of the four chimpanzee 
subspecies while western chimpanzees occur west of a large forest-savannah mosaic 
known as the Dahomey Gap (Lester et al. 2021). These rivers also likely experience 
variation in discharge, which may facilitate introgression between geographically 
proximate subspecies.  
 Early analyses for gene flow in Pan yielded inconsistent results and did not 
include data from Nigeria-Cameroon chimpanzees. Won and Hey (2005) described 
evidence of gene flow from western chimpanzees into central chimpanzees which was 
reported in subsequent studies (Caswell et al. 2008; Hey 2010; Wegmann and 
Excoffier 2010). Western chimpanzees may have also admixed with other lineages 
earlier in Pan evolutionary history. For example, introgression may have occurred 
from western chimpanzees into the ancestor of eastern and central chimpanzees (Hey 
2010). In contrast, Becquet and Przeworski (2007) did not find evidence of gene flow 
in Pan, with the possible exception of gene flow between bonobos and eastern chimpanzees. 
Admixture between bonobos and both the ancestor of eastern and central chimpanzees 
and the ancestor of all chimpanzees has also been described (Wegmann and Excoffier 
2010). Hey (2010) also noted eastern-central chimpanzee gene flow, suggesting it 
occurred from central chimpanzees into eastern chimpanzees. One model from this 
study also indicated gene flow between eastern and western chimpanzees (Hey 2010). 
This scenario is puzzling given the current Pan biogeography, but this assumes that 
the ranges of the four subspecies have remained relatively static since the chimpanzee 
common ancestor.  
  Admixture inference from whole genome sequences has replicated many of 
these conflicting earlier results. Prado-Martinez et al. (2013) conducted the first whole 
genome analysis to consider gene flow across all five lineages and noted admixture 
between eastern and Nigeria-Cameroon chimpanzees as well as eastern and western 
chimpanzees. Given the large number of demographic parameters needed to be 
estimated under a complex evolutionary history (i.e., many introgression events), de 
Manuel et al. (2016) estimated parameters using two sets of populations: one that 
included all lineages except for western chimpanzees and one that included all 
lineages except for Nigeria-Cameroon chimpanzees. These authors found 1) 
additional evidence for bonobo introgression into chimpanzees, 2) evidence of earlier 
gene flow between bonobos and chimpanzees, and 3) evidence for admixture between 
chimpanzee lineages. The most robust evidence for introgression from bonobos into 
chimpanzees suggested two events: one that occurred between 200 and 550 ka into 
the ancestor of eastern and central chimpanzees and a second event < 180 ka, after the 
eastern and central chimpanzee split (de Manuel et al. 2016). Kuhlwilm et al. (2019) 
further detected introgression from an extinct Pan species into bonobos between 377 
and 1,627 ka. This ghost lineage was estimated to diverge from the bonobo and 
chimpanzee common ancestor > 3 Ma.  
 Here, we apply a recently developed method to compare previously proposed 
models for Pan evolutionary history and estimate 1) divergence times, 2) effective 
population sizes, and 3) the timing and degree of introgression. This approach 
employs site pattern frequencies to infer deep population history by simultaneously 
estimating all parameters. There are a few advantages to this approach compared to 
other commonly used methods for demography. First, within-population variation is 
ignored and recent changes in population size therefore cannot affect analyses. This 
results in fewer parameters that must be estimated. Second, the uncertainty introduced 
by statistical identifiability (i.e., when more than one model fits the data well) that is 
commonly encountered when ascertaining complex demographies can be incorporated 
into confidence intervals via model averaging.    
 
Methods 
Genomic Data 
 We retrieved raw short read data on bonobos and all four chimpanzee 
subspecies from the Great Ape Genome Project (GAGP) (Prado-Martinez et al. 2013). 
This dataset contained high coverage genomes 
(https://github.com/brandcm/Dissertation: File S0: Figures S1, S2) from 13 bonobos 
(P. paniscus), 18 central chimpanzees (P. troglodytes troglodytes), 19 eastern 
chimpanzees (P. t. schweinfurthii), 10 Nigeria-Cameroon chimpanzees (P. t. ellioti), and 11 
western chimpanzees (P. t. verus). We retrieved short read data on a high-coverage 
human female, HG00513, collected as part of the 1000 Genomes Project (Auton et al. 
2015) to use as an outgroup sequence to determine ancestral alleles per locus 
(Biosample ID: SAME123526).  
 
Read Mapping and Variant Calling 
 Initial quality assessments in fastqc (Andrews 2010) and multiqc (Ewels et al. 
2016) indicated a number of quality issues, including failed runs, problematic tiles, 
and substantial variation in base quality. We removed adapters and trimmed all reads 
with BBduk (https://sourceforge.net/projects/bbmap/). For trimming, we used the 
parameters “ktrim=r k=21 mink=11 hdist=2 qtrim=rl trimq=15 minlen=50 maq=20” 
for all reads and added “tpo and tpe” for paired reads.  
 We used XYalign (Webster et al. 2019) to create versions of the chimpanzee 
reference genome, panTro6 (Kronenberg et al. 2018), for male- and female-specific 
mapping. Specifically, the version of the reference for female mapping has the Y 
chromosome completely masked, as its presence can lead to mismapping (Webster et 
al. 2019). We then mapped reads with BWA MEM (Li 2013) and used SAMtools (Li 
et al. 2009) to fix mate pairs, sort BAM files, merge BAM files per individual, and 
index BAM files. We used Picard (Broad Institute 2018) to mark duplicates with 
default parameters, before calculating BAM statistics with SAMtools. We next 
measured depth of coverage with mosdepth (Pedersen and Quinlan 2018), removing 
duplicates and reads with a mapping quality less than 30 for calculations. 
 We used GATK4 (Poplin et al. 2018) for joint variant calling across all 
samples. We used default settings for all steps—HaplotypeCaller, CombineGVCFs, 
and GenotypeGVCFs—with three exceptions. First, we turned off physical phasing 
for computational efficiency and downstream VCF compatibility with filtering tools. 
Second, because multiple samples in this dataset suffer from contamination from 
other samples both within and across taxa (Prado-Martinez et al. 2013), we employed 
a contamination filter to randomly remove 10% of reads during variant calling. This 
should have the effect of reducing confidence in contaminant alleles. Finally, we 
output non-variant sites to allow equivalent filtering of all sites in the genome and 
more accurate assessments of callability. 
 The above quality control, assembly, and variant calling steps are all contained 
in an automated Snakemake workflow (Köster and Rahmann 2012) available on GitHub 
(https://github.com/thw17/Pan_reassembly). The repository also contains a Conda 
environment with all software versions and origins, most of which are available 
through Bioconda (Grüning et al. 2018). 
 
Variant Filtration  
 We considered only autosomes for this analysis as the X and Y chromosomes 
violate many of the assumptions for the following methods (Webster and Wilson 
Sayres 2016). We also excluded unlocalized scaffolds (N = 4), unplaced contigs (N = 
4,316), and the mitochondrial genome from any downstream analyses. Additional 
filtration steps were completed using bcftools (Li 2011); command line inputs are 
provided in parentheses. We first normalized variants by joining biallelic sites and 
merging indels and SNPs into a single record (“norm -m +any”) using the panTro6 
FASTA. We only included SNPs (“-v snps”) that were biallelic (“-m2 -M2”). On a 
per sample basis within each site, we marked genotypes where sample read depth was 
less than 10 and/or genotype quality was less than 30 as uncalled (“-S . -i FMT/DP ≥ 
10 && FMT/GQ ≥ 30”). To ensure that missing data did not bias our results, we 
further excluded any sites where less than ~ 80% of individuals (N = 56) were 
confidently genotyped (“AN ≥ 112”). We also removed any positions that were 
monomorphic for either the reference or alternate allele (“AC > 0 && AC ≠ AN”). 
While lack of or low coverage at a locus is problematic, loci with excessive coverage 
are also of concern. These sites may yield false heterozygotes that are usually the 
result of copy number variation or paralogous sequences (Li 2014). As our data 
exhibit a high degree of inter-individual and inter-chromosomal variation in mean 
coverage (Brand et al. 2021), we applied Li's (2014) recommendation for a maximum 
depth filter (d + 4√d) to the mean chromosomal coverage of the individual in our 
sample (Pan or Homo) with the highest coverage and excluded any loci that exceeded 
this value (“filter -e FMT/DP > d + 4√d”) (https://github.com/brandcm/Dissertation: 
File S2). These filtration steps yielded between 2,413,791,600 and 2,493,198,004 
SNVs for our downstream analyses (https://github.com/brandcm/Dissertation: File 
S25). 
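
The two numeric thresholds described above can be checked with a short calculation. The following is a minimal sketch rather than part of the published workflow: the function names are hypothetical, and the mean depth of 30x is an illustrative value, not a coverage estimate from our data.

```python
import math


def min_allele_number(n_individuals: int, ploidy: int = 2) -> int:
    """Smallest AN when n_individuals diploid samples carry called genotypes."""
    return n_individuals * ploidy


def max_depth_cutoff(mean_depth: float) -> float:
    """Maximum-depth threshold of Li (2014): d + 4 * sqrt(d)."""
    return mean_depth + 4 * math.sqrt(mean_depth)


# 56 confidently genotyped individuals correspond to the AN threshold of 112.
an_threshold = min_allele_number(56)

# Hypothetical mean chromosomal depth of 30x (an assumption for illustration).
example_cutoff = max_depth_cutoff(30.0)
```

Under this rule the cutoff grows only slightly faster than the mean depth itself, so sites with roughly twice the expected coverage are excluded even in high-coverage samples.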
 After filtration, we generated reference allele frequency (RAF) files for each 
population.  
 
Null Model of Demography 
 We first constructed a null model with all five populations and no 
introgression events. As the topology of this model is well supported, we use it in 
each of the alternative models (see below). Demographic modelling was conducted 
using Legofit (Rogers 2019; Rogers et al. 2020; Rogers 2021). Legofit requires at 
least one “fixed” parameter to set the molecular clock, so we chose to set the 
divergence between bonobos and chimpanzees to the median value as estimated from 
de Manuel et al. (2016). This value (1.88 Ma) was input in generation units (75,200), 
based on a generation time of 25 years (Langergraber et al. 2012). While each of the 
remaining nodes were set with the median estimate from de Manuel et al. (2016), we 
designated these parameters to be “free” in order to generate parameter estimates. We 
also estimated population sizes by setting them to free and using rough estimates as 
initial values.  
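
The conversion of the fixed divergence time into generation units is simple arithmetic, sketched below. The function name is hypothetical; the 25-year generation time and the 1.88 Ma divergence are the values stated above.

```python
def years_to_generations(years: float, generation_time: float = 25.0) -> float:
    """Convert a time in years into generations for a given generation time."""
    return years / generation_time


# Bonobo-chimpanzee divergence of 1.88 Ma with a 25-year generation time.
pan_split_generations = years_to_generations(1.88e6)
```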
 
Alternative Demography Models 
 We then constructed a set of models using all subsets of previously described 
introgression events. We did not include events that would be uninformative for site 
pattern analysis (e.g., admixture between eastern and central chimpanzees, although 
such an event or state would simply broaden the confidence intervals for the 
divergence time parameter). We initially included four introgression events: α, β, γ, 
and δ. α denotes introgression from a ghost Pan lineage into bonobos (Kuhlwilm et al. 
2019). β denotes introgression from bonobos into the ancestor of eastern and central 
chimpanzees (de Manuel et al. 2016). γ denotes introgression from the ancestor of 
eastern and central chimpanzees into Nigeria-Cameroon chimpanzees (de Manuel et 
al. 2016). δ denotes introgression from bonobos into central chimpanzees (de Manuel 
et al. 2016). For models with multiple admixture events, we used the estimated order 
of events from oldest to youngest when naming the model (de Manuel et al. 2016; 
Kuhlwilm et al. 2019). We also reversed the direction of introgression for γ (denoted γr) and 
considered whether this may yield a better fitting model. We did so because site 
patterns will reflect the net admixture such that bidirectional gene flow will yield a 
positive value if the direction of gene flow is correctly specified. As γ may have been 
bidirectional (de Manuel et al. 2016) we included models where the net introgression 
was larger in either direction (from Nigeria-Cameroon chimpanzees into the ancestor 
of eastern and central chimpanzees and vice-versa) to ensure that we considered the 
full range of possible scenarios. Finally, we also considered gene flow from western 
chimpanzees into the ancestor of eastern and central chimpanzees (defined as ε), as 
suggested by demographic analyses of STRs (Hey 2010; Wegmann and Excoffier 
2010). As the timing of this event relative to γ is unclear, we considered scenarios 
where either γ or ε was first and the other followed. The order of these events is 
reflected in the model name. In total, we considered 42 demographic models 
including the null model (Figure 4). 
 
Figure 4. Demographic model and introgression events considered in this analysis. 
Ghost refers to the extinct Pan lineage proposed by Kuhlwilm et al. (2019). 
  
Analysis 
 We used Legofit (Rogers 2019; Rogers et al. 2020; Rogers 2021) to estimate 
demographic history in the five extant lineages of bonobos and chimpanzees. We first 
used the “sitepat” function to 1) call ancestral alleles, 2) tabulate site patterns from the 
RAF files including singletons and 3) generate 50 bootstrap replicates. Site patterns 
are calculated by sampling one haploid genome from each population and the 
contribution of a given site pattern is the probability that a subsample would exhibit 
this site pattern.  
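
The contribution of a single locus to each site pattern can be illustrated as follows. This is a minimal sketch and not Legofit's own code: it assumes that one haploid genome is drawn independently from each population, that patterns in which no population or every population carries the derived allele are skipped, and that patterns are labelled by joining population names with colons. The function name, labels, and frequencies are hypothetical.

```python
from itertools import combinations
from math import prod


def site_pattern_contributions(derived_freq):
    """Probability of each site pattern at one locus.

    derived_freq maps a population label to the derived-allele frequency in
    that population's sample. A site pattern is the set of populations whose
    sampled haploid genome carries the derived allele.
    """
    pops = sorted(derived_freq)
    contributions = {}
    # Pattern sizes 1 .. n-1: skip "no carriers" and "all carriers".
    for k in range(1, len(pops)):
        for carriers in combinations(pops, k):
            contributions[":".join(carriers)] = prod(
                derived_freq[pop] if pop in carriers else 1.0 - derived_freq[pop]
                for pop in pops
            )
    return contributions


# Hypothetical derived-allele frequencies for three populations at one SNV.
example = site_pattern_contributions({"b": 1.0, "c": 0.5, "e": 0.0})
```

In this example the locus contributes 0.5 to the singleton pattern "b" and 0.5 to the pattern "b:c", and nothing to any other pattern. Summing such contributions across loci gives the tabulated site pattern counts.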
 Once site patterns have been tabulated, Legofit uses these data and a 
population model (described above and stored as a .lgo file) to estimate parameters by 
maximizing the composite likelihood via the “legofit” function. Full likelihood is not 
maximized because information on linkage disequilibrium is not considered. Legofit 
employs differential evolution (DE) to maximize composite likelihood. Uncertainty is 
measured via moving-blocks bootstrap. Loci that are linked are not statistically 
independent, so Legofit resamples blocks of 500 SNVs.  
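
A generic moving-blocks bootstrap can be sketched as below. This illustrates the resampling idea only and is not Legofit's implementation, which may differ in detail (for example, in how chromosome boundaries are handled). The function name and the toy input are hypothetical.

```python
import random


def moving_blocks_bootstrap(values, block_len, rng):
    """Return one bootstrap replicate of values, resampled in contiguous blocks."""
    n = len(values)
    if not 0 < block_len <= n:
        raise ValueError("block_len must be between 1 and len(values)")
    n_starts = n - block_len + 1  # number of overlapping candidate blocks
    replicate = []
    while len(replicate) < n:
        start = rng.randrange(n_starts)
        replicate.extend(values[start:start + block_len])
    return replicate[:n]  # trim the last block so the length matches the input


rng = random.Random(1)
snv_values = list(range(2000))  # hypothetical per-SNV values in genome order
replicate = moving_blocks_bootstrap(snv_values, block_len=500, rng=rng)
```

Because each block is copied intact, neighboring SNVs stay together in the replicate, which preserves the local dependence that resampling individual SNVs would destroy.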
 Legofit can be run using one of two algorithms: deterministic and stochastic 
(Rogers 2021). For computational efficiency, we employed the deterministic 
algorithm in all but our two most complex models (αβγδε and αβεγδ) where we used 
the stochastic algorithm. References below to “modest precision” apply only to the 
stochastic algorithm.  
 We ran the “legofit” function per demographic model on our real data and 
each of the 50 bootstrap replicates. We conducted this in several stages following 
Rogers et al. (2020). In Stage 1, points in the DE swarm were scattered widely across 
parameter space and the objective function was evaluated with modest precision. As 
some legofit jobs may converge on different local maxima of the composite likelihood 
surface, each of the legofit jobs wrote its own swarm of points to a state file. In Stage 
2, each legofit job initialized its DE swarm by reading all of the state files produced in 
Stage 1, enabling legofit to choose among local optima discovered in Stage 1. The 
evaluation of the objective function was done to high precision in Stage 2. At this 
point, we used the “pclgo” function to re-express free variables as principal 
components. Some free parameters may be tightly correlated and this can result in 
broader confidence intervals because there are fewer dimensions than parameters. 
This issue can be addressed by reducing the dimension of the parameter space. Our 
early analyses used a value of 0.001 (“--tol 0.001”) such that principal components 
were only retained if they explained >  0.001% of the variance. However, as the 
exclusion of dimensions may introduce bias, we retained the full dimension. Re-
expression of dimensions as principal components can also improve model fit because 
it allows legofit to operate on uncorrelated dimensions (Rogers 2021). This step 
produces a new model file (.lgo file). We then repeated Stages 1 and 2 as Stages 3 and 
4 using the new .lgo file. 
 We tested for potential bias in the parameter estimates from our best fitting 
model by generating simulations using msprime (Kelleher et al. 2016) and fitted those 
simulated data to the δ model. We used parameter point estimates from our fitted 
model, the previous fixed time parameter (75,200 generations or 1.88 Ma for the Pan 
common ancestor), and used median effective population sizes from Prado-Martinez 
et al. (2013) for lineages where we did not have an estimate for Ne from our model (P. 
t. ellioti, P. t. schweinfurthii, and P. t. verus). We simulated 1 x 10^4 chromosomes, 
each 2 x 10^6 bp in length, and used a mutation rate of 1.4 x 10^-8 and a recombination 
rate of 1 x 10^-8. This was repeated to generate a total of 50 simulated data sets to 
which we fit the δ model using all four stages of the deterministic approach described 
above. We then visually compared the model’s point estimates to these simulated 
bootstraps to assess parameter bias. 
 All models and scripts for this analysis are available on GitHub 
(https://github.com/brandcm/Pan_Demography). Many figures were made in R, 
version 3.6.3 (R Core Team 2020) using ggplot2, version 3.3.3 (Wickham 2016) and 
correlations between the estimated parameters for the best fit model were visualized 
using ‘corrplot’ (Wei and Simko 2021). 
 
Data Availability 
 The raw data underlying this article are previously published (Prado-Martinez 
et al. 2013; de Manuel et al. 2016) and are available from the Sequence Read Archive 
(PRJNA189439 and SRP018689) and the European Nucleotide Archive 
(PRJEB15086). 
 
Results 
 Legofit aligned 2,366,070,805 loci across all six lineages and determined the 
ancestral allele for 52,809,700 sites. These sites were used to determine site pattern 
frequencies in the data and 50 bootstrap replicates (Figure 5). 
 
 
Figure 5. Observed site patterns. The length of the vertical line at each point represents 
the 95% CI.  
  
 After comparing models, we found a single model that best fit the observed 
site patterns: model δ. This model includes a single episode of introgression from 
bonobos into central chimpanzees. It had small residuals and exhibited the smallest 
bepe value (1.3 x 10^-5) and a booma weight of 1 
(https://github.com/brandcm/Dissertation: File S26), therefore model averaging was 
not invoked. 
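
To make the meaning of a model weight of 1 concrete, the sketch below computes weights under the assumption that a model's weight is the fraction of data sets (the real data plus each bootstrap replicate) in which it has the lowest bepe value. This winner-fraction rule is our simplified reading of bootstrap model averaging, not a transcription of the booma source code, and the bepe values shown are hypothetical rather than results from this study.

```python
from collections import Counter


def winner_fraction_weights(bepe_by_model):
    """Weight each model by the fraction of data sets in which its bepe is lowest.

    bepe_by_model maps a model name to a list of bepe values, one per data
    set, with data sets in the same order for every model.
    """
    models = list(bepe_by_model)
    n_sets = len(bepe_by_model[models[0]])
    if any(len(values) != n_sets for values in bepe_by_model.values()):
        raise ValueError("every model needs one bepe value per data set")
    wins = Counter()
    for i in range(n_sets):
        best = min(models, key=lambda m: bepe_by_model[m][i])
        wins[best] += 1
    return {m: wins[m] / n_sets for m in models}


# Hypothetical bepe values for three data sets and two models.
weights = winner_fraction_weights({
    "null": [3.0e-5, 2.9e-5, 3.1e-5],
    "delta": [1.3e-5, 1.4e-5, 1.2e-5],
})
```

Under this rule, a weight of 1 means that a single model is preferred in every data set, so averaging across models would leave its estimates unchanged.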
 Point estimates and confidence intervals for the δ model parameters are 
provided in Table 6. This model estimated the age for the common ancestor of all 
chimpanzees to be 895 ka (95% CI: 892 - 898 ka), while the ancestor for western and 
Nigeria-Cameroon chimpanzees dates to 183 ka (95% CI: 178 - 195 ka) and the 
ancestor of eastern and central chimpanzees was dated to 142 ka (95% CI: 136 - 152 
ka). The model also estimated effective population size to vary considerably over time 
with approximately 37,000 individuals at the time of Pan divergence (95% CI: 36,849 
- 37,011) and ~17,100 chimpanzees (95% CI: 17,029 - 17,203) immediately prior to 
the divergence of the chimpanzee common ancestor. Both lineages subsequently 
increased in size. We found that ~ 2.3% (95% CI: 2.26 - 2.36%) of the central 
chimpanzee genome was introgressed from bonobos and dated this event to 
approximately 71 ka.   
 
Table 6. Model parameter estimates. δ = introgression from P. paniscus into P. t. 
troglodytes, ec = ancestor of P. t. schweinfurthii and P. t. troglodytes, nw = ancestor 
of P. t. ellioti and P. t. verus, ecnw = common ancestor of all P. troglodytes lineages, 
becnw = Pan common ancestor. 
 
Parameter type    Parameter   Point estimate   Lower bound   Upper bound
Admixture         δ           0.023298         0.0226364     0.0236331
Population Size   b           605000           585000        630000
                  c           1002.24          1.42472       3055.845
                  ec          166113           162911        166727
                  nw          75548            74368.5       76928.5
                  ecnw        17124.5          17029.25      17203
                  becnw       36937.2          36849.6       37010.55
Time              δ           71127.25         68233.5       76227.75
                  ec          142254.5         136467        152455.75
                  nw          183462.25        177824.75     195139.5
                  ecnw        894540           892297.5      898012.5
 
 
 After simulating data using the best fitting model, we found minimal bias in 
our parameter estimates for admixture and the effective population size of older 
events (Figure 6). Population sizes for bonobos and central chimpanzees were under- 
and over-estimated, respectively, whereas the population size for the ancestor of 
eastern and central chimpanzees was slightly under-estimated. Point estimates for 
divergence times exhibited some bias to older ages although the age for the common 
ancestor of all chimpanzees agreed with the simulated data.   
 
Figure 6. Parameter estimate bias. The orange points represent point estimates for the 
parameters from the δ model. Open gray circles represent 50 values estimated by 
legofit using site patterns generated from data simulated with the δ model parameters 
using msprime. If the simulated data are < the point estimate, the point estimate is 
underestimated, while if the simulated data are > the point estimate, the point estimate 
is overestimated. δ = introgression from P. paniscus into P. t. troglodytes, ec = 
ancestor of P. t. schweinfurthii and P. t. troglodytes, nw = ancestor of P. t. ellioti and 
P. t. verus, ecnw = common ancestor of all P. troglodytes lineages, becnw = Pan 
common ancestor. 
 
 
 
Discussion 
  We evaluated previously proposed models for the evolutionary history of the 
genus Pan. While mitochondrial, Y-chromosomal, and nuclear DNA have yielded 
some consistent parameter estimates, many others remain imprecise or may suffer 
from bias. Our results suggest a simpler evolutionary history: the best fitting 
model only includes one introgression event from bonobos into central chimpanzees. 
As we did not explicitly model introgression events that would not affect site patterns 
(e.g., subsequent gene flow between recently diverged lineages), we cannot speak to 
that aspect of Pan demographic history. However, if such events occurred shortly 
after divergence, we would expect the resulting confidence interval to be quite large. 
This may be the case for the ancestors of both eastern and central as well as Nigeria-
Cameroon and western chimpanzees. Yet, this interval is quite small for the common 
ancestor of all chimpanzees.   
 We estimate that approximately 2.3% of central chimpanzee DNA is derived 
from bonobos and that this event dated to ~ 71 ka. To our knowledge, there are no data 
on the discharge of the Congo River for this time period. Presently, this part of the 
river is one of the deepest and widest sections, with little seasonal variation in 
discharge near Kinshasa (Takemoto et al. 2015). This suggests that direct contact 
between bonobos and central chimpanzees would be difficult; however, our result and 
those from others (de Manuel et al. 2016) strongly suggest that this contact occurred. 
Evidence of gene flow from bonobos into central chimpanzees is not only consistent 
with previous reports but is further evidenced by the possible adaptiveness of 
introgressed bonobo alleles in central chimpanzees, potentially related to reproduction 
in males (Nye et al. 2018). Identification of the sites with shared site patterns between 
these lineages identified here could be informative to confirm and expand candidate 
regions for adaptive introgression.  
 Our estimate of the time of divergence for the ancestor of eastern and central 
and the ancestor of Nigeria-Cameroon and western chimpanzees is similar to other 
past results (Becquet and Przeworski 2007; Hey 2010; Prado-Martinez et al. 2013; de 
Manuel et al. 2016 but see Wegmann and Excoffier 2010). Assessment of parameter 
bias suggests that the point estimates may be slightly overestimated but not 
considerably so. Our analysis did not find bias in our estimate of the age of the 
common chimpanzee ancestor. This particular parameter is much older than others 
(e.g., 544 - 633 ka (de Manuel et al. 2016)). While the phenotypic differences 
between chimpanzee subspecies are likely still emerging, previously described 
differences, particularly between eastern and western chimpanzees, may support an 
older divergence date for the chimpanzee common ancestor. The absence of a shared 
positive selection signal in chimpanzees, as described in Chapter II, also tentatively 
supports a deep divergence for common chimpanzees.  
 Estimates for population size largely support previous findings (Prado-
Martinez et al. 2013; de Manuel et al. 2016). Following divergence, the common 
ancestor of all chimpanzees experienced a period of decline. This was followed by 
substantial increases in both the ancestor of Nigeria-Cameroon and western 
chimpanzees, and particularly the ancestor of eastern and central chimpanzees, which 
we estimate to be approximately 76,000 and 166,000 individuals, respectively. The 
estimated Ne for each lineage suggests that each subspecies experienced a population 
decline after divergence with their common ancestor. However, two of our population 
size estimates are puzzling. We found a small population size for central chimpanzees 
at the time of introgression from bonobos and a large population size for bonobos at 
the same time. This may represent an instance of statistical identifiability where 
parameters are correlated, resulting in a broader confidence interval (Rogers 2019). 
However, neither parameter is tightly correlated with any others 
(https://github.com/brandcm/Dissertation: File S0: Figure S10). Further, the genetic 
diversity of both lineages does not support these numbers. Regardless of genetic 
diversity, it seems implausible that central chimpanzees would experience a bottleneck 
~ 70 ka and generate a recent Ne estimate of ~ 36,000 individuals. In fact, our analysis 
of bias in parameter estimates suggests that the present estimate of ~ 1,000 is a 
generous overestimate. There appears to be a more plausible explanation for the 
bonobo population size at the time of admixture. Such a high parameter value could 
be explained by geographic population structure (Nei and Takahata 1993). The 
geographic origin of the bonobo genomes used in this analysis is unknown. 
However, some structure has been inferred from cranio-dental morphology (Pilbrow 
and Groves 2013) and mitochondrial haplotypes (Kawamoto et al. 2013, but see 
Eriksson et al. 2004). Another potential line of support for structure in bonobos comes 
from the curious geographic restriction of bonobo malarial infection to individuals 
east of the Lomami River (Liu et al. 2017). Therefore, it is possible that our unusually 
large parameter estimate is driven by bonobo population structure.  
 We note that the parameters estimated from this analysis were generated by 
setting one fixed parameter (the Pan divergence date or Tbecnw) to set the molecular 
clock. We chose this parameter because Pan genomic data show greater consensus on 
its value than on other time parameters. However, the point estimate used in 
this analysis was the median of a range from de Manuel et al. (2016). Thus, if the true 
divergence date differs from that used here, our parameter estimates would change 
as well. Additional genomic data from bonobos and chimpanzees may yield more 
accurate estimates of this critical parameter.  
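To illustrate this dependence, the following minimal sketch rescales clock-dependent estimates under an alternative calibration. It assumes that time and population size estimates scale in direct proportion to the fixed divergence date, as expected when site pattern frequencies depend only on relative branch lengths, and that admixture fractions are unaffected. The function name, parameter labels, and all numeric values are hypothetical and are not results from this analysis.

```python
def rescale_estimates(estimates, fixed_used, fixed_alternative):
    """Rescale clock-dependent estimates when the fixed calibration changes.

    Assumes every estimate scales linearly with the calibration date.
    """
    if fixed_used <= 0 or fixed_alternative <= 0:
        raise ValueError("calibration values must be positive")
    factor = fixed_alternative / fixed_used
    return {name: value * factor for name, value in estimates.items()}


# Hypothetical values for illustration only (not estimates from this study).
example_estimates = {"T_example_ka": 900.0, "N_example": 10000.0}
rescaled = rescale_estimates(
    example_estimates,
    fixed_used=2000.0,         # hypothetical calibration used (ka)
    fixed_alternative=1500.0,  # hypothetical alternative calibration (ka)
)
print(rescaled)
```

Under this assumption, a younger calibration date shifts every time and population size estimate downward by the same factor, so relative comparisons among parameters are preserved even if absolute values change.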
 The site patterns of derived alleles in bonobos and chimpanzees confirm 
multiple aspects of their evolutionary history while offering new insights into other 
facets. We find support for a single introgression event from bonobos into central 
chimpanzees although the biogeography of this event remains difficult to explain. 
Collectively, the best fit demographic model is simpler than more recently proposed 
models. Finally, our results point to a deeper divergence time for common 
chimpanzees. Additional genomic and paleoenvironmental data would be immensely 
informative in deciphering the evolutionary history of our closest living relatives and 
may provide insight into the evolution of other taxa in this region during this time 
period, including humans.  
  
CHAPTER V 
CONCLUSION 
 
 Our closest living relatives are two species in the genus Pan: bonobos and 
chimpanzees. The phylogenetic proximity of these taxa to humans highlights their 
importance as models for human evolution. Studies of living bonobos and 
chimpanzees are essential to this goal. However, the virtual absence of a bonobo and 
chimpanzee fossil record means that genomic data provide the best window into their 
evolutionary past to better understand how bonobos and chimpanzees diverged and 
came to be the lineages we know today. This dissertation uses reassembled and 
remapped autosomal genomic data from all five Pan lineages to answer questions and 
test hypotheses about adaptation and demography in these apes.  
 The evolutionary history of chimpanzees and bonobos has provided many time 
points during which positive selection could drive the phenotypes observed today in 
all five lineages. In Chapter II, we leverage genomic data on synonymous and 
nonsynonymous loci that are polymorphic within lineages or divergent between Pan and 
humans to identify candidates under positive selection at deeper time scales. We also 
apply a modification of this test to increase statistical power for detecting selection 
candidates using genome-wide averaged parameters and inferring other evolutionary 
parameters for identified genes. The number of candidate genes for adaptation in 
each lineage ranged from 7 in western chimpanzees to 54 in central chimpanzees. 
Many candidates were unique to each lineage. Only one gene was unique to all 
chimpanzees, and another two genes were shared across all five lineages. 
Together, these candidates may have phenotypic impacts on various traits including 
the brain, immunity, musculature, reproduction, and the skeletal system. Analysis of 
individual exons may also provide insight into regions of the coding sequence that are 
under selection, and this analysis generated additional candidates for multiple lineages. We do 
not find evidence in support of the THV hypothesis, although a thyroid receptor gene 
emerged as a candidate in bonobos, which may contribute to their behavior and 
biology.  
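As an illustration of how counts of synonymous and nonsynonymous polymorphic and divergent sites can be contrasted, the following minimal sketch computes two standard summaries from a McDonald-Kreitman-style table: the neutrality index and the proportion of adaptive substitutions (alpha). This is a generic sketch rather than the exact procedure used in Chapter II; the function name and the counts are hypothetical.

```python
def mk_summary(dn, ds, pn, ps):
    """Return (neutrality index, alpha) for one gene.

    dn, ds: nonsynonymous and synonymous divergent (fixed) sites
    pn, ps: nonsynonymous and synonymous polymorphic sites
    """
    if min(dn, ds, pn, ps) <= 0:
        raise ValueError("all four counts must be positive for these ratios")
    neutrality_index = (pn / ps) / (dn / ds)
    alpha = 1.0 - neutrality_index  # equals 1 - (ds * pn) / (dn * ps)
    return neutrality_index, alpha


# Hypothetical counts for a single gene (illustration only).
ni, alpha = mk_summary(dn=12, ds=10, pn=3, ps=10)
print(f"neutrality index = {ni:.2f}, alpha = {alpha:.2f}")
```

A neutrality index below one (positive alpha) indicates an excess of nonsynonymous divergence relative to polymorphism, which is the pattern expected under positive selection. Genes with zero counts in any cell would need separate handling in practice.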
 While older positive selection may have driven key differences between 
bonobos and chimpanzees, recent selection likely also contributes to their similarities 
and differences. In Chapter III, we used supervised machine learning to assess the 
extent to which this adaptation is the result of de novo mutation or standing genetic 
variation. There remains considerable debate on whether positive selection at single 
loci is predominantly shaped by hard or soft sweeps. Previous empirical tests 
stemmed from data in humans, Drosophila, and HIV. We demonstrate that, as in 
humans, Drosophila, and some cases of HIV, soft sweeps are much more common 
than hard sweeps in the four of five lineages that we could examine. These results 
underscore the role of the physical and/or social environment in shaping Pan 
adaptations during the late Pleistocene. Most candidate sweeps were unique to each 
lineage although there was some overlap, particularly for soft sweeps. We found 19 
candidate soft sweep windows shared across all four lineages. While plentiful, the 
genes in these windows may be important in bonobo and chimpanzee phenotypes.  
 Considerable attention has been paid to estimating demography in these 
species. However, many of the past studies rely on parameter-heavy approaches and 
may suffer from parameter bias, particularly estimates of admixture. In Chapter IV, 
we consider previous models for the demographic history of Pan using site patterns. 
This approach allows for both parameter estimation and model selection. Among the 42 
different models we fit, the best-fit model is rather simple and includes a single 
introgression event from bonobos into central chimpanzees. We also estimate that the 
common ancestor of chimpanzees dates to ~ 900 ka, much older than previously suggested.  
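For readers unfamiliar with site patterns, the following minimal sketch shows one way the contribution of a single variable site to each pattern can be computed. It assumes that one haploid genome is drawn from each population, so the probability of a pattern is the product of the derived allele frequencies in the populations that carry the derived allele and of the ancestral allele frequencies in those that do not. The function name, population labels, and frequencies are hypothetical, and patterns in which no population or every population carries the derived allele are excluded here as an assumption.

```python
from itertools import combinations


def site_pattern_probs(derived_freqs):
    """Probability of each site pattern at one site.

    derived_freqs maps a population label to its derived allele frequency.
    A pattern is the set of populations whose sampled genome is derived.
    """
    pops = sorted(derived_freqs)
    probs = {}
    # Pattern sizes 1..n-1: skip the empty and the all-population patterns.
    for size in range(1, len(pops)):
        for pattern in combinations(pops, size):
            prob = 1.0
            for pop in pops:
                p = derived_freqs[pop]
                prob *= p if pop in pattern else (1.0 - p)
            probs[":".join(pattern)] = prob
    return probs


# Hypothetical frequencies at one site (illustration only).
example_freqs = {"pop_x": 0.9, "pop_y": 0.5, "pop_z": 0.1}
probs = site_pattern_probs(example_freqs)
for label, value in probs.items():
    print(label, round(value, 3))
```

Summing these per-site contributions across the genome gives the observed site pattern frequencies against which competing demographic models can be compared.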
 The results of this dissertation expand our understanding of the evolutionary 
history of the genus Pan, particularly related to adaptation and demography. Such 
findings also prompt many new questions. These include analyses of the candidate 
genes described in Chapters II and III to identify variants and characterize the 
potential functional consequences. The GAGP dataset has yielded and will continue to 
yield immense insight. Yet, additional genomic data would be highly beneficial for 
many reasons. First, many newer genomic methods leverage large 
sample sizes to detect different types of selection and demographic events. For 
example, large samples are particularly needed for confidently inferring haplotypes 
(Browning and Browning 2011). Second, the advent of sequencing genomes in wild 
primate (and other animal) populations using non-invasive methods is a critical 
development (Chiou and Bergey 2018; Ozga et al. 2021). These approaches will be 
able to shed light on many topics, including local adaptation. Information on 
individual life history and fine-scale environmental data can better tease apart 
complex gene-environment interactions. Although not necessarily restricted to wild 
primates, larger samples with information on phenotype may facilitate analysis of 
polygenic selection beyond humans and other model organisms. Results from Chapter 
IV may point to population structure in bonobos. Additional genomes from across 
their geographic range may help confirm or reject this possibility. Finally, future 
studies on gene expression and structural variation in Pan will likely fill in critical 
gaps regarding phenotypic differences in this genus. Beyond genomics, the findings 
from Chapters II, III, and IV also highlight the need for more ecological and 
paleoenvironmental data, particularly from the Congo Basin. In my future research, I 
hope to build upon the findings of this dissertation with additional genomic data and 
methods to better understand our closest living relatives and their evolutionary 
history.   
REFERENCES CITED 
Anding AL, Wang C, Chang T-K, Sliter DA, Powers CM, Hofmann K, Youle RJ, 
Baehrecke EH. 2018. Vps13D encodes a ubiquitin-binding protein that is 
required for the regulation of mitochondrial size and clearance. Curr Biol 
28:287-295.e6. 
Andolfatto P. 2008. Controlling type-I error of the McDonald–Kreitman test in 
genomewide scans for selection on noncoding DNA. Genetics 180:1767. 
Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, 
Gutenkunst RN, White TJ, Green ED, Bustamante CD, et al. 2009. Targets of 
balancing selection in the human genome. Mol Biol Evol 26:2755–2764. 
Andrews S. 2010. FASTQC. A quality control tool for high throughput sequence data. 
Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc 
Anestis SF. 2004. Female genito-genital rubbing in a group of captive chimpanzees. 
Int J Primatol 25:477–488. 
Anestis SF, Webster TH, Kamilar JM, Fontenot MB, Watts DP, Bradley BJ. 2014. 
AVPR1A variation in chimpanzees (Pan troglodytes): Population differences 
and association with behavioral style. Int J Primatol 35:305–324. 
Angus S. 1971. Water-contact behavior of chimpanzees. Fol Primatol 14:51–58. 
Anka Z, Séranne M, Primio R di. 2010. Evidence of a large upper-Cretaceous 
depocentre across the Continent-Ocean boundary of the Congo-Angola basin. 
Implications for palaeo-drainage and potential ultra-deep source rocks. Marine 
and Petroleum Geology 27:601–611. 
Antón SC, Potts R, Aiello LC. 2014. Evolution of early Homo: An integrated 
biological perspective. Science 345:1236828. 
Arsic N, Rajic T, Stanojcic S, Goodfellow PN, Stevanovic M. 1998. Characterisation 
and mapping of the human SOX14 gene. Cytogenet Cell Genet 83:139–146. 
Auton A, Abecasis GR, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, 
Chakravarti A, Clark AG, Donnelly P, Eichler EE, et al. 2015. A global 
reference for human genetic variation. Nature 526:68–74. 
Baptiste A. 2015. gridExtra: Miscellaneous Functions for “Grid” Graphics. Available 
from: http://CRAN.R-project.org/package=gridExtra 
Barratt CD, Lester JD, Gratton P, Onstein RE, Kalan AK, McCarthy MS, 
Bocksberger G, White LC, Vigilant L, Dieguez P, et al. 2020. Late Quaternary 
habitat suitability models for chimpanzees (Pan troglodytes) since the Last 
Interglacial (120,000 BP). bioRxiv [Internet]. Available from: 
http://biorxiv.org/content/early/2020/05/25/2020.05.15.066662 
Barrett RDH, Hoekstra HE. 2011. Molecular spandrels: tests of adaptation at the 
genetic level. Nat Rev Genet 12:767–780. 
Basabose AK. 2002. Diet composition of chimpanzees inhabiting the Montane forest 
of Kahuzi, Democratic Republic of Congo. Am J Primatol 58:1–21. 
Beadle L. 1981. The inland waters of tropical Africa: An introduction to tropical 
limnology. New York: Longman 
Beaune D, Hohmann G, Serckx A, Sakamaki T, Narat V, Fruth B. 2017. How bonobo 
communities deal with tannin rich fruits: Re-ingestion and other feeding 
processes. Behav Process 142:131–137. 
Becquet C, Przeworski M. 2007. A new approach to estimate parameters of speciation 
models with application to apes. Genome Res 17:1505–1519. 
Begun DJ, Holloway AK, Stevens K, Hillier LW, Poh Y-P, Hahn MW, Nista PM, 
Jones CD, Kern AD, Dewey CN, et al. 2007. Population genomics: Whole-
genome analysis of polymorphism and divergence in Drosophila simulans. 
PLOS Biol 5:e310. 
Behringer Verena, Deschner T, Murtagh R, Stevens JMG, Hohmann G. 2014. Age-
related changes in thyroid hormone levels of bonobos and chimpanzees 
indicate heterochrony in development. J Hum Evol 66:83–88. 
Behringer Verena, Deschner Tobias, Deimel Caroline, Stevens JMG, Hohmann G. 
2014. Age-related changes in urinary testosterone levels suggest differences in 
puberty onset and divergent life history strategies in bonobos and 
chimpanzees. Horm Behav 66:525–533. 
Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and 
powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 
57:289–300. 
Bergey CM, Lopez M, Harrison GF, Patin E, Cohen JA, Quintana-Murci L, Barreiro 
LB, Perry GH. 2018. Polygenic adaptation and convergent evolution on 
growth and cardiac genetic pathways in African and Asian rainforest hunter-
gatherers. Proc Natl Acad Sci USA 115:E11256. 
Besenbacher S, Hvilsom C, Marques-Bonet T, Mailund T, Schierup MH. 2019. Direct 
estimation of mutations in great apes reconciles phylogenetic dating. Nat Ecol 
Evol 3:286–292. 
Boesch C. 1996. Social grouping in Taï chimpanzees. In: McGrew WC, Marchant LF, 
Nishida T, editors. Great Ape Societies. Cambridge: Cambridge University 
Press. p. 101–113. 
Boesch C, Boesch-Achermann H. 2000. The chimpanzees of the Taï Forest: 
Behavioral ecology and evolution. Oxford: Oxford University Press 
Boesch C, Crockford C, Herbinger I, Wittig R, Moebius Y, Normand E. 2008. 
Intergroup conflicts among chimpanzees in Taï National Park: lethal violence 
and the female perspective. Am J Primatol 70:519–532. 
Boesch C, Head J, Tagg N, Arandjelovic M, Vigilant L, Robbins MM. 2007. Fatal 
Chimpanzee Attack in Loango National Park, Gabon. Int J Primatol 28:1025–
1034. 
Boesch C, Hohmann G, Marchant LF, editors. 2001. Behavioural Diversity in 
Chimpanzees and Bonobos. Cambridge: Cambridge University Press 
Boose KJ, White FJ, Meinelt A. 2013. Sex differences in tool use acquisition in 
bonobos (Pan paniscus). American Journal of Primatology 75:917–926. 
Bradley BJ. 2008. Reconstructing phylogenies and phenotypes: a molecular view of 
human evolution. Journal of Anatomy 212:337–353. 
Brand CM, Marchant LF. 2019. Social hair plucking is a grooming convention in a 
group of captive bonobos (Pan paniscus). Primates 60:487–491. 
Brand CM, White FJ, Ting N, Webster TH. 2021. Soft sweeps predominate recent 
positive selection in bonobos (Pan paniscus) and chimpanzees (Pan 
troglodytes). bioRxiv:2020.12.14.422788. 
Broad Institute. 2018. Picard Tools. Available from: 
http://broadinstitute.github.io/picard/ 
Browning SR, Browning BL. 2011. Haplotype phasing: existing methods and new 
developments. Nat Rev Genet 12:703–714. 
Browning SR, Browning BL, Zhou Y, Tucci S, Akey JM. 2018. Analysis of human 
sequence data reveals two pulses of archaic Denisovan admixture. Cell 
173:53-61.e9. 
Bullinger AF, Burkart JM, Melis AP, Tomasello M. 2013. Bonobos, Pan paniscus, 
chimpanzees, Pan troglodytes, and marmosets, Callithrix jacchus, prefer to 
feed alone. Anim Behav 85:51–60. 
Cagan A, Theunert C, Laayouni H, Santpere G, Pybus M, Casals F, Prüfer K, Navarro 
A, Marques-Bonet T, Bertranpetit J, et al. 2016. Natural selection in the great 
apes. Mol Biol Evol 33:3268–3283. 
Cahill JA, Stirling I, Kistler L, Salamzade R, Ersmark E, Fulton TL, Stiller M, Green 
RE, Shapiro B. 2015. Genomic evidence of geographically widespread effect 
of gene flow from polar bears into brown bears. Mol Ecol 24:1205–1217. 
Calpena E, Hervieu A, Kaserer T, Swagemakers SMA, Goos JAC, Popoola O, Ortiz-
Ruiz MJ, Barbaro-Dieber T, Bownass L, Brilstra EH, et al. 2019. De novo 
missense substitutions in the gene encoding CDK8, a regulator of the mediator 
complex, cause a syndromic developmental disorder. Am J Hum Genet 
104:709–720. 
Campbell CJ. 2006. Lethal intragroup aggression by adult male spider monkeys 
(Ateles geoffroyi). Am J Primatol 68:1197–1201. 
Castellano D, Macià MC, Tataru P, Bataillon T, Munch K. 2019. Comparison of the 
full distribution of fitness effects of new amino acid mutations across great 
apes. Genetics 213:953–966. 
Caswell JL, Mallick S, Richter DJ, Neubauer J, Schirmer C, Gnerre S, Reich D. 2008. 
Analysis of chimpanzee history based on genome sequence alignments. PLOS 
Genet 4:e1000057. 
Chang T-C, Yang Y, Yasue H, Bharti AK, Retzel EF, Liu W-S. 2011. The expansion 
of the PRAME gene family in Eutheria. PLOS One 6:e16867. 
Charlesworth B, Morgan MT, Charlesworth D. 1993. The effect of deleterious 
mutations on neutral molecular variation. Genetics 134:1289–1303. 
Cheng X, DeGiorgio M. 2020. Flexible mixture model approaches that accommodate 
footprint size variability for robust detection of balancing selection. Mol Biol 
Evol 37:3267–3291. 
Chiou KL, Bergey CM. 2018. Methylation-based enrichment facilitates low-cost, 
noninvasive genomic scale sequencing of populations from feces. Sci Rep 
8:1975. 
Choi H, Jin S, Kwon JT, Kim Jihye, Jeong J, Kim Jaehwan, Jeon S, Park ZY, Jung K-
J, Park K, et al. 2016. Characterization of mammalian ADAM2 and its absence 
from human sperm. PLOS One 11:e0158321. 
Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden 
DM. 2012. A program for annotating and predicting the effects of single 
nucleotide polymorphisms, SnpEff. Fly 6:80–92. 
Compton AA, Malik HS, Emerman M. 2013. Host gene evolution traces the 
evolutionary history of ancient primate lentiviruses. Philosophical 
Transactions of the Royal Society B: Biological Sciences 368:20120496. 
Coolidge HJ. 1933. Pan paniscus. Pigmy chimpanzee from south of the Congo river. 
Am J Phys Anth 18:1–59. 
Crockford SJ. 2003. Thyroid rhythm phenotypes and hominid evolution: a new 
paradigm implicates pulsatile hormone secretion in speciation and adaptation 
changes. Comparative Biochemistry and Physiology Part A: Molecular & 
Integrative Physiology 135:105–129. 
Cronin KA, De Groot E, Stevens JMG. 2015. Bonobos show limited social tolerance 
in a group setting: A comparison with chimpanzees and a test of the relational 
model. Fol Primatol 86:164–177. 
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, 
Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and 
VCFtools. Bioinformatics 27:2156–2158. 
Darolti I, Wright AE, Pucholt P, Berlin S, Mank JE. 2018. Slow evolution of sex-
biased genes in the reproductive tissue of the dioecious plant Salix viminalis. 
Mol Ecol 27:694–708. 
deMenocal PB. 2004. African climate change and faunal evolution during the 
Pliocene–Pleistocene. Earth Planet Sci Lett 220:3–24. 
Doran DM. 1993. Comparative locomotor behavior of chimpanzees and bonobos: The 
influence of morphology on locomotion. Am J Phys Anth 91:83–98. 
Drake AG. 2011. Dispelling dog dogma: an investigation of heterochrony in dogs 
using 3D geometric morphometric analysis of skull shape. Evol Dev 13:204–
213. 
Eilertson KE, Booth JG, Bustamante CD. 2012. SnIPRE: Selection inference using a 
Poisson random effects model. PLOS Comput Biol 8:e1002806. 
Eriksson J, Hohmann G, Boesch C, Vigilant L. 2004. Rivers influence the population 
genetic structure of bonobos (Pan paniscus). Mol Ecol 13:3425–3435. 
Etienne L, Nerrienet E, LeBreton M, Bibila GT, Foupouapouognigni Y, Rousset D, 
Nana A, Djoko CF, Tamoufe U, Aghokeng AF, et al. 2011. Characterization 
of a new simian immunodeficiency virus strain in a naturally infected Pan 
troglodytes troglodytes chimpanzee with AIDS related symptoms. 
Retrovirology 8:4. 
Ewels P, Magnusson M, Lundin S, Käller M. 2016. MultiQC: summarize analysis 
results for multiple tools and samples in a single report. Bioinformatics 
32:3047–3048. 
Eyre-Walker A. 2002. Changing effective population size and the McDonald-
Kreitman test. Genetics 162:2017. 
Eyre-Walker A, Keightley PD. 2007. The distribution of fitness effects of new 
mutations. Nat Rev Genet 8:610–618. 
Fawcett K, Muhumuza G. 2000. Death of a wild chimpanzee community member: 
Possible outcome of intense sexual competition. Am J Primatol 51:243–247. 
Fay JC, Wu C-I. 2000. Hitchhiking under positive Darwinian selection. Genetics 
155:1405–1413. 
Fedigan LM. 1983. Dominance and reproductive success in primates. Am J Phys Anth 
26:91–129. 
Feldblum JT, Wroblewski EE, Rudicell RS, Hahn BH, Paiva T, Cetinkaya-Rundel M, 
Pusey AE, Gilby IC. 2014. Sexually coercive male chimpanzees sire more 
offspring. Curr Biol 24:2855–2860. 
Ferguson W, Dvora S, Fikes RW, Stone AC, Boissinot S. 2012. Long-term balancing 
selection at the antiviral gene OAS1 in central African chimpanzees. Mol Biol 
Evol 29:1093–1103. 
Ferrer-Admetlla A, Liang M, Korneliussen T, Nielsen R. 2014. On detecting 
incomplete soft or hard selective sweeps using haplotype structure. Mol Biol 
Evol 31:1275–1291. 
Fruth B, Hickey J, André C, Furuichi T, Hart JA, Hart TB, Kuehl H, Maisels F, 
Nackoney J, Reinartz GE, et al. 2016. Pan paniscus (errata version published 
in 2016). Available from: https://dx.doi.org/10.2305/IUCN.UK.2016-
2.RLTS.T15932A17964305.en 
Fruth B, Hohmann G. 2018. Food sharing across borders. Hum Nat 29:91–103. 
Furuichi T. 1989. Social interactions and the life history of female Pan paniscus in 
Wamba, Zaire. Int J Primatol 10:173–197. 
Furuichi T. 2009. Factors underlying party size differences between chimpanzees and 
bonobos: a review and hypotheses for future study. Primates 50:197–209. 
Furuichi T. 2011. Female contributions to the peaceful nature of bonobo society. Ev 
Anth 20:131–142. 
Furuichi T, Hashimoto C, Tashiro Y. 2001. Fruit availability and habitat use by 
chimpanzees in the Kalinzu Forest, Uganda: Examination of fallback foods. 
Int J Primatol 22:929–945. 
Furuichi T, Idani G, Ihobe H, Hashimoto C, Tashiro Y, Sakamaki T, Mulavwa MN, 
Yangozene K, Kuroda S. 2012. Long-term studies on wild bonobos at Wamba, 
Luo Scientific Reserve, D. R. Congo: Towards the understanding of female 
life history in a male-philopatric species. In: Kappeler PM, Watts DP, editors. 
Long-Term Field Studies of Primates. Berlin, Heidelberg: Springer Berlin 
Heidelberg. p. 413–433. Available from: https://doi.org/10.1007/978-3-642-
22514-7_18 
Gao F, Bailes E, Robertson DL, Chen Y, Rodenburg CM, Michael SF, Cummins LB, 
Arthur LO, Peeters M, Shaw GM, et al. 1999. Origin of HIV-1 in the 
chimpanzee Pan troglodytes troglodytes. Nature 397:436–441. 
Garud NR, Messer PW, Buzbas EO, Petrov DA. 2015. Recent selective sweeps in 
North American Drosophila melanogaster show signatures of soft sweeps. 
PLOS Genet 11:e1005004. 
Gauthier J, Meijer IA, Lessel D, Mencacci NE, Krainc D, Hempel M, Tsiakas K, 
Prokisch H, Rossignol E, Helm MH, et al. 2018. Recessive mutations in 
VPS13D cause childhood onset movement disorders. Ann Neurol 83:1089–
1095. 
Gerloff U, Hartung B, Fruth B, Hohmann G, Tautz D. 1999. Intracommunity 
relationships, dispersal pattern and paternity success in a wild living 
community of bonobos (Pan paniscus) determined from DNA analysis of 
faecal samples. Proc R Soc Lond B Biol Sci 266:1189–1195. 
Gokcumen O. 2020. Archaic hominin introgression into modern human genomes. Am 
J Phys Anth 171:60–73. 
Good JM, Wiebe V, Albert FW, Burbano HA, Kircher M, Green RE, Halbwax M, 
André C, Atencia R, Fischer A, et al. 2013. Comparative population genomics 
of the ejaculate in humans and the great apes. Mol Biol Evol 30:964–976. 
Goodall J. 1986. The chimpanzees of Gombe: Patterns of behavior. Cambridge, MA: 
Belknap Press 
Gossmann TI, Keightley PD, Eyre-Walker A. 2012. The Effect of Variation in the 
Effective Population Size on the Rate of Adaptive Molecular Evolution in 
Eukaryotes. Genome Biology and Evolution 4:658–667. 
Gossmann TI, Waxman D, Eyre-Walker A. 2014. Fluctuating selection models and 
McDonald-Kreitman type analyses. PLOS One 9:e84540. 
de Groot NG, Heijmans CMC, Zoet YM, de Ru AH, Verreck FA, van Veelen PA, 
Drijfhout JW, Doxiadis GGM, Remarque EJ, Doxiadis IIN, et al. 2010. AIDS-
protective HLA-B*27/B*57 and chimpanzee MHC class I molecules target 
analogous conserved areas of HIV-1/SIVcpz. Proc Natl Acad Sci USA 
107:15175. 
Groves CP. 2001. Primate Taxonomy. Washington DC: Smithsonian Institution Press 
Groves CP. 2005. Geographic variation within eastern chimpanzees (Pan troglodytes 
cf schweinfurthii Giglioli, 1872). Australas Primatol 17:19–46. 
Gruber T, Clay Z. 2016. A comparison between bonobos and chimpanzees: A review 
and update. Ev Anth 25:239–252. 
Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, 
Köster J, The Bioconda Team. 2018. Bioconda: sustainable and 
comprehensive software distribution for the life sciences. Nat Methods 
15:475–476. 
Hamilton WD. 1964. The genetical evolution of social behaviour I & II. J Theor Biol 
7:1–52. 
Hamm D, Mautz BS, Wolfner MF, Aquadro CF, Swanson WJ. 2007. Evidence of 
amino acid diversity–enhancing selection within humans and among primates 
at the candidate sperm receptor gene PKDREJ. Am J Hum Genet 81:44–52. 
Han S, Andrés AM, Marques-Bonet T, Kuhlwilm M. 2019. Genetic variation in Pan 
species is shaped by demographic history and harbors lineage-specific 
functions. Genome Biol Evol 11:1178–1191. 
Hare B, Kwetuenda S. 2010. Bonobos voluntarily share their own food with others. 
Curr Biol 20:R230–R231. 
Hare B, Melis AP, Woods V, Hastings S, Wrangham R. 2007. Tolerance allows 
bonobos to outperform chimpanzees on a cooperative task. Curr Biol 17:619–
623. 
Hare B, Wobber V, Wrangham R. 2012. The self-domestication hypothesis: evolution 
of bonobo psychology is due to selection against aggression. Anim Behav 
83:573–585. 
Hare B, Yamamoto S. 2015. Moving bonobos off the scientifically endangered list. 
Behaviour 152:247–258. 
Harris RB, Sackman A, Jensen JD. 2018. On the unfounded enthusiasm for soft 
selective sweeps II: Examining recent evidence from humans, flies, and 
viruses. PLOS Genet 14:e1007859. 
Hartfield M, Bataillon T. 2020. Selective sweeps under dominance and inbreeding. 
G3 Genes Genom Genet 10:1063. 
Hashimoto C. 1997. Context and development of sexual behavior of wild bonobos 
(Pan paniscus) at Wamba, Zaire. Int J Primatol 18:1–21. 
Hedrick PW. 2013. Adaptive introgression in animals: examples and comparison to 
new mutation and standing variation as sources of adaptive variation. Mol 
Ecol 22:4606–4618. 
Henn BM, Cavalli-Sforza LL, Feldman MW. 2012. The great human expansion. Proc 
Natl Acad Sci USA 109:17758. 
Hermisson J, Pennings PS. 2005. Soft sweeps: Molecular population genetics of 
adaptation from standing genetic variation. Genetics 169:2335–2352. 
Hermisson J, Pennings PS. 2017. Soft sweeps and beyond: understanding the patterns 
and probabilities of selection footprints under rapid adaptation. Methods Ecol 
Evol 8:700–716. 
Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, Project 1000 
Genomes, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in 
recent human evolution. Science 331:920–924. 
Hey J. 2010. The divergence of chimpanzee species and subspecies as revealed in 
multipopulation isolation-with-migration analyses. Mol Biol Evol 27:921–933. 
Hof J, Sommer V. 2010. Apes Like Us: Portraits of a Kinship. Mannheim: Edition 
Panorama 
Hohmann G. 2001. Association and social interactions between strangers and 
residents in bonobos (Pan paniscus). Primates 42:91–99. 
Hohmann G, Fruth B. 2000. Use and function of genital contacts among female 
bonobos. Anim Behav 60:107–120. 
Hohmann G, Fruth B. 2003a. Lui Kotal - A new site for field research on bonobos in 
the Salonga National Park. Pan Africa News 10:25–27. 
Hohmann G, Fruth B. 2003b. Culture in bonobos? Between‐species and within‐
species variation in behavior. Curr Anthropol 44:563–571. 
Hohmann G, Fruth B. 2011. Is blood thicker than water? In: Robbins MM, Boesch C, 
editors. Among African Apes: Stories and Photos from the Field. Berkeley: 
University of California Press. p. 61–76. 
Hohmann G, Ortmann S, Remer T, Fruth B. 2019. Fishing for iodine: what aquatic 
foraging by bonobos tells us about human evolution. BMC Zool 4:5. 
Holloway AK, Lawniczak MKN, Mezey JG, Begun DJ, Jones CD. 2007. Adaptive 
gene expression divergence inferred from population genomics. PLOS Genet 
3:e187. 
Horn AD. 1979. The taxonomic status of the bonobo chimpanzee. Am J Phys Anth 
51:273–281. 
Huang DW, Sherman BT, Lempicki RA. 2008. Bioinformatics enrichment tools: 
paths toward the comprehensive functional analysis of large gene lists. Nucleic 
Acids Res 37:1–13. 
Huang DW, Sherman BT, Lempicki RA. 2009. Systematic and integrative analysis of 
large gene lists using DAVID bioinformatics resources. Nat Protoc 4:44–57. 
Hudson RR, Kreitman M, Aguadé M. 1987. A test of neutral molecular evolution 
based on nucleotide data. Genetics 116:153. 
Huerta-Sanchez E, Durrett R, Bustamante CD. 2008. Population genetics of 
polymorphism and divergence under fluctuating selection. Genetics 178:325. 
Humle T, Maisels F, Oates JF, Plumptre AJ, Williamson EA. 2016. Pan troglodytes 
(errata version published in 2018). e.T15933A129038584. Available from: 
https://dx.doi.org/10.2305/IUCN.UK.2016-2.RLTS.T15933A17964454.en. 
Idani G. 1991. Social relationships between immigrant and resident bonobo (Pan 
paniscus) females at Wamba. Fol Primatol 57:83–95. 
Inogwabini B-I. 2020. Wild Bonobos and Wild Chimpanzees and Human Diseases. 
In: Inogwabini B-I, editor. Reconciling Human Needs and Conserving 
Biodiversity: Large Landscapes as a New Conservation Paradigm: The Lake 
Tumba, Democratic Republic of Congo. Cham: Springer International 
Publishing. p. 109–121. Available from: https://doi.org/10.1007/978-3-030-
38728-0_9 
Inogwabini B-I, Matungila B, Mbende L, Abokome M, Tshimanga T wa. 2007. Great 
apes in the Lake Tumba landscape, Democratic Republic of Congo: newly 
described populations. Oryx 41:532–538. 
Ishizuka S, Kawamoto Y, Sakamaki T, Tokuyama N, Toda K, Okamura H, Furuichi 
T. 2018. Paternity and kin structure among neighbouring groups in wild 
bonobos at Wamba. R Soc Open Sci 5:171006. 
Jaeggi AV, De Groot E, Stevens JMG, Van Schaik CP. 2013. Mechanisms of 
reciprocity in primates: testing for short-term contingency of grooming and 
food sharing in bonobos and chimpanzees. Evol Hum Behav 34:69–77. 
Jaeggi AV, Stevens JMG, Van Schaik CP. 2010. Tolerant food sharing and 
reciprocity is precluded by despotism among bonobos but not chimpanzees. 
Am J Phys Anth 143:41–51. 
Jensen JD. 2014. On the unfounded enthusiasm for soft selective sweeps. Nat 
Commun 5:5281. 
Johnson SC, Bonnefille R, Chivers DJ, Groves CP, Horn AD, Jungers WL, Kimura T, 
McHenry HM, Prasad KN, Schwartz JH, et al. 1981. Bonobos: Generalized 
hominid prototypes or specialized insular dwarfs? Curr Anthropol 22:363–
375. 
Johri P, Charlesworth B, Jensen JD. 2020. Toward an evolutionarily appropriate null 
model: Jointly inferring demography and purifying selection. Genetics 
215:173. 
Johri P, Riall K, Becher H, Charlesworth B, Jensen JD. 2020. The impact of purifying 
and background selection on the inference of population history: problems and 
prospects. bioRxiv:2020.04.28.066365. 
Jungers WL, Susman RL. 1984. Body size and skeletal allometry in African apes. In: 
Susman RL, editor. The Pygmy Chimpanzee Evolutionary Biology and 
Behavior. New York: Plenum Press. p. 131–178. 
Kaburu SSK, Inoue S, Newton‐Fisher NE. 2013. Death of the alpha: Within-
community lethal violence among chimpanzees of the Mahale Mountains 
National Park. Am J Primatol 75:789–797. 
Kahlenberg SM, Emery Thompson M, Wrangham RW. 2008. Female competition 
over core areas in Pan troglodytes schweinfurthii, Kibale National Park, 
Uganda. Int J Primatol 29:931. 
Kahlenberg SM, Thompson ME, Muller MN, Wrangham RW. 2008. Immigration 
costs for female chimpanzees and male protection as an immigrant 
counterstrategy to intrasexual aggression. Anim Behav 76:1497–1509. 
Kalan AK, Kulik L, Arandjelovic M, Boesch C, Haas F, Dieguez P, Barratt CD, 
Abwe EE, Agbor A, Angedakin S, et al. 2020. Environmental variability 
supports chimpanzee behavioural diversity. Nat Commun 11:4451. 
Kamada F, Aoki Y, Narisawa A, Abe Y, Komatsuzaki S, Kikuchi A, Kanno J, Niihori 
T, Ono M, Ishii N, et al. 2011. A genome-wide association study identifies 
RNF213 as the first Moyamoya disease gene. J Hum Genet 56:34–40. 
Kano T. 1992. The last ape: Pygmy chimpanzee behavior and ecology. Stanford: 
Stanford University Press 
Kawamoto Y, Takemoto H, Higuchi S, Sakamaki T, Hart JA, Hart TB, Tokuyama N, 
Reinartz GE, Guislain P, Dupain J, et al. 2013. Genetic structure of wild 
bonobo populations: Diversity of mitochondrial DNA and geographical 
distribution. PLOS One 8:e59660. 
Keele BF, Jones JH, Terio KA, Estes JD, Rudicell RS, Wilson ML, Li Y, Learn GH, 
Beasley TM, Schumacher-Stankey J, et al. 2009. Increased mortality and 
AIDS-like immunopathology in wild chimpanzees infected with SIVcpz. 
Nature 460:515–519. 
Kelleher J, Etheridge AM, McVean G. 2016. Efficient coalescent simulation and 
genealogical analysis for large sample sizes. PLOS Comput Biol 12:e1004842. 
Kenigsberg S, Lima PDA, Maghen L, Wyse BA, Lackan C, Cheung ANY, Tsang BK, 
Librach CL. 2017. The elusive MAESTRO gene: Its human reproductive 
tissue-specific expression pattern. PLOS One 12:e0174873. 
Kern AD, Schrider DR. 2016. Discoal: flexible coalescent simulations with selection. 
Bioinformatics 32:3839–3841. 
Kern AD, Schrider DR. 2018. diploS/HIC: An updated approach to classifying 
selective sweeps. G3 Genes Genom Genet 8:1959–1970. 
Kimura M. 1983. The neutral theory of molecular evolution. Cambridge: Cambridge 
University Press.
Köster J, Rahmann S. 2012. Snakemake—a scalable bioinformatics workflow engine. 
Bioinformatics 28:2520–2522. 
Kovalaskas S, Rilling JK, Lindo J. 2020. Comparative analyses of the Pan lineage 
reveal selection on gene pathways associated with diet and sociality in 
bonobos. Genes Brain Behav n/a:e12715. 
Krassowski M. 2020. ComplexUpset. Available from: 
https://doi.org/10.5281/zenodo.3700590 
Kronenberg ZN, Fiddes IT, Gordon D, Murali S, Cantsilieris S, Meyerson OS, 
Underwood JG, Nelson BJ, Chaisson MJP, Dougherty ML, et al. 2018. High-
resolution comparative analysis of great ape genomes. Science [Internet] 360. 
Available from: https://science.sciencemag.org/content/360/6393/eaar6343 
Kuhlwilm M, Han S, Sousa VC, Excoffier L, Marques-Bonet T. 2019. Ancient 
admixture from an extinct ape lineage into bonobos. Nat Ecol Evol 3:957–965. 
Kuroda S, Nishihara T, Suzuki S, Oko RA. 1996. Sympatric chimpanzees and gorillas 
in the Ndoki Forest, Congo. In: McGrew WC, Marchant LF, Nishida T, 
editors. Great Ape Societies. Cambridge: Cambridge University Press. p. 71–
81. 
Lalouette A, Guénet J-L, Vriz S. 1998. Hotfoot mouse mutations affect the δ2 
glutamate receptor gene and are allelic to lurcher. Genomics 50:9–13. 
Langergraber KE, Mitani JC, Vigilant L. 2007. The limited impact of kinship on 
cooperation in wild chimpanzees. Proc Natl Acad Sci USA 104:7786. 
Langergraber KE, Prüfer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, 
Inoue-Murayama M, Mitani JC, Muller MN, et al. 2012. Generation times in
wild chimpanzees and gorillas suggest earlier divergence times in great ape 
and human evolution. Proc Natl Acad Sci USA 109:15716. 
van Leeuwen KL, Hill RA, Korstjens AH. 2020. Classifying chimpanzee (Pan 
troglodytes) landscapes across large-scale environmental gradients in Africa. 
Int J Primatol [Internet]. Available from: https://doi.org/10.1007/s10764-020-
00164-5 
Lefebvre V, Behringer RR, de Crombrugghe B. 2001. L-Sox5, Sox6 and Sox9 control 
essential steps of the chondrocyte differentiation pathway. Osteoarthr Cartil 
9:S69–S75. 
Lester JD, Vigilant L, Gratton P, McCarthy MS, Barratt CD, Dieguez P, Agbor A, 
Álvarez-Varona P, Angedakin S, Ayimisin EA, et al. 2021. Recent genetic 
connectivity and clinal variation in chimpanzees. Commun Biol 4:283. 
Leturmy P, Lucazeau F, Brigaud F. 2003. Dynamic interactions between the Gulf of
Guinea passive margin and the Congo River drainage basin: 1. Morphology 
and mass balance. J Geophys Res Solid Earth [Internet] 108. Available from: 
https://doi.org/10.1029/2002JB001927 
Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H. 2014. UpSet: Visualization 
of intersecting sets. IEEE Transactions on Visualization and Computer 
Graphics 20:1983–1992. 
Li H. 2011. A statistical framework for SNP calling, mutation discovery, association 
mapping and population genetical parameter estimation from sequencing data. 
Bioinformatics 27:2987–2993. 
Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with 
BWA-MEM. arXiv:1303.3997. 
Li H. 2014. Toward better understanding of artifacts in variant calling from high-
coverage samples. Bioinformatics 30:2843–2851. 
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, 
Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The 
sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. 
Li YF, Costello JC, Holloway AK, Hahn MW. 2008. “Reverse ecology” and the 
power of population genomics. Evolution 62:2984–2994. 
Lieberman DE, Carlo J, Ponce de León M, Zollikofer CPE. 2007. A geometric 
morphometric analysis of heterochrony in the cranium of chimpanzees and 
bonobos. J Hum Evol 52:647–662. 
Liu W, Morito D, Takashima S, Mineharu Y, Kobayashi H, Hitomi T, Hashikata H, 
Matsuura N, Yamazaki S, Toyoda A, et al. 2011. Identification of RNF213 as 
a susceptibility gene for Moyamoya disease and its possible role in vascular 
development. PLOS One 6:e22542. 
Liu W, Sherrill-Mix S, Learn GH, Scully EJ, Li Y, Avitto AN, Loy DE, Lauder AP, 
Sundararaman SA, Plenderleith LJ, et al. 2017. Wild bonobos host 
geographically restricted malaria parasites including a putative new Laverania 
species. Nat Commun 8:1–14. 
Lucazeau F, Brigaud F, Leturmy P. 2003. Dynamic interactions between the Gulf of 
Guinea passive margin and the Congo River drainage basin: 2. Isostasy and 
uplift. J Geophys Res Solid Earth [Internet] 108. Available from: 
https://doi.org/10.1029/2002JB001928 
Lucchesi S, Cheng L, Janmaat K, Mundry R, Pisor A, Surbeck M. 2020. Beyond the 
group: How food, mates, and group size influence intergroup encounters in 
wild bonobos. Behav Ecol 31:519–532. 
Malenky RK, Kuroda S, Vineberg EO, Wrangham RW. 1994. The significance of 
terrestrial herbaceous foods for bonobos, chimpanzees, and gorillas. In: 
Wrangham RW, McGrew WC, de Waal FBM, Heltne PG, editors. 
Chimpanzee Cultures. Cambridge: Harvard University Press. p. 59–75. 
Malenky RK, Wrangham RW. 1994. A quantitative comparison of terrestrial 
herbaceous food consumption by Pan paniscus in the Lomako Forest, Zaire, 
and Pan troglodytes in the Kibale Forest, Uganda. Am J Primatol 32:1–12. 
de Manuel M, Kuhlwilm M, Frandsen P, Sousa VC, Desai T, Prado-Martinez J, 
Hernandez-Rodriguez J, Dupanloup I, Lao O, Hallast P, et al. 2016. 
Chimpanzee genomic diversity reveals ancient admixture with bonobos. 
Science 354:477–481. 
Mao Y, Catacchio CR, Hillier LW, Porubsky D, Li R, Sulovari A, Fernandes JD, 
Montinaro F, Gordon DS, Storer JM, et al. 2021. A high-quality bonobo 
genome refines the analysis of hominid evolution. Nature [Internet]. Available 
from: https://doi.org/10.1038/s41586-021-03519-x 
Marzec AM, Kunz JA, Falkner S, Atmoko SSU, Alavi SE, Moldawer AM, Vogel ER, 
Schuppli C, van Schaik CP, van Noordwijk MA. 2016. The dark side of the 
red ape: male-mediated lethal female competition in Bornean orangutans. 
Behav Ecol Sociobiol 70:459–466. 
Matsuzawa T, Humle T. 2011. Bossou: 33 Years. In: Matsuzawa T, Humle T, 
Sugiyama Y, editors. The Chimpanzees of Bossou and Nimba. Tokyo: 
Springer Japan. p. 3–10. Available from: https://doi.org/10.1007/978-4-431-
53921-6_2 
Maynard Smith J, Haigh J. 1974. The hitch-hiking effect of a favourable gene. Genet 
Res 23:23–35. 
McBrearty S, Jablonski NG. 2005. First fossil chimpanzee. Nature 437:105–108. 
McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in 
Drosophila. Nature 351:652–654. 
Medkour H, Castaneda S, Amona I, Fenollar F, André C, Belais R, Mungongo P, 
Muyembé-Tamfum J-J, Levasseur A, Raoult D, et al. 2021. Potential zoonotic 
pathogens hosted by endangered bonobos. Sci Rep 11:6331. 
Melin AD, Janiak MC, Marrone F, Arora PS, Higham JP. 2020. Comparative ACE2 
variation and primate COVID-19 risk. Commun Biol 3:641. 
Melin AD, Orkin JD, Janiak MC, Valenzuela A, Kuderna L, Marrone III F, 
Ramangason H, Horvath JE, Roos C, Kitchener AC, et al. 2021. Variation in 
predicted COVID-19 risk among lemurs and lorises. Am J Primatol 
n/a:e23255. 
Messer PW, Petrov DA. 2013. Population genomics of rapid adaptation by soft 
selective sweeps. Trends Ecol Evol 28:659–669. 
Mitani JC. 2009. Male chimpanzees form enduring and equitable social bonds. Anim 
Behav 77:633–640. 
Mitani JC, Watts DP, Amsler SJ. 2010. Lethal intergroup aggression leads to 
territorial expansion in wild chimpanzees. Curr Biol 20:R507–R508. 
Mitteroecker P, Gunz P, Bookstein FL. 2005. Heterochrony and geometric 
morphometrics: a comparison of cranial growth in Pan paniscus versus Pan 
troglodytes. Evol Dev 7:244–258. 
Moniaux N, Junker WM, Singh AP, Jones AM, Batra SK. 2006. Characterization of
human mucin MUC17: Complete coding sequence and organization. J Biol Chem
281:23676–23685.
Moscovice LR, Douglas PH, Martinez‐Iñigo L, Surbeck M, Vigilant L, Hohmann G. 
2017. Stable and fluctuating social preferences and implications for 
cooperation among female bonobos at LuiKotale, Salonga National Park, 
DRC. Am J Phys Anthropol 163:158–172. 
Mughal MR, DeGiorgio M. 2019. Localizing and classifying adaptive targets with 
trend filtered regression. Mol Biol Evol 36:252–270. 
Muller MN, Kahlenberg SM, Emery Thompson M, Wrangham RW. 2007. Male 
coercion and the costs of promiscuous mating for female chimpanzees. Proc R 
Soc Lond B Biol Sci 274:1009–1014. 
Muller MN, Thompson ME, Kahlenberg SM, Wrangham RW. 2011. Sexual coercion 
by male chimpanzees shows that female choice may be more apparent than 
real. Behav Ecol Sociobiol 65:921–933. 
Myers Thompson JA. 2003. A model of the biogeographical journey from Proto-pan 
to Pan paniscus. Primates 44:191–197. 
Nakajima T, Ohtani H, Satta Y, Uno Y, Akari H, Ishida T, Kimura A. 2008. Natural 
selection in the TLR-related genes in the course of primate evolution. 
Immunogenetics 60:727–735. 
Nakano Y, Yamamoto K, Ueda MT, Soper A, Konno Y, Kimura I, Uriu K, Kumata 
R, Aso H, Misawa N, et al. 2020. A role for gorilla APOBEC3G in shaping 
lentivirus evolution including transmission to humans. PLOS Pathog 
16:e1008812. 
Nam K, Munch K, Mailund T, Nater A, Greminger MP, Krützen M, Marquès-Bonet 
T, Schierup MH. 2017. Evidence that the rate of strong selective sweeps 
increases with population size in the great apes. Proc Natl Acad Sci USA 114:1613–1618.
Narasimhan VM, Rahbari R, Scally A, Wuster A, Mason D, Xue Y, Wright J, 
Trembath RC, Maher ER, van Heel DA, et al. 2017. Estimating the human
mutation rate from autozygous segments reveals population differences in 
human mutational processes. Nat Commun 8:1–7. 
Nei M, Takahata N. 1993. Effective population size, genetic diversity, and 
coalescence time in subdivided populations. J Mol Evol 37:240–244. 
Nielsen R. 2001. Statistical tests of selective neutrality in the age of genomics. 
Heredity 86:641–647. 
Nishida T. 1983. Alpha status and agonistic alliance in wild chimpanzees (Pan 
troglodytes schweinfurthii). Primates 24:318–336. 
Nishida T. 2011. Chimpanzees of the lakeshore: Natural history and culture at 
Mahale. Cambridge: Cambridge University Press.
Nye J, Laayouni H, Kuhlwilm M, Mondal M, Marques-Bonet T, Bertranpetit J. 2018. 
Selection in the introgressed regions of the chimpanzee genome. Genome Biol 
Evol 10:1132–1138. 
Nye J, Mondal M, Bertranpetit J, Laayouni H. 2020. A fully integrated machine 
learning scan of selection in the chimpanzee genome. NAR Genom Bioinform 
[Internet] 2. Available from: https://doi.org/10.1093/nargab/lqaa061 
Ohta T. 1992. The nearly neutral theory of molecular evolution. Annu Rev Ecol Syst 
23:263–286. 
Oleksyk TK, Smith MW, O’Brien SJ. 2010. Genome-wide scans for footprints of 
natural selection. Philos Trans R Soc B Biol Sci 365:185–205. 
Osborne MJ, Volpon L, Kornblatt JA, Culjkovic-Kraljacic B, Baguet A, Borden KLB. 
2013. eIF4E3 acts as a tumor suppressor by utilizing an atypical mode of 
methyl-7-guanosine cap recognition. Proc Natl Acad Sci USA 110:3877. 
Ozga AT, Webster TH, Gilby IC, Wilson MA, Nockerts RS, Wilson ML, Pusey AE, 
Li Y, Hahn BH, Stone AC. 2021. Urine as a high-quality source of host 
genomic DNA from wild populations. Mol Ecol Resour 21:170–182. 
Palagi E. 2006. Social play in bonobos (Pan paniscus) and chimpanzees (Pan 
troglodytes): Implications for natural social systems and interindividual 
relationships. Am J Phys Anth 129:418–426. 
Palkopoulou E, Lipson M, Mallick S, Nielsen S, Rohland N, Baleka S, Karpinski E, 
Ivancevic AM, To T-H, Kortschak RD, et al. 2018. A comprehensive genomic 
history of extinct and living elephants. Proc Natl Acad Sci USA 115:E2566. 
Paoli T, Palagi E, Tarli SMB. 2006. Reevaluation of dominance hierarchy in bonobos 
(Pan paniscus). Am J Phys Anth 130:116–122. 
Parish AR. 1994. Sex and food control in the “uncommon chimpanzee”: How Bonobo 
females overcome a phylogenetic legacy of male dominance. Ethol Sociobiol 
15:157–179. 
Parish AR. 1996. Female relationships in bonobos (Pan paniscus). Hum Nat 7:61–96. 
Parish AR, de Waal FBM, Haig D. 2000. The other “closest living relative”: How 
bonobos (Pan paniscus) challenge traditional assumptions about females, 
dominance, intra- and intersexual interactions, and hominid evolution. Ann N 
Y Acad Sci 907:97–113. 
Pedersen BS, Quinlan AR. 2018. Mosdepth: quick coverage calculation for genomes 
and exomes. Bioinformatics 34:867–868. 
Pennings PS, Hermisson J. 2006a. Soft sweeps II—Molecular population genetics of 
adaptation from recurrent mutation or immigration. Mol Biol Evol 23:1076–
1084. 
Pennings PS, Hermisson J. 2006b. Soft sweeps III: The signature of positive selection 
from recurrent mutation. PLOS Genet 2:e186. 
Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J, Villanea 
FA, Mountain JL, Misra R, et al. 2007. Diet and the evolution of human 
amylase gene copy number variation. Nat Genet 39:1256–1260. 
Pilbrow V. 2006. Population systematics of chimpanzees using molar morphometrics. 
J Hum Evol 51:646–662. 
Pilbrow V, Groves C. 2013. Evidence for divergence in populations of bonobos (Pan 
paniscus) in the Lomami-Lualaba and Kasai-Sankuru regions based on 
preliminary analysis of craniodental variation. Int J Primatol 34:1244–1260. 
Poplin R, Ruano-Rubio V, DePristo MA, Fennell TJ, Carneiro MO, Van der Auwera 
GA, Kling DE, Gauthier LD, Levy-Moonshine A, Roazen D, et al. 2018. 
Scaling accurate genetic variant discovery to tens of thousands of samples. 
bioRxiv:201178. 
Porubsky D, Sanders AD, Höps W, Hsieh P, Sulovari A, Li R, Mercuri L, Sorensen 
M, Murali SC, Gordon D, et al. 2020. Recurrent inversion toggling and great 
ape genome evolution. Nat Genet 52:849–858. 
Potkin SG, Guffanti G, Lakatos A, Turner JA, Kruggel F, Fallon JH, Saykin AJ, Orro 
A, Lupoli S, Salvi E, et al. 2009. Hippocampal atrophy as a quantitative trait 
in a genome-wide association study identifying novel susceptibility genes for 
Alzheimer’s disease. PLOS One 4:e6501. 
Potts R. 1998. Variability selection in hominid evolution. Ev Anth 7:81–96. 
Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, 
Veeramah KR, Woerner AE, O’Connor TD, Santpere G, et al. 2013. Great ape 
genetic diversity and population history. Nature 499:471–475. 
Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: Hard 
sweeps, soft sweeps, and polygenic adaptation. Curr Biol 20:R208–R215. 
Pruetz JD, Ontl KB, Cleaveland E, Lindshield S, Marshack J, Wessling EG. 2017. 
Intragroup lethal aggression in West African chimpanzees (Pan troglodytes 
verus): Inferred killing of a former alpha male at Fongoli, Senegal. Int J 
Primatol 38:31–57. 
Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, 
Kodira C, Winer R, et al. 2012. The bonobo genome compared with the 
chimpanzee and human genomes. Nature 486:527–531. 
Przeworski M. 2002. The signature of positive selection at randomly chosen loci. 
Genetics 160:1179–1189. 
Przeworski M, Coop G, Wall JD. 2005. The signature of positive selection on 
standing genetic variation. Evolution 59:2312–2323. 
Pusey A, Murray C, Wallauer W, Wilson M, Wroblewski E, Goodall J. 2008. Severe 
aggression among female Pan troglodytes schweinfurthii at Gombe National 
Park, Tanzania. Int J Primatol 29:949. 
Pusey A, Williams J, Goodall J. 1997. The influence of dominance rank on the 
reproductive success of female chimpanzees. Science 277:828. 
Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing 
genomic features. Bioinformatics 26:841–842. 
R Core Team. 2020. R: A language and environment for statistical computing. 
Vienna, Austria: R Foundation for Statistical Computing. Available from:
https://www.R-project.org/ 
Ralph P, Coop G. 2010. Parallel adaptation: One or many waves of advance of an 
advantageous allele? Genetics 186:647–668. 
Rand DM, Kann LM. 1996. Excess amino acid polymorphism in mitochondrial DNA: 
contrasts among genes from Drosophila, mice, and humans. Mol Biol Evol 
13:735–748. 
Reichert KE, Heistermann M, Keith Hodges J, Boesch C, Hohmann G. 2002. What 
females tell males about their reproductive status: Are morphological and 
behavioural cues reliable signals of ovulation in bonobos (Pan paniscus)? 
Ethology 108:583–600. 
Rilling JK, Scholz J, Preuss TM, Glasser MF, Errangi BK, Behrens TE. 2012. 
Differences between chimpanzees and bonobos in neural systems supporting 
social cognition. Soc Cogn Affect Neurosci 7:369–379. 
Rogers AR. 2019. Legofit: Estimating population history from genetic data. 
bioRxiv:613067. 
Rogers AR. 2021. An efficient algorithm for estimating population history from 
genetic data. bioRxiv:2021.01.23.427922. 
Rogers AR, Harris NS, Achenbach AA. 2020. Neanderthal-Denisovan ancestors 
interbred with a distantly related hominin. Sci Adv 6:eaay5483. 
Sakai N, Terami H, Suzuki S, Haga M, Nomoto K, Tsuchida N, Morohashi K, Saito N,
Asada M, Hashimoto M, et al.
2008. Identification of NR5A1 (SF-1/AD4BP) gene expression modulators by 
large-scale gain and loss of function studies. J Endocrinol 198:489–497. 
Sakamaki T, Kasalevo P, Bokamba MB, Bongoli L. 2012. Iyondji Community 
Bonobo Reserve: A recently established reserve in the Democratic Republic of 
Congo. Pan Africa News 19:16–19. 
Sakamaki T, Maloueki U, Bakaa B, Bongoli L, Kasalevo P, Terada S, Furuichi T. 
2016. Mammals consumed by bonobos (Pan paniscus): new data from the 
Iyondji forest, Tshuapa, Democratic Republic of the Congo. Primates 57:295–
301. 
Sakamaki T, Ryu H, Toda K, Tokuyama N, Furuichi T. 2018. Increased frequency of 
intergroup encounters in wild bonobos (Pan paniscus) around the yearly peak 
in fruit abundance at Wamba. Int J Primatol 39:685–704. 
Sánchez-Villagra MR, Geiger M, Schneider RA. 2016. The taming of the neural crest: 
a developmental perspective on the origins of morphological covariation in 
domesticated mammals. R Soc Open Sci 3:160107. 
Sandel AA, Reddy RB. 2021. Sociosexual behaviour in wild chimpanzees occurs in 
variable contexts and is frequent between same-sex partners. Behaviour 
158:249–276. 
Sandel AA, Watts DP. 2021. Lethal coalitionary aggression associated with a 
community fission in chimpanzees (Pan troglodytes) at Ngogo, Kibale 
National Park, Uganda. Int J Primatol 42:26–48. 
Sarich VM, Wilson AC. 1967. Immunological time scale for hominid evolution. 
Science 158:1200. 
Sawyer SL, Wu LI, Emerman M, Malik HS. 2005. Positive selection of primate 
TRIM5α identifies a critical species-specific retroviral restriction domain. Proc 
Natl Acad Sci USA 102:2832. 
Scally A, Dutheil JY, Hillier LW, Jordan GE, Goodhead I, Herrero J, Hobolth A, 
Lappalainen T, Mailund T, Marques-Bonet T, et al. 2012. Insights into 
hominid evolution from the gorilla genome sequence. Nature 483:169–175. 
Scarry CJ, Tujague MP. 2012. Consequences of lethal intragroup aggression and 
alpha male replacement on intergroup relations and home range use in tufted 
capuchin monkeys (Cebus apella nigritus). Am J Primatol 74:804–810. 
Schiffels S, Durbin R. 2014. Inferring human population size and separation history 
from multiple genome sequences. Nat Genet 46:919–925. 
Schmidt JM, de Manuel M, Marques-Bonet T, Castellano S, Andrés AM. 2019. The
impact of genetic adaptation on chimpanzee subspecies differentiation. PLOS 
Genet 15:e1008485. 
Schrider DR. 2020. Background selection does not mimic the patterns of genetic 
diversity produced by selective sweeps. Genetics 216:499–519. 
Schrider DR, Kern AD. 2017. Soft sweeps are the dominant mode of adaptation in the 
human genome. Mol Biol Evol 34:1863–1877. 
Schrider DR, Mendes FK, Hahn MW, Kern AD. 2015. Soft shoulders ahead: Spurious 
signatures of soft and partial selective sweeps result from linked hard sweeps. 
Genetics 200:267–284. 
Schwarz E. 1929. Das Vorkommen des Schimpansen auf den linken Kongo-Ufer. Rev
Zool Bot Afr 16. 
Seong E, Insolera R, Dulovic M, Kamsteeg E-J, Trinh J, Brüggemann N, Sandford E, 
Li S, Ozel AB, Li JZ, et al. 2018. Mutations in VPS13D lead to a new 
recessive ataxia with spasticity and mitochondrial defects. Ann Neurol 
83:1075–1088. 
Serckx A, Huynen M-C, Bastin J-F, Hambuckers A, Beudels-Jamar RC, Vimond M, 
Raynaud E, Kühl HS. 2014. Nest grouping patterns of bonobos (Pan paniscus) 
in relation to fruit availability in a forest-savannah mosaic. PLOS One 
9:e93742. 
Sharp PM, Hahn BH. 2011. Origins of HIV and the AIDS Pandemic. Cold Spring 
Harb Perspect Med [Internet] 1. Available from: 
http://perspectivesinmedicine.cshlp.org/content/1/1/a006841.abstract 
Shea B. 1983. Paedomorphosis and neoteny in the pygmy chimpanzee. Science 
222:521. 
Simons EA, Frost SR. 2020. Ontogenetic allometry and scaling in catarrhine crania. J 
Anat [Internet] n/a. Available from: https://doi.org/10.1111/joa.13331 
Smith RJ, Jungers WL. 1997. Body mass in comparative primatology. J Hum Evol 
32:523–559. 
Soto DC, Shew C, Mastoras M, Schmidt JM, Sahasrabudhe R, Kaya G, Andrés AM, 
Dennis MY. 2020. Identification of structural variation in chimpanzees using 
optical mapping and nanopore sequencing. Genes 11. 
Staes N, Koski SE, Helsen P, Fransen E, Eens M, Stevens JMG. 2015. Chimpanzee 
sociability is associated with vasopressin (Avpr1a) but not oxytocin receptor 
gene (OXTR) variation. Horm Behav 75:84–90. 
Staes N, Smaers JB, Kunkle AE, Hopkins WD, Bradley BJ, Sherwood CC. 2019. 
Evolutionary divergence of neuroanatomical organization and related genes in 
chimpanzees and bonobos. Cortex 118:154–164. 
Staes N, Stevens JMG, Helsen P, Hillyer M, Korody M, Eens M. 2014. Oxytocin and 
vasopressin receptor gene variation as a proximate base for inter- and 
intraspecific behavioral differences in bonobos and chimpanzees. PLOS One 
9:e113364. 
Staes N, Weiss A, Helsen P, Korody M, Eens M, Stevens JMG. 2016. Bonobo 
personality traits are heritable and associated with vasopressin receptor gene 
1a variation. Sci Rep 6:38193. 
Stanford CB. 1998. The social behavior of chimpanzees and bonobos: Empirical 
evidence and shifting assumptions. Curr Anthropol 39:399–420. 
Sterck EHM, Watts DP, van Schaik CP. 1997. The evolution of female social 
relationships in nonhuman primates. Behav Ecol Sociobiol 41:291–309. 
Stevens JMG, Vervaecke H, Van Elsacker L. 2010. The bonobo’s adaptive potential: 
social relations under captive conditions. In: Furuichi T, Thompson J, editors. 
The Bonobos: Behavior, Ecology, and Conservation. New York: Springer. p. 
19–38. 
Stevison LS, Woerner AE, Kidd JM, Kelley JL, Veeramah KR, McManus KF, Great 
Ape Genome Project, Bustamante CD, Hammer MF, Wall JD. 2015. The time 
scale of recombination rate evolution in great apes. Mol Biol Evol 33:928–945. 
Stimpson CD, Barger N, Taglialatela JP, Gendron-Fitzpatrick A, Hof PR, Hopkins 
WD, Sherwood CC. 2016. Differential serotonergic innervation of the 
amygdala in bonobos and chimpanzees. Soc Cogn Affect Neurosci 11:413–
422. 
Stoletzki N, Eyre-Walker A. 2011. Estimation of the neutrality index. Mol Biol Evol 
28:63–70. 
Stone AC, Griffiths RC, Zegura SL, Hammer MF. 2002. High levels of Y-
chromosome nucleotide diversity in the genus Pan. Proc Natl Acad Sci USA 
99:43. 
Stumpf RM. 2011. Chimpanzees and bonobos: Inter- and intraspecies diversity. In: 
Campbell CJ, Fuentes A, MacKinnon KC, Bearder SK, Stumpf RM, editors. 
Primates in perspective. New York: Oxford University Press. p. 340–356. 
Stumpf RM, Boesch C. 2005. Does promiscuous mating preclude female choice? 
Female sexual strategies in chimpanzees (Pan troglodytes verus) of the Taï 
National Park, Côte d’Ivoire. Behav Ecol Sociobiol 57:511–524. 
Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri 
K, Kondova I, Bontrop RE, Persengiev S, et al. 2013. Evolution and diversity 
of copy number variation in the great ape lineage. Genome Res 23:1373–1382. 
Sugiyama Y. 1999. Socioecological factors of male chimpanzee migration at Bossou, 
Guinea. Primates 40:61–68. 
Sugiyama Y, Fujita S. 2011. The demography and reproductive parameters of Bossou 
chimpanzees. In: Matsuzawa T, Humle T, Sugiyama Y, editors. The 
Chimpanzees of Bossou and Nimba. New York: Springer. p. 23–34.
Sun P-H, Ye L, Mason MD, Jiang WG. 2012. Protein tyrosine phosphatase µ (PTP µ 
or PTPRM), a negative regulator of proliferation and invasion of breast cancer 
cells, is associated with disease prognosis. PLOS One 7:e50183. 
Surbeck M, Boesch C, Crockford C, Thompson ME, Furuichi T, Fruth B, Hohmann 
G, Ishizuka S, Machanda Z, Muller MN, et al. 2019. Males with a mother 
living in their group have higher paternity success in bonobos but not 
chimpanzees. Curr Biol 29:R354–R355. 
Surbeck M, Coxe S, Lokasola AL. 2017. Lonoa: The establishment of a permanent 
field site for behavioural research on bonobos in the Kokolopori Bonobo 
Reserve. Pan Africa News 24:13–15. 
Surbeck M, Langergraber KE, Fruth B, Vigilant L, Hohmann G. 2017. Male 
reproductive skew is higher in bonobos than chimpanzees. Curr Biol 
27:R640–R641. 
Surbeck M, Mundry R, Hohmann G. 2011. Mothers matter! Maternal support, 
dominance status and mating success in male bonobos (Pan paniscus). Proc R 
Soc Lond B Biol Sci 278:590–598. 
Susman RL, editor. 1984. The pygmy chimpanzee: Evolutionary biology and behavior.
New York: Springer.
Tajsharghi H, Darin N, Rekabdar E, Kyllerman M, Wahlström J, Martinsson T, 
Oldfors A. 2005. Mutations and sequence variation in the human myosin 
heavy chain IIa gene (MYH2). Eur J Hum Genet 13:617–622. 
Takemoto H, Kawamoto Y, Furuichi T. 2015. How did bonobos come to range south 
of the Congo River? Reconsideration of the divergence of Pan paniscus from
other Pan populations. Ev Anth 24:170–184. 
Takemoto H, Kawamoto Y, Higuchi S, Makinose E, Hart JA, Hart TB, Sakamaki T, 
Tokuyama N, Reinartz GE, Guislain P, et al. 2017. The mitochondrial ancestor 
of bonobos and the origin of their major haplogroups. PLOS One 
12:e0174851. 
Talebi MG, Beltrão-Mendes R, Lee PC. 2009. Intra-community coalitionary lethal 
attack of an adult male southern muriqui (Brachyteles arachnoides). Am J 
Primatol 71:860–867. 
Tan J, Hare B. 2013. Bonobos share with strangers. PLOS One 8:e51922. 
Terio KA, Kinsel MJ, Raphael J, Mlengeya T, Lipende I, Kirchhoff CA, Gilagiza B, 
Wilson ML, Kamenya S, Estes JD, et al. 2011. Pathologic lesions in 
chimpanzees (Pan trogylodytes schweinfurthii) from Gombe National Park, 
Tanzania, 2004–2010. J Zoo Wildl Med 42:597–607. 
The Chimpanzee Sequencing and Analysis Consortium. 2005. Initial sequence of the 
chimpanzee genome and comparison with the human genome. Nature 437:69–
87. 
Thompson JAM. 2001. On the nomenclature of Pan paniscus. Primates 42:101–111. 
Thompson-Handler N, Malenky RK, Badrian N. 1984. Sexual behavior of Pan 
paniscus under natural conditions in the Lomako Forest, Equateur, Zaire. In: 
Susman RL, editor. The Pygmy Chimpanzee: Evolutionary Biology and 
Behavior. Boston, MA: Springer US. p. 347–368. Available from: 
https://doi.org/10.1007/978-1-4757-0082-4_14 
Tian X, Pascal G, Monget P. 2009. Evolution and functional divergence of NLRP 
genes in mammalian reproductive systems. BMC Evol Biol 9:202. 
Tokuyama N, Furuichi T. 2016. Do friends help each other? Patterns of female 
coalition formation in wild bonobos at Wamba. Anim Behav 119:27–35. 
Tokuyama N, Sakamaki T, Furuichi T. 2019. Inter-group aggressive interaction 
patterns indicate male mate defense and female cooperation across bonobo 
groups at Wamba, Democratic Republic of the Congo. Am J Phys Anth 
170:535–550. 
Townsend SW, Slocombe KE, Emery Thompson M, Zuberbühler K. 2007. Female-
led infanticide in wild chimpanzees. Curr Biol 17:R355–R356. 
Toyoda S, Miyazaki T, Miyazaki S, Yoshimura T, Yamamoto M, Tashiro F, Yamato 
E, Miyazaki J. 2009. Sohlh2 affects differentiation of KIT positive oocytes and 
spermatogonia. Dev Biol 325:238–248. 
Tratz EP, Heck H. 1954. Der afrikanische Anthropoide “Bonobo”: Eine neue 
Menschenaffengattung. Säugetierkundliche Mitteilungen 2:97–101. 
Turley K, Frost SR. 2014. The appositional articular morphology of the talo-crural 
joint: The influence of substrate use on joint shape. Anat Rec 297:618–629. 
Tutin CEG. 1979. Mating patterns and reproductive strategies in a community of wild 
chimpanzees (Pan troglodytes schweinfurthii). Behav Ecol Sociobiol 6:29–38. 
Tutin CEG, Fernandez M, Rogers ME, Williamson EA, McGrew WC, Altmann SA, 
Southgate DAT, Crowe I, Whiten A, Conklin NL, et al. 1991. Foraging 
profiles of sympatric lowland gorillas and chimpanzees in the Lopé Reserve, 
Gabon. Philos Trans R Soc B Biol Sci 334:179–186. 
Uehara S. 1990. Utilization patterns of a marsh grassland within the tropical rain 
forest by the bonobos (Pan paniscus) of Yalosidi, Republic of Zaire. Primates 
31:311–322.
Unger S, Górna MW, Le Béchec A, Do Vale-Pereira S, Bedeschi MF, Geiberger S, 
Grigelioniene G, Horemuzova E, Lalatta F, Lausch E, et al. 2013. FAM111A 
mutations result in hypoparathyroidism and impaired skeletal development. 
Am J Hum Genet 92:990–995. 
Valero A, Schaffner CM, Vick LG, Aureli F, Ramos-Fernandez G. 2006. Intragroup 
lethal aggression in wild spider monkeys. Am J Primatol 68:732–737. 
de Valles-Ibáñez G, Hernandez-Rodriguez J, Prado-Martinez J, Luisi P, Marquès-
Bonet T, Casals F. 2016. Genetic load of loss-of-function polymorphic 
variants in great apes. Genome Biol Evol 8:871–877. 
Van Heuverswyn F, Li Y, Neel C, Bailes E, Keele BF, Liu W, Loul S, Butel C, 
Liegeois F, Bienvenue Y, et al. 2006. SIV infection in wild gorillas. Nature 
444:164–164. 
van der Lee R, Wiel L, van Dam TJP, Huynen MA. 2017. Genome-scale detection of 
positive selection in nine primates predicts human-virus evolutionary 
conflicts. Nucleic Acids Res 45:10634–10648. 
Vervaecke H, Van Elsacker L. 1992. Hybrids between common chimpanzees (Pan 
troglodytes) and pygmy chimpanzees (Pan paniscus) in captivity. Mammalia 
56:667–669. 
Vigilant L, Hofreiter M, Siedel H, Boesch C. 2001. Paternity and relatedness in wild 
chimpanzee communities. Proc Natl Acad Sci USA 98:12890. 
Villanea FA, Schraiber JG. 2019. Multiple episodes of interbreeding between 
Neanderthal and modern humans. Nat Ecol Evol 3:39–44. 
Vy HMT, Kim Y. 2015. A composite-likelihood method for detecting incomplete 
selective sweep from population genomic data. Genetics 200:633. 
Vyklicka L, Lishko PV. 2020. Dissecting the signaling pathways involved in the 
function of sperm flagellum. Curr Opin Cell Biol 63:154–161. 
de Waal FBM. 1989. Peacemaking among primates. Cambridge, MA: Harvard 
University Press.
de Waal FBM, Lanting F. 1997. Bonobo: The forgotten ape. Berkeley: University of 
California Press.
Wakefield ML, Hickmott AJ, Brand CM, Takaoka IY, Meador LM, Waller MT, 
White FJ. 2019. New observations of meat eating and sharing in wild bonobos 
(Pan paniscus) at Iyema, Lomako Forest Reserve, Democratic Republic of the 
Congo. Fol Primatol 90:179–189.
Walker K, Hare B. 2017. Bonobo baby dominance: Did female defense of offspring 
lead to reduced male aggression? In: Hare B, Yamamoto S, editors. Bonobos: 
Unique in Mind, Brain and Behavior. Oxford: Oxford University Press. p.
49–64. 
Wall JD, Hammer MF. 2006. Archaic admixture in the human genome. Curr Opin 
Genet Dev 16:606–610. 
Waller MT, White FJ. 2016. The effects of war on bonobos and other nonhuman 
primates in the Democratic Republic of the Congo. In: Waller MT, editor. 
Ethnoprimatology: Primate Conservation in the 21st Century. New York: 
Springer International Publishing. p. 179–192. Available from: 
https://doi.org/10.1007/978-3-319-30469-4_10 
Watts DP. 2004. Intracommunity coalitionary killing of an adult male chimpanzee at 
Ngogo, Kibale National Park, Uganda. Int J Primatol 25:507–521. 
Webster TH, Couse M, Grande BM, Karlins E, Phung TN, Richmond PA, Whitford 
W, Wilson MA. 2019. Identifying, understanding, and correcting technical 
artifacts on the sex chromosomes in next-generation sequencing data. 
Gigascience [Internet] 8. Available from: 
https://academic.oup.com/gigascience/article/8/7/giz074/5530326 
Webster TH, Wilson Sayres MA. 2016. Genomic signatures of sex-biased 
demography: progress and prospects. Curr Opin Genet Dev 41:62–71. 
Wegmann D, Excoffier L. 2010. Bayesian inference of the demographic history of 
chimpanzees. Mol Biol Evol 27:1425–1435. 
Wei T, Simko V. 2021. R package “corrplot”: Visualization of a Correlation Matrix. 
Available from: https://github.com/taiyun/corrplot 
Weigand H, Leese F. 2018. Detecting signatures of positive selection in non-model 
species using genomic data. Zool J Linn Soc 184:528–583. 
Wetzel KS, Yi Y, Yadav A, Bauer AM, Bello EA, Romero DC, Bibollet-Ruche F, 
Hahn BH, Paiardini M, Silvestri G, et al. 2018. Loss of CXCR6 coreceptor 
usage characterizes pathogenic lentiviruses. PLOS Pathog 14:e1007003. 
White FJ. 1986. Behavioral ecology of the pygmy chimpanzee. 
White FJ. 1988. Party composition and dynamics in Pan paniscus. Int J Primatol 
9:179–193. 
White FJ. 1992. Activity budgets, feeding behavior, and habitat use of pygmy 
chimpanzees at Lomako, Zaire. Am J Primatol 26:215–223. 
White FJ. 1996. Pan paniscus 1973 to 1996: Twenty-three years of field research. Ev 
Anth 5:11–17. 
White FJ. 1998. Seasonality and socioecology: The importance of variation in fruit 
abundance to bonobo sociality. Int J Primatol 19:1013–1027. 
White FJ. 2012. The Quarterly Review of Biology 87:171–172. 
White FJ, Burgman MA. 1990. Social organization of the pygmy chimpanzee (Pan 
paniscus): Multivariate analysis of intracommunity associations. Am J Phys 
Anth 83:193–201. 
White FJ, Wood KD. 2007. Female feeding priority in bonobos, Pan paniscus, and 
the question of female dominance. Am J Primatol 69:837–850. 
Wickham H. 2016. ggplot2: Elegant graphics for data analysis. New York: Springer-
Verlag. Available from: https://ggplot2.tidyverse.org 
Wickham H. 2019. stringr: Simple, Consistent Wrappers for Common String 
Operations. 
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund 
G, Hayes A, Henry L, Hester J, et al. 2019. Welcome to the tidyverse. J Open 
Source Softw 4:1686. 
Wilson BA, Petrov DA, Messer PW. 2014. Soft selective sweeps in complex 
demographic scenarios. Genetics 198:669. 
Wilson ML, Boesch C, Fruth B, Furuichi T, Gilby IC, Hashimoto C, Hobaiter CL, 
Hohmann G, Itoh N, Koops K, et al. 2014. Lethal aggression in Pan is better 
explained by adaptive strategies than human impacts. Nature 513:414–417. 
Wilson ML, Wrangham RW. 2003. Intergroup relations in chimpanzees. Annu Rev 
Anthropol 32:363–392. 
Wobber V, Wrangham R, Hare B. 2010. Bonobos exhibit delayed development of 
social behavior and cognition relative to chimpanzees. Curr Biol 20:226–230. 
Won Y-J, Hey J. 2005. Divergence population genetics of chimpanzees. Mol Biol 
Evol 22:297–307. 
Wrangham RW. 1986. Ecology and social relationships in two species of chimpanzee. 
In: Rubenstein DI, Wrangham RW, editors. Ecological aspects of social 
evolution: Birds and mammals. Princeton, NJ: Princeton University Press. p. 
354–378. 
Wrangham RW. 1999. Evolution of coalitionary killing. Am J Phys Anth 110:1–30. 
Wrangham RW, Chapman CA, Clark-Arcadi AP, Isabirye-Basuta G. 1996. Social 
ecology of Kanyawara chimpanzees: implications for understanding the costs 
of great ape groups. In: McGrew WC, Marchant LF, Nishida T, editors. Great 
Ape Societies. Cambridge: Cambridge University Press. p. 45–57. 
Wrangham RW, Clark AP, Isabirye-Basuta G. 1992. Female social relationships and 
social organization of Kibale Forest chimpanzees. In: Nishida T, McGrew 
WC, Marler P, Pickford M, de Waal FBM, editors. Topics in Primatology, 
Vol. 1 Human Origins. Tokyo: University of Tokyo Press. p. 81–98. 
Wrangham RW, Conklin NL, Chapman CA, Hunt KD, Milton K, Rogers E, Whiten 
A, Barton RA, Widdowson EM, Whiten A, et al. 1991. The significance of 
fibrous foods for Kibale Forest chimpanzees. Philos Trans R Soc B 334:171–
178. 
Wrangham RW, Pilbeam D. 2001. African apes as time machines. In: Galdikas B, 
Briggs N, Sheeran L, Shapiro G, Goodall J, editors. All Apes Great and Small. 
New York: Plenum. p. 5–17. 
Wright SI, Andolfatto P. 2008. The impact of natural selection on the genome: 
Emerging patterns in Drosophila and Arabidopsis. Annu Rev Ecol Evol Syst 
39:193–213. 
Wu Z, Jiang H, Zhang L, Xu X, Zhang X, Kang Z, Song D, Zhang J, Guan M, Gu Y. 
2012. Molecular analysis of RNF213 gene for Moyamoya disease in the 
Chinese Han population. PLOS One 7:e48179. 
Yamakoshi G. 1998. Dietary responses to fruit scarcity of wild chimpanzees at 
Bossou, Guinea: Possible implications for ecological importance of tool use. 
Am J Phys Anth 106:283–295. 
Yamakoshi G. 2004. Food seasonality and socioecology in Pan: Are West African 
chimpanzees another bonobo? Afr Study Monogr 25:45–60. 
Yang C-W, Chang CY-Y, Lai M-T, Chang H-W, Lu C-C, Chen Y, Chen C-M, Lee S-
C, Tsai P-W, Yang S-H, et al. 2015. Genetic variations of MUC17 are 
associated with endometriosis development and related infertility. BMC Med 
Genet 16:60. 
Yang J, Jin Z-B, Chen J, Huang X-F, Li X-M, Liang Y-B, Mao J-Y, Chen X, Zheng 
Z, Bakshi A, et al. 2017. Genetic signatures of high-altitude adaptation in 
Tibetans. Proc Natl Acad Sci USA 114:4189. 
Yerkes R. 1925. Almost Human. New York: Century 
Yu N, Jensen-Seaman MI, Chemnick L, Kidd JR, Deinard AS, Ryder O, Kidd KK, Li 
W-H. 2003. Low nucleotide diversity in chimpanzees and bonobos. Genetics 
164:1511–1518. 
Zheng Y, Wiehe T. 2019. Adaptation in structured populations and fuzzy boundaries 
between hard and soft sweeps. PLOS Comput Biol 15:e1007426. 
Zihlman AL, Bolter DR. 2015. Body composition in Pan paniscus compared with 
Homo sapiens has implications for changes during human evolution. Proc 
Natl Acad Sci USA 112:7466. 
Zihlman AL, Cramer DL. 1978. Skeletal differences between pygmy (Pan paniscus) 
and common chimpanzees (Pan troglodytes). Fol Primatol 29:86–94. 
 