TEACHING PAPA TO CHA-CHA: HOW CHANGE MAGNITUDE, TEMPORAL 
CONTIGUITY, AND TASK AFFECT ALTERNATION LEARNING 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
by 
 
AMY ELIZABETH SMOLEK 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
A DISSERTATION 
 
Presented to the Department of Linguistics 
and the Graduate School of the University of Oregon 
in partial fulfillment of the requirements 
for the degree of 
Doctor of Philosophy 
 
December 2019 
 
DISSERTATION APPROVAL PAGE 
Student: Amy Elizabeth Smolek 
Title: Teaching Papa to Cha-Cha: How Change Magnitude, Temporal Contiguity, and 
Task Affect Alternation Learning 
This dissertation has been accepted and approved in partial fulfillment of the 
requirements for the Doctor of Philosophy degree in the Department of Linguistics by: 
Vsevolod Kapatsinski    Chairperson
Melissa Baese-Berk      Core Member
Eric Pederson           Core Member
Kaori Idemaru           Institutional Representative
and
Kate Mondloch           Interim Vice Provost and Dean of the Graduate School
Original approval signatures are on file with the University of Oregon Graduate School. 
Degree awarded December 2019. 
 
 
 
 
 
 
 
 
 
 
 
 
 
© 2019 Amy Elizabeth Smolek 
  
DISSERTATION ABSTRACT 
 
Amy Elizabeth Smolek 
 
Doctor of Philosophy 
 
Department of Linguistics 
 
December 2019 
 
Title: Teaching Papa to Cha-Cha: How Change Magnitude, Temporal Contiguity, and 
Task Affect Alternation Learning 
 
 
In this dissertation, we investigate how speakers produce wordforms they may not 
have heard before. Paradigm Uniformity (PU) is the cross-linguistic bias against stem 
changes, particularly large changes. We propose the Perseveration Hypothesis: Motor 
perseveration in the production system encourages copying from related wordforms. 
When this conflicts with paradigmatic associations requiring a change to the base, the 
change may be leveled, resulting in PU. Associations are more difficult to acquire when 
the forms are articulatorily dissimilar, and poorly-learned associations are a lesser 
obstacle to the perseveratory bias, which accounts for the stronger bias against large 
changes.  
Participants trained on a miniature artificial language with labial palatalization 
(p→tʃi), a large change, produce the alternation much less often than participants trained 
on alveolar (t→tʃi) or velar (k→tʃi) palatalization. The difficulty arises from articulatory, 
rather than perceptual, dissimilarity: k→tʃi and g→dʒi are learned equally well despite 
differing in perceptual similarity, and the bias against large changes is observed in 
production but not in judgment. Ratings of labial palatalization improve as much post-
training as do ratings of lingual palatalization, suggesting that participants learn what 
they should produce by acquiring product-oriented schemas, but are unable to acquire a 
paradigmatic labial-to-alveopalatal association necessary for producing the alternation. 
How, then, do speakers learn to produce large changes? We propose that temporal 
contiguity between related forms allows speakers to notice the relationship between 
forms, strengthening paradigmatic associations between the chunks by which the forms 
differ and syntagmatic associations within these chunks. Presenting a plural immediately 
after the corresponding singular in training leads to more production of the exemplified 
pattern, whether the mapping is faithful (e.g. p→pa) or unfaithful (e.g. k→tʃa). If only 
one type of mapping is shown in contiguity, the pattern spreads to all inputs. Only when 
both types of mappings are shown in contiguity do participants learn to match inputs to 
the correct outputs. A simple two-layer discriminative model captures the results of the 
trial order manipulations, including cue availability and “chunking.”  
In sum, our work shows that paradigmatic associations are acquired through 
syntagmatic correspondence, which enables even large changes to be produced. 
 
 
CURRICULUM VITAE 
 
NAME OF AUTHOR:  Amy Elizabeth Smolek 
 
 
GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED: 
 
 University of Oregon, Eugene, OR 
 Swarthmore College, Swarthmore, PA 
 
 
DEGREES AWARDED: 
 
 Doctor of Philosophy, Linguistics, 2019, University of Oregon 
 Bachelor of Arts, Linguistics, 2011, Swarthmore College 
  
 
AREAS OF SPECIAL INTEREST: 
 
 Morphophonology 
 Learning 
 
 
PROFESSIONAL EXPERIENCE: 
 
 Graduate Employee (teaching assistant), Department of Linguistics, University of  
 Oregon, Eugene, 2011-2012, 2017 
  
 Graduate Employee (instructor), American English Institute, University of  
 Oregon, Eugene, 2012-2016 
 
 
GRANTS, AWARDS, AND HONORS: 
 
 Graduate Teaching Fellowship, Linguistics, 2011 to 2017 
  
 Phi Beta Kappa, 2011 
 
 
PUBLICATIONS: 
 
Smolek, A. & Kapatsinski, V. (2019). Syntagmatic paradigms: Learning correspondence 
from contiguity. Manuscript submitted for publication. 
 
Smolek, A. & Kapatsinski, V. (2018). What happens to large changes? Saltation produces 
well-liked outputs that are hard to generate. Laboratory Phonology: Journal of the 
Association for Laboratory Phonology, 9(1), 10. 
 
ACKNOWLEDGMENTS 
I have received an incredible amount of help and support from so many people over 
the process of researching and writing this thesis. Words are paltry recompense for all 
they have given me, but they’re all I’ve got.  
My most profound and everlasting thanks go to my advisor, Vsevolod Kapatsinski, 
whose breadth of knowledge is matched only by his enthusiasm in sharing it. He has 
always been available to discuss puzzling results and propose novel avenues for 
exploration. He patiently read countless drafts of this thesis, and without his comments 
and questions, it would be but a shadow of its current form. I truly cannot imagine a 
better advisor, and I am so grateful that he took me on. 
My many thanks to my committee members, Eric Pederson, Melissa Baese-Berk, and 
Kaori Idemaru, who have generously provided their expertise and feedback to help this 
manuscript become a dissertation. They have reminded me to connect this work to the 
larger context and strengthened my arguments.  
My wonderful colleagues made grad school a lot more enjoyable. Zara Harmon, Matt 
Stave, Paul Olejarczuk, Hideko Teruya, Manuel Otero, Zoe Tribur, Becky Paterson, 
Allison Taylor-Adams, Misaki Kato, Amos Teo, Kaylynn Gunter, and Ellen Gillooly-
Kress: Thanks for your suggestions, discussion, and friendship. 
I owe a great debt of gratitude to the hundreds and hundreds of undergrads who 
suffered through these frustrating experiments (“the worst thing I’ve ever been through,” 
according to one), and to the many undergrads who helped me code the data.  
My wonderful non-linguist friends have patiently listened to me alternately whine and 
rave about linguistics and assorted miscellany for years. Elizabeth Hubin, Neena 
Cherayil, Shilpa Boppana, Vy Vo, Zoe Davis, Ryan Carlson, and David Baum: You are 
so wonderful; thanks for sticking with me through it all.  
To my political representatives, Peter DeFazio, Jeff Merkley, Ron Wyden, and Kate 
Brown: Thank you for fighting for my health care. And to my doctors, Kathleen Cordes, 
Mary Sichi, Kelly Fitzpatrick, and Paula Eschtruth, who helped me beat back my illness 
and got me well enough to finally finish this thing: I literally, physically, could not have 
done it without you. 
My extended family has been incredibly supportive through my many years in grad 
school. My loving thanks to Beth and Bill Hartlerode, my Nana and Papa, who drove me 
to and from school and the lab for years; Karen and Laura Pyeatt, McKenna and Erik 
Knapp, and Tony Jesko, who have reminded me that there is life outside academia; the 
current and former Smoleks, many of whom have trod this path themselves, for their 
sympathy and encouragement; and my non-blood family – my godparents, Steve Mustoe 
and Rhonda Stoltz; my neighbor-parents, Heather Henderson and Dave Donielson; and 
my East Coast parents, Deb and Robbin Carlson: I am so grateful for all your love. 
Last but most certainly not least, I must thank my immediate family, Beckie Abbott, 
Ken Smolek, and Kevin Smolek, for their unending support and encouragement over the 
years, most especially over the course of writing this dissertation. They have sacrificed so 
much in helping me reach my goal, and this dissertation is dedicated to them.  
There were many times over the course of my graduate school tenure that I did not 
think I would be able to finish, and it is only thanks to the efforts of all of these people 
that I have made it. Their help is the greatest gift I have ever received; I can never repay 
them. 
 
 
 
 
 
 
 
To my mother, who made the cookies; 
my father, who paid for the cookies; 
and my brother, who let me have all the cookies: 
I couldn’t have done this without you. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
TABLE OF CONTENTS
Chapter

I. INTRODUCTION
   1.1. Paradigm Uniformity
      1.1.1. Vowel Height in Canadian Raising
      1.1.2. Saltation
      1.1.3. Explaining Paradigm Uniformity
   1.2. Perseveration
      1.2.1. Where Perseveration Is Visible
         1.2.1.1. Paradigmatic Perseveration in Russian
      1.2.2. How Perseveration Creates PU
   1.3. Associations
      1.3.1. The Paradigm Cell Filling Problem
      1.3.2. The Difficulty in Associating Dissimilar Representations
   1.4. Learning Associations
      1.4.1. Early Proposals
      1.4.2. Discriminative Learning and Contiguity
      1.4.3. The Neurological Underpinnings of Learning
      1.4.4. Temporal Contiguity
      1.4.5. Variation Sets
   1.5. Creating Novel Forms
   1.6. Alternative Theories
      1.6.1. Stem and Affix
      1.6.2. Storage Economy
      1.6.3. Perceptual Similarity
      1.6.4. Categorization
      1.6.5. Perseveration Hypothesis vs. Optimality Theory
   1.7. Structure of Dissertation

II. INVESTIGATING CHANGE MAGNITUDE EXPERIMENTALLY
   2.1. Prior Experimental Work on Change Magnitude
      2.1.1. Learning in Adults
      2.1.2. Learning in Infants
   2.2. Palatalization
      2.2.1. The Typology of Palatalization
      2.2.2. Palatalization and Phonetic Naturalness
      2.2.3. Learnability
      2.2.4. The Potential Problem With Palatalization
   2.3. Experiment Review
      2.3.1. Experiment 1: Baseline
      2.3.2. Experiment 2: Palatalization Before -i
      2.3.3. Experiment 3: Palatalization Before -a With Contiguity
      2.3.4. Discriminative Model of Experiment 3
      2.3.5. Comparison of Experiments 2 and 3

III. EXPERIMENT 1: PALATALIZATION BASELINE
   3.1. Methods
      3.1.1. Participants
      3.1.2. Materials
      3.1.3. Procedure
      3.1.4. Measures
      3.1.5. Predictions
   3.2. Results
      3.2.1. Judgments of Palatalized Plurals
      3.2.2. Judgments of Faithful Plurals
      3.2.3. Judgments of Plurals Before -i
      3.2.4. Judgments of Plurals Before -a
      3.2.5. Judgments by Faithfulness
   3.3. Discussion

IV. EXPERIMENT 2: PALATALIZATION BEFORE -i
   4.1. Predictions and Hypotheses
   4.2. Methods
      4.2.1. Languages
      4.2.2. Participants
      4.2.3. Materials
         4.2.3.1. Training
         4.2.3.2. Production Test
         4.2.3.3. Judgment Test
      4.2.4. Procedure
         4.2.4.1. Training
         4.2.4.2. Production Test
         4.2.4.3. Judgment Test
      4.2.5. Measures
   4.3. Results
      4.3.1. Hypothesis 1: Labial Palatalization Is Hard to Learn Because
             of Faithfulness, Not Markedness
      4.3.2. Hypothesis 2: Large Alternations, Including Saltation, Are
             Hard to Produce
      4.3.3. Hypothesis 3: Saltatory Alternations Are Likely to Be
             Overgeneralized
      4.3.4. Hypothesis 4: Large Changes Are Hard to Produce, Even if
             They Are Judged to Be Preferable
      4.3.5. Hypothesis 5: The Bias Against Labial Palatalization Is Due to
             Perceptual Dissimilarity
      4.3.6. Effect of Training
   4.4. Discussion
      4.4.1. Implications for Other Theories
         4.4.1.1. Perceptual Similarity
            4.4.1.1.1. Influence of Markedness
         4.4.1.2. Storage Economy
         4.4.1.3. Categorization
         4.4.1.4. Learnability
      4.4.2. Limitations
      4.4.3. Summary

V. EXPERIMENT 3: EFFECTS OF ADJACENCY ON LEARNABILITY OF
   PALATALIZATION BEFORE -a
   5.1. Methods
      5.1.1. Participants
      5.1.2. Languages
         5.1.2.1. What Learners Need to Weight
      5.1.3. Materials
      5.1.4. Procedure
         5.1.4.1. Training
         5.1.4.2. Test
      5.1.5. Measures
         5.1.5.1. Transcription Protocol and Exclusions
         5.1.5.2. Model Structure
         5.1.5.3. Predictions
   5.2. Results
      5.2.1. Error Patterns
         5.2.1.1. Consonant and Vowel Error Types
         5.2.1.2. Consonant Errors
         5.2.1.3. Vowel Errors
      5.2.2. Suffix Choice
         5.2.2.1. Suffix Frequency
         5.2.2.2. Consonant Choice Effect
         5.2.2.3. Other Effects
      5.2.3. Suffix Content
         5.2.3.1. Trial Order Effects
         5.2.3.2. Suffix Vowel Effects
      5.2.4. Input Consonant
         5.2.4.1. Trial Order Effects on Overgeneralization of
                  Palatalization
   5.3. Summary

VI. COMPUTATIONAL MODEL: EFFECTS OF ADJACENCY ON
    LEARNABILITY OF PALATALIZATION BEFORE -a
   6.1. Discriminative Learning
   6.2. Model Design
      6.2.1. Relevant Cues and Outcomes
      6.2.2. Capturing Implicational Hierarchies
      6.2.3. Trial Order Effects on Cue Availability
      6.2.4. Prior Beliefs
      6.2.5. Linking Hypothesis
   6.3. Modeling Results
      6.3.1. Suffix Vowel Choice
         6.3.1.1. Baseline Model Results
         6.3.1.2. Shortcomings and Modifications
      6.3.2. Palatalization
         6.3.2.1. Baseline Model Results
         6.3.2.2. Shortcomings and Modifications
            6.3.2.2.1. Perceptual Contrast and Chunking
            6.3.2.2.2. Overgeneralization Asymmetries in
                       To-Be-Palatalized Consonants
   6.4. General Discussion
      6.4.1. Implications for Learning
         6.4.1.1. Discriminative Framework
         6.4.1.2. Saliency and Adjacency
      6.4.2. Implications for Phonological Theory
         6.4.2.1. Retreating From Overgeneralization
            6.4.2.1.1. Entrenchment and Pre-Emption
            6.4.2.1.2. Other Accounts of Overgeneralization
         6.4.2.2. Schemas
         6.4.2.3. Implicational Hierarchy
         6.4.2.4. Morphology Feeds Phonology
      6.4.3. Limitations
   6.5. Summary

VII. REVIEW, GENERAL DISCUSSION, AND CONCLUSIONS
   7.1. Review of Results
      7.1.1. Experiment 1
      7.1.2. Experiment 2
      7.1.3. Experiment 3 and a Discriminative Model
         7.1.3.1. Suffix Vowel
         7.1.3.2. Palatalization
      7.1.4. Context Naturalness and Alternation Learnability
         7.1.4.1. Palatalization of To-Be-Palatalized Consonants in the
                  Triggering Context
         7.1.4.2. Generalization to the “Wrong” Suffix
   7.2. Theoretical Implications
      7.2.1. The Fate of Large Changes
      7.2.2. The Importance of Syntagmatic Co-Occurrence
      7.2.3. Chunking and Common Fate
      7.2.4. Variation Sets
      7.2.5. Surprise!
   7.3. Conclusion

APPENDICES
   A. EXPERIMENT 1 AND EXPERIMENT 2 JUDGMENT STIMULUS LISTS
   B. EXPERIMENT 2 STIMULUS LISTS
   C. EXPERIMENT 3 STIMULUS LISTS

REFERENCES CITED
  
LIST OF FIGURES

Figure

1.1. A language with phonologized labial palatalization

3.1. Example display for stimulus pair with labial palatalization
3.2. Acceptance of palatalized plurals by place of articulation and suffix
3.3. Acceptance of palatalized plurals by place of articulation and voicing
3.4. Acceptance of faithful plurals by place of articulation and suffix
3.5. Acceptance of plurals before -i by place of articulation and faithfulness
3.6. Acceptance of plurals before -a by place of articulation and faithfulness
3.7. Acceptance of plurals by place of articulation and whether the plural was
     faithful to the singular

4.1. Distribution of ratings by training condition and place of articulation
4.2. Judgments of faithful plurals
4.3. Palatalization rates before -i in production
4.4. Palatalization rates of Not-To-Be-Palatalized consonants
4.5. Overgeneralization of palatalization depending on magnitude
4.6. Acceptance of overgeneralization of palatalization to velars
4.7. Acceptance of overgeneralization of palatalization to alveolars
4.8. Acceptance of palatalization in judgment
4.9. Judgments of To-Be-Palatalized plurals by faithfulness before -i
4.10. Comparison of rate of palatalization in production to acceptance of
      palatalized plurals in judgment by training language
4.11. Judgments of Not-To-Be-Palatalized plurals by faithfulness before -i
4.12. Rates of palatalization in production and acceptance in judgment of
      velars before -i
4.13. Acceptance of correct palatalized plurals before -i

5.1. Percentage of plural productions without mistakes
5.2. Plural productions containing consonant errors
5.3. Plural productions containing vowel errors
5.4. Suffix choice probabilities across trial order conditions
5.5. Conditional inference tree of the factors that influence suffix vowel
     choice
5.6. Palatalization rates across conditions in the appropriate context
5.7. Conditional inference tree of the effects of vowel suffix and trial order on
     probability of palatalization
5.8. An overview of the probability of palatalization before -a

6.1. Expected vs. observed vowel choice probabilities
6.2. Expected vs. observed palatalization probabilities
6.3. Expected vs. observed palatalization probabilities after model
     adjustments

7.1. Palatalization rates in production before -i across training languages
7.2. Judgments of faithful mappings before -i across training languages
7.3. Judgments of palatalization before -i across training languages
7.4. Percent of plurals suffixed with -a by trial order and To-Be-Palatalized
7.5. Percent of plurals suffixed with -a by trial order and training language
7.6. Palatalization probability by trial order and suffix vowel
7.7. Production of palatalization before -a by trial order and
     To-Be-Palatalized
7.8. Palatalization of To-Be-Palatalized consonants by trial order and training
     language
7.9. Palatalization of To-Be-Palatalized consonant in palatalization-triggering
     context by training language and experiment
7.10. Proportion of plurals of To-Be-Palatalized consonants that were
      palatalized
7.11. Proportion of plurals that followed patterns included in training
7.12. Palatalization of To-Be-Palatalized consonants by language, plural vowel,
      experiment
7.13. Proportion of palatalized plurals suffixed with correct vowel
 
 
 
 
 
 
 
 
 
 
 
LIST OF TABLES 
 
Table Page 
 
3.1. Generalized linear effects model output for acceptance of palatalized  
       plural-singular pairs by stem place of articulation, suffix vowel, voicing,  
       and interactions ............................................................................................... 56 
 
3.2. Generalized linear mixed effects model output for acceptance of singular- 
       plural pairs suffixed with -i by whether the plural was palatalized, stem-final  
       consonant place of articulation, voicing, and the interaction between 
       palatalization and place of articulation ........................................................... 60 
 
3.3. Generalized linear mixed effects model output for acceptance of singular- 
       plural pairs suffixed with -a by whether the plural was palatalized, stem- 
       final consonant place of articulation, voicing, and the interaction between  
       palatalization and place of articulation ........................................................... 61 
 
4.1. Labial, Alveolar, and Velar Palatalization patterns in Experiment 2 ............. 70 
 
4.2. Judgments of incorrect faithful mappings for To-Be-Palatalized consonants  
       across training conditions ............................................................................... 78 
 
4.3. Judgments of correct faithful mappings for Not-To-Be-Palatalized  
       consonants across training conditions ............................................................. 78 
 
4.4. The effect of Training Language on (erroneous) retention rates of   
       To-Be-Palatalized consonants in production before -i .................................... 81 
 
4.5. Overgeneralization of palatalization from alveolars to labials and labials to   
       alveolars .......................................................................................................... 82 
 
4.6. Overgeneralization of palatalization from velars to labials and labials to  
       velars ............................................................................................................... 82 
 
4.7. The effects of training on Labial vs. Alveolar and Velar Palatalization on      
       judgment vs. production of palatalized forms before -i .................................. 87 
 
4.8. The effects of training on Labial vs. Lingual palatalization on correct vs.  
       erroneous palatalization and judgments of correct vs. erroneous  
       palatalization ................................................................................................... 89 
 
5.1. Labial and Velar Palatalization patterns presented to participants in 
       Experiment 3 ................................................................................................... 114 
 
 
 
5.2. Trial selection “blocks” by trial order and To-Be-Palatalized status of  
       stem-final consonant ....................................................................................... 117 
 
5.3. Generalized linear mixed-effects model output for suffix choice by trial  
       adjacency and To-Be-Palatalized .................................................................... 129 
 
5.4. The influence of trial order on palatalization rates ......................................... 136 
 
6.1. Expected / observed production probabilities for palatalizing suffix -a ......... 146 
 
6.2. Expected / observed palatalization rates before -a, unmodified model .......... 149 
 
6.3. Expected / observed palatalization rates before -i, unmodified model ........... 149 
 
6.4. Expected / observed palatalization rates before -a, modified model .............. 153 
 
6.5. Expected / observed palatalization rates before -i, modified model ............... 153 
 
 
 
 
CHAPTER I 
INTRODUCTION 
Portions of this chapter were taken from: 
Smolek, A. & Kapatsinski, V. (2019). Syntagmatic paradigms: Learning 
correspondence from contiguity. Manuscript submitted for publication. 
Smolek, A. & Kapatsinski, V. (2018). What happens to large changes? Saltation 
produces well-liked outputs that are hard to generate. Laboratory Phonology: Journal of 
the Association for Laboratory Phonology, 9(1), 10. 
The acquisition of morphology includes the process of acquiring paradigm mappings, 
or associations between related forms (Kapatsinski, 2018b; J. P. Blevins, 2013). 
Everyday language use requires speakers to solve the Paradigm Cell Filling Problem 
(Ackerman et al., 2009): They need to be able to produce forms they may not have heard 
before (J. P. Blevins et al., 2017; Bonami & Beniamine, 2016; Hockett, 1967; Malouf, 
2017). Evidence from diverse languages shows that adult speakers cannot have been exposed
to every form of every word they know: only 0.1% of Czech nouns are attested in every
inflected form in a 100-million-word corpus (Malouf, 2017), and increasing the size of the
corpus does not improve and may even exacerbate the problem (J. P. Blevins et al., 2017). The
result is that speakers must generate novel forms of known words, and how that may 
happen is the topic of this dissertation. 
Paradigm Uniformity (PU) is the leveling of changes across paradigmatic cells, in 
order to regularize the related forms of a word. The cause of PU is subject to discussion, 
but we propose the Perseveration Hypothesis: that weak associations are insufficient to 
override the perseveratory tendency in production. In order to prevent PU, there needs to 
be some strategy for strengthening associations. We propose that contiguity is a vital 
contributor to association strength1, and that without contiguity in language learning, 
there would be much more leveling of paradigms. 
1.1. Paradigm Uniformity 
Paradigm Uniformity is the force that militates against multiple forms of the same 
stem (Benua, 1997; Kenstowicz, 1996; Steriade, 2000). Especially undesirable are 
phonologically dissimilar allomorphs of the stem (Skoruppa et al., 2011; White, 2014). 
Learning of phonological patterns is biased against dissimilar sound alternations (Hayes 
& White, 2015; Moreton & Pater, 2012a; Peperkamp et al., 2006; Steriade, 2001/2009; 
White, 2013, 2014, 2017). Dissimilarity can be operationalized as an alternation 
“skipping over” another sound, also called saltation (Peperkamp et al., 2006; Skoruppa et 
al., 2011; White, 2013, 2014), or defined in terms of phonological features, articulatory 
gestures, or perceptual dimensions. 
1.1.1. Vowel height in Canadian Raising 
PU can be observed when productive phonological processes seem to fail to apply 
when their application would violate PU, or when they seem to overapply in order to 
increase PU. For example, in Canadian Raising (Joos, 1942), /aɪ/ raises to [ᴧɪ] before 
voiceless consonants, hence [bᴧɪt] but [baɪd]. However, the alternation overapplies before 
voiced flaps that correspond to voiceless stops in other forms of the same stem, and as a 
result, the paradigm is more uniform: All forms share the same vowel height, as shown in 
(1). The overapplication of the alternation is a form of perseveration, where an element of 
the base form (e.g. [ᴧɪ]) is retained in the production of a related form. 
                                                
1 Which may be implemented via variation sets (Küntay & Slobin, 1996; Onnis et al., 2008), and in 
addition to consolidation (Davis & Gaskell, 2009; Kumaran et al., 2016; Lewis & Durrant, 2011; 
McClelland et al., 1995; §6.4.3). 
1. bᴧɪt ‘bite’  
baɪd ‘bide’  
bᴧɪɾɪŋ ‘biting’  
baɪɾɪŋ ‘biding’ 
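
The overapplication in (1) can be made concrete with a minimal sketch. It assumes a toy representation in which a form is reduced to its diphthong and the consonant that follows it, and in which the derived form simply inherits the vowel that its base surfaces with. The names LOW, RAISED, VOICELESS, surface_vowel, and derived_vowel are illustrative assumptions, not part of any cited analysis.

```python
# Toy sketch of Canadian Raising and its overapplication in (1).
# Assumptions: vowels and consonants are plain strings, and the names
# below (LOW, RAISED, VOICELESS) are illustrative, not from any cited analysis.
LOW, RAISED = "aɪ", "ᴧɪ"
VOICELESS = {"p", "t", "k", "f", "s"}

def surface_vowel(vowel, next_consonant):
    """Raise the low diphthong only when the following consonant is voiceless."""
    if vowel == LOW and next_consonant in VOICELESS:
        return RAISED
    return vowel

def derived_vowel(base_vowel, base_consonant):
    """Copy the vowel the base surfaces with, ignoring the flap in the derived form."""
    return surface_vowel(base_vowel, base_consonant)

# The flap in "biting" is voiced, so raising is not conditioned on the surface.
transparent = surface_vowel(LOW, "ɾ")
# Copying from the bases "bite" and "bide" yields the paradigm-uniform vowels.
biting = derived_vowel(LOW, "t")
biding = derived_vowel(LOW, "d")
```

Under these assumptions, the surface-true rule leaves the vowel of biting low, whereas copying from the base keeps it raised, which is the pattern shown in (1).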
1.1.2. Saltation 
There is a strong typological tendency against saltation, which is a type of large 
change. If a language contains X, Y, and Z sounds, such that Y is between X and Z in 
phonetic similarity space, then X→Z implies Y→Z, but not the opposite (shown 
experimentally by White, 2013, 2014; White & Sundara, 2014). For example, in White 
(2013, 2014), English-speaking adults exposed to p→v prefer f→v over f→f, but those 
exposed to b→v prefer f→f. The extension of an alternation to intermediate sounds 
makes sense synchronically, but less so diachronically; the rarity of saltation is implied to 
be due to overgeneralization of alternations to intermediate sounds, but Bybee (2008) 
argues that alternations lose productivity as they require greater degree of change (i.e. 
they become more likely to be replaced with faithful mappings). White (2013) discusses 
Crosswhite's (2000) account of diachronic vowel saltation in Russian, which shows that the 
saltation is in the process of losing productivity, rather than being overextended. Most errors by children 
are caused by underapplying stem changes rather than overgeneralizing alternations (Do, 
2013; Kerkhoff, 2007; Krajewski et al., 2011; Tomas et al., 2017), the latter of which is 
rare in adults as well (Bolognesi, 1998; White, 2017). Since error can seed language 
change (Andersen, 1973; Bybee, 2010; Bybee & Slobin, 1982; Harmon & Kapatsinski, 
2017; Hudson Kam & Newport, 2009), saltation is likely to disappear through 
underapplication rather than extend through overapplication. 
 
 
1.1.3. Explaining paradigm uniformity 
Current accounts of PU are couched within the framework of Optimality Theory 
(Prince & Smolensky, 1993/2004; Benua, 1997; Burzio, 1996; Kenstowicz, 1996; 
McCarthy, 1998). Pre-OT theorists noted the importance of PU (Pinker & Prince, 1988), 
but claimed that its origin was morphological, not phonological. If Word = Stem + Affix, 
then the stem must be preserved as a consequence of the existence of morphology. 
However, not all aspects of the stem are likely to be retained in the output. In the 
Canadian Raising example in (1), the height of the vowel is retained, but the /t/ is not. 
Phonological theories of PU, as captured through OT, capture this asymmetry by 
describing retention of the vowel and retention of the consonant by distinct constraints, 
which can have differing weights.  
Base identity constraints (Kenstowicz, 1996), also called output-output (OO) 
faithfulness constraints (Benua, 1997; Hayes, 2004; Kager, 1999; McCarthy, 1998), favor 
preserving particular features of the derivational/inflectional base. For the Canadian 
Raising example in (1), a high ranking of the IdentBA-[low] constraint would force biting 
to retain the raised [ᴧɪ] of bite. Base identity constraints are posited to be universal and at 
the top of the constraint hierarchy unless demoted by learning (Hayes, 2004; McCarthy, 
1998). The initial high ranking is consistent with observations that children often level 
stem changes that adults in the same community reliably produce; Kerkhoff (2007) shows 
that the Dutch voicing alternation is less productive for Dutch children than adults, 
despite its high degree of regularity and phonetic naturalness, and Do (2013) shows that 
Korean children avoid producing stem changes using a variety of repair strategies. 
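
The ranking logic behind base identity constraints can be illustrated with a minimal sketch of strict-ranking evaluation. It is an assumption-laden toy: the markedness constraint "*Raised" and the violation counts are hypothetical and chosen only to show how ranking IdentBA-[low] high keeps the raised vowel of bite in biting, while demoting it does not.

```python
# Minimal sketch of strict-ranking evaluation in the style of Optimality Theory.
# Assumptions: the constraint "*Raised" and all violation counts are hypothetical.
def evaluate(candidates, ranking):
    """Return the candidate with the best violation profile under the ranking."""
    def profile(candidate):
        # Violations listed from highest-ranked to lowest-ranked constraint.
        return tuple(candidates[candidate].get(c, 0) for c in ranking)
    # Tuples compare left to right, so higher-ranked constraints decide first.
    return min(candidates, key=profile)

# Two candidate outputs for "biting", whose base "bite" has a raised vowel.
candidates = {
    "bᴧɪɾɪŋ": {"*Raised": 1},        # keeps the base vowel, which is raised
    "baɪɾɪŋ": {"IdentBA-[low]": 1},  # lowers the vowel, differing from the base
}

uniform = evaluate(candidates, ["IdentBA-[low]", "*Raised"])
demoted = evaluate(candidates, ["*Raised", "IdentBA-[low]"])
```

With IdentBA-[low] ranked first, the candidate that matches the base wins. With the ranking reversed, the candidate with the low vowel wins.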
Claiming that base identity constraints are innate accounts for their universality, but 
does not explain where they came from in the first place. What adaptive advantage do 
they give? The other major kinds of constraints, markedness, faithfulness, and alignment, 
are motivated by ease of articulation, the need to maintain lexical contrast, and the need 
for temporal alignment of independent structures, respectively (Kager, 1999; Prince & 
Smolensky, 1993/2004). PU is an independent force that shapes phonological patterns, 
and is separate from markedness of the output and from faithfulness to an underlying 
form, because some patterns can only be explained by PU (Benua, 1997; Kenstowicz, 
1996), which motivates their presence in the phonological grammar. There is no 
universally accepted origin for PU constraints, with alternatives proposed by Kenstowicz 
(1996, 1998, §1.6.2), Steriade (2008) and White (2017, §1.6.3), and Moreton & Pater 
(2012a, §1.6.4). 
We propose the Perseveration Hypothesis, summarized below in (2). 
2. The Perseveration Hypothesis: The tendency to avoid stem changes is grounded 
in motor perseveration during the process of generating a novel form of a known 
word. When the form cannot be retrieved from memory quickly enough, 
activation cascades over related forms, and their activated motor representations 
are incorporated into the production plan under construction.  
 
We accept the existence of a PU bias, but do not think it originates from an innate 
universal grammar or constraint inventory. Instead, it is grounded in two sources: 
1) Paradigmatic motor perseveration (§1.2), and 
2) The difficulty of learning associations between dissimilar alternants (§1.3). 
It is the competing tendencies between paradigmatic perseveration and paradigmatic 
associations that lead to the patterns in paradigm uniformity that we find. In §1.2, we 
discuss prior work on perseveration. In §1.3 and §1.4, we explore the learning of 
associations. §1.5 describes our view of the process that speakers undergo when 
attempting to construct a novel form of a known word, and §1.6 compares our theory to 
other work. In §1.7, we lay out the structure of the dissertation. 
1.2. Perseveration 
Perseveration on the base is usually beneficial. Even though we most often notice it 
when it is a mistake, most to-be-produced forms retain most aspects of the 
base, and full suppletion is rare. Otherwise, there could be no “morphology” (Pinker & 
Prince, 1988). Perseveration on aspects of recently-produced forms that do not share a 
stem with the target form (in other words, mistaken perseveration) has been noted in 
elicited production tests and natural language. Bickel et al. (2007) show that speakers of 
Chintang produce prefix orderings that mirror those of a previous utterance when the 
following utterance is syntactically similar; Caballero (2010) demonstrates that certain 
morphemes are produced in non-scopal order if they had just been produced in the same 
order in a context where the order was scopal; and Lobben (1991) shows that Hausa 
plurals produced in seeming violation of the rules of the language can be explained by 
referencing the form produced immediately beforehand, which is usually formally similar 
to the anomalous form. 
While perseveration may be encouraged in elicited production, the same process still 
applies any time a speaker produces a derivative of a known word, whereby known forms 
of the word are activated, providing the speaker with gestures on which to perseverate. 
Perseveration on morphological elements is always in the background, racing against 
lexical retrieval (Baayen et al., 1997), but it is usually outpaced; thus, paradigmatic 
perseveration is most obvious when a novel form of a known word is to be produced. 
1.2.1. Where perseveration is visible 
Paradigm Uniformity is visible when productive phonological processes fail to apply 
when they should (because it would violate PU), or apply when they should not (to 
increase PU; see Benua, 1997, for a review, and Raffelsiefen, 2005, for a critical 
perspective). One avenue for exploration of these processes is through “wug tests” 
(Berko, 1958), where participants are given a form of a word and asked to produce 
another form using the given as a base. In the original study, participants were shown a 
picture of an unfamiliar creature and told, “This is a wug.” They were then shown a 
picture with multiple creatures and prompted with, “Now there are two of them, there are 
two…”. For productive morphemes like the English plural -s, young children are able to 
generate the correct plural wugs and often able to select the correct allomorph of /z/ 
(wug[z] from wug, fep[s] from fep, and gutch[əz] from gutch).  
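
The allomorph choice that children make in the wug test can be sketched as a simple decision on the stem-final sound. The sound classes below are deliberately incomplete, and the function name plural_allomorph is an illustrative assumption.

```python
# Sketch of English plural allomorph selection for wug-test items.
# Assumptions: the stem-final sound is passed in as a rough phonemic string,
# and the two sound classes are abbreviated for illustration.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}

def plural_allomorph(final_sound):
    """Pick the plural allomorph from the final sound of the stem."""
    if final_sound in SIBILANTS:
        return "əz"   # gutch -> gutch[əz]
    if final_sound in VOICELESS:
        return "s"    # fep -> fep[s]
    return "z"        # wug -> wug[z]
```

Calling plural_allomorph with "g", "p", and "tʃ" reproduces the three outcomes listed above for wug, fep, and gutch.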
The aim of the study of morphology is to explain real language, not the elicited 
production task, but we think the latter is a good approximation of the former. The 
primary difference is that in the lab, a single form is activated, whereas in natural 
language multiple forms can be activated and compete simultaneously2 (cf. Albright, 
2008), which can lead to multiple inheritance/multiple motivation, where the output 
retains aspects of multiple related forms (Goldberg, 1995; Umbreit, 2011).  
Novel wordforms are the result of a blending process, whereby the production plan is 
constructed by blending the base form’s production representation with the schema 
associated with the meaning to be conveyed (e.g. ...z#~PLURAL for the English plural). 
                                                
2 Rich-get-richer dynamics do favor re-use of previously-used forms and paradigmatic mappings between 
forms (Martin, 2007; Zipf, 1949), so a single wordform may often have a dominant influence on the 
constructed output because it is more frequent than competitors and/or more predictive of characteristics of 
the output (Albright, 2008). However, activation spreads to multiple words in parallel during lexical 
retrieval (Dell, 1986; Roelofs, 1992), offering them the opportunity to influence the output. 
When a morphologically-related word is activated more (or earlier) than the target, 
aspects of its form can be retained and erroneously copied into the production plan under 
construction. Children show motor perseveration across domains (Dell et al., 1997; Smith 
et al., 1999) and are slower to retrieve and plan the target. Thus the morphological 
relative is often partially activated before the target is fully accessed, which allows 
aspects of the relative’s form to influence production of the target, leveling stem changes 
(Do, 2013; Kerkhoff, 2007). Allowing a recently-activated form to drive production may 
help children learn language by encouraging imitation. Children will often repeat words 
used by the interlocutor, which can lead to pronoun confusion, e.g. in using “you” instead 
of “I” in response to “Do you want to eat?” (Clark, 1974, 1977; Rubino & Pine, 1998). 
Except for the rare pronoun confusion and stem leveling, perseveration in child speech is 
often functional, in that it allows the child to reproduce structures that would be too 
complex for them to produce compositionally on their own (Clark, 1977; Farrar, 1992; 
Rubino & Pine, 1998). This imitation of the interlocutor is akin to perseverating on 
aspects of the given or recently produced form in wug tests, in that in both cases, 
speakers incorporate a form activated by external factors (rather than internal, top-down 
input) into the production plan when the target is difficult to plan or retrieve, which 
suggests that the same mechanism lies behind both processes. 
1.2.1.1. Paradigmatic perseveration in Russian 
Paradigmatic perseveration seems inevitable in the context of elicited production; 
how could a speaker not perseverate on the form provided when there is nothing else 
available for memory retrieval? But its existence in natural language requires empirical 
support that the base form can become activated, down to the level of production units, 
before the target form is produced, and that those activated units are then incorporated 
into the production plan under construction. In contrast to syntagmatic perseveration, 
illustrated in (3), paradigmatic perseveration has rarely been subject to scholarly attention 
(though see Kapatsinski, 2010).  
3. Was there, like, an expl[oʊ]sion of p[oʊ]p? 
 
Bez-initial adjectives (bez- meaning ‘-less’) in Russian, like those in (4) and (5), 
provide evidence for paradigmatic perseveration in natural language. These adjectives 
always have a corresponding prepositional phrase, but often lack a bez-less adjective 
counterpart. Bez-initial adjectives are predictable if we assume that they are created from 
prepositional phrases via an adjectivizing schema like [N]nyi#~ADJ.MASC.SG.NOM, 
but not if we assume they are created from bez-less adjectives, which frequently do not 
exist.  
4. bezkrylyj  bez kryl’jev  *krylyj  
‘wingless’   ‘without wings’ ‘winged’ 
 
5. bezsmyslennyj  bez smysla  *smyslennyj   
‘pointless’  ‘without a point’ ‘having a point’ 
The preposition bez and the prefix bez- both end in underlying /z/, which surfaces as [z] before 
vowels and voiced consonants and as [s] before voiceless consonants. The preposition has a 
constant spelling, corresponding to the underlying /z/ (written with the Cyrillic equivalent 
of <z>). The prefix, like all /z/-final prefixes in Russian, should be spelled the way it is 
pronounced, namely <s> before voiceless consonants and <z> elsewhere. Kapatsinski 
(2010) shows that Russian speakers struggle to correctly spell unfamiliar/novel bez-initial 
adjectives, where their errors (like bezkreditnyi ‘creditless’) stem from spelling the prefix 
like the corresponding prepositional phrase (bez kredita ‘without credit’). The error rates 
for familiar bez-adjectives, whose orthographic representations they can retrieve from 
memory, are very low. They do not make the same spelling mistakes with /z/-final 
prefixes that do not have a differently-spelled preposition (like iz-, roz-, voz-), and the 
verbal prefix iz- is particularly informative as to why. 
Iz- ‘out’ has a synonymous preposition iz ‘out of’, but iz-initial words do not have 
corresponding prepositional phrases; for example, izpisatj ‘to cover with writing’ comes 
from pisatj ‘to write’, not from a prepositional phrase. This means that it is not the 
preposition bez that interferes with bez-, but rather the prepositional phrase, the 
derivational base, that interferes. This is direct evidence for paradigmatic perseveration: 
A base form (in this case, a prepositional phrase) is activated more and/or earlier than the 
target (in this case, a rare or unfamiliar bez-initial adjective), which allows some elements 
of its form to be perseverated on in the output. 
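
The spelling pattern described in this section can be sketched in transliteration. The sketch assumes a simplified set of voiceless consonant letters and two hypothetical helper functions: one that spells the prefix as it is pronounced, and one that copies the constant spelling of the prepositional phrase.

```python
# Sketch of the bez- spelling pattern in transliteration.
# Assumptions: VOICELESS is a simplified set of transliterated letters, and
# both function names are hypothetical labels for the two spelling routes.
VOICELESS = set("ptkfsxc")

def prefix_spelling(stem):
    """Spell the prefix as pronounced: <s> before voiceless consonants, else <z>."""
    return ("bes" if stem[0] in VOICELESS else "bez") + stem

def perseverated_spelling(stem):
    """Copy the constant <z> of the prepositional phrase 'bez X'."""
    return "bez" + stem

target = prefix_spelling("kreditnyi")
error = perseverated_spelling("kreditnyi")
```

The two routes diverge only when the stem begins with a voiceless consonant, which is where the errors reported by Kapatsinski (2010) are described as occurring.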
1.2.2. How perseveration creates PU 
Perseveration can become conventionalized (i.e. phonologized), like other processing 
pressures, thereby becoming part of the grammar. Conventionalized perseveration is 
copying, where certain parts of the input are conventionally copied into the output under 
construction. As with faithfulness constraints (Prince & Smolensky, 1993/2004), the 
strength of the pressure to copy can vary across submorphological structures and is 
acquired through language experience. In our proposal, the target structures are positions 
within prosodic templates rather than features or gestures (e.g. CopyFIN mandates copying 
of the final element of the stem, regardless of its identity). The reasons are twofold: 
1) Changes do not target all instances of a unit within a base, but rather only instances 
in certain positions (Kapatsinski, 2013; Kapatsinski, 2017a). For example, in Korean 
verbs, the voiceless obstruents [p;t;k] are voiced before vowel-initial suffixes, thus /tat-a/ 
‘close-INTERROGATIVE’ surfaces as tada (Do, 2013, 2018). Only the stem-final stop 
undergoes the change: The initial /t/ should be retained, even though it is also followed 
by /a/. 
2) Learners need to be able to generalize from experienced elements in a certain 
position to novel elements in the same position (Kapatsinski, 2017a). The English regular 
past tense mandates copying over the entire root, and Berko (1958) shows that children 
and adults are able to apply the rule to stems they have not previously encountered. 
Without the ability to extract copy generalizations, learners would be prone to producing 
garbled forms like membled from mail (Pinker & Prince, 1988; Rumelhart & McClelland, 
1986). 
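
The first reason above, that changes target positions rather than units, can be illustrated with the Korean example. The sketch contrasts a position-based change (only the stem-final stop is voiced) with a hypothetical unit-based change that voices every prevocalic stop. The segment inventories and function names are simplifying assumptions.

```python
# Sketch contrasting a position-based change with a unit-based one.
# Assumptions: one character per segment and a reduced inventory.
VOICED = {"p": "b", "t": "d", "k": "g"}
VOWELS = set("aeiou")

def suffix_verb(stem, suffix):
    """Copy the stem, voicing only its final stop before a vowel-initial suffix."""
    body, final = stem[:-1], stem[-1]
    if final in VOICED and suffix[0] in VOWELS:
        final = VOICED[final]
    return body + final + suffix

def voice_everywhere(stem, suffix):
    """Contrast case: voice every stop that precedes a vowel, ignoring position."""
    form = stem + suffix
    out = []
    for i, seg in enumerate(form):
        prevocalic = i + 1 < len(form) and form[i + 1] in VOWELS
        out.append(VOICED[seg] if seg in VOICED and prevocalic else seg)
    return "".join(out)
```

For the stem tat and the suffix a, the position-based function returns tada, while the unit-based contrast case would also voice the initial stop.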
Copying is conditioned by two factors, the meaning to be conveyed (e.g. PLURAL) 
and the phonological characteristics of the input (e.g. word-final [p]). For example, in 
hypothetical language A, CopyFIN is activated when the speaker intends to express the 
plural meaning and the input ends in [p], whereas in hypothetical language B, CopyFIN is 
inhibited in the same context and allows the speaker to replace the final [p] with a 
different segment. Speakers of Southern Bantu, where labial palatalization is productive 
(p→tʃ/__w; Braver & Bennett, 2015; Ohala, 1978), must have learned an inhibitory 
association from word-final [p] to CopyFIN, like that of language B, presumably on the 
basis of observing many cases where [p] is not preserved when a form with a certain 
meaning is produced.  
Figure 1.1 shows a partial model of a language with labial palatalization3, like 
language B and Southern Bantu, with input nodes representing [blaɪp] and output nodes 
that are activated or inhibited when the network is asked to express the PLURAL 
meaning. The state of knowledge represented in Figure 1.1 can be reached as follows: 
The network experiences singular and plural forms and, on a minority of occasions, recalls 
the singular when the plural is experienced, tries to predict the plural from the 
recalled singular, and punishes (downweights) the connections that lead to incorrect 
predictions (Kapatsinski, 2018a). Exposure to enough unfaithful mappings between cells 
in a morphological paradigm can cause the learner to prefer not to retain units that are 
frequently changed or removed when producing the novel form from the known. The 
ability to learn a preference for anti-faithfulness (non-retention) is useful for learning 
stem changes, subtractive patterns (like affix stripping), and “morphological toggles” 
(Alderete, 2001; Kurisu, 2001). Learners must be able to derive all the forms of a 
paradigm, including the base: For example, sometimes the plural is retrieved before the 
singular (Biedermann et al., 2013), and producing the singular requires removing the 
plural suffix. Kurisu (2001) discusses a range of languages that employ grammatical 
subtraction, including Koasati, which produces plurals through rime deletion (p. 83-84), 
and Icelandic, which creates deverbal nouns by deleting the final vowel of the infinitive 
(p. 111). The variety of languages and contexts in which subtraction applies suggests it is 
a fairly common process and that speakers must therefore be capable of learning anti-
faithfulness. 
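
How an inhibitory connection to CopyFIN could emerge from prediction error can be sketched with a simple delta-rule update. This is a stand-in chosen for illustration, and the network in Kapatsinski (2018a) may differ in its details. The cue names, the two trial types, the learning rate, and the number of epochs are all assumptions.

```python
# Sketch of error-driven learning of an inhibitory link to CopyFIN.
# Assumptions: a delta-rule update, two toy trial types, a learning rate of 0.1.
def train(trials, epochs=200, rate=0.1):
    """Learn cue-to-CopyFIN weights by correcting prediction error."""
    weights = {}
    for _ in range(epochs):
        for cues, copied in trials:
            prediction = sum(weights.get(cue, 0.0) for cue in cues)
            error = copied - prediction
            for cue in cues:
                # Strengthen or punish each active connection by the shared error.
                weights[cue] = weights.get(cue, 0.0) + rate * error
    return weights

trials = [
    (("PLURAL", "final_p"), 0.0),  # final [p] is replaced in the plural
    (("PLURAL", "final_t"), 1.0),  # final [t] is copied into the plural
]
weights = train(trials)
```

In this toy setting the weight from final [p] to CopyFIN ends up negative, while the weight from final [t] ends up positive, so copying of the final consonant is inhibited only where the training data show a change.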
 
                                                
3 The full network (see Kapatsinski, 2018a) includes nodes for all phonological units of the language and 
all meanings corresponding to cells in a morphological paradigm, but for our purposes, the simplified 
version suffices. 
 
Figure 1.1. A language with phonologized labial palatalization. Width of the line shows 
connection strength. The dashed line shows an inhibitory connection. Word-initial onsets 
are always copied into the output, as are the vowel nuclei that follow them. However, a 
final [p] is not copied into the plural, being replaced with [tʃi]a. Every feature of the input 
excites CopyInit and CopyN1 because initial onsets and following vowels are always 
copied between paradigm cells in this language. A final [p] strongly activates output [tʃi] 
and inhibits copying of the consonant in the final position (itself). The plural meaning is 
strongly associated with the plural suffix -i (present whenever the plural meaning is 
present and not otherwise) and is more weakly associated with a preceding [tʃ] (which is 
overrepresented in plurals). Copying of onsets and first-syllable nuclei is associated with 
plurality as it is with other input features, while copying of the final consonant is 
associated more weakly since it does not always happen in the plural.  
a This templatic coding scheme is sufficient for the languages presented to learners in the present thesis, 
where stems are monosyllabic, but would have to be extended for real languages. 
 
Paradigm uniformity is generally seen as the tendency to preserve the stem of a form. 
Regarding cases like Polish noun case (Krajewski et al., 2011), where there is no true 
“stem” and generation of morphological relatives therefore requires affix switching rather 
than addition, we must address how the Perseveration Hypothesis restricts perseveration 
to the base and not affixal elements. In generating a related form of a morphologically 
complex word, only the meaning of the stem is compatible with the meaning of the target 
word; the meaning of the affix is not, and therefore the articulatory gestures 
corresponding to the affix will be inhibited, making them unlikely to be incorporated into 
the new form. 
1.3. Associations 
As mentioned above, copying is often the right thing to do (e.g. wug~wugs). For it to 
be an error, there must be some other generalization that requires a change to the base 
that perseveration violates. A particularly important class of change-demanding 
generalizations for our work is arbitrary paradigmatic mappings, also called paradigmatic 
associations (Ervin, 1961), the latter of which could be considered to be the cognitive 
representations of the former. Paradigmatic mappings are controversial in usage-based 
and constructionist accounts (Bybee 1985, 2001; Goldberg, 2002; Kapatsinski, 2013), but 
they seem to be necessary in morphology (perhaps only in morphology, Kapatsinski, 
2018a, 2018b) because the shape of the to-be-produced form can depend on what other 
forms of the same word are like, as for Genitive Plural production in Russian in (6)-(7). 
6. trop   tropov  
‘trope’  ‘tropes.GEN’ 
 
7. tropa   trop  
‘path’  ‘paths.GEN’ 
The shape of the Genitive Plural is determined by whether the base (Nominative 
Singular) ends in -a, a suffix which is absent from the Genitive Plural form itself. For 
other examples, see Booij (2010), Gouskova & Becker (2014), Kapatsinski (2017b), 
Nesset (2008), and Pierrehumbert (2006).  
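
The dependence of the Genitive Plural on the shape of the base in (6)-(7) can be written as a minimal mapping. Only the two patterns shown in those examples are modeled, and the function name genitive_plural is an illustrative assumption.

```python
# Sketch of a paradigmatic mapping from Nominative Singular to Genitive Plural.
# Assumption: only the two patterns in (6)-(7) are modeled.
def genitive_plural(nominative_singular):
    """Choose the Genitive Plural shape from the ending of the base."""
    if nominative_singular.endswith("a"):
        return nominative_singular[:-1]   # tropa -> trop
    return nominative_singular + "ov"     # trop -> tropov
```

The string trop is the output for one noun and the input for the other, so the correct Genitive Plural cannot be chosen without knowing what the Nominative Singular looks like.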
When a paradigmatic association requires a change to the base (like in 
electri[k]~electri[s]ity), it conflicts with paradigmatic perseveration, and obeying the 
latter results in an error. 
1.3.1. The Paradigm Cell Filling Problem 
Some theorists argue that the Paradigm Cell Filling Problem can be solved without 
referring to other forms of the word (Malouf, 2017; Thymé, 1993; Thymé et al., 1994). 
Malouf (2017) achieves impressive performance on a range of morphological systems by 
using the lexico-semantic features to be expressed and the preceding phonological cues as 
input and generating the target incrementally left-to-right. However, we think a general 
solution to the PCFP needs to incorporate paradigmatic mappings (Albright & Hayes, 
2003; Bonami & Beniamine, 2016; Booij, 2010; Kapatsinski, 2010, 2018a, 2018b; 
Nesset, 2008; Plunkett & Juola, 1999; Rumelhart & McClelland, 1986; Westermann & 
Ruh, 2012), since the phonological features of other forms of the word are often the most 
informative cue to aspects of the target form. Without these paradigmatic mappings, 
learners can be misled by phonological neutralizations within target forms when they 
make the preceding phonological context uninformative (Becker & Gouskova, 2016; 
Gouskova & Becker, 2013). 
Masculine diminutives in Russian serve as an illustrative example. They are formed 
using a set of suffixes including -ik and -ok, where -ik is favored by non-velars and -ok by 
velars. Luk ‘onion’ selects -ok, and lutʃ ‘beam’ selects -ik, but velars are palatalized 
before -ok, so luk→lutʃ before -ok, neutralizing the contrast; the diminutive forms of 
‘onion’ and ‘beam’ are lutʃok and lutʃik, respectively. The neutralization of the consonant 
contrast means that there is nothing in the diminutive form to predict whether -ik or -ok 
should be chosen. When trying to generate ‘little onion’, a left-to-right process like that 
of Malouf (2017) could change [k] into [tʃ], because [k] is underattested in the 
postvocalic position in diminutives, but would be unable to determine whether to 
continue with [o] or [i], since both occur in that context.4 In order to know what the final 
vowel of the diminutive should be, there needs to be some way of referencing the 
consonant of the non-diminutive5. The diminutive and non-diminutive are in an 
                                                
4 The model could be augmented by including inflectional class in addition to the morphosyntactic features 
used for specifying the paradigm cell; this would, however, grant the model prior knowledge of the 
structure of the lexicon that may not be warranted, if the goal is to approximate a novice learner. 
 
5 In rule-based models, the neutralization before -ok is taken to mean that the suffix must be chosen first 
and then trigger the consonant change. However, as long as the final consonant of the non-diminutive base 
remains accessible to cue the choice of the suffix after the stem is changed, then either order is acceptable. 
The crucial part is the availability of the paradigmatically related non-diminutive base when the suffix is 
being chosen. 
asymmetrical implicative relationship: The form of the non-diminutive predicts the form 
of the diminutive, but not vice versa. 
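The asymmetry can be made concrete with a minimal sketch. It is an illustration only, not an implementation of Malouf (2017) or of any model discussed here: the two-word lexicon comes from the example above, the function names are hypothetical, and [k] stands in for the whole class of velars.

```python
# Toy lexicon: non-diminutive base -> attested diminutive (from the example above).
LEXICON = {"luk": "lutʃok", "lutʃ": "lutʃik"}

def choose_suffix_from_base(base: str) -> str:
    """Paradigmatic cue: the final consonant of the non-diminutive base.

    Simplifying assumption: [k] is the only velar considered.
    """
    return "ok" if base.endswith("k") else "ik"

def diminutive(base: str) -> str:
    """Pick the suffix from the base, then palatalize k -> tʃ before -ok."""
    suffix = choose_suffix_from_base(base)
    stem = base[:-1] + "tʃ" if base.endswith("k") and suffix == "ok" else base
    return stem + suffix

def suffixes_compatible_with_stem(dim_stem: str) -> set:
    """Left-to-right cue only: which suffixes follow this neutralized stem?"""
    return {dim[len(dim_stem):] for dim in LEXICON.values()
            if dim.startswith(dim_stem)}
```

With access to the base, `diminutive("luk")` and `diminutive("lutʃ")` select different suffixes, whereas `suffixes_compatible_with_stem("lutʃ")` returns both -ok and -ik, because the neutralized stem alone does not discriminate between them.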
Paradigmatic mappings from [k] to [tʃok] and [tʃ] to [tʃik] are active for speakers: 
Russian adults correctly suffix novel or unfamiliar nouns ending in [k] with -ok, palatalizing 
almost categorically before -ok (and almost never before -ik), and suffix nouns ending in 
[tʃ] with -ik6. For other productive paradigmatic mappings, see Becker & 
Gouskova (2016), Gouskova & Becker (2013), Krajewski et al. (2011), and Pierrehumbert 
(2006).  
Cross-linguistic research shows the ubiquity of implicative relationships in 
morphological grammars (Ackerman et al., 2009; Ackerman & Malouf, 2013; J. P. 
Blevins, 2013; Bonami & Beniamine, 2016; Bonami & Strnadová, 2019; Finkel & 
Stump, 2007; Sims & Parker, 2016; Stump & Finkel, 2013). Not all of these 
relationships are necessarily productive, but it is difficult to believe that none of them are. 
While one could once question the productivity of paradigmatic mappings, and by 
extension their existence in the mental grammar (Bybee, 2001; Kapatsinski, 2013; 
Ramscar et al., 2013), the evidence now seems sufficient for a consensus of their 
psychological reality (Ramscar et al. 2010, p. 914, 2013, p. 782 vs. J. P. Blevins et al., 
2017; Kapatsinski, 2013 vs. 2017b, 2018b). 
1.3.2. The difficulty in associating dissimilar representations 
Thus far, the Perseveration Hypothesis does not predict a bias against large changes. 
Speakers perseverate on the input, but it is not yet clear why, for example, p~tʃ in 
                                                
 
6 There are some semantic influences on the choice of suffix, too (Magomedova, 2017), but they do not 
override paradigmatic phonology (see also Ramscar, 2002, for English). 
 
bup~butʃi is harder to learn than k~tʃ in buk~butʃi, since in both cases CopyFIN must be 
overridden. Yet there is extensive evidence that large changes are disliked more than small 
ones (Skoruppa et al., 2011; Stave et al., 2013; White, 2014). We assume that the base forms 
that provide material for the production of novel forms are production representations; 
following Articulatory Phonology, we assume they are composed of articulatory 
gestures/target constrictions of the vocal tract (Browman & Goldstein, 1989). 
Paradigmatic associations, then, require learning an association between two gestures: 
Learning to change a base gesture X into another gesture Y requires learning a 
paradigmatic association between X and Y, such that activating X also activates Y and 
allows X to be changed into Y (Ervin, 1961; Rumelhart & McClelland, 1986, et seq.). 
The associative learning literature suggests that acquiring the X→Y association is easier 
when X is similar to Y (Rescorla & Furrow, 1977; Rescorla, 1986; regarding 
phonotactics, Moreton, 2008, 2012; Warker & Dell, 2006; Warker et al., 2008).  
A plausible mechanism for why similarity should matter for associability comes from 
Kapatsinski (2011), who argues that learning an association requires modifying the 
synaptic connections between associated representations. If the representations are very 
different, they may be stored in different parts of the cortex and be separated by more 
and/or weaker synaptic connections. More modification to the synapses is necessary to 
form the association, and may require more training. This is consistent with the motor 
sequence learning literature, where learning an association between X and Y appears to 
involve changing the behavior of the cortical and subcortical areas separating X and Y 
(Hluštík et al., 2004).  
In the present work, we take the finding that associating dissimilar elements 
requires more connections and extend it from syntagmatic associations between perceptual 
representations to paradigmatic associations between production representations. The 
crucial prediction is that associations between dissimilar gestures will be harder to learn 
than associations between similar gestures, which makes large changes particularly 
difficult to learn and perform. 
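The prediction can be illustrated with a toy simulation. This is a sketch under stated assumptions rather than a model from the literature cited above: we assume that the effective learning rate of an X→Y association is the base rate multiplied by a similarity value in (0, 1], and the function name, the similarity values, and the criterion are all hypothetical.

```python
def trials_to_criterion(similarity: float, rate: float = 0.2,
                        criterion: float = 0.9) -> int:
    """Count training trials until association strength reaches the criterion.

    Assumption (ours, for illustration): the effective learning rate is
    rate * similarity, so dissimilar gestures gain strength more slowly.
    """
    if not 0.0 < similarity <= 1.0:
        raise ValueError("similarity must be in (0, 1]")
    strength, trials = 0.0, 0
    while strength < criterion:
        # Strength moves a fraction of the remaining distance toward 1.
        strength += rate * similarity * (1.0 - strength)
        trials += 1
    return trials
```

Under this assumption, a low-similarity pair such as p~tʃ (for example, `trials_to_criterion(0.3)`) needs more training trials than a higher-similarity pair such as k~tʃ (`trials_to_criterion(0.8)`), which is the qualitative pattern the hypothesis predicts.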
1.4. Learning associations 
Given the importance of paradigmatic associations to our account, we must consider 
how they are learned. Learning paradigmatic associations seems very challenging, and 
has proven difficult to observe in the lab (Braine et al., 1990; Brooks et al., 1993; Frigo & 
McDonald, 1998; McNeill, 1963, 1966; though see Seyfarth et al., 2014, and Williams, 
2003, for successful examples). In natural language, the acquisition of paradigmatic 
mappings develops into adulthood. Many Polish adults use only some of the factors 
conditioning suffix choice in extant vocabulary (Dąbrowska & Sczerbinski, 2006), and 
Polish children productively use frequent paradigms/cells but struggle with less frequent 
or reliable mappings (Krajewski et al., 2011), with a similar pattern present for Korean 
children (Do, 2013). The protracted development in acquiring paradigmatic mappings 
(Dąbrowska & Sczerbinski, 2006; Do, 2013; Krajewski et al., 2011) suggests that 
opportunities for learning them are rare, but we nonetheless think they are crucial. 
1.4.1. Early proposals 
Early work on paradigmatic mappings in the 1960s (Ervin, 1961; McNeill, 1963, 
1966) examined paradigms of antonymous adjectives, like deep~shallow, big~little, 
large~small. There is evidence for the existence of paradigmatic mappings in that 
domain. Adults tend to produce antonyms in response to adjectives in free association 
tasks, whereas children tend to produce associated nouns (Brown & Berko, 1960; Ervin, 
1961; Woodrow & Lowell, 1916); e.g. if cued with shallow, a child might produce lake 
while an adult would produce deep. Adults also have intuitions about which adjectives go 
together, with big associated with little and large with small (Justeson & Katz, 1991), 
suggesting that adults have formed paradigmatic associations between antonymous 
adjectives. 
McNeill (1966) claims that acquiring paradigmatic associations is difficult because 
paradigmatic associates rarely appear together in sentences, so co-occurrence cannot be 
relied on to learn them, and that therefore the opportunity to learn them is absent under 
normal speech conditions. Nevertheless, paradigmatic associations are learned, and there 
have been a number of proposals regarding what allows that to happen in the absence of 
contiguity. McNeill (1963, 1966) and Ervin (1961) propose that the strength of a 
paradigmatic response depends on how often it has been incorrectly anticipated for a 
given stimulus, with antonyms being characterized by their ability to substitute for each 
other within syntactic frames. Plunkett & Juola (1999) claim that children compare the 
word tokens they hear to what they expect to hear and use the differences to update their 
beliefs. Albright & Hayes (2003) propose that speakers create rules that apply to subsets 
of the grammar, describing what changes apply to create a past tense from a present tense 
form of a word. Regardless of the particulars, these accounts all propose impressive feats, 
namely that whenever a learner hears a word, they either 
1) generate predictions about other forms of the same word and maintain those 
predictions until they are encountered, or 
2) retrieve other forms of the same word from memory and evaluate whether they 
expected to hear what they heard. 
Existing computational models of morphological learning from perceptual experience 
assume a high level of reliability in these processes. For example, most models of English 
past tense acquisition assume that the present tense is always available for predicting the 
corresponding past tense form (Albright & Hayes, 2003; Plunkett & Juola, 1999; 
Rumelhart & McClelland, 1986; Westermann & Ruh, 2012), an assumption that seems highly suspect. 
1.4.2. Discriminative learning and contiguity 
If memory is not fully reliable, then the temporal relationship between predictor and 
predicted forms should be crucial. Comparisons between separately presented stimuli 
present substantial difficulties elsewhere, leading to, for example, change blindness 
(Mitroff et al., 2004). Using one form to predict another should be easiest when the 
predicted closely follows the predictor. Temporal contiguity has been argued to be 
important for learning associations since at least Thorndike (1898, the “law of effect”). 
Ramscar et al. (2010) argue that order matters for what is learned about form-meaning 
mappings. In their experiment, each trial showed participants either a spoken form 
followed by the pictorial representation of meaning or a picture followed by the form. 
Participants discovered the most predictive semantic features when the meaning preceded 
the form, but not vice versa. These findings are argued to be consistent with 
discriminative models of associative learning, where cues are used to predict subsequent 
outcomes, and the learner only acquires associations from predictive cues to outcomes. 
Predictive cues allow the learner to discriminate between cue sets followed by a 
particular outcome and those followed by other outcomes, and the downweighting of 
unpredictive cues is central to learning to appropriately discriminate. Further evidence for 
the importance of the temporal relationship for learning is provided by Arnon & Ramscar 
(2012), who show that the temporal relationship is crucial for the ability of the learner to 
learn to use gendered articles to predict upcoming nouns. Ramscar (2013) shows that 
prefixes make the following nominal items more predictable, whereas suffixes make the 
preceding nominal items more similar to each other. These results suggest that the 
temporal relationship between form and meaning determines what is learned about the 
form-meaning relationship. In the present work, we argue that the temporal relationship 
between paradigmatically related forms is a crucial determinant of what is learned about 
the paradigm. 
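Discriminative learning of this kind is commonly formalized with the Rescorla-Wagner (delta) rule. The sketch below is a minimal version with hypothetical cue names, trials, and parameter values; it is not the simulation code of any of the studies cited above. It shows how a cue shared across outcomes ends up with a weaker association than a cue that discriminates between them.

```python
from collections import defaultdict

def rescorla_wagner(trials, rate=0.1, epochs=200):
    """Return cue->outcome weights after error-driven (delta-rule) updates."""
    weights = defaultdict(float)
    outcomes = {outcome for _, outcome in trials}
    for _ in range(epochs):
        for cues, outcome in trials:
            for o in outcomes:
                target = 1.0 if o == outcome else 0.0
                # The prediction sums the weights of all cues present.
                prediction = sum(weights[(c, o)] for c in cues)
                error = target - prediction
                for c in cues:
                    weights[(c, o)] += rate * error
    return weights

# Hypothetical trials: the final consonant is predictive of the outcome,
# while the initial consonant occurs with both outcomes.
TRIALS = [
    ({"initial_b", "final_k"}, "tʃi"),
    ({"initial_b", "final_t"}, "ti"),
]
```

After training on `TRIALS`, the weight from `final_k` to `tʃi` exceeds the weight from `initial_b` to `tʃi`, and the weight from `final_t` to `tʃi` becomes negative: the unpredictive cue is downweighted relative to the cue that discriminates between outcomes.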
1.4.3. The neurological underpinnings of learning 
Much prior research has focused on the influence of error-driven predictive learning 
on language acquisition (Baayen et al., 2011; Ellis, 2006;  Lim et al., 2014; Plunkett & 
Juola, 1999; Ramscar et al., 2010; Ramscar et al., 2013; Ramscar & Gitcho, 2007; 
Ramscar & Yarlett, 2007; Rumelhart & McClelland, 1986; Westermann & Ruh, 2012). In 
reality, multiple systems work simultaneously. Neuroscience has shown that the brain 
possesses several complementary learning systems that learn in fundamentally different 
ways, including at least the posterior neocortex, hippocampus, and striatum (Ashby et al., 
2007; Kumaran et al., 2016). The striatum supports error-driven predictive learning (Lim 
et al., 2014; Ramscar et al., 2010; Schultz, 2006; Waelti et al., 2001). The hippocampus 
supports a rapid chunking process, which fuses cues together and allows them to acquire 
associations that the individual elements lack (O’Reilly & Rudy, 2001; Sutherland & 
Rudy, 1989). The neocortex learns in a Hebbian manner, strengthening associations 
between co-occurring stimuli regardless of expectations (Ashby et al., 2007; McClelland, 
2001). Learning in the striatum is sensitive to prediction error because prediction error 
inversely correlates with the amount of dopamine in the synapse. The posterior cortical 
synapses lack dopamine projections, so synaptic connections strengthen whenever the 
cue co-occurs with the outcome, regardless of whether the outcome’s occurrence was 
expected (Ashby et al., 2007). All of these areas contribute to language learning, so they 
should all contribute in a learning experiment, as well, though any processes that require 
sleep (like consolidation in the neocortex) would only surface in an experimental 
paradigm that spans multiple days. In our experiments, we expect behavioral signatures 
of error-driven predictive learning to co-exist with chunking, with Hebbian learning 
contributing minimally if at all. 
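The contrast between Hebbian and error-driven updating can be sketched for a single cue-outcome pair. The functions and parameter values below are illustrative assumptions, not a model of the neural systems described above: the Hebbian update adds a fixed increment on every co-occurrence, while the error-driven update shrinks as the outcome becomes expected.

```python
def hebbian_update(weight, rate=0.1):
    """Strengthen by a fixed amount whenever cue and outcome co-occur."""
    return weight + rate

def error_driven_update(weight, rate=0.1):
    """Strengthen in proportion to how unexpected the outcome still is."""
    return weight + rate * (1.0 - weight)

def simulate(update, n_trials):
    """Apply one update per cue-outcome co-occurrence; return the weights."""
    weight = 0.0
    history = []
    for _ in range(n_trials):
        weight = update(weight)
        history.append(weight)
    return history
```

Comparing `simulate(hebbian_update, 50)` with `simulate(error_driven_update, 50)` shows constant increments in the first case and diminishing increments in the second, where the weight approaches but never exceeds 1.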
1.4.4. Temporal contiguity 
Cue-outcome contiguity enables predictions of the outcome based on the cue 
(discrimination of cue configurations, e.g. base forms, leading to different outcomes, e.g. 
derived forms) and also highlights the differences between the forms (Carvalho & 
Goldstone, 2015; Zaki et al., 2016). Alternating between categories allows learners to 
identify the discriminative features of category exemplars, the ones that best distinguish 
between the two categories. This contrasts with a blocked presentation, where learners 
focus on the features that are most common for that category, regardless of whether they 
are informative about the differences between categories. This suggests that in order for 
learners to determine which features distinguish e.g. singulars from plurals in their 
language, they should encounter corresponding singular and plural forms in close 
temporal proximity. 
Previous studies have not investigated the importance of temporal contiguity between 
forms, likely because they shared McNeill’s (1966) assumption that paradigmatically 
related words rarely if ever appear next to each other. Work by corpus linguists has 
shown this assumption does not hold true for antonyms and morphological paradigms; 
antonyms co-occur more frequently than non-antonyms (Fellbaum, 1996; Jones, 2002; 
Jones et al., 2007; Justeson & Katz, 1991; Murphy, 2006), and canonical (small/large vs. 
big/little) co-occur more than non-canonical (Jones et al., 2007). Morphologically related 
words are more likely to co-occur in a limited text window than other word pairs, and 
computational models seeking to identify sets of words sharing a stem have been found to 
benefit from paying attention to co-occurrence (Baroni et al., 2002; Xu & Croft, 1998). If 
paradigmatically related words do co-occur, then instances of syntagmatic co-occurrence 
may be crucial for learning paradigmatic mappings like C_i#_{Nom.Sg}~C_i ov#_{Gen.Pl} and 
C_i a#_{Nom.Sg}~C_i#_{Gen.Pl}. In the realm of production, there is anecdotal evidence that children 
spontaneously produce paradigms in monologic word play (Weir, 1962; Nelson, 1989; 
Saville-Troike, 1988). Whether we learn paradigms from perception (McNeill, 1966; 
Plunkett & Juola, 1999) or production (Taatgen & Anderson, 2002), or a combination of 
the two, temporal contiguity may be essential for enabling acquisition of paradigmatic 
mappings. 
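Co-occurrence within a limited text window can be measured with a simple count. The following is a minimal sketch, not the method of Baroni et al. (2002) or Xu & Croft (1998); the function name, the default window size, and the assumption of a pre-tokenized text are ours.

```python
def window_cooccurrences(tokens, pair, window=5):
    """Count occurrences of pair[1] within `window` tokens of each pair[0].

    Assumes the two members of `pair` are distinct word forms.
    """
    first, second = pair
    count = 0
    for i, tok in enumerate(tokens):
        if tok == first:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            count += sum(1 for j in range(lo, hi) if tokens[j] == second)
    return count
```

Applied to a tokenized corpus, comparing the counts for morphologically related pairs (such as a singular and its plural) against the counts for unrelated pairs of similar frequency would indicate whether related forms co-occur more often than expected.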
If temporal contiguity helps learn implicative relationships, how does it do so? 
Following Ramscar et al. (2010) and Arnon & Ramscar (2012), we posit that contiguity 
allows for discrimination of cue configurations that result in distinct outcomes. However, 
while they argue that forms constitute single undecomposable cues, we believe forms are 
configurations of somewhat separable cues (Kapatsinski, 2009), and that every 
phonetic/phonological feature of a wordform can in theory be predictive of semantic and 
distributional characteristics of the word (see also Arnold et al., 2017; Baayen et al., 
2011), including – crucially – what other forms of the word are like, e.g. that a particular 
type of Russian non-diminutive form predicts a particular type of diminutive suffix. To 
productively apply implicative relations, a learner needs to know that the ‘base’ form is 
predictive of other forms, and the specific paradigmatic mappings between particular 
phonological features in the base and related/derived forms, which are largely arbitrary 
and not based on phonetics (e.g. Russian [k]-final nouns take -ok in the diminutive) and 
must be learned through experience. 
If individual phonological features are associable, then words are elemental, i.e. 
composed of a large set of independently associable elements. The evidence for this lies 
in the fact that although larger chunks like rimes and words are associable, associations 
of a segment sequence cannot be changed without interfering with the associations of the 
component parts, since they are recognized in the process of recognizing the whole 
(Kapatsinski, 2007, 2009). An important part of learning is “chunking,” where previously 
separate cues that are used together fuse together (Bybee, 2002; Ellis, 2017; Goldstone, 
2000), though separation of previously fused cues can also occur (Goldstone, 2003).  
In this work, we investigate whether contiguity benefits acquisition of two kinds of 
paradigmatic mappings: unfaithful mappings involving a change to the stem (e.g. 
k_{SG}→tʃ_{PL}), and faithful mappings that do not (e.g. k_{SG}→k_{PL}). Theories of grammar differ 
on whether temporal adjacency should benefit both types of mappings. Network Theory 
suggests that unfaithful mappings could be carried out under pressure from schemas like 
“plurals should end in [tʃi],” which do not require noticing paradigmatic relations (Bybee, 
2001; Kapatsinski, 2013, 2017b). Perhaps, despite appearances, unfaithful mappings are 
not learned by observing unfaithful mappings, so temporal contiguity between members 
of the exemplifying pairs will not benefit acquisition. Faithful mappings have been 
suggested to be the default (Hayes, 2004; McCarthy, 1998; Pinker & Prince, 1988), and 
therefore not subject to improvement from adjacency. If this is the case, then temporal 
contiguity will not benefit acquisition of faithful mappings. However, output-output 
faithfulness constraints (like Copy, Kapatsinski, 2017b, 2018a, §1.3) can be 
downweighted with linguistic experience, and English speakers have likely learned that 
consonants sometimes change (as in electri[k]~electri[s]ity). If that is the case, then 
making faithful mappings more obvious will strengthen the constraint and extend faithful 
mappings. 
1.4.5. Variation sets 
Variation sets are defined as successive utterances containing partial repetitions 
(Onnis et al., 2008), where the communicative intention is maintained, but any or all of 
lexical substitution and rephrasing, addition and deletion of specific reference, and 
reordering are present (Küntay & Slobin, 1996). One advantage of these partial 
repetitions is that they allow language learners to use local comparison to discover 
structure, even if their memory is limited, as it is for children. Approximately 20% of 
child-directed speech appears in variation sets (Küntay & Slobin, 1996), and the same 
proportion has been shown to be sufficient to assist in learning lexical items and phrasal 
units in miniature artificial language learning (Onnis et al., 2008).  
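The proportion of utterances that fall in variation sets can be estimated with a rough heuristic. The criterion below (adjacent utterances that share at least one word but are not identical) is a simplifying assumption of ours, not the operational definition used by Küntay & Slobin (1996) or Onnis et al. (2008), and the function names are hypothetical.

```python
def in_variation_set(prev: str, curr: str) -> bool:
    """Assumed criterion: adjacent utterances share a word but are not identical."""
    a, b = prev.lower().split(), curr.lower().split()
    return a != b and bool(set(a) & set(b))

def proportion_in_variation_sets(utterances):
    """Fraction of utterances that partially repeat an adjacent utterance."""
    flagged = [False] * len(utterances)
    for i in range(1, len(utterances)):
        if in_variation_set(utterances[i - 1], utterances[i]):
            flagged[i - 1] = flagged[i] = True
    return sum(flagged) / len(utterances) if utterances else 0.0
```

Because this criterion also counts overlap in function words, a realistic analysis would need a stricter measure of partial repetition, for example one restricted to content words.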
Like our proposal, variation sets emphasize the importance of temporal contiguity. 
Variation sets allow comparison and extraction of shared units, as well as anticipation of 
the first form encountered in a context when a different form is presented, and discovery 
of the variety of contexts that a form can appear in. In Chapters V and VI, we focus on 
the effect contiguity has specifically on noticing changes, and the intact singular-plural 
pairs can be considered examples of (very small) variation sets. 
1.5. Creating novel forms 
To create new forms of known words, or to re-create a known form that cannot 
be accessed quickly enough by lexical retrieval, speakers generate a form to express the 
desired meaning by using: 
1) meaning→form associations, such as the product-oriented/first-order schemas of 
Bybee (1985, 2001), Kapatsinski (2012, 2013), Nesset (2008), and the constructions of 
Goldberg (2003), 
2) paradigmatic form-form associations, or second-order schemas, which are 
necessary for arbitrary paradigmatic mappings like k~s in electric~electricity (Booij, 
2010; Gouskova & Becker, 2013; Nesset, 2008; Pierrehumbert, 2006), 
3) copying from activated wordforms, like the output-output faithfulness constraints 
of Benua (1997) and Kenstowicz (1996), and 
4) a mechanism for maintaining/re-creating serial order, such as a word-sized 
prosodic template that grows with learning to fit the range of experienced wordforms 
(Redford, 2015; Vihman & Croft, 2007). 
In the present work, we focus on the interaction between (1), (2) and (3), but see 
Kapatsinski (2018a) for the full model. Paradigmatic associations (2) are more difficult to 
learn when the to-be-associated forms involve different articulators/dissimilar articulatory 
gestures, and they compete with the perseveratory tendency to simply output the 
activated neuromotor representations (which vary by context and language, based on 
linguistic experience). When the association is too weak to override the perseveratory 
tendency, the stem change is leveled and paradigm uniformity arises. Product-oriented 
schemas may be acquired and used for judgments, even when they are not strong enough 
to drive production because of a strong opposition from perseveration, so a form that is 
rarely or never produced may still be accepted in judgment because it contains the correct 
cues for the meaning (Kapatsinski, 2013). 
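The competition described in this section can be caricatured as a comparison of summed support. The sketch below is a deliberately simplified illustration and not the full model of Kapatsinski (2018a): the function name, the additive combination of supports, and all numeric strengths are hypothetical.

```python
def produce_final(base_final, copy_strength, assoc_strength,
                  schema_strength=0.0, change_to="tʃ"):
    """Return the stem-final consonant with the greater summed support.

    Toy assumption: the paradigmatic association (2) and the
    product-oriented schema (1) jointly support the change, while
    perseveratory copying (3) supports retaining the base consonant.
    """
    support_change = assoc_strength + schema_strength
    return change_to if support_change > copy_strength else base_final
```

With these hypothetical values, a well-learned k→tʃ association (`produce_final("k", 0.5, 0.7)`) overrides copying, whereas a weakly learned p→tʃ association (`produce_final("p", 0.5, 0.2)`) does not, so the stem change is leveled and the base consonant surfaces.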
1.6. Alternative theories 
There are a number of competing theories about the origin of paradigm uniformity. 
We review them in turn below. 
1.6.1. Stem and affix 
 Pinker & Prince (1988) criticized Rumelhart & McClelland (1986) for predicting 
membled as the past tense of mail, based on a model similar to that in Figure 1.1 but 
without Copy outputs, and proposed that words are instead produced from stems and 
affixes. The tendency to retain too little of the stem is a recurring problem for 
connectionist models of wug test behavior; without CopyINIT, the model in Figure 1.1 
predicts that a novel onset will always be replaced with another onset that has already 
been experienced, with the particulars being determined by co-occurrence with the rest of 
the segments in the word and what meaning is intended, but Kapatsinski (2018a) shows 
that Copy resolves that issue. 
Morphology as a description of language does not require that novel forms be 
produced from combining the stem with an affix; it could instead be based on the storage 
of whole-word representations in a paradigmatic network (Booij, 2010; Bybee, 1985; 
Hockett, 1954; Matthews, 1965; Robins, 1959; see J. P. Blevins, 2013, for a review). 
Complex morphological systems where a speaker needs to know more than one other 
complete form of a word in order to derive a novel form cannot be described as a 
combination of stem and affix, because the paradigm has multiple principal parts 
(Ackerman et al., 2009; Ackerman & Malouf, 2013; J. P. Blevins, 2006). Additionally, 
parts of the stem can fuse with the affixes (and the meanings of the affixes) with which 
they frequently co-occur, forming a construction/schema with boundaries that do not 
correspond to morpheme boundaries: [[ ]_{N}holic]_{A} has been extracted from alcoholic and 
generalized to mean ‘addicted to’, as in workaholic, which is puzzling if alcoholic is 
regularly derived from alcohol + ic (Bybee, 1985). Cross-boundary units can form even 
very early in acquisition; hearing blutʃ~blutʃi strengthens the notion of ‘tʃi’ as a plural 
unit and increases the likelihood of participants believing that blut should become blutʃi 
(Kapatsinski, 2012, 2013). In other words, tʃ~tʃi supports ‘tʃi’=PLURAL rather than 
PLURAL=stem+i, which is problematic for models that assume morphologically-related 
forms are separated into change + stem/context (Albright & Hayes, 2003; Gouskova et 
al., 2015). It is likely that the tendency to preserve the stem is not because it is the basic 
word form from which others are derived by concatenation, but rather as a consequence 
of the fact that affixes are former words that have grammaticalized in situ (Bybee, 1985, 
p. 41; Lehmann, 1992).  
Additional evidence against PU as a consequence of concatenative morphology lies in 
the fact that the bias for preserving the stem is much stronger than the bias for preserving 
the affix (Beckman, 1998; Benua, 1997; McCarthy & Prince, 1995). For example, vowel 
harmony tends to spread from the stem to affixes, not the other way around: Bakovic 
(2003) shows languages preserve the identity of the stem, even when the result is 
disharmony, and Finley (2015) demonstrates a learning bias in favor of vowel harmony 
(and against PU) in affixes. These effects are unexpected if stem preservation and affix 
preservation both originate from a word = stem + affix rule. Following Correspondence 
Theory (McCarthy & Prince, 1995), we omit the affix from the input in our model, and it 
is instead activated by the intended meaning and arbitrary paradigmatic associations. 
Since it is not in the input, it cannot be perseverated on. 
Some aspects of the base are more likely to be perseverated on. In the Canadian 
Raising example in (1), the height of the vowel is preserved, but the duration of the 
following stop closure and absence of vocal fold vibration are not: Voiced and voiceless 
stops both become flaps. Any analysis thus needs to be able to target specific 
submorphological units for perseveration, rather than entire stems and affixes (Prince & 
Smolensky, 1993/2004). These include phonological units like vowel height, but also 
subphonemic features of sounds, such as duration of stop/tap closure and phonetic 
correlates of stress like duration, pitch accent, and vowel quality (Steriade, 2000). The 
existence of phonological and subphonemic preservation effects motivates our proposal 
that the root of stem perseveration lies in production perseveration, rather than the 
allegedly concatenative nature of morphology. 
1.6.2. Storage economy 
If we assume that words are stored in their phonetic surface form, then related words 
that share the same phonological structure require less space (Kenstowicz, 1998). 
However, there is no evidence to suggest that long-term memory storage is limited 
(Householder, 1966; Johnson, 1997). Humans are capable of storing vast amounts of 
information, down to specific episodes. Shepard (1967), Standing et al. (1970), and 
Standing (1973) show that thousands of novel pictures are automatically memorized 
based on a single exposure and are retained over several days, so working memory 
cannot be the sole factor behind PU. Brady et al. (2008) and Konkle et al. (2010) 
demonstrate that these memories are numerous and fairly detailed. Palmeri et al. (1993) 
use an old/new recognition task to show that voice-specific memories are stored, with 
word recognition facilitated when the word is spoken by a familiar rather than a 
novel speaker, and the same degree of facilitation observed regardless of the number of 
voices (up to 20). From these results, they argue that long-term voice-specific memories 
of words are formed, even if 20 distinct voice-specific memories per word would be 
required. The formation of voice-specific memories is not necessarily automatic; it has 
been found to occur for old/new speaker identification, but not for lexical decision 
(Theodore et al., 2015). Nevertheless, the results suggest that an additional form 
representation per paradigm would not impose a significant load on long-term memory. 
Thus far, the accounts discussed do not explain the finding that PU favors small 
changes over large, in addition to favoring no change over change. Skoruppa et al. (2011) 
and White (2013, 2014) find that a change in two features is harder to learn than a change 
in one feature, but it is unclear how storage economy can explain this, since in both cases, 
a non-uniform paradigm requires more storage space. The preference against large 
changes motivates the accounts of PU in §1.6.3 and §1.6.4, based on avoidance of 
perceptual dissimilarity between alternants. 
 
 
1.6.3. Perceptual similarity 
Kenstowicz (1996) argues that PU makes the relationships between related words 
more obvious, thereby facilitating lexical access (see also Steriade, 2000). The lack of PU 
may cause a listener to misinterpret a word that would have been parsable, had related 
words been activated, and speakers therefore avoid large changes because they are 
sensitive to the potential lack of understanding that could result. However, we do not 
know that a large change, like [p] to [tʃ], is harder for the listener to undo than a smaller 
change, like [t] to [tʃ]. Speakers do avoid ambiguity by hyperarticulating cues that 
distinguish words from their minimal pair neighbors (Baese-Berk & Goldrick, 2009; 
Wedel et al., 2013), but there are data suggesting that learners are not highly sensitive to 
homophony resulting from alternation, at least in the lab (Kapatsinski, 2012, 2013). 
Steriade (2001/2009) and White (2017) propose that PU arises because speakers 
avoid producing changes that are easily noticeable. As in other Optimality Theory-based 
accounts, stem changes are introduced to improve phonotactics/ease of articulation, and 
speakers are additionally posited to possess a store of perceptual similarities between 
segments in context (the P-map), which they use to avoid noticeable changes so they do 
not violate speech norms (Steriade, 2001/2009). In other words, on their account, PU is 
due to avoidance of perceptual dissimilarity. Speakers may desire to change the language 
(e.g. to make it more regular or easier to produce), but do not desire that listeners 
disapprove, so they keep the changes small.  
We believe that a role for listener modeling in determining the articulatory details of 
pronunciation, such as degree of consonant voicing, is unsubstantiated. The online 
involvement of perceptual representations in production is not a settled matter (Perkell, 
2012). Modeling the perceiver is involved in the retuning of production targets following 
target selection in auditory perturbation experiments (Perkell, 2012; Purcell & Munhall, 
2006). However, this retuning is widely believed to be an offline process that follows 
selection of the target (Villacorta et al., 2007; Perkell, 2012; see Norris et al., 2003, and 
Norris & McQueen, 2008, for arguments against online feedback in perception). The 
perceptual dissimilarity biases suggested by Kenstowicz (1996) and Steriade (2001/2009) 
seem to require the influence of perceptual modeling prior to selection of the production 
target during everyday speech production, but guidance of production by online 
perception seems too slow to be consistent with how rapidly speech production, and other 
skilled motor action, proceeds (Elsner & Hommel, 2001; Welsh & Llinas, 1997). 
Regardless, an online perceptual feedback mechanism seems unnecessary to account for 
word production, so we would rather not rely on it to explain PU. 
1.6.4. Categorization 
Moreton & Pater (2012a) propose that the bias against large changes stems from 
category learning. Arbitrary sound categories requiring multiple features to describe are 
more difficult to learn than those requiring a single feature (Cristià & Seidl, 2008; 
Moreton et al., 2017; Pycha et al., 2003; for non-linguistic categories, see Shepard et al., 
1961, and Feldman, 2003). The same sounds are perceived as being more similar by 
speakers of languages where they are allophones of the same phoneme (Boomershine et 
al., 2008; Johnson & Babel, 2010; Seidl et al., 2009). Moreton & Pater (2012a) suggest 
that acquiring an alternation involves categorizing the sounds that undergo the alternation 
separately from those that do not, and that this is easier when the groups can be defined 
by a single feature; for example, palatalization of labials (p~tʃ) in the absence of 
palatalization of alveolars (t~tʃ) and/or velars (k~tʃ) requires categorizing [p] and [tʃ] 
together and separately from [t] and [k], perhaps as [labial] | ([coronal] and [dorsal]) 
(because English [tʃ] is [coronal] and [dorsal], Yun, 2006), whereas k~tʃ without t~tʃ and 
p~tʃ could be captured by [dorsal].  
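
The single-feature argument can be made concrete with a small sketch. The feature table below is a deliberately simplified assumption (place features only, with [tʃ] treated as both [coronal] and [dorsal], following the description above); the function name is hypothetical and the sketch is not an implementation of Moreton & Pater's (2012a) proposal. It checks whether any one feature picks out exactly the sounds that must be grouped together.

```python
# Simplified, assumed place features for the four segments under discussion.
# [tʃ] is coded as both coronal and dorsal, as described in the text.
FEATURES = {
    "p": {"labial"},
    "t": {"coronal"},
    "k": {"dorsal"},
    "tʃ": {"coronal", "dorsal"},
}

def single_feature_descriptions(group):
    """Return every feature that picks out exactly `group` among all segments."""
    group = set(group)
    all_features = set().union(*FEATURES.values())
    return sorted(
        feature
        for feature in all_features
        if {seg for seg, feats in FEATURES.items() if feature in feats} == group
    )

# k~tʃ without t~tʃ or p~tʃ: [k] and [tʃ] must be grouped together.
velar_pattern = single_feature_descriptions(["k", "tʃ"])
# p~tʃ without t~tʃ or k~tʃ: [p] and [tʃ] must be grouped together.
labial_pattern = single_feature_descriptions(["p", "tʃ"])
```

Under these assumptions, the velar grouping is captured by [dorsal] alone, whereas no single feature captures the labial grouping, which therefore needs a disjunctive description.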
Perceptual category structure can account for the bias against large changes in 
judgment, but it is not clear that it should apply in production. We are not aware of any 
studies showing that training learners to categorize sounds together (as with the 
unimodal distributional training procedure of Maye et al., 2002) improves acquisition of 
an alternation involving the two sounds. Instead, increased perceptual similarity between 
alternating sounds may be a side effect of learning to produce the alternation rather than a 
cause of it. 
1.6.5. Perseveration Hypothesis vs. Optimality Theory 
We agree with the above accounts that paradigm uniformity exists, and that it 
influences learnability. However, there are several notable differences between our 
account and OT. Firstly, OT proposes that PU exists because of highly-ranked universal 
output-output faithfulness constraints (Hayes, 2004; Kenstowicz, 1997; McCarthy, 1998), 
whereas we believe it is better explained through perseveration in the production system 
(§1.2): In the process of generating a novel form of a known word, related forms of the 
word can become activated and be incorporated into the production plan. Secondly, the 
order in which constraints are ranked determines the learnability of patterns in OT, with 
constraints being re-ranked after exposure to enough input. We propose that the 
learnability of an alternation is the result of the difficulty in associating the 
representations that participate in the alternation (§1.3); representations that are very 
dissimilar are more difficult to associate, so it takes longer for them to become strong 
enough to override the perseveratory tendency. Finally, OT generally considers only the 
learnability of patterns, not the executability in production, with the notable exception of 
Do (2018). She shows that Korean children avoid producing the alternating forms of 
verbs, even after they have learned them; from this, she argues that learning biases 
influence production preferences as well as learnability. We expect PU to be a factor in 
production, as perseveration is perpetually present and should therefore continue to affect 
performance, even after speakers have learned what they should produce. Additionally, 
judgments of the goodness of forms could be based on first-order schemas, rather than 
paradigmatic mappings, and therefore speakers may like forms that result from an 
alternation because they contain the appropriate cues to meaning without being able to 
produce them themselves.  
1.7. Structure of dissertation 
Chapter II explores what palatalization is, and why we chose it as the test case. 
Chapter III discusses the findings from the baseline experiment with no training, to 
measure the biases subjects bring with them from English. Chapter IV describes the 
learnability of palatalization of labials, alveolars, or velars, and 
differences in production vs. judgment. Chapter V covers the results of experiments 
varying the adjacency of faithful and unfaithful forms in training for labial or velar 
palatalization. Chapter VI discusses the model of the findings from Chapter V regarding 
the influence of different cues on learnability. Chapter VII reviews and concludes. 
Portions of Chapters II, IV, V, VI, and VII were co-authored with Vsevolod Kapatsinski. 
Portions of Chapters II, IV, and VII were published as: Smolek, A. & Kapatsinski, V. 
(2018). What happens to large changes? Saltation produces well-liked outputs that are 
hard to generate. Laboratory Phonology: Journal of the Association for Laboratory 
Phonology, 9(1), 10. Portions of Chapters V, VI, and VII are undergoing revision as: 
Smolek, A. & Kapatsinski, V. (2019). Syntagmatic paradigms: Learning correspondence 
from contiguity. Manuscript submitted for publication. 
 
CHAPTER II 
INVESTIGATING CHANGE MAGNITUDE EXPERIMENTALLY 
Portions of this chapter were taken from: 
Smolek, A. & Kapatsinski, V. (2018). What happens to large changes? Saltation 
produces well-liked outputs that are hard to generate. Laboratory Phonology: Journal of 
the Association for Laboratory Phonology, 9(1), 10. 
In order to investigate the influence of change magnitude and contiguity on the 
learnability of phonological patterns, we need a test case. We first discuss previous work 
on alternation magnitude before turning to the palatalization alternation used in the 
experiments in this thesis. 
2.1. Prior experimental work on change magnitude 
2.1.1. Learning in adults 
A few experiments have examined the learnability of alternations while manipulating 
the distance between alternants. Skoruppa et al. (2011) and White (2013, 2014) 
investigate the learning of alternations involving place, voicing, and manner of obstruents 
(e.g. p~t vs. p~s vs. p~z; p~v vs. b~v). They find that alternations involving a change to 
one feature (non-saltatory) are easier to learn/perform than alternations involving more 
than one feature (saltatory).  
Skoruppa et al. (2011) compare the learning rate of saltatory vs. non-saltatory 
alternations and find that the larger saltatory changes are more slowly acquired, as we 
would expect under the Perseveration Hypothesis. White (2013, Ch. 3; 2014) trained 
participants to criterion on alternating sounds and examined overgeneralization to other 
sounds. In the potentially saltatory condition, participants trained on p~v (“jumping over” 
[b] and [f]) or t~ð (“jumping over” [d] and [s]), when tested using a two-alternative 
forced choice task, extend the alternation to [b]/[f] and [d]/[s]. Participants in the control 
conditions, trained on b~v or d~ð, do not extend the alternation to [p]/[f] or [t]/[θ].  
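
To illustrate what counts as a one-feature change and which sounds a saltatory alternation "jumps over", the following sketch counts feature differences between alternants. The three-way feature coding (place, voicing, manner) is a simplifying assumption made only for this illustration (for instance, [f] and [v] are coded as labial rather than labiodental), and the function names are hypothetical.

```python
# Assumed, simplified coding: (place, voicing, manner).
SEGMENTS = {
    "p": ("labial", "voiceless", "stop"),
    "b": ("labial", "voiced", "stop"),
    "f": ("labial", "voiceless", "fricative"),
    "v": ("labial", "voiced", "fricative"),
    "t": ("coronal", "voiceless", "stop"),
    "s": ("coronal", "voiceless", "fricative"),
    "z": ("coronal", "voiced", "fricative"),
}

def distance(a, b):
    """Number of features on which two segments differ."""
    return sum(x != y for x, y in zip(SEGMENTS[a], SEGMENTS[b]))

def jumped_over(a, b):
    """Segments lying on a shortest feature path between a and b."""
    return sorted(
        c for c in SEGMENTS
        if c not in (a, b) and distance(a, c) + distance(c, b) == distance(a, b)
    )

one_feature = distance("p", "t")      # place only
two_features = distance("p", "s")     # place and manner
three_features = distance("p", "z")   # place, manner, and voicing
saltatory = jumped_over("p", "v")     # intermediate sounds for p~v
control = jumped_over("b", "v")       # no intermediate sounds for b~v
```

Under this coding, p~v skips [b] and [f], whereas b~v has no intermediate sound, which matches the contrast between the potentially saltatory and control conditions described above.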
Even when participants are given explicit proof of the alternation being saltatory, they 
still extend it to the intermediate sounds. Half of the saltatory condition participants in 
White (2013, 2014) were trained on p~v and half on t~ð, as in the potentially saltatory 
condition, but they were additionally exposed to either copied intermediate fricatives (f~f 
or s~s, respectively) or copied intermediate obstruents (b~b or d~d), then tested on the 
other intermediate sound in a two-alternative forced choice task. Even though only 
participants who got 80% correct on the trained alternations proceeded to the test portion, 
they still extend the alternating pattern to the other intermediate sound. Comparable 
results were obtained through a production task (White, 2013, Ch. 4.5). He concludes that 
large saltatory changes are taken by participants to imply that the intermediate sounds 
also change. However, the large changes may be harder to perform: Many participants in 
the saltatory condition were excluded for failing to reach criterion accuracy on the trained 
segments and/or failing to select changes on the trained segments during test (17/33 for 
the saltatory condition vs. 2/22 for the potentially saltatory condition; White, 2013, p. 
83). We think that training to criterion is helpful for revealing categorization biases but 
obscures the bias behind PU, namely that large changes are harder to perform and 
dissimilar sounds harder to associate. 
In the experiments above, the alternations involving multiple features are saltatory, 
jumping over another sound, and it is unclear if the existence of an intermediary sound is 
necessary for large changes to be more difficult to learn than small. To our knowledge, 
no one has compared the learnability of the same change for learners who possess the 
intermediate sound in their native language inventory and those who do not (e.g. training 
Arabic and English speakers on b~f, as Arabic lacks the intermediate [p] and [v] in the 
native inventory; Watson, 2002). We therefore take these results to indicate that large 
changes are harder than small changes, in general, which could manifest as a bias against 
saltatory changes. 
2.1.2. Learning in infants 
White & Sundara (2014) and White (2013, Ch. 5) investigate a bias against saltatory 
alternations in 12-month-old infants. The infants were placed in one of four conditions, 
shown in (1)-(4). Groups 1 and 2 were exposed to a saltatory alternation, p~v in (1) and 
t~z in (2), and were tested on b~v in (1) and d~z in (2). Participants in (1) pay more 
attention, as evidenced through longer looking times, to d~z trials than b~v, and 
participants in (2) pay more attention to b~v trials than d~z. The infants in (3) and (4) 
were trained on b~v and d~z, respectively, and tested on p~v and t~z, but they show no 
difference in looking times between the alternations at test. In other words, participants in 
(1) and (2) learn that the alternating forms are allophones in complementary distribution 
and extend the phoneme to include the intermediate sound, whereas participants in (3) 
and (4) do not extend the alternating category to [p] or [t], respectively. The authors 
claim that this provides evidence for a bias against saltatory alternations in infants. We 
believe it is instead evidence that alternating sounds are grouped together in perception 
and that the category includes intermediate sounds, unless contradictory evidence is 
provided (and possibly even then, if the results from White, 2013, 2014, hold true for 
infants as well), making the bias against saltatory alternations a special case of the bias 
against discontinuous categories in perception (Maddox et al., 2005, 2007; Moreton & 
Pater, 2012a; Moreton et al., 2017). This bias is distinct from the production-internal 
biases that are the focus of the Perseveration Hypothesis, and we do not believe that it 
obviates the need for a bias against large changes in production. It is unlikely that infants 
are responsible for much language change, since they are not in a position to spread their 
innovations to others (Bybee, 2001). Learning to productively use paradigmatic mappings 
continues into school years (Berko, 1958), and may not be complete even in adulthood 
(Dąbrowska, 2012).  
1. {rom;na}{t;z}VCV and rom pVCV and na vVCV 
2. {rom;na}{p;v}VCV and rom tVCV and na zVCV 
3. {rom;na}{d;z}VCV and rom bVCV and na vVCV 
4. {rom;na}{b;v}VCV and rom dVCV and na zVCV 
 
When producing novel forms of a word, learners of paradigms struggle primarily with 
changing the sounds that should be changed rather than avoiding changing any 
intermediate sounds. Children usually make perseveratory errors when producing novel 
forms of known words, leveling stem changes instead of extending them (Do, 2013; 
Kerkhoff, 2007; Krajewski et al., 2011), and errors in overgeneralization of a change to 
intermediate sounds are relatively rare (Bolognesi, 1998; White, 2017). This suggests that 
saltatory alternations are rare because they are a kind of large change, not because the 
“jumped over” sounds come to alternate as well. If imperfect learning ever seeds 
language change, it is more likely to be through persistent paradigmatic perseveration in 
production. 
 
 
2.2. Palatalization 
In the present work, we train participants on a palatalization alternation, where voiced 
([b;d;g]) and voiceless oral stops ([p;t;k]) in singulars become the palato-alveolar 
affricates [dʒ] or [tʃ], respectively, before -i or -a in plurals. Palatalization is a common 
process cross-linguistically (Bateman, 2007; Kochetov, 2011), and has been studied 
experimentally (Guion, 1998; Kapatsinski, 2013; Stave et al., 2013; Wilson, 2006). 
2.2.1. The typology of palatalization 
Coronal and velar palatalization are equally common, whereas labial palatalization 
is very rare (Bhat, 1978; Chen, 1973). Bateman (2007) argues that there are no cases of 
productive labial palatalization where the labial articulation is fully suppressed, and 
Kochetov (2011) proposes an implicational universal: If a language has labial 
palatalization, then it also has alveolar and/or velar palatalization. Palatalization before -i 
is more common than before any other vowel (Bateman, 2007; Kochetov, 2011). The 
preference for palatalization before -i over other vowels has been argued for on articulatory 
(Anttila, 1989, p. 72-73; Hock, 1991, p. 73-77) and perceptual grounds (Guion, 1998; 
Ohala, 1989, p. 183-185, 1992, p. 320).  
2.2.2. Palatalization and phonetic naturalness 
Palatalization allows us to consider two types of phonetic naturalness. The first is 
contextual (Stave et al., 2013; termed “feature spreading” by Skoruppa et al., 2011 and 
“contextual relevance” by Peperkamp et al., 2006): Is the result of the alternation 
phonetically closer to the context than the input is? For example, through coarticulation, 
the high front vowel [i] fronts preceding velars (Bateman, 2007; Bhat, 1978; Wilson, 
2006; Yun, 2006). With sufficient gestural overlap (Bateman, 2007), [k] can move 
forward to [tʃ] before -i, making velar palatalization before -i natural in this sense. 
Palatalization before [a] is less contextually natural, because moving the place of 
articulation to the front of the mouth does not result in a consonant that is closer 
(articulatorily or acoustically) to [a] than [k] is.  
The second type of phonetic naturalness is how natural the change itself is (Stave et 
al., 2013), in other words, how similar the alternating forms are to each other (termed 
“phonetic distance” by Skoruppa et al., 2011 and “phonetic proximity” by Peperkamp et 
al., 2006). The infrequency of labial palatalization could be attributed to avoidance of 
either perceptual or articulatory dissimilarity: [tʃ] is articulatorily more similar to [t] and 
[k] than [p], as [tʃ] and [dʒ] feature coronal and dorsal articulations (Yun, 2006), but not 
labial. The asymmetry is paralleled in patterns of perceptual similarity (Kochetov, 2011): 
Wang & Bilger (1973) show that linguals are confused with alveopalatals much more 
frequently than labials are confused with alveopalatals. Palatalization of [k] is thus more 
natural than palatalization of [p], whether perceptual or articulatory similarity is 
considered.  
If the perceptual similarity between alternants is what determines learnability, as 
claimed by Hayes & White (2015), Kenstowicz (1996), Steriade (2001/2009), and White 
(2013, 2014), then palatalization of [k] before -i should be easier to learn than 
palatalization of [g] before -i: Guion (1998) demonstrates that American English listeners 
misperceive [ki] as [tʃi] in noise, but mistaking [gi] for [dʒi] is much less common. 
Typologically, 40% of cases of velar palatalization involve only [k], whereas there are no cases where only [g] is 
palatalized (Bhat, 1978). If articulatory similarity is what is important, as posited by the 
Perseveration Hypothesis (§1.1.3), then there should be no difference in the learnability 
of palatalization of [k] and [g], since they are equally articulatorily similar to their palatal 
counterparts. Despite the greater perceptual similarity between [ki] and [tʃi] than [gi] and 
[dʒi] (Guion, 1998), English speakers palatalize [g] more than [k] (Wilson, 2006). This 
effect is likely due to first language experience: The English letter <g> is often 
pronounced as [dʒ], while <k> and <c> are rarely pronounced as [tʃ] (Gontijo et al., 
2003). A replication of Wilson (2006) would suggest that the perceptual similarity effect 
is minor enough to be overcome by orthographic categorization and therefore is likely not 
a strong factor in learnability. 
2.2.3. Learnability 
Typological frequency often maps onto ease of learning (Finley, 2008; Mitrović, 
2012; White, 2013, 2014; Wilson, 2006). It has been shown that palatalization before -i is 
easier to learn than palatalization before any other vowels for artificial and natural 
language learning (Mitrović, 2012; Wilson, 2006), in line with a preference for context 
naturalness. Contextually unnatural alternations can still be learned, and may even be 
more productive than their contextually natural counterparts (e.g. Kapatsinski, 2010), but 
they are more difficult to learn and likely to be generalized to the more natural context 
(Mitrović, 2012; Wilson, 2006). Based on typological frequency and perceptual accounts 
of paradigm uniformity, k~tʃi should be easier to learn than g~dʒi, but there are cases 
where typological frequency does not correlate with a difference in learnability (Cristià & 
Seidl, 2008; Moreton & Pater, 2012b; Pycha et al., 2003; Skoruppa & Peperkamp, 2011; 
Seidl & Buckley, 2005). It seems plausible that the learnability of synchronic alternation 
patterns, like palatalization, is the cause of only some of the difference in typological 
frequency of those patterns, with the larger part being due to differences in the 
frequencies of the diachronic change pathways that result in the alternations (J. Blevins, 
2006; Bybee, 2001). 
The typological asymmetry could be because [ti] and [ki] are more marked than [pi] 
(perhaps because they sound more like the palatal before [i]; Guion, 1998; Kochetov, 
2011), and palatalization of alveolars and velars improves a bad output. However, Stave 
et al. (2013) found that [ap]→[atʃa] was palatalized less than [ak]→[atʃa] or [at]→[atʃa], 
which cannot be ascribed to markedness, as all of the alternations are equally 
phonetically unmotivated. The rarity of labial palatalization thus suggests the existence of 
a learning bias against labial palatalization that would cause it to either not be learned 
well, or to be overgeneralized so that it obeys the implicational universal.  
In the present work, we manipulate context naturalness by testing the learnability of 
palatalization before -i (Experiment 2) vs. before -a (Experiment 3). We also examine 
two types of change naturalness, comparing the learnability of p~tʃ vs. t~tʃ vs. k~tʃ, and  
k~tʃi vs. g~dʒi. In the former, the input consonants differ in how articulatorily and 
perceptually similar they are to the output, and in the latter, they differ only in the degree 
of perceptual similarity to the output. Under the Perseveration Hypothesis, we expect to 
find a difference in palatalization rates by place of articulation (particularly labials vs. 
linguals) but not by voicing, whereas perceptual similarity-based accounts would expect 
differences by place of articulation and voicing. 
2.2.4. The potential problem with palatalization 
While using palatalization allows us to compare our findings to prior work and 
manipulate change and context naturalness, it does have the disadvantage of being 
present in English, for velars and alveolars in word pairs like legal~legislate and 
create~creature and for alveolars in frequent phrases like did you. There is therefore a 
chance that our findings could be due to first language transfer and not the experimental 
manipulations. However, Experiment 1 (Chapter III) tests participants’ judgments of 
voiced and voiceless labial, alveolar, and velar palatalization before -i and -a, which we 
include as a comparison for Experiment 2. Thus, we can compare the difference in 
acceptability of palatalization of labials to palatalization of alveolars before and after 
training to determine to what degree the results are due to pre-existing biases vs. learning. 
We now turn to the palatalization experiments discussed in this thesis. 
2.3. Experiment review 
2.3.1. Experiment 1: Baseline 
In Experiment 1 (Chapter III), we obtain judgments of palatalization of voiced and 
voiceless labial, alveolar, and velar stops before -i and -a from native English speakers in 
the absence of any training, in order to establish the biases our participants bring to the 
task. We find that palatalization of alveolars is preferred to palatalization of labials and 
velars, but that there is no difference in acceptability of palatalization before -i vs. before 
-a or between voiced and voiceless velars. For the most part, the differences are minor, 
which suggests participants do not come to the experiment with biases that strongly 
influence acceptability of different types of palatalization.  
2.3.2. Experiment 2: Palatalization before -i 
Experiment 2 (Chapter IV) trains participants on miniature languages containing 
palatalization of voiced and voiceless labials, alveolars, or velars before -i. This allows us 
to evaluate the learnability of alternations that differ only in change magnitude: All the 
conditions have the same target output [tʃi], but alveolars and velars share gestures with 
palato-alveolars whereas labials do not, so learning palatalization of labials requires 
“jumping over” [t] or [k]. We find that participants in all conditions are able to learn to 
prefer the alternating form over the faithful (e.g. after receiving training on labial 
palatalization, subjects prefer p~tʃi over p~pi), but that labial palatalization is difficult to 
produce. Participants in the Alveolar and Velar Palatalization conditions produce 
palatalization of the target consonants as often as they judge it acceptable, but 
participants in the Labial Palatalization condition accept it without producing it. 
Comparison to the baseline shows that training increases acceptability of the trained 
alternation by an equal amount across places of articulation. We argue that acceptability judgments may 
reflect first-order schemas, rather than paradigmatic mappings, which accounts for why 
Labial Palatalization participants accept labial palatalization but rarely produce it.  
The perceptual similarity account proposes that the greater confusability of [ki] and 
[tʃi] than [gi] and [dʒi] should result in k~tʃi being easier to acquire, but comparison of 
[k] and [g] reveals that there is no significant difference in palatalization rates in 
production or acceptability of palatalization in judgment by voicing, and in fact a slight 
preference for g~dʒi. While the bias against labial palatalization could conceivably be 
explained through either articulatory or perceptual dissimilarity, the absence of a 
perceptual similarity effect for [k] vs. [g] suggests that it is likely not the motivator of the 
dislike of labial palatalization. 
Taken together, the results of Experiment 2 show that labial palatalization is more 
difficult to learn to produce than lingual palatalization, and that this is because labials are 
articulatorily distinct from palato-alveolars, as expected by the Perseveration Hypothesis.  
2.3.3. Experiment 3: Palatalization before -a with contiguity 
Experiment 3 (Chapter V) investigates whether syntagmatic contiguity benefits 
acquisition of paradigmatic mappings. Participants are trained on either Labial ([p;b]) or 
Velar ([k;g]) Palatalization, before -a. We manipulate whether pairs of related forms are 
kept intact, with the plural immediately following the corresponding singular, or whether 
they appear in random order. Intactness is manipulated independently for faithful and 
unfaithful pairs. Pairs exemplifying faithful mappings (e.g. k~k{i;a}) are kept intact in two 
of the trial order conditions (NoChange Obvious and All Obvious) and randomly ordered in 
the other two (None Obvious and Change Obvious). Likewise, pairs exemplifying unfaithful 
mappings (e.g. p~tʃa) are kept intact in Change Obvious and All Obvious and randomly 
ordered in None Obvious and NoChange Obvious. Within each language the four conditions are: None 
Obvious, where none of the pairs are kept intact and therefore none of the mappings are 
obvious (the same order as in Experiment 2); All Obvious, where all of the pairs are kept 
intact; Change Obvious, where only unfaithful pairs (e.g. p~tʃa) are kept intact and all 
other singulars and plurals are randomly ordered; and NoChange Obvious, where only 
faithful pairs (e.g. k~k{i;a}) are kept intact and all other singulars and plurals are 
randomly ordered.  
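
The four trial order conditions amount to a 2 x 2 design (faithful pairs intact or not, unfaithful pairs intact or not). The sketch below shows one way such orders could be generated; it is an illustration under stated assumptions, not the procedure used to build the actual training lists. The function name, the seed parameter, and the stems other than blup are hypothetical.

```python
import random

# Which pair types are kept intact (plural immediately after its singular).
CONDITIONS = {
    "None Obvious":     {"faithful": False, "unfaithful": False},
    "All Obvious":      {"faithful": True,  "unfaithful": True},
    "Change Obvious":   {"faithful": False, "unfaithful": True},
    "NoChange Obvious": {"faithful": True,  "unfaithful": False},
}

# Hypothetical Labial Palatalization items: (singular, plural, pair type).
PAIRS = [
    ("blup", "blutʃa", "unfaithful"),
    ("vork", "vorka", "faithful"),
    ("snat", "snati", "faithful"),
]

def build_trial_order(pairs, condition, seed=0):
    """Shuffle trials, keeping the selected pair types as adjacent units."""
    rng = random.Random(seed)
    keep_intact = CONDITIONS[condition]
    units = []
    for singular, plural, kind in pairs:
        if keep_intact[kind]:
            units.append([singular, plural])   # stays together
        else:
            units.append([singular])           # ordered independently
            units.append([plural])
    rng.shuffle(units)
    return [form for unit in units for form in unit]

change_obvious = build_trial_order(PAIRS, "Change Obvious")
all_obvious = build_trial_order(PAIRS, "All Obvious")
```

One simplification to note: in this sketch a non-intact plural can still land directly after its singular by chance, so a real list builder might need an additional check to rule that out.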
Both faithful and unfaithful mappings benefit from temporal contiguity of the 
singular and the plural: Keeping unfaithful pairs intact in training results in more 
palatalization in production, and keeping faithful pairs intact in training results in less 
palatalization in production. Intact faithful and unfaithful mappings both extend beyond 
their trained contexts, unless the competing pattern is made more obvious: Change 
Obvious participants palatalize more of the stems that should not be palatalized (e.g. 
alveolar- and velar-final stems in Labial Palatalization) than All Obvious participants do, 
and NoChange Obvious participants retain the stem consonant of the stems they should 
palatalize more than All Obvious participants do. Having both patterns in temporal 
contiguity enables speakers to learn what to change and what not to change. 
2.3.4. Discriminative model of Experiment 3 
In Chapter VI, we describe a discriminative learning model that captures the effects 
of Experiment 3. We find that keeping singular-plural pairs intact makes the cues of the 
singular available for predicting the outcome of the plural form. Conditions with intact 
faithful pairs show higher rates of copying (retaining the input consonant), and conditions 
with intact unfaithful pairs result in higher rates of palatalization. We also find that 
patterns extend to the smallest natural class that contains all of the participating segments, 
which results in copying extending more to velars in the Velar Palatalization condition 
than it does to labials in the Labial Palatalization condition. Chunking has an effect, as 
well, with surprise boosting the association between parts: Encountering blutʃa after 
hearing blup fuses [tʃa] into a chunk and makes it easier for one part (e.g. -a) to elicit the other (e.g. 
tʃ). Temporal contiguity benefits even salient patterns; participants tend to produce only 
-a unless faithful pairs (half of whose plurals are suffixed with -i) are kept intact and -i is 
therefore made obvious. The model is successful at capturing the experimental 
manipulations and provides support for the importance of domain-general learning 
mechanisms in language acquisition. 
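
The idea that intact pairs make the cues of the singular available for predicting the plural can be sketched with a toy error-driven learner. This summary does not specify the learning rule, so the sketch assumes a Rescorla-Wagner-style update purely for illustration; the cue names, outcome labels, learning rate, and trial counts are all hypothetical and are not taken from the model in Chapter VI.

```python
from collections import defaultdict

OUTCOMES = ("tʃ", "k")  # hypothetical plural stem-final consonants

def train(trials, rate=0.1):
    """Error-driven (Rescorla-Wagner-style) updates of cue-outcome weights."""
    weights = defaultdict(float)
    for cues, observed in trials:
        for outcome in OUTCOMES:
            predicted = sum(weights[(cue, outcome)] for cue in cues)
            target = 1.0 if outcome == observed else 0.0
            delta = rate * (target - predicted)
            for cue in cues:
                weights[(cue, outcome)] += delta
    return weights

# Intact pairs: the singular's final consonant is an available cue.
intact = [(("plural", "sg_final_p"), "tʃ"), (("plural", "sg_final_k"), "k")] * 50
# Scrambled pairs: only the plural meaning is available as a cue.
scrambled = [(("plural",), "tʃ"), (("plural",), "k")] * 50

w_intact = train(intact)
w_scrambled = train(scrambled)
```

In this toy setting, only the learner trained on intact pairs acquires a positive association from a singular-final [p] to [tʃ]; the learner trained on scrambled pairs has no singular cue to associate with the outcome at all.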
2.3.5. Comparison of Experiments 2 and 3 
In Chapter VII, we compare the results of Experiments 2 and 3. Unlike Experiment 2, 
there is no difference by change magnitude in Experiment 3. Neither labial nor velar 
palatalization is produced in the None Obvious trial order before -a, whereas velar 
palatalization is produced much more often than labial palatalization before -i. We 
propose that this provides evidence for a substantive bias; velar palatalization is learnable 
before -i because the high front vowel causes the tongue to move towards an alveopalatal, 
making k~tʃ motivated before -i but not before -a.  
Like Wilson (2006), we find that participants trained on palatalization before -a 
generalize the alternation to -i, but the reverse is not true, providing evidence that 
alternations generalize from less natural to more natural contexts. Unlike Wilson, we do 
find a difference in the learnability of palatalization by suffix vowel, with participants 
producing more palatalization before -i than before -a, even when trained on labial 
palatalization, which is unmotivated in both contexts. This may reflect a learning bias in 
favor of typologically-frequent patterns (White & Sundara, 2014). 
In the next chapter, we discuss the results of the baseline judgment experiment. 
 
CHAPTER III 
EXPERIMENT 1: PALATALIZATION BASELINE 
In order to determine effects of training on learning, a baseline needs to be 
established. English has examples of alveolar and velar palatalization in word pairs like 
create~creature and legal~legislative (though these are likely not productive in the 
synchronic grammars of speakers), and alveolar palatalization in phrases like would you 
and bet you (though these may not be the same process; Zsiga, 1995); because of this, 
participants may accept alveolar and velar palatalization more than labial palatalization. 
As discussed in §2.3, Experiments 2 and 3 investigate the effect of change magnitude 
(Chapter IV) and contiguity (Chapter V) on the learning of alternations. Without a 
baseline in the absence of training, the patterns of results could be due to pre-existing 
biases rather than the experimental manipulation. We chose to use judgments of 
palatalization in the absence of any training, since any production test requires at least 
some examples to illustrate the pattern in question. Perceptual judgments, however, 
require no generation on the part of the participants. 
Experiments 2 and 3 include extensive training, whereas a baseline is intended to 
evaluate the influence of native language without training. However, the very act of 
exposure to training trials may change the interpretation that participants have of the task: 
Being asked to judge exemplars of an alternation may be less odd, or subject to 
alternative explanations, after being shown hundreds of nonce-word trials containing the 
alternation than doing so without training. We tried to minimize any differences by 
providing similar instructions to both groups, though the degree to which that was a 
successful strategy is certainly open for debate. The responses during informal 
post-experiment interviews were largely the same for participants who were exposed to 
training and those who were not, though the former group expressed more frustration and 
confusion (unsurprising, given the full experiment was much longer and they had to learn 
a pattern rather than merely judge it). We were concerned that participants without 
training might rate all of the pairs containing palatalization as bad, but fortunately this 
was not the case. Given the goal – namely, to obtain data on participants’ views of 
palatalization patterns based solely on native language experience – we believe the 
chosen experimental paradigm was the best strategy. 
3.1. Methods 
3.1.1. Participants 
Twelve undergraduates in psychology and linguistics classes at the University of Oregon 
were recruited through the Human Subject Pool and received partial course credit for 
their participation. None reported having any speech, visual, auditory, or learning 
disabilities. 
3.1.2. Materials 
The test stimuli were 30 unique singular forms, randomly paired with pictures of 
creatures from the Spore database. The singular forms were all C(C)VC, and the final 
consonant was an oral stop. The final consonants were evenly divided between place of 
articulation (labial, alveolar, and velar) and voicing (voiced and voiceless), resulting in 5 
tokens of every place * voicing combination. 
Each singular had four plurals, crossing whether it was palatalized and what vowel 
was added (e.g. the singular smip had the plurals smipi, smipa, smitʃi, and smitʃa). Each 
singular was paired with all four plurals, resulting in 120 pairs. The singular and plural 
forms were recorded by a male native American English speaker from Oregon. The 
materials can be found in Appendix A. 
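
The stimulus counts follow directly from the design, as the sketch below shows. The voicing-matched affricates ([tʃ] for voiceless stops, [dʒ] for voiced stops) follow the description of the alternation in §2.2; the constant and function names are hypothetical and the code is only an illustration of the crossing, not the script used to create the materials.

```python
from itertools import product

PLACES = ("labial", "alveolar", "velar")
VOICINGS = ("voiced", "voiceless")
TOKENS_PER_CELL = 5
VOWELS = ("i", "a")

# Palatalized counterpart of each final stop, matched for voicing.
AFFRICATE = {"p": "tʃ", "t": "tʃ", "k": "tʃ", "b": "dʒ", "d": "dʒ", "g": "dʒ"}

def plurals(singular):
    """Four plurals: faithful or palatalized stem, crossed with -i and -a."""
    palatalized = singular[:-1] + AFFRICATE[singular[-1]]
    return [stem + vowel for stem, vowel in product((singular, palatalized), VOWELS)]

n_singulars = len(PLACES) * len(VOICINGS) * TOKENS_PER_CELL
n_pairs = n_singulars * len(plurals("smip"))
example = plurals("smip")
```

With five tokens in each of the six place-by-voicing cells, this gives 30 singulars and 120 singular-plural pairs, and the example reproduces the four plurals listed for smip.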
3.1.3. Procedure 
The singular picture with the singular recording was followed by a 300 ms blank 
screen and then the plural picture with one of the four plural recordings (see Figure 3.1). 
Every participant received a different random ordering of the pairs. There was a 
one-second blank screen between trials. The experiment was run in E-Prime 2.0 
Professional (Psychology Software Tools, Pittsburgh, PA) and lasted around 10 minutes. 
 
  
Figure 3.1. Example display for a stimulus pair with labial palatalization. Participants 
saw the creature(s) and heard the associated word (shown in brackets here).  
 
 
Subjects were told they would hear the names of alien creatures and that they needed 
to indicate, using a button box, how good a match they thought the plural was for the 
singular. The button box had 5 buttons, but the responses were strongly bimodal: 74.6% of 
responses were 1 or 5, 19.7% were 2 or 4, and only 5.7% were 3. We therefore created a 
binary measure in which 1 and 2 were coded as 0 and 4 and 5 were coded as 1, with 3s 
excluded; the results are comparable if the full rating range is used. We chose the 
binary measure over the full range because there is no difference in the distribution of 
3s across Test Place (F(2) = 0.86, p = 0.43, ns), and because the results can be more 
easily compared to those of Experiment 2.  
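
A minimal sketch of the recoding just described, in Python rather than the R used for the analyses. The function name and the sample ratings are hypothetical.

```python
def to_binary(rating):
    """Recode a 5-point rating: 1-2 -> 0, 4-5 -> 1, 3 -> None (excluded)."""
    if rating in (1, 2):
        return 0
    if rating in (4, 5):
        return 1
    if rating == 3:
        return None
    raise ValueError("rating must be an integer from 1 to 5")


ratings = [1, 5, 3, 2, 4]  # made-up responses for illustration
coded = [to_binary(r) for r in ratings]
kept = [c for c in coded if c is not None]  # midpoint responses dropped
print(coded)  # [0, 1, None, 0, 1]
print(kept)   # [0, 1, 0, 1]
```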
3.1.4. Measures 
We fit logistic mixed-effects models (generalized linear mixed-effects models with a 
logit link) with the lme4 package (version 1.1-21, Bates et al., 2015) in R (version 
3.6.0, R Development Core Team, 2019). Fixed effects were included for Keep Place (yes 
[faithful] vs. no [palatalized]), Plural Vowel (-i vs. -a), Test Place (Labial, Alveolar, 
and Velar), and Test Voice (Voiced vs. Voiceless), and any significant interactions. 
Random intercepts were included for Subjects and Singulars/Bases, with the fullest random 
slope structure that still allowed the model to converge (maximally Keep Place, Plural 
Vowel, Test Place, and Test Voice within Subjects, and Keep Place and Plural Vowel within 
Bases). Log likelihood tests on nested models were used to derive significance values. 
When a contrast that was expected to be significant was not, the evidence for the null 
hypothesis was evaluated using the BIC approximation to the Bayes Factor (Wagenmakers, 
2007). This approximation compares the posterior probabilities of the null and 
alternative hypotheses assuming their priors are equal. Unlike frequentist analysis, it 
can therefore provide evidence for the null, distinguishing lack of evidence against the 
null from evidence for the null.  
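
The relation between ΔBIC and PBIC can be illustrated with a short sketch. It assumes the standard form of the approximation (equal priors, ΔBIC taken as the BIC of the alternative model minus the BIC of the null model); the function name is hypothetical, and the inputs are ΔBIC values reported in the Results below.

```python
import math


def p_bic_null(delta_bic):
    """Approximate posterior probability of the null hypothesis,
    given delta_bic = BIC(alternative) - BIC(null) and equal priors."""
    return 1.0 / (1.0 + math.exp(-delta_bic / 2.0))


for delta in (2.8, 6.14, 12.42):
    print(delta, round(p_bic_null(delta), 3))
```

A ΔBIC of 0 gives 0.5 (no evidence either way), and larger positive values push the posterior probability of the null toward 1.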
We used Helmert contrast coding for Test Place to compare labial to lingual (alveolar 
and velar) stems and alveolar to velar stems, because we suspected that the absence of 
labial palatalization and presence of (limited) velar and alveolar palatalization in English 
would make participants less likely to accept palatalization of labials than linguals. 
Visual inspection of the graphs showed that it was actually alveolar-final stems that often 
patterned separately from labial and velar stems, so post-hoc tests comparing alveolars to 
labials were also performed. Tested models and contrast coding are included in footnotes, 
and the full dataset and code are available at 
https://app.box.com/s/bd8jhx4g5m7bvlmxb8i4jgtjfjo2x111. 
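
One way to write down a Helmert-style coding for Test Place is sketched below. The signs and scaling are assumptions chosen for illustration (the coding actually used is in the linked code and may differ, for example in R's default scaling); the sketch only checks that the two contrasts are centered and orthogonal.

```python
from fractions import Fraction

# Hypothetical contrast values: column 0 compares labial to the linguals,
# column 1 compares alveolar to velar. Signs and scaling are assumptions.
HELMERT = {
    "labial":   (Fraction(2, 3),  Fraction(0)),
    "alveolar": (Fraction(-1, 3), Fraction(1, 2)),
    "velar":    (Fraction(-1, 3), Fraction(-1, 2)),
}


def column(index):
    """Collect one contrast across the three places of articulation."""
    return [codes[index] for codes in HELMERT.values()]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


labial_vs_lingual = column(0)
alveolar_vs_velar = column(1)

print(sum(labial_vs_lingual), sum(alveolar_vs_velar))  # both contrasts sum to 0
print(dot(labial_vs_lingual, alveolar_vs_velar))       # 0: contrasts are orthogonal
```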
3.1.5. Predictions 
We predict that participants will likely judge alveolar and velar palatalization as 
better than labial, because of the patterns in English. Alveolar and velar palatalization are 
also much more common cross-linguistically than labial palatalization (Kochetov, 2011; 
Bateman, 2007). 
 Following typological patterns, we expect participants to like palatalizing before -i 
better than before -a (Bateman, 2007; Chen, 1973; Kochetov, 2011; Wilson, 2006). 
However, we do not expect the greater typological frequency of voiceless palatalization 
(Bhat, 1978) to correspond to higher ratings here, because English alveolar palatalization 
targets both voiced and voiceless segments, which could lead participants to expect 
palatalization at all places of articulation to follow the same pattern. In fact, despite the 
greater perceptual similarity of [ki] and [tʃi] vs. [gi] and [dʒi] (Guion, 1998), English 
speakers prefer palatalizing [g] (Wilson, 2006), likely because of orthographic overlap 
between [g] and [dʒ] in <g>, so it could be possible that (at least for velar palatalization) 
the voiced alternation would be judged better than voiceless. We expect any difference 
between voiced and voiceless consonants, if present, to be strongest before -i and 
minimal or nonexistent before -a, as [tʃa] and [dʒa] are perceptually dissimilar from [ka] 
and [ga], respectively. 
Lastly, we predict that subjects will rate faithful plurals (e.g. blut~bluti and blut~bluta) 
as better than unfaithful (blut~blutʃi and blut~blutʃa), because larger changes are liked 
less (Kenstowicz, 1996; Skoruppa et al., 2011; Steriade, 2001/2009; White, 2014). We 
predict that there could be a difference in patterning by suffix vowel: because of the 
acoustic similarity between [ki] and [tʃi] (Guion, 1998), and because palatalizing causes 
consonants to become articulatorily more similar to a following [i] (Kochetov, 2011), 
participants might judge palatalization before -i as equivalent to, or even better than, 
lack of palatalization. We do not expect a similar effect before -a. 
3.2. Results 
3.2.1. Judgments of palatalized plurals 
There is no significant difference in acceptance rates of labial palatalization vs. 
lingual palatalization⁷ (b = -0.37, se(b) = 0.34, z = -1.09, p = 0.28, ns), but alveolar 
palatalization is liked significantly more than velar palatalization (b = 1.09, se(b) = 0.36, 
z = 3.05, p = 0.002), and including Test Place significantly improved model fit (χ²(2) = 
7.01, p = 0.03). Post-hoc tests⁸ show that alveolar palatalization is also liked 
significantly more than labial palatalization (b = 0.72, se(b) = 0.33, z = 2.16, p = 0.03; 
χ²(1) = 4.08, p = 0.04). These patterns can be seen in Figure 3.2 and Table 3.1. 
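
As a reading aid for the statistics above, the sketch below recovers p-values from a Wald z and from a likelihood-ratio χ² statistic using closed forms that hold for 1 and 2 degrees of freedom. It is not the analysis code (the models were fit in R); the function names are hypothetical.

```python
import math


def wald_p(z):
    """Two-sided p-value for a Wald z statistic (standard normal)."""
    return math.erfc(abs(z) / math.sqrt(2))


def chisq_p(stat, df):
    """Upper-tail chi-square p-value, using closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


print(round(wald_p(3.05), 3))      # alveolar vs. velar contrast
print(round(chisq_p(7.01, 2), 2))  # adding Test Place (two contrasts)
print(round(chisq_p(4.08, 1), 2))  # post-hoc alveolar vs. labial comparison
```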
While there is no difference in overall acceptance rates of palatalization before -i vs. 
before -a (b = 0.31, se(b) = 0.46, z = 0.68, p = 0.50, ns; according to the BIC 
approximation to the Bayes Factor, the results provide strong evidence for the null, ΔBIC 
= 6.14, PBIC = 0.956), the difference between alveolar and velar ratings is smaller before 
-a than before -i (b = -1.28, se(b) = 0.52, z = -2.48, p = 0.01), and including the 
interaction significantly improves model fit (χ²(2) = 6.22, p = 0.04). Figure 3.2 clearly 
shows where this effect originates: The acceptance rates for palatalized alveolars and 
velars are essentially the same before -a, whereas palatalized velars are accepted less 
often before -i.⁹ While post-hoc analyses show that palatalized velar-final stems are liked 
marginally more before -a than -i¹⁰ (b = 0.94, se(b) = 0.55, z = 1.71, p = 0.09), including 
Plural Vowel does not significantly improve the fit of the model and the data provide 
positive support for the null, ΔBIC = 2.8, PBIC = 0.802. 
 
7 Rating Bin ~ Test Place * Plural Vowel + Test Voice + (1 + Test Place * Plural Vowel | Subject) + (1 + 
Plural Vowel | Singular), palatalized plurals only, with Helmert contrasts comparing labial to alveolar/velar 
stems and alveolar to velar stems. 
 
8 Rating Bin ~ Test Place + Plural Vowel + Test Voice + (1 + Test Place + Plural Vowel | Subject) + (1 + 
Plural Vowel | Singular), palatalized plurals only, labial vs. alveolar stems. 
 
 
Figure 3.2. Acceptance of palatalized plurals by stem-final consonant place of 
articulation and plural suffix. 
 
                                                
9 Post-hoc tests show no significant interaction of Test Place and Plural Vowel for labial vs. alveolar stems 
(b = -0.50, se(b) = 0.44, z = -1.12, p = 0.26, ns) and the results provide positive support for the null, ΔBIC = 
4.82, PBIC = 0.918. 
 
10 Rating Bin ~ Plural Vowel + Test Voice + (1 + Plural Vowel | Subject) + (1 + Plural Vowel | Singular), 
restricted to palatalized velar plurals. 
 
There is no significant difference between voiced and voiceless palatalization (b = 
-0.054, se(b) = 0.18, z = -0.30, p = 0.77, ns; according to the BIC approximation to the 
Bayes Factor, the data provide strong evidence for the null, ΔBIC = 6.39, PBIC = 0.961; 
Figure 3.3). Even when we consider only velar palatalization before -i¹¹ (where 
perceptual similarity would suggest favoring [k] over [g]), there is no significant 
difference by voicing (b = -0.47, z = -0.81, p = 0.42, ns; the results provide positive 
support for the null according to the BIC approximation to the Bayes Factor, ΔBIC = 
3.97, PBIC = 0.879). 
 
Table 3.1. Generalized linear mixed-effects model output for acceptance of palatalized 
plural-singular pairs by stem place of articulation, suffix vowel, voicing, and interactions. 
                                         b          se(b)     z        p 
(Intercept)                              -0.60322   0.33065   -1.824   0.0681    . 
Labial vs. Alveolar/Velar                -0.36724   0.33662   -1.091   0.27528 
Alveolar vs. Velar                        1.08724   0.35648    3.05    0.00229   ** 
Before -a                                 0.31218   0.4622     0.675   0.4994 
Voiceless                                -0.05425   0.18374   -0.295   0.7678 
Labial vs. Alveolar/Velar x Before -a    -0.25377   0.47602   -0.533   0.59396 
Alveolar vs. Velar x Before -a           -1.28447   0.51763   -2.481   0.01309   * 
. Marginally significant 
* Significance level of 0.05 
** Significance level of 0.01 
 
                                                
11 Rating Bin ~ Test Voice + (1 + Test Voice | Subject) + (1 | Singular), restricted to palatalized velars 
before –i. 
 
Figure 3.3. Acceptance of palatalized plurals by stem-final consonant place of 
articulation and voicing. 
 
3.2.2. Judgments of faithful plurals 
There are no significant differences between acceptance rates of faithful forms¹², 
whether for labials vs. the linguals (b = 0.093, se(b) = 0.28, z = 0.33, p = 0.74, ns) or 
alveolars vs. velars (b = -0.24, se(b) = 0.31, z = -0.76, p = 0.45, ns). The BIC 
approximation to the Bayes Factor provides very strong evidence for the null, ΔBIC = 
12.42, PBIC = 0.998. There is no significant difference in acceptability of faithful forms 
by suffix vowel (b = 0.88, se(b) = 0.82, z = 1.07, p = 0.29, ns), and according to the BIC 
approximation to the Bayes Factor, the results provide positive evidence for the null, 
ΔBIC = 5.38, PBIC = 0.936. These results are illustrated in Figure 3.4.¹³  
                                                
12 Rating Bin ~ Test Place + Plural Vowel + Test Voice + (1 + Test Place + Plural Vowel + Test Voice | 
Subject) + (1 + Plural Vowel | Singular), faithful plurals only, Helmert contrasts comparing labial to 
alveolar/velar stems and alveolar to velar stems. 
 
13 Despite appearances, the interaction between Test Place and Plural Vowel is not significant (b = 0.10, 
se(b) = 0.56, z = 0.18, p = 0.85, ns) and the results provide very strong support for the null, ΔBIC = 10.45, 
PBIC = 0.995. 
 
Figure 3.4. Acceptance of faithful plurals by stem-final consonant place of articulation 
and suffix. 
 
3.2.3. Judgments of plurals before -i 
Before -i¹⁴ (see Figure 3.5 and Table 3.2), there is no difference in ratings of 
labial-final stems vs. lingual-final stems (b = 0.035, se(b) = 0.47, z = 0.074, p = 0.94) or 
between the linguals (b = -0.79, se(b) = 0.58, z = -1.37, p = 0.17), and according to the 
BIC approximation to the Bayes Factor, the results provide strong support for the null 
(ΔBIC = 8.69, PBIC = 0.987). There is also no difference between ratings of palatalized 
and faithful plurals before -i (b = -0.52, se(b) = 0.64, z = -0.82, p = 0.41), and the data 
provide strong support for the null according to the BIC approximation to the Bayes 
Factor (ΔBIC = 6.35, PBIC = 0.960). Post-hoc tests with alveolar as the baseline¹⁵ show 
that unfaithful labial and velar stems are rated lower than unfaithful alveolar stems (b = 
-1.38, se(b) = 0.69, z = -2.00, p < 0.05 and b = -1.87, se(b) = 0.65, z = -2.87, p = 0.004, 
respectively), but the inclusion of the interaction only marginally improves model fit 
(χ²(2) = 5.30, p = 0.07). Thus, despite the apparent interaction of faithfulness by 
stem-final consonant in Figure 3.5, the results provide strong support for the null 
according to the BIC approximation to the Bayes Factor (ΔBIC = 7.67, PBIC = 0.979). 
Finally, voiced stems are no different from voiceless stems (b = 0.0026, z = 0.014, p = 
0.99) and the BIC approximation to the Bayes Factor shows that the results provide strong 
support for the null (ΔBIC = 6.5, PBIC = 0.963).  
 
14 Rating Bin ~ Keep Place * Test Place + Test Voice + (1 + Keep Place * Test Place | Subject) + (1 + Keep 
Place | Singular), before -i, Helmert contrast coded for labial vs. alveolar/velar stems and alveolar vs. velar 
stems. 
 
15 Rating Bin ~ Keep Place * Test Place + Test Voice + (1 + Keep Place * Test Place | Subject) + (1 + Keep 
Place | Singular), alveolar vs. labial and alveolar vs. velar stems. 
 
 
Figure 3.5. Acceptance of plurals before -i by stem-final consonant place of articulation 
and faithfulness. 
 
Table 3.2. Generalized linear mixed effects model output for acceptance of 
singular-plural pairs suffixed with -i by whether the plural was palatalized, stem-final 
consonant place of articulation, voicing, and the interaction between palatalization and 
place of articulation. 
                                           b           se(b)      z        p 
(Intercept)                                -0.125993   0.707238   -0.178   0.85861 
Palatalized                                -0.521188   0.635601   -0.82    0.41222 
Labial vs. Alveolar/Velar                   0.034646   0.467642    0.074   0.94094 
Alveolar vs. Velar                         -0.794316   0.579559   -1.371   0.17051 
Voiceless                                   0.002644   0.192782    0.014   0.98906 
Palatalized x Labial vs. Alveolar/Velar    -0.444719   0.650644   -0.684   0.49429 
Palatalized x Alveolar vs. Velar            1.868223   0.651947    2.866   0.00416   ** 
** Significance level of 0.01 
 
3.2.4. Judgments of plurals before -a 
Before -a¹⁶ (see Figure 3.6 and Table 3.3), the only significant effect is that 
palatalized plurals are rated worse than faithful (b = -1.13, se(b) = 0.40, z = -2.80, p = 
0.005; the inclusion of Keep Place significantly improves model fit, χ²(1) = 6.31, p = 
0.01). There is no difference between ratings of labial-final stems and lingual-final stems 
(b = -0.26, se(b) = 0.27, z = -0.98, p = 0.33) or between alveolar and velar stems (b = 
0.10, se(b) = 0.27, z = 0.37, p = 0.71), and according to the BIC approximation to the 
Bayes Factor, the data provide very strong evidence for the null (ΔBIC = 12, PBIC = 
0.998). Voiceless stems are rated no differently than voiced stems (b = 0.0092, se(b) = 
0.25, z = 0.037, p = 0.97) and the data provide strong evidence for the null (ΔBIC = 6.48, 
PBIC = 0.962). None of the interactions are significant. 
 
                                                
16 Rating Bin ~ Keep Place + Test Place + Test Voice + (1 + Keep Place + Test Place + Test Voice | 
Subject) + (1 + Keep Place | Singular), before –a, Helmert contrast coded for labial vs. alveolar/velar stems 
and alveolar vs. velar stems. 
 
Figure 3.6. Acceptance of plurals before -a by stem-final consonant place of articulation 
and faithfulness. 
 
Table 3.3. Generalized linear mixed effects model output for acceptance of 
singular-plural pairs suffixed with -a by whether the plural was palatalized, stem-final 
consonant place of articulation, voicing, and the interaction between palatalization and 
place of articulation. 
                             b          se(b)     z        p 
(Intercept)                   0.83142   0.43778    1.899   0.05754   . 
Palatalized                  -1.12897   0.40266   -2.804   0.00505   ** 
Labial vs. Alveolar/Velar    -0.26095   0.26656   -0.979   0.32761 
Alveolar vs. Velar            0.10003   0.27361    0.366   0.71465 
Voiceless                     0.00917   0.24826    0.037   0.97054 
. Marginal significance 
** Significance level of 0.01 
 
3.2.5. Judgments by faithfulness 
Our final model evaluates judgments by Test Place, Plural Vowel, and Palatalization, 
as well as any significant interactions.  
Palatalized forms are rated only marginally worse than faithful forms in the full 
model¹⁷ (b = -0.70, z = -1.85, p = 0.06), but the inclusion of Keep Place does significantly 
improve model fit (χ²(1) = 5.28, p = 0.02). There is a greater difference between 
palatalized alveolars and palatalized velars than between faithful alveolars and faithful 
velars (b = 1.65, z = 3.98, p < 0.001), and while there is no difference between labials and 
the linguals (b = -0.44, z = -1.24, p = 0.21), the interaction between Keep Place and Test 
Place does significantly improve model fit (χ²(2) = 9.86, p = 0.007). Post-hoc tests¹⁸ 
reveal that unfaithful alveolars also differ more from unfaithful labials than faithful 
alveolars do from faithful labials (b = 1.30, se(b) = 0.41, z = 3.15, p < 0.002; χ²(2) = 
9.28, p = 0.002). Figure 3.7 illustrates the locus of this effect: Participants have no 
preference for faithful over unfaithful alveolars¹⁹, but dislike palatalized labials and 
velars.  
There is a significant main effect of suffix vowel, with plurals suffixed with -a being 
preferred to those suffixed with -i (b = 0.62, se(b) = 0.17, z = 3.60, p < 0.001), and the 
inclusion of Plural Vowel significantly improves model fit (χ²(1) = 13.54, p < 0.001). The 
significant three-way interaction between Keep Place, Test Place, and Plural Vowel 
indicates that the difference between ratings of palatalized alveolars and palatalized 
velars is smaller before -a (b = -2.01, se(b) = 0.59, z = -3.44, p < 0.001; including the 
interaction significantly improves model fit, χ²(1) = 12.23, p = 0.002), which can be seen 
by comparing the dark bars in Figures 3.5 and 3.6: Unfaithful alveolars are liked much 
more than unfaithful velars before -i, but before -a, they are accepted equally often.  
 
17 Rating Bin ~ Keep Place * Test Place * Plural Vowel + Test Voice + (1 + Keep Place + Test Place | 
Subject) + (1 + Keep Place + Plural Vowel | Singular), Helmert contrasts coded for labial vs. alveolar/velar 
stems and alveolar vs. velar stems. 
 
18 Rating Bin ~ Keep Place * Test Place * Plural Vowel + Test Voice + (1 + Keep Place + Test Place | 
Subject) + (1 + Keep Place + Plural Vowel | Singular), restricted to alveolar and labial stems. 
 
19 Post-hoc test on alveolar stems only, b = 0.35, se(b) = 0.41, z = 0.86, p = 0.39, ns; Rating Bin ~ Keep 
Place * Plural Vowel + Test Voice + (1 + Keep Place + Plural Vowel | Subject) + (1 + Keep Place | 
Singular). 
 
 
Figure 3.7. Acceptance rates of plurals by stem-final consonant place of articulation and 
whether the plural was faithful to the singular. 
 
 
3.3. Discussion 
 The results suggest that the greater frequency and productivity of alveolar 
palatalization in English does translate into a (slight) preference in a test setting, with 
alveolar palatalization rated higher than labial and velar palatalization. Although 
palatalization before -i is much more common cross-linguistically (Bateman, 2007; Chen, 
1973; Kochetov, 2011), participants show no preference for the alternation before -i vs. 
before -a. This could again be attributed to English patterns: Even though palatalization is 
triggered in common phrases like would you by the glide [j], which is acoustically and 
articulatorily similar to [i], the “suffix vowel” in the palatalized form is a schwa, so 
participants may have learned to associate [tʃ] with non-high, non-front vowels.  
There is no difference in ratings for palatalization of voiced vs. voiceless consonants 
in general, or velars in particular. This suggests that the P-map (which predicts a 
preference for palatalizing voiceless velars; Guion, 1998; Steriade, 2001/2009) and 
orthographic overlap (which predicts a preference for palatalizing voiced velars, since 
[g] and [dʒ] can both be written with <g>) either both have minimal effect on preference 
for voicing in palatalization, or have effects that cancel each other out. Regardless, the 
results could indicate that we won’t see any differences in learnability of palatalization 
by voicing in Experiments 2 and 3. 
If [ti] is liked less than [ki] or [pi], the preference for alveolar palatalization could be 
explained as avoidance of a marked structure. However, there are no significant 
differences between ratings of faithful forms, which suggests that any influence of 
markedness on the likability and learnability of palatalization is minimal, since all of the 
faithful forms are considered roughly equivalent (and therefore, none would be especially 
improved by alternating). 
Overall, palatalized forms are rated only marginally lower than faithful forms. Given 
that larger changes tend to be disliked more than smaller ones (Kenstowicz, 1996; Steriade, 
2001/2009), and that adding a vowel and changing a consonant is a larger change than 
merely adding a vowel, we might have expected a stronger preference for 
non-palatalization; we can hypothesize why we don’t see one here. Perhaps their familiarity 
with English palatalization patterns makes participants more likely to accept other 
palatalization patterns, although first-language transfer may have a limited effect on 
miniature artificial language learning (Garcia et al., 2017; Mitrović, 2012; Wang & 
Saffran, 2014). Or perhaps judgment is lenient (Kempen & Harbusch, 2005), so even if they 
wouldn’t produce a form themselves, participants are still willing to deem it acceptable.  
In summary, the baseline experiment shows that whatever differences participants 
have in the acceptability of palatalization by place of stem-final consonant, vowel, and 
voicing, they are relatively minor.  
In Chapter IV, we compare the baseline to the judgment data after training in 
Experiment 2 in order to confirm the influence pre-existing biases have on the learning 
and production of palatalization. 
 
CHAPTER IV 
EXPERIMENT 2: PALATALIZATION BEFORE -i 
Portions of this chapter were taken from: 
Smolek, A. & Kapatsinski, V. (2018). What happens to large changes? Saltation 
produces well-liked outputs that are hard to generate. Laboratory Phonology: Journal of 
the Association for Laboratory Phonology, 9(1), 10. 
In Experiment 2, we investigate the learnability of three miniature artificial languages 
containing either labial, alveolar, or velar voiced and voiceless palatalization before -i. 
As in previous work on the influence of alternation magnitude on the learnability of 
alternations (Skoruppa & Peperkamp, 2011; White, 2013, 2014; White & Sundara, 2014), 
the languages differ in the degree of change to the base that they require. Here, the output 
plural forms always end with the palato-alveolar [tʃi]. Whereas alveolars and velars share 
gestures with [tʃ] (tongue tip and tongue body, respectively; Yun, 2006), labials do not. 
Articulatorily, then, labials require a larger change, and labial palatalization 
necessitates saltation over [t] or [k], depending on whether palatalization is reached 
through [coronal] or [dorsal]. 
4.1. Predictions and hypotheses 
Our principal expectation is that labial palatalization, the saltatory alternation, will be 
more difficult to learn than alveolar or velar palatalization. In particular, we expect p~tʃ 
to be harder to produce after training than t~tʃ or k~tʃ, and that it will also be more likely 
to be overgeneralized to alveolars and velars than alveolar or velar palatalization will be 
to overgeneralize to labials. We test five explicit hypotheses: 
 
Hypothesis 1: Labial palatalization is hard to learn because of markedness, not 
faithfulness. 
Faithful judgments will not differ across conditions: p~tʃ is liked less than t~tʃ or 
k~tʃ, but p~pi, t~ti, and k~ki are equivalent. If k~ki and t~ti are worse than p~pi, that 
alone would make learning palatalization of those stops easier (Pater & Tessier, 2006). 
  
Hypothesis 2: Large alternations, including saltation, are hard to produce (Skoruppa 
et al., 2011). 
Large alternations are those where the alternants are dissimilar (articulatorily, 
perceptually, and/or featurally). Saltation, where there exists a segment that is more 
similar to the input than the output is, such as [t] between [p] and [tʃ] in labial 
palatalization, is one type of large change. This hypothesis proposes that labials will be 
palatalized less in the Labial Palatalization condition than will alveolars in the Alveolar 
Palatalization condition or velars in the Velar Palatalization condition. Labials will also 
be erroneously palatalized less often in the Alveolar and Velar conditions than alveolars 
will be in the Labial and Velar conditions or velars will be in the Labial and Alveolar 
conditions. 
 
Hypothesis 3: Saltatory alternations are likely to be overgeneralized (Hayes & White, 
2015; Moreton & Pater, 2012a; White, 2013, 2014, 2017; White & Sundara, 2014). 
[t] will be palatalized more after training on Labial Palatalization than Velar 
Palatalization, because Labial learners will attempt to acquire a simple 
conjunctively-defined category subsuming [p] and [tʃ] (e.g. [-continuant], which also 
includes [t] and [k]), whereas a category subsuming [t] and [tʃ] (e.g. [coronal; 
-continuant]) would not include [k] and [p], and vice versa.  
 
Hypothesis 4: Large changes are hard to produce, even if they are judged to be 
preferable. 
The difference between the Labial condition and the others will be larger in the 
production test than in judgment; more specifically, labial palatalization is likely to be 
accepted in judgment while being rarely produced. The Perseveration Hypothesis 
proposes that production of an alternation can be difficult even when the product is 
judged acceptable because production involves overcoming paradigmatic perseveration, 
the default predisposition to copy activated segments of the source(s) into the production 
plan, whereas judgment does not. Overcoming paradigmatic perseveration is made easier 
by the acquisition of paradigmatic associations between segments participating in an 
alternation. Judgments need not rely on paradigmatic mappings because there is no need 
to overcome paradigmatic perseveration in judgment. A speaker who has not acquired 
paradigmatic associations necessary for overcoming paradigmatic perseveration and is 
therefore unable to produce an alternation could nonetheless judge the alternation to be 
more acceptable than the faithful mapping s/he would actually produce, by using product-
oriented/first-order schemas (Bybee, 1985, 2001; Kapatsinski, 2012, 2013; Nesset, 2008); 
that is, by judging that the product contains the appropriate cues for the meaning 
expressed (Kapatsinski, 2013). The avoidance of labial palatalization should therefore be 
stronger in production than judgment. This mirrors the patterns found for Tagalog by 
Zuraw (2000), where speakers rarely produce nasal substitution, which requires a change 
to the base, but still judge it better than nasal assimilation, which does not require a stem 
change. A dissociation between judgment and production would also address the critique 
that judgments are merely more tolerant than production (Kempen & Harbusch, 2005), 
since the less acceptable form is more likely to be produced. In other words, the speaker 
recognizes the correct form, but is reluctant or unable to produce it. 
 
Hypothesis 5: The bias against labial palatalization is a bias in favor of perceptual 
similarity between alternants. 
Perceptual explanations of alternation learnability (Hayes & White, 2015; 
Kenstowicz, 1996; Moreton & Pater, 2012a; Steriade, 2001/2009) propose that greater 
perceptual similarity between alternants corresponds with greater ease of acquisition. 
Previous work by Guion (1998) shows that [ki] and [tʃi] are more confusable than are [gi] 
and [dʒi], which implies that [k] will be palatalized more than [g] in all conditions.  
 
Under the Perseveration Hypothesis, we expect support for H1, H2, and H4. 
Categorization based on featural simplicity (Moreton & Pater, 2012a; §1.6.4) would 
support H2 and H3. Perceptual explanations (Kenstowicz, 1996; Steriade, 2001/2009; 
White, 2013, 2014; §1.6.3) would support H1, H2, H3, and H5. We further hypothesize 
that, when comparing effects of alternation magnitude on the segments participants were 
trained to change as well as those they were trained not to change, we will see 
over-extension of palatalization in judgment (because judgment is more tolerant, Kempen & 
Harbusch, 2005; and because listeners may have acquired product-oriented schemas that 
favor it, Kapatsinski, 2012, 2013) and underapplication leading to its eventual demise in 
production (Skoruppa et al., 2011; Stave et al., 2013). 
4.2. Methods 
4.2.1. Languages 
There were three training languages, consisting of either labial, alveolar, or velar 
palatalization. The stems were always C(C)VC and ended in an oral stop [b;p;d;t;g;k]. 
The plural suffix was -i (100% of the time for To-Be-Palatalized consonants, 50% of the 
time for Not-To-Be-Palatalized consonants) or -a (0% of the time for To-Be-Palatalized 
consonants, 50% of the time for Not-To-Be-Palatalized consonants). The To-Be-
Palatalized consonant became [tʃ] if voiceless and [dʒ] if voiced; we included both voiced 
and voiceless consonants to test for the influence of perceptual vs. articulatory similarity 
on the learnability of alternations (Hypothesis 5). Labial palatalization was saltatory, 
because palatals contain both [coronal] and [dorsal] features, which labials lack. Table 
4.1 shows the patterns in each of the languages. 
 
Table 4.1. Labial, Alveolar, and Velar Palatalization patterns presented to participants in 
Experiment 2. 
            Labial            Alveolar          Velar 
            Palatalization    Palatalization    Palatalization 
Singular    Plural            Plural            Plural 
…p          …tʃi              …{pi;pa}          …{pi;pa} 
…b          …dʒi              …{bi;ba}          …{bi;ba} 
…t          …{ti;ta}          …tʃi              …{ti;ta} 
…d          …{di;da}          …dʒi              …{di;da} 
…k          …{ki;ka}          …{ki;ka}          …tʃi 
…g          …{gi;ga}          …{gi;ga}          …dʒi 
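
The mapping in Table 4.1 can be restated as a small function. This is an illustrative sketch only, not the stimulus-generation code: the function and dictionary names are hypothetical, and the example singulars are used purely for demonstration.

```python
# Place of articulation and palatal counterpart for each stem-final stop.
PLACE = {"p": "labial", "b": "labial", "t": "alveolar",
         "d": "alveolar", "k": "velar", "g": "velar"}
PALATAL = {"p": "tʃ", "t": "tʃ", "k": "tʃ", "b": "dʒ", "d": "dʒ", "g": "dʒ"}


def licit_plurals(singular, language):
    """Return the plural(s) attested in training for a given language
    ('labial', 'alveolar', or 'velar' palatalization)."""
    stem, final = singular[:-1], singular[-1]
    if PLACE[final] == language:
        # To-Be-Palatalized consonant: always palatalized, always before -i.
        return [stem + PALATAL[final] + "i"]
    # Not-To-Be-Palatalized consonant: faithful, with -i or -a.
    return [stem + final + "i", stem + final + "a"]


print(licit_plurals("klip", "labial"))    # ['klitʃi']
print(licit_plurals("klip", "velar"))     # ['klipi', 'klipa']
print(licit_plurals("blud", "alveolar"))  # ['bludʒi']
```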
 
 
 
4.2.2. Participants 
A total of 107 undergraduates in psychology or linguistics classes at the University of 
Oregon were recruited through the Human Subject Pool and received partial course credit 
for participation. Eleven were excluded for producing plurals that did not correspond to 
patterns in the training.²⁰ After exclusions, there were 32 participants in the Alveolar 
Palatalization condition, 31 in Labial, and 33 in Velar. All participants were native 
English speakers with no speech, hearing, language, or learning disabilities. 
4.2.3. Materials 
4.2.3.1. Training 
For each language, there were 28 unique singulars, randomly paired with images from 
the Spore creature database. Each creature was shown at least once alone, and at least 
once as part of a group (the same image copied multiple times), with 74 tokens total 
(frequencies are shown in Appendix B).  
Images of the solo creatures were matched with singular form recordings, and the 
groups of creatures were paired with the corresponding plural form recordings (see 
Figure 3.1). The words were recorded by an adult male native American English speaker 
from Oregon. All pairings were in a random order, so corresponding singular-plural pairs 
were rarely adjacent. This trial order was chosen because random ordering encourages 
overgeneralization (which is necessary for evaluating Hypotheses 2 and 3), presumably 
by making it less obvious which singular-final consonants map onto the palatals [tʃ] and 
[dʒ] (see Chapter V). 
                                                
20 Including one memorable fellow who, when provided the singular [klip], produced [kliopætra]. 
Twelve of the 28 singulars ended in the To-Be-Palatalized consonants (half voiced, 
half voiceless), with the remaining 16 split evenly among the other place * voicing 
combinations. The complete stimulus lists are available in Appendix B. 
All participants received an equal amount of training, unlike in White (2013, 2014), 
where participants were trained to a criterion level of accuracy: Training stopped once a 
constant level of accuracy was reached and low performers were excluded. We believe 
that training to criterion obscures the bias against large changes by ensuring that all 
participants learn the alternation equally well. 
4.2.3.2. Production Test 
An additional 92 pairs of names and pictures were created. As in the training, the 
singular picture was paired with a recording of the singular form, but in the test, it was 
immediately followed by the corresponding plural picture, which had no recording. 
Subjects were instructed to produce the appropriate plural for the given singular, which 
was recorded for later coding. 36 of the 92 test singulars ended in the To-Be-Palatalized 
consonants (half voiced, half voiceless), with the other 56 split evenly across the other 
place × voicing combinations. The complete stimulus lists are available in Appendix B.
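The composition of the training and production-test lists described above can be checked with a short sketch. This is illustrative Python, not part of the experiment scripts (the experiment ran in E-Prime and the analyses in R); the function name `split_stimuli` and its parameter names are hypothetical, and the sketch assumes two non-target places and two voicing values, as in the design.

```python
def split_stimuli(total, n_to_be_palatalized, n_other_places=2, n_voices=2):
    """Return the number of items per design cell.

    Assumes (as stated in the text) that To-Be-Palatalized items are split
    evenly by voicing and that the remaining items are split evenly across
    the other place x voicing cells.
    """
    per_target_voice, rem = divmod(n_to_be_palatalized, n_voices)
    if rem:
        raise ValueError("To-Be-Palatalized items do not split evenly by voicing")
    n_other = total - n_to_be_palatalized
    per_other_cell, rem = divmod(n_other, n_other_places * n_voices)
    if rem:
        raise ValueError("remaining items do not split evenly across cells")
    return {
        "to_be_palatalized_per_voice": per_target_voice,
        "other_per_place_voice_cell": per_other_cell,
    }


# Training list: 28 singulars, 12 To-Be-Palatalized.
print(split_stimuli(28, 12))
# Production test: 92 singulars, 36 To-Be-Palatalized.
print(split_stimuli(92, 36))
```

Under these assumptions, both lists divide without remainder, which is what an even split requires.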
4.2.3.3. Judgment Test 
The judgment test followed production, since it exposed participants to forms that 
contradict the training. The stimuli were the same across all conditions, with 30 new 
singulars, divided equally across place × voicing combinations. Each singular had 4 
possible plurals, crossing whether it was palatalized and which suffix vowel was added. The singular 
picture with the singular recording was followed by the plural picture with one of the 
plural recordings, and all the pairs were randomly ordered. The complete stimulus list is 
available in Appendix A. 
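The crossing of palatalization and suffix vowel that yields the four candidate plurals can be sketched as follows. This is a hypothetical illustration rather than the script used to build the stimuli: the names `PALATAL`, `SUFFIXES`, and `candidate_plurals` are our own, and the sketch assumes that each singular is a plain string ending in a single stop symbol.

```python
from itertools import product

# Stop-to-palatal mapping from the training languages: voiceless stops map
# to [tʃ] and voiced stops to [dʒ].
PALATAL = {"p": "tʃ", "t": "tʃ", "k": "tʃ", "b": "dʒ", "d": "dʒ", "g": "dʒ"}
SUFFIXES = ("i", "a")


def candidate_plurals(singular):
    """Return (plural, palatalized, vowel) for the 2 x 2 crossing."""
    stem, final = singular[:-1], singular[-1]
    if final not in PALATAL:
        raise ValueError(f"singular must end in a stop, got {final!r}")
    candidates = []
    for palatalized, vowel in product((False, True), SUFFIXES):
        consonant = PALATAL[final] if palatalized else final
        candidates.append((stem + consonant + vowel, palatalized, vowel))
    return candidates


for plural, palatalized, vowel in candidate_plurals("blup"):
    print(plural, palatalized, vowel)
```

If every singular is presented with each of its four plurals, the 30 singulars yield 120 judgment trials.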
4.2.4. Procedure 
4.2.4.1. Training 
At the beginning of the experiment, subjects were informed that each word was either 
a singular or a plural and that they would be tested on remembering some of them. The 
recall test occurred after going through the complete list of training trials once; subjects 
were shown 14 tokens (7 pairs) from the training and were asked to produce the correct 
form for each picture. The recall test was included to ensure that participants were 
motivated to pay attention, and the results are not included in any analyses. Following the 
recall test, the training stimulus list was presented two more times. 
For the training trials, each picture was shown for 500 ms, followed by the spoken 
wordform referring to it. The picture stayed on screen until the offset of the spoken word. 
There was a 500 ms blank screen pause between trials. The recall trials consisted of a 
picture on the screen, and participants had 6 seconds to say the correct name (or terminate the 
trial by clicking the mouse or hitting the space bar). There was a 1 second blank screen pause 
between recall trials. 
The experiment was conducted on E-Prime 2.0 Professional (Psychology Software 
Tools, Pittsburgh, PA). Participants wore headphones and could hit the spacebar on a 
keyboard to advance to the next trial. 
4.2.4.2. Production test 
The participants were shown a novel singular form, paired with the picture of a novel 
solo creature, followed 300 ms later by the picture of the corresponding group of 
creatures. They were instructed to produce what they thought the correct plural was and 
had 3 seconds to speak before the trial ended. Trials were separated by a 1 second blank 
screen pause. The pairs were randomly ordered for every participant. The spoken 
responses were recorded onto the computer using a Sennheiser HMD 281 headset in an 
isolated room, and each plural was saved as a separate file for later coding by me or 
undergraduate RAs. (All of the RA codings were verified for accuracy, and where there 
was a conflict of opinion, I relied on my judgment or looked at the spectrogram if it was 
unclear.) 
We coded the identity of the stem-final consonant and word-final vowel. If the stem 
consonant was replaced by either palatal ([tʃ] or [dʒ], regardless of voicing of stem 
consonant), the plural was coded as palatalized. If the stem consonant was retained, it 
was coded as not palatalized. If the plural included both the stem consonant and a palatal, 
it was coded as not palatalized because the participant preserved the base (in other words, 
underapplied the change), but there were relatively few of these and the results are 
comparable with or without them, or if they are coded as palatalized. Rarely, participants 
replaced the stem consonant with another non-palatal consonant, which we excluded; any 
trials that included the English plural -s were also excluded, and any participants who 
produced a majority of such trials were excluded entirely (n = 5). 
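The coding rules above can be summarized in a small decision function. This is a sketch of the logic only, since the actual coding was done by ear and from spectrograms; the function name `code_production`, its arguments, and the string labels are hypothetical.

```python
PALATALS = {"tʃ", "dʒ"}


def code_production(stem_consonant, produced_consonants, has_english_s=False):
    """Code one production trial.

    stem_consonant: final consonant of the singular, e.g. "p".
    produced_consonants: consonant(s) produced in stem-final position of the
        plural, e.g. ["tʃ"] or ["p", "tʃ"] (hypothetical representation).
    has_english_s: True if the response contained the English plural -s.
    """
    if has_english_s:
        return "excluded"
    produced = set(produced_consonants)
    if stem_consonant in produced:
        # Retained stem consonant, with or without an added palatal:
        # the base is preserved, so the change is underapplied.
        return "not palatalized"
    if produced & PALATALS:
        # Either palatal counts, regardless of the stem consonant's voicing.
        return "palatalized"
    # Stem consonant replaced by some other non-palatal consonant.
    return "excluded"


print(code_production("p", ["tʃ"]))       # stem consonant replaced by a palatal
print(code_production("p", ["p", "tʃ"]))  # stem consonant kept alongside a palatal
print(code_production("p", ["k"]))        # replaced by another non-palatal
```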
4.2.4.3. Judgment test 
Like the production test, the judgment test consisted of novel singular-plural pairs. 
The singular form recording paired with the singular picture was followed by a 300 ms 
blank screen pause before the plural form recording played with the plural picture. The 
pairs were randomly ordered for every subject and were separated by a 1 second blank 
screen pause. Subjects were instructed to indicate using a button box whether they 
thought the plural was the right one for the singular. The button box had 5 buttons, but 
the results were strongly bimodal: 59% of responses were 1 or 5, 29% were 2 or 4, and 
12% were 3, so we transformed the scale into a binary dependent measure for easier 
comparison to the production test: 1s and 2s were coded as 0 and 4s and 5s as 1, with 3s 
excluded (since they seemed to indicate indecision). See Figure 4.1, below, for the 
distribution of ratings by training condition and final consonant place of articulation. 
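The recoding of the 5-point scale into a binary measure can be written as a short sketch. The function name `binarize_rating` is hypothetical, and the analyses reported here were run in R rather than Python.

```python
def binarize_rating(rating):
    """Map a 1-5 button-box rating onto the binary dependent measure.

    1 and 2 become 0 (rejected), 4 and 5 become 1 (accepted), and 3 is
    returned as None so that it can be dropped as an indecisive response.
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    if rating == 3:
        return None
    return 1 if rating > 3 else 0


ratings = [1, 5, 3, 2, 4]
binary = [b for b in map(binarize_rating, ratings) if b is not None]
print(binary)
```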
We considered using the binarized difference of ratings between the palatalized and 
non-palatalized plurals of the same singular with the same vowel (e.g. bup~bupi minus 
bup~butʃi), excluding trials that were rated equally. This would allow for a 
straightforward comparison to production (where we assume palatalization is produced if 
it is more acceptable than non-palatalization), but there were no effects of training on 
judgments of faithful plurals, so the results remain unchanged from the simpler absolute 
binary ratings: Since all faithful plurals are accepted equally often, the binarized 
difference measure reflects only differences in judgments of palatalized plurals, which 
can be captured just as well by absolute ratings, without the loss of trials where faithful 
and unfaithful forms were rated equally. We also preferred the absolute ratings because they 
allow us to examine the effects of condition on judgments of faithful and unfaithful 
plurals separately, which is necessary for evaluating Hypothesis 1. Prior studies (except 
for Stave et al., 2013) had participants choose between the faithful and unfaithful forms 
in production or forced-choice, which conflates preference for one form with dislike of 
another (e.g., if blup is the singular and blupi and blutʃi are the plurals, participants may 
choose blutʃi because they like it, or because they really dislike blupi). 
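For comparison, the difference measure that we considered and rejected can be sketched as below. The function name `binarized_difference` is hypothetical, and coding a preference for the palatalized plural as 1 is an assumption made for illustration; the text specifies only that ties were to be excluded.

```python
def binarized_difference(rating_palatalized, rating_faithful):
    """Compare the two plurals of one singular that share a suffix vowel.

    Returns 1 if the palatalized plural is rated higher, 0 if the faithful
    plural is rated higher, and None for ties (which would be excluded).
    """
    if rating_palatalized == rating_faithful:
        return None
    return 1 if rating_palatalized > rating_faithful else 0


# A participant who strongly dislikes the faithful form and one who strongly
# likes the palatalized form receive the same score, which is the conflation
# that the absolute ratings avoid.
print(binarized_difference(3, 1))
print(binarized_difference(5, 3))
```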
 
Figure 4.1. Distribution of ratings by training condition (left) and final consonant place 
of articulation (right). The black bars indicate the means by factor level; the dotted line 
indicates the overall mean. 
  
4.2.5. Measures 
We ran logistic (generalized linear) mixed-effects models with the lme4 package 
(version 1.1-21, Bates et al., 2015) in R (version 3.6.0, R Development Core Team, 
2019). Fixed effects were included for Training Condition (Labial, Alveolar, and Velar, 
contrast coded as noted), Plural Vowel (-i vs. -a), Test Place (labial, alveolar, and velar), 
TBP (To-Be-Palatalized vs. Not-To-Be-Palatalized, given training condition), Test Voice 
(voiced vs. voiceless), Test Type (production vs. judgment), and any significant 
interactions. To evaluate the magnitude of improvement after training, Training (no 
[baseline data from Experiment 1] vs. yes [data from Experiment 2]) was included as a 
fixed effect. We included random intercepts for Subjects and Bases, with the full random 
effect structure that allowed the model to converge (selecting from Plural Vowel, Test 
Place, TBP, Test Voice, and Test Type within Subjects, and Training Condition, Plural 
Vowel, and TBP within Bases; random effect structures are included in the footnotes). 
Likelihood-ratio tests on nested models were used to derive significance values. The 
BIC approximation to the Bayes Factor (Wagenmakers, 2007) was calculated when a 
contrast that was expected to be significant under a hypothesis was not significant in the 
model, as it allows us to directly test the degree of evidence supporting the null 
hypothesis. The tested models are in footnotes, and the full dataset and code are available 
at https://app.box.com/s/bd8jhx4g5m7bvlmxb8i4jgtjfjo2x111. 
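The BIC approximation converts a BIC difference between two nested models into a posterior probability for the null. The sketch below follows the formulation in Wagenmakers (2007), assuming equal prior probabilities for the two hypotheses and taking ΔBIC to be BIC(H1) minus BIC(H0), so that positive values favor the null. It is an illustration in Python (the function name `posterior_null` is hypothetical), not the R code distributed with the dataset.

```python
import math


def posterior_null(delta_bic):
    """Posterior probability of H0 given the data, P_BIC(H0 | D).

    delta_bic = BIC(H1) - BIC(H0). The Bayes factor in favor of the null is
    approximated as exp(delta_bic / 2); with equal priors, the posterior
    probability of the null is BF01 / (1 + BF01).
    """
    bf01 = math.exp(delta_bic / 2)
    return bf01 / (1 + bf01)


# BIC differences of the size reported in this chapter.
for delta in (2.78, 3.8, 6.9, 14.1):
    print(delta, round(posterior_null(delta), 3))
```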
4.3. Results 
4.3.1. Hypothesis 1: Labial palatalization is hard to learn because of faithfulness, not 
markedness 
It is possible that [t] and [k] are easier to change to [tʃ] before -i because [ti] and [ki] 
are worse (i.e. more marked) than [pi]. Judgments of faithful mappings are informative 
here; if the bias exists, then p~pi would be rated better than t~ti and k~ki (in the Labial, 
Alveolar, and Velar Palatalization conditions, respectively). But there is no such pattern21 
(Table 4.2), and in fact there is a slight non-significant trend in the unexpected direction 
for the Not-To-Be-Palatalized consonants (Table 4.3)22.  These results are shown in 
Figure 4.2. The data provide very strong evidence for the null (ΔBIC = 14.1, PBIC 
(H0 | D) = 0.999) and are contrary to the markedness explanation: The bias is against 
changes, not against certain output structures. 
 
 
                                                
21 RatingBin ~ Training Condition + Test Voice + Plural Vowel + (1 + Test Voice + Plural Vowel | 
Subject) + (1 + Plural Vowel | Base), Helmert contrast coded for Labial vs. Alveolar and Velar training and 
Alveolar vs. Velar training, restricted to ratings of incorrect faithful plurals. 
 
22 RatingBin ~ Training Condition + Test Voice + (1 + Test Voice | Subject) + (1 + Training Condition | 
Base), Helmert contrast coded for Labial vs. Alveolar and Velar training and Alveolar vs. Velar training, 
restricted to ratings of correct faithful plurals. 
77  
  
Table 4.2. Judgments of incorrect faithful mappings for To-Be-Palatalized consonants 
across training conditions. The inclusion of Training Condition does not significantly 
improve the fit of the model, χ2(2) = 0.45, p = 0.80, ns.  
                                          b         se(b)     z        p
(Intercept)                               -0.48295  0.22952   -2.104   0.0354   *
Labial vs. Alveolar and Velar Training    -0.08744  0.36176   -0.242   0.809
Alveolar vs. Velar Training               -0.27456  0.42757   -0.642   0.5208
Voiceless                                 -0.05531  0.15829   -0.349   0.7268
-a                                         0.35555  0.20704    1.717   0.0859   .
. Marginally significant 
* Significance level of 0.05 
 
Table 4.3. Judgments of correct faithful mappings for Not-To-Be-Palatalized consonants 
across training conditions; Training Condition does not significantly improve the fit of 
the model, χ2(2) = 0.098, p = 0.95, ns. 
                                          b         se(b)     z        p
(Intercept)                                0.29886  0.13278    2.251   0.0244   *
Labial vs. Alveolar and Velar Training    -0.07223  0.24345   -0.297   0.7667
Alveolar vs. Velar Training               -0.03532  0.28153   -0.126   0.9002
Voiceless                                  0.1244   0.10086    1.233   0.2174
* Significance level of 0.05 
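The p-values attached to the model comparisons in the captions of Tables 4.2 and 4.3 follow from the chi-square distribution of the likelihood-ratio statistic. As a minimal sketch, the closed forms for 1 and 2 degrees of freedom can be computed with the Python standard library alone. The function name `lrt_p_value` is hypothetical; the reported analyses used model comparison in R.

```python
import math


def lrt_p_value(chi2, df):
    """Upper-tail probability of a chi-square statistic for df = 1 or 2.

    For df = 2 the survival function is exp(-x / 2); for df = 1 it is
    erfc(sqrt(x / 2)). Other degrees of freedom are not handled here.
    """
    if chi2 < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise NotImplementedError("only df = 1 and df = 2 are sketched here")


# Training Condition in Tables 4.2 and 4.3 (two contrasts, so df = 2).
print(round(lrt_p_value(0.45, 2), 2))
print(round(lrt_p_value(0.098, 2), 2))
```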
 
 
Figure 4.2. Judgments of faithful plurals across conditions. Left: judgments of incorrect 
faithful To-Be-Palatalized stems. Right: judgments of correct faithful Not-To-Be-
Palatalized stems. 
78  
  
4.3.2. Hypothesis 2: Large alternations, including saltation, are hard to produce 
Following Hypothesis 2, To-Be-Palatalized consonants should be palatalized less 
often when they require a larger change, which here is in the Labial Palatalization 
condition. There are large differences between the Labial and the lingual (Alveolar/Velar) 
conditions, as expected: To-Be-Palatalized consonants are palatalized significantly less 
often after Labial Palatalization training than after Alveolar and Velar Palatalization 
training23 (Table 4.4, Figure 4.3). There is no effect of or interaction with voicing, and the 
inclusion of Training Condition significantly improves model fit, χ2(2) = 29.47, p < 
0.001. Participants in the Velar Palatalization condition learn to palatalize velars, and 
those in the Alveolar Palatalization condition learn to palatalize alveolars, but there is no 
difference in palatalization rates of labials vs. the linguals for participants trained on 
Labial Palatalization (and in fact, they actually palatalize the To-Be-Palatalized 
consonants slightly less than the Not-To-Be-Palatalized consonants after Labial 
Palatalization training). That the larger labial palatalization change is applied to the to-be-
changed segments less often compared to smaller changes mirrors the results of Skoruppa 
et al. (2011) and is contra White (2013). 
                                                
23 Keep Place ~ Training Condition + Test Voice + (1 + Test Voice | Subject) + (1 | Base); Helmert coded 
for Labial vs. Alveolar and Velar and Alveolar vs. Velar Palatalization training conditions, restricted to 
productions of To-Be-Palatalized consonants before -i. 
 
 
Figure 4.3. Palatalization rates before -i in production, by training condition. The bars 
represent the rate of palatalization in production of To-Be-Palatalized (light) and Not-To-
Be-Palatalized (dark) consonants, grouped by training language, which determines    
(Not-)To-Be-Palatalized status of consonants.  
 
 
Labials should also be erroneously palatalized less often than alveolars and velars, 
which is true and can be seen in Figure 4.4, where overgeneralization of palatalization to 
labials is shown by the dark bars and overgeneralization to linguals by the light bars. 
Table 4.5 shows that training on Labial Palatalization overgeneralizes more to alveolar-
final stems than Alveolar Palatalization training overgeneralizes to labial-final stems24. 
Test Place significantly improves the fit of the model, χ2(1) = 13.13, p < 0.001. Table 4.6 
shows that velars are palatalized more after Labial Palatalization training than are labials 
after Velar Palatalization training25, with Test Place significantly improving the model fit, 
χ2(1) = 7.13, p = 0.008. In other words, labial palatalization is less likely to be produced 
                                                
24 Keep Place ~ Test Voice + Test Place + (1 + Test Voice | Subject) + (1 | Base), restricted to Alveolar 
training palatalization of labials and Labial training palatalization of alveolars. 
 
25 Keep Place ~ Test Voice + Test Place + (1 + Test Voice | Subject) + (1 | Base), restricted to Velar 
training palatalization of labials and Labial training palatalization of velars. 
than alveolar or velar palatalization, whether or not participants are exposed to it in 
training.  
 
Table 4.4. The effect of Training Language on (erroneous) retention rates of To-Be-
Palatalized consonants in production, before -i. Negative regression coefficients indicate 
higher rates of palatalization (less retention of the base consonant), which in this case 
means higher accuracy. 
                                          b         se(b)     z        p
(Intercept)                                0.01121  0.36582    0.031   .976
Labial vs. Alveolar and Velar Training     4.17856  0.80836    5.169   <.00001  ***
Alveolar vs. Velar Training               -1.20023  0.86301   -1.391   .164     a
Voiceless                                  0.12622  0.26445    0.477   .633
*** Significance level of 0.001 
a According to the BIC approximation to the Bayes Factor, the results provide positive evidence for the null 
(ΔBIC = 3.8, PBIC (H0 | D) = 0.87). 
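The z and p columns in Table 4.4 are Wald statistics: each coefficient divided by its standard error, with a two-sided p-value from the standard normal distribution. The sketch below reproduces that arithmetic in Python for illustration; the function name `wald_test` is hypothetical, and the values themselves come from the fitted lme4 model.

```python
import math


def wald_test(b, se):
    """Return (z, two-sided p) for a coefficient and its standard error."""
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


rows = {
    "(Intercept)": (0.01121, 0.36582),
    "Labial vs. Alveolar and Velar Training": (4.17856, 0.80836),
    "Alveolar vs. Velar Training": (-1.20023, 0.86301),
}
for label, (b, se) in rows.items():
    z, p = wald_test(b, se)
    print(f"{label}: z = {z:.3f}, p = {p:.3g}")
```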
 
 
Figure 4.4. Palatalization rates of Not-To-Be-Palatalized consonants by stem-final 
consonant place of articulation and training condition. Left panel: Overgeneralization of 
alveolar palatalization to labials (light) vs. overgeneralization of labial palatalization to 
alveolars (dark). Right panel: Overgeneralization of velar palatalization to labials (light) 
vs. overgeneralization of labial palatalization to velars (dark).  
 
 
Table 4.5. Overgeneralization of palatalization from alveolars to labials and labials to 
alveolars. 
                  b        se(b)    z        p
(Intercept)        4.2555  0.9343    4.555   <.00001  ***
Voiceless          0.7503  0.7457    1.006   .31432
Alveolar Stem     -3.3044  1.0073   -3.28    .00104   **
** Significance level of 0.01 
*** Significance level of 0.001 
 
Table 4.6. Overgeneralization of palatalization from velars to labials and labials to 
velars. 
                  b        se(b)    z        p
(Intercept)        3.8124  0.784     4.863   <.00001  ***
Voiceless          0.9275  0.7749    1.197   .23131
Velar Stem        -2.3129  0.8975   -2.577   .00996   **
** Significance level of 0.01 
*** Significance level of 0.001 
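To give a sense of the size of the effects in Tables 4.5 and 4.6, the log-odds coefficients can be converted to probabilities with the inverse logit. The sketch below rests on assumptions that the tables do not state: that the predictors are treatment coded with voiced labial-final stems as the reference level, that the dependent variable is Keep Place (1 = stem consonant retained), and that random effects are set to zero. The resulting numbers are therefore rough illustrations, not reported results.

```python
import math


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-x))


# Table 4.5, under the assumptions named above.
keep_labial = inv_logit(4.2555)             # labial stems after Alveolar training
keep_alveolar = inv_logit(4.2555 - 3.3044)  # alveolar stems after Labial training

# Table 4.6, under the same assumptions.
keep_labial_v = inv_logit(3.8124)           # labial stems after Velar training
keep_velar = inv_logit(3.8124 - 2.3129)     # velar stems after Labial training

for label, keep in [
    ("labial (Table 4.5)", keep_labial),
    ("alveolar (Table 4.5)", keep_alveolar),
    ("labial (Table 4.6)", keep_labial_v),
    ("velar (Table 4.6)", keep_velar),
]:
    print(f"{label}: retained {keep:.3f}, palatalized {1 - keep:.3f}")
```

On these assumptions, voiced labials are almost never erroneously palatalized, whereas voiced alveolars are palatalized in roughly a quarter of trials after Labial training.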
 
4.3.3. Hypothesis 3: Saltatory alternations are likely to be overgeneralized 
According to Hypothesis 3, alveolars and velars should be palatalized more by 
participants in the Labial Palatalization condition than by participants in the Velar and 
Alveolar Palatalization conditions, respectively. Since palatals are [coronal] and [dorsal] 
(Yun, 2006), labials ([labial]) need to jump over [coronal] and/or [dorsal], whereas 
alveolars ([coronal]) and velars ([dorsal]) have a direct route without intermediate 
segments. In other words, alveolar palatalization doesn’t need to imply velar 
palatalization, and vice versa.  
In production, participants in the Labial and Velar Palatalization conditions palatalize 
alveolar-final stems equally frequently26 (Figure 4.5, left panel; b = -0.46, se(b) = 0.68, z 
= -0.67, p = 0.50), as do Labial and Alveolar Palatalization participants for velar-final 
                                                
26 Keep Place ~ Training Condition + Test Voice + Plural Vowel + (1 + Test Voice + Plural Vowel | 
Subject) + (1 + Training Condition + Plural Vowel | Base), restricted to Velar and Labial training 
palatalization of alveolars. 
stem palatalization27 (Figure 4.5, right panel; b = -0.46, se(b) = 0.67, z = -0.69, p = 0.49). 
The BIC approximation to the Bayes Factor provides strong support for the null 
hypothesis in both cases (ΔBIC = 6.9, PBIC (H0 | D) = 0.97 and ΔBIC = 7, PBIC (H0 | D) = 
0.97, respectively). The study is sufficiently powerful to provide positive evidence in 
favor of the hypothesis that a saltatory change is no more likely to overgeneralize than a 
non-saltatory change (contra White, 2013, 2014).  
 
 
Figure 4.5. Overgeneralization of palatalization depending on magnitude. 
Overgeneralization of labial palatalization is shown by the light bars, and 
overgeneralization of lingual palatalization by the dark bars.  
 
In judgment, there is no difference in judgments of palatalized velars for participants 
in the Labial and Alveolar Palatalization conditions (Figure 4.6), both across vowel 
                                                
27 Keep Place ~ Training Condition + Test Voice + Plural Vowel + (1 + Test Voice + Plural Vowel | 
Subject) + (1 + Training Condition + Plural Vowel | Base), restricted to Alveolar and Labial training 
palatalization of velars. 
contexts28 (b = -0.02, se(b) = 0.31, z = -0.07, p = 0.94, ns) and before -i, the palatalizing 
suffix29 (b = -0.40, se(b) = 0.44, z = -0.91, p = 0.36, ns). The data provide strong evidence 
for the null across suffixes (ΔBIC = 6.8, PBIC (H0 | D) = 0.97) and positive evidence 
before -i (ΔBIC = 5.4, PBIC (H0 | D) = 0.94). 
 
 
Figure 4.6. Acceptance of overgeneralization of palatalization to velars. 
 
There is also no difference in the rate of acceptance for palatalized alveolars after 
Velar and Labial Palatalization training (Figure 4.7) across vowel suffixes30 (b = -0.63, 
se(b) = 0.39, z = -1.59, p = 0.11, ns), and according to the BIC approximation to the 
Bayes Factor, the results provide positive evidence for the null hypothesis (ΔBIC = 4.4, 
                                                
28 RatingBin ~ Training Condition + Test Voice + Plural Vowel + (1 + Test Voice + Plural Vowel | 
Subject) + (1 + Training Condition | Base), restricted to Labial and Alveolar Palatalization training ratings 
of palatalized velars. 
 
29 RatingBin ~ Training Condition + Test Voice + (1 + Test Voice | Subject) + (1 + Training Condition | 
Base), restricted to Labial and Alveolar Palatalization training ratings of palatalized velars before –i. 
 
30 RatingBin ~ Training Condition + Test Voice + Plural Vowel + (1 + Test Voice | Subject) + (1 + 
Training Condition + Plural Vowel | Base), restricted to Labial and Velar Palatalization training ratings of 
palatalized alveolars. 
PBIC (H0 | D) = 0.90). However, before -i, Velar Palatalization participants give 
marginally lower ratings of palatalized alveolars than Labial Palatalization participants31 
(b = -0.89, se(b) = 0.47, z = -1.86, p = 0.06; Trained Place marginally improves model fit, 
χ2(1) = 3.40, p = 0.06, though according to the BIC approximation to the Bayes factor, 
the results still provide positive evidence for the null hypothesis, ΔBIC = 2.78, PBIC (H0 | 
D) = 0.80). 
 
 
Figure 4.7. Acceptance of overgeneralization of palatalization to alveolars, across 
suffixes (left panel) and before -i (right panel). 
 
There is little evidence for saltation overgeneralizing, even in judgment, contrary to 
Hypothesis 3, and the differences that do appear are quite small compared to the very large 
differences in production (Figure 4.3). 
4.3.4. Hypothesis 4: Large changes are hard to produce, even if they are judged to 
be preferable 
                                                
31 RatingBin ~ Training Condition + Test Voice + (1 + Test Voice | Subject) + (1 + Training Condition | 
Base), restricted to Labial and Velar Palatalization training ratings of palatalized alveolars before –i. 
The bias against labial palatalization is expected to be stronger in production than 
judgment. Even if judgment is based on a model of production, participants are provided 
with a product form in the judgment task, minimizing differences in activation between 
forms that are associated with the source form to different degrees (see Harmon & 
Kapatsinski, 2017; Luce & Pisoni, 1998, for the same difference arising in open-set vs. 
closed-set tasks). Product-oriented schemas may also be sufficient to judge the product of 
palatalization to be more acceptable than an unpalatalized form even when they are not 
strong enough to overcome paradigmatic perseveration in production (Kapatsinski, 
2013); if participants learn that plurals often end in [tʃi], they may consider all plurals 
ending in [tʃi] well-formed, regardless of the input segment or whether they would 
produce it themselves. Judgments of faithful plurals are the same across conditions 
(Figure 4.2), so any differences in judgment between conditions must be driven by 
ratings of palatalized forms (Figure 4.8; see Figure 4.2 for the faithful plurals). 
In Table 4.7, the dependent variable was coded as 0 if the trial was non-palatalized in 
production and the non-palatalized form had a rating under 3 in judgment, and 1 if the 
trial was palatalized in production and the palatalized form was rated over 3 in judgment. 
We can see the dissociation between production and judgment after Labial Palatalization 
training (left bars of Figures 4.8 and 4.2; see also Figure 4.10): Labial palatalization is 
accepted, but rarely produced. The interaction between Labial Training and Test is 
significant in Table 4.732, and it significantly improves model fit (χ2(1) = 21.94, p < 
0.001). This interaction is not present for Velar or Alveolar Palatalization training, where 
subjects produce palatalization as often as they accept it. 
                                                
32 Dependent Variable ~ Test Voice + Labial Training * Test + (1 + Test Voice + Test | Subject) + (1 | 
Base), restricted to To-Be-Palatalized trials in production and judgments of palatalized forms, before –i. 
 
Figure 4.8. Acceptance of palatalization in judgment; take special notice of the high rate 
of acceptance of labial palatalization. 
 
Table 4.7. The effects of training on Labial vs. Alveolar and Velar Palatalization on 
judgment vs. production of palatalized forms before -i. 
                                   b        se(b)    z        p
(Intercept)                        -1.3751  0.4113   -3.344   0.000827  ***
Voiceless                           0.218   0.2077    1.049   0.294075
Labial Training                     4.0657  0.7809    5.207   1.92E-07  ***
Judgment Test                      -0.3777  0.4064   -0.929   0.352711
Labial Training x Judgment Test    -3.6816  0.786    -4.684   2.81E-06  ***
*** Significance level of 0.001 
 
It is not just that judgments are more lenient than production (Kempen & Harbusch, 
2005). Figure 4.9 shows that judgments of faithful p~pi (left bars) are lower for Labial 
Palatalization participants than palatalized p~tʃi (right bars), and Keep Place significantly 
improves model fit, both before -i33 (χ2(1) = 11.86, p < 0.001) and across suffix vowels34 
                                                
33 RatingBin ~ Test Voice + Keep Place + (1 + Keep Place + Test Voice | Subject) + (1 + Keep Place | 
Base), restricted to Labial Palatalization training ratings of To-Be-Palatalized labial stems before –i. 
 
34 RatingBin ~ Test Voice + Keep Place + Plural Vowel + (1 + Keep Place + Test Voice + Plural Vowel | 
Subject) + (1 + Keep Place | Singular), restricted to Labial Palatalization training ratings of To-Be-
Palatalized labial stems. 
(χ2(1) = 4.53, p = 0.03). Palatalization is preferred to non-palatalization in the judgment 
task after training, but it nonetheless fails to be produced. In other words, the mapping 
that is preferred in production is dispreferred in judgment. 
 
 
Figure 4.9. Judgments of To-Be-Palatalized plurals by faithfulness, before -i. 
 
These effects are reflected in the 3-way interaction in Table 4.8, shown in Figure 
4.1035. After Alveolar and Velar Palatalization training, palatalization of To-Be-
Palatalized consonants is produced about as often as it is accepted (left panel), but 
palatalization of Not-To-Be-Palatalized consonants is still accepted more than half the 
time, where it is seldom produced (right panel). Additionally, To-Be-Palatalized velars 
are palatalized more than To-Be-Palatalized labials, but correct palatalization of velars is 
accepted as much as correct palatalization of labials.  
                                                
35 Dependent Variable ~ Test Voice + Labial Training * Test * To-Be-Palatalized Place + Plural Vowel + 
(1 + Test * To-Be-Palatalized Place | Subject) + (1 + Labial Training + To-Be-Palatalized Place | Base).  
 
Figure 4.10. Comparison of the rate of palatalization in production (light bars) to the rate 
of acceptance of palatalized plurals in judgment (dark bars), by training language. Left 
panel: To-Be-Palatalized stems, before -i. Right panel: Not-To-Be-Palatalized stems, 
before -i. 
 
Table 4.8. The effects of training on Labial vs. Lingual palatalization on correct vs. 
erroneous palatalization and judgments of correct vs. erroneous palatalization. Bolded 
rows show effects of interest. Palatalization is accepted more often than produced but 
especially so after labial training. Labial training reduces or reverses the difference in 
palatalization rates and judgments of palatalization between To-Be-Palatalized and Not-
To-Be-Palatalized consonants. However, this reduction is smaller in judgments than in 
production. 
                                                               b         se(b)     z        p
(Intercept)                                                     0.16494  0.24954    0.661   0.509
Voiceless                                                       0.06059  0.0944     0.642   0.521
Labial Training                                                 3.03852  0.58809    5.167   2.38E-07  ***
Rating Test                                                    -1.74     0.23899   -7.281   3.33E-13  ***
Not-To-Be-Palatalized Place                                     1.85559  0.25219    7.358   1.87E-13  ***
-a                                                              2.23907  0.12595   17.778   2.00E-16  ***
Labial Training x Rating Test                                  -2.81071  0.59131   -4.753   2.00E-06  ***
Labial Training x Not-To-Be-Palatalized Place                  -3.38016  0.42421   -7.968   1.61E-15  ***
Not-To-Be-Palatalized Place x Rating Test                      -1.44839  0.2256    -6.42    1.36E-10  ***
Labial Training x Rating Test x Not-To-Be-Palatalized Place     3.4364   0.4113     8.355   2.00E-16  ***
*** Significance level of 0.001 
The production data suggest that Labial Palatalization participants fail to learn the 
paradigmatic association between labials and alveopalatals, and are therefore unable to 
produce labial palatalization, but the judgment data show that they still learn that 
palatalization is better than lack thereof. Prior work has shown that participants in similar 
experiments acquire first-order schemas (like “plurals end in [tʃi]”; Kapatsinski, 2012, 
2013), which can explain the preference for palatalized over faithful forms, even after 
Labial training. In fact, the first-order schema might even be stronger after Labial 
training. Because these participants have failed to acquire an association between labials and 
alveopalatals, every time [tʃi] occurs it is surprising, which may help them notice that plurals 
often end in [tʃi], strengthening the first-order schema. By definition, a first-order schema 
applies to all inputs equally, and this is all that participants in the Labial condition can 
rely on because they have failed to learn what should be changing (i.e., to acquire the 
paradigmatic mapping p→tʃi). Alveolar and Velar Palatalization participants do acquire 
the relevant paradigmatic mappings, as evidenced in their higher rates of production of 
palatalization for the To-Be-Palatalized consonants (see Figure 4.3), and they can use 
those to drive their judgments, which accounts for the higher ratings of correct 
palatalization over incorrect in those conditions (see Figure 4.8). However, there is still 
evidence that they acquire first-order schemas, as well: Faithful plurals of Not-To-Be-
Palatalized stems are rated lower than palatalized plurals before -i36 (b = -0.59, se(b) = 
0.21, z = -2.85, p = 0.004; Keep Place significantly improves model fit, χ2(1) = 7.61, p < 
0.006), even after Alveolar and Velar Palatalization training, as shown in Figure 4.11. 
Labial Palatalization training results in higher ratings for Not-To-Be-Palatalized stems, 
                                                
36 Rating Bin ~ Training Language + Keep Place + Test Voice + (1 + Keep Place + Test Voice | Subject) + 
(1 | Base), restricted to judgments of Not-To-Be-Palatalized plurals before -i. 
overall (b = 0.68, se(b) = 0.29, z = 2.37, p < 0.02), and the inclusion of Training 
Language significantly improves model fit (χ2(2) = 6.89, p = 0.03), but all training 
languages result in preference for the unfaithful plural over the faithful. The preference of 
participants in the lingual conditions for palatalizing the correct consonants over the 
incorrect consonants can be attributed to paradigmatic mappings, but their preference for 
palatalizing the incorrect consonants over not palatalizing them must be due to product-
oriented schemas favoring plurals ending in [tʃi] and [dʒi]: Changes that are rarely 
produced are still accepted, presumably because the resulting structure has become 
associated with the intended meaning (Kapatsinski, 2013). 
 
 
Figure 4.11. Judgments of Not-To-Be-Palatalized plurals by faithfulness, before -i. 
 
 
4.3.5. Hypothesis 5: The bias against labial palatalization is due to perceptual 
dissimilarity 
Hypothesis 5 claims that the bias against labial palatalization is perceptual, because 
[pi] and [tʃi] are acoustically very dissimilar. By extension, [k] should be palatalized 
more than [g], since [ki] is more confusable with [tʃi] than [gi] is with [dʒi] (Guion, 
1998).  
Comparing palatalization rates of voiced and voiceless velars by training condition 
(Figure 4.12, left panel) reveals that [k] is significantly less likely to be palatalized than 
[g]37 (b = 0.77, se(b) = 0.28, z = 2.75, p = 0.006; Test Voice significantly improves model 
fit, χ2(1) = 7.43, p = 0.006). In judgment (Figure 4.12, right panel), palatalized plurals of 
[k]-final stems are liked marginally less than palatalized plurals of [g]-final stems38 (b =  
-0.59, se(b) = 0.33, z = -1.80, p = 0.07; Test Voice marginally improves model fit, χ2(1) = 
3.31, p < 0.07). 
 
 
Figure 4.12. Rates of palatalization in production (left panel) and acceptance of 
palatalization in judgment (right panel) of velars before -i by voicing and training 
condition. 
 
                                                
37 Keep Place ~ Training Condition + Test Voice + (1 + Test Voice | Subject) + (1 | Base), restricted to 
productions of velar-final stems before –i.  
 
38 RatingBin ~ Training Condition + Test Voice + (1 + Test Voice | Subject) + (1 + Training Condition | 
Singular), restricted to ratings of palatalized velar-final stems before –i. 
Combined, the results show that, contrary to the perceptual similarity hypothesis, the 
less similar alternants g~dʒ are preferable to the more similar k~tʃ. This is likely due to 
categorization, since in English orthography, [g] and [dʒ] can both be written with <g> 
(Gontijo et al., 2003), whereas [k] and [tʃ] have minimal orthographic overlap. This result 
mirrors Wilson’s (2006) findings. We would expect that languages with different 
phonology-orthography mappings would result in different learning patterns. For 
example, in Italian, [g] and [dʒ] can both be written as <g>, and [k] and [tʃ] can both be 
written as <c> (Proudfoot & Cardo, 2005). To the extent that orthographic 
overlap determines alternation learnability (aside from other factors like frequency), 
Italian orthography favors equivalent learning of k~tʃ and g~dʒ. Turkish orthography 
writes [k], [g], [tʃ], and [dʒ] with distinct characters (<k>, <g>, <ç>, and <c>, respectively; 
Underhill, 1976). We would still expect k~tʃ and g~dʒ to be learned equally well, but 
because Turkish orthography does not group the stop and corresponding alveopalatal 
together, Turkish speaker-writers should show overall lower rates of alternation 
acquisition compared to Italian. Future work is needed to evaluate this hypothesis. For 
the time being, our results suggest that whatever effect perceptual similarity may have on 
the learnability of alternations, it is insufficient to overcome other categorization effects. 
By extension, the perceptual dissimilarity between [p] and [tʃi] should have minimal 
effect on the learnability of the p~tʃi alternation, for which other explanations must be 
explored – such as the proposed articulatory effects.  
4.3.6. Effect of training 
In order to ensure that the patterns found are due to learning and not pre-existing 
biases, we compare the judgments from Experiment 1 (Chapter III) to the judgment data 
from Experiment 2. In particular, we compare the acceptance of palatalization (in the 
judgment task) of To-Be-Palatalized plurals before -i after training to the acceptance of 
palatalization in the baseline experiment, with Training as the new factor. The dependent 
variable was coded as 0 if the palatalized form had a rating under 3 and 1 if the 
palatalized form was rated over 3 in judgment.   
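The coding rule just described can be sketched as follows. This is only an illustration of the binarization, not the script used in the analysis: the function name and the example ratings are hypothetical, and because the text does not state how a rating of exactly 3 was treated, the sketch returns None for it rather than guessing.

```python
def code_acceptance(rating):
    """Binarize a judgment rating of a palatalized plural.

    Returns 0 for ratings under 3 and 1 for ratings over 3, as described
    in the text. A rating of exactly 3 returns None, because its handling
    is not specified (an assumption of this sketch).
    """
    if rating < 3:
        return 0
    if rating > 3:
        return 1
    return None


# Made-up ratings, for illustration only
example_ratings = [1, 2, 3, 4, 5]
print([code_acceptance(r) for r in example_ratings])
```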
For the judgment test, subjects who received training judge palatalized plurals as 
being better than subjects without training39 (b = -1.97, se(b) = 0.32, z = -6.14, p < 
0.001), and the inclusion of Training significantly improves model fit (χ2(1) = 29.33, p < 
0.001). This can be seen in the comparison of the light bars (no training) and dark bars 
(judgment test) in Figure 4.13, where the dark bars are all higher than the corresponding 
light bars. However, there is no interaction between Test Place and Training40 (labials vs. 
linguals, b = -0.11, se(b) = 0.58, z = -0.19, p = 0.85, ns; alveolars vs. velars, b = -0.12, 
se(b) = 0.50, z = -0.25, p = 0.80, ns), indicating that all conditions learn an equal amount 
about the acceptability of palatalizing what should be palatalized; the difference between 
the dark bars and light bars is roughly uniform across all places. According to the BIC 
approximation to the Bayes Factor, the data provide very strong support for the null 
(ΔBIC = 13.9, PBIC (H0 | D) = 0.999). 
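The posterior probability reported above can be recovered from ΔBIC with a few lines of arithmetic. The following is a minimal sketch, assuming the common form of the approximation, BF01 ≈ exp(ΔBIC / 2), with ΔBIC taken as BIC(H1) − BIC(H0) and equal prior odds for the two models. The function name is illustrative and is not taken from the analysis scripts.

```python
import math


def posterior_null_from_delta_bic(delta_bic):
    """Approximate P(H0 | D) from delta_bic = BIC(H1) - BIC(H0).

    Assumes equal prior odds for H0 and H1 (an assumption of this sketch).
    """
    bf01 = math.exp(delta_bic / 2.0)  # approximate Bayes Factor favoring the null
    return bf01 / (1.0 + bf01)


# Delta BIC reported in the text for the Training x Test Place interaction
p_h0 = posterior_null_from_delta_bic(13.9)
print(round(p_h0, 3))
```

Under these assumptions, a ΔBIC of 0 gives a posterior of 0.5, and larger positive values move the posterior toward 1.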
 
                                                
39 Dependent Variable ~ Training + Test Place + Test Voice + (1 + Test Voice + Test Place | Subject) + (1 
+ Training | Singular), restricted to ratings before –i of all plurals from No Training condition and only To-
Be-Palatalized plurals from the Experiment 2 training data. 
 
40 Dependent Variable ~ Training * Test Place + Test Voice + (1 + Test Voice + Test Place | Subject) + (1 
+ Training | Base), restricted to ratings before –i of all plurals from No Training condition and only To-Be-
Palatalized plurals from the Experiment 2 training data. 
 
Figure 4.13. Acceptance of correct palatalized plurals, before -i. The light bars represent 
the acceptance rate of palatalized plurals before -i at each of the places of articulation in 
the baseline judgment condition, without training. The dark bars indicate the acceptance 
rate of correct (To-Be-Palatalized) palatalized plurals before -i at each of the places of 
articulation (which here are also the training conditions, i.e. Labial participants’ 
judgments of correct palatalized labial-final stems) in the judgment test after training.  
 
If the strength of associations between production representations at least partly 
determines whether an alternation will be produced, as proposed by the Perseveration 
Hypothesis, then the lack of production of Labial Palatalization would suggest that the 
paradigmatic associations between labials and alveopalatals have not been formed. What, 
then, explains the increase in ratings of labial palatalization after training? We propose 
that this increase in judgments is based on the acquisition of product-oriented schemas 
(Kapatsinski, 2012, 2013), so the improvement in judgments after training reflects that 
participants have learned that [tʃi] and [dʒi] are good indicators of plurality. The fact that 
judgments change as a result of training indicates that even Labial Palatalization 
participants learn first-order schemas, but these schemas are insufficient to overcome the 
production-internal bias against changing labials into alveopalatals41. 
4.4. Discussion 
Judgments of faithful plurals, for both To-Be-Palatalized and Not-To-Be-Palatalized 
stems, are equal for all training languages. The results support Hypothesis 1: Alveolars 
and velars are palatalized more than labials not because [ti] and [ki] are poorly formed 
and palatalizing them yields a less marked output, but because the change from [p] to [tʃ] is 
undesirable (see also §4.4.1.1.1). Labials are correctly palatalized less than alveolars and 
velars are, and are also palatalized in error less than alveolars and velars are, supporting 
Hypothesis 2. Labial palatalization is not more likely to be overgeneralized in general, 
however; while Labial Palatalization generalizes to alveolars and velars more than 
Alveolar and Velar Palatalization generalizes to labials, it does not generalize more to the 
linguals than the linguals do to each other, so Hypothesis 3 is not supported. While Labial 
Palatalization participants rarely palatalize in production, they accept palatalized labials 
more than faithful labials in judgment. Even though they know what they should produce, 
they are unable to do so, just as Hypothesis 4 predicts. Contrary to the perceptual 
similarity hypothesis in Hypothesis 5, palatalization of [k] is not produced or accepted 
more than palatalization of [g], even though [ki] is perceptually and acoustically more 
similar to [tʃi] than [gi] is to [dʒi] (Guion, 1998). In fact, palatalization of [g] is slightly 
preferred to palatalization of [k], as was the case in Wilson (2006), likely due to English 
orthography. 
                                                
41 *Map constraints (Zuraw, 2007) could capture the bias against labial palatalization by ranking *Map(p,tʃ) 
/ *Map(b,dʒ) higher than *Map(t,tʃ) / *Map(d,dʒ) and *Map(k,tʃ) / *Map(g,dʒ). White (2013) uses *Map 
constraints based on the P-map (Steriade, 2001/2009) to capture saltation, whereas we propose that 
articulatory similarity is the relevant metric. 
Despite the low rate of production palatalization after Labial Palatalization training, 
judgments of labial palatalization increase (compared to the baseline) an equal amount in 
all conditions. If paradigmatic associations between production representations drive 
production of an alternation, as proposed by the Perseveration Hypothesis, the results 
suggest that Labial Palatalization participants fail to learn the mapping between labials 
and alveopalatals, but the improvement in judgments of labial palatalization indicates that 
they do learn something. We propose that they learn product-oriented schemas (Bybee, 
1985, 2001; Kapatsinski, 2012, 2013; Nesset, 2008), which describe the characteristics 
certain types of forms are likely to have (like “plurals end in [tʃi]”), and can be used in 
making judgments. Even though Labial Palatalization participants do not manage to 
associate labials with alveopalatals, they still notice the high rate of [tʃi] in plurals, which 
allows them to acquire the first-order schema. They then apply the schema to all input 
consonants equally, having no paradigmatic mappings to tell them otherwise. Alveolar 
and Velar Palatalization participants also acquire the first-order schema, as shown by the 
preference for palatalized Not-To-Be-Palatalized stems over faithful. However, they also 
learn to associate alveolars and velars, respectively, with alveopalatals, and these 
mappings compete with the first-order schema, driving down the rates of acceptance of 
palatalized Not-To-Be-Palatalized consonants compared to palatalized To-Be-Palatalized 
consonants. 
4.4.1. Implications for other theories 
4.4.1.1. Perceptual similarity 
Steriade (2001/2009) proposed the P-map, a store of perceptual similarities between 
segments in context, which speakers rely on in their quest to avoid noticeable changes 
that violate speech norms. In our study, as in Zuraw’s (2000) study of Tagalog, the 
speech norms (as reflected in judgment scores) encourage changing the stem, but 
speakers still fail to produce the change. Perhaps first-language experience makes English 
speakers prefer some alternations, but the equivalent judgments of faithful mappings 
show that the preference cannot be reduced to phonotactics, and the same pattern of 
results was found in Stave et al. (2013) (though with overall less likelihood of 
palatalization, since the context was phonetically unnatural; Mitrović, 2012; Wilson, 
2006).  
The bias against labial palatalization is against certain changes, not for/against certain 
structures. The data suggest that English speakers know that labials are less changeable 
than velars, which are less changeable than alveolars. However, White (2013) found no 
evidence for alternations targeting labials (p~v) to be any harder to learn than those 
targeting alveolars (t~θ). Together, the results indicate that learners assign prior 
probabilities to alternations/paradigmatic mappings, which can be captured by *Map 
constraints in OT (Zuraw, 2007) and operations in rule-based phonology.   
One difference between our results and White’s (2013, 2014) is that in our 
experiment no-change errors on the trained segments (i.e. failing to change what should 
be changed) are the most common error in production, whereas in his experiments large changes 
overgeneralize more than small ones in production and in a two-alternative forced choice task.42 
Our results are similar to Skoruppa et al. (2011), and we think the difference is due to the 
exclusion of subjects who made too many no-change errors in White (2013, 2014), which 
                                                
42 In our experiment we do find greater overgeneralization of the large change in judgment, with Labial 
Palatalization extending to alveolars and velars more than Alveolar and Velar Palatalization extend to 
labials, which we attribute to the product-oriented schema having no competition from paradigmatic 
mappings in the Labial condition, which allows it to apply everywhere. 
affected the large change condition more than the small (White, 2013, p. 72). Had those 
participants been included, there would have been a higher proportion of no-change 
errors after exposure to the large change.  
Another possibility is that our and Skoruppa et al.’s (2011) results can be 
explained by first-language experience, whereas White’s (2013, 2014) results cannot. 
Maybe familiarity affects the learnability of a change (unfamiliar changes are harder to 
notice in training and perform in production), and magnitude affects the likelihood the 
change will be overgeneralized (large changes are more likely to be overgeneralized than 
small changes). We doubt this proposal, however, because the changes presented in 
White (2013, 2014) were not necessarily novel, either: Turning a voiceless stop into a 
voiced fricative between vowels was compared to intervocalic lenition of a voiced stop, 
and while English does not have categorical intervocalic stop lenition, it does have fairly 
common variable lenition (Davidson, 2011; Honeybone, 2001; Sangster, 2001; Warner & 
Tucker, 2011). Subjects at UCLA likely have exposure to Spanish and Spanish-accented 
English, which have voiced stop lenition that tends to preserve voicing (Zampini, 1996). 
Therefore the small changes in White (2013, 2014) may have been more familiar than the 
large. Additionally, diachronic patterns suggest that large changes are more likely to lose 
productivity than they are to be generalized (Bybee, 2008). We should still replicate our 
results with speakers of other languages, especially languages with labial palatalization, 
like the Southern Bantu languages Xhosa and Sotho (Bennett & Braver, 2015; Ohala, 
1978); their exposure to labial palatalization may result in different biases, if the 
experience of producing and hearing labials palatalizing is able to overcome the greater 
articulatory distance between labials and palatals, but we expect results to be comparable. 
Serbian has productive velar palatalization before [e] but not [i], but Serbian speakers still 
learned velar palatalization before [i] better than before [e] in an artificial language 
experiment (Mitrović, 2012). It would also be interesting to test speakers of languages 
where none of the palatalization patterns are productive (e.g. Catalan, Finnish, Kannada, 
Tagalog, Tamil; Bateman, 2007). 
4.4.1.1.1. Influence of markedness 
Prior researchers have proposed that markedness has an effect on the learnability of 
alternations. Wilson (2006) found that palatalizing [k] before -i is more likely than 
palatalizing [k] before -e, because [ki] is more marked than [ke]. White (2017) argued 
that learning to produce large changes requires a very high weight for the competing 
markedness constraint. However, our subjects show no difference in the acceptability of 
[pi] vs. [ti] vs. [ki], even though they are more willing to change [t] and [k] than [p]. The 
prior studies did not have participants judge faithful forms, so it is unclear if Wilson 
(2006) and White (2013, 2014) would show similar patterns, had that comparison been 
included. Contrary to the standard assumptions of OT and the models of Wilson (2006) 
and White (2017), we think there is no particular connection between the markedness of a 
structure (like [ki]) and the productivity of a change that avoids that structure (like 
[k]→[tʃi]). We view production of alternations as being driven by attraction to 
particularly good outputs (Bybee, 2001; Kapatsinski, 2013) following the paths of 
paradigmatic associations when these are available, rather than avoidance of marked 
outputs.  
 
 
4.4.1.2. Storage economy 
Kenstowicz’s (1996) proposal that Paradigm Uniformity improves lexical access 
fares better given our results, assuming that the bias against large changes is within the 
production system. Perhaps speakers assume that p~tʃ is harder for listeners to undo than 
k~tʃ, and so avoid the larger change because of the potential for misunderstanding. There 
is evidence that speakers avoid ambiguity by hyperarticulating cues that distinguish a 
word from its minimal pair neighbors (Baese-Berk & Goldrick, 2009; Wedel et al., 
2013). We consider a desire to maintain recoverability of the base (or underlying) 
form an unlikely explanation for the avoidance of large alternations, because in 
comparable experiments neutralization of underlying contrasts does not lead participants 
to avoid the alternations that produce it. For example, adding examples of tʃ~tʃi makes 
participants more likely to change [k] into [tʃi], even though these examples make it 
unclear whether a [tʃi]-final plural originated from a [k]- or [tʃ]-final singular 
(Kapatsinski, 2009, 2012, 2013). The only circumstance in which neutralization has been 
shown to result in avoidance of an alternation is when it resulted in complete 
homonymy between corresponding forms and these forms were adjacent to each other 
(Kapatsinski, 2017b). In the present experiment, neither large nor small alternations result 
in homonymy, and the corresponding forms are randomly ordered. 
4.4.1.3. Categorization 
According to the categorization account (Moreton & Pater, 2012a), labial 
palatalization should overgeneralize to alveolar and velar palatalization, since any 
category including labials and alveopalatals (such as [-continuant]) also includes 
alveolars and velars. Our results (Hypothesis 3) show no more overgeneralization from 
labials to alveolars than from velars to alveolars, or from labials to velars than alveolars 
to velars. When subjects are not trained to criterion (cf. White, 2013, 2014), they are not 
able to overcome the bias against the large changes, making large changes no more likely 
to be overgeneralized, merely less likely to be produced.  
Previous research has shown that the featural complexity of categories 
corresponds to learnability. In Skoruppa et al. (2011) and White (2013, 2014), 
alternations that differed by one feature were easier to learn and perform than those that 
differed by more than one feature. However, the alternations with more than one feature 
different were also saltatory. Prior work has suggested these alternations are harder to 
learn because of category learning (Moreton & Pater, 2012a; White, 2013, 2017), but we 
suggest they are more difficult because they are large changes, which are harder 
to learn than small changes generally (§2.2.1). It is unclear whether the existence of an 
intermediary sound is necessary for the larger change to be harder to learn (e.g. in an 
alternation between [b] and [f], [p] and [v] are intermediary), so future work should 
investigate the learnability of the same change for speakers who have the intermediate 
sound in their inventory vs. those who do not. 
4.4.1.4. Learnability 
If the learnability of an alternation is based on perceptual similarity (Wilson, 2006), 
then k~tʃi should be easier to learn than g~dʒi, since [ki] and [tʃi] sound more similar 
than [gi] and [dʒi] (Guion, 1998). Our results contradict this, with [g] being palatalized 
significantly more often than [k] (and palatalized [g]-final stems being rated marginally 
better than palatalized [k]-final stems, Hypothesis 5), which suggests that the link 
between perceptual similarity and learnability is not a causal one.43 Alternatively, perhaps 
it is precisely because it is more noticeable that g~dʒ is easier to acquire than k~tʃ. 
However, were this the case, we would expect labial palatalization to be more productive, 
or at least learned faster, than velar or alveolar palatalization, which is distinctly not what 
we find. The effect is therefore probably due to orthography: [k] and [tʃ] spellings are 
largely distinct (<k> always maps onto [k], <c> maps onto [tʃ] 3% of the time, and <ch> 
maps onto [tʃ] 87% of the time), whereas [g] and [dʒ] often overlap (<g> maps onto [dʒ] 
about 30% of the time; Gontijo et al., 2003). Prior research has shown that it is easier to 
learn alternations between sounds that can be grouped into one category (Moreton & 
Pater, 2012a), and that participants tend to convert sounds into their orthographic 
representation (White, 2013). The shared spellings of [g] and [dʒ] in English appear to 
make them easier to associate than the distinct spellings of [k] and [tʃ]. Future research 
should investigate whether the asymmetry between palatalization of [g] and [k], 
unexpected on the basis of perception, holds for other languages with different 
orthographic correspondences (see §4.3.5), and for pre-literate children. 
Another possibility is that the confusability values from Guion (1998) are not the best 
measure to evaluate perceptual similarity. Wang & Bilger (1973) showed that [g] and 
[dʒ] are more confusable than [k] and [tʃ] if confusions before [u], [i], and [ɑ] are 
combined, raising the question of whether context-sensitive or context-independent 
values should be used. Steriade (2000, 2001/2009) proposed that perceptual similarity 
must be context-specific to account for typological asymmetries in assimilation patterns. 
White (2017) uses context-independent confusion probabilities from Wang & Bilger 
                                                
43 Additionally, all training languages increase judgments of palatalized forms an equal amount over the no-
training baseline; in other words, labial palatalization (perceptually dissimilar) is learned in judgment as 
well as alveolar and velar palatalization (perceptually similar).  
(1973) as an estimate of perceptual similarity between alternating segments, which 
enables his model to capture the slight preference for [g] found here and in Wilson 
(2006). Palatalization may be harder to learn in a phonetically-unmotivated context 
(Mitrović, 2012; Wilson, 2006; see also Chapter VII), but is the influence of the context 
segments independent from the identity of the input segments? In other words, do 
speakers assign probabilities to rules (p→tʃ/_a), or do they assign probabilities to changes 
(p→tʃ) and their outputs (tʃa) and use the combined probability to evaluate the likelihood 
of a particular change resulting in a particular output (Labov, 1969)? The prior 
probability of a change is at least partially context-independent, since p→tʃ is harder to 
learn than t→tʃ or k→tʃ, whether before -i (as in the present experiment) or -a (Stave et 
al., 2013). However, Experiment 2 and the experiment in Stave et al. (2013) differ in 
other respects than just the suffix vowel. Comparison of Experiment 2 to Experiment 3, 
which differ only in the magnitude of change (labial vs. velar) and the context of the 
change (-i vs. -a), shows that velar palatalization is produced more than labial 
palatalization only before -i, with no difference found before -a. The degree to which 
probability of change is context-dependent vs. context-independent is therefore still an 
open question. One final possibility is that perceptual similarity is more abstract than 
confusability (i.e. originating from a higher level of representation) and could be affected 
by, for example, how often two sounds share a spelling. Future work should test how well 
an alternation between two articulatorily dissimilar and acoustically or perceptually 
similar segments is learned; articulatory similarity predicts that an alternation like f~θ 
would be difficult to learn, where perceptual similarity predicts the opposite. 
 
4.4.2. Limitations 
The primary limitation of this study is that all participants were native speakers of American 
English, who may have generalized from their knowledge of English or imposed 
English patterns on the artificial language (Finn & Hudson Kam, 2009). This subject pool 
allows us to make comparisons to the perceptual data from Guion (1998) and previous 
palatalization learning experiments (Wilson, 2006; Kapatsinski, 2012, 2013; Stave et al., 
2013), but it also means the results could be due to first-language phonological 
experience rather than the difference in change magnitude. In particular, English has 
alveolar palatalization before glides in frequent phrases like would you and bet you and in 
words like creature (cf. create) or torture (cf. extort). While the former do not involve a 
complete change in place of articulation (Zsiga, 1995) and the latter are of doubtful 
productivity, the existence of the patterns may have made alveolar palatalization easier to 
learn to produce. First-language transfer may not be an insurmountable factor in 
miniature artificial language learning (Garcia et al., 2017; Mitrović, 2012; Wang & 
Saffran, 2014), but it would still be worthwhile to see the pattern of results for speakers 
of languages without alveolar palatalization. 
The other potential limitation is that which plagues all artificial language learning 
experiments, namely that learning in the lab may not be the proxy for learning in more 
natural contexts that we take it to be. Learners in the lab are exposed to a much more 
impoverished input, for a much shorter period of time, than learners of natural languages. 
While this allows a degree of control over the variables of interest that is not possible 
with natural language, it may also lead to different learning strategies than would 
normally be applied. However, Friederici et al. (2002) showed that patterns of brain 
activation when processing a miniature artificial language are comparable to those 
activated when processing natural (first) language, and Ettlinger et al. (2016) found that 
performance in artificial language tasks correlates with performance of L2 Spanish 
learners, suggesting that the same abilities are recruited in both cases. While the 
simplified structure and limited exposure to linguistic input in the lab are definite 
limitations, it seems that learners rely on resources similar to those used in natural language learning. There is 
always the possibility that the findings from laboratory experiments do not all generalize 
to natural language, but they at least provide a starting point for further investigation. 
4.4.3. Summary 
The Perseveration Hypothesis as an explanation for Paradigm Uniformity proposes 
that stem changes are leveled by paradigmatic perseveration within the production 
system. When creating a novel form, the speaker activates other related forms and the 
meaning to be expressed, and articulatory gestures from the base are incorporated into the 
form being produced through a blending process. When too much of the base is copied, 
the stem change is leveled. Paradigmatic associations between related forms prevent 
leveling by specifying that X gesture in the base form activates Y gesture in the target 
form, and it is harder to learn associations between dissimilar representations because 
they require greater synaptic modification. Paradigmatic associations can drive 
judgments, and we argue they are a factor in palatalizing the To-Be-Palatalized 
consonants in the small change conditions (Alveolar and Velar Palatalization). As a 
result, learners in these conditions judge palatalization of To-Be-Palatalized consonants 
as more acceptable than palatalization of Not-To-Be-Palatalized consonants. However, 
product-oriented schemas (like “plurals end in [tʃi]”) also influence judgments. In the 
absence of paradigmatic associations, only product-oriented schemas are present. 
Because participants in the large change condition (Labial Palatalization) are unable to 
form the appropriate paradigmatic associations, [tʃi] is always a surprise when it occurs, 
which makes it more salient and therefore available and activated for all inputs, resulting 
in overgeneralization of palatalization from labials to velars and alveolars in judgment. 
Palatalization of Not-To-Be-Palatalized consonants benefits from the salience of the [tʃi] 
schema in the labial condition, whereas palatalization of To-Be-Palatalized consonants is 
hurt by the absence of the second-order schema. As a result, the difference in judgments 
between the two is much smaller in the labial condition compared to the others. 
The ultimate goal of any theory of Paradigm Uniformity is to explain natural 
language, not just metalinguistic judgments or elicited production. If we are interested in 
modeling the process responsible for the avoidance of large changes in language, the 
production task is the more informative, and it tells us that large changes are rare cross-
linguistically because they are hard to perform, so they lose productivity. This loss of 
productivity is extremely common, and may in fact be a diachronic universal (Bybee, 
2008); for an example in English, consider the k~s alternation in electric~electricity, and 
how it is not extended to the intermediate [t] (Pierrehumbert, 2006). Based on this, we 
expect that labial palatalization will be lost in languages that have it (like Southern Bantu; 
Ohala, 1978), rather than be generalized to all stops. In the first study on the productivity 
of labial palatalization, Bennett & Braver (2015) found that it is indeed only partially 
productive in Xhosa. Based on universal diachronic patterns, we believe that 
paradigmatic perseveration is partially responsible for the rarity of large stem changes 
typologically: They are hard to perform in production, and difficult stem changes are 
especially likely to be leveled by performance pressure. 
In Chapter V we present the results of Experiment 3, which was designed to 
investigate whether temporal contiguity of related pairs can make the paradigmatic 
associations exemplified by the pairs easier to acquire. Of particular interest was whether 
making large changes obvious would provide enough evidence to allow for paradigmatic 
associations to form between labials and alveopalatals. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
CHAPTER V 
EXPERIMENT III: EFFECTS OF ADJACENCY ON LEARNABILITY OF 
PALATALIZATION BEFORE -a  
Portions of this chapter were taken from: 
Smolek, A. & Kapatsinski, V. (2019). Syntagmatic paradigms: Learning 
correspondence from contiguity. Manuscript submitted for publication. 
Experiment 2 shows that it is difficult to learn to produce alternations involving 
dissimilar sounds, and that those alternations are likely to be extended beyond their 
original context in judgments. This raises the question of how alternations are acquired: 
What allows speakers to learn an alternation, and to restrict it to the appropriate context? 
We propose that temporal contiguity of related forms is an integral part of the acquisition 
of morphophonology. Erwin (1961) pointed out that morphologically related words tend 
to occur in similar contexts, allowing the listener to “bring them into contiguity” by using 
the context to anticipate one of the words as the other is perceived (see also McNeill, 
1966; Onnis, Waterfall & Edelman, 2008; Slobin & Küntay, 1996). Furthermore, 
contrary to McNeill (1966), paradigmatically related words also co-occur in natural 
language (Baroni et al., 2002; Fellbaum, 1996; Jones et al., 2007; Murphy, 2006; Xu & 
Croft, 1998), allowing for the listener to anticipate one form of a word as another is 
perceived. In both cases, the corresponding forms are in contiguity because the listener 
activates one form of a word (whether perceived or anticipated on the basis of context) as 
s/he is about to perceive another form. 
Work on category learning has shown that contiguity between members of 
contrasting categories helps learners pick up on discriminative features, the ones that best 
distinguish between the categories (Arnon & Ramscar, 2012; Ramscar et al., 2010; 
Carvalho & Goldstone, 2015); by extension, learners may use corresponding singulars 
and plurals occurring in contiguity in order to learn what distinguishes singulars from 
plurals. For example, in a language with labial palatalization before the plural suffix -a, a 
final …p identifies the form as singular, and a final …a identifies it as plural. Here, we 
argue that temporal contiguity also allows learners to predict one of the forms of a 
word in the context of the other form, and therefore to use the other form as a set of 
predictive cues. We show that discriminative learning under contiguity allows the learner 
to identify cues that discriminate among singulars corresponding to different plural 
patterns (e.g. learning that a final …p  in a singular indicates that the plural will end in 
…tʃa). 
Theories of grammar differ in whether they predict contiguity to benefit faithful and 
unfaithful mappings. For example, in Optimality Theory, faithful mappings are due to 
output-output (OO) faithfulness constraints, which are thought to start out at the top of 
the constraint ranking (Hayes, 2004; McCarthy, 1998). Similarly, Taatgen and Anderson 
(2002) propose that faithful mappings are produced by a default “do nothing” rule that 
remains the default unless an alternative rule is learned. If OO faithfulness constraints or 
“do nothing” are still at the top for our participants, then contiguity of faithful forms 
should have no effect on the acquisition of the faithful mapping, since it is already at 
ceiling. However, exposure to English should downweight OO-faith[alveolar] and OO-
faith[velar], but not OO-faith[labial], which would mean that intact pairs including 
faithful alveolars and velars in training would increase retention of stem-final alveolar 
and velar consonants at test, but adjacent faithful labials would have no effect on 
retention of stem-final labial consonants.  
In Usage-based Phonology, unfaithful mappings have been attributed to product-
oriented schemas that are acquired by generalizing over forms belonging to a particular 
cell in the morphological paradigm (such as “plural”), without reference to any base form 
(Bybee, 1985, 2001; Kapatsinski, 2012, 2013). If unfaithful mappings are always 
attributable to product-oriented schemas, then unfaithful mappings (like p~tʃa) should not 
benefit from contiguity as much as faithful mappings (like t~ta and k~ka), because a 
product-oriented schema about plurals is acquired without reference to the singular (e.g., 
“plurals end in [tʃa]” can be learned without knowing what the corresponding singulars 
are like; Bybee, 2001; Kapatsinski, 2013, 2017b). 
However, it is also possible that unfaithful paradigmatic mappings are at least helped 
by generalizations over corresponding pairs of words. This is true in rule-based 
phonological models that map surface forms onto each other by context-specific 
transformations (Albright & Hayes, 2003). It is also true in usage-based models that 
include a role for second-order schemas or paradigmatic mappings (Booij, 2010; 
Kapatsinski, 2018; Nesset, 2008). Finally, generalization over both singulars and plurals 
is required in discriminative learning models (Baayen et al., 2011), where learners would 
learn what plurals are like by determining what phonological features discriminate 
singulars from plurals. That is, …tʃa is associated with plurality not because it is common 
in plurals but because it is more common in plurals than in non-plurals. 
Unfaithful paradigmatic mappings are notoriously difficult to acquire in artificial and 
natural language contexts (Braine et al., 1990; Brooks et al., 1993; Dąbrowska & 
Sczerbinski, 2006; Krajewski et al., 2011). Unfaithful mappings may benefit more from 
contiguity if they are created by application of a second-order schema or rule, because 
they have more room for improvement.  
The following experiment explores these effects through training on voiced and 
voiceless palatalization before -a, manipulating the order of the training trials. For the 
purposes of this experiment, only pairs where the singular is immediately followed by the 
corresponding plural are considered temporally contiguous. While temporal contiguity in 
natural language includes cases where the related forms are separated by some number of 
other words but are connected by occurring in a similar context, and sharing the context 
has been shown to work in the lab (Onnis et al., 2008), here we implement contiguity by 
actual adjacency. That is, temporal contiguity is implemented by keeping pairs of 
corresponding words “intact,” by presenting the corresponding singulars and plurals 
adjacent to each other. This manipulation is, however, intended to control, in the most 
direct way possible, whether the singular form is available to compare to or predict the 
plural form when the plural is presented. 
5.1. Methods 
5.1.1. Participants 
152 native American English speakers with no reported history of speech, language, 
hearing, or learning disabilities were recruited from the Psychology/Linguistics Human 
Subject Pool at the University of Oregon. There were 38 participants per trial order 
condition (split between training languages), and they received partial course credit for 
participation. 
 
5.1.2. Languages 
There were two training languages, containing either labial or velar palatalization. 
Just as in Experiment 2 (Chapter IV), the stems were always C(C)VC and ended in an 
oral stop [b;p;d;t;g;k]. The plural suffix was -a (100% of the time for To-Be-Palatalized 
consonants, 50% of the time for Not-To-Be-Palatalized consonants) or -i (0% of the time 
for To-Be-Palatalized consonants, 50% of the time for Not-To-Be-Palatalized 
consonants). The To-Be-Palatalized consonant became [tʃ] if voiceless and [dʒ] if voiced, 
before -a. Velar palatalization is perceptually (Guion, 1998; Ohala, 1989, p. 183-185, 
1992, p. 320) and articulatorily (Anttila, 1989, p. 72-73; Hock, 1991, p. 73-77) motivated 
before -i, whereas labial palatalization is not; neither is phonetically motivated before -a. 
In Experiment 2, Velar Palatalization participants produce palatalization of the target 
consonants much more often than Labial Palatalization participants. [k] is articulatorily 
closer to [tʃ] than is [p], regardless of the vowel context, so the difference in change 
magnitude could hold before -a as well. However, if learning is affected by substantive 
bias, as has been proposed and found experimentally (e.g. Do, 2013; Finley, 2008; Hayes 
& White, 2015; Mitrović, 2012; Stave et al., 2013; White, 2013, 2014; Wilson, 2006), 
then velar palatalization may be learned better in Experiment 2 only because it is in a 
phonetically motivated context, so there may be no difference in learnability by change 
magnitude in Experiment 3. The two experiments are compared in Chapter VII.  
See Table 5.1 for the patterns in each language. 
 
 
 
 
 
Table 5.1. Labial and Velar Palatalization patterns presented to participants in 
Experiment 3. 
            Labial Palatalization   Velar Palatalization 
Singular    Plural                  Plural 
…p          …tʃa                    …{pi;pa} 
…b          …dʒa                    …{bi;ba} 
…t          …{ti;ta}                …{ti;ta} 
…d          …{di;da}                …{di;da} 
…k          …{ki;ka}                …tʃa 
…g          …{gi;ga}                …dʒa 
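
The patterns in Table 5.1 can also be stated as a small function. The following is a minimal Python sketch for illustration only, not part of the experimental software: the names licensed_plurals, PALATAL and TARGETS are hypothetical, and transcriptions are treated as plain strings with a single-character final stop.

```python
# Voiceless stops map to [tʃ], voiced stops to [dʒ] (Table 5.1).
PALATAL = {"p": "tʃ", "b": "dʒ", "k": "tʃ", "g": "dʒ"}

# To-Be-Palatalized consonants in each training language.
TARGETS = {"labial": {"p", "b"}, "velar": {"k", "g"}}


def licensed_plurals(singular, language):
    """Return the set of plural forms the training language allows for a singular."""
    final = singular[-1]
    if final not in "pbtdkg":
        raise ValueError("training stems end in an oral stop")
    if final in TARGETS[language]:
        # To-Be-Palatalized: the stop is replaced by a palatal and the suffix is always -a.
        return {singular[:-1] + PALATAL[final] + "a"}
    # Not-To-Be-Palatalized: the stem is unchanged and takes -i or -a (50% each in training).
    return {singular + "i", singular + "a"}
```

For example, licensed_plurals("smak", "velar") yields only smatʃa, whereas the same stem in the Labial Palatalization language may surface as smaki or smaka.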
 
The identity of the To-Be-Palatalized consonants (labial [p;b] or velar [k;g]) was 
crossed with trial order, which is the variable of principal interest in this study. 
Participants were exposed to one of four trial orders, shown here for the Velar 
Palatalization condition: 
–All Obvious: All corresponding singular-plural pairs were intact, with the plural 
immediately following the singular, whether the mapping was faithful or unfaithful (1) 
–None Obvious: Singulars and plurals were all randomly ordered, so corresponding 
singulars and plurals were not adjacent at greater than chance frequency (4) 
–Change Obvious: Unfaithful corresponding singular-plural pairs were intact, but 
singular-plural pairs exemplifying faithful mappings were split up and randomly ordered 
(2) 
–NoChange Obvious: Only singular-plural pairs exemplifying faithful mappings 
were kept intact; unfaithful pairs were split up and randomly ordered (3) 
(1) All Obvious: blupSG blupaPL klutSG klutiPL smakSG smatʃaPL… 
(2) Change Obvious: klutiPL blupSG klutSG smakSG smatʃaPL blupaPL … 
(3) NoChange Obvious: blupSG blupaPL smatʃaPL klutSG klutiPL smakSG… 
(4) None Obvious: klutiPL blupSG smatʃaPL klutSG smakSG blupaPL … 
The trial orders differ on two dimensions: whether unfaithful singular-plural 
mappings (To-Be-Palatalized, here smakSG~smatʃaPL) are adjacent and therefore easier to 
learn, which is the case for the All Obvious and Change Obvious conditions; and whether 
faithful singular-plural mappings (Not-To-Be-Palatalized, here blupSG~blupaPL and 
klutSG~klutiPL) are adjacent and therefore easier to learn, which is true for the All 
Obvious and NoChange Obvious conditions. As discussed earlier, adjacency is a proxy 
for temporal contiguity of paradigmatically-related words in corpus data. We expect 
adjacency to influence learnability of the alternation (as reflected by the overall rate of 
palatalization and differences in this rate across different types of singulars).  
We use intactness of pairs and faithfulness as the primary predictors in statistical 
analyses, and expect both to influence rate of palatalization. We do not have any strong 
intuitions regarding how they will interact with language, suffix, and final consonant, and 
we explore those interactions using conditional inference trees. 
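
The four trial orders thus form a 2 x 2 design over the two intactness predictors. A minimal sketch of that coding follows; the dictionary and function names are hypothetical and are not taken from the analysis scripts.

```python
# Each trial order is coded as (unfaithful pairs intact, faithful pairs intact).
TRIAL_ORDERS = {
    "None Obvious": (False, False),
    "Change Obvious": (True, False),
    "NoChange Obvious": (False, True),
    "All Obvious": (True, True),
}


def intact_predictors(trial_order):
    """Translate a trial order label into the two binary predictors."""
    unfaithful_intact, faithful_intact = TRIAL_ORDERS[trial_order]
    return {"unfaithful_intact": unfaithful_intact, "faithful_intact": faithful_intact}
```

Coding the design this way lets each predictor be estimated across two conditions rather than treating trial order as a single four-level factor.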
5.1.2.1. What learners need to weight 
To perfectly reproduce the input language, participants need to learn several 
generalizations: 
1) there are two plural suffixes, -i and -a 
2) when to use each suffix, and in particular, that -a is the only eligible suffix after 
consonants that should be changed ([k] and [g] for Velar Palatalization, [p] and [b] for 
Labial Palatalization); learning (when) to use suffixes is covered in §5.2.1 
3) there is palatalization in the language 
4) the context of palatalization, in particular which suffix triggers it (§5.2.2) 
5) the context of palatalization, in particular which input consonants are affected by it 
(§5.2.3) 
The learnability of these generalizations is potentially affected by contiguity, and we 
explore these effects in §5.2. 
5.1.3. Materials 
The materials were the same as for Experiment 2 (§4.2.3), with three exceptions: the 
trial order manipulations described in §5.1.2; all To-Be-Palatalized plurals were suffixed 
with -a instead of -i; and the production test phase included an additional 24 novel 
singulars ending in a palatal (half [tʃ], half [dʒ]). The complete materials are 
available in Appendix C. 
5.1.4. Procedure 
5.1.4.1. Training 
The training phase procedure was the same as in Experiment 2 (§4.2.4.1), except for 
the trial order differences. We created two “blocks” of trials within each of the three 
training blocks; one “block” consisted of word pairs, and the other of single word forms. 
Each training trial sampled a “block” (with replacement) and then randomly sampled a 
trial within the “block” (without replacement). In the None Obvious condition, all 
samples came from the single wordform “block” (this was the same procedure as 
Experiment 2). In the All Obvious condition, all samples came from the word pair 
“block”. For Change Obvious, the To-Be-Palatalized pairs were selected from the word 
pair “block” and the Not-To-Be-Palatalized pairs from the single wordform “block,” with 
the reverse true for NoChange Obvious. Table 5.2 shows the block choice by the To-Be-
Palatalized status of the stem for each trial order. 
Table 5.2. Trial selection “blocks” by trial order and To-Be-Palatalized status of 
stem-final consonant. 
Trial Order          To-Be-Palatalized        Not-To-Be-Palatalized 
None Obvious         single wordform block    single wordform block 
All Obvious          word pairs block         word pairs block 
Change Obvious       word pairs block         single wordform block 
NoChange Obvious     single wordform block    word pairs block 
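
The sampling scheme summarized in Table 5.2 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the presentation script used in the experiment: the function and variable names are hypothetical, the choice between the two “blocks” is assumed to be uniform over whichever blocks still contain trials, and only a single training block is generated.

```python
import random

# Block used for (To-Be-Palatalized, Not-To-Be-Palatalized) stems, per Table 5.2.
BLOCK_CHOICE = {
    "None Obvious": ("single", "single"),
    "All Obvious": ("pair", "pair"),
    "Change Obvious": ("pair", "single"),
    "NoChange Obvious": ("single", "pair"),
}


def build_training_order(pairs, tbp_singulars, trial_order, rng):
    """Return one training sequence of wordforms for a trial order condition.

    pairs: list of (singular, plural) tuples
    tbp_singulars: set of singulars whose final consonant is To-Be-Palatalized
    rng: a random.Random instance (assumed here for reproducibility)
    """
    tbp_block, other_block = BLOCK_CHOICE[trial_order]
    blocks = {"pair": [], "single": []}
    for singular, plural in pairs:
        choice = tbp_block if singular in tbp_singulars else other_block
        if choice == "pair":
            # Intact pair: the plural immediately follows its singular.
            blocks["pair"].append([singular, plural])
        else:
            # Split pair: each wordform is its own, independently ordered trial.
            blocks["single"].extend([[singular], [plural]])

    sequence = []
    while blocks["pair"] or blocks["single"]:
        # Sample a block (with replacement), skipping blocks that are exhausted.
        block = rng.choice([b for b in blocks.values() if b])
        # Sample a trial within the block (without replacement).
        trial = block.pop(rng.randrange(len(block)))
        sequence.extend(trial)
    return sequence
```

With pairs such as blup~blupa, klut~kluti and smak~smatʃa, the Change Obvious order keeps smak and smatʃa adjacent while the remaining forms are scattered, as in example (2) above.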
 
 
5.1.4.2. Test 
The procedure for the test was the same as for the production test in Experiment 2 
(§4.2.4.2). As for Experiment 2, all the production trials were randomly ordered. 
5.1.5. Measures 
5.1.5.1. Transcription protocol and exclusions 
Each production was transcribed for later analysis. We coded for whether 
palatalization was present and the identity of the plural suffix. Palatalization was coded as 
present if the word ended in a vowel preceded by [tʃ] or [dʒ]. Codings were performed by 
me and a team of undergraduate RAs, with the latter also checked by me; in cases of 
disagreement, I revisited the recording and looked at the spectrogram if necessary. We 
excluded productions where the singular-final consonant was replaced with anything 
other than [tʃ] or [dʒ], or where the plural suffix vowel was anything other than -i or -a. 
85% of observations remained after exclusions (n = 15,565). See §5.2.1 for analysis of 
error patterns. 
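
The coding and exclusion criteria above can be made explicit with a short sketch. The Python below is an illustration only: coding was done by hand from the recordings, the function names are hypothetical, and transcriptions are assumed to be plain strings in which [tʃ] and [dʒ] are written as two characters.

```python
PALATALS = ("tʃ", "dʒ")
SUFFIXES = ("i", "a")


def split_final_consonant(singular):
    """Split a singular into (stem without final consonant, final consonant)."""
    for palatal in PALATALS:
        if singular.endswith(palatal):
            return singular[: -len(palatal)], palatal
    return singular[:-1], singular[-1]


def code_production(singular, plural):
    """Return None for an excluded production, else its suffix and palatalization coding."""
    # Exclude productions whose suffix vowel is anything other than -i or -a.
    if not plural or plural[-1] not in SUFFIXES:
        return None
    suffix, body = plural[-1], plural[:-1]
    stem, _final = split_final_consonant(singular)
    # The final consonant must be retained or replaced by [tʃ] or [dʒ].
    allowed_bodies = {singular} | {stem + palatal for palatal in PALATALS}
    if body not in allowed_bodies:
        return None
    # Palatalization is present if the suffix vowel is preceded by [tʃ] or [dʒ].
    return {"suffix": suffix, "palatalized": body.endswith(PALATALS)}
```

Under this sketch smak~smatʃa is coded as palatalized with -a, whereas streik~streiktʃa, smuk~smuku and trab~traga are all excluded and fall under the error analysis in §5.2.1.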
5.1.5.2. Model structure 
The results were analyzed using generalized (logistic) linear mixed-effects models 
with the lme4 package (version 1.1-21, Bates et al., 2015) in R 3.6.0 (R Core Team, 
2019). The maximal random effects structure was used, including random intercepts for 
Subjects and Singulars and random slopes for Trial Order variables (Faithful Intact, 
Unfaithful Intact) within Singulars and To-Be-Palatalized (To-Be-Palatalized vs. Not-To-
Be-Palatalized, given training condition) within Subjects. Pairwise interactions between 
the individual trial order variables and To-Be-Palatalized were included if they were 
significant. Complete model structures are included in footnotes. 
Subsequent exploratory data analysis was conducted using conditional inference trees 
with the party package (Hothorn et al., 2006). Trees included Language as a predictor of 
vowel choice and both Language and Plural Vowel as predictors of palatalization. To 
control for multiple comparisons, the minimum significance level required to split the 
tree was lowered to 0.001. Theoretically interesting interactions (between, for example, 
To-Be-Palatalized and Plural Vowel) discovered by inspecting the trees were tested using 
mixed-effects models to take into account dependencies between observations coming 
from the same subject or item. We visually inspected the conditional inference trees for 
any (a priori unexpected) cross-over interactions, and would have included them in 
mixed-effects regression models had they been found. 
5.1.5.3. Predictions 
If contiguity helps the acquisition of faithful mappings, then NoChange Obvious and 
All Obvious will have less palatalization than Change Obvious and None Obvious, 
because the faithful pairs are intact. If contiguity helps the acquisition of unfaithful 
mappings, then Change Obvious and All Obvious will have more palatalization than 
the conditions where unfaithful pairs are not intact (NoChange Obvious and None 
Obvious). If contiguity helps both types of mapping, NoChange Obvious should have 
many faithful mappings, Change Obvious should have many unfaithful mappings, and 
All Obvious should fall somewhere in between, with None Obvious serving as a baseline. 
5.2. Results 
5.2.1. Error patterns 
We compared the error rate by trial order and training language, as shown in Figure 
5.1. Keeping faithful pairs intact results in lower error rates44 (b = -1.47, se(b) = 0.27, z =   
-5.40, p < 0.001), and adjacency of faithful pairs significantly improves model fit (χ2(1) = 
26.32, p < 0.001): The proportion of acceptable plurals is higher for participants in the 
NoChange Obvious and All Obvious conditions than those in the None Obvious and 
Change Obvious conditions. Keeping unfaithful pairs intact has no significant effect on 
error rates (b = 0.046, se(b) = 0.27, z = 0.17, p = 0.87, ns), and according to the BIC 
approximation to the Bayes Factor, the results provide very strong support for the null, 
ΔBIC = 10, PBIC (H0 | D) = 0.993. Participants in the Velar Palatalization language 
produce fewer errors than those in Labial Palatalization (b = -0.70, se(b) = 0.27, z =         
-2.56, p = 0.01), and Training Language significantly improves model fit (χ2(1) = 6.49, p 
= 0.01): The darker bars, representing Velar Palatalization, are all higher than the lighter 
bars, representing Labial Palatalization. None of the interactions are significant.  
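
The posterior probabilities of the null reported alongside ΔBIC can be recovered from the BIC approximation to the Bayes Factor. The sketch below assumes that ΔBIC is the BIC of the model including the predictor minus the BIC of the model without it, and that the two models have equal prior probability; the function name is hypothetical.

```python
import math


def posterior_null_from_delta_bic(delta_bic):
    """Approximate P(H0 | D) from delta_bic = BIC(H1) - BIC(H0), assuming equal priors."""
    # The Bayes Factor in favour of the null is approximated by exp(delta_bic / 2).
    bf01 = math.exp(delta_bic / 2.0)
    return bf01 / (1.0 + bf01)


# Values of delta BIC reported in this chapter.
for delta in (10, 9.5, 9, 8.3, 7.5, 7.2, 6.4):
    print(delta, round(posterior_null_from_delta_bic(delta), 3))
```

With ΔBIC = 10 this gives a posterior probability of 0.993 for the null, matching the value reported above.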
 
                                                
44 Errors ~ Unfaithful Intact + Faithful Intact + Training Language + (1 | Subject) + (1 | Singular) 
 
Figure 5.1. Percentage of plural productions without mistakes by trial order and training 
condition. 
 
5.2.1.1. Consonant and vowel error types 
We looked more closely at the types of errors that participants in each language and training 
condition produce. We first separated consonant errors and vowel errors, for ease of 
visualization45. There are three types of consonant errors, as shown in Figure 5.2: 
“Stop+Palatal,” where the plural form contains both the stop and the palatal (e.g. 
streik~streiktʃa); “Bizarre,” where the stem-final consonant is replaced with a non-palatal 
consonant (e.g. trab~traga) or the stem consonant is retained and followed by a non-
palatal consonant (e.g. drag~dragda); and “Absent,” where no consonant is produced 
(e.g. ʃlud~ʃlu, but more commonly traɪk~???). There are four types of vowel errors, as 
shown in Figure 5.3: “New Vowel,” where the suffix is a vowel other than -i or -a (e.g. 
smuk~smuku); “-s,” where an -s is attached directly to the stem (e.g. stɛd~stɛds) or to a 
suffix vowel (e.g. klub~klubas); “Bizarre,” where the suffix is a syllabic consonant (e.g. 
                                                
45 Fewer than 5% of productions contained both errors in the consonant and the vowel, and 88% of those 
are trials where participants failed to produce any plural form. 
dig~dign) or a sequence of segments (e.g. roʊp~roʊpakaɪ); and “Absent,” when no plural 
suffix is produced, either because the singular is repeated (e.g. kwug~kwug) or, more 
commonly, nothing is produced (e.g. flad~???). For the analyses, we ran separate logistic 
regressions for each error category within error type (e.g. comparing “-s” to all other 
plural consonant types, including faithful and unfaithful); none of the interactions were 
significant. All inferential statistics were performed on the entire data set, but for ease of 
visual comparison, the figures only show the numbers of error-containing productions.  
5.2.1.2. Consonant errors 
Figure 5.2 shows the distribution of consonant errors by training language (right 
panel) and trial order (left panel). Keeping unfaithful pairs intact results in more 
consonant errors, overall46 (b = 0.66, se(b) = 0.27, z = 2.47, p = 0.01), and inclusion of 
Unfaithful Intact significantly improves model fit (χ2(1) = 5.83, p < 0.02): Change 
Obvious and All Obvious produce more consonant errors than do None Obvious and 
NoChange Obvious (which from the figure seems driven entirely by the high rate of 
“Stop+Palatal” productions in Change Obvious; see below for further analysis). Keeping 
faithful pairs intact reduces the number of consonant errors (b = -1.09, se(b) = 0.27, z =    
-4.05, p < 0.001), and including Faithful Intact significantly improves model fit (χ2(1) = 
15.32, p < 0.001), as can be seen by comparing the bars in NoChange Obvious and All 
Obvious to those in None Obvious and Change Obvious. Lastly, participants produce 
fewer consonant errors after training on Velar Palatalization compared to Labial 
Palatalization (b = -0.95, se(b) = 0.27, z = -3.53, p < 0.001), and Training Language 
significantly improves model fit (χ2(1) = 12.19, p < 0.001). 
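
For readers who wish to check the inferential statistics reported in this section, the following sketch shows how the two kinds of p-value can be recomputed. It assumes a two-sided Wald test against the standard normal distribution and a likelihood-ratio test with one degree of freedom; the function names are hypothetical, and because b and se(b) are reported to two decimals, recomputed z values may differ slightly from those in the text.

```python
import math


def wald_p(b, se):
    """Return (z, two-sided p) for a coefficient and its standard error."""
    z = b / se
    # Two-sided tail probability of the standard normal distribution.
    return z, math.erfc(abs(z) / math.sqrt(2))


def lrt_p_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    # A chi-square variable with 1 df is a squared standard normal variable.
    return math.erfc(math.sqrt(chi2 / 2))


z, p = wald_p(0.66, 0.27)      # Unfaithful Intact, consonant errors
print(round(z, 2), round(p, 3))
print(round(lrt_p_df1(5.83), 3))
```

Both recomputed p-values for the Unfaithful Intact effect on consonant errors fall between 0.01 and 0.02, in line with the values reported above.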
                                                
46 Consonant Mistakes ~ Unfaithful Intact + Faithful Intact + Training Language + (1 | Subject) + (1 | 
Singular) 
 
 
Figure 5.2. Plural productions containing consonant errors by error type. Left panel: 
Error rates by training trial order. Right: Error rates by training language. 
 
Breaking down the consonant errors by type, we find that for “Stop+Palatal” errors 
(light bars), keeping unfaithful pairs intact (in Change Obvious and All Obvious) results 
in higher rates of producing both the stem-final consonant and a palatal47 (b = 2.86, se(b) 
= 0.69, z = 4.15, p < 0.001), and including Unfaithful Intact significantly improves model 
fit (χ2(1) = 16.37, p < 0.001). Keeping faithful pairs intact (in NoChange Obvious and All 
Obvious) results in lower rates of “Stop+Palatal” productions (b = -2.03, se(b) = 0.70, z = 
-2.91, p < 0.004), and including Faithful Intact significantly improves model fit (χ2(1) = 
8.59, p = 0.003). Change Obvious, where unfaithful pairs are intact and faithful pairs are 
not, has a much higher rate of productions retaining the stem-final consonant and also 
producing a palatal, which seems like a compromise between implementing the obvious 
change and the effort of overriding perseveration. The majority of “Stop+Palatal” 
productions are suffixed with -a (537/585, or 92%), suggesting that Change Obvious 
                                                
47 Stop+Palatal Errors ~ Unfaithful Intact + Faithful Intact + Training Language + (1 | Subject) + (1 | 
Singular) 
participants have learned that [tʃa] indicates plurality. (See §6.3.2.2.1 and §6.4.2.2 for 
further evidence and discussion of [tʃa] as a “chunk.”) There is no significant difference 
in “Stop+Palatal” productions by training language (b = -0.21, se(b) = 0.68, z = -0.31, p = 
0.76, ns) and according to the BIC approximation to the Bayes Factor the data provide 
very strong support for the null, ΔBIC = 9.5, PBIC (H0 | D) = 0.991.  
Keeping faithful pairs intact results in fewer errors of the “Bizarre” variety48 (b = -0.79, se(b) 
= 0.29, z = -2.77, p < 0.007; including Faithful Intact significantly improves model fit, 
χ2(1) = 7.55, p < 0.006), as shown in the comparison of the grey bars in None Obvious 
and Change Obvious vs. NoChange Obvious and All Obvious. There is no significant 
effect of Unfaithful Intact (b = 0.21, se(b) = 0.28, z = 0.74, p = 0.46), and the results 
provide strong support for the null according to the BIC approximation to the Bayes 
Factor, ΔBIC = 9, PBIC (H0 | D) = 0.989. Finally, after Velar Palatalization training, 
participants produce fewer “Bizarre” consonants (b = -1.18, se(b) = 0.29, z = -4.09, p < 
0.001), and including Training Language significantly improves model fit (χ2(1) = 16.62, 
p < 0.001), as seen in the medium grey bars in the right panel of Figure 5.2. 
There is no significant effect of Unfaithful Intact on the rate of failing to produce a 
plural consonant49 (“Absent,” dark grey bars in Figure 5.2; b = -0.37, se(b) = 0.33, z = -1.11, 
p = 0.27), and according to the BIC approximation to the Bayes Factor, the results 
provide strong support for the null, ΔBIC = 8.3, PBIC (H0 | D) = 0.984. Keeping faithful 
pairs intact results in marginally fewer “Absent” productions (b = -0.60, se(b) = 0.33, z = 
-1.81, p = 0.07), but according to the BIC approximation to the Bayes Factor, the data 
                                                
48 Bizarre Errors ~ Unfaithful Intact + Faithful Intact + Training Language + (1 | Subject) + (1 | Singular) 
 
49 Absent Consonant Errors ~ Unfaithful Intact + Faithful Intact + Training Language + (1 | Subject) + (1 | 
Singular) 
 
provide strong support for the null, ΔBIC = 6.4, PBIC (H0 | D) = 0.961. Training on Velar 
Palatalization results in lower rates of “Absent” consonants (b = -1.00, se(b) = 0.33, z =   
-2.99, p < 0.003), and including Training Language significantly improves model fit 
(χ2(1) = 9.14, p < 0.003).  
5.2.1.3. Vowel errors 
Figure 5.3 shows comparisons of vowel errors. Keeping unfaithful pairs intact results 
in fewer mistaken vowel productions50 (b = -0.91, se(b) = 0.32, z = -2.85, p = 0.004), and 
Unfaithful Intact significantly improves model fit (χ2(1) = 7.77, p = 0.005). Keeping faithful 
pairs intact also results in fewer vowel errors (b = -1.27, se(b) = 0.32, z = -4.00, p < 0.001), and 
including Faithful Intact significantly improves model fit (χ2(1) = 14.62, p < 0.001). None 
Obvious, where neither faithful nor unfaithful pairs are kept intact, has a much higher 
error rate compared to the other conditions, as shown in the left panel of Figure 5.3. Velar 
Palatalization participants produce fewer errors in plural vowels than Labial 
Palatalization participants do (b = -0.64, se(b) = 0.32, z = -2.02, p = 0.04), and including 
Training Language significantly improves model fit (χ2(1) = 4.05, p = 0.04). The right 
panel of Figure 5.3 suggests that this difference is driven by higher rates of “New Vowel” 
and “Absent” productions after Labial Palatalization training, but we evaluate differences 
by error type, below, to confirm. 
                                                
50 Vowel Errors ~ Unfaithful Intact + Faithful Intact + Training Language + (1 | Subject) + (1 | Singular) 
 
Figure 5.3. Plural productions containing vowel errors by error type and training trial 
order. Left panel: Error rates by training trial order. Right: Error rates by training 
language. 
 
Keeping unfaithful pairs intact results in fewer “New Vowel” productions51 (white 
bars, b = -1.49, se(b) = 0.68, z = -2.20, p < 0.03), and Unfaithful Intact significantly 
improves model fit (χ2(1) = 4.57, p = 0.03). Keeping faithful pairs intact has no significant effect 
on “New Vowel” productions (b = -1.07, se(b) = 0.68, z = -1.58, p = 0.11), and according 
to the BIC approximation to the Bayes Factor, the results provide strong support for the 
null, ΔBIC = 7.2, PBIC (H0 | D) = 0.973. There is no effect of training language (b = -0.24, 
se(b) = 0.68, z = -0.35, p = 0.73), and according to the BIC approximation to the Bayes 
Factor, the data provide very strong support for the null, ΔBIC = 9.5, PBIC (H0 | D) = 
0.991.  
None of the effects are significant in the “-s” vowel error model52 (light grey bars, all 
z < 1) or the “Bizarre” vowel error model53 (dark grey bars, all z < 1.3). The vast majority 
                                                
51 New Vowel Errors ~ Unfaithful Intact + Faithful Intact + Training Language + (1 | Subject) + (1 | 
Singular) 
52 -s Vowel Errors ~ Unfaithful Intact + Faithful Intact + Training Language + (1 | Subject) + (1 | Singular) 
of “-s” vowel errors occur in the None Obvious condition (176/181), and none at all 
occur in the NoChange Obvious and Change Obvious conditions. All but 4 of the 
“Bizarre” plural vowels are produced in None Obvious (125/186) or Change Obvious 
(57/186), the conditions where faithful pairs are not intact. Only when the faithful pairs 
are intact are both vowels made obvious (since both -a and -i attach to Not-To-Be-
Palatalized stems, whereas only -a attaches to To-Be-Palatalized stems); otherwise, it 
seems that some participants notice that there is something going on with the vowel but 
cannot determine what the particulars are. 
For “Absent” errors (black bars), there is no effect of keeping unfaithful pairs intact54 
(b = -0.41, se(b) = 0.29, z = -1.43, p = 0.15), and the data provide strong support for the 
null according to the BIC approximation to the Bayes Factor, ΔBIC = 7.5, PBIC (H0 | D) = 
0.977. Keeping faithful pairs intact reduces “Absent” productions (b = -0.89, se(b) = 
0.29, z = -3.06, p < 0.003), and Faithful Intact significantly improves model fit (χ2(1) = 
9.17, p = 0.002). Finally, training on Velar Palatalization results in fewer “Absent” errors 
(b = -0.81, se(b) = 0.29, z = -2.76, p < 0.006), and including Training Language 
significantly improves model fit (χ2(1) = 7.82, p = 0.005). 
In summary, keeping unfaithful pairs intact results in more consonant errors (driven 
by higher rates of “Stop+Palatal” and “Bizarre” errors), but fewer vowel errors (driven by 
fewer “New Vowel” errors). Keeping faithful pairs intact results in fewer consonant 
errors (driven by lower rates of “Stop+Palatal” errors) and fewer vowel errors (driven by 
                                                                                                                                            
 
53 Bizarre Vowel Errors ~ Unfaithful Intact + Faithful Intact + Training Language + (1 | Subject) + (1 | 
Singular) 
 
54 Absent Vowel Errors ~ Unfaithful Intact + Faithful Intact + Training Language + (1 | Subject) + (1 | 
Singular) 
lower rates of “Absent” vowel errors). Velar Palatalization results in fewer consonant 
errors (driven by lower rates of “Bizarre” and “Absent” errors) and fewer vowel errors 
(driven by lower rates of “Absent” errors). Bizarre, unattested patterns seem to reflect 
general confusion (“I guess it’s just random”), and silence seems to indicate profound 
indecision (“I have no idea what I am supposed to do”), which is evidently a larger 
problem when the change is obvious – where participants can more easily notice that 
something is changing but may struggle to figure out exactly what – and for participants 
receiving training on the larger change. There are many more vowel errors in None 
Obvious than the other trial orders, suggesting that exposure to randomly-ordered 
phonetically-unnatural patterns confuses participants, leading them either to assume any 
vowel can be used (white bars) or to give up on even trying (black bars).  
 Overall, the error distributions suggest that labial palatalization is more confusing and 
challenging for participants to learn, and that the lack of any training structure in the 
None Obvious condition is an obstacle to learning. 
5.2.2. Suffix choice 
5.2.2.1. Suffix frequency 
Participants generally learn that there are two suffix vowels; 86% use each suffix at 
least once, but there is a dramatic effect of trial order. Figure 5.4 shows the distribution of 
suffix choice probabilities for each trial order condition. It illustrates a large effect of 
contiguity on the system learned: When faithful (Not-To-Be-Palatalized) pairs are kept 
intact (0→{i;a}/C[not-TBP]_ such as hɛt~hɛta and naɪd~naɪdi, right column), participants use -i 
and -a at similar rates to the input probabilities and few subjects regularize (i.e. use one 
suffix 100% of the time). When faithful pairs are not intact (left column), participants 
strongly favor -a and often regularize. There is a weaker effect of unfaithful pair 
adjacency (To-Be-Palatalized, C[TBP]→tʃa such as bluk~blutʃa and fɛp~fɛtʃa), but when they 
are kept intact, use of -a increases (bottom row).   
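
The per-subject measures used in this section (regularization, and the 10% threshold applied in §5.2.2.2) can be sketched as follows. This is an illustrative Python sketch with hypothetical names, assuming the input has already been restricted to productions suffixed with -i or -a.

```python
from collections import Counter, defaultdict


def suffix_profiles(productions, min_share=0.10):
    """Summarize suffix use per subject.

    productions: iterable of (subject, suffix) tuples, with suffix "i" or "a"
    min_share: minimum share each suffix must exceed to count as using both
    """
    counts = defaultdict(Counter)
    for subject, suffix in productions:
        counts[subject][suffix] += 1

    profiles = {}
    for subject, c in counts.items():
        share_i = c["i"] / (c["i"] + c["a"])
        profiles[subject] = {
            "share_i": share_i,
            # Regularizers use one suffix 100% of the time.
            "regularizer": share_i in (0.0, 1.0),
            # Subjects who use each suffix more than min_share of the time.
            "uses_both": min(share_i, 1.0 - share_i) > min_share,
        }
    return profiles
```

A subject who produces -a on every trial is flagged as a regularizer, while one who produces -i on 1 of 20 trials is not a regularizer but still falls below the 10% threshold.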
 
 
Figure 5.4. Suffix choice probabilities across trial order conditions. In the ‘NoChange 
Obvious’ and ‘All Obvious’ conditions, word pairs exemplifying 0→{i;a}/C[not-TBP]_ were 
kept intact. In the ‘Change Obvious’ and ‘All Obvious’ conditions, word pairs 
exemplifying C[TBP]→tʃa were kept intact. 
 
 
Table 5.355 shows that keeping Not-To-Be-Palatalized pairs (faithful forms, 
0→{a;i}/C[not-TBP]_) intact significantly increases the use of -i (the effect is significant in 
model comparisons, χ2(1) = 17.20, p < 0.001). There is also a significant effect of To-Be-
Palatalized, which favors -a, especially when To-Be-Palatalized pairs are kept intact in 
training. Including the interaction significantly improves model fit (χ2(1) = 7.45, p = 
0.006), as does To-Be-Palatalized alone (χ2(1) = 23.45, p < 0.001). In sum, participants 
                                                
55 Plural Vowel ~ No Change Order + Change Order * To-Be-Palatalized + (1 + To-Be-Palatalized | 
Subject) + (0 + No Change Order | Base) + (0 + Change Order | Base), restricted to non-palatal-final 
singular stems 
learn that -a is favored by the consonants it palatalizes, and they learn it better when pairs 
exemplifying -a combined with palatalization are intact. 
 
Table 5.3. Generalized linear mixed-effects model output for suffix choice by trial 
adjacency and To-Be-Palatalized.  
Predictor                                                     b        se(b)    z        p
(Intercept)                                                  -1.997    0.2382   -8.385   <.0001    ***
Intact Not-To-Be-Palatalized Pairs                            1.1722   0.2793    4.196   <.0001    ***
Intact To-Be-Palatalized Pairs                               -0.2887   0.2827   -1.021   0.30712
To-Be-Palatalized = yes                                      -0.3191   0.1354   -2.357   0.01842   *
Adjacent To-Be-Palatalized Pairs x To-Be-Palatalized = yes   -0.5391   0.1927   -2.798   0.00514   **
* Significance level of 0.05
** Significance level of 0.01
*** Significance level of 0.001
 
5.2.2.2. Consonant choice effect 
Figure 5.5 shows the factors that influence the choice of suffix vowel using a 
conditional inference tree. The effect of consonant choice (whether the stem is To-Be-
Palatalized) on suffix choice is weaker than in the input language (and completely absent 
for subjects who regularized). The 103 subjects who use each suffix more than 10% of 
the time show sensitivity to the To-Be-Palatalized status of the consonant, with To-Be-
Palatalized disfavoring -i but still using it 26% of the time (vs. 0% in the input) and Not-
To-Be-Palatalized stems using -i 36% of the time (vs. 50% in the input). The effect of 
To-Be-Palatalized⁵⁶ is significant in the mixed-effects regression model (b = -0.28, se(b) 
= 0.14, z = -2.06, p < 0.04; including To-Be-Palatalized improves model fit, χ2(1) = 
21.62, p < 0.001), as is the interaction between To-Be-Palatalized and whether the 
                                                
56 Plural Vowel ~ Change Order * To-Be-Palatalized + No Change Order + (1 + To-Be-Palatalized | 
Subject) + (1 + Change Order + No Change Order | Base), data restricted to subjects who used each suffix 
>10% of the time and stems ending in a non-palatal consonant. 
unfaithful pairs are kept intact (b = -0.55, se(b) = 0.21, z = -2.66, p = 0.008; including the 
interaction significantly improves model fit, χ2(1) = 6.82, p = 0.009).  
 
Figure 5.5. Conditional inference tree of the factors that influence suffix vowel choice. 
 
 
5.2.2.3. Other effects 
The strongest effect is whether faithful pairs are kept intact (NoChangeOrder, Node 
1): doing so increases the frequency of -i, though it never becomes the majority variant. 
Keeping CTBP~tʃa pairs intact is also a significant effect (Nodes 3 and 9, ChangeOrder), 
with consonants showing increased use of -a in the All Obvious (Node 13) and Change 
Obvious (Node 5) conditions, where To-Be-Palatalized pairs are kept intact. The effect is 
especially pronounced for To-Be-Palatalized consonants (Nodes 6 and 15). Finally, 
training on velar palatalization decreases the use of -i compared to training on labial 
palatalization (Nodes 2 and 10, TrainPlace). 
 
 
5.2.3. Suffix content 
5.2.3.1. Trial order effects 
Figure 5.6 shows the production probabilities of palatalization across subjects, and as 
in Figure 5.4, there is a clear effect of trial order. Keeping pairs exemplifying 
palatalization intact leads to more palatalization⁵⁷ (Change Obvious and All Obvious, 
bottom row) (b = 4.50, se(b) = 0.69, z = 6.52, p < 0.001; including Unfaithful Intact 
significantly improves model fit, χ2(1) = 47.38, p < 0.001). There is also a significant 
effect of whether faithful pairs are kept intact (NoChange Obvious and All Obvious, right 
column), with less palatalization when they are (b = -1.30, se(b) = 0.64, z = -2.03, p = 
0.04; Faithful Intact significantly improves model fit, χ2(1) = 4.17, p = 0.04). The 
interaction is not significant (z < 1, p = 0.73, ns), and according to the BIC approximation 
to the Bayes Factor, the data provide strong support for the null (ΔBIC = 8.1, PBIC (H0 | 
D) = 0.983).  
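The posterior probability reported here can be recovered from ΔBIC. The following is a minimal sketch, assuming the standard BIC approximation to the Bayes factor with equal prior odds, P(H0 | D) = exp(ΔBIC/2) / (1 + exp(ΔBIC/2)), where ΔBIC is taken to be the BIC of the model with the interaction minus the BIC of the model without it. The function name is ours and is only illustrative.

```python
import math

def posterior_null_from_delta_bic(delta_bic: float) -> float:
    """Posterior probability of H0 under equal prior odds.

    delta_bic is assumed to be BIC(H1) - BIC(H0), so positive values
    favor the null model.
    """
    bayes_factor_01 = math.exp(delta_bic / 2.0)  # evidence for H0 relative to H1
    return bayes_factor_01 / (1.0 + bayes_factor_01)

# Delta BIC of 8.1, as reported for the interaction term
print(round(posterior_null_from_delta_bic(8.1), 3))  # 0.983
```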
                                                
57 Palatalization ~ Change Order + No Change Order + (1 | Subject) + (1 | Base), restricted to To-Be-
Palatalized consonants before -a 
 
 
Figure 5.6. Palatalization rates across conditions of the appropriate consonants in the 
appropriate context (a To-Be-Palatalized consonant before -a). In the “Change Obvious” 
and “All Obvious” conditions, word pairs exemplifying palatalization were kept intact in 
training. In the “NoChange Obvious” and “All Obvious” conditions, word pairs 
exemplifying faithful mappings (lack of palatalization) were kept intact. 
 
5.2.3.2. Suffix vowel effects 
Participants that use each suffix more than 10% of the time show a significant effect 
of plural vowel on the probability of palatalization. In line with the training data, 
palatalization is favored by -a and disfavored by -i⁵⁸ (b = -1.35, se(b) = 0.27, z = -4.94, p 
< 0.001; Plural Vowel significantly improves model fit, χ2(1) = 29.83, p < 0.001). There 
is no interaction between final vowel and trial order, which is to be expected: The suffix 
vowel occurs in the same form as the palatalization, so the dependency between them 
should be equally apparent regardless of trial ordering. (However, the interaction between 
final vowel and adjacency of unfaithful (To-Be-Palatalized) pairs approaches 
                                                
58 Palatalization ~ Change Order + No Change Order + Plural Vowel + (1 + Plural Vowel | Subject) + (1 + 
Change Order + No Change Order + Plural Vowel | Base), restricted to subjects who palatalized. 
significance, z = -1.87, p = 0.06.) These effects are shown in the conditional inference 
tree in Figure 5.7. 
 
 
Figure 5.7. Conditional inference tree of the effects of vowel suffix and trial order on the 
probability of palatalization (dark) for subjects who used each suffix vowel >10% of the 
time. 
 
5.2.4. Input consonant 
Finally, participants need to learn not only that palatalization exists, and that it is 
triggered by -a, but also that it only affects certain consonants (namely the labial [p] and 
[b] in the Labial Palatalization condition and the velar [k] and [g] in the Velar 
Palatalization condition). In previous work using the None Obvious trial order, this 
proved difficult to learn, with subjects tending to overgeneralize velar palatalization to 
the alveolar stops and labial palatalization to both alveolar and velar stops, when they 
were able to learn to palatalize at all (Smolek & Kapatsinski, 2018; Stave et al., 2013; see 
also Chapter IV). We were curious here about whether participants could learn to restrict 
palatalization to the appropriate context if singular-plural mappings were made more 
obvious by changing the trial order. The results below are for the -a suffix, which triggers 
palatalization in the training and strongly favors palatalization at test, but there are no 
crossover interactions between vowel and any of the predictors we discuss, and the 
results hold if both vowel contexts are included. 
5.2.4.1. Trial order effects on overgeneralization of palatalization 
Figure 5.8 shows that faithful and unfaithful mappings both benefit from keeping the 
pairs exemplifying them intact, and keeping both types of mappings intact is necessary 
for participants to learn when they should be unfaithful to the input. To test for the effects 
of trial order on palatalization rates and the probability of overgeneralization to Not-To-
Be-Palatalized consonants across languages, we restricted the analysis to To-Be-
Palatalized consonants and the Not-To-Be-Palatalized consonants that palatalization is 
overgeneralized to in the None Obvious order. In other words, we excluded labial-final 
stems from the set of Not-To-Be-Palatalized consonants for Velar Palatalization, because 
subjects do not overgeneralize to them in the None Obvious condition, so there is no way for 
adjacency to help (but the results are the same if they are included). 
 
Figure 5.8. An overview of the results on the probability of palatalization (before -a).  
 
ChangeOrder (Intact Unfaithful) has the strongest effect on the probability of 
palatalization, increasing its incidence: Palatalization is more productive when clearly 
visible in training. When word pairs exemplifying palatalization are not kept intact (the 
subtree below Node 2), participants trained on velar palatalization palatalize alveolars 
and velars equally. Participants trained on labial palatalization palatalize everything 
equally. These results replicate Chapter IV: Participants eliminate saltatory alternation 
patterns (see also White, 2013, 2014). When only the word pairs exemplifying 
palatalization are kept intact (Nodes 9-10), participants palatalize all consonants equally, 
even when trained on velar palatalization. That is, with intact To-Be-Palatalized and non-
intact Not-To-Be-Palatalized pairs (Change Obvious condition), participants palatalize 
both types of consonants at a high rate. However, when all word pairs are kept intact (All 
Obvious, the subtree below Node 11), participants are able to learn to palatalize only the 
consonants they are trained to palatalize. That is, keeping Not-To-Be-Palatalized pairs 
intact reduces their palatalization rates. 
The results are shown in Table 5.4⁵⁹. In the baseline None Obvious condition, 
participants palatalize Not-To-Be-Palatalized stems as much as To-Be-Palatalized, 
reflecting a bias in favor of palatalizing alveolars and against palatalizing labials (z < 1, p 
= 0.33, ns), which replicates prior work (Smolek & Kapatsinski, 2018; Stave et al., 2013; 
Chapter IV). Keeping pairs exemplifying unfaithful mappings intact increases 
palatalization overall (χ2(1) = 44.20, p < 0.001), and the effect is stronger for To-Be-
Palatalized than Not-To-Be-Palatalized consonants (χ2(1) = 24.09, p < 0.001). Keeping 
pairs exemplifying faithful mappings intact reduces palatalization rates (χ2(1) = 6.62, p = 
0.01), especially for Not-To-Be-Palatalized consonants (χ2(1) = 18.03, p < 0.001). 
Because of the interactions, keeping both types of mappings intact reduces 
overgeneralization: Participants learn to palatalize only the To-Be-Palatalized 
consonants. 
 
Table 5.4. The influence of trial order on palatalization rates. 
Predictor                                                        b        se(b)    z        p
(Intercept)                                                     -3.0428   0.3954   -7.695   <.00001   ***
Intact To-Be-Palatalized Pairs                                   3.4397   0.462     7.445   <.00001   ***
To-Be-Palatalized = no                                           0.134    0.1377    0.973   .33
Intact Not-To-Be-Palatalized Pairs                              -0.8616   0.4515   -1.908   .056
Intact To-Be-Palatalized Pairs x To-Be-Palatalized = no         -0.7212   0.147    -4.907   <.00001   ***
Intact Not-To-Be-Palatalized Pairs x To-Be-Palatalized = no     -0.5945   0.1401   -4.242   .00002    ***
*** Significance level of 0.001
 
                                                
59 Palatalization ~ Change Order * To-Be-Palatalized + No Change Order * To Be Palatalized + (1 | 
Subject) + (1 | Singular), restricted to non-palatal-final singulars before -a for Labial Palatalization subjects 
and non-palatal, non-labial-final singulars before -a for Velar Palatalization subjects. 
5.3. Summary 
Keeping word pairs that exemplify a given paradigmatic mapping adjacent in training helps 
learners acquire faithful and unfaithful mappings, stem changes triggered by particular 
affixes, and the affixes themselves. For example, many participants in the None Obvious and Change 
Obvious conditions never use -i, even though it occurs 25% of the time in training and 
adults tend to probability match in tasks like this (Harmon & Kapatsinski, 2017; Schwab 
et al., 2018), whereas participants in the All Obvious and NoChange Obvious conditions 
use -i at roughly the same rate as in training. The difference is that the latter groups 
encountered intact faithful pairs, where the addition of -i is clearly seen. Similarly, many 
participants in the None Obvious and NoChange Obvious conditions never palatalize, even though 
half of the input plurals contain palatalization (i.e. end in -tʃa), unlike participants in the All 
Obvious and Change Obvious conditions, who have seen intact pairs like blup~blutʃa. 
Contiguity makes the mapping more obvious and helps the learner notice it. 
However, making the mapping obvious does not prevent overgeneralization to 
contexts not seen in training. Rather, overgeneralization of a mapping to a particular 
consonant is prevented by making the competing mapping obvious: Making p→tʃ 
obvious increases t→tʃ and k→tʃ, and it is only when t→t and k→k are also made 
obvious that palatalization is properly restricted to [p].  
In sum, temporal adjacency helps acquisition of both faithful and unfaithful 
mappings, and both are necessary to learn when each should apply. These results 
therefore require participants to generalize over pairs of corresponding forms to learn 
both faithful and unfaithful mappings. Unfaithful mappings cannot be attributed entirely 
to first-order / product-oriented schemas acquired by generalizing over forms within a 
paradigm cell. Faithful mappings cannot be attributed entirely to a default “do nothing” 
rule or top-ranked output-output faithfulness constraints. 
In Chapter VI, we discuss a discriminative model that captures the results of 
Experiment 3, suggesting that temporal contiguity of related forms allows learners to use 
phonological characteristics of a singular form to predict the characteristics of the 
corresponding plural form. This allows participants to form singular-to-plural 
paradigmatic mappings that encode what kinds of singulars are mapped onto …tʃa in the 
plural. Implications for linguistic theory and learnability of morphology are discussed 
further in §6.4, and comparison of the results of Experiments and 3 is presented in 
Chapter VII. 
 
 
 
 
 
 
 
 
 
 
 
 
 
CHAPTER VI 
COMPUTATIONAL MODEL: EFFECTS OF ADJACENCY ON LEARNABILITY OF 
PALATALIZATION BEFORE -a  
Portions of this chapter were taken from: 
 Smolek, A. & Kapatsinski, V. (2019). Syntagmatic paradigms: Learning 
correspondence from contiguity. Manuscript submitted for publication. 
In Chapter V, we discussed the results of an experiment on the learnability of labial 
and velar palatalization before -a, varying whether pairs exemplifying faithful and 
unfaithful mappings were kept intact in training. We found that obvious examples of -i 
suffixation (present when faithful pairs, like blut~bluti, were kept intact) increase the 
usage of -i in production, that adjacency helps both faithful and unfaithful mappings, and 
that participants are more likely to produce palatalization before -a than before -i after training. 
In Chapter VI, we explore these effects through computational modeling in order to 
discover the potential learning mechanisms behind them. We use a custom function in R 
(R Core Team, 2019) based on the original code implementing the Rescorla-Wagner 
(1972) discriminative learning model, obtained from R.H. Baayen. All code and data are 
available at https://app.box.com/s/bd8jhx4g5m7bvlmxb8i4jgtjfjo2x111. 
6.1. Discriminative learning 
Research over the last 30 years has explored the possibility that domain-general 
learning mechanisms could explain much of language acquisition. Morphology is a 
particularly fruitful area for exploration due to the amount of intricate and language-
specific structure; it is also the only domain where paradigmatic mappings are clearly 
necessary (Kapatsinski, 2018a, 2018b). This line of research began with Rumelhart & McClelland (1986), 
and subsequent work includes Albright & Hayes (2003), Baayen et al. (2011), Ramscar et 
al. (2013), and Westermann & Ruh (2012). We focus here on discriminative learning. 
Discriminative learning was originally applied to classical conditioning in animals. 
Here (as elsewhere, cf. Baayen et al., 2011; Caballero & Kapatsinski, 2019; Kapatsinski, 
2017b, 2018b; Ramscar et al., 2013), we apply it to morphological learning. The 
fundamental structure of discriminative learning is that it is predictive, i.e. the learner 
aims to make predictions, based on environmental cues (e.g. Pavlov’s (1927) dogs, who 
used lights and sounds to predict whether food would be available). In the Rescorla-
Wagner model, the strength of association from a particular cue to an outcome tracks the 
statistic Δp, which is the probability of the outcome when the cue occurs minus the 
probability of the outcome when the cue does not occur (Ellis, 2006; Harmon & 
Kapatsinski, 2017). The strongest cues are those that greatly increase the outcome’s 
likelihood of occurrence, in other words, the ones that best discriminate between the 
contexts where an outcome occurs and where it does not. 
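The Δp statistic described above can be illustrated with a small worked computation. This is a minimal sketch; the function name and the counts are hypothetical and are not taken from the experimental data.

```python
def delta_p(cue_outcome, cue_no_outcome, no_cue_outcome, no_cue_no_outcome):
    """Delta-p = P(outcome | cue) - P(outcome | no cue), from a 2x2 table of counts."""
    p_outcome_given_cue = cue_outcome / (cue_outcome + cue_no_outcome)
    p_outcome_given_no_cue = no_cue_outcome / (no_cue_outcome + no_cue_no_outcome)
    return p_outcome_given_cue - p_outcome_given_no_cue

# Hypothetical counts: the outcome follows the cue on 8 of 10 trials,
# and occurs on 2 of 10 trials without the cue.
print(round(delta_p(8, 2, 2, 8), 2))  # 0.6
```

A cue whose presence makes no difference to the outcome's probability has Δp = 0, and a cue that lowers that probability has a negative Δp.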
On a trial-by-trial basis, the model increases the weights of cues that are present when (or 
before) the outcome unexpectedly occurs; the more surprising the outcome, the 
greater the increase in cue weights. Cue weights are decreased when the outcome is 
unexpectedly absent in the presence of the cues. Only weights of present cues are 
updated. Equation 1 applies for present outcomes and equation 2 for absent; w is the cue 
weight (the weight of the connection from cue to outcome), and a is the activation of the 
outcome (the sum of the present cue weights, equivalent to the model’s expectation 
whether the outcome will (not) occur at time t). When the outcome does (not) occur, the 
model updates its expectations for the future. The weight w of the cue-outcome 
connection at time t is incremented by an amount proportional to learning rate Λ (set to 
1), multiplied by the difference between the correct expectation (1 for present, 0 for 
absent) and the model’s actual expectation given current weights. If the model’s 
prediction is accurate, there is no need to update the weights. Whenever the prediction is 
inaccurate, the weights must be updated, and the more confident it is in its erroneous 
prediction, the greater the adjustment to the weights. This makes the model error-driven. 
(1) $w_{C \to O}^{t+1} = w_{C \to O}^{t} + (1 - a_{O}^{t}) \times \Lambda$
(2) $w_{C \to O}^{t+1} = w_{C \to O}^{t} + (0 - a_{O}^{t}) \times \Lambda$
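Equations (1) and (2) can be read as a single update step applied on each trial. The following is a minimal sketch in Python rather than the R code used for the reported simulations; the function name, the dictionary representation of weights, and the learning rate of 0.1 in the usage example are illustrative assumptions.

```python
from collections import defaultdict

def rescorla_wagner_update(weights, present_cues, present_outcomes,
                           all_outcomes, learning_rate):
    """Apply one trial of error-driven learning.

    weights maps (cue, outcome) pairs to connection weights.
    Only the weights of cues present on the trial are updated.
    """
    for outcome in all_outcomes:
        # Activation: summed weights from the present cues to this outcome
        activation = sum(weights[(cue, outcome)] for cue in present_cues)
        # Target is 1 if the outcome occurred (eq. 1), 0 if it did not (eq. 2)
        target = 1.0 if outcome in present_outcomes else 0.0
        delta = (target - activation) * learning_rate
        for cue in present_cues:
            weights[(cue, outcome)] += delta
    return weights

weights = defaultdict(float)
outcomes = {"t", "tʃ", "a", "i"}
# Hypothetical trial: cues PL and 't' are followed by outcomes 't' and 'a'
rescorla_wagner_update(weights, {"PL", "t"}, {"t", "a"}, outcomes, learning_rate=0.1)
# Repeating the trial yields a smaller increment, because the outcome is less surprising
rescorla_wagner_update(weights, {"PL", "t"}, {"t", "a"}, outcomes, learning_rate=0.1)
```

Because the prediction error shrinks as the summed activation approaches the target, repeated identical trials produce diminishing weight changes.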
6.2. Model design 
6.2.1. Relevant cues and outcomes 
To capture the plural production results of Experiment 3, the relevant cues are the 
meaning of the form (PL) and the stem-final consonant of the base (singular) form, and the 
relevant outcomes are the stem-final consonant of the plural form and the suffix vowel 
(e.g. blutSG~blutaPL yields cues PL and ‘t’ and outcomes ‘t’ and ‘a’). 
6.2.2. Capturing implicational hierarchies 
To capture the effects of Change Order (whether unfaithful pairs were kept intact) on 
overgeneralization, we include the assumption that encountering k→tʃ supports t→tʃ but 
not p→tʃ, and encountering p→tʃ supports t→tʃ and k→tʃ. This is a common assumption 
for capturing implicational hierarchies in patterns of overgeneralization (Hayes & White, 
2015; Steriade, 2008). To capture this representationally, we assume that an example like 
blukSG~blutʃaPL contains the cues ‘t’ and ‘k’ and PL, and an example like blupSG~blutʃaPL 
contains ‘k’ and ‘t’ and ‘p’ and PL. See §6.3.2.1 and §6.3.2.2 for further discussion. 
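The cue sets implied by this representational assumption can be sketched as follows. This is one reading of the description above, not the code used for the reported model: the function name and the ordered list are our own, and we assume that the expansion applies only to unfaithful (palatalizing) examples.

```python
# Ordered from least to most reluctant to palatalize (an assumed encoding)
HIERARCHY = ["t", "k", "p"]

def training_cues(singular_final, plural_final):
    """Return the cues present for an intact singular~plural training pair."""
    cues = {"PL"}
    if plural_final == "tʃ" and singular_final in HIERARCHY:
        # Unfaithful mapping: the consonant itself plus every consonant
        # that is lower on the hierarchy
        position = HIERARCHY.index(singular_final)
        cues.update(HIERARCHY[: position + 1])
    else:
        # Faithful mapping: only the consonant itself
        cues.add(singular_final)
    return cues

print(sorted(training_cues("k", "tʃ")))  # ['PL', 'k', 't']
print(sorted(training_cues("p", "tʃ")))  # ['PL', 'k', 'p', 't']
print(sorted(training_cues("t", "t")))   # ['PL', 't']
```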
 
6.2.3. Trial order effects on cue availability 
The trial order manipulations of Experiment 3 influence the availability of cues for 
predicting the phonological shape of the corresponding plural form when it is 
encountered. When related singulars and plurals are adjacent, the stem-final consonant is 
available to predict the plural form. When they are not adjacent, only the plural meaning 
is available. For example, when blutʃaPL is preceded by anything other than the singular, 
only PL is available to predict ‘tʃ’ and ‘a’, so only PL gets updated. This means that in 
the None Obvious condition, singular-final weights are never updated and reflect the 
prior beliefs that subjects bring to the experiment. 
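The consequence of adjacency for weight updating can be sketched as follows. This is a simplified illustration with hypothetical function names and a learning rate of 0.1 chosen for the example; it handles only a single present outcome and ignores the hierarchy-based cue expansion of §6.2.2 for brevity.

```python
def available_cues(singular_final, singular_is_adjacent):
    """Cues available for prediction when a plural form is encountered."""
    cues = {"PL"}                     # the plural meaning is always available
    if singular_is_adjacent:
        cues.add(singular_final)      # available only if the singular just occurred
    return cues

def update_for_outcome(weights, cues, outcome, learning_rate=0.1):
    """Strengthen the available cues for an outcome that occurred."""
    activation = sum(weights.get((cue, outcome), 0.0) for cue in cues)
    for cue in cues:
        old = weights.get((cue, outcome), 0.0)
        weights[(cue, outcome)] = old + (1.0 - activation) * learning_rate
    return weights

weights = {("k", "tʃ"): 0.0, ("PL", "tʃ"): 0.0}

# Plural not preceded by its singular: only PL is updated
update_for_outcome(weights, available_cues("k", False), "tʃ")
k_weight_after_nonadjacent = weights[("k", "tʃ")]   # still at its prior value

# Plural preceded by its singular: both PL and 'k' are updated
update_for_outcome(weights, available_cues("k", True), "tʃ")
```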
6.2.4. Prior beliefs 
Participants’ prior beliefs are represented by the initial connection weights in the 
model. They include that subjects are least reluctant to palatalize alveolars and most 
reluctant to palatalize labials, with velars falling in between. This may be due in part to 
English palatalization of alveolars and velars in word pairs like create~creature and 
legal~legislative, and alveolar palatalization in frequent phrases like would you. Although 
the word-internal alternations only occur in a few words and are of doubtful productivity, 
and the cross-boundary alternations may be a different process (Zsiga, 1995), the 
presence of these processes in English may make participants more willing to palatalize 
alveolars and velars. The bias could instead (or additionally) be due to cross-linguistic 
biases against large changes, whether the measure of magnitude is perceptual (Hayes & 
White, 2015; Steriade, 2008) or articulatory (Smolek & Kapatsinski, 2018; Chapter IV). 
We remain agnostic about the source of the bias for the present work and encode it by 
mapping alveolar, velar, and labial stops onto themselves, pre-training the model with 2 
examples of t→t (weight of 0.19), 4 examples of k→k (weight of 0.34), and 6 examples 
of p→p (weight of 0.47). This is not intended to be a representation of English, merely a 
way of ‘telling’ the model to have a strong expectation of [p] remaining [p] and a weak 
expectation of [t] remaining [t]. 
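The relationship between the number of pre-training examples and the resulting weights can be illustrated with a short computation. This is a sketch under two assumptions of ours: that each pre-training example has the consonant as its only cue, and that each update uses a step size of 0.1. Under these assumptions, the error-driven update reproduces the weights listed above.

```python
def pretrained_weight(n_examples, learning_rate=0.1):
    """Weight of a faithful consonant mapping after n identical examples.

    Assumes a single cue per example and an initial weight of 0.
    The step size of 0.1 is an assumption made for this illustration.
    """
    weight = 0.0
    for _ in range(n_examples):
        weight += (1.0 - weight) * learning_rate  # error-driven increment
    return weight

for consonant, n_examples in [("t", 2), ("k", 4), ("p", 6)]:
    print(consonant, round(pretrained_weight(n_examples), 2))
# t 0.19
# k 0.34
# p 0.47
```

Under these assumptions, the weight after n examples is 1 − 0.9^n, so each additional example adds less than the one before it.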
We employ a traditional two-stage architecture, with morphology preceding 
phonology, which is widely assumed in generative linguistics (see Scheer, 2011, for a 
review). Recent theoretical and experimental work has questioned this assumption and 
instead proposed that a suffix and the stem change thought to be triggered by it are in fact 
chosen in parallel⁶⁰ (Bybee, 2001; Kapatsinski, 2010; Prince & Smolensky, 1993/2004).  
The modeling results below are consistent with suffix choice initiating first, or suffix 
and stem change initiating simultaneously. However, in order for the suffix and the stem 
change to not be independent (which they should not be, given the relationship in the 
training data between suffix vowel and presence of palatalization), one process needs to 
complete first so that it can condition the other. Any choice is the outcome of a process 
that takes time, with harder decisions requiring more time (Usher & McClelland, 2001). 
Our results are consistent with suffix choice requiring less time than palatalization. 
If the stem allomorph were chosen first, then the stem-final consonant would be 
available to select the suffix vowel, regardless of the proximity of the singular form. 
However, suffix vowel choice is predictive of palatalization regardless of trial order, but 
trial order is predictive of suffix choice; when the stem-final consonant is selected first, 
the model predicts no influence of trial order on suffix choice, beyond its effect on the 
                                                
60 OT and Harmonic Grammar consider competing output forms, which obviates the need to order 
processes, but here we only generate one form, so that is not an option. 
choice of stem-final consonant. We suspect the suffix vowel is (usually) chosen first, 
perhaps because it is more salient in the final position. 
A third possibility we considered was to separate the suffix and stem-final consonant 
and treat them as bigrams (e.g. [tʃa], [ki]), but the dependency between vowel and 
consonant is stronger in the input than in subject productions. If participants were 
choosing from a set of bigrams, they would never produce unattested combinations like 
[ka] in the Velar Palatalization condition, but they do (quite often, in fact). 
Perhaps all three processing routes are used, and speakers choose based on how easy 
it is to select the suffix vs. the stem-final consonant (or change) on any given occasion 
(Kapatsinski, 2013; Westermann & Ruh, 2012). Selection of stem-final consonant may be 
faster than suffix choice when the plural and the singular have the same stem allomorph 
(i.e. the mapping is faithful), but here, choosing the suffix before the stem 100% of the 
time captures all of the effects we see in the experimental data. 
Other criteria for model selection could lead to differing conclusions: The best stem-
before-vowel model fits the palatalization probability across the cells of the experimental 
design better (pseudo-R2 = 86% vs. 96%), but this ordering fails to capture the 
dependency between trial order and vowel choice (§5.2.1). It is possible that alternating 
between routes, dependent on context, could capture the significant patterns without 
sacrificing fit. Since we were only interested in explaining the mechanisms behind trial 
order effects rather than the structure of the grammar, the latter must be explored in 
future work. 
 
 
6.2.5. Linking hypothesis 
The activation levels must be connected to the observable dependent variable of 
production probability, which necessitates a theory of the task participants perform in the 
experiment. We follow the Luce choice rule in (3) (Luce, 1959), where the probability of 
choosing response i tracks the ratio of the activation of i and the activation of all response 
choices. This captures the probability matching seen in open-set tasks like production, 
and the equalizing/irregularization behavior seen in closed-set forced choice tasks 
(Harmon & Kapatsinski, 2017; Luce & Pisoni, 1998).  
(3) $p(i) = \dfrac{a_{i}}{\sum_{j} a_{j}}$
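The Luce choice rule in (3) amounts to normalizing activations so that they sum to one. The following is a minimal sketch; the function name and the activation values are hypothetical, and the sketch assumes that the summed activation is positive.

```python
def luce_choice(activations):
    """Convert response activations into choice probabilities.

    activations maps each response option to its activation.
    """
    total = sum(activations.values())
    if total <= 0:
        raise ValueError("summed activation must be positive")
    return {response: a / total for response, a in activations.items()}

# Hypothetical activations for the two suffix vowels
probabilities = luce_choice({"a": 0.5, "i": 0.25})
for response, p in probabilities.items():
    print(response, round(p, 2))
```

Because the probabilities are proportional to the activations, a response with twice the activation of its competitor is chosen twice as often, which corresponds to probability matching rather than maximizing.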
6.3. Modeling results 
As discussed above (§6.2.1), the baseline model’s cues include the stem-final 
consonant in the singular and the plural meaning, and the outcomes include the stem-final 
consonant in the plural and the suffix vowel. When the baseline model’s expectations 
differ systematically from participants’ behavior, we modify the model to better reflect 
the observed results. The modifications are additional cues and outcomes (§6.3.2.2.2) or 
adjustments to connection weights (§6.3.1.2, §6.3.2.2.2), discussed in detail below. 
6.3.1. Suffix vowel choice 
6.3.1.1. Baseline model results 
As in Chapter V, we start with an exploration of the effects on suffix vowel choice. 
We compare the observed values (based on the participant data from Experiment 3) to the 
expected values (based on the model). As shown in Figure 6.1, the base model captures 
49% of the variance. In particular, it captures that -a is disfavored by Not-To-Be-
Palatalized consonants, and that consonants have a weaker effect on subject productions 
than is present in the input data. 
 
Figure 6.1. Expected (x-axis) vs. observed (y-axis) vowel choice probabilities, using 
unaltered probabilities from the model. Each dot corresponds to one cell in Table 6.1. 
 
Table 6.1. Expected (model) / observed (experiment) production probabilities for the 
palatalizing suffix -a (vs. -i) across training language (Velar vs. Labial), place of 
articulation of the consonant at the end of the singular form, and trial order. 
Velar Language    TBP   None Obvious   NoChange Obvious   Change Obvious   All Obvious
Alveolar          -     66% / 80%      59% / 68%          64% / 84%        58% / 71%
Velar             +     66% / 81%      66% / 76%          77% / 96%        77% / 81%
Labial            -     66% / 79%      58% / 72%          64% / 84%        56% / 70%
Labial Language
Alveolar          -     66% / 74%      59% / 63%          64% / 74%        58% / 76%
Velar             -     66% / 78%      58% / 61%          64% / 74%        56% / 68%
Labial            +     66% / 80%      66% / 66%          77% / 78%        78% / 85%
 
As shown in Table 6.1, the model captures the significant effects of trial order on 
suffix choice probabilities. Making -a addition to To-Be-Palatalized stems more obvious 
increases the use of -a with To-Be-Palatalized consonants: For velars in the Velar 
Palatalization condition and labials in the Labial Palatalization condition, the Change 
Obvious and All Obvious conditions pattern alike and separate from None Obvious and 
NoChange Obvious. It is also able to capture that making -i addition to Not-To-Be-
Palatalized stems obvious increases its usage with Not-To-Be-Palatalized consonants: For 
labials and alveolars in the Velar Palatalization condition and alveolars and velars in the 
Labial Palatalization condition, NoChange Obvious and All Obvious pattern alike and 
separately from None Obvious and Change Obvious. 
6.3.1.2. Shortcomings and modifications 
The model does fall short in some important ways. Firstly, it underpredicts the use of 
-a overall; the Luce choice rule predicts probability matching, but subject responses fall 
between probability matching and maximizing (Harmon & Kapatsinski, 2017). Secondly, 
it fails to capture the difference in -a usage between training languages. Labial 
Palatalization subjects use -a less than Velar Palatalization subjects do, perhaps because 
it is associated with a difficult-to-produce alternation that they tend to avoid (a strategy 
employed by child learners as well; Schwartz & Leonard, 1982). Labial palatalization is 
very difficult to execute (Smolek & Kapatsinski, 2018; Stave et al., 2013; see also 
Chapter IV), so it is unsurprising that subjects avoid the suffix that triggers it. Fit 
improves if the production probability of -a is incremented by 4% in each language (to 
account for maximizing), and by an additional 6% in Velar Palatalization (to account for 
avoidance of a difficult change in Labial Palatalization). However, the effect of language 
is weak in the mixed-effects analysis of the human data and would not survive a 
Bonferroni correction (b = 0.69, se(b) = 0.29, z = 2.37, p = 0.018), and the confidence 
intervals for the observed-expected correlation coefficients for the original vs. adjusted 
models overlap substantially (0.42 ≤ r ≤ 0.86 vs. 0.54 ≤ r ≤ 0.90). 
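Confidence intervals for a correlation coefficient are commonly obtained with Fisher's z-transformation. The text does not state how the intervals above were computed, so the following is only a sketch under that assumption; it further assumes n = 24 (the number of cells in Table 6.1) and r = √0.49 for the unadjusted model, based on the 49% of variance reported in §6.3.1.1.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a correlation coefficient."""
    z = math.atanh(r)                 # Fisher z-transformation
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

# Assumed inputs: r from R^2 = 0.49, n = 24 design cells
lower, upper = fisher_ci(math.sqrt(0.49), 24)
print(round(lower, 2), round(upper, 2))  # approximately 0.41 0.86
```

Under these assumptions the interval is close to the one reported for the original model, and its width shows how little a correlation based on 24 cells constrains the comparison between the two models.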
 
6.3.2. Palatalization 
6.3.2.1. Baseline model results 
Tables 6.2 and 6.3 and Figure 6.2 show the probabilities of palatalization for the 
unmodified model across training language, training trial orders, place of articulation of 
stem-final consonant, and suffix. One of the most robust findings from this line of 
research is that labial palatalization overgeneralizes to alveolars and velars, and velar 
palatalization overgeneralizes to alveolars but not labials. Encoding the implicational 
hierarchy in the training data improves the model: Table 6.2 shows that in the None 
Obvious condition, Labial Palatalization participants palatalize alveolars and velars more 
than labials, and Velar Palatalization participants palatalize alveolars as often as velars, 
but do not palatalize labials (see also Smolek & Kapatsinski, 2018; Stave et al., 2013; 
Chapter IV). Only in the All Obvious condition do participants learn to palatalize only 
the To-Be-Palatalized consonants (at least, more than the Not-To-Be-Palatalized 
consonants that are articulatorily or perceptually closer to [tʃ]). We must assume that 
palatalization of [k] provides evidence for t→tʃ, and palatalization of [p] provides 
evidence for t→tʃ and k→tʃ, because if they only provide evidence for palatalizing 
themselves, the predicted palatalization rates of alveolars and velars in the Labial 
Palatalization condition are far too low (and the R2 is only 46%).  
 
 
 
 
 
Table 6.2. Expected / observed palatalization rates before -a, the palatalizing suffix in the 
input. Expectations from the unmodified Rescorla-Wagner model. Large deviations from 
human behavior are bold. 
Velar Language    TBP   None Obvious   NoChange Obvious   Change Obvious   All Obvious
Alveolar          -     22% / 13%      20% / 6%           28% / 47%        22% / 23%
Velar             +     22% / 17%      25% / 10%          42% / 56%        47% / 40%
Labial            -     15% / 5%       14% / 1%           14% / 32%        11% / 10%
Labial Language
Alveolar          -     22% / 23%      19% / 15%          31% / 49%        22% / 34%
Velar             -     19% / 16%      16% / 9%           29% / 47%        19% / 28%
Labial            +     17% / 13%      19% / 16%          38% / 57%        38% / 56%
 
Table 6.3. Expected / observed palatalization rates before -i, the suffix that does not 
trigger palatalization in the input. Expectations from the unmodified Rescorla-Wagner 
model. Large deviations from human behavior are bold. 
Velar Language    TBP   None Obvious   NoChange Obvious   Change Obvious   All Obvious
Alveolar          -     8% / 9%        5% / 2%            7% / 1%          13% / 5%
Velar             (+)   8% / 13%       10% / 2%           26% / 30%        26% / 9%
Labial            -     6% / 1%        3% / 0%            5% / 1%          4% / 3%
Labial Language
Alveolar          -     8% / 12%       6% / 2%            14% / 31%        7% / 13%
Velar             -     7% / 14%       4% / 1%            14% / 34%        6% / 11%
Labial            (+)   6% / 10%       8% / 2%            22% / 24%        22% / 19%
 
 
Figure 6.2. Expected (x-axis) vs. observed (y-axis) palatalization probabilities. Each dot 
corresponds to one cell in Tables 6.2-6.3. 
As shown in Figure 6.2, the model succeeds at fitting the proportions in Tables 6.2 
and 6.3, in particular capturing the difference between To-Be-Palatalized and Not-To-Be-
Palatalized consonants, the difference in palatalization rates between -i and -a suffixes, 
and the increased palatalization rates of Not-To-Be-Palatalized consonants in the Labial 
Palatalization condition. It is also quite successful at capturing the ordinal effects of trial 
order, with regard to the relationship between conditions and interactions between trial 
order and the (Not-)To-Be-Palatalized status of consonants.  
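
The kind of fit summarized in the scatterplot can be illustrated with a squared Pearson correlation over table cells. The sketch below uses only the Velar Language rows of Table 6.2. Treating R² as a squared Pearson correlation is an assumption, since the text does not state how the reported values were computed, and the function name is ours.

```python
from math import sqrt

# (expected, observed) palatalization rates in percent,
# Velar Language rows of Table 6.2 (before -a).
cells = [
    (22, 13), (20, 6), (28, 47), (22, 23),   # alveolar
    (22, 17), (25, 10), (42, 56), (47, 40),  # velar
    (15, 5), (14, 1), (14, 32), (11, 10),    # labial
]

def r_squared(pairs):
    """Squared Pearson correlation between expected and observed values."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return (sxy / sqrt(sxx * syy)) ** 2

velar_r2 = r_squared(cells)
```

Because this covers twelve cells rather than all cells in Tables 6.2-6.3, the resulting value is not comparable to the R² values reported in the text.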
6.3.2.2. Shortcomings and modifications 
The model also differs systematically from human learners. It sometimes 
underestimates increases in palatalization of Not-To-Be-Palatalized stems in the Change 
Obvious conditions vs. the All Obvious and None Obvious conditions (bold non-italic in 
Tables 6.2 and 6.3). In particular, note the bottom rows of None Obvious and Change 
Obvious in the Velar Palatalization language in Table 6.2: Making velar palatalization 
obvious dramatically increases the rate of labial palatalization (from 5% to 32%), but it 
has no effect on model expectations (15% vs. 14%). Note also that this increase in 
palatalization rates is specific to the palatalizing suffix. The palatalization rates of Not-
To-Be-Palatalized consonants increase in Change Obvious before -a (Rows 1 and 3 in 
Table 6.2) but not -i (Rows 1 and 3 in Table 6.3). The model is correct about the pre-i 
context but not the pre-a. 
6.3.2.2.1. Perceptual contrast and chunking 
To capture the effect of obvious unfaithful alternations on palatalization rates of 
Not-To-Be-Palatalized consonants before -a, we propose that encountering bluk→blutʃa or 
blup→blutʃa boosts the association between [tʃ] and [a]. In other words, noticing that 
something changes into [tʃa] makes it easier for [a] to trigger a preceding [tʃ] in general:  
-a fuses with [tʃ], increasing the automaticity of palatalization before -a, which we 
implement as a boost with constant magnitude (λ). The perceptual contrast of [k] or [p] 
becoming [tʃa] makes the listener notice the [tʃa] and increment the association between 
[tʃ] and [a], resulting in [tʃa] being treated as a single chunk. Alternatively, the boost 
could be dependent on the surprisal value of [tʃa], such that the first encounter increases 
the association between [tʃ] and [a] substantially with subsequent encounters increasing it 
less. Regardless, the boost is in addition to the boost driven by the surprise at an 
unexpected [tʃ] from the Rescorla-Wagner model. This can also explain the cross-
linguistic process of suffixes fusing with the stem-final consonants that frequently 
precede them (and which are themselves usually the result of a stem change triggered by 
the suffix; Haspelmath, 1995). 
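
The two versions of the chunking boost described above can be contrasted in a short sketch. All names and numeric values here are hypothetical. In particular, `boost` stands in for the constant magnitude λ, and using one minus the current association strength as a proxy for surprisal is our simplification.

```python
def chunk_boost(strength, boost=0.05, surprisal_scaled=False, expectation=0.0):
    """Extra increment to the [tʃ]-[a] association on an obvious unfaithful trial."""
    if surprisal_scaled:
        # Alternative from the text: a larger boost when [tʃa] is less expected.
        return strength + boost * (1.0 - expectation)
    # Default from the text: a boost of constant magnitude.
    return strength + boost

def simulate(n_trials, surprisal_scaled):
    """Apply the boost on n_trials obvious trials; return final strength and steps."""
    strength, increments = 0.0, []
    for _ in range(n_trials):
        new = chunk_boost(strength, surprisal_scaled=surprisal_scaled,
                          expectation=strength)
        increments.append(new - strength)
        strength = new
    return strength, increments

constant_strength, constant_steps = simulate(5, surprisal_scaled=False)
scaled_strength, scaled_steps = simulate(5, surprisal_scaled=True)
```

In the surprisal-scaled variant the first encounter contributes the largest increment and later encounters contribute less, so the association grows more slowly than under the constant boost.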
6.3.2.2.2. Overgeneralization asymmetries in To-Be-Palatalized consonants 
Finally, we need to explain why To-Be-Palatalized consonants are palatalized less in 
the All Obvious condition vs. the Change Obvious condition for Velar Palatalization, but 
there is no difference between conditions for Labial Palatalization (bold italic in Tables 
6.2 and 6.3). We propose this difference arises because of overgeneralization of CopyFIN, 
the relevant outcome for faithful mappings, which is taken to apply to the smallest natural 
class that includes all copied segments; in other words, faithful mappings are produced 
via a constraint like “CopyFIN applies when the coda consonant is [X],” where [X] is the 
natural class including all segments subject to being copied. For Labial Palatalization, 
stem-final alveolars and velars are associated with CopyFIN, and they are both made using 
parts of the tongue that are not independently controlled until quite late in development 
(Gibbon, 1999), frequently pattern together (Christdas, 1988; Clements & Hume, 1995; 
Mielke, 2004), and have been argued to form the natural class [lingual] (Browman & 
Goldstein, 1989; Christdas, 1988; Clements & Hume, 1995). The overgeneralization of 
copying to velars in Velar Palatalization suggests that alveolars and labials do not form a 
natural class (Clements & Hume, 1995; though cf. Chomsky & Halle, 1968, and 
[anterior]), so any class that includes both must also include velars, and therefore CopyFIN 
spreads to the To-Be-Palatalized consonants in Velar Palatalization All Obvious. We can augment 
trials containing [k] and [t] with an additional [lingual] cue (i.e., t+i+lingual → t+copy) to 
capture that they are alike and separate from labials, which accounts for the lack of 
overgeneralization of copying in Labial Palatalization. However, the model does not 
make global inferences, even with the inclusion of [lingual], because it aims to 
distinguish how inputs behave differently (i.e., to discriminate among them), not what 
features they share. As such, we added a mechanism to increment the weight of CopyFIN 
by λ whenever the natural class of copied segments comprises all consonants in the 
language (i.e. in the Velar Palatalization language). This is a post-hoc hack intended to 
capture the Hebbian learning mechanism that wires co-occurring cues together, 
regardless of surprise, which slightly reduces the rate of palatalization of To-Be-
Palatalized consonants in the Velar All Obvious condition. The evidence is somewhat 
questionable, as the effect on the model fit is quite small, and eliminating the mechanism 
still achieves an R² of 80% so long as [lingual] is accessible. But without it, the model 
cannot reproduce the significant differences between palatalization rates of To-Be-
Palatalized consonants in the Velar All Obvious and Change Obvious conditions. See 
Tables 6.4 and 6.5 for the expected and observed values for the modified model and 
Figure 6.3 for the scatterplot after adjusting the model. 
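
The natural-class reasoning behind the CopyFIN adjustment can be sketched as follows. The three-consonant inventory, the class table, and the default value of `lam` are assumptions made for illustration. The class table deliberately contains no class grouping alveolars with labials to the exclusion of velars, in line with the interpretation argued for above.

```python
ALL_CONSONANTS = frozenset({"p", "t", "k"})

# Hypothetical class inventory over the three stem-final consonants.
NATURAL_CLASSES = {
    "labial": frozenset({"p"}),
    "alveolar": frozenset({"t"}),
    "velar": frozenset({"k"}),
    "lingual": frozenset({"t", "k"}),
    "consonant": ALL_CONSONANTS,
}

def smallest_natural_class(copied):
    """Name of the smallest class containing every copied segment."""
    candidates = [(len(members), name)
                  for name, members in NATURAL_CLASSES.items()
                  if copied <= members]
    return min(candidates)[1]

def copy_fin_weight(weight, copied, lam=0.1):
    """Add lam to the CopyFIN weight only if copying spans all consonants."""
    if NATURAL_CLASSES[smallest_natural_class(copied)] == ALL_CONSONANTS:
        return weight + lam
    return weight

labial_language_class = smallest_natural_class({"t", "k"})  # copied in Labial Palatalization
velar_language_class = smallest_natural_class({"p", "t"})   # copied in Velar Palatalization
```

Only the Velar Palatalization language triggers the increment in this sketch, because its copied segments share no class smaller than the full consonant set.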
 
Table 6.4. Expected / observed palatalization rates before -a, the palatalizing suffix in the 
input. Expectations from the modified Rescorla-Wagner model. 
Velar Language   TBP   None Obvious   NoChange Obvious   Change Obvious   All Obvious
Alveolar         -     21% / 13%      15% / 6%           36% / 47%        20% / 23%
Velar            +     21% / 17%      18% / 10%          49% / 56%        31% / 40%
Labial           -     15% / 5%       10% / 1%           22% / 32%        14% / 10%
Labial Language
Alveolar         -     21% / 23%      12% / 15%          50% / 49%        32% / 34%
Velar            -     21% / 16%      14% / 9%           44% / 47%        29% / 28%
Labial           +     15% / 13%      9% / 16%           46% / 57%        41% / 56%
 
 
Table 6.5. Expected / observed palatalization rates before -i, the suffix that does not 
trigger palatalization in the input. Expectations from the modified Rescorla-Wagner 
model.  
Velar Language   TBP   None Obvious   NoChange Obvious   Change Obvious   All Obvious
Alveolar         -     8% / 9%        4% / 2%            6% / 1%          1% / 5%
Velar            (+)   8% / 13%       8% / 2%            20% / 30%        12% / 9%
Labial           -     6% / 1%        2% / 0%            1% / 1%          0% / 3%
Labial Language
Alveolar         -     8% / 12%       6% / 2%            27% / 31%        8% / 13%
Velar            -     7% / 14%       4% / 1%            22% / 34%        6% / 11%
Labial           (+)   6% / 10%       8% / 2%            26% / 24%        20% / 19%
 
 
Figure 6.3. Expected (x-axis) vs. observed (y-axis) palatalization probabilities after 
model adjustments. Each dot corresponds to one cell in Tables 6.4-6.5. 
 
 
6.4. General discussion 
6.4.1. Implications for learning 
6.4.1.1. Discriminative framework 
To review, a simple discriminative learning model (Rescorla & Wagner, 1972) can 
capture trial order effects and the interaction of trial order with training language. These 
results are consistent with extensive prior work showing that language learning can be 
captured using domain-general associative learning models originally developed for 
conditioning experiments (Arnold et al., 2017; Arnon & Ramscar, 2012; Baayen et al., 
2011, 2016; Kapatsinski & Harmon, 2017; Kruschke, 1992; Lim et al., 2014; McMurray 
et al., 2012; Mirman et al., 2006; Olejarczuk et al., 2018; Ramscar et al., 2010, 2013; 
Ramscar & Yarlett, 2007; Rumelhart & McClelland, 1986; Yu & Smith, 2012; inter alia). 
In a discriminative learning framework, outcomes are predicted from cues, so the 
availability of cues should be critical to whether learners use the cue as a predictor of the 
outcome (MacWhinney et al., 1985). In the present work, the outcomes are the varying 
plural forms, and the trial order manipulations are intended to influence the availability of 
the cues contained in the corresponding singular form. The model captures the effects of 
the manipulation, showing that the manipulation was successful: The cues in the singular 
are used to predict the plural when the singular is adjacent to the plural. When the 
singular and plural are not adjacent, then the cues contained in the singular are largely or 
entirely unavailable for predicting outcomes, and only the semantics can be used. This 
proves true for faithful and unfaithful singular-plural mappings, concatenative suffixes, 
and stem changes. 
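
The role of cue availability can be illustrated with a minimal Rescorla-Wagner learner. This is a sketch under stated assumptions rather than the model reported here: the cue labels, learning rate, number of epochs, and trial list are hypothetical, and non-adjacency is idealized as complete unavailability of the singular's form cues.

```python
OUTCOMES = ("palatalize", "copy")
# Simplified Velar Palatalization training: (singular-final cue, plural outcome).
TRIALS = [("final_k", "palatalize"), ("final_t", "copy"), ("final_p", "copy")]

def rw_step(weights, cues, outcomes_present, rate=0.1):
    """One Rescorla-Wagner update: all present cues share the prediction error."""
    for outcome in OUTCOMES:
        prediction = sum(weights.get((c, outcome), 0.0) for c in cues)
        target = 1.0 if outcome in outcomes_present else 0.0
        for c in cues:
            key = (c, outcome)
            weights[key] = weights.get(key, 0.0) + rate * (target - prediction)

def train(adjacent, epochs=50):
    """Form cues from the singular are available only when it is adjacent."""
    weights = {}
    for _ in range(epochs):
        for final_cue, outcome in TRIALS:
            cues = ["plural"] + ([final_cue] if adjacent else [])
            rw_step(weights, cues, {outcome})
    return weights

def activation(weights, cues, outcome):
    return sum(weights.get((c, outcome), 0.0) for c in cues)

w_adjacent = train(adjacent=True)
w_separated = train(adjacent=False)
```

When the singular is adjacent, the learner comes to favor palatalization after [k] over palatalization after [t]. When it is not, only the semantic cue carries weight, so every stem-final consonant receives the same activation.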
6.4.1.2. Saliency and adjacency 
A surprising illustration of the importance of contiguity is visible in Figure 5.1 
(§5.2.1; see also Figure 6.1). Many None Obvious and Change Obvious participants 
always choose -a, which might be optimal behavior because it better matches the 
unconditioned suffix probabilities, and in the absence of obvious examples of -i, 
participants do not acquire the probabilistic conditioning of suffix choice. The 
participants in the NoChange Obvious and All Obvious conditions, however, tend to 
probability match. This is what we would normally expect them to do, since participants 
believe their task is to reproduce the training data as accurately as possible (Harmon & 
Kapatsinski, In prep; Perfors, 2016), and probability matching is observed in a variety of 
language production tasks, both artificial and natural (Harmon & Kapatsinski, 2017; 
Hayes et al., 2009; Hudson Kam & Newport, 2005; Kapatsinski, 2010; Ramscar & 
Yarlett, 2007). These results indicate that the addition of the suffix needs to be noticed 
(just as changes to the stem need to be noticed) in order for probability matching to 
occur. When the plural forms ending in a suffix are not adjacent to the corresponding 
singular forms, a significant proportion of subjects fail to notice the different suffix, as if 
it were not there at all. We were very surprised by this finding, since we thought the 
suffixes were salient enough to be noticed without explicit form-form comparisons, but 
the result suggests that even very salient concatenative morphemes benefit from the 
perceptual contrast between contiguous paradigmatically-related forms. 
6.4.2. Implications for phonological theory 
The most fundamental implication of the present results is that both faithful and 
unfaithful mappings rely, at least in part, on generalizing over the relationships between 
source forms and the corresponding products. In particular, the present results are 
accounted for by discriminating among source forms that lead to distinct product 
patterns, resulting in paradigmatic mappings that serve the role of second-order schemas 
or rules. They differ from second-order schemas in being directional: Being able to 
anticipate the plural based on the singular does not imply also being able to anticipate the 
singular based on the plural (e.g., Krajewski et al., 2011). They differ from rules in that 
they are not context-specific. Rather than mappings being enacted by changes occurring 
in certain contexts, the present model allows the output to be gradiently conditioned by 
many different features of the context, where the corresponding source segment is only 
one such feature with no special status.  
Discriminative learning needed to be combined with Hebbian learning that boosted 
outcomes that frequently occur within products across source contexts, and strengthening 
of syntagmatic bonds between the parts of unexpected chunks. The former correspond to 
first-order / product-oriented schemas and indicate that the goal of learning does not 
reduce to discriminating among source forms. Rather, participants learn what plurals are 
like in general (Bybee, 1985, 2001; Kapatsinski, 2012, 2013). The latter correspond to 
phonotactic or morphotactic dependencies resulting in the frequent phenomenon of 
affixes fusing with the parts of the stem they change (Haspelmath, 1995). 
The overgeneralization patterns we observe are also informative regarding the structure of 
the phonological similarity space. In experimental lab work, Cristià & Seidl (2008) used 
patterns of (over)generalization to argue that English nasals are [-continuant] like stops 
and not [+continuant] like fricatives. In the present experiment, the pattern of 
overgeneralization is asymmetric: 
Participants overgeneralize non-palatalization (faithful copying) from alveolars and 
labials to velars (in the Velar Palatalization language) but not from alveolars and velars to 
labials (in the Labial Palatalization language). Because a pattern is assumed to apply to 
all members of the smallest natural class that contains the segments sharing a behavior 
(Moreton & Pater, 2012a; White, 2013, 2014), the results suggest that velars and 
alveolars belong to the [lingual] natural class (Browman & Goldstein, 1989; Clements & 
Hume, 1995), but that labials and alveolars do not belong to [anterior] (Chomsky & 
Halle, 1968) and thus anything that applies to both must also apply to velars.  
The patterns of overgeneralization of faithful mappings therefore support a featural 
organization of the segment inventory. The fact that faithful mappings can be 
overgeneralized at all in turn supports the notion of conditioned copying (Kapatsinski, 
2017; §1.2.2, §6.3.2.2.2). Copying here means incorporating memory representations into 
the production plan (and is, for example, why [p] mapping onto [p] is the same as [k] 
mapping onto [k]), and we propose it can be conditional on various cues. That copying 
can be conditioned predicts that speakers need to learn when and what to copy, and that 
copying can be over/undergeneralized, depending on the conditions observed where 
copying does/does not occur. This proposal explains why keeping faithful pairs (like 
blut~bluta) intact facilitates faithful mappings, and why this extends to the unattested 
k~ka mapping in Velar Palatalization but not p~pa in Labial Palatalization: In Labial 
Palatalization, copying is conditioned on the [lingual] feature, whereas in Velar 
Palatalization it is not conditioned on a place feature and so can be extended to velars.  
6.4.2.1. Retreating from overgeneralization 
6.4.2.1.1. Entrenchment and pre-emption 
Preventing overgeneralization has been of concern in work on construction learning. 
The two most prominent mechanisms for explaining overgeneralization patterns are: 
1) Entrenchment (Ambridge et al., 2008, 2012; Blything et al., 2012; Braine & 
Brooks, 1995; Brooks et al., 1999; Harmon & Kapatsinski, 2017; Regier & Gahl, 2004; 
Stefanowitsch, 2008; Xu & Tenenbaum, 2007): If a form occurs often in one context, 
learners infer it does not occur in other contexts, and 
2) Statistical pre-emption (Boyd & Goldberg, 2011; Goldberg, 2011): Forms are 
pushed out of contexts by other forms, which act as pre-emptors; without pre-emptors, 
forms can extend to any context. 
Both palatalization and faithful copying into the output form are subject to 
overgeneralization, and making the paradigmatic context where the form occurs more 
obvious through temporal contiguity does not restrict it to that context/diminish 
overgeneralization, contra the entrenchment explanation. For example, making p→tʃ 
obvious in Labial Palatalization Change Obvious makes palatalization rates for all 
consonants higher than in Labial None Obvious (§5.2.3, §6.3.2.1), and making p→p and 
t→t obvious in Velar All Obvious increases the rate of velar copying (in other words, 
reduces the palatalization rate of [k], §6.3.2.2.2). The increase in availability of [tʃ] or 
CopyFIN overrides the evidence of it being restricted to a particular context.  
This finding parallels Harmon & Kapatsinski (2017) for form-meaning mappings. 
They showed that the increase in frequency makes forms more available to extend to new 
contexts, but it also makes it less likely that subjects will map it onto new contexts. In 
other words, frequency only leads to entrenchment when accessibility differences 
between frequent and infrequent forms are neutralized. The results hold as long as the 
occurrence of the form-meaning pairing strengthens the relationship between the form 
and all features comprising the meaning. If any feature is presented as part of a novel 
related meaning, then the form paired with the original meaning usually receives the 
greatest activation. We see the same effect in the present experiment. When the 
availability of paradigmatic mappings is boosted in Change Obvious and NoChange 
Obvious, the outputs of those mappings become more available for use with related 
inputs that share some features with the original. One prediction for future work is that if 
participants are given a plural form and asked to choose the singular, they will be more 
likely to pick [tʃa] plurals correctly in Change Obvious than in NoChange Obvious, and 
more likely to pick faithful plurals correctly in NoChange Obvious than Change Obvious. 
The data present a strong case for statistical pre-emption: Participants in All Obvious, 
and only All Obvious, learn which consonants should be mapped onto [tʃ]. When both 
faithful and unfaithful mappings are strong, they can pre-empt each other; obvious 
faithful mappings pre-empt unfaithful mappings from affecting Not-To-Be-Palatalized 
consonants, and obvious unfaithful mappings pre-empt faithful mappings from affecting 
To-Be-Palatalized consonants. 
6.4.2.1.2. Other accounts of overgeneralization 
Rule-based grammars assume that “do nothing” is the “elsewhere” condition, and 
therefore cannot be conditioned by context (Chomsky & Halle, 1965, 1968; Pinker & 
Prince, 1988). If we disregard this assumption, it is straightforward to implement context-
sensitive copying, which provides a parsimonious account of the present data (for 
computational implementations, see Albright & Hayes, 2003; Allen & Becker, 2015; 
Taatgen & Anderson, 2002; for an application to morphologically-conditioned 
palatalization, see Kapatsinski, 2010). 
Optimality Theory captures avoidance of unfaithful mappings through output-output 
constraints (Benua, 1997; Kenstowicz, 1996), but it is not clear how that could capture 
the overgeneralization of non-palatalization to To-Be-Palatalized velars but not To-Be-
Palatalized labials. Labial palatalization may violate Ident-[lingual] and Ident-[delayed-
release], velar palatalization may violate Ident-[dorsal], and alveolar palatalization may 
violate Ident-[coronal], but evidence against alveolar and labial palatalization does not 
result in evidence against velar palatalization; Ident-[anterior] and Ident-[lingual] do not 
help Ident-[dorsal]. A mechanism to prevent a specified class of segments from changing 
(to [tʃ]) is necessary. Zuraw (2007) proposes that the grammar contains a set of *Map 
constraints, which militate against specific segment-segment mappings, so *Map(C→tʃ) 
could be upweighted whenever an input segment fails to change into [tʃ]. Hayes & White 
(2015) use these kinds of constraints to account for overgeneralization of changes to 
intermediate segments (e.g. overgeneralization of p→v to b→v in White, 2014). 
However, it is not clear how that can account for overgeneralization of faithful mappings. 
If we assume that t→ta and p→pa provide evidence for *Map(C→tʃ), then it would seem 
that they should also provide evidence against the faithful tʃ→tʃ mapping. While this 
seems implausible, it needs to be directly tested. 
6.4.2.2. Schemas 
The present results are challenging for the proposal that unfaithful mappings are all 
due to a product-oriented/first-order schema like “plurals end in [tʃa],” which is learned 
by generalizing over forms in a paradigm cell, in this case plurals (Bybee, 2001; 
Kapatsinski, 2013). If this were true, then contiguity of unfaithful forms should have no 
effect on acquisition of unfaithful mappings; bluk→blutʃa should not help the plural [tʃa] 
schema any more than would [blutʃa] in isolation. Perhaps these examples help not by 
facilitating discovery of the paradigmatic k→tʃ mapping, but by strengthening the 
association between elements of the schema via perceptual chunking, such that when one 
is selected, the other must follow. However, Table 5.4 shows that such examples are 
particularly helpful for To-Be-Palatalized stems, meaning that they do help the particular 
paradigmatic mappings they exemplify. These results provide evidence that both faithful 
and unfaithful mappings rely on conditioned paradigmatic mappings, and the 
conditioning of the mapping is acquired most easily when forms are adjacent. 
6.4.2.3. Implicational hierarchy 
Labial palatalization overgeneralizes to alveolars and velars, but velar palatalization 
only overgeneralizes to alveolars, not labials (§5.2.3, §6.3.2). This pattern mirrors prior 
work and provides further support for the implicational hierarchy that velar palatalization 
implies alveolar palatalization and labial palatalization implies alveolar and velar 
palatalization (Smolek & Kapatsinski, 2018; Stave et al., 2013). It is also reminiscent of 
other findings where a change from A to C generalizes to a sound intermediate between 
A and C (B) (Skoruppa et al., 2011; White, 2014). The model can capture the extension 
to intermediate sounds by the existence of a [lingual] feature, but the implicational 
hierarchy can only be captured if the model assumes that perceiving a change like A→C 
involves perceiving A→B→C, where the intermediate sound B is actually perceived to 
change in the output, on at least some occasions. Future work should investigate if B 
must be phonetically intermediate between A and C, or if it just needs to be a priori more 
likely to change into C than A is. We are not confident that we can argue that [t] is 
between [tʃ] and [k], though it is perceptually closer to [tʃ] than [k] (much less [p]) and is 
considered more likely to turn into [tʃ] by English speakers. If the results are due to a 
priori likelihood of change and not intermediacy, then they may be better captured by 
stronger surprisal-dependent boosts to [tʃ] or post-hoc inference of the type “if [p] 
changes to [tʃ], then surely everything must” rather than an online perceptual mechanism. 
6.4.2.4. Morphology feeds phonology 
Lastly, the results provide some support for the traditional rule-based view of the 
architecture of the grammar, where morphology feeds phonology (Chomsky & Halle, 
1965, 1968; cf. Bybee, 1985, 2001; Kapatsinski, 2010). The crucial finding is that the 
choice of suffix vowel is influenced by the identity of the singular consonant, even if the 
plural consonant is also taken into account: [tʃ]’s that are made from To-Be-Palatalized 
consonants favor -a over [tʃ]’s that are made from Not-To-Be-Palatalized consonants. 
The model fails to learn this dependency if the suffix is chosen after the choice of 
whether to palatalize, or if the choices are made in parallel and independently. If the 
model chooses CV sequences like [tʃa] vs. [ka] vs. [ki] vs. [tʃi], consonant-vowel 
dependencies are learned too well, so that unattested (in training) sequences like [ka] in 
the Velar Palatalization language are never produced. The suffix does not need to always 
be chosen first, or always before the associated stem change; we rather propose that 
whichever change is easier to perform is applied before harder changes, and in this 
experiment, vowel addition is apparently easier to notice and enact than changing the 
consonant (see also Smolek & Kapatsinski, 2018; Stave et al., 2013; and Chapter IV). 
The vowel is therefore likely to condition the consonant change, as we would otherwise 
expect all [tʃ]’s to be suffixed with -a at equivalent rates. 
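
The suffix-first ordering proposed above can be sketched as a two-step choice in which the consonant outcome is conditioned on the suffix already selected. The palatalization rates below are the observed rates for velars in Velar Change Obvious (56% before -a in Table 6.2, 30% before -i in Table 6.3). The probability of choosing -a is a hypothetical placeholder, and the function is an illustration, not the implemented model.

```python
def joint_distribution(p_suffix_a, p_pal_given_suffix):
    """P(plural-final consonant, suffix) for a velar-final singular,
    with the suffix chosen first and palatalization conditioned on it."""
    dist = {}
    for suffix, p_suffix in (("a", p_suffix_a), ("i", 1.0 - p_suffix_a)):
        p_pal = p_pal_given_suffix[suffix]
        dist[("tʃ", suffix)] = p_suffix * p_pal
        dist[("k", suffix)] = p_suffix * (1.0 - p_pal)
    return dist

velar_change_obvious = joint_distribution(
    p_suffix_a=0.7,                         # hypothetical suffix preference
    p_pal_given_suffix={"a": 0.56, "i": 0.30},
)
```

Because the consonant choice is probabilistic given the suffix, sequences unattested in training such as [ka] keep a nonzero probability, unlike in a model that selects whole CV chunks.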
6.4.3. Limitations 
One important caveat is that this experiment only looked at the very early stages of 
language learning. Research has shown the importance of sleep for consolidation of 
knowledge (Cai et al., 2009; Davis et al., 2009), which is outside the purview of the 
Rescorla-Wagner (1972) associative learning model we use here. Such models instead 
provide good descriptions of rapid error-driven learning subserved by subcortical 
structures including the basal ganglia (for procedural memory, Ashby et al., 2007; Lim et 
al., 2014) and hippocampus (for declarative memory, Davis et al., 2009; McClelland et 
al., 1995).  
In the Complementary Learning Systems framework, the consolidation process 
involves rapidly learning subcortical structures ‘replaying’ the events of the day to the 
slowly learning neocortex (Kumaran et al., 2016; McClelland et al., 1995). There are 
reasons to believe that paradigmatic mappings could emerge from this process without 
the requirement of temporal contiguity: The hippocampus has been shown by lesioning 
studies in non-human animals to be particularly important for learning associations in the 
absence of contiguity (“trace conditioning,” Bangasser et al., 2006). It is possible that the 
hippocampus acquires associations between non-contiguous words but is unable to 
motivate behavior at test without “enlisting” the neocortex by sharing what it has learned. 
The neocortex learns more slowly and represents knowledge in a more distributed code 
than do the rapidly-learning subcortical systems (McClelland et al., 1995; Davis et al., 
2009), and the more distributed coding brings out similarities in the coded events, leading 
to new generalities (Cai et al., 2009; Lewis & Durrant, 2011). Newly learned words are 
integrated over time with the previously learned words that form the native lexicon in 
word learning experiments (Davis & Gaskell, 2009; Davis et al., 2009; Dumay & 
Gaskell, 2007), so it is possible that paradigmatic relations between words could emerge 
here, when corresponding forms are integrated. Davis et al. (2009) propose that 
consolidation is necessary for learning to expand beyond the trained sensory regions; in 
other words, exposure to an alternation may not result in production of that alternation 
until time and/or sleep has passed. English speakers, due to their experience with 
(limited) alveolar and velar palatalization, may already have built the connections 
necessary to employ the alternation after brief exposure, whereas labial palatalization is 
novel enough that the articulatory architecture is not yet in place, though it could be 
created given time for incubation/consolidation. Future work should examine possible 
restructuring of lexical knowledge and bring models of consolidation (Ashby et al., 2007; 
McClelland et al., 1995) to bear on changes to paradigmatic structure after sleep. 
 
 
6.5. Summary 
A simple two-layer discriminative model consisting of cues and outcomes 
successfully captures the significant effects of Experiment 3. Regarding suffix vowel 
choice, the model captures that keeping unfaithful and faithful pairs intact results in 
increased use of the associated suffix vowel (-a and -i, respectively), especially for the 
associated consonants (To-Be-Palatalized and Not-To-Be-Palatalized, respectively). The 
baseline model underpredicts the use of -a, the more frequent suffix, so we increment -a 
by 4% in each language and an additional 6% in Velar Palatalization, since Labial 
Palatalization subjects seem to avoid the suffix that triggers the difficult-to-produce labial 
palatalization alternation.  
For palatalization, the model captures that intactness of pairs allows access to the 
phonological form cues in the singular, which affects faithful and unfaithful mappings, 
stem changes, and suffix vowel choice. We encode the implicational hierarchy of more 
difficult changes implying easier changes, in particular that labial palatalization implies 
alveolar and velar palatalization, and that velar palatalization implies alveolar but not 
labial palatalization. This allows the model to capture that there is more palatalization of 
the Not-To-Be-Palatalized consonants in the Labial language than Velar. One 
shortcoming is that the model underestimates the increase in palatalization of Not-To-Be-
Palatalized consonants in Change Obvious compared to All Obvious and None Obvious. 
We propose that the perceptual contrast of encountering singulars adjacent to the 
corresponding unfaithful plurals leads participants to “chunk” the stem change and the 
triggering vowel together, boosting the association between them and allowing them to 
select for one another. This boost may be due to surprise at an unexpected outcome (like 
encountering [tʃa] when expecting [pa]), which is separate from the standard RW effects 
of surprise. 
The other major shortcoming in the model is its failure to capture overgeneralization 
of faithful mappings to the To-Be-Palatalized consonants in Velar Palatalization All 
Obvious, but not in Labial. We propose that faithful mappings are the result of the 
outcome CopyFIN, which mandates retention of the singular-final consonant, and is cued 
by various phonological features of the singular form as well as the semantics to be 
expressed in the product form.  
It appears that CopyFIN applies to all the consonants in the smallest natural class that 
includes the copied segments attested in training. This can be captured by the baseline 
discriminative learning model when the CopyFIN outcome is triggered by a subset of 
singular-final consonants. Thus, for the Labial Palatalization language, CopyFIN applies to 
alveolars and velars, which are included in [lingual]; since labials are excluded from the 
[lingual] natural class, copying does not extend to the To-Be-Palatalized consonants in 
Labial Palatalization. The model is able to discriminate between linguals, which trigger 
CopyFIN, and non-linguals, which do not. In Velar Palatalization, however, CopyFIN 
applies to alveolars and labials, which do not form a natural class that excludes velars, so 
CopyFIN is taken to apply to all consonants. The model, however, cannot generalize to all 
inputs, discriminating among alveolars and labials on the one hand and velars on the 
other. The results are captured only if we increment the activation of CopyFIN for labials 
when the natural class of segments triggering CopyFIN includes all consonants in the 
language, as it does in Velar Palatalization.  
We suspect that this increment comes from an additional Hebbian learning 
mechanism that complements the discriminative learning mechanism implemented in the 
baseline model. This mechanism appears to increment associations between cues and co-
occurring outcomes even when those cues are not discriminative. Thus, even though all 
singulars end in [+cons] segments, rendering [+cons] powerless to discriminate among 
singulars, it can become associated with copying of the final segment, allowing for 
faithful copying to be overgeneralized from alveolars and labials to velars. 
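
A Hebbian mechanism of this kind can be sketched as a co-occurrence counter that ignores prediction error. The update size `lam`, the cue labels, and the number of training passes are hypothetical. The sketch shows only the Hebbian component, not its combination with the discriminative model.

```python
def hebbian_step(weights, cues, outcome, lam=0.01):
    """Strengthen every co-occurring cue-outcome pair by lam, ignoring error."""
    for cue in cues:
        key = (cue, outcome)
        weights[key] = weights.get(key, 0.0) + lam
    return weights

# Velar Palatalization training: alveolars and labials are copied, velars change.
VELAR_LANGUAGE = [("t", "copy"), ("p", "copy"), ("k", "palatalize")]

hebbian = {}
for _ in range(10):
    for segment, outcome in VELAR_LANGUAGE:
        # Every singular ends in a consonant, so [+cons] is present on every trial.
        hebbian_step(hebbian, ["+cons", segment], outcome)

# Copy support available to a velar-final singular via the shared [+cons] cue.
velar_copy_support = (hebbian.get(("+cons", "copy"), 0.0)
                      + hebbian.get(("k", "copy"), 0.0))
```

Although [k] itself never co-occurs with copying, velar-final singulars inherit copy support through [+cons], which is the route to overgeneralization described above.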
Overall, these results are therefore consistent with a “maximalist,” all-of-the-above 
view of grammar in which forms are produced using a variety of partially-redundant 
generalizations (e.g., Bybee, 1985; Langacker, 1987) that include both product-oriented 
and source-oriented schemas, phonotactic dependencies within product forms, and 
knowledge about what parts of activated representations ought to be copied into the 
production plan being constructed.  

CHAPTER VII 
REVIEW, GENERAL DISCUSSION, AND CONCLUSIONS 
Portions of this chapter were taken from: 
Smolek, A. & Kapatsinski, V. (2019). Syntagmatic paradigms: Learning 
correspondence from contiguity. Manuscript submitted for publication. 
Smolek, A. & Kapatsinski, V. (2018). What happens to large changes? Saltation 
produces well-liked outputs that are hard to generate. Laboratory Phonology: Journal of 
the Association for Laboratory Phonology, 9(1), 10. 
Paradigm Uniformity (PU) is the preference for consistency in forms sharing a stem61 
(Benua, 1997; Kenstowicz, 1996; Steriade, 2000). Phonologically dissimilar allomorphs 
are particularly dispreferred, and learning alternations is harder with dissimilar alternants 
(Hayes & White, 2015; Moreton & Pater, 2012a; Peperkamp et al., 2006; Skoruppa et al., 
2011; Steriade, 2001/2009; White, 2014).  
61 Kenstowicz’s (1998) Uniform Exponence allows for consistency in affixes as well. 
The Perseveration Hypothesis proposes that motor perseveration in the production of 
a novel form of a known word causes the observed avoidance of stem changes. 
Paradigmatic perseveration conflicts with paradigmatic associations, which require 
particular relationships between related forms of a word (e.g. in Russian the nominative 
trop ‘trope’ corresponds to the genitive plural tropov ‘tropes.GEN’ but the nominative 
tropa ‘path’ corresponds to the genitive plural trop ‘paths.GEN’). For the Perseveration 
Hypothesis, the relevant associations are between production representations (such as the 
gestures of Articulatory Phonology, Browman & Goldstein, 1989) of related forms. The 
reason that large changes are especially difficult to learn is that learning associations 
between alternants is more difficult when the to-be-associated parts of the alternants are 
dissimilar (Moreton, 2008, 2012; Warker & Dell, 2006). If a paradigmatic association 
requires a change to the base, then obeying paradigmatic perseveration is a PU error. A 
poorly-acquired association is a lesser obstacle to paradigmatic perseveration, making PU 
more likely to arise for large changes. 
One consequence of the production-locus of the Perseveration Hypothesis is that we 
expect PU to be stronger in production than in perception, contrary to proposals that 
privilege perceptual similarity (Kenstowicz, 1996; Steriade, 2001/2009; White, 2017). 
We experimentally investigate this prediction in Experiment 2, where we find that labial 
palatalization is accepted in judgment while being rarely produced, confirming our 
proposal. We suggest that participants in the Labial Palatalization condition are unable to 
acquire associations between labials and alveopalatals because of articulatory 
dissimilarity, which prevents them from producing the alternation. However, they do 
learn the product-oriented schema “plurals should end in [tʃi],” and because they do not 
have any competing paradigmatic mappings, the schema freely applies to all inputs, 
resulting in high ratings of palatalization of all consonants. 
In Experiment 3, we investigate how associations between dissimilar alternants could 
be learned by varying the presentation order of training trials. We hypothesize that 
temporal contiguity between related forms allows participants to notice the relationship 
between the singular and plural forms of a word and home in on the relevant cues that 
distinguish the forms. We vary adjacency of faithful and unfaithful pairs separately to 
determine whether contiguity affects the acquisition of mappings that require a change to 
the base and those that do not. If unfaithful mappings are learned from product-oriented 
schemas like “plurals should end in [tʃi]” (Bybee, 2001; Kapatsinski, 2013, 2017b), then 
adjacent unfaithful pairs should not help acquisition of the unfaithful mapping, whereas if 
they are learned through paradigmatic relations, they should improve when unfaithful 
pairs are adjacent. If faithful mappings are the default (Pinker & Prince, 1988), then 
adjacent faithful pairs should not help acquisition of a faithful mapping. While output-
output faithfulness constraints (Hayes, 2004; McCarthy, 1998) are thought to be initially 
high-ranked and therefore “default,” they are subject to demotion with linguistic 
experience; since our participants are adults, they have presumably learned that related 
forms do not always share their form (e.g. the past tense of [kip] is [kɛpt], not [kipt]), 
which would indicate that (at least some of) their faithfulness constraints are not 
dominant and could benefit from adjacency. Experiment 1 shows that participants like 
palatalization of alveolars more than labials or velars, suggesting that the corresponding 
faithfulness constraints have different weights – and even labial and velar palatalization is 
accepted sometimes, so it seems unlikely that any of the faithfulness constraints remain at 
the top of the hierarchy. 
Experiment 3 shows that both faithful and unfaithful mappings benefit from 
contiguous pairs of corresponding wordforms in training, and it is only when both types 
are exemplified by contiguous word pairs that participants learn to produce the correct 
forms in the correct contexts. We show that a simple domain-general discriminative 
model based on Rescorla-Wagner (1972) can capture the effects of Experiment 3, 
supporting the idea that alternations are produced via learned paradigmatic associations 
acquired using domain-general mechanisms of associative learning. 
In this final chapter, we review the results from Experiments 1, 2, and 3 and explore 
the theoretical implications of our findings before concluding. 
7.1. Review of results 
All three experiments shared similar stimuli and structure: Participants were tasked 
with producing (Experiments 2 and 3) and judging (Experiments 1 and 2) singular-plural 
pairs of alien creatures. The singulars all ended with oral stops ([b;p;d;t;g;k]) and the 
plurals were suffixed with -i or -a. Depending on the training language, labial, alveolar, 
or velar stem-final consonants in the singular alternated with palatal affricates ([tʃ] for 
voiceless stops and [dʒ] for voiced) in the plural. In Experiment 1, participants provided 
ratings for singular-plural mappings without any training, in order to establish pre-
existing biases. In Experiments 2 and 3, participants were exposed to training 
exemplifying faithful and unfaithful mappings; these results are informative with regard 
to the learnability and executability of alternations. 
7.1.1. Experiment 1 
The baseline experiment (Chapter III) evaluates participants’ judgments of 
palatalization by stem-final consonant place of articulation (labial, alveolar, and velar), 
voicing of stem-final consonant (voiced or voiceless), and suffix vowel (-i or -a).  
The primary preference found is for palatalization of alveolars: Palatalized alveolars 
are accepted more often than palatalized labials or velars, and are also accepted as often 
as faithful alveolars, whereas faithful labials and velars are preferred to alveopalatals. The 
general preference for perseveration on the base (Do, 2013; Hayes, 2004; Kerkhoff, 
2007; McCarthy, 1998) is absent for alveolar-final stems for native English speakers, 
perhaps because of their experience with palatalization in frequent phrases like let you. 
There is no preference for palatalization before -i vs. before -a without training, 
despite the former being the typologically more frequent pattern (and the context where 
articulatory and perceptual factors favor palatalization; Anttila, 1989; Guion, 1998). 
Wilson (2006) trained participants on velar palatalization before either [i] or [e] and 
found that the alternation was learned equally well in both contexts; it generalized from 
[e] to [i] (and [ɑ]), but not from [i] to [e] (and [ɑ]) (see §7.1.4.2 for further discussion of 
generalization by vowel). In the baseline experiment, palatalization of velars is preferred 
marginally more before -a than before -i. We propose that the lack of preference for the 
more motivated context is due to the frequency with which palatalization is followed by a 
low vowel in English. Even though the high glide [j] triggers the palatalization in e.g. bet 
you, it is not present in the production of betcha, so listeners (and speakers) may have 
become accustomed to palatalization before non-high vowels.  
There is also no preference for voiceless palatalization over voiced, even for velars 
before -i, where perceptual similarity patterns suggest that [ki]~[tʃi] should be preferred 
over [gi]~[dʒi], as the former are more confusable than the latter (Guion, 1998). The lack 
of bias for the more perceptually similar alternants suggests that perceptual similarity 
does not have a strong influence on the acquisition of alternations and that differences in 
learnability or willingness to produce an alternation are better viewed as arising from 
articulatory (dis)similarity. 
Taken together, the results of the baseline experiment suggest that alveolar 
palatalization may be preferred to (and more frequently produced than) labial or velar 
palatalization, and that suffix vowel and stem-final consonant voicing may play a limited 
role. 
7.1.2. Experiment 2 
Experiment 2 trained participants on palatalization before -i of voiced and voiceless 
labials, alveolars, or velars, and all training trials occurred in a random order. 
Figure 7.1 shows the rate of palatalization in production by training language and 
stem-final consonant. The productivity of palatalization for the To-Be-Palatalized 
consonants is highest for alveolars, lower for velars, and lowest for labials, providing 
support for our claim that large changes are harder to (learn to) produce than small 
changes are. In other words, paradigm uniformity exerts a stronger force on the large-change 
labial palatalization than on small-change alveolar and velar palatalization. There is 
no difference in the rates of overgeneralization to linguals across conditions, so it is not 
that labial palatalization is more likely to extend to intermediate sounds (as proposed by 
White, 2014), but rather that it is less likely to be produced in the first place. 
 
 
Figure 7.1. Differences in palatalization rates in production before -i across individual 
stops and training languages. Shading indicates place of articulation from labial (lightest) 
through alveolar (medium) to velar (darkest). Voiced consonants are on the left within 
shading, while voiceless are on the right. Left panel: Labial Palatalization. Center panel: 
Alveolar Palatalization. Right panel: Velar Palatalization. 
 
Nonetheless, participants in the Labial Palatalization condition still learn that they 
should palatalize labials. Figures 7.2 and 7.3 show the rate of acceptance in judgment of 
faithful and palatalized plurals, respectively, by training language and stem-final 
consonant. There are no differences in acceptability of faithful plurals, and participants in 
all conditions learn to prefer the unfaithful plural over the faithful for To-Be-Palatalized 
consonants. But whereas participants in the Velar and Alveolar Palatalization languages 
produce palatalization of the target consonants as often as they accept it, participants in 
the Labial Palatalization condition accept it much more frequently than they produce it. 
The judgments of unfaithful mappings suggest that Labial Palatalization participants like 
palatalization more than participants in any other condition, and yet they produce it the least often. 
Zuraw (2000) found similar results for nasal substitution vs. assimilation in Tagalog, 
where the more difficult substitution was judged as being better, but nasal assimilation 
was produced more often, although the small sample size (n = 9) was insufficient for 
inferential statistics on production or the difference between production and judgment. 
Our results provide stronger evidence for a dissociation between production and 
judgment. 
 
Figure 7.2. Differences in judgments of faithful mappings before -i across individual 
stops and training languages. Shading indicates place of articulation from labial (lightest) 
through alveolar (medium) to velar (darkest). Voiced consonants are on the left within 
shading, while voiceless are on the right. Left panel: Labial Palatalization. Center panel: 
Alveolar Palatalization. Right panel: Velar Palatalization. 
 
 
Figure 7.3. Differences in judgments of palatalization before -i across individual stops 
and training languages. Shading indicates place of articulation from labial (lightest) 
through alveolar (medium) to velar (darkest). Voiced consonants are on the left within 
shading, while voiceless are on the right. Left panel: Labial Palatalization. Center panel: 
Alveolar Palatalization. Right panel: Velar Palatalization. 
 
Prior work on alternation-learning largely examined judgment only (Moreton, 2008; 
Skoruppa et al., 2011; White & Sundara, 2014; but see White, 2013, Ch. 4.5, 2014; 
Wilson, 2008). Our work, along with the results of Stave et al. (2013), shows that the bias 
against changing labials manifests differently in production and judgment. In both cases, 
training on labial palatalization fails to make participants prefer palatalizing labials over 
other consonants, but in production, that is because they tend to palatalize nothing, and in 
perception, because they tend to accept palatalization of everything. After training on 
alveolar or velar palatalization, participants learn to palatalize, and to prefer palatalization 
of, the To-Be-Palatalized consonants more than Not-To-Be-Palatalized (although they 
still accept incorrect palatalization more often than they produce it). Overgeneralizing to 
more likely targets is typical in the judgment task (White, 2013; Wilson, 2006), and 
substantive biases are designed to capture this effect (Moreton & Pater, 2012b; White, 
2017; Wilson, 2006). Our data show that in production, however, the problem lies not in 
overgeneralizing the change, but in failing to change what should be changed. 
While [ki] is perceptually more similar to [tʃi] than [gi] is to [dʒi] (Guion, 1998), [g] 
is palatalized more than [k], and preferred marginally more in judgment. [g] and [k] are 
equally articulatorily similar to [dʒi] and [tʃi], respectively, and this combined with the 
orthographic overlap of [g] and [dʒ] under <g> seems sufficient to overcome any effect 
that perceptual similarity has on learnability in this instance.  
Comparison to the baseline experiment with no training reveals that all training 
languages result in an equal increase in judgments of palatalization of the target 
consonants; in other words, acceptance of labial palatalization improves after training as 
much as acceptance of alveolar or velar palatalization does. It thus seems that the difference 
in production rates arises not because subjects are unable to learn the pattern, but rather because 
they are unable to apply that knowledge in production. We propose that the articulatory 
dissimilarity between labials and alveopalatals prevents participants from acquiring 
paradigmatic mappings in the Labial Palatalization condition, and they are thus unable to 
produce palatalization. Because they do not know what inputs correspond to [tʃi] in the 
output, every occurrence of [tʃi] is surprising, which strengthens the first-order schema, 
as reflected in their high ratings of palatalized forms. The rarity of labial palatalization is 
due to the articulatory difference between [p] and [tʃ] being so large. The fact that the 
effect of articulatory dissimilarity manifests itself only in production, leaving judgments 
and the learning rate as indicated by pre- and posttest judgment task comparisons 
unaffected, indicates that it is a channel bias, not an inductive bias. 
The Perseveration Hypothesis provides the best account for the results: Labial 
palatalization is avoided because of articulatory dissimilarity, not perceptual 
dissimilarity, and the bias against labial palatalization is much stronger in production than 
perception.  
7.1.3. Experiment 3 and a discriminative model 
In Experiment 3 (Chapter V), we trained participants on palatalization of labials or 
velars before -a, with non-alternating alveolars, labials (for Velar Palatalization), and 
velars (for Labial Palatalization) suffixed with -i or -a (50% of stems for each), varying 
the order in which training trials appeared. Adjacency of faithful and unfaithful mappings 
was manipulated such that faithful mappings were adjacent in half of the conditions and 
randomly ordered in the other half, with the same true for unfaithful mappings. The four 
conditions were None Obvious (neither faithful nor unfaithful mappings presented in 
contiguity), All Obvious (both faithful and unfaithful mappings presented in contiguity), 
Change Obvious (only unfaithful mappings presented in contiguity), and NoChange 
Obvious (only faithful mappings presented in contiguity). 
We model the results of Experiment 3 with code implementing the Rescorla-Wagner 
(1972) discriminative learning model. The model includes two layers of nodes, one for 
cues and one for outcomes, with association weights connecting them. The model uses 
the cues to predict the outcomes, and when the prediction is wrong (an unexpectedly 
absent or present outcome, given the set of active cues), weights are adjusted, with more 
unexpected outcomes resulting in greater adjustments. The cues in the baseline model 
include the stem-final consonant of the singular and the semantic meaning (PL), and the 
outcomes are the stem-final consonant of the plural and the plural vowel. The initial 
connection weights reflect English speakers’ biases, namely that alveolars are most 
acceptable to change, followed by velars, with labial alternations disliked. The model 
captures the results below when the suffix vowel is chosen before the plural stem-final 
consonant allomorph (see §6.2.2 for discussion). To encode the implicational hierarchy of 
larger changes implying smaller (Skoruppa et al., 2011; White, 2013, 2014), we specify 
that k→tʃ provides support for t→tʃ as well, and that p→tʃ provides support for both t→tʃ 
and k→tʃ. The trial order manipulation determines the availability of cues: When the 
singular and plural forms are adjacent, the phonological form of the singular is available 
to predict the phonological form of the plural, but when they are not adjacent, only the 
semantics (PL) can be used. Since only the weights of present cues can be updated, 
mappings that are not adjacent cannot be updated, and thus in the None Obvious 
condition, they reflect the initial biases. In the sections below, we review the results from 
the experiment and discuss how successful the model is at capturing them, as well as any 
modifications (additional cues/outcomes or adjustments to association weights) we apply. 
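The error-driven update and the effect of trial order on cue availability can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the reported simulations: the learning rate, the zero starting weights (the model described above starts from weights reflecting English speakers' biases), the reduced outcome set, and the helper names `rw_update` and `available_cues` are all hypothetical.

```python
# Minimal Rescorla-Wagner sketch; rate, outcome set, and zero initial weights are assumptions.
def rw_update(weights, cues, present_outcomes, all_outcomes, rate=0.1, lam=1.0):
    """One trial: adjust the weights of present cues in proportion to prediction error."""
    for outcome in all_outcomes:
        # Prediction is the summed weight from all currently active cues.
        prediction = sum(weights.get((c, outcome), 0.0) for c in cues)
        target = lam if outcome in present_outcomes else 0.0
        error = target - prediction  # larger surprise, larger adjustment
        for c in cues:
            weights[(c, outcome)] = weights.get((c, outcome), 0.0) + rate * error
    return weights

def available_cues(singular_final, adjacent):
    """Adjacent pairs expose the singular's final consonant; otherwise only PL is available."""
    return {"PL", singular_final} if adjacent else {"PL"}

OUTCOMES = {"tʃ", "p", "t", "k"}  # plural stem-final consonants only (vowel omitted)

weights = {}
for _ in range(50):
    rw_update(weights, available_cues("k", adjacent=True), {"tʃ"}, OUTCOMES)
    rw_update(weights, available_cues("t", adjacent=True), {"t"}, OUTCOMES)
```

When `adjacent` is false, only the weights from PL change, so consonant-specific weights stay at their starting values, which corresponds to the None Obvious condition.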
 
7.1.3.1. Suffix vowel 
Keeping faithful pairs intact (NoChange Obvious and All Obvious) results in greater 
use of the suffix -i, with most participants approximately probability matching. Not 
keeping faithful pairs intact (None Obvious, Change Obvious) leads many participants to 
regularize, using only -a. This can be seen in Figure 7.4, with 
lower rates of -a for the NoChange Obvious and All Obvious conditions compared to 
None Obvious and Change Obvious. There is a weaker effect for keeping unfaithful pairs 
intact (Change Obvious, All Obvious), which results in more use of -a, especially for the 
To-Be-Palatalized consonants, seen by comparing the light and dark bars in Change 
Obvious and All Obvious vs. None Obvious and NoChange Obvious. 
 
 
Figure 7.4. Percent of plurals suffixed with -a by training trial order and To-Be-
Palatalized status of stem. 
 
The baseline model captures that To-Be-Palatalized stems are more likely to be 
suffixed with -a than Not-To-Be-Palatalized stems, that exposure to intact pairs 
exemplifying faithful mappings results in more use of -i, and that exposure to intact pairs 
exemplifying unfaithful mappings results in more use of -a, particularly for the To-Be-
Palatalized consonants. However, it underpredicts the frequency of -a overall. We 
increment -a by 4% in both languages to account for how participants tend to fall 
between probability matching and maximizing in their vowel choice.  
The other way the model falls short is that it fails to capture that participants use -a 
more when trained on the Velar Palatalization language than on Labial Palatalization, as 
shown in Figure 7.5. Palatalization is triggered by -a in this experiment, and labial 
palatalization (like other large changes) is difficult to produce (Chapter IV), so we 
propose that Labial Palatalization participants choose to avoid -a in order to avoid the 
challenging palatalization alternation. If it is difficulty that makes them avoid -a, we 
would expect that were labial palatalization less difficult to perform, Labial Palatalization 
participants would trend more towards maximizing/regularization, as the Velar 
Palatalization participants do. To represent this in the model, we increment -a by an 
additional 6% in Velar Palatalization. 
 
Figure 7.5. Percent of plurals suffixed with -a by training trial order and training 
language. 
7.1.3.2. Palatalization 
Palatalization occurs before -a in training, and the participants who produced each 
suffix on at least 10% of trials learn to palatalize before -a more than before -i, as shown 
in Figure 7.6. The baseline model captures this finding.  
 
 
Figure 7.6. Palatalization probability by training trial order and suffix vowel. 
 
Looking only at the palatalizing suffix -a, keeping unfaithful pairs intact leads to 
more palatalization, especially of To-Be-Palatalized consonants, and keeping faithful 
pairs intact leads to less palatalization, with no significant interaction. Figure 7.7 shows 
this effect: There is substantially more palatalization in the Change Obvious and All 
Obvious conditions, especially for To-Be-Palatalized consonants, than there is in the 
None Obvious and NoChange Obvious conditions. Keeping both types of mappings 
intact results in production of palatalization, but restricted to the appropriate consonants. 
 
Figure 7.7. Production of palatalization before -a by training trial order and To-Be-
Palatalized status of stem. 
 
The model underestimates the increase in palatalization of Not-To-Be-Palatalized 
consonants in the Change Obvious vs. All Obvious/None Obvious conditions before -a. 
We suggest that adjacent unfaithful pairs allow participants to notice, through perceptual 
contrast and/or surprise, the association between [tʃ] and [a], fusing them together into a 
single “chunk” (see §7.2.3, below, for further discussion). This chunking allows [tʃ] and 
[a] to elicit one another, making it more likely that [a] will trigger [tʃ]. In the absence of 
adjacent faithful pairs to draw notice to [a] without [tʃ], the chunk can apply in all 
contexts. We implement the association of [tʃ] and [a] in the model as a boost with 
constant magnitude, after which the model makes the correct prediction of greater 
overgeneralization in Change Obvious. 
The baseline model also fails to predict that To-Be-Palatalized consonants are 
palatalized less in All Obvious than Change Obvious for Velar Palatalization, but not 
Labial Palatalization. This is shown in Figure 7.8: The rate of palatalization of To-Be-Palatalized 
consonants is the same for Change Obvious and All Obvious for Labial 
Palatalization (light bars), but lower for All Obvious than Change Obvious for Velar 
Palatalization (dark bars). We propose this is because faithful mappings (produced using 
CopyFIN) are taken to apply to the smallest natural class including the copied segments. In 
Labial Palatalization, alveolars and velars are both [lingual], but the copied segments in 
Velar Palatalization are alveolars and labials, which do not form a natural class excluding 
velars, so copying is free to apply to all consonants. We modify the model by adding 
[lingual] as a cue for singulars ending with [t] or [k] and incrementing CopyFIN whenever 
the natural class of copied segments includes all consonants in the language (i.e. in Velar 
Palatalization). 
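The notion of the smallest natural class containing the copied segments can be made concrete with a small sketch. The feature chart below is a simplifying assumption (voicing is omitted, and [lingual] is treated as a privative feature shared by alveolars and velars), and the function name `smallest_natural_class` is hypothetical.

```python
# Toy feature chart; the feature labels are simplifying assumptions.
FEATURES = {
    "p": {"cons", "labial"},
    "b": {"cons", "labial"},
    "t": {"cons", "lingual", "alveolar"},
    "d": {"cons", "lingual", "alveolar"},
    "k": {"cons", "lingual", "velar"},
    "g": {"cons", "lingual", "velar"},
}

def smallest_natural_class(segments):
    """Return every segment that has all features shared by the given segments."""
    shared = set.intersection(*(FEATURES[s] for s in segments))
    return {seg for seg, feats in FEATURES.items() if shared <= feats}

labial_pal_copied = {"t", "d", "k", "g"}  # faithful stops in Labial Palatalization
velar_pal_copied = {"t", "d", "p", "b"}   # faithful stops in Velar Palatalization

class_in_labial_pal = smallest_natural_class(labial_pal_copied)
class_in_velar_pal = smallest_natural_class(velar_pal_copied)
```

Under these assumptions, the class for Labial Palatalization is restricted to the linguals, whereas the class for Velar Palatalization is defined only by [cons] and therefore also contains the velars.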
 
 
Figure 7.8. Rates of palatalization in production of To-Be-Palatalized consonants by 
training trial order and training language. 
 
7.1.4. Context naturalness and alternation learnability 
Experiment 2 shows that the naturalness of the alternation affects acquisition of 
paradigmatic mappings, with the larger, less natural labial palatalization being produced 
less often than the smaller, more natural velar and alveolar palatalization. Labial 
Palatalization participants still learn the product-oriented schema “plurals should end in 
[tʃi],” leading to high acceptance rates of palatalization in judgment. To evaluate the 
effect that context naturalness has on learnability, we compare the palatalization rates in 
production of To-Be-Palatalized consonants for Experiment 2 to Experiment 3. 
Palatalization is triggered by -i in Experiment 2, which is the typologically more common 
and phonetically more natural context (Bateman, 2007; Kochetov, 2011), whereas it is 
triggered by -a in Experiment 3. Wilson (2006) shows that palatalization is learned 
equally well before [i] and before [e], but that palatalization before [e] generalizes to [i] 
more than the reverse. Our previous work on palatalization before -a shows lower rates of 
palatalization compared to -i (Stave et al., 2013, vs. Smolek & Kapatsinski, 2018), but the 
same bias against labial palatalization. However, the experiments are not strictly 
equivalent in training design.  
Experiments 2 and 3 can be directly compared, since the training has the same 
number of trials distributed over the consonants in the same way, with certain 
restrictions. We included only Labial and Velar Palatalization participants from 
Experiment 2 (since Alveolar Palatalization is not included in the languages of 
Experiment 3), and only None Obvious participants from Experiment 3 (because all 
participants in Experiment 2 were trained with trials in a random order). In Experiment 2, 
we excluded participants who produced particularly egregious plurals, whereas we 
included every participant from Experiment 3. To ensure that the Experiment 3 
participants were not overall worse-performing than those in Experiment 2, we excluded 
any participants from the former who produced more errors (see p. 188) than the worst-performing 
participant in the latter, which removed one subject (from Velar Palatalization 
None Obvious). We ran mixed-effects logistic regression models with the lme4 
package (version 1.1-21, Bates et al., 2015) in R (version 3.6.0, R Development Core 
Team, 2019). Fixed effects were included for Training Language (Labial and Velar), 
Vowel Context (Correct vs. Incorrect, based on training; i.e. -i is coded as Correct for 
Experiment 2 and Incorrect for Experiment 3, and -a is coded as Incorrect for Experiment 
2 and Correct for Experiment 3), To-Be-Palatalized (yes and no), and Experiment (2 and 
3), and any significant interactions. Random intercepts were included for Subjects and 
Singulars, with no random slopes. Log-likelihood ratio tests on nested models were used to 
derive significance values, and for contrasts that were not significant and of theoretical 
interest, evidence for the null hypothesis was evaluated using the BIC approximation to 
the Bayes Factor (Wagenmakers, 2007). 
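The two inferential steps described above can be sketched numerically. The models themselves were fit in R with lme4; the Python functions below are only an illustration of the arithmetic, and the log-likelihood values in the usage line are hypothetical, chosen so that the test statistic equals one of the χ2 values reported in §7.1.4.1. The sketch assumes nested models differing by exactly one parameter.

```python
import math

def lrt_df1(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion for a fitted model."""
    return n_params * math.log(n_obs) - 2.0 * loglik

def bf01_from_bic(bic_null, bic_alt):
    """BIC approximation to the Bayes factor in favor of the null model."""
    return math.exp((bic_alt - bic_null) / 2.0)

# Hypothetical log-likelihoods giving a statistic of 7.86 on 1 degree of freedom.
stat, p_value = lrt_df1(-503.93, -500.0)
```

A value of `bf01_from_bic` above 1 indicates that the data favor the model without the term, which is how evidence for the null is assessed for non-significant contrasts.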
7.1.4.1. Palatalization of To-Be-Palatalized consonants in the triggering context 
First, we compare the rate of palatalization in production of the To-Be-Palatalized 
consonants in the triggering context (before -i for Experiment 2 and before -a for 
Experiment 3)62 to determine whether the effect of change magnitude holds for both 
natural and unnatural contexts. Velars are palatalized more than labials (b = -3.63, se(b) = 
1.00, z = -3.64, p < 0.001), and Training Language significantly improves model fit (χ2(1) 
= 7.86, p = 0.005). The difference is smaller for Experiment 3 (b = 3.68, se(b) = 1.55, z = 
2.37, p < 0.02), and the interaction of Experiment by Training Language significantly 
improves model fit (χ2(1) = 5.64, p < 0.02). Figure 7.9 shows why: Palatalization is 
learned better before -i than before -a, overall (b = 0.84, se(b) = 1.10, z = 0.76, p = 0.45, 
but Experiment significantly improves model fit, χ2(1) = 12.12, p < 0.001), but before -i, 
correct velar palatalization is learned much better than correct labial palatalization, 
whereas there is no difference by language before -a.  
62 Keep Place ~ Experiment * Training Language + (1 | Subject) + (1 | Singular), data restricted to To-Be-Palatalized 
consonants before -i for Experiment 2 and before -a for Experiment 3 
 
Figure 7.9. Rates of palatalization of To-Be-Palatalized consonant in palatalization-
triggering context by training language and experiment. 
 
The lack of difference before -a could be due to participants in Experiment 3 failing 
to learn either alternation, perhaps because palatalization before -a is phonetically 
unmotivated. Stave et al. (2013) find a difference between languages before -a, but the 
stem vowel was always [a] in palatalized forms, so perhaps the contextual consistency 
made the pattern easier to notice and apply.  
We first compare the proportion of palatalized plurals produced by participants in 
each condition. We would expect that as difficulty increases, so does the number of 
participants who produce minimal palatalization, with more participants at the higher end 
of the scale for easier patterns. We calculated the proportion of palatalized plurals for To-Be-Palatalized 
stems by dividing the number of palatalized plurals (before -i or -a; for comparison of 
learning by suffix vowel context, see §7.1.4.2) by 36. 
Figure 7.10 shows that many more participants learn to palatalize the correct 
consonants in Experiment 2 than Experiment 3. 93% of Velar Palatalization participants 
palatalize velars at least once, and 43% palatalize more than half the time; 50% of Labial 
Palatalization participants palatalize labials at least once, and 13% palatalize more than 
half the time. In Experiment 3, however, only around half of participants in either 
language palatalize at all (52% for Labial Palatalization and 57% for Velar 
Palatalization), and none palatalize half the time (22% palatalized was the highest 
proportion for a participant in Labial Palatalization and 42% for Velar Palatalization). 
Experiment 3 shows lower rates of palatalization because very few participants manage 
to palatalize at all; palatalizing before -a was evidently much more difficult to learn, even 
for velars.  
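The per-participant summary used in this comparison can be sketched as follows. The denominator of 36 is taken from the text; the example counts are hypothetical and do not correspond to any participant, and the function name `summarize` is an assumption.

```python
N_TBP_PLURALS = 36  # To-Be-Palatalized plurals per participant, as stated in the text

def summarize(palatalized_counts, total=N_TBP_PLURALS):
    """Summarize per-participant palatalization proportions for one condition."""
    props = [count / total for count in palatalized_counts]
    n = len(props)
    return {
        "at_least_once": sum(p > 0 for p in props) / n,    # share who ever palatalize
        "more_than_half": sum(p > 0.5 for p in props) / n,  # share palatalizing > 50%
        "max": max(props),                                  # highest individual proportion
    }

example_counts = [0, 0, 3, 8, 20, 36]  # hypothetical participants
summary = summarize(example_counts)
```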
Another proxy for experimental difficulty is the rate of “acceptable” plurals produced, 
where by “acceptable” we mean plurals whose plural consonant is either palatalized or 
retained from the singular, and whose plural vowel is either -i or -a (i.e. the plurals that 
were not excluded from analysis). If participants produce more “unacceptable” plurals 
(e.g. roʊp~roʊpakaɪ, gwæp~gwæpeɪd) or fail to produce a plural at all, we can assume 
that this reflects confusion about the pattern. For these purposes, faithful plurals of To-
Be-Palatalized consonants and unfaithful plurals of Not-To-Be-Palatalized consonants are 
still “acceptable,” in that the form of the plural obeys the rules regarding how plurals can 
look. 
 
Figure 7.10. Proportion of plurals of To-Be-Palatalized consonants that were palatalized. 
Upper row: Experiment 2. Lower row: Experiment 3, None Obvious training order. Left 
column: Labial Palatalization condition. Right column: Velar Palatalization condition. 
 
We created histograms of the proportion of “acceptable” plurals for every subject, by 
training language and experiment, as shown in Figure 7.11. (Note that the sample size 
was not the same for every experiment; we are interested not in the raw counts 
themselves but rather in the shape of the distribution.) The proportion was calculated by 
dividing the number of “acceptable” plurals by 92, the total number of plurals. 
Comparison of the top row (Experiment 2) to the bottom row (Experiment 3) is 
informative: In Experiment 2, most participants produce a majority, even a large 
majority, of “acceptable” plurals, whereas in Experiment 3, many participants fall below 
50% “acceptable” plurals and the distributions are more uniform. For both experiments, 
Labial Palatalization (left column) has more participants at the lower end of acceptability. 
Statistical analyses⁶³ show that participants in Experiment 3 produce significantly more 
“unacceptable” forms than participants in Experiment 2 (b = 0.99, se(b) = 0.36, z = 2.72, 
p = 0.006; Experiment is significant in model comparisons, χ²(1) = 7.14, p < 0.008), and 
Velar Palatalization training results in fewer errors (b = -0.82, se(b) = 0.36, z = -2.29, p = 
0.02; including Training Language significantly improves model fit, χ²(1) = 5.11, p = 
0.02), with no significant interaction. From this, we can conclude that the participants in 
Experiment 3 struggle to extract the appropriate patterns, often producing inexplicable 
plurals or failing to produce a plural at all, and participants trained on the large change 
alternation have more difficulty learning the patterns. 
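As a rough check on the model comparisons reported above, the p-value for a likelihood-ratio χ² statistic with one degree of freedom can be computed with the standard library alone. This is a sketch under the assumption that the comparisons are ordinary likelihood-ratio tests between nested models; the function name is hypothetical.

```python
from math import erfc, sqrt


def chi2_sf_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # With 1 df the statistic is a squared standard normal variable, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return erfc(sqrt(statistic / 2.0))


p_experiment = chi2_sf_df1(7.14)  # model comparison for Experiment
p_language = chi2_sf_df1(5.11)    # model comparison for Training Language
print(f"Experiment: p = {p_experiment:.4f}")
print(f"Training Language: p = {p_language:.4f}")
```

Both values should fall in line with the reported thresholds (below 0.008 for Experiment and approximately 0.02 for Training Language).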
 
 
Figure 7.11. Proportion of plurals that followed patterns included in training. Upper row: 
Experiment 2. Lower row: Experiment 3, None Obvious trial order. Left column: Labial 
Palatalization condition. Right column: Velar Palatalization condition. 
 
                                                
63 Errors ~ Experiment + Training Language + (1 | Subject) + (1 | Singular), including participants 
trained on Labial and Velar Palatalization in Experiment 2 and participants in the None Obvious trial order 
in Experiment 3. 
  
The lack of a difference between Labial and Velar Palatalization in Experiment 3 
suggests that substantive biases affect learning, as proposed by Wilson (2006) and White 
(2013, 2014), among others (cf. J. P. Blevins, 2006; Bybee, 2001; Hale & Reiss, 2000; 
Moreton & Pater, 2012b, for skepticism). Velar palatalization before -i is more natural 
than before -a, in that [k] is articulatorily and perceptually closer to [tʃ] before -i, but not 
before -a; velar palatalization is also more natural than labial palatalization, because [k] 
shares articulators with [tʃ] but [p] does not. The results suggest that it is only when both 
context and change are natural that the alternation is learned, at least when related forms 
are not presented in contiguity. The results of Experiment 3 show that unnatural 
alternations can be acquired when related forms are temporally contiguous, with both 
labial and velar palatalization before -a being produced at high rates in the Change 
Obvious and All Obvious conditions. 
7.1.4.2. Generalization to the “wrong” suffix 
We evaluate the extent to which palatalization generalizes from the unnatural context 
to the natural context, as shown by Wilson (2006) and Mitrović (2012), by comparing 
rates of palatalization of To-Be-Palatalized consonants by Vowel Context and 
Experiment⁶⁴. Figure 7.12 is informative: When trained to palatalize before -i (left panel), 
neither Labial nor Velar Palatalization participants palatalize before -a at an appreciable 
rate. When trained to palatalize before -a (right panel), both Labial and Velar 
Palatalization participants palatalize before -i at a rate comparable to before -a.  
In other words, participants trained on the change in an unnatural context generalize 
to producing the change in the natural context, but not the reverse (b = -2.07, se(b) = 
0.70, z = -2.96, p = 0.003; the interaction of Experiment by Vowel Context significantly 
improves model fit, χ²(1) = 59.78, p < 0.001). The generalization difference is especially 
stark for Velar Palatalization, as shown in the difference in heights of the light (correct 
vowel context) and dark (incorrect vowel context) bars for Velar Palatalization compared 
to the same for Labial Palatalization in Figure 7.12.  

64 Keep Place ~ Experiment * Training Language * Vowel Context + (1 | Subject) + (1 | Singular), 
restricted to To-Be-Palatalized consonants. 
 
 
Figure 7.12. Rates of palatalization of To-Be-Palatalized consonants by training 
language and plural vowel. Left panel: Experiment 2, where participants were trained to 
palatalize before -i and not -a. Right panel: Experiment 3, where participants were trained 
to palatalize before -a and not before -i. 
 
To confirm that it is overgeneralization and not just that some participants learn the 
wrong pattern, we evaluated individual differences in overgeneralization of palatalization 
by vowel context. We restricted the analysis to participants who palatalized at least once 
in the correct context (excluding one participant from Experiment 3 Velar Palatalization, 
who palatalized 3 times before -i but never before -a, suggesting they did not learn the 
target pattern) and at least twice overall (any fewer than that, and it would not be possible 
for them to show any differences by suffix vowel). For Experiment 2, 8 out of the 22 
Labial Palatalization participants who qualified palatalize before both -i and -a (36.4%), 
as do 9 out of the 28 Velar Palatalization participants (32.1%). For Experiment 3, 8 of the 
14 Labial Palatalization participants palatalize before both vowels (57.1%), as do 5 of the 
8 Velar Palatalization participants (62.5%). There are fewer people who learn to 
palatalize in Experiment 3, but more than half of those who do extend the alternation to 
the more natural context, whereas only around a third of participants trained on the 
natural context extend it to the unnatural. Most of the participants in Experiment 2 who 
palatalize in both contexts palatalize in the incorrect context very rarely, whereas the 
participants in Experiment 3 do not show such a skewed distribution (though the range is 
roughly the same for both), as shown in Figure 7.13. However, Fisher’s exact tests show 
no significant difference in the proportion of overgeneralizers by experiment for Labial (p 
= 0.31) or Velar (p = 0.22) Palatalization, so these conclusions are tentative. The results 
suggest that large, unnatural changes – when they are learned – may be taken to imply 
smaller, natural changes, as proposed by Wilson (2006) and Mitrović (2012).  
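The Fisher's exact tests reported above can be reproduced from the counts given in the text. The sketch below is a standard-library implementation of the two-sided test that sums the probabilities of all tables no more likely than the observed one; whether the original analysis used this same two-sided definition is an assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    # Unnormalized hypergeometric weights, kept as exact integers so the
    # "no more likely than observed" comparison has no rounding error.
    weights = {
        k: comb(row1, k) * comb(row2, col1 - k)
        for k in range(lowest, highest + 1)
    }
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(total, col1)


# Rows: Experiment 2, Experiment 3.
# Columns: palatalize before both vowels, palatalize before one vowel only.
p_labial = fisher_exact_two_sided(8, 22 - 8, 8, 14 - 8)
p_velar = fisher_exact_two_sided(9, 28 - 9, 5, 8 - 5)
print(f"Labial: p = {p_labial:.2f}; Velar: p = {p_velar:.2f}")
```

With the counts from the text (8 of 22 and 8 of 14 for Labial Palatalization; 9 of 28 and 5 of 8 for Velar Palatalization), this calculation agrees with the reported p = 0.31 and p = 0.22 after rounding.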
 
 
Figure 7.13. Proportion of palatalized plurals that were suffixed with the correct vowel  
(-i for Experiment 2, -a for Experiment 3). Only participants who palatalized at least once 
in the correct context and at least twice overall were included. 
 
 
7.2. Theoretical implications 
7.2.1. The fate of large changes 
The dissociation between judgment and production in Experiment 2 introduces some 
uncertainty regarding the fate of large changes. Large changes are likely to be leveled by 
the speaker, but the resulting faithful forms are likely to be judged unacceptable by the 
listener; if the speaker heeds the listener, they may avoid the faithful form in the 
future. Speakers do adjust production in response to listener feedback 
(Buz et al., 2016; Goldstein et al., 2003; Maniwa et al., 2009; Schertz, 2013; Warlaumont 
et al., 2014), which provides evidence that listeners’ beliefs about speakers’ productions, 
if made apparent and heeded by the speaker, can influence their future productions. 
However, listeners’ beliefs are based on the productions they hear, so often in language 
change, “use leads, and belief follows” (Harmon & Kapatsinski, 2017). Sociolinguistics 
is full of dissociations between judgment and production, where speakers produce an 
innovative form but judge it unacceptable due to stigma (Labov, 1975, 1996), but it is 
unclear whether these judgments result in avoidance of the unacceptable forms or limit 
their spread. More research on the interaction between belief and use is needed, for 
example, observational studies of the impact of social acceptability on use, more 
interactive tasks examining how judgment and production interact (Buz et al., 2016), 
studies varying the order of production and judgment tasks (Harmon & Kapatsinski, 2017), 
and examination of the time course of development of judgment and production in the 
acquisition of alternations (Kerkhoff, 2007). 
7.2.2. The importance of syntagmatic co-occurrence 
The results of Experiment 3 show that paradigmatic mappings may actually be 
learned syntagmatically, strengthening when related forms occur next to each other in 
time. McNeill (1966) was skeptical that associative models could capture acquisition of 
paradigmatic mappings because paradigmatic associates do not appear in contiguity, 
whether in speech or through erroneous anticipation, but this skepticism has proven unfounded. 
Corpus studies show that members of morphological paradigms occur near each other 
more often than other word pairs (Baroni et al., 2002; Xu & Croft, 1998), so learning 
paradigmatic mappings does not require any superhuman capabilities like perfect recall 
or anticipation, but rather merely noticing related forms when they occur together. While 
paradigmatically related words “have a relation to one another different from co-
occurrence” (McNeill, 1966, p. 543), that relation is nonetheless learned in the presence 
of co-occurrence. Associative models are able to acquire paradigmatic mappings 
precisely because temporal contiguity matters. Mechanisms that acquire paradigmatic 
mappings in the absence of contiguity (Albright & Hayes, 2003; Ervin, 1961; McNeill, 
1966; Plunkett & Juola, 1999) may be overly powerful and exceed the ability of actual 
learners. 
7.2.3. Chunking and common fate 
Incrementing the association between [tʃ] and [a] in the model successfully captures 
the greater overgeneralization of palatalization in Change Obvious over All Obvious, 
which suggests that adjacency of corresponding forms is not just helpful for making the 
cues comprising the singular form more available for predicting the plural form, but also 
brings out differences between the corresponding forms. Recent work on category 
learning has shown that temporal adjacency between exemplars from multiple categories 
makes participants focus on the discriminative features (Carvalho & Goldstone, 2015; 
Zaki & Salmi, 2019), whereas adjacent exemplars of the same category make participants 
notice the shared features, even if they are also shared by exemplars of other categories. 
In the present work, the “chunking” of [tʃa] in Change Obvious motivates our claim that 
placing singulars and plurals next to each other also makes participants notice the parts 
they do not have in common.  
The results also suggest that elements that “move together” when an exemplar from 
one category is placed next to an exemplar from another “fuse together,” and each 
becomes able to evoke the other in production and perception. Noticing that the [k] or [p] 
of the singular has been replaced by [tʃa] in the plural when the forms are adjacent helps 
the -a suffix, once chosen for production, to evoke [tʃ] and thus palatalize the preceding 
consonant. This can be considered an instance of the “principle of common fate,” the 
basic mechanism of perceptual grouping (Köhler, 1929; Wertheimer, 1923/1938; Uttal et 
al., 2000). In speech processing, the principle of common fate manifests as the grouping 
of auditory elements that change in amplitude together or are frequency-modulated into a 
single stream (Bregman & Pinker, 1978; but see Böhm et al., 2003, for evidence against 
common fate in perceptual grouping). Goodsitt et al. (1993) and Kuhl (2000) reference 
the principle of common fate in their suggestion that infants group together sounds that 
occur together in words (i.e. that aren’t separated by a word boundary). However, Baayen 
et al. (2016) demonstrate that this clustering can be described using a baseline 
discriminative learning model, as long as upcoming elements are predicted from 
preceding ones, without needing to include common fate/chunking. The present results, 
on the other hand, suggest that discriminative learning needs to be supplemented with a 
chunking mechanism. Without a chunking mechanism, the baseline model is unable to 
account for the overgeneralization differences between Change Obvious and All 
Obvious, suggesting that common fate may aid in strengthening associations between 
elements that replace another element in a shared context. For example, saying “I went to 
pet the pengui-, no, to pet my cat on the sofa” may strengthen the association between “my” 
and “cat” by making the shared context obvious, creating a variation set. 
7.2.4. Variation sets 
In variation sets, successive utterances present different morphemes in a constant 
communicative context, which facilitates acquisition of words and morphemes (Küntay 
& Slobin, 1996; Onnis et al., 2008; Schwab & Lew-Williams, 2016; Tal & Arnon, 2018; 
Waterfall, 2006). Following Ervin (1961), variation sets are usually thought to indicate to 
the learner that certain morphemes are interchangeable in a paradigm. A variation set 
may additionally teach the learner that all segments of a morpheme belong together: 
Following the principle of common fate, all segments comprising a morpheme “change 
together” and therefore can fuse and evoke one another. In other words, in addition to 
teaching that two morphemes belong together in a paradigm, adjacent utterances may 
also teach that the segments comprising a morpheme (or word, or phrase) belong 
together. 
7.2.5. Surprise! 
The other major tweak to the model was to provide an additional role for surprise, 
which goes beyond the influence on learning rate that is proposed by all error-driven 
models (Baayen et al., 2016; Olejarczuk et al., 2018; Rescorla, 1988). The goal of the 
Rescorla-Wagner model is to make correct predictions when presented with certain cues. 
Surprise determines learning rate in that whenever an outcome is expected given the 
preceding cues, the model’s beliefs are correct and need not be updated. The extent to 
which the model’s beliefs are updated is proportional to how surprising the event is (see 
(1-a) and (0-a) terms in equations (1) and (2) in §6.1). The present results suggest that 
the Rescorla-Wagner model underestimates the importance of surprise, as the occurrence 
of [tʃa] when the singular ends in [p] or [k], which leads the learner to expect [p] or [k] in 
the plural, boosts [tʃa] more strongly than predicted. This could be driven by a 
disconfirmation or novelty bias, where surprising events change beliefs more than would 
normatively be warranted (see Olejarczuk et al., 2018, for evidence of disconfirmation 
bias in phonological learning). Disconfirmation bias contrasts with confirmation bias, 
which is discounting evidence inconsistent with prior beliefs (in other words, 
underutilizing information from surprising events; Bacon, 1620/1939; see Klayman, 
1995, Nickerson, 1988 for reviews). Confirmation bias is well-documented in other 
domains but there is limited evidence of it in language learning outside of experiments 
where learners are asked to discover rules by explicitly testing the grammaticality of 
different sentences (Robinson, 1996). Studies have shown that evidence for a 
phonetically unmotivated pattern is taken by participants to imply the existence of the 
phonetically motivated counterpart: Exposure to palatalization before [e] leads to 
production of palatalization before [i] as well (Wilson, 2006), and training on a saltatory 
alternation (e.g. p~v) generalizes to the intermediate segments (e.g. b~v and f~v) (White, 
2013, 2014). Even when explicitly trained to not alternate the intermediate segments, 
participants still often prefer to do so (White, 2013, 2014), disregarding the disconfirming 
evidence. While these results suggest that surprising linguistic events do not necessarily 
result in corresponding adjustment of beliefs, much more research is needed to determine 
under what circumstances confirmation vs. disconfirmation bias takes precedence.  
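The baseline role of surprise in error-driven learning described above can be illustrated with a minimal sketch of a Rescorla-Wagner update. It assumes the standard form in which every cue present on a trial changes its weight to each outcome by the learning rate times (1 - a) when the outcome occurs and (0 - a) when it does not, where a is the summed activation of the outcome. The function name, the learning rate of 0.1, and the cue and outcome labels are illustrative assumptions, and the additional boost for surprising outcomes proposed in this section is deliberately not modeled.

```python
def rw_update(weights, cues, present_outcomes, all_outcomes, rate=0.1):
    """Apply one Rescorla-Wagner trial in place; return per-outcome errors."""
    errors = {}
    for outcome in all_outcomes:
        # a: summed support for this outcome from the cues on this trial.
        activation = sum(weights.get((cue, outcome), 0.0) for cue in cues)
        target = 1.0 if outcome in present_outcomes else 0.0
        errors[outcome] = target - activation  # (1 - a) or (0 - a)
        for cue in cues:
            old = weights.get((cue, outcome), 0.0)
            weights[(cue, outcome)] = old + rate * errors[outcome]
    return errors


weights = {}
outcomes = ["tʃa", "ka"]
# Two identical trials: a singular ending in [k] is followed by a [tʃa] plural.
first = rw_update(weights, ["k"], {"tʃa"}, outcomes)
second = rw_update(weights, ["k"], {"tʃa"}, outcomes)
print(first["tʃa"], second["tʃa"], weights[("k", "tʃa")])
```

The prediction error for [tʃa] is smaller on the second trial than on the first, because the outcome is no longer fully unexpected; this is the sense in which the size of the update is proportional to surprise in the baseline model.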
7.3. Conclusion 
In this work, we proposed the Perseveration Hypothesis, a novel explanation for 
Paradigm Uniformity, the avoidance of stem changes (especially large ones). The 
Perseveration Hypothesis claims that stem changes are leveled by paradigmatic 
perseveration within the production system. When trying to produce a novel form of a 
known word, other forms of the word are activated along with production schemas linked 
to the meaning to be expressed (e.g. PLURAL). The articulatory gestures comprising 
these forms are incorporated into the novel form through a process of blending the 
activated production representations (Kapatsinski, 2013; Taylor, 2012). When too much 
is copied, the stem change is leveled. To prevent wanton paradigm leveling, speakers 
learn paradigmatic associations between related forms, which specify that activation of a 
particular gesture in the base form should activate a different gesture in the to-be-
produced form, which is copied into the production plan under construction. These 
paradigmatic associations are harder to learn when gestures are dissimilar, because 
linking dissimilar representations requires modifying more synapses in the brain 
(Kapatsinski, 2011; Warker & Dell, 2006).  
Paradigmatic perseveration and the bias against associating dissimilar forms together 
constitute the Perseveration Hypothesis: Perseveration conflicts with changes mandated 
by paradigmatic associations, and associations between dissimilar representations are 
more difficult to acquire. 
The Perseveration Hypothesis makes the unique claim that the bias against large 
changes is strongest in production, because performing the change is what is difficult. We 
do see a dissociation between production and judgment: In Experiment 2, participants 
trained on labial palatalization (a large change) judge palatalization to be better than non-
palatalization, but are unlikely to produce it. Overgeneralization of large changes to small 
changes is present in the judgment task, but overgeneralization is not itself the cause of 
Paradigm Uniformity. Participants may accept alternating forms in judgment because 
they contain the appropriate cues to meaning (e.g. by following a first-order schema like 
“plurals end in [tʃi]”) without being able to produce the forms themselves. Difficulties in 
performing the stem change, especially when it is large, erode its productivity: A change 
that is hard to produce is more likely to be leveled, which prevents future speakers from 
encountering the alternation, so that it falls out of use over time. This loss of productivity 
is widespread enough to arguably be universal (Bybee, 2008).  
Acquiring morphology is often considered the hardest part of learning language 
(Slabakova, 2008). Morphological cues are often low in salience and therefore are often 
missed or underutilized (MacWhinney et al., 1985). Morphology is also rife with 
paradigmatic mappings, which play a relatively minor role elsewhere in the grammar 
(Kapatsinski, 2018a). In Experiment 3, we show that both issues are less severe when 
corresponding forms are in temporal contiguity. Through perceptual grouping, or the 
principle of common fate (Wertheimer, 1923/1938), parts of a form that jointly cue a 
difference in meaning between forms “pop out,” which makes it easier for them to cue each 
other. Contiguity may be essential for learning novel unfaithful mappings, in that 
paradigmatic mappings become more apparent as contiguity makes the phonological 
properties of one form available for predicting properties of the other. It may also be 
important for learning faithful mappings, which require copying elements of the base 
form into the output: Both faithful and unfaithful mappings can be overgeneralized, in a 
way that is sensitive to the similarity structure of phonological space. Contiguity between 
forms exemplifying a mapping does not, on its own, prevent overgeneralization. For 
example, noticing k→tʃ (through contiguous singulars and plurals) favors mapping both 
[k] and [t] onto [tʃ]; noticing p→tʃ favors mapping all consonants onto [tʃ]; and noticing 
t→t and p→p favors k→k. In the condition where only unfaithful mappings are intact, 
changes are overgeneralized, and when only faithful mappings are intact, changes avoid 
notice and are rarely produced. Only when both are in contiguity are changes constrained 
to the appropriate context, as discriminative learning partitions the space of inputs 
between faithful and unfaithful mappings. 
Early work assumed that morphologically related words are rarely if ever in 
contiguity (McNeill, 1966), but recent work in corpus and computational linguistics 
shows that contiguity between related words is a ubiquitous feature of natural language 
and can be helpful for detecting morphological relations, such as the fact that “went” is the 
past tense of “go” (Baroni et al., 2002; Xu & Croft, 1998). We suspect, based on the results of 
Experiment 3, that contiguity is not only helpful for computational models but is also 
essential for human learners of morphology and morphophonology. Paradigmatic 
relations may only be learnable because paradigmatically-related words are subject to 
syntagmatic co-occurrence, which allows the patterns between related forms to be noticed 
and acquired. 

APPENDIX A 
 
EXPERIMENT 1 AND EXPERIMENT 2 JUDGMENT STIMULUS LISTS 
 
In the lists below, the plurals are formed by suffixing the vowel to the singular stem (e.g. 
blæb(-i,-a) → blæbi, blæba) or by removing the stem-final consonant and replacing it with 
the palatal-vowel combination (e.g. blæb(-dʒi,-dʒa) → blædʒi, blædʒa). 
 
Labial stems: 
blæb(-i,-a,-dʒi,-dʒa), fraɪb(-i,-a,-dʒi,-dʒa), kwoʊb(-i,-a,-dʒi,-dʒa), prub(-i,-a,-dʒi,-dʒa), 
smæb(-i,-a,-dʒi,-dʒa), blaɪp(-i,-a,-tʃi,-tʃa), frɛp(-i,-a,-tʃi,-tʃa), præp(-i,-a,-tʃi,-tʃa), skip(-i,-
a,-tʃi,-tʃa), smip(-i,-a,-tʃi,-tʃa) 
 
Alveolar stems: 
bloʊd(-i,-a,-dʒi,-dʒa), frɑd(-i,-a,-dʒi,-dʒa), kwæd(-i,-a,-dʒi,-dʒa), preɪd(-i,-a,-dʒi,-dʒa), 
smɑd(-i,-a,-dʒi,-dʒa), blit(-i,-a,-tʃi,-tʃa), frɛt(-i,-a,-tʃi,-tʃa), kweɪt(-i,-a,-tʃi,-tʃa), prut(-i,-a,-
tʃi,-tʃa), smɑt(-i,-a,-tʃi,-tʃa) 
 
Velar stems: 
blɪg(-i,-a,-dʒi,-dʒa), fraɪg(-i,-a,-dʒi,-dʒa), kwɪg(-i,-a,-dʒi,-dʒa), prɪg(-i,-a,-dʒi,-dʒa), smɪg(-
i,-a,-dʒi,-dʒa), bleɪk(-i,-a,-tʃi,-tʃa), frik(-i,-a,-tʃi,-tʃa), kwɑk(-i,-a,-tʃi,-tʃa), praɪk(-i,-a,-tʃi,-
tʃa), smɛk(-i,-a,-tʃi,-tʃa) 

APPENDIX B 
 
EXPERIMENT 2 STIMULUS LISTS 
 
Training 
Labial Palatalization training language 
brib(-dʒi, 1), tʃaɪb(-dʒi, 1), gib(-dʒi, 1), hɛb(-dʒi, 4), paɪb(-dʒi, 4), vɑb(-dʒi, 1), broup(-tʃi, 
4), glɑp(-tʃi, 1), heɪp(-tʃi, 1), naɪp(-tʃi, 1), slæp(-tʃi, 4), snip(-tʃi, 1), tʃaɪd(-a, 1), dæt(-a, 2), 
drid(-a, 2), feɪd(-i, 2), flaɪt(-a, 1), hɛd(-i, 1), lɑt(-i, 2), preɪt(-i, 1), tʃik(-a, 1), faɪk(-a, 2), 
glug(-i, 2), gwig(-a, 2), heɪg(-a, 1), noʊk(-i, 2), roʊg(-i, 1), spaɪk(-i, 1) 
 
Alveolar Palatalization training language 
brid(-dʒi, 1), tʃaɪd(-dʒi, 1), gid(-dʒi, 1), hɛd(-dʒi, 4), paɪd(-dʒi, 4), vɑd(-dʒi, 1), brout(-tʃi, 
4), glɑt(-tʃi, 1), heɪt(-tʃi, 1), naɪt(-tʃi, 1), slæt(-tʃi, 4), snit(-tʃi, 1), tʃaɪp(-a, 1), dæp(-a, 2), 
drip(-a, 2), feɪp(-i, 2), flaɪp(-a, 1), hɛp(-i, 1), lɑp(-i, 2), preɪp(-i, 1), tʃik(-a, 1), faɪk(-a, 2), 
glug(-i, 2), gwig(-a, 2), heɪg(-a, 1), noʊk(-i, 2), roʊg(-i, 1), spaɪk(-i, 1) 
 
Velar Palatalization training language 
brig(-dʒi, 1), tʃaɪg(-dʒi, 1), gig(-dʒi, 1), hɛg(-dʒi, 4), paɪg(-dʒi, 4), vɑg(-dʒi, 1), brouk(-tʃi, 
4), glɑk(-tʃi, 1), heɪk(-tʃi, 1), naɪk(-tʃi, 1), slæk(-tʃi, 4), snik(-tʃi, 1), tʃip(-a, 1), faɪp(-a, 2), 
glub(-i, 2), gwib(-a, 2), heɪb(-a, 1), noʊp(-i, 2), roʊb(-i, 1), spaɪp(-i, 1), tʃaɪd(-a, 1), dæt(-
a, 2), drid(-a, 2), feɪd(-i, 2), flaɪt(-a, 1), hɛd(-i, 1), lɑt(-i, 2), preɪt(-i, 1) 
 
Production 
Labial Palatalization training language 
vɛg, θug, strig, sprug, smoʊg, slɛg, skwæg, sɛg, kwug, krig, klɪg, frig, fleɪg, drɑg, vɪk, 
traɪk, θoʊk, streɪk, smuk, slɑk, skwaɪk, sɪk, plæk, kweɪk, klɑk, fræk, foʊk, faɪk, wɛd, trid, 
ðoʊd, θaɪd, stɛd, snud, ʃroʊd, ʃlud, proʊd, præd, gwɪd, flɑd, faɪd, blud, waɪt, θut, ðat, stɪt, 
sprɛt, snɛt, ʃræt, ʃaɪt, plɑt, kraɪt, gweɪt, fɛt, draɪt, blɪt, smeɪb, skɑb, sɪb, sib, ʃrib, ʃlɑb, 
proʊb, nɑb, kwæb, klub, gwɑb, froʊb, fleɪb, blub, slɛp, slæp, skæp, ʃroʊp, ʃlɪp, roʊp, plɪp, 
kwɑp, krip, klip, gwæp, frɪp, dræp, bloʊp, θip, strup, spreɪp, smoʊp, trɑb, θaɪb, streɪb, 
snɛb 
 
Alveolar Palatalization training language 
vɛg, θug, strig, sprug, smoʊg, slɛg, skwæg, sɛg, kwug, krig, klɪg, frig, fleɪg, drɑg, vɪk, 
traɪk, θoʊk, streɪk, smuk, slɑk, skwaɪk, sɪk, plæk, kweɪk, klɑk, fræk, foʊk, faɪk, smeɪb, 
skɑb, sɪb, sib, ʃrib, ʃlɑb, proʊb, nɑb, kwæb, klub, gwɑb, froʊb, fleɪb, blub, slɛp, slæp, 
skæp, ʃroʊp, ʃlɪp, roʊp, plɪp, kwɑp, krip, klip, gwæp, frɪp, dræp, bloʊp, trid, ðoʊd, θaɪd, 
stɛd, snud, ʃroʊd, ʃlud, proʊd, præd, gwɪd, flɑd, faɪd, blud, waɪt, θut, ðat, stɪt, sprɛt, snɛt, 
ʃræt, ʃaɪt, plɑt, kraɪt, gweɪt, fɛt, draɪt, blɪt, moʊt, dʒeɪt, dit, tʃut, vid, slɛd, hoʊd, krɪd 
 
 
Velar Palatalization training language 
smeɪb, skɑb, sɪb, sib, ʃrib, ʃlɑb, proʊb, nɑb, kwæb, klub, gwɑb, froʊb, fleɪb, blub, slɛp, 
slæp, skæp, ʃroʊp, ʃlɪp, roʊp, plɪp, kwɑp, krip, klip, gwæp, frɪp, dræp, bloʊp, wɛd, trid, 
ðoʊd, θaɪd, stɛd, snud, ʃroʊd, ʃlud, proʊd, præd, gwɪd, flɑd, faɪd, blud, waɪt, θut, ðat, stɪt, 
sprɛt, snɛt, ʃræt, ʃaɪt, plɑt, kraɪt, gweɪt, fɛt, draɪt, blɪt, vɛg, θug, strig, sprug, smoʊg, slɛg, 
skwæg, sɛg, kwug, krig, klɪg, frig, fleɪg, drɑg, vɪk, traɪk, θoʊk, streɪk, smuk, slɑk, skwaɪk, 
sɪk, plæk, kweɪk, klɑk, fræk, foʊk, faɪk, nik, mɛk, tʃuk, boʊk, pɑg, heɪg, dig, blɪg 
 
Judgment Test 
See Appendix A. 

APPENDIX C 
 
EXPERIMENT 3 STIMULUS LISTS 
 
Training 
Labial Palatalization training language 
hɛt(-a, 2), paɪt(-a, 1), vɑt(-i, 2), ɡit(-i, 1), broʊd(-a, 2), slæd(-a, 1), heɪd(-i, 2), naɪd(-i, 1), 
brik(-a, 2), tʃaɪk(-a, 1), feɪk(-i, 2), hɛk(-i, 1), dæɡ(-a, 2), flaɪɡ(-a, 1), lɑɡ(-i, 2), preɪɡ(-i, 1), 
ɡwip(-tʃa, 4), heɪp(-tʃa, 4), ɡlup(-tʃa, 1), roʊp(-tʃa, 1), θɑp(-tʃa, 1), ɡɛp(-tʃa, 1), snaɪb(-
dʒa, 4), brub(-dʒa, 4), nib(-dʒa, 1), spɑb(-dʒa, 1), pæb(-dʒa, 1), toʊb(-dʒa, 1)  
 
Velar Palatalization training language 
hɛt(-a, 2), paɪt(-a, 1), vɑt(-i, 2), ɡit(-i, 1), broʊd(-a, 2), slæd(-a, 1), heɪd(-i, 2), naɪd(-i, 1), 
brip(-a, 2), tʃaɪp(-a, 1), feɪp(-i, 2), hɛp(-i, 1), dæb(-a, 2), flaɪb(-a, 1), lɑb(-i, 2), preɪb(-i, 1), 
ɡwik(-tʃa, 4), heɪk(-tʃa, 4), ɡluk(-tʃa, 1), roʊk(-tʃa, 1), θɑk(-tʃa, 1), ɡɛk(-tʃa, 1), snaɪɡ(-
dʒa, 4), bruɡ(-dʒa, 4), niɡ(-dʒa, 1), spɑɡ(-dʒa, 1), pæɡ(-dʒa, 1), toʊɡ(-dʒa, 1) 
 
Production 
Labial Palatalization training language 
blɪt, blud, draɪt, faɪd, fɛt, flɑd, gweɪt, gwɪd, kraɪt, plɑt, præd, proʊd, ʃlaɪt, ʃlud, ʃræt, ʃroʊd, 
snɛt, snud, sprɛt, stɛd, stɪt, θaɪd, ðɑt, ðoʊd, θut, trid, waɪt, wɛd, bloʊp, blub, dræp, fleɪb, 
frɪp, froʊb, gwɑb, gwæp, klip, klub, krip, kwæb, kwɑp, nɑb, plɪp, proʊb, roʊp, ʃlɑb, ʃlɪp, 
ʃrib, ʃroʊp, sib, sɪb, skɑb, skæp, slæp, slɛp, smeɪb, drɑg, faɪk, fleɪg, foʊk, fræk, frig, klɑk, 
klɪg, krig, kweɪk, kwug, plæk, sɛg, sɪk, skwæg, skwaɪk, slɑk, slɛg, smoʊg, smuk, sprug, 
streɪk, strig, θoʊk, θug, traɪk, vɛg, vɪk, smoʊp, snɛb, spreɪp, streɪb, strup, θaɪb, θip, trɑb, 
troʊdʒ, trædʒ, θidʒ, ʃrɛdʒ, ʃlædʒ, saɪdʒ, sædʒ, prɪdʒ, kwɛdʒ, krɑdʒ, kludʒ, klɪdʒ, fridʒ, 
drɪdʒ, θɛtʃ, θætʃ, sutʃ, staɪtʃ, slaɪtʃ, ʃrɑtʃ, ʃlɪtʃ, prɛtʃ, prɑtʃ, plutʃ, kwɑtʃ, kloʊtʃ, gweɪtʃ, 
gwætʃ 
 
Velar Palatalization training language 
blɪt, blud, draɪt, faɪd, fɛt, flɑd, gweɪt, gwɪd, kraɪt, plɑt, præd, proʊd, ʃlaɪt, ʃlud, ʃræt, ʃroʊd, 
snɛt, snud, sprɛt, stɛd, stɪt, θaɪd, ðɑt, ðoʊd, θut, trid, waɪt, wɛd, bloʊp, blub, dræp, fleɪb, 
frɪp, froʊb, gwɑb, gwæp, klip, klub, krip, kwæb, kwɑp, nɑb, plɪp, proʊb, roʊp, ʃlɑb, ʃlɪp, 
ʃrib, ʃroʊp, sib, sɪb, skɑb, skæp, slæp, slɛp, smeɪb, drɑg, faɪk, fleɪg, foʊk, fræk, frig, klɑk, 
klɪg, krig, kweɪk, kwug, plæk, sɛg, sɪk, skwæg, skwaɪk, slɑk, slɛg, smoʊg, smuk, sprug, 
streɪk, strig, θoʊk, θug, traɪk, vɛg, vɪk, boʊk, tʃuk, dig, glug, heɪg, mɛk, nik, pɑg, troʊdʒ, 
trædʒ, θidʒ, ʃrɛdʒ, ʃlædʒ, saɪdʒ, sædʒ, prɪdʒ, kwɛdʒ, krɑdʒ, kludʒ, klɪdʒ, fridʒ, drɪdʒ, θɛtʃ, 
θætʃ, sutʃ, staɪtʃ, slaɪtʃ, ʃrɑtʃ, ʃlɪtʃ, prɛtʃ, prɑtʃ, plutʃ, kwɑtʃ, kloʊtʃ, gweɪtʃ, gwætʃ 

REFERENCES CITED 
 
Ackerman, F., Blevins, J. P., & Malouf, R. (2009). Parts and wholes: Implicative patterns 
in inflectional paradigms. In J. P. Blevins & J. Blevins (Eds.), Analogy in grammar 
(pp. 54-82). Oxford University Press. 
doi:10.1093/acprof:oso/9780199547548.003.0003. 
 
Ackerman, F., & Malouf, R. (2013). Morphological organization: The low conditional 
entropy conjecture. Language, 89(3), 429-464. doi:10.1353/lan.2013.0054. 
 
Albright, A. (2008). Explaining universal tendencies and language particulars in 
analogical change. In J. Good (Ed.), Linguistic universals and language change 
(pp. 144-181). Oxford University Press. doi:10.1093/acprof:oso/9780199298495.003.0007. 
 
Albright, A., & Hayes, B. (2003). Rules vs. analogy in English past tenses: A 
computational/experimental study. Cognition, 90(2), 119-161. doi:10.1016/s0010-
0277(03)00146-x. 
 
Alderete, J. D. (2001). Dominance effects as transderivational anti-faithfulness. 
Phonology, 18(2), 201-253. doi:10.1017/s0952675701004067. 
 
Allen, B., & Becker, M. (2015). Learning alternations from surface forms with sublexical 
phonology. Unpublished manuscript, University of British Columbia, Vancouver, 
Canada, and Stony Brook University, Stony Brook, NY. Retrieved from 
https://ling.auf.net/lingbuzz/002503. 
 
Ambridge, B., Pine, J. M., Rowland, C. F., & Chang, F. (2012). The roles of verb 
semantics, entrenchment and morphophonology in the retreat from dative argument 
structure overgeneralization errors. Language, 88(1), 45-81. 
doi:10.1353/lan.2012.0000. 
 
Ambridge, B., Pine, J. M., Rowland, C. F., & Young, C. R. (2008). The effect of verb 
semantic class and verb frequency (entrenchment) on children’s and adults’ graded 
judgments of argument structure overgeneralisation errors. Cognition, 106(1), 87-129. 
doi:10.1016/j.cognition.2006.12.015. 
 
Andersen, H. (1973). Abductive and deductive change. Language, 49(4), 765-793. 
doi:10.2307/412063. 
 
Anttila, R. (1989). Historical and comparative linguistics. John Benjamins. 
 
Arnold, D., Tomaschek, F., Sering, K., Lopez, F., & Baayen, R. H. (2017). Words from 
spontaneous conversational speech can be recognized with human-like accuracy by 
an error-driven learning algorithm that discriminates between meanings straight from 
smart acoustic features, bypassing the phoneme as recognition unit. PLoS One, 12(4), 
e0174623. doi:10.1371/journal.pone.0174623. 
  
Arnon, I., & Ramscar, M. (2012). Granularity and the acquisition of grammatical gender: 
How order-of-acquisition affects what gets learned. Cognition, 122(3), 292-305. 
doi:10.1016/j.cognition.2011.10.009. 
 
Ashby, F. G., Ennis, J. M., & Spiering, B. J. (2007). A neurobiological theory of 
automaticity in perceptual categorization. Psychological Review, 114(3), 632-656. 
doi:10.1037/0033-295x.114.3.632. 
 
Baayen, R. H., Dijkstra, T., & Schreuder, R. (1997). Singulars and plurals in Dutch: 
Evidence for a parallel dual-route model. Journal of Memory and Language, 37(1), 
94-117. doi:10.1006/jmla.1997.2509. 
 
Baayen, R. H., Milin, P., Đurđević, D. F., Hendrix, P., & Marelli, M. (2011). An 
amorphous model for morphological processing in visual comprehension based on 
naive discriminative learning. Psychological Review, 118(3), 438-481. 
doi:10.1037/a0023851. 
 
Baayen, R. H., Shaoul, C., Willits, J., & Ramscar, M. (2016). Comprehension without 
segmentation: A proof of concept with naive discriminative learning. Language, 
Cognition and Neuroscience, 31(1), 106-128. doi:10.1080/23273798.2015.1065336. 
 
Bacon, F. (1939). Novum organum. In E. A. Burtt (Ed.), The English philosophers from 
Bacon to Mill (pp. 24-123). Random House. (Original work published in 1620)  
 
Baese-Berk, M., & Goldrick, M. (2009). Mechanisms of interaction in speech production. 
Language and Cognitive Processes, 24(4), 527-554. 
doi:10.1080/01690960802299378. 
 
Bakovic, E. (2003). Vowel harmony and stem identity. San Diego Linguistic Papers, 1, 
1-42. Retrieved from 
https://cloudfront.escholarship.org/dist/prd/content/qt7zw206pt/qt7zw206pt.pdf. 
 
Bangasser, D. A., Waxler, D. E., Santollo, J., & Shors, T. J. (2006). Trace conditioning 
and the hippocampus: The importance of contiguity. Journal of Neuroscience, 26(34), 
8702-8706. doi:10.1523/jneurosci.1742-06.2006. 
 
Baroni, M., Matiasek, J., & Trost, H. (2002). Unsupervised discovery of morphologically 
related words based on orthographic and semantic similarity. In Proceedings of the 
ACL-02 workshop on morphological and phonological learning, vol. 6 (pp. 48-57). 
Association for Computational Linguistics. doi:10.3115/1118647.1118653. 
 
Bateman, N. (2007). A crosslinguistic investigation of palatalization (Doctoral 
dissertation, University of California San Diego). Available from ProQuest 
Dissertations and Theses database. (3262182) 
 
 
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects 
models using lme4. Journal of Statistical Software, 67(1), 1-48. 
doi:10.18637/jss.v067.i01. 
 
Becker, M., & Gouskova, M. (2016). Source-oriented generalizations as grammar 
inference in Russian vowel deletion. Linguistic Inquiry, 47(3), 391-425. 
doi:10.1162/ling_a_00217. 
 
Beckman, J. N. (1998). Positional faithfulness (Doctoral dissertation, University of 
Massachusetts). Available from ProQuest Dissertations and Theses database. 
(9823717) 
 
Bennett, W. G., & Braver, A. (2015). The productivity of ‘unnatural’ labial palatalization 
in Xhosa. Nordlyd, 42, 33-44. doi:10.7557/12.3738. 
 
Benua, L. (1997). Transderivational identity: Phonological relations between words 
(Doctoral dissertation, University of Massachusetts). Available from ProQuest 
Dissertations and Theses database. (9809307) 
 
Berko, J. (1958). The child’s learning of English morphology. Word, 14, 150–177. 
doi:10.1080/00437956.1958.11659661. 
 
Bhat, D. N. (1978). A general study of palatalization. Universals of Human Language, 2, 
47-92.  
 
Bickel, B., Banjade, G., Gaenszle, M., Lieven, E., Paudyal, N. P., Rai, I. P., ... & Stoll, S. 
(2007). Free prefix ordering in Chintang. Language, 83(1), 43-73. 
doi:10.1353/lan.2007.0002. 
 
Biedermann, B., Beyersmann, E., Mason, C., & Nickels, L. (2013). Does plural 
dominance play a role in spoken picture naming? A comparison of unimpaired and 
impaired speakers. Journal of Neurolinguistics, 26(6), 712-736. 
doi:10.1016/j.jneuroling.2013.05.001. 
 
Blevins, J. (2006). A theoretical synopsis of Evolutionary Phonology. Theoretical 
Linguistics, 32(2), 117-166. doi:10.1515/tl.2006.009. 
 
Blevins, J. P. (2006). Word-based morphology. Journal of Linguistics, 42(3), 531-573. 
doi:10.1017/s0022226706004191. 
 
Blevins, J. P. (2013). Word-based morphology from Aristotle to modern WP (word and 
paradigm models). In K. Allan (Ed.), The Oxford handbook of the history of 
linguistics (pp. 41-85). Oxford University Press. 
doi:10.1093/oxfordhb/9780199585847.013.0017. 
 
 
Blevins, J. P., Milin, P., & Ramscar, M. (2017). The Zipfian paradigm cell filling problem. 
In F. Kiefer, J. P. Blevins, & H. Bartos (Eds.), Morphological paradigms and 
functions (pp. 139-158). Brill. doi:10.1163/9789004342934_008. 
 
Blything, R. P., Ambridge, B., & Lieven, E. V. (2014). Children use statistics and 
semantics in the retreat from overgeneralization. PLoS One, 9(10), e110009. 
doi:10.1371/journal.pone.0110009. 
 
Böhm, T. M., Shestopalova, L., Bendixen, A., Andreou, A. G., Georgiou, J., Garreau, G., 
... & Winkler, I. (2013). The role of perceived source location in auditory stream 
segregation: Separation affects sound organization, common fate does not. Learning 
& Perception, 5(Supplement 2), 55-72. doi:10.1556/lp.5.2013.suppl2.5. 
 
Bolognesi, R. (1998). The phonology of Campidanian Sardinian: A unitary account of a 
self-organizing structure. Holland Institute for Generative Linguistics. 
 
Bonami, O., & Beniamine, S. (2016). Joint predictiveness in inflectional paradigms. Word 
Structure, 9(2), 156-182. doi:10.3366/word.2016.0092. 
 
Bonami, O., & Strnadová, J. (2019). Paradigm structure and predictability in derivational 
morphology. Morphology, 29(2), 167-197. doi:10.1007/s11525-018-9322-6. 
 
Booij, G. (2010). Construction morphology. Language and Linguistics Compass, 4(7), 
543-555. doi:10.1111/j.1749-818x.2010.00213.x. 
 
Boomershine, A., Hall, K. C., Hume, E., & Johnson, K. (2008). The impact of allophony 
versus contrast on speech perception. In P. Avery, B. E. Dresher & K. Rice (Eds.), 
Contrast in phonology: Theory, perception, acquisition (pp. 145-171). Mouton de 
Gruyter. doi:10.1515/9783110208603.2.145. 
 
Boyd, J. K., & Goldberg, A. E. (2011). Learning what not to say: The role of statistical 
preemption and categorization in a-adjective production. Language, 87(1), 55-83. 
doi:10.1353/lan.2011.0012. 
 
Brady, T. F., Konkle, T., Alvarez, G. A., & Oliva, A. (2008). Visual long-term memory 
has a massive storage capacity for object details. Proceedings of the National 
Academy of Sciences, 105(38), 14325-14329. doi:10.1073/pnas.0803390105. 
 
Braine, M. D., Brody, R. E., Brooks, P. J., Sudhalter, V., Ross, J. A., Catalano, L., & 
Fisch, S. M. (1990). Exploring language acquisition in children with a miniature 
artificial language: Effects of item and pattern frequency, arbitrary subclasses, and 
correction. Journal of Memory and Language, 29(5), 591-610. doi:10.1016/0749-
596x(90)90054-4. 
 
 
 
Braine, M. D., & Brooks, P. J. (1995). Verb argument structure and the problem of 
avoiding an overgeneral grammar. In M. Tomasello & W. E. Merriman (Eds.), Beyond 
names for things: Young children’s acquisition of verbs (pp. 353-376). Lawrence 
Erlbaum Associates. doi:10.4324/9781315806860. 
 
Braver, A., & Bennett, W. G. (2015, January). Phonology or morphology: Inter-speaker 
differences in Xhosa labial palatalization. Paper presented at the 89th Annual 
Meeting of the Linguistic Society of America, Portland, OR. 
 
Bregman, A. S., & Pinker, S. (1978). Auditory streaming and the building of timbre. 
Canadian Journal of Psychology, 32(1), 19-31. doi:10.1037/h0081664. 
 
Brooks, P. J., Braine, M. D. S., Catalano, L., Brody, R. E., & Sudhalter, V. (1993). 
Acquisition of gender-like noun subclasses in an artificial language: The contribution 
of phonological markers to learning. Journal of Memory and Language, 32, 79–95. 
doi:10.1006/jmla.1993.1005. 
 
Brooks, P. J., Tomasello, M., Dodson, K., & Lewis, L. B. (1999). Young children’s 
overgeneralizations with fixed transitivity verbs. Child Development, 70(6), 1325–
1337. doi:10.1111/1467-8624.00097. 
 
Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. 
Phonology, 6(2), 201-251. doi:10.1017/s0952675700001019. 
 
Brown, R., & Berko, J. (1960). Word association and the acquisition of grammar. Child 
Development, 31(1), 1-14. doi:10.1111/j.1467-8624.1960.tb05779.x. 
 
Burzio, L. (1996). Surface constraints versus underlying representations. In J. Durand & 
B. Laks (Eds.), Current trends in phonology: Models and methods, vol. 1 (pp. 123-
142). University of Salford. Retrieved from 
https://pdfs.semanticscholar.org/1cb4/4e69e127fe04be25fe744409f87e310d8e86.pdf. 
 
Buz, E., Tanenhaus, M. K., & Jaeger, T. F. (2016). Dynamically adapted context-specific 
hyper-articulation: Feedback from interlocutors affects speakers’ subsequent 
pronunciations. Journal of Memory and Language, 89, 68-86. 
doi:10.1016/j.jml.2015.12.009.  
 
Bybee, J. (1985). Morphology: A study of the relation between meaning and form. John 
Benjamins. doi:10.1075/tsl.9. 
 
Bybee, J. (2001). Phonology and language use. Cambridge University Press. 
doi:10.1017/cbo9780511612886. 
 
Bybee, J. (2002). Sequentiality as the basis of constituent structure. In T. Givon & B. F. 
Malle (Eds.), The evolution of language out of pre-language (pp. 109–132). John 
Benjamins. doi:10.1075/tsl.53.07byb. 
  
Bybee, J. (2008). Formal universals as emergent phenomena: The origins of structure 
preservation. In J. Good (Ed.), Linguistic universals and language change (pp. 108-
121). Oxford University Press. doi:10.1093/acprof:oso/9780199298495.003.0005. 
 
Bybee, J. (2010). Language, usage and cognition. Cambridge University Press. 
doi:10.1017/cbo9780511750526. 
 
Bybee, J., & Slobin, D. I. (1982). Why small children cannot change language on their 
own: Suggestions from the English past tense. In A. Ahlqvist (Ed.), Papers from the 
5th International Conference on Historical Linguistics (pp. 29-37). John Benjamins. 
doi:10.1075/cilt.21.07byb. 
 
Caballero, G. (2010). Scope, phonology and morphology in an agglutinating language: 
Choguita Rarámuri (Tarahumara) variable suffix ordering. Morphology, 20(1), 165-
204. doi:10.1007/s11525-010-9147-4. 
 
Caballero, G., & Kapatsinski, V. (2019). How agglutinative? Searching for cues to 
meaning in Choguita Rarámuri (Tarahumara) using discriminative learning. In A. 
Sims, A. Ussishkin, J. Parker, & S. Wray (Eds.), Morphological typology and 
linguistic cognition. Cambridge University Press. 
 
Cai, D. J., Mednick, S. A., Harrison, E. M., Kanady, J. C., & Mednick, S. C. (2009). 
REM, not incubation, improves creativity by priming associative networks. 
Proceedings of the National Academy of Sciences, 106(25), 10130-10134. 
doi:10.1073/pnas.0900271106. 
 
Carvalho, P. F., & Goldstone, R. L. (2015). The benefits of interleaved and blocked study: 
Different tasks benefit from different schedules of study. Psychonomic Bulletin & 
Review, 22(1), 281-288. doi:10.3758/s13423-014-0676-4. 
 
Chen, M. (1973). On the formal expression of natural rules in phonology. Journal of 
Linguistics, 9(2), 223-249. doi:10.1017/s0022226700003765. 
 
Chomsky, N., & Halle, M. (1965). Some controversial questions in phonological 
theory. Journal of Linguistics, 1(2), 97-138. doi:10.1017/s0022226700001134. 
 
Chomsky, N., & Halle, M. (1968). The sound pattern of English. Harper & Row. 
Retrieved from ERIC database. (ED020511) 
 
Christdas, P. (1988). The phonology and morphology of Tamil (Doctoral 
dissertation, Cornell University). Available from ProQuest Dissertations and Theses 
database. (8900809) 
 
Clark, R. (1974). Performing without competence. Journal of Child Language, 1, 1-10. 
doi:10.1017/s0305000900000040. 
 
Clark, R. (1977). What’s the use of imitation? Journal of Child Language, 4, 341-358. 
doi:10.1017/s0305000900001732. 
 
Clements, G. N., & Hume, E. (1995). The internal organization of speech sounds. In J. 
Goldsmith (Ed.), Handbook of phonological theory (pp. 245–306). Blackwell. 
 
Cristià, A., & Seidl, A. (2008). Is infants' learning of sound patterns constrained by 
phonological features? Language Learning and Development, 4(3), 203-227. 
doi:10.1080/15475440802143109. 
 
Dąbrowska, E. (2012). Different speakers, different grammars: Individual differences in 
native language attainment. Linguistic Approaches to Bilingualism, 2(3), 219-253. 
doi:10.1075/lab.2.3.01dab.  
 
Dąbrowska, E., & Szczerbiński, M. (2006). Polish children's productivity with case 
marking: the role of regularity, type frequency, and phonological diversity. Journal of 
Child Language, 33(3), 559-597. doi:10.1017/s0305000906007471. 
 
Davidson, L. (2011). Characteristics of stop releases in American English spontaneous 
speech. Speech Communication, 53(8), 1042-1058. 
doi:10.1016/j.specom.2011.05.010. 
 
Davis, M. H., Di Betta, A. M., Macdonald, M. J., & Gaskell, M. G. (2009). Learning and 
consolidation of novel spoken words. Journal of Cognitive Neuroscience, 21(4), 803-
820. doi:10.1162/jocn.2009.21059. 
 
Davis, M. H., & Gaskell, M. G. (2009). A complementary systems account of word 
learning: neural and behavioural evidence. Philosophical Transactions of the Royal 
Society of London B: Biological Sciences, 364(1536), 3773-3800. 
doi:10.1098/rstb.2009.0111. 
 
Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. 
Psychological Review, 93(3), 283-321. doi:10.1037/0033-295x.93.3.283. 
 
Dell, G. S., Burger, L. K., & Svec, W. R. (1997). Language production and serial order: A 
functional analysis and a model. Psychological Review, 104(1), 123-147. 
doi:10.1037//0033-295x.104.1.123. 
 
Do, Y. A. (2013). Biased learning of phonological alternations (Doctoral dissertation, 
MIT). Retrieved from 
https://dspace.mit.edu/bitstream/handle/1721.1/84416/868024936-
MIT.pdf?sequence=2&isAllowed=y. 
 
Do, Y. (2018). Paradigm uniformity bias in the learning of Korean verbal 
inflections. Phonology, 35(4), 547-575. doi:10.1017/s0952675718000209. 
 
Dumay, N., & Gaskell, M. G. (2007). Sleep-associated changes in the mental 
representation of spoken words. Psychological Science, 18(1), 35-39. 
doi:10.1111/j.1467-9280.2007.01845.x. 
 
Ellis, N. C. (2006). Language acquisition as rational contingency learning. Applied 
Linguistics, 27(1), 1-24. doi:10.1093/applin/ami038. 
 
Ellis, N. C. (2017). Chunking in language usage, learning and change: I don’t know. In 
M. Hundt, S. Mollin, & S. E. Pfenninger (Eds.), The changing English language: 
Psycholinguistic perspectives (pp. 113-147). Cambridge University Press. 
doi:10.1017/9781316091746.006. 
 
Elsner, B., & Hommel, B. (2001). Effect anticipation and action control. Journal of 
Experimental Psychology: Human Perception and Performance, 27(1), 229-240. 
doi:10.1037//0096-1523.27.1.229. 
 
Ervin, S. M. (1961). Changes with age in the verbal determinants of word-association. 
American Journal of Psychology, 74, 361-372. doi:10.2307/1419742. 
 
Ettlinger, M., Morgan-Short, K., Faretta-Stutenberg, M., & Wong, P. C. (2016). The 
relationship between artificial and second language learning. Cognitive 
Science, 40(4), 822-847. doi:10.1111/cogs.12257. 
 
Farrar, M. J. (1992). Negative evidence and grammatical morpheme acquisition. 
Developmental Psychology, 28(1), 90-98. doi:10.1037//0012-1649.28.1.90. 
 
Feldman, J. (2003). The simplicity principle in human concept learning. Current 
Directions in Psychological Science, 12(6), 227-232. doi:10.1046/j.0963-
7214.2003.01267.x. 
 
Fellbaum, C. (1996). Co-occurrence and antonymy. International Journal of 
Lexicography, 8(4), 281-303. doi:10.1093/ijl/8.4.281. 
 
Finkel, R., & Stump, G. (2007). Principal parts and morphological 
typology. Morphology, 17(1), 39-75. doi:10.1007/s11525-007-9115-9. 
 
Finley, S. (2008). Formal and cognitive restrictions on vowel harmony (Doctoral 
dissertation, Johns Hopkins University). Available from ProQuest Dissertations and 
Theses database. (3339713) 
 
Finley, S. (2015). Learning exceptions in phonological alternations. In D. C. Noelle, R. 
Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. P. Maglio 
(Eds.), Proceedings of the 37th Annual Conference of the Cognitive Science Society 
(pp. 698-703). Cognitive Science Society. Retrieved from 
https://pdfs.semanticscholar.org/9e95/b5c10202ff40e74c3406c2c6f398016b0377.pdf. 
 
Finn, A. S., & Hudson Kam, C. L. (2008). The curse of knowledge: First language 
knowledge impairs adult learners’ use of novel statistics for word 
segmentation. Cognition, 108(2), 477-499. doi:10.1016/j.cognition.2008.04.002.  
 
Frigo, L., & McDonald, J. L. (1998). Properties of phonological markers that affect the 
acquisition of gender-like subclasses. Journal of Memory and Language, 39(2), 218-
245. doi:10.1006/jmla.1998.2569. 
 
Garcia, C., van Horne, K. D., & Hartshorne, J. (2017). Replication of Finn & Hudson 
Kam (2008) The curse of knowledge: First language knowledge impairs adult 
learners’ use of novel statistics for word segmentation, Exp. 1. Retrieved from 
PsyArXiv. doi:10.17605/OSF.IO/2XCWK. 
 
Gibbon, F. E. (1999). Undifferentiated lingual gestures in children with 
articulation/phonological disorders. Journal of Speech, Language, and Hearing 
Research, 42(2), 382-397. doi:10.1044/jslhr.4202.382. 
 
Goldberg, A. E. (1995). Constructions: A Construction Grammar approach to argument 
structure. University of Chicago Press. 
 
Goldberg, A. E. (2002). Surface generalizations: An alternative to alternations. Cognitive 
Linguistics, 13(4), 327-356. doi:10.1515/cogl.2002.022. 
 
Goldberg, A. E. (2003). Constructions: a new theoretical approach to language. Trends in 
Cognitive Sciences, 7(5), 219-224. doi:10.1016/s1364-6613(03)00080-9. 
 
Goldberg, A. E. (2011). Corpus evidence of the viability of statistical preemption. 
Cognitive Linguistics, 22(1), 131-153. doi:10.1515/cogl.2011.006. 
 
Goldstein, M. H., King, A. P., & West, M. J. (2003). Social interaction shapes babbling: 
Testing parallels between birdsong and speech. Proceedings of the National Academy 
of Sciences, 100(13), 8030-8035. doi:10.1073/pnas.1332441100.  
 
Goldstone, R. L. (2000). Unitization during category learning. Journal of Experimental 
Psychology: Human Perception and Performance, 26(1), 86-112. doi:10.1037//0096-
1523.26.1.86. 
 
Goldstone, R. L. (2003). Learning to perceive while perceiving to learn. In R. Kimchi, M. 
Behrmann, & C. R. Olson (Eds.), Perceptual organization in vision (pp. 245-290). 
Psychology Press. 
 
Gontijo, P. F., Gontijo, I., & Shillcock, R. (2003). Grapheme–phoneme probabilities in 
British English. Behavior Research Methods, Instruments, & Computers, 35(1), 136-
157. doi:10.3758/bf03195506. 
 
 
 
Goodsitt, J. V., Morgan, J. L., & Kuhl, P. K. (1993). Perceptual strategies in prelingual 
speech segmentation. Journal of Child Language, 20(2), 229-252. 
doi:10.1017/s0305000900008266. 
 
Gouskova, M., & Becker, M. (2013). Nonce words show that Russian yer alternations are 
governed by the grammar. Natural Language & Linguistic Theory, 31(3), 735-765. 
doi:10.1007/s11049-013-9197-5. 
 
Gouskova, M., Newlin-Łukowicz, L., & Kasyanenko, S. (2015). Selectional restrictions 
as phonotactics over sublexicons. Lingua, 167, 41-81. 
doi:10.1016/j.lingua.2015.08.014. 
 
Guion, S. G. (1998). The role of perception in the sound change of velar palatalization. 
Phonetica, 55(1-2), 18-52. doi:10.1159/000028423. 
 
Hale, M., & Reiss, C. (2000). “Substance abuse” and “dysfunctionalism”: Current trends in 
phonology. Linguistic Inquiry, 31(1), 157-169. 
doi:10.1162/002438900554334. 
 
Harmon, Z., & Kapatsinski, V. (2017). Putting old tools to novel uses: The role of form 
accessibility in semantic extension. Cognitive Psychology, 98, 22-44. 
doi:10.1016/j.cogpsych.2017.08.002. 
 
Harmon, Z., & Kapatsinski, V. (2019). The target grammar is variable: Speakers’ beliefs 
about the optimality of probability matching. Manuscript in preparation. 
 
Haspelmath, M. (1995). The growth of affixes in morphological reanalysis. In G. Booij 
(Ed.), Yearbook of Morphology 1994 (pp. 1-29). Springer. doi:10.1007/978-94-017-
3714-2_1. 
 
Hayes, B. (2004). Phonological acquisition in Optimality Theory: The early stages. In R. 
Kager, J. Pater & P. Zonneveld (Eds.), Constraints in phonological acquisition (pp. 
158-203). Cambridge University Press. doi:10.1017/cbo9780511486418.006. 
 
Hayes, B., Siptár, P., Zuraw, K., & Londe, Z. (2009). Natural and unnatural constraints in 
Hungarian vowel harmony. Language, 85(4), 822-863. doi:10.1353/lan.0.0169. 
 
Hayes, B., & White, J. (2015). Saltation and the P-map. Phonology, 32(2), 1–36. 
doi:10.1017/s0952675715000159. 
 
Hluštík, P., Solodkin, A., Noll, D. C., & Small, S. L. (2004). Cortical plasticity during 
three-week motor skill learning. Journal of Clinical Neurophysiology, 21(3), 180-191. 
doi:10.1097/00004691-200405000-00006. 
 
Hock, H. H. (1991). Principles of historical linguistics. Mouton de Gruyter. 
doi:10.1515/9783110219135. 
  
Hockett, C. F. (1954). Two models of grammatical description. Word, 10, 210-234. 
doi:10.1080/00437956.1954.11659524. 
 
Hockett, C. F. (1967). The Yawelmani basic verb. Language, 43, 208-222. 
doi:10.2307/411395. 
 
Honeybone, P. (2001). Lenition inhibition in Liverpool English. English Language & 
Linguistics, 5(2), 213-249. doi:10.1017/s1360674301000223. 
 
Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A 
conditional inference framework. Journal of Computational and Graphical 
Statistics, 15(3), 651-674. doi:10.1198/106186006x133933. 
 
Householder, F. W. (1966). Phonological theory: A brief comment. Journal of Linguistics, 
2(1), 99-100. doi:10.1017/s0022226700001353. 
 
Hudson Kam, C. L., & Newport, E. L. (2005). Regularizing unpredictable variation: The 
roles of adult and child learners in language formation and change. Language 
Learning and Development, 1(2), 151-195. doi:10.1207/s15473341lld0102_3. 
 
Hudson Kam, C. L., & Newport, E. L. (2009). Getting it right by getting it wrong: When 
learners change languages. Cognitive Psychology, 59(1), 30-66. 
doi:10.1016/j.cogpsych.2009.01.001.  
 
Johnson, K. (1997). Speech perception without speaker normalization: An exemplar 
model. In K. Johnson & J. W. Mullennix (Eds.), Talker variability in speech 
processing (pp. 145-165). Morgan Kaufmann. doi:10.1002/9780470757024.ch15. 
 
Johnson, K., & Babel, M. (2010). On the perceptual basis of distinctive features: 
Evidence from the perception of fricatives by Dutch and English speakers. Journal of 
Phonetics, 38(1), 127-136. doi:10.1016/j.wocn.2009.11.001. 
 
Jones, S. (2002). Antonymy: A corpus-based approach. Routledge. 
doi:10.4324/9780203166253. 
 
Jones, S., Paradis, C., Murphy, M. L., & Willners, C. (2007). Googling for ‘opposites’: A 
web-based study of antonym canonicity. Corpora, 2(2), 129-154. 
doi:10.3366/cor.2007.2.2.129. 
 
Joos, M. (1942). A phonological dilemma in Canadian English. Language, 18, 141-144. 
doi:10.2307/408979. 
 
Justeson, J. S., & Katz, S. M. (1991). Co-occurrences of antonymous adjectives and their 
contexts. Computational Linguistics, 17, 1-19. Retrieved from 
https://www.aclweb.org/anthology/J91-1001. 
 
Kager, R. (1999). Optimality Theory. Cambridge University Press. 
doi:10.1017/cbo9780511812408. 
 
Kapatsinski, V. (2007). Implementing and testing theories of linguistic constituency I: 
English syllable structure. Research on Spoken Language Processing Progress 
Report, 28, 241-76. Retrieved from 
https://www.researchgate.net/profile/Vsevolod_Kapatsinski/publication/237119532_I
mplementing_and_Testing_Theories_of_Linguistic_Constituency_I_English_Syllable
_Structure_1/links/00b7d526751ad24005000000.pdf. 
 
Kapatsinski, V. (2009). Testing theories of linguistic constituency with configural 
learning: The case of the English syllable. Language, 85(2), 248-277. 
doi:10.1353/lan.0.0118. 
 
Kapatsinski, V. (2010). Velar palatalization in Russian and artificial grammar: Constraints 
on models of morphophonology. Laboratory Phonology, 1(2), 361-393. 
doi:10.1515/labphon.2010.019.  
 
Kapatsinski, V. (2011). Modularity in the channel: The link between separability of 
features and learnability of dependencies between them. Proceedings of the XVIIth 
International Congress of Phonetic Sciences, 1022-1025. Retrieved from 
https://www.researchgate.net/profile/Vsevolod_Kapatsinski/publication/258205598_
Modularity_in_the_channel_The_link_between_separability_of_features_and_learna
bility_of_dependencies_between_them/links/0c960527358263d2f6000000/Modularit
y-in-the-channel-The-link-between-separability-of-features-and-learnability-of-
dependencies-between-them.pdf. 
 
Kapatsinski, V. (2012). What statistics do learners track? Rules, constraints and schemas 
in (artificial) grammar learning. In S. Th. Gries & D. Divjak (Eds.), Frequency effects 
in language learning and processing (pp. 53-73). Mouton de Gruyter. 
doi:10.1515/9783110274059.53. 
 
Kapatsinski, V. (2013). Conspiring to mean: Experimental and computational evidence 
for a usage-based harmonic approach to morphophonology. Language, 89(1), 110-
148. doi:10.1353/lan.2013.0003. 
 
Kapatsinski, V. (2017a). Copying, the source of creativity. In A. Makarova, S. M. Dickey 
& D. Divjak (Eds.), Each venture a new beginning: Studies in honor of Laura A. 
Janda (pp. 57-70). Slavica. Retrieved from 
https://blogs.uoregon.edu/ublab/files/2017/10/JandaCopying-1j97p9i.pdf. 
 
 
 
 
 
 
Kapatsinski, V. (2017b). Learning a subtractive morphological system: Statistics and 
representations. Proceedings of the 41st Annual Boston University Conference on 
Language Development, 357-372. Retrieved from 
https://www.researchgate.net/profile/Vsevolod_Kapatsinski/publication/332353115_
Learning_a_Subtractive_Morphological_System_Statistics_and_Representations/link
s/5caf7361299bf120975f695f/Learning-a-Subtractive-Morphological-System-
Statistics-and-Representations.pdf. 
 
Kapatsinski, V. (2018a). Changing minds changing tools: From learning theory to 
language acquisition to language change. MIT Press. 
doi:10.7551/mitpress/11400.001.0001. 
 
Kapatsinski, V. (2018b). Learning morphological constructions. In G. Booij (Ed.), The 
construction of words: Advances in construction morphology, Vol. 4 (pp. 547-581). 
Springer. doi:10.1007/978-3-319-74394-3_19. 
 
Kapatsinski, V., & Harmon, Z. (2017). A Hebbian account of entrenchment and (over)-
extension in language learning. Proceedings of the Annual Meeting of the Cognitive 
Science Society, 39, 2366-2371. Retrieved from 
https://pdfs.semanticscholar.org/a13f/16376b3bedb073d6ecacc9abfcf47c6fa1b2.pdf. 
 
Kempen, G., & Harbusch, K. (2005). The relationship between grammaticality ratings 
and corpus frequencies: A case study into word order variability in the midfield of 
German clauses. In T. Pechmann & C. Habel (Eds.), Linguistic evidence: Empirical, 
theoretical, and computational perspectives (pp. 329-349). Mouton de Gruyter. 
doi:10.1515/9783110197549.329. 
 
Kenstowicz, M. (1996). Base identity and uniform exponence: Alternatives to cyclicity. 
In J. Durand & B. Laks (Eds.), Current trends in phonology: Models and methods, 
Vol. 1 (pp. 363-393). University of Salford. Retrieved from 
https://rucore.libraries.rutgers.edu/rutgers-lib/39725/PDF/1/. 
 
Kenstowicz, M. (1998). Uniform exponence: Exemplification and extension. Unpublished 
manuscript, Massachusetts Institute of Technology, Cambridge, MA. Retrieved from 
https://rucore.libraries.rutgers.edu/rutgers-lib/39727/PDF/1/. 
 
Kerkhoff, A. O. (2007). Acquisition of morpho-phonology: The Dutch voicing alternation 
(Doctoral dissertation, University of Nijmegen, Nijmegen, Netherlands). Retrieved 
from 
https://dspace.library.uu.nl/bitstream/handle/1874/22598/full.pdf%3Bjsessionid%3D
D62E8DEC366B4193947FD8ECF10CC8DC?sequence%3D1. 
 
Klayman, J. (1995). Varieties of confirmation bias. Psychology of Learning and 
Motivation, 32, 385-418. doi:10.1016/s0079-7421(08)60315-1. 
 
 
Kochetov, A. (2011). Palatalization. In C. Ewen, B. Hume, M. van Oostendorp, & K. 
Rice (Eds.), Blackwell companion to phonology (pp. 1666-1690). Wiley-Blackwell. 
doi:10.1002/9781444335262.wbctp0071. 
 
Köhler, W. (1929). Gestalt Psychology. Liveright. 
 
Konkle, T., Brady, T. F., Alvarez, G. A., & Oliva, A. (2010). Scene memory is more 
detailed than you think: The role of categories in visual long-term memory. 
Psychological Science, 21(11), 1551-1556. doi:10.1177/0956797610385359. 
 
Krajewski, G., Theakston, A. L., Lieven, E. V., & Tomasello, M. (2011). How Polish 
children switch from one case to another when using novel nouns: Challenges for 
models of inflectional morphology. Language and Cognitive Processes, 26(4-6), 830-
861. doi:10.1080/01690965.2010.506062. 
 
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category 
learning. Psychological Review, 99(1), 22-44. doi:10.1037//0033-295x.99.1.22. 
 
Kuczaj, S. A. (1977). The acquisition of regular and irregular past tense forms. Journal of 
Verbal Learning and Verbal Behavior, 16(5), 589-600. doi:10.1016/s0022-
5371(77)80021-2. 
 
Kuhl, P. K. (2000). A new view of language acquisition. Proceedings of the National 
Academy of Sciences, 97(22), 11850-11857. doi:10.1073/pnas.97.22.11850. 
 
Kumaran, D., Hassabis, D., & McClelland, J. L. (2016). What learning systems do 
intelligent agents need? Complementary learning systems theory updated. Trends in 
Cognitive Sciences, 20(7), 512-534. doi:10.1016/j.tics.2016.05.004. 
 
Küntay, A., & Slobin, D. (1996). Listening to a Turkish mother: Some puzzles for 
acquisition. In D. Slobin & J. Gerhardt (Eds.), Social interaction, social context, and 
language: Essays in honor of Susan Ervin-Tripp (pp. 265-286). Erlbaum. 
 
Kurisu, K. (2001). The phonology of morpheme realization (Doctoral dissertation, UC 
Santa Cruz). Available from ProQuest Dissertations and Theses database. (3029802) 
 
Labov, W. (1969). Contraction, deletion, and inherent variability of the English 
copula. Language, 45(4), 715-762. doi:10.2307/412333. 
 
Labov, W., & Austerlitz, R. (1975). Empirical foundations of linguistic theory. In R. 
Austerlitz (Ed.), The scope of American linguistics (pp. 77-133). Peter de Ridder. 
doi:10.1515/9783110857610-006. 
 
Labov, W. (1996). When intuitions fail. Chicago Linguistic Society, 32, 76-106. 
 
 
Lehmann, C. (1992). Word order change by grammaticalization. In M. Gerritsen & D. 
Stein (Eds.), Internal and external factors in syntactic change (pp. 395-416). Mouton 
de Gruyter. doi:10.1515/9783110886047.395. 
 
Lewis, P. A., & Durrant, S. J. (2011). Overlapping memory replay during sleep builds 
cognitive schemata. Trends in Cognitive Sciences, 15(8), 343-351. 
doi:10.1016/j.tics.2011.06.004. 
 
Lim, S. J., Fiez, J. A., & Holt, L. L. (2014). How may the basal ganglia contribute to 
auditory categorization and speech perception? Frontiers in Neuroscience, 8, 230. 
doi:10.3389/fnins.2014.00230. 
 
Lobben, M. (1991). Pluralization of Hausa nouns, viewed from psycholinguistic 
experiments and child language data (Master’s thesis, University of Oslo, Oslo, 
Norway). Retrieved from 
https://www.academia.edu/170251/Pluralization_of_Hausa_nouns_-
_viewed_from_psycholinguistic_experiments_and_child_language_data. 
 
Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood 
activation model. Ear and Hearing, 19(1), 1-36. doi:10.1097/00003446-199802000-
00001. 
 
Luce, R. D. (1959). Individual choice behavior. Wiley. doi:10.1037/14396-000. 
 
MacWhinney, B., Pleh, C., & Bates, E. (1985). The development of sentence 
interpretation in Hungarian. Cognitive Psychology, 17(2), 178-209. doi:10.1016/0010-
0285(85)90007-6. 
 
Maddox, W. T., Filoteo, J. V., Lauritzen, J. S., Connally, E., & Hejl, K. D. (2005). 
Discontinuous categories affect information-integration but not rule-based category 
learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 
31(4), 654-669. doi:10.1037/0278-7393.31.4.654. 
 
Maddox, W. T., Filoteo, J. V., & Lauritzen, J. S. (2007). Within-category discontinuity 
interacts with verbal rule complexity in perceptual category learning. Journal of 
Experimental Psychology: Learning, Memory & Cognition, 33, 197–218. 
doi:10.1037/0278-7393.33.1.197. 
 
Magomedova, V. (2017). Pseudo-allomorphs in Modern Russian. University of 
Pennsylvania Working Papers in Linguistics, 23(1), 16. Retrieved from 
https://repository.upenn.edu/cgi/viewcontent.cgi?article=1956&context=pwpl. 
 
Malouf, R. (2017). Abstractive morphological learning with a recurrent neural network. 
Morphology, 27(4), 431-458. doi:10.1007/s11525-017-9307-x. 
 
 
Maniwa, K., Jongman, A., & Wade, T. (2009). Acoustic characteristics of clearly spoken 
English fricatives. The Journal of the Acoustical Society of America, 125(6), 3962-
3973. doi:10.1121/1.2990715. 
 
Martin, A. T. (2007). The evolving lexicon (Doctoral dissertation, University of
California Los Angeles). Available from ProQuest Dissertations and Theses database.
(3302537)
 
Matthews, P. H. (1965). The inflectional component of a word-and-paradigm grammar. 
Journal of Linguistics, 1(2), 139-171. doi:10.1017/s0022226700001146. 
 
Maye, J., Werker, J. F., & Gerken, L. (2002). Infant sensitivity to distributional 
information can affect phonetic discrimination. Cognition, 82(3), B101-B111. 
doi:10.1016/s0010-0277(01)00157-3. 
 
McCarthy, J. J. (1998). Morpheme structure constraints and paradigm occultation. 
Unpublished manuscript, University of Massachusetts, Amherst. Retrieved from 
https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1045&context=linguist_faculty_pubs.
 
McCarthy, J. J., & Prince, A. (1995). Faithfulness and reduplicative identity. Linguistics
Department Faculty Publication Series, 10. Retrieved from
https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1009&context=linguist_faculty_pubs.
 
McClelland, J. L. (2001). Failures to learn and their remediation: A Hebbian account. In 
J. L. McClelland & R. Siegler (Eds.), Mechanisms of cognitive development: 
Behavioral and neural perspectives (pp. 109-134). Psychology Press. 
doi:10.4324/9781410600646. 
 
McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are 
complementary learning systems in the hippocampus and neocortex: Insights from the 
successes and failures of connectionist models of learning and 
memory. Psychological Review, 102(3), 419-457. doi:10.1037//0033-295x.102.3.419. 
 
McMurray, B., Horst, J. S., & Samuelson, L. K. (2012). Word learning emerges from the 
interaction of online referent selection and slow associative learning. Psychological 
Review, 119(4), 831-877. doi:10.1037/a0029872. 
 
McNeill, D. (1963). The origin of associations within the same grammatical class. 
Journal of Verbal Learning & Verbal Behavior, 3, 250-262. doi:10.1016/s0022-
5371(63)80091-2. 
 
McNeill, D. (1966). A study of word association. Journal of Verbal Learning & Verbal 
Behavior, 5, 548-557. doi:10.1016/s0022-5371(66)80090-7. 
 
Mielke, J. (2004). The emergence of distinctive features (Doctoral dissertation, The Ohio 
State University). Retrieved from 
https://etd.ohiolink.edu/!etd.send_file?accession=osu1092833440&disposition=inline 
 
Mirman, D., McClelland, J. L., & Holt, L. L. (2006). An interactive Hebbian account of 
lexically guided tuning of speech perception. Psychonomic Bulletin & Review, 13(6), 
958-965. doi:10.3758/bf03213909. 
 
Mitroff, S. R., Simons, D. J., & Levin, D. T. (2004). Nothing compares 2 views: Change 
blindness can occur despite preserved access to the changed information. Perception 
& Psychophysics, 66(8), 1268-1281. doi:10.3758/bf03194997. 
 
Mitrović, I. (2012). A phonetically natural vs. native language pattern: An experimental 
study of velar palatalization in Serbian. Journal of Slavic Linguistics, 20(2), 229-268. 
doi:10.1353/jsl.2012.0011. 
 
Moreton, E. (2008). Analytic bias and phonological typology. Phonology, 25(1), 83-127. 
doi:10.1017/s0952675708001413. 
 
Moreton, E. (2012). Inter- and intra-dimensional dependencies in implicit phonotactic
learning. Journal of Memory and Language, 67(1), 165-183. 
doi:10.1016/j.jml.2011.12.003. 
 
Moreton, E., & Pater, J. (2012a). Structure and substance in artificial-phonology
learning, Part I: Structure. Language and Linguistics Compass, 6(11), 686-701.
doi:10.1002/lnc3.363.
 
Moreton, E., & Pater, J. (2012b). Structure and substance in artificial-phonology
learning, Part II: Substance. Language and Linguistics Compass, 6(11), 702-718.
doi:10.1002/lnc3.366.
 
Moreton, E., Pater, J., & Pertsova, K. (2017). Phonological concept learning. Cognitive 
Science, 41(1), 4-69. doi:10.1111/cogs.12319. 
 
Murphy, M. L. (2006). Antonyms as lexical constructions: or, why paradigmatic 
construction is not an oxymoron. Constructions, SV1(8), 1-37. Retrieved from 
https://journals.linguisticsociety.org/elanguage/constructions/article/download/23/23-81-1-PB.pdf.
 
Nelson, K. (1989). Narratives from the crib. Harvard University Press. 
 
Nesset, T. (2008). Abstract phonology in a concrete model: Cognitive linguistics and the 
morphology-phonology interface. Mouton de Gruyter. doi:10.1515/9783110208368. 
 
 
 
Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many 
guises. Review of General Psychology, 2(2), 175-220. doi:10.1037//1089-
2680.2.2.175. 
 
Norris, D., McQueen, J. M., & Cutler, A. (2003). Perceptual learning in speech. Cognitive 
Psychology, 47(2), 204-238. doi:10.1016/s0010-0285(03)00006-9. 
 
Norris, D., & McQueen, J. M. (2008). Shortlist B: A Bayesian model of continuous 
speech recognition. Psychological Review, 115(2), 357. doi:10.1037/0033-
295x.115.2.357. 
 
O'Reilly, R. C., & Rudy, J. W. (2001). Conjunctive representations in learning and 
memory: principles of cortical and hippocampal function. Psychological 
Review, 108(2), 311-345. doi:10.1037/0033-295x.108.2.311. 
 
Ohala, J. J. (1978). Southern Bantu vs. the world: The case of palatalization of labials. 
Berkeley Linguistics Society, 4, 370-386. doi:10.3765/bls.v4i0.2218. 
 
Ohala, J. J. (1989). Sound change is drawn from a pool of synchronic variation. In L. E. 
Breivik & E. H. Jahr (Eds.), Language change: Contributions to the study of its 
causes (pp. 173-198). Mouton de Gruyter. doi:10.1515/9783110853063.173. 
 
Olejarczuk, P., Kapatsinski, V., & Baayen, R. H. (2018). Distributional learning is error-
driven: The role of surprise in the acquisition of phonetic categories. Linguistics 
Vanguard, 4(s2). doi:10.1515/lingvan-2017-0020. 
 
Onnis, L., Waterfall, H. R., & Edelman, S. (2008). Learn locally, act globally: Learning 
language from variation set cues. Cognition, 109(3), 423-430. 
doi:10.1016/j.cognition.2008.10.004. 
 
Palmeri, T. J., Goldinger, S. D., & Pisoni, D. B. (1993). Episodic encoding of voice 
attributes and recognition memory for spoken words. Journal of Experimental 
Psychology: Learning, Memory, and Cognition, 19(2), 309-328. doi:10.1037/0278-7393.19.2.309.
 
Pater, J., & Tessier, A. M. (2003). Phonotactic knowledge and the acquisition of 
alternations. Proceedings of the 15th International Congress on Phonetic Sciences,
1177-1180. Retrieved from
https://pdfs.semanticscholar.org/3d47/386f875afd41b2b019f3d40413dfecb18f47.pdf.
 
Peperkamp, S., Le Calvez, R., Nadal, J. P., & Dupoux, E. (2006). The acquisition of
allophonic rules: Statistical learning with linguistic constraints. Cognition, 101(3), 
B31-B41. doi:10.1016/j.cognition.2005.10.006. 
 
 
 
Perfors, A. (2016). Adult regularization of inconsistent input depends on pragmatic 
factors. Language Learning and Development, 12(2), 138-155. 
doi:10.1080/15475441.2015.1052449. 
 
Perkell, J. S. (2012). Movement goals and feedback and feedforward control mechanisms 
in speech production. Journal of Neurolinguistics, 25(5), 382-407. 
doi:10.1016/j.jneuroling.2010.02.011. 
 
Pierrehumbert, J. B. (2006). The statistical basis of an unnatural alternation. In L. 
Goldstein, D. H. Whalen & C. Best (Eds.), Laboratory phonology 8 (pp. 81-107). 
Mouton de Gruyter. doi:10.1515/9783110197211.1.81. 
 
Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel 
distributed processing model of language acquisition. Cognition, 28(1), 73-193. 
doi:10.1016/0010-0277(88)90032-7. 
 
Plunkett, K., & Juola, P. (1999). A connectionist model of English past tense and plural 
morphology. Cognitive Science, 23, 463-490. doi:10.1207/s15516709cog2304_4. 
 
Prince, A., & Smolensky, P. (1993/2004). Optimality Theory: Constraint interaction in 
generative grammar. Wiley. doi:10.1002/9780470759400. 
 
Proudfoot, A., & Cardo, F. (2005). Modern Italian grammar: A practical guide. 
Routledge. doi:10.4324/9780203085035. 
 
Psychology Software Tools, Inc. E-Prime 2.0 [Computer software]. Retrieved from 
https://www.pstnet.com. 
 
Purcell, D. W., & Munhall, K. G. (2006). Compensation following real-time manipulation 
of formants in isolated vowels. The Journal of the Acoustical Society of America, 
119(4), 2288-2297. doi:10.1121/1.2173514. 
 
Pycha, A., Nowak, P., Shin, E., & Shosted, R. (2003). Phonological rule-learning and its 
implications for a theory of vowel harmony. Proceedings of the 22nd West Coast 
Conference on Formal Linguistics, 22, 101-114. Retrieved from 
https://www.researchgate.net/profile/Anne_Pycha/publication/247174599_Phonological_Rule-Learning_and_Its_Implications_for_a_Theory_of_Vowel_Harmony/links/55185a4d0cf2f7d80a3df7fa.pdf.
 
R Core Team (2019). R: A language and environment for statistical computing (Version 
3.1.1) [Computer software]. Vienna, Austria: R Foundation for Statistical Computing.  
 
 
 
 
Raffelsiefen, R. (2005). Paradigm uniformity effects vs. boundary effects. In L. J. 
Downing, T. A. Hall & R. Raffelsiefen (Eds.), Paradigms in phonological theory (pp. 
211-262). Oxford University Press. 
doi:10.1093/acprof:oso/9780199267712.003.0009. 
 
Ramscar, M. (2002). The role of meaning in inflection: Why the past tense does not 
require a rule. Cognitive Psychology, 45(1), 45-94. doi:10.1016/s0010-
0285(02)00001-4. 
 
Ramscar, M. (2013). Suffixing, prefixing, and the functional order of regularities in 
meaningful strings. Psihologija, 46(4), 377-396. doi:10.2298/psi1304377r. 
 
Ramscar, M., Dye, M., & McCauley, S. M. (2013). Error and expectation in language 
learning: The curious absence of mouses in adult speech. Language, 89(4), 760-793. 
doi:10.1353/lan.2013.0068. 
 
Ramscar, M., & Gitcho, N. (2007). Developmental change and the nature of learning in 
childhood. Trends in Cognitive Sciences, 11(7), 274-279. 
doi:10.1016/j.tics.2007.05.007. 
 
Ramscar, M., & Yarlett, D. (2007). Linguistic self-correction in the absence of feedback:
A new approach to the logical problem of language acquisition. Cognitive 
Science, 31(6), 927-960. doi:10.1080/03640210701703576. 
 
Ramscar, M., Yarlett, D., Dye, M., Denny, K., & Thorpe, K. (2010). The effects of 
feature-label-order and their implications for symbolic learning. Cognitive 
Science, 34(6), 909-957. doi:10.1111/j.1551-6709.2009.01092.x. 
 
Redford, M. A. (2015). Unifying speech and language in a developmentally sensitive 
model of production. Journal of Phonetics, 53, 141-152. 
doi:10.1016/j.wocn.2015.06.006. 
 
Regier, T., & Gahl, S. (2004). Learning the unlearnable: The role of missing 
evidence. Cognition, 93(2), 147-155. doi:10.1016/j.cognition.2003.12.003. 
 
Rescorla, R. A. (1986). Two perceptual variables in within-event learning. Animal 
Learning & Behavior, 14(4), 387-392. doi:10.3758/bf03200083. 
 
Rescorla, R. A. (1988). Pavlovian conditioning: It's not what you think it is. American 
Psychologist, 43(3), 151-160. doi:10.1037//0003-066x.43.3.151. 
 
Rescorla, R. A., & Furrow, D. R. (1977). Stimulus similarity as a determinant of 
Pavlovian conditioning. Journal of Experimental Psychology: Animal Behavior 
Processes, 3(3), 203-215. doi:10.1037//0097-7403.3.3.203. 
 
 
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations 
in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. 
Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64-99). 
Appleton-Century-Crofts. Retrieved from 
https://pdfs.semanticscholar.org/afaf/65883ff75cc19926f61f181a687927789ad1.pdf. 
 
Robins, R. H. (1959). In defence of WP. Transactions of the Philological Society, 58(1), 
116-144. doi:10.1111/j.1467-968x.1959.tb00301.x. 
 
Robinson, P. (1996). Learning simple and complex second language rules under implicit, 
incidental, rule-search, and instructed conditions. Studies in Second Language 
Acquisition, 18(1), 27-67. doi:10.1017/s0272263100014674. 
 
Roelofs, A. (1992). A spreading-activation theory of lemma retrieval in speaking. 
Cognition, 42(1), 107-142. doi:10.1016/0010-0277(92)90041-f. 
 
Rubino, R. B., & Pine, J. M. (1998). Subject–verb agreement in Brazilian Portuguese: 
what low error rates hide. Journal of Child Language, 25(1), 35-59.
doi:10.1017/s0305000997003310. 
 
Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English 
verbs. In D. E. Rumelhart, J. L. McClelland & The PDP Research Group (Eds.), 
Parallel distributed processing, Vol. 2. MIT Press. Retrieved from 
https://apps.dtic.mil/dtic/tr/fulltext/u2/a164233.pdf. 
 
Sangster, C. M. (2001). Lenition of alveolar stops in Liverpool English. Journal of 
Sociolinguistics, 5(3), 401-412. doi:10.1111/1467-9481.00156. 
 
Saville-Troike, M. (1988). Private speech: Evidence for second language learning 
strategies during the ‘silent’ period. Journal of Child Language, 15(3), 567-590.
doi:10.1017/s0305000900012575. 
 
Scheer, T. (2011). A guide to morphosyntax-phonology interface theories: How extra-
phonological information is treated in phonology since Trubetzkoy’s Grenzsignale. 
Walter de Gruyter. doi:10.1515/9783110238631. 
 
Schertz, J. (2013). Exaggeration of featural contrasts in clarifications of misheard speech 
in English. Journal of Phonetics, 41(3), 249-263. doi:10.1016/j.wocn.2013.03.007. 
 
Schultz, W. (2006). Behavioral theories and the neurophysiology of reward. Annual 
Review of Psychology, 57, 87-115. doi:10.1146/annurev.psych.56.091103.070229.
 
Schwab, J. F., & Lew-Williams, C. (2016). Repetition across successive sentences 
facilitates young children's word learning. Developmental Psychology, 52(6), 879-886.
doi:10.1037/dev0000125. 
 
Schwartz, R. G., & Leonard, L. B. (1982). Do children pick and choose? An examination 
of phonological selection and avoidance in early lexical acquisition. Journal of Child 
Language, 9(2), 319-336. doi:10.1017/s0305000900004748. 
 
Seidl, A., & Buckley, E. (2005). On the learning of arbitrary phonological rules. 
Language Learning and Development, 1(3-4), 289-316. 
doi:10.1207/s15473341lld0103&4_4. 
 
Seidl, A., Cristià, A., Bernard, A., & Onishi, K. H. (2009). Allophonic and phonemic 
contrasts in infants' learning of sound patterns. Language Learning and Development, 
5(3), 191-202. doi:10.1080/15475440902754326. 
 
Seyfarth, S., Ackerman, F., & Malouf, R. (2014). Implicative organization facilitates 
morphological learning. Annual Meeting of the Berkeley Linguistics Society, 40, 480-
494. doi:10.3765/bls.v40i0.3154. 
 
Shepard, R. N. (1967). Recognition memory for words, sentences, and pictures. Journal 
of Verbal Learning and Verbal Behavior, 6(1), 156-163. doi:10.1016/s0022-
5371(67)80067-7. 
 
Shepard, R. N., Hovland, C. I., & Jenkins, H. M. (1961). Learning and memorization of 
classifications. Psychological Monographs: General and Applied, 75(13), 1-42. 
doi:10.1037/h0093825. 
 
Sims, A. D., & Parker, J. (2016). How inflection class systems work: On the informativity 
of implicative structure. Word Structure, 9(2), 215-239. doi:10.3366/word.2016.0094. 
 
Skoruppa, K., Lambrechts, A., & Peperkamp, S. (2011). The role of phonetic distance in 
the acquisition of phonological alternations. Proceedings of the 39th Annual Meeting 
of the North East Linguistic Society, 464-475. Retrieved from 
http://repository.essex.ac.uk/4251/1/SkoruppaetalNELS.pdf. 
 
Skoruppa, K., & Peperkamp, S. (2011). Adaptation to novel accents: Feature-based learning of
context-sensitive phonological regularities. Cognitive Science, 35(2), 348-366. 
doi:10.1111/j.1551-6709.2010.01152.x. 
 
Slabakova, R. (2008). Meaning in the second language. Mouton de Gruyter.
doi:10.1515/9783110211511. 
 
Smith, L. B., Thelen, E., Titzer, R., & McLin, D. (1999). Knowing in the context of 
acting: the task dynamics of the A-not-B error. Psychological Review, 106(2), 235-
260. doi:10.1037//0033-295x.106.2.235. 
 
Smolek, A., & Kapatsinski, V. (2018). What happens to large changes? Saltation produces
well-liked outputs that are hard to generate. Laboratory Phonology: Journal of the 
Association for Laboratory Phonology, 9(1), 10. doi:10.5334/labphon.93. 
  
Smolek, A., & Kapatsinski, V. (2019). Syntagmatic paradigms: Learning correspondence
from contiguity. Manuscript submitted for publication. 
 
Standing, L. (1973). Learning 10000 pictures. The Quarterly Journal of Experimental 
Psychology, 25(2), 207-222. doi:10.1080/14640747308400340. 
 
Standing, L., Conezio, J., & Haber, R. N. (1970). Perception and memory for pictures: 
Single-trial learning of 2500 visual stimuli. Psychonomic Science, 19(2), 73-74. 
doi:10.3758/bf03337426. 
 
Stave, M., Smolek, A., & Kapatsinski, V. (2013). Inductive bias against stem changes as 
perseveration: Experimental evidence for an articulatory approach to output-output 
faithfulness. Proceedings of the 35th Annual Meeting of the Cognitive Science Society, 
3454-3459. Retrieved from 
https://cloudfront.escholarship.org/dist/prd/content/qt293733rg/qt293733rg.pdf. 
 
Stefanowitsch, A. (2008). Negative entrenchment: A usage-based approach to negative 
evidence. Cognitive Linguistics, 19(3), 513-531. doi:10.1515/cogl.2008.020. 
 
Steriade, D. (2000). Paradigm uniformity and the phonetics-phonology boundary. In M. 
B. Broe & J. B. Pierrehumbert (Eds.), Papers in laboratory phonology V: Acquisition 
and the lexicon (pp. 313-334). Cambridge University Press. Retrieved from 
https://www.researchgate.net/profile/Donca_Steriade/publication/2495311_Paradigm_Uniformity_and_the_Phonetics-Phonology_Boundary/links/549047eb0cf214269f2664c6.pdf.
 
Steriade, D. (2001/2009). The phonology of perceptibility effects: The P-map and its 
consequences for constraint organization. In K. Hanson & S. Inkelas (Eds.), The 
nature of the word: Studies in honor of Paul Kiparsky (pp. 151-170). MIT Press. 
doi:10.7551/mitpress/9780262083799.003.0007. 
 
Stump, G., & Finkel, R. A. (2013). Morphological typology: From word to paradigm. 
Cambridge University Press. doi:10.1017/cbo9781139248860. 
 
Sutherland, R. J., & Rudy, J. W. (1989). Configural association theory: The role of the 
hippocampal formation in learning, memory, and amnesia. Psychobiology, 17(2), 
129-144. Retrieved from 
https://link.springer.com/content/pdf/10.3758/BF03337828.pdf. 
 
Taatgen, N. A., & Anderson, J. R. (2002). Why do children learn to say “broke”? A model 
of learning the past tense without feedback. Cognition, 86(2), 123-155. 
doi:10.1016/s0010-0277(02)00176-2. 
 
Tal, S., & Arnon, I. (2018). SES effects on the use of variation sets in child-directed 
speech. Journal of Child Language, 45(6), 1423-1438. 
doi:10.1017/s0305000918000223. 
  
Taylor, J. R. (2012). The mental corpus: How language is represented in the mind. 
Oxford University Press. doi:10.1093/acprof:oso/9780199290802.001.0001. 
 
Theodore, R. M., Blumstein, S. E., & Luthra, S. (2015). Attention modulates specificity 
effects in spoken word recognition: Challenges to the time-course hypothesis. 
Attention, Perception, & Psychophysics, 77(5), 1674-1684. doi:10.3758/s13414-015-
0854-0. 
 
Thorndike, E. L. (1898). Animal intelligence, an experimental study of the associative 
processes in animals. Macmillan. doi:10.1037/10780-000. 
 
Thymé, A. (1993). Connectionist approach to nominal inflection: Paradigm patterning 
and analogy in Finnish (Doctoral dissertation, University of California San Diego). 
Available from ProQuest Dissertations and Theses database. (9317518) 
 
Thymé, A., Ackerman, F., & Elman, J. L. (1994). Finnish nominal inflection: 
paradigmatic patterns and token analogy. In S. D. Lima, R. Corrigan, & G. K. Iverson 
(Eds.), The reality of linguistic rules (pp. 445-466). John Benjamins. 
doi:10.1075/slcs.26.25thy. 
 
Tomas, E., van de Vijver, R., Demuth, K., & Petocz, P. (2017). Acquisition of nominal 
morphophonological alternations in Russian. First Language, 37(5), 453-474. 
doi:10.1177/0142723717698839. 
 
Umbreit, B. (2011). Motivational networks: An empirically supported cognitive 
phenomenon. In K.-U. Panther & G. Radden (Eds.), Motivation in grammar and the 
lexicon (pp. 269-286). John Benjamins. doi:10.1075/hcp.27.17umb. 
 
Underhill, R. (1976). Turkish grammar. MIT Press. Retrieved from 
http://www.academia.edu/download/53475783/_Robert_Underhill__Turkish_Grammar__Turk_dili_graBookZZ.org.pdf.
 
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: the leaky, 
competing accumulator model. Psychological Review, 108(3), 550-592. 
doi:10.1037//0033-295x.108.3.550. 
 
Uttal, W. R., Spillmann, L., Stürzel, F., & Sekuler, A. B. (2000). Motion and shape in 
common fate. Vision Research, 40(3), 301-310. doi:10.1016/s0042-6989(99)00177-7. 
 
Vihman, M., & Croft, W. (2007). Phonological development: Toward a “radical” 
templatic phonology. Linguistics, 45(4), 683-725. doi:10.1515/ling.2007.021. 
 
Villacorta, V. M., Perkell, J. S., & Guenther, F. H. (2007). Sensorimotor adaptation to 
feedback perturbations of vowel acoustics and its relation to perception. The Journal 
of the Acoustical Society of America, 122(4), 2306-2319. doi:10.1121/1.2773966. 
 
Waelti, P., Dickinson, A., & Schultz, W. (2001). Dopamine responses comply with basic 
assumptions of formal learning theory. Nature, 412(6842), 43. 
doi:10.1038/35083500. 
 
Wang, M. D., & Bilger, R. C. (1973). Consonant confusions in noise: A study of
perceptual features. The Journal of the Acoustical Society of America, 54(5), 1248-1266.
doi:10.1121/1.1914417. 
 
Wang, T., & Saffran, J. R. (2014). Statistical learning of a tonal language: The influence 
of bilingualism and previous linguistic experience. Frontiers in Psychology, 5, 953. 
doi:10.3389/fpsyg.2014.00953.  
 
Warker, J. A., & Dell, G. S. (2006). Speech errors reflect newly learned phonotactic 
constraints. Journal of Experimental Psychology: Learning, Memory, and Cognition, 
32(2), 387-398. doi:10.1037/0278-7393.32.2.387. 
 
Warker, J. A., Dell, G. S., Whalen, C. A., & Gereg, S. (2008). Limits on learning 
phonotactic constraints from recent production experience. Journal of Experimental 
Psychology: Learning, Memory, and Cognition, 34(5), 1289-1295. 
doi:10.1037/a0013033. 
 
Warlaumont, A. S., Richards, J. A., Gilkerson, J., & Oller, D. K. (2014). A social 
feedback loop for speech development and its reduction in autism. Psychological 
Science, 25(7), 1314-1324. doi:10.1177/0956797614531023. 
 
Warner, N., & Tucker, B. V. (2011). Phonetic variability of stops and flaps in spontaneous 
and careful speech. The Journal of the Acoustical Society of America, 130, 1606. 
doi:10.1121/1.3621306. 
 
Waterfall, H. R. (2006). A little change is a good thing: Feature theory, language 
acquisition and variation sets (Doctoral dissertation, University of Chicago). 
Available from ProQuest Dissertations and Theses database. (3219602) 
 
Watson, J. C. (2002). The phonology and morphology of Arabic. Oxford University 
Press.  
 
Wedel, A., Kaplan, A., & Jackson, S. (2013). High functional load inhibits phonological 
contrast loss: A corpus study. Cognition, 128(2), 179-186. 
doi:10.1016/j.cognition.2013.03.002. 
 
Weir, R. H. (1962). Language in the crib. Mouton. 
 
Welsh, J. P., & Llinás, R. (1997). Some organizing principles for the control of 
movement based on olivocerebellar physiology. Progress in Brain Research, 114, 
449-461. doi:10.1016/s0079-6123(08)63380-4. 
 
Wertheimer, M. (1923/1938). Untersuchungen zur Lehre von der Gestalt 
II. Psychologische Forschung, 4, 301-350. doi:10.1007/bf00410640. 
 
Westermann, G., & Ruh, N. (2012). A neuroconstructivist model of past tense 
development and processing. Psychological Review, 119(3), 649-667. 
doi:10.1037/a0028258. 
 
White, J. (2013). Bias in phonological learning: Evidence from saltation (Doctoral
dissertation, University of California Los Angeles). Available from ProQuest 
Dissertations and Theses database. (3564463) 
 
White, J. (2014). Evidence for a learning bias against saltatory phonological alternations. 
Cognition, 130(1), 96-115. doi:10.1016/j.cognition.2013.09.008. 
 
White, J. (2017). Accounting for the learnability of saltation in phonological theory: A 
maximum entropy model with a P-map bias. Language, 93(1), 1-36. 
doi:10.1353/lan.2017.0001. 
 
White, J., & Sundara, M. (2014). Biased generalization of newly learned phonological 
alternations by 12-month-old infants. Cognition, 133(1), 85-90. 
doi:10.1016/j.cognition.2014.05.020. 
 
Williams, J. N. (2003). Inducing abstract linguistic representations: Human and 
connectionist learning of noun classes. In R. van Hout, A. Hulk, F. Kuiken, & R. J. 
Towell (Eds.), The lexicon-syntax interface in second language acquisition (pp. 151-
174). John Benjamins. doi:10.1075/lald.30.08wil. 
 
Wilson, C. (2006). Learning phonology with substantive bias: An experimental and 
computational study of velar palatalization. Cognitive Science, 30(5), 945-982. 
doi:10.1207/s15516709cog0000_89. 
 
Woodrow, H., & Lowell, F. (1916). Children's association frequency tables. The 
Psychological Monographs, 22(5), i-110. doi:10.1037/h0093111. 
 
Xu, J., & Croft, W. B. (1998). Corpus-based stemming using co-occurrence of word 
variants. ACM Transactions on Information Systems (TOIS), 16(1), 61-81. 
doi:10.1145/267954.267957. 
 
Xu, F., & Tenenbaum, J. B. (2007). Word learning as Bayesian inference. Psychological 
Review, 114(2), 245-272. doi:10.1037/0033-295x.114.2.245. 
 
Yu, C., & Smith, L. B. (2012). Modeling cross-situational word-referent learning: Prior 
questions. Psychological Review, 119(1), 21-39. doi:10.1037/a0026182. 
 
Yun, G. H. (2006). The interaction between palatalization and coarticulation in Korean 
and English (Doctoral dissertation, University of Arizona). Available from ProQuest
Dissertations and Theses database. (3219841) 
 
Zaki, S. R., & Salmi, I. L. (2019). Sequence as context in category learning: An 
eyetracking study. Journal of Experimental Psychology: Learning, Memory, and 
Cognition. Advance online publication. doi:10.1037/xlm0000693. 
 
Zaki, S. R., Rich, A., & Stacy, S. (2016, November). The sequence of items in category 
learning: Modeling and eye-tracking data. Paper presented at the 57th Annual 
Meeting of the Psychonomic Society, Boston, MA. 
 
Zampini, L. M. (1996). Voiced stop spirantization in the ESL of native speakers of 
Spanish. Applied Psycholinguistics, 17, 335-354. doi:10.1017/s0142716400007979.
 
Zipf, G. K. (1949). Human behavior and the principle of least effort. Addison-Wesley. 
 
Zsiga, E. C. (1995). An acoustic and electropalatographic study of lexical and postlexical 
palatalization in American English. In B. Connell & A. Arvaniti (Eds.), Phonology 
and phonetic evidence: Papers in laboratory phonology IV (pp. 282-302). Cambridge 
University Press. doi:10.1017/cbo9780511554315.020. 
 
Zuraw, K. (2000). Patterned exceptions in phonology (Doctoral dissertation, University 
of California Los Angeles). Available from ProQuest Dissertations and Theses 
database. (9979100) 
 
Zuraw, K. (2007). The role of phonetic knowledge in phonological patterning: Corpus 
and survey evidence from Tagalog. Language, 83, 277-316.
doi:10.1353/lan.2007.0105. 
 
 
 
 