Personality Trait Descriptors: 2,818 Trait Descriptive Adjectives Characterized by Familiarity, Frequency of Use, and Prior Use in Psycholexical Research

DATA PAPER

DAVID M. CONDON
JOSHUA COUGHLIN
SARA J. WESTON

*Author affiliations can be found in the back matter of this article

ABSTRACT
This dataset contains 2,818 trait descriptive adjectives in English and information about the extent to which each term is known among a large and approximately representative sample of U.S. adults. The list of personality-related terms includes all 1,710 adjectives previously studied by Goldberg (1982) and draws on prior work by Allport and Odbert (1936) and Norman (1967). The extent to which terms were known by respondents was based on the administration of vocabulary questions about each term-definition pair online to a sample of English-speaking U.S. residents with approximately average literacy levels. The open data are accompanied by an online database that allows the terms to be searched and filtered.

CORRESPONDING AUTHOR:
David M. Condon
University of Oregon, Department of Psychology, Eugene, Oregon, US
dcondon@uoregon.edu

KEYWORDS:
personality; personality structure; personality descriptors; trait descriptive adjectives; lexical hypothesis

TO CITE THIS ARTICLE:
Condon, D. M., Coughlin, J., & Weston, S. J. (2022). Personality Trait Descriptors: 2,818 Trait Descriptive Adjectives Characterized by Familiarity, Frequency of Use, and Prior Use in Psycholexical Research. Journal of Open Psychology Data, 10: 1, pp. 1–9. DOI: https://doi.org/10.5334/jopd.57

Condon et al. Journal of Open Psychology Data DOI: 10.5334/jopd.57

(1) BACKGROUND

A foundational postulate in personality research – the “Lexical Hypothesis” – is that all relevant psychological differences between people are marked by linguistic descriptors. A major benefit of this lexical approach is that it helps to constrain the scope of differences between people, as differences that cannot be succinctly described are presumed to be less salient. This logic has led to the development of several widely used assessment models in personality, each based on data collected from self-ratings and ratings of others using subsets of these descriptors [2, 9, 14, 18, 21, 22].

Though the universe of descriptors is finite, there are more trait descriptive adjectives (TDAs) than can be used to collect ratings from any single rater; the most exhaustive lists have counted nearly 18,000 terms [1]. The data described herein were collected to extend research seeking to deal with this multiplicity of terms [1, 2, 4, 7, 15]. Specifically, the aim of work in this tradition is to identify a subset of terms that are among the most familiar and unambiguous for a representative population of English speakers.

The primary challenge of identifying useful subsets stems from the fact that many descriptors are used infrequently in reference to personality. Allport and Odbert (1936), for example, suggested that only about one quarter of the long list of terms they cataloged from the unabridged version of Webster’s New International Dictionary was suitable for use as personality trait names. The remaining 13,500 terms were deemed beyond their operationalization of personality (see the original source for more detail).

The infrequent use of many terms is not only a matter of scope, however, as a substantial proportion of terms are highly obscure. Some have fallen out of everyday use; more are rarely used outside of specific contexts. One consequence of obscurity is that some of the terms that are reasonably related to personality can be set aside due to a lack of familiarity among the general population. A second – and less intuitive – consequence is that new terms occasionally enter the lexicon despite having a highly similar meaning to one or more terms that already exist. This implies that the subset of widely used personality terms evolves over time and contains a non-trivial degree of synonymity.

Prior efforts to reduce the universe of terms into a tractable subset uniformly credit the list of Allport and Odbert (1936) as a starting point. Cattell [4] used human judgments of synonymity to reduce this list, focusing mainly on the first category (about 4,500 terms) plus a “few hundred additional terms” [4, p. 437]. His judgments produced a list of roughly 170 terms (and, for many, a corresponding antonym).

Norman [15] took a more exhaustive approach. He supplemented Allport and Odbert’s list with new terms, then culled terms he deemed obscure, broad, non-psychological in nature, evaluative, or classified as quantifiers of degree (rather than directly descriptive). This produced a list of nearly 2,800 terms. Norman sought to be over-inclusive in his trimming, and this was confirmed by subsequent itemetric analyses based on social desirability ratings as well as self- and other-ratings.

Two important comments about Norman’s analyses are relevant to the current work. First, Norman reported that only 34.5% of the terms were known to all the “bright, literate, university undergraduates” who rated them [15, p. 17]. As only 8.4% of the U.S. population had completed a college degree in 1967 [23], the level of literacy among this group was relatively less common than it is currently; 34.8% of the U.S. population held a college degree in 2020 [24]. Aside from gender (roughly half female), no additional demographic information about the raters was provided. As the data were collected from undergraduates at the University of Michigan in the mid-1960s, raters in the sample are likely to have been young White individuals from the midwestern U.S. (especially Michigan) of above average socioeconomic status. In other words, though roughly 2,000 terms were known to 90% of Norman’s raters, the generalizability of this information is uncertain. Second, Norman reported having asked raters to provide definitions for the 200 terms evaluated by each, but he does not report on the accuracy or the degree of ambiguity of these definitions. Thus, Norman’s data regarding self-reported familiarity can only partially inform the question of the suitability of each term for analyses of personality structure.

Goldberg [7] subsequently winnowed Norman’s terms using five procedures. Most dropped terms were (1) obscure (roughly one-third), and/or (2) nouns (232 terms), though a small number were removed due to having (3) extremely high or low ratings of social desirability, or (4) a high dispersion of social desirability ratings (a proxy for ambiguity). An unreported number of additional terms were (5) dropped using “intuitive judgments of suitability” [7, p. 209]. These procedures left 1,657 adjectives from Norman’s list. An additional 13 terms were retained in alternate forms (4 nouns were turned to adjectives), and 40 terms were added, including 38 non-overlapping terms from the Adjective Check List [10]. Goldberg’s final list of 1,710 terms has been highly influential, both in his own subsequent work [8] and personality structure research conducted by others [3, 20].

The current project aimed to update Goldberg’s list in several ways. First, we sought to add terms that are missing from the 1,710, whether due to oversight or the evolution of descriptors over the last 40 years. Second, we sought to evaluate knowledge of the new and existing terms with a metric that can provide some indication of consensus about the meaning of each term beyond self-reports of familiarity. Third, we sought to collect a modern and more representative sample of participants in terms of age, race/ethnicity, gender, and education.
Fourth, we sought to provide an open and accessible database of terms to encourage further use among psychology researchers.

(2) METHODS

2.1 STUDY DESIGN
The study design involved several distinct steps, including (1) aggregating a trait descriptive adjective set with 2,818 terms; (2) sourcing definitions for all of the terms; (3) creating multiple choice vocabulary questions based on each term-definition pair; (4) designing and completing survey-based data collection on the pool of questions; and (5) developing tools to disseminate results from the analyses of these data and other characteristics of the 2,818 TDA set. Details of each step are given in the sub-sections below.

Step 1: Aggregation of the 2,818 Trait Descriptive Adjectives
Given the centrality of the 1,710-item TDA set derived by Goldberg and Norman, the TDA set reported on in this work is a super-set of those terms. This does not imply, however, that all these terms are necessarily well-suited for subsequent administration in personality structure research. In fact, 483 of these terms are not among the 100,000 most frequently occurring terms in the Corpus of Contemporary American English (COCA; [6]), and an additional 242 were outside of the top 50,000. In many instances, this obscurity was driven by the inclusion of prefixes indicating negation (e.g., unstudious) or extremity (e.g., overtrusting, ultrademocratic). Goldberg [7] provides considerable discussion of the over-inclusive nature of these 1,710 and the rationale for retaining terms with various prefixes.

Note that 15 of the 1,710 terms were edited slightly, as shown in Table 1. In 14 of these cases, the spelling was changed based on recommendations from more than one online dictionary. In most cases, these changes reflected the typical progression of spelling revisions over time during lexicalization [12], from hyphenation to compound words. The remaining case (“satiric”) was a change of form (“satirical”).

PRIOR FORM       CHANGED TO
cagy             cagey
clear-headed     clearheaded
closefisted      close-fisted
easy-going       easygoing
hard-headed      hardheaded
highfaluting     highfalutin
kind-hearted     kindhearted
level-headed     levelheaded
light-hearted    lighthearted
loud-mouthed     loudmouthed
nosey            nosy
pig-headed       pigheaded
satiric          satirical
stand-offish     standoffish
thick-headed     thickheaded

Table 1 15 revisions from Goldberg (1982).

To evaluate the need to extend beyond the 1,710, we reviewed all the terms dropped by Goldberg from Norman’s list of 2,797 terms. Given our aims, it seemed likely that some of the more obscure terms may have become more widely used over the last few decades. This prompted the re-introduction of 204 terms. Based on the decision to remain consistent with the over-inclusiveness of prior efforts, terms were added if they were deemed potentially personality-relevant and were among the 100,000 most frequent terms in the COCA (137 of these 204 were among the top 50,000 most frequent).

Repeating this same procedure with the remaining Allport and Odbert list prompted the re-introduction of 847 further terms that were among the top 100,000 most frequent terms in the COCA (754 of these were in the top 50,000). It should be acknowledged that many of these terms were likely dropped by Allport and Odbert or Norman despite their familiarity because they were deemed overly evaluative, broad, or too strongly indicative of affective states. Still, retaining such terms at this stage seemed preferable as they could always be dropped later (i.e., prior to structural analyses, possibly on the basis of more rigorous criteria). In addition, we felt that many of these terms belong in even the most exclusive lists – “private”, “modern”, and “academic” (to name a few) are familiar, reasonably specific, and non-evaluative descriptors. See Section 4 on “Re-use Potential” for more discussion of this issue.

Finally, an additional 57 terms were added from a variety of sources. Of these, 7 were shared with the first author in personal correspondence with Gerard Saucier (“appreciated”, “controversial”, “exciting”, “supportive”, “well-adjusted”, “well-known”, and “well-liked”). The remainder were added by the first author following review of lists of new terms added to the Merriam-Webster dictionary from 1968 to 2019, and by reviewing adjectives among the 100,000 most frequently used words in the COCA list that were not already included.

The final list contains 2,818 terms. To summarize, this includes all 1,710 of Goldberg’s terms (with minor revisions listed in Table 1) and overlaps with 1,914 of Norman’s 2,797 terms. Of the 904 additional terms, 847 were also present in the lists of Allport and Odbert. The 57 new terms (i.e., previously unconsidered by Allport and Odbert, Norman, and Goldberg) are shown in Table 2.

accepting            guilt-free           perceiving
adversarial          halfhearted          pleasure-loving
appreciated          halfwitted           receiving
artsy                hotheaded            self-actualized
attention-seeking    humanistic           sensation-seeking
authoritarian        initiating           sensing
burned-out           intuiting            simpleminded
charismatic          judgmental           spritely
competitive          laid-back            strong-willed
conceptual           low-key              supportive
controversial        malcontented         theoretical
curmudgeonly         mentoring            trendy
economizing          nymphomaniacal       unenergetic
emergent             open-ended           unextraverted
empathetic           overcompassionate    unsensual
exciting             overmasculine        well-adjusted
experiential         oversentimental      well-known
extraverted          overtolerant         well-liked
flaky                paranoid             worried

Table 2 57 TDAs in the 2,818 that were not included in existing lists.

Step 2: Sourcing definitions for the 2,818 Trait Descriptive Adjectives
Definitions for each word in the list were obtained from the Oxford Dictionaries website, available under license through Google Search. The Oxford Dictionaries site is maintained by Oxford University Press, which also publishes the Oxford English Dictionary. From the OED website:

“The dictionary content in Oxford Dictionaries focuses on current English and includes modern meanings and uses of words. Where words have more than one meaning, the most important and common meanings in modern English are given first, and less common and more specialist or technical uses are listed below. The OED, on the other hand, is a historical dictionary and it forms a record of all the core words and meanings in English over more than 1,000 years, from Old English to the present day, and including many obsolete and historical terms [16].”

Definitions for each term were sourced individually (i.e., manually rather than via API) to ensure that the definition used was relevant to personality. In rare cases where more than one definition for a term may have been relevant to personality, the first definition (i.e., the most important and common, per the statement above from OED) was used.

Edits to these definitions were made infrequently in cases where the given definition was lengthy, as we sought to keep all definitions shorter than 100 characters in length (including punctuation and spaces). Similarly, editing and/or the use of definitions from other dictionaries was required for a small number of terms that did not have a definition in Oxford Dictionaries. Without exception, this issue was caused by the inclusion of prefixes of negation or extremity (e.g., “unwilful”, “insuppressible”, “oversuspicious”).

Step 3: Creating multiple choice term-definition vocabulary items
The next step was to create two multiple choice vocabulary questions from each term-definition pair – 5,636 questions in total. Questions were designed such that the definitions were used as stimuli, and respondents were expected to identify the matching word from several options. For each of the 2,818 pairs, 5 of the other 2,817 terms were drawn at random as distractors. All six terms – the 5 distractors and the term that correctly matches the definition – were then used as possible response options, along with two other possibilities: “I don’t know” and “None of these” (note that “None of these” was never the correct response). During item development, the order of presentation for the correct response and the 5 distractors was randomized; the last two options were the same for all items. For example, the following item was developed using the term-definition pair for “spontaneous”:

Free, natural, and unconstrained in behavior.
a. monosyllabic
b. aloof
c. nefarious
d. corruptible
e. spontaneous
f. relentless
g. I don’t know
h. None of these

Though the questions were developed algorithmically, all items were reviewed by each member of the authorship team to identify questions that included one or more close synonyms as a distractor, as this would have reduced the validity of the question for evaluating respondents’ knowledge of the target term. For cases where this issue occurred, the questions were replaced with new randomly generated substitutes and reviewed again. The decision to use two questions for each term-definition pair was made to further reduce the effect of this and similar concerns, as large differences in the proportion of correct responses across the two versions were expected to signal idiosyncratic effects caused by one or more distractors.
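The item construction described in Step 3 can be illustrated with a minimal sketch. It is not the authors' code, which is not reproduced in this paper; the function names (`build_item`, `build_item_pool`), the dictionary layout, the seed, and the placeholder definitions are all assumptions made for illustration. Only the definition of "spontaneous" and the option structure (5 random distractors plus the target, shuffled, followed by two fixed options) come from the text.

```python
import random

# The two fixed options that closed every item (per Step 3); never the correct answer.
FIXED_OPTIONS = ["I don't know", "None of these"]


def build_item(target, definition, all_terms, rng, version, n_distractors=5):
    """Return one multiple-choice item with the definition as the stimulus."""
    # Distractors are drawn at random from every term except the target.
    pool = [term for term in all_terms if term != target]
    distractors = rng.sample(pool, n_distractors)
    # Shuffle the correct term in with the distractors, then append the fixed options.
    options = distractors + [target]
    rng.shuffle(options)
    return {
        "term": target,
        "version": version,
        "stimulus": definition,
        "options": options + FIXED_OPTIONS,
    }


def build_item_pool(definitions, rng, n_versions=2):
    """Build n_versions independently randomized items per term-definition pair."""
    terms = list(definitions)
    return [
        build_item(term, definitions[term], terms, rng, version)
        for term in terms
        for version in range(1, n_versions + 1)
    ]


# Toy example: the terms come from the sample item and Table 2; every definition
# except the one for "spontaneous" is a placeholder, not a real definition.
terms = ["spontaneous", "monosyllabic", "aloof", "nefarious",
         "corruptible", "relentless", "trendy", "paranoid"]
definitions = {term: f"<placeholder definition of {term}>" for term in terms}
definitions["spontaneous"] = "Free, natural, and unconstrained in behavior."

rng = random.Random(2818)  # arbitrary seed so the sketch is reproducible
items = build_item_pool(definitions, rng)

first = items[0]
print(first["stimulus"])
for letter, option in zip("abcdefgh", first["options"]):
    print(f"{letter}. {option}")
```

Applied to the full list of 2,818 term-definition pairs, the same procedure would yield 5,636 items. The sketch does not capture the manual review for close synonyms among the distractors, which the text describes as a separate step carried out by the authorship team.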
Note that the text of these questions (i.e., the definitions and the 5 random distractor choices associated with each term-definition pair) has not been made openly available online or as part of the dataset described in this project. This was done to maintain their validity for subsequent research. More specifically, if these questions were made publicly available, participants may have the opportunity to study them and even post answers online, thus invalidating the items. Contact the first author to inquire about access to the questions.

Step 4: Survey-based data collection
Data were collected on these 5,636 questions using a cross-sectional, planned missingness design. The aim of this aspect of the project was to evaluate the extent to which the meaning of each term was known among a relatively representative sample of respondents. By relatively representative, we mean in relation to prior efforts to evaluate the familiarity of terms [17, 13]. Terms with higher proportions of correct responses can be considered to be more familiar than terms with lower proportions of correct responses.

To create the survey, two forms (A and B) were used to split the 5,636 questions, with the two versions of each term-definition pair assigned to a different form. Respondents to the survey were administered 75 questions drawn at random from a single form (A or B) and 9 demographic questions (see Section 2.2 below for more information). As such, there was no chance of presenting the same term-definition pair within an administration.

Participants were recruited through two online crowdsourcing portals (again, see Section 2.2). The study was posted with the title “Trait-Descriptive Adjective Vocabulary,” and the description stated that respondents would be helping to evaluate the familiarity of adjectives used to describe people. Participants who consented to the survey were instructed as follows: “For each question, choose the option that matches the definition given. If you think none of the options match the definition, select the option labeled ‘None of these’. If you don’t know the answer, select the option labeled ‘I don’t know’. Please do not look up the definition!” (emphasis included in the original). Only one question was presented on each page. The survey was set to auto-advance after a response was selected, but participants were allowed to go back to change their answers to earlier questions. No feedback was given at the end of the survey.

We sought to collect a sample size large enough that each item would have approximately 30–40 responses. A minimum of 30 was chosen because, at this value, the standard error of the estimated proportion, sqrt(p(1 − p)/n), is below .10 for all true values of the proportion (it is largest at p = .5, where it is approximately .09). Given the goal of identifying TDAs that are widely familiar, we believed this level of precision to be sufficient.

Step 5: Analyses and database development
Analyses of the survey data collected as described in Step 4 include descriptive statistics about the sample and each of the 5,636 questions. This also includes aggregation across both forms of each term-definition pair. The analytic code and output are available online at https://pie-lab.github.io/tda/. This resource also provides a database of the 2,818 TDAs that can be filtered and searched according to several criteria. These include the sample size and mean proportion of correct responses to the vocabulary questions, the frequency of each term’s presence in books indexed by Google, and the inclusion/exclusion of the term in other influential subsets of TDAs. The other subsets of TDAs include Goldberg’s (1982) 1,710 terms, the 100 terms in Goldberg’s Big Five Factor Markers [9], and the subset of 435 terms used in validation work on the Big Five by Saucier and Goldberg [18]. Note that the database does not reflect inclusion/exclusion in the lists by Norman or Allport and Odbert, as this list is only partially overlapping with those lists. Similarly, it does not show the frequency of each term indexed in the COCA [6] as those data are proprietary.

2.2 SAMPLING, SAMPLE, AND DATA COLLECTION
Participants (N = 1,572; 57% female) were recruited from two different crowdsourcing platforms: Prolific (90.7%) and Amazon’s Mechanical Turk (MTurk; 9.3%). Data collection was conducted across numerous small waves to meet stratified quotas across numerous categories simultaneously, including the form of the survey, sex, age, and race/ethnicity. To increase the generalizability with respect to literacy, the survey was only made visible to respondents who had previously identified themselves to Prolific or MTurk as not having attained a college/university degree. Similarly, the survey was only made visible to respondents who reported being current residents of the U.S. (this necessarily implies that the data are not generalizable to English speakers outside the U.S.). See Section 2.4 on Quality Control for more information about exclusion criteria.

Participants were compensated US$2.50 for completing the survey, as this was approximately equivalent to the U.S. federal minimum wage at the time of data collection (US$7.25 per hour for roughly 20 minutes of work, on average). Participants were allowed to take the survey multiple times (including both forms A and B; see Note 1). Across all 1,572 participants, we obtained 3,290 full responses to the survey. Approximately 44% (N = 691) of participants took the survey one time, 35% (N = 554) took the survey twice, and the rest took the survey between 3 and 10 times. Given the relatively small proportion of items administered to each respondent, there are few instances in which a participant saw the same item multiple times. More specifically, across the 241,506 item answers, there are only 2,419 instances (1%) in which a participant saw the same question more than once.

The resulting sample contained participants from a wide range of ages, household incomes, and different geographies (by state) within the U.S. Please see the supplemental website for figures summarizing the demographic characteristics of this sample (https://pie-lab.github.io/tda/sample.html). The sample included a higher proportion of participants identifying as White (73%) relative to the US population (64% of US adults according to the 2020 Census) and a lower proportion of respondents identifying as Black or Hispanic (9% vs 12% and 5% vs 16%, respectively).

Most participants had either some college-level education (42%) or a high school degree/GED equivalent (40%). Approximately 12% of respondents reported having attained a college/university degree. The cause of this inconsistency with the recruitment strategy is unclear, but it is likely that the Prolific/MTurk workers experienced a change in degree status since first joining the platform. Both the age and geographic distributions reflected considerable diversity. All the U.S. states and the District of Columbia were represented in the sample except for Vermont and Alaska. Approximately 66% of the sample had a household income of $60,000 or less.

2.3 TIME OF DATA COLLECTION
The survey-based data collection described in Step 4 of Section 2.1 occurred between May and July of 2020.

2.4 QUALITY CONTROL
To facilitate generalizability of the data to native speakers of American English as spoken in the U.S. at the time the data were collected, participants were ineligible to complete the survey if they did not self-report speaking English “fluently” or “very well”, or if they currently lived or grew up outside the United States. Responses were excluded if participants took less than 3 minutes to complete the survey.

2.5 DATA ANONYMISATION AND ETHICAL ISSUES
A consent form outlining the study rationale, including potential benefits and risks, was presented to participants prior to taking the survey. Participants were given the option to decline or consent to participation in the study as outlined by this document; participants who declined did not go on to complete the survey. Anonymity was maintained as no individually identifying data were collected from participants. This procedure was reviewed and approved by the Institutional Review Board at the University of Oregon (Protocol #02012020.001).

2.6 EXISTING USE OF DATA
Coughlin, J., Condon, D. M., & Weston, S. J. (2021, February). Identifying Unbiased Trait-Descriptive Adjectives for Personality Psychology. Poster session presented at the annual meeting of the Society for Personality and Social Psychology (virtual).

(3) DATASET DESCRIPTION

3.1 REPOSITORY LOCATION
Condon, D. M., Coughlin, J., & Weston, S. J. (2021). Trait Descriptive Adjectives, Harvard Dataverse. https://doi.org/10.7910/DVN/5T80PF.

3.2 OBJECT NAME
The data repository contains 5 data files. The raw data files are labeled ‘TDA_data_scored’, ‘TDA_frequencies’, and ‘masterkey’.
The repository also includes two output files that match the content in the database on the GitHub site. These files are labeled ‘item_difficulties’ and ‘TDA_properties.csv’.

3.3 DATA TYPE
Processed data

3.4 FORMAT NAMES AND VERSIONS
The data are published in CSV format. The accompanying website was built using R version 4.1.1 [17] and RStudio version 2021.9.0.

3.5 LANGUAGE
English

3.6 LICENSE
The data have been published in the public domain with a CC0 license.

3.7 LIMITS TO SHARE OR EMBARGO
None.

3.8 PUBLICATION DATE
The final version of the data was published 2021-10-26. The data were first deposited on 2021-04-27.

3.9 FAIR DATA/CODEBOOK
These data conform with FAIR guidelines in that they have been posted in the public domain using a secure and accessible data repository with appropriate meta-data (including a persistent identifier). Interoperability is demonstrated with openly accessible analytic code and a searchable database on the GitHub website.

(4) REUSE POTENTIAL

The primary opportunities for re-use of these data relate to subsequent work on the lexical structure of personality descriptors in American English. For example, the information provided about these 2,818 TDAs could be used to inform the collection of self- and other-ratings for all or, more likely, some subset of the terms. While similar research has been done extensively before now, several of the most influential efforts have relied on relatively small and/or homogeneous samples of raters, and they have mainly used human judgment to winnow the number of TDAs down to a tractable size. The data provided here could be used to replicate prior work and/or (re-)evaluate the effects of using different sets of terms on structural analyses.

Further, some researchers have recently noted that claims about the universality of the so-called Big Few models are logically problematic. For example, the evidence of similar statistical covariation among the self- and other-ratings of terms across groups may not adequately account for differences in means or the potential exclusion of other meaningful content [5, 19]. As the list reported here contains only American English terms, it is insufficient on its own for evaluations with cultures primarily using other languages, but it may be useful in conjunction with lists from other languages.

Similarly, this list of terms and the methods described here can contribute to studies of the generalizability of lexical models within populations speaking American English. For example, our method for assessing the extent of knowledge about each term may be useful for subsequent attempts to evaluate terms that are specific to one or more of the many variations of American English spoken throughout the U.S. These include cultural (e.g., African American English, Cajun Vernacular English, Mexican American English) and regional (e.g., New England English, Upper Midwestern English) dialects as well as American English-based hybrid languages such as Hawai’i Creole English (known locally as Hawaiian Pidgin) and Gullah English. Differences in the knowledge and scope of TDAs across these variations may be useful for sensitivity analyses of personality structure.

These terms are also useful for lexical research that does not rely only on survey-based methods. Recent advances in natural language processing (NLP) techniques, for example, offer considerable potential for novel applications of language analysis, especially with respect to the breadth and diversity of study populations over time [11]. These applications will benefit from the availability of an updated and more comprehensive collection of personality descriptors.

More specifically, these data can be used to identify commonly known (or uncommonly known) trait descriptive adjectives (TDAs) for use in personality scale development and/or personality-relevant vocabulary tests. While the commonly known TDAs may be preferred when developing generalizable personality assessments, the use of uncommon TDAs may have merit in vocabulary-based ability measures, as they allow a test creator to generate items at various levels of difficulty. Ability measures such as these could be used for trait-recognition tasks (as they were used here), or for more general reading comprehension (i.e., English-language literacy) measures.

As over half of participants completed the survey more than once – with about 20% completing the survey 3 or more times – these data also offer an opportunity to study consistency in performance over repeated attempts (improvement vs fatigue). Though the number of participants who saw the same item multiple times was low, the data may also be useful as a metric of consistency in responses by Prolific/MTurk participants.

Finally, these data offer numerous possibilities for use in instructional contexts. They could be used to provide materials for subsequent data collection or to teach statistical techniques such as binary logistic (multilevel) regression, chi-square tests, and point-biserial correlations.

NOTE
1 After collecting data using forms A and B, it was discovered that 13 words were evaluated using incorrect items. That is, the correct term was not included in the list of possible choices associated with the definition. This was the case for both items associated with a single word, suggesting the term and definition had become unpaired in our master list. These items were corrected and administered via the same procedures, although distinguished through the labels “form C” and “form D.” These forms contained the corrected items (one for each word), and then 37 items randomly chosen from the set of 96 items that were those least frequently administered in the prior round of data collection.

ACKNOWLEDGEMENTS
As an extension of prior work on the lexical structure of personality descriptors, it is important to acknowledge that this resource builds on the substantial contributions of Gordon Allport, Henry Odbert, Warren Norman, Lew Goldberg, and the unnamed colleagues who assisted them. Citations to their work should be included, as appropriate, when these data are used. In addition, we gratefully acknowledge the guidance and feedback provided by Gerard Saucier, especially during the aggregation of terms.

COMPETING INTERESTS
The authors have no competing interests to declare.

AUTHOR CONTRIBUTIONS
DMC: Aggregated the list of TDAs; sourced the definitions for the TDAs; designed the data collection paradigm and analyses, and collected the data on Prolific and MTurk; edited the website; wrote and edited the manuscript.
JC: Created the multiple-choice term-definition vocabulary items; built Qualtrics surveys for the data collection; contributed to data collection; assisted with statistical analyses; edited the manuscript.
SJW: Designed and oversaw creation of the multiple-choice term-definition vocabulary items; contributed to data collection; designed and conducted statistical analyses; built and edited the website; wrote and edited the manuscript.

AUTHOR AFFILIATIONS
David M. Condon orcid.org/0000-0002-8406-783X
University of Oregon, Department of Psychology, Eugene, Oregon, US
Joshua Coughlin
University of Oregon, Department of Psychology, Eugene, Oregon, US
Sara J. Weston orcid.org/0000-0001-7782-6239
University of Oregon, Department of Psychology, Eugene, Oregon, US

REFERENCES
1. Allport, G. W., & Odbert, H. S. (1936). Trait-names: A psycho-lexical study. Psychological Monographs: General and Applied, 47(1), 1–170. DOI: https://doi.org/10.1037/h0093360
2. Ashton, M. C., Lee, K., & Goldberg, L. R. (2004). A Hierarchical Analysis of 1,710 English Personality-Descriptive Adjectives. Journal of Personality and Social Psychology, 87(5), 707–721. DOI: https://doi.org/10.1037/0022-3514.87.5.707
3. Ashton, M. C., Lee, K., Perugini, M., Szarota, P., de Vries, R. E., Di Blas, L., Boies, K., & De Raad, B. (2004). A six-factor structure of personality-descriptive adjectives: Solutions from psycholexical studies in seven languages.
[...] assessment: Vol. 1, pages 203–234. Hillsdale, NJ: Erlbaum.
8. Goldberg, L. R. (1990). An alternative “description of personality”: the big-five factor structure. Journal of Personality and Social Psychology, 59(6), 1216. DOI: https://doi.org/10.1037/0022-3514.59.6.1216
9. Goldberg, L. R. (1992). The development of markers for the Big-Five factor structure. Psychological Assessment, 4(1), 26–42. DOI: https://doi.org/10.1037/1040-3590.4.1.26
10. Gough, H. G., & Heilbrun, A. B., Jr. (1965). The Adjective Check List manual. Palo Alto, CA: Consulting Psychologists Press. DOI: https://doi.org/10.1037/t02310-000
11. Jackson, J. C., Watts, J., List, J., Puryear, C., Drabble, R., & Lindquist, K. A. (2021). From Text to Thought: How Analyzing Language Can Advance Psychological Science. Perspectives on Psychological Science. DOI: https://doi.org/10.1177/17456916211004899
12. Kuperman, V., & Bertram, R. (2013). Moving spaces: Spelling alternation in English noun-noun compounds. Language and Cognitive Processes, 28(7), 939–966. DOI: https://doi.org/10.1080/01690965.2012.701757
13. Lee, K., & Ashton, M. C. (2008). The HEXACO personality factors in the indigenous personality lexicons of English and 11 other languages. Journal of Personality, 76(5), 1001–1054. DOI: https://doi.org/10.1111/j.1467-6494.2008.00512.x
14. Norman, W. T. (1963). Toward an adequate taxonomy of personality attributes: Replicated factor structure in peer nomination personality ratings. Journal of Abnormal and Social Psychology, 66, 574–583. DOI: https://doi.org/10.1037/h0040291
15. Norman, W. T. (1967). 2800 personality trait descriptors: Normative operating characteristics for a university population. University of Michigan, Department of Psychology, Ann Arbor.
16. OED. (2018). The OED and Oxford Dictionaries. Retrieved
Journal of from https://web.archive.org/web/20180228084422/ Personality and Social Psychology, 86(2), 356–366. DOI: http://public.oed.com/about/the-oed-and-oxford- https://doi.org/10.1037/0022-3514.86.2.356 dictionaries/ 4. Cattell, R. B. (1943). The description of personality: Basic 17. R Core Team. (2021). R: A language and environment for traits resolved into clusters. The Journal of Abnormal statistical computing. Vienna, Austria: R Foundation for and Social Psychology, 38(4), 476–506. DOI: https://doi. Statistical Computing. https://www.R-project.org/ org/10.1037/h0054116 18. Saucier, G., & Goldberg, L. R. (1996). Evidence for the Big 5. Cheung, F. M., Leung, K., Zhang, J.-X., Sun, H.-F., Gan, Five in analyses of familiar English personality adjectives. Y.-Q., Song, W.-Z., & Xie, D. (2001). Indigenous Chinese European journal of Personality, 10(1), 61–77. DOI: https:// personality constructs: Is the five-factor model complete? doi.org/10.1002/(SICI)1099-0984(199603)10:1<61::AID- Journal of cross-cultural psychology, 32(4), 407–433. DOI: PER246>3.0.CO;2-D https://doi.org/10.1177/0022022101032004003 19. Saucier, G., Thalmayer, A. G., Payne, D. L., Carlson, R., 6. Davies, M. (2010). The Corpus of Contemporary American Sanogo, L., Ole-Kotikash, L., Church, A. T., Katigbak, M. English as the first reliable monitor corpus of English. S., Somer, O., Szarota, P., et al. (2014). A basic bivariate Literary and linguistic computing, 25(4), 447–464. DOI: structure of personality attributes evident across nine https://doi.org/10.1093/llc/fqq018 languages. Journal of Personality, 82(1), 1–14. DOI: https:// 7. Goldberg, L. R. (1982). From Ace to Zombie: Some doi.org/10.1111/jopy.12028 explorations in the language of personality. In Spielberger, 20. Saucier, G., & Iurino, K. (2020). High-dimensionality C. D. & Butcher, J. N., (Eds.), Advances in personality personality structure in the natural language: Further Condon et al. 
Journal of Open Psychology Data DOI: 10.5334/jopd.57 9 analyses of classic sets of English-language trait- 24. U.S. Census Bureau. (2020). Educational Attainment in adjectives. Journal of Personality and Social Psychology, the U.S. Retrieved from https://www.census.gov/data/ 119(5), 1188–1219. DOI: https://doi.org/10.1037/ tables/2020/demo/educational-attainment/cps-detailed- pspp0000273 tables.html 21. Thurstone, L. L. (1934). The vectors of mind. Psychological Review, 41(1), 1–32. DOI: https://doi.org/10.1037/ h0075959 PEER REVIEW COMMENTS 22. Tupes, E. C, & Christal, R. E. (1961). Recurrent personality factors based on trait ratings (USAF ASD Tech. Rep. No. Journal of Open Psychology Data has blind peer review, 61–97). Lackland Air Force Base, TX: U.S. Air Force. DOI: which is unblinded upon article acceptance. The editorial https://doi.org/10.21236/AD0267778 history of this article can be downloaded here: 23. U.S. Census Bureau. (1967). Educational Attainment in the U.S. Retrieved from https://www.census.gov/data/ • PR File 1. Peer Review History. DOI: https://doi. tables/1967/demo/educational-attainment/p20-169.html org/10.5334/jopd.57.pr1 TO CITE THIS ARTICLE: Condon, D. M., Coughlin, J., & Weston, S. J. (2022). Personality Trait Descriptors: 2,818 Trait Descriptive Adjectives Characterized by Familiarity, Frequency of Use, and Prior Use in Psycholexical Research. Journal of Open Psychology Data, 10: 1, pp. 1–9. DOI: https://doi. org/10.5334/jopd.57 Published: 25 January 2022 COPYRIGHT: © 2022 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/. Journal of Open Psychology Data is a peer-reviewed open access journal published by Ubiquity Press.