Please cite as: Mõttus, R., Wood, D., Condon, D. M., Back, M. D., Baumert, A., Costantini, G., Epskamp, S., Greiff, S., Johnson, W., Lukaszewski, A., Murray, A., Revelle, W., Wright, A. G. C., Yarkoni, T., Ziegler, M., & Zimmermann, J. (2020). Descriptive, Predictive and Explanatory Personality Research: Different Goals, Different Approaches, but a Shared Need to Move beyond the Big Few Traits. European Journal of Personality, 34, 1175–1201. https://journals.sagepub.com/doi/full/10.1002/per.2311

Descriptive, predictive and explanatory personality research: Different goals, different approaches, but a shared need to move beyond the Big Few traits

RENÉ MÕTTUS1,2*, DUSTIN WOOD3, DAVID M. CONDON4, MITJA D. BACK5, ANNA BAUMERT6, GIULIO COSTANTINI7, SACHA EPSKAMP8, SAMUEL GREIFF9, WENDY JOHNSON1, AARON LUKASZEWSKI10, AJA MURRAY1, WILLIAM REVELLE11, AIDAN G. C. WRIGHT12, TAL YARKONI13, MATTHIAS ZIEGLER14, and JOHANNES ZIMMERMANN15

1 University of Edinburgh, UK
2 University of Tartu, Estonia
3 University of Alabama, USA
4 University of Oregon, USA
5 University of Münster, Germany
6 Max Planck Institute for Research on Collective Goods, Bonn, and TUM School of Education, Germany
7 University of Milan-Bicocca, Italy
8 University of Amsterdam, Netherlands
9 University of Luxembourg, Luxembourg
10 California State University, Fullerton, USA
11 Northwestern University, USA
12 University of Pittsburgh, USA
13 University of Texas at Austin, USA
14 Humboldt Universität zu Berlin, Germany
15 University of Kassel, Germany

Abstract: We argue that it is useful to distinguish between three key goals of personality science – description, prediction and explanation – and that attaining them often requires different priorities and methodological approaches. We put forward specific recommendations such as publishing findings with minimum a priori aggregation and exploring the limits of predictive models without being constrained by parsimony and intuitiveness but instead maximising out-of-sample predictive accuracy. We argue that naturally-occurring variance in many decontextualized and multi-determined constructs that interest personality scientists may not have individual causes, at least as this term is generally understood and in ways that are human-interpretable, never mind intervenable. If so, useful explanations are narratives that summarize many pieces of descriptive findings rather than models that target individual cause-effect associations. By meticulously studying specific and contextualized behaviours, thoughts, feelings and goals, however, individual causes of variance may ultimately be identifiable, although such causal explanations will likely be far more complex, phenomenon-specific and person-specific than anticipated thus far. Progress in all three areas – description, prediction, and explanation – requires higher-dimensional models than the currently-dominant "Big Few" and supplementing subjective trait-ratings with alternative sources of information such as informant-reports and behavioural measurements. Developing a new generation of psychometric tools thus provides many immediate research opportunities.

Keywords: prediction; explanation; cause; hierarchy; personality

*Correspondence to: René Mõttus, 7 George Square EH8 9JZ Edinburgh, Scotland; rene.mottus@ed.ac.uk

This manuscript is based on an Expert Meeting jointly supported by the European Association of Personality Psychology and the European Association of Psychological Assessment, and held from 6th to 8th September 2018 in Edinburgh, Scotland (https://osf.io/fn5pw).
Authors are grateful to Tom Booth, Jaime Derringer, Ryne Sherman and David Stillwell for their contributions to the Expert Meeting, and to Cornelia Wrzus and Samuel Henry for their comments on the manuscript. Not all authors agree with all arguments put forward in this paper. Description, prediction and explanation 2 Personality psychology has come a long way in describing knowledge of personality grows and research questions how people differ in thinking, feeling, behaving, and become increasingly diverse, it may no longer be optimal wanting. This has been facilitated by agreement among for researchers to coalesce around a single or even a few researchers on a limited number of broad personality ways of operationalizing personality (e.g., the Big Few). We dimensions, organizing research and allowing observations distinguish between three broad aims of personality research to accumulate. The largely overlapping Big Five (Goldberg, – description, prediction, and explanation – and argue that 1990), Five-Factor Model (FFM; McCrae & John, 1992), and these aims may entail disparate and sometimes even HEXACO domains (Ashton & Lee, 2020) have been opposing research strategies. We advocate for the explicit particularly instrumental broad personality constructs, so articulation of these aims when designing, conducting, and much so that they have become the default way of reporting the results of personality research rather than operationalizing personality differences among people; we defaulting to research practices that are widely used but may refer to them as the Big Few. in fact be suboptimal for any given research project. For Yet it is not evident that the Big Few “carve nature at its example, we propose that: joints”. They are useful for conveniently summarizing a ● Descriptive findings should be published in as much variety of ways in which people can differ with a detail as possible (e.g., at the individual item level) manageable number of dimensions. 
But there is little besides being organized (e.g., according to attributes evidence that they are particularly good units for explaining such as the strength of relations or the psychological behaviour or psychological processes underlying it modalities of the characteristics involved) or aggregated (Baumeister et al., 2007; Wood et al., 2015; Jonas & into broader constructs such as the Big Few. This offers Markon, 2016) or even that they are the best predictors of more flexibility than the common practice of a priori real-world outcomes (Mõttus & Seeboth, 2018; Elleman et aggregating findings for simplicity. al., 2020). The Big Few were formed by combining ● Although traits’ predictive validity is often seen as a subjective perceptions of traits1 that statistically co-vary major reason for doing personality research in the first among people rather than based on models of processes that place, its robustness and ways of maximising it remain happen in individuals. Currently we do not know of many under-explored. Availability of large datasets and genetic variants, neurobiological systems, experiences, or advanced statistical tools are beginning to improve this. developmental processes that specifically contribute to Predictive models should always be independently variance in the certain Big Few domains such as cross-validated and should not depend on parsimony or Extraversion or Conscientiousness and set them apart from consistency with researchers’ theoretical intuitions. other domains such as Openness and Honesty-Humility or ● Many phenomena that interest personality scientists from traits allegedly beyond the Big Few such as motives, such as broad patterns of naturally occurring individual beliefs, or abilities. 
Moreover, the domains partly overlap differences (e.g., constructs in the personality trait and can be combined into even broader ones (DeYoung, hierarchy) may not have individually tractable causes, 2006), but also broken into numerous more specific traits at least as this term is typically understood and/or in (McCrae & Sutin, 2018). ways that are meaningfully interpretable and allow for None of this is necessarily a problem. But it means that the targeted interventions. This is because the phenomena variance found in typical personality measures can be are inherently decontextualized and relative, and their described as a hierarchy of traits, that there are few reasons indistinguishable levels can arise through many to automatically prefer any one of its levels over others, and combinations of processes and may not result from that the mechanisms of the variance can be highly multiply unidirectional cause-effect associations, among other determined. Researchers are also increasingly considering reasons. When this applies, useful explanations may be processes and related variance within individuals, besides narratives that integrate many pieces of descriptive differences between individuals; it is a crucial question how findings into broad principles rather than attempts to these variation levels are connected or whether they can be identify individual and potentially intervenable cause- addressed with the same statistical and/or theoretical models effect associations. If so, for example, individual at all. Likewise, there may be personality variance both regression coefficients provide poor causal between and within individuals (e.g., behavioural frequencies explanations. However, by defocusing from broader or relationship dynamics) that is not captured in the variability patterns and meticulously studying specific subjective perceptions commonly used for personality and contextualized behaviours, thoughts, feelings and assessment. 
goals, individual causes of variance may ultimately be As a result, particular models of personality may work better identifiable in useful and potentially even controllable for some purposes than others. This leads to a central idea of ways. Still, such causal explanations may be more this special issue generally and this article specifically: as our complex, phenomenon-specific and person-specific than anticipated thus far. 1 Here, we define traits similarly to Baumert and colleagues (2017): trait is a ● Progress in all three areas – description, prediction, and descriptive dimension of any kind of relatively stable psychological and explanation – will likely require availability of far behavioural differences between people, independent of its content and higher-dimensional models based on traits much more breadth. Description, prediction and explanation 3 specific than the Big Few, as well as supplementing analyzing many-dimensional data and communicating typical subjective trait-ratings with alternative sources of findings that involve numerous statistical associations may information such as informant-reports and behavioural seem overwhelming. measurements (Rauthmann, 2020). Therefore, an area But these difficulties have recently become less relevant. with immediate and immense opportunities is Technological progress has made accessing participants and developing a new generation of psychometric tools that collecting data much easier, with sample sizes now routinely allow sampling persome – the universe of variables in the thousands (Gosling & Mason, 2015). Self-report capturing personality variability – more broadly than scales have turned out to be more reliable than previously currently available measures do. 
thought, with their many-dimensionality often mistaken for measurement error (e.g., because internal consistency Descriptive personality science systematically underestimates reliability; Cronbach & Descriptive personality research explores associations Shavelson, 2004; McCrae, 2015). This allows us to measure between the measurements of personality constructs and/or a broader selection of narrower traits with the same number their links with phenomena allegedly beyond the personality of carefully selected items, because fewer conceptually domain (e.g., demographic characteristics, experiences, and interchangeable items are required for each trait (McCrae & behavioural outcomes). The results can and do contribute to Mõttus, 2019; Wood, Nye, & Saucier, 2010; Yarkoni, explanatory or predictive research, but they are also 2010). Improved computational power and accessible data important in their own right and should not be constrained by analytic tools have eased working with many-dimensional theoretical models (purview of explanatory research) or data to efficiently summarize, communicate and compare attempts to maximise prediction (aim of predictive research). association patterns (Costantini et al., 2015; Revelle, 2020; For example, there is ample evidence that individual Ellemann et al., 2020; Stachl et al., 2020). differences in personality characteristics can be clustered into Many researchers now agree that population-level replicable groups such as the Big Few (Schmitt et al., 2007), personality variation is best represented as a hierarchy of are relatively stable over several years (Terracciano et al., increasingly specific traits, with no level uniquely 2006), are persistently correlated with a variety of life representing nature carved at its joints (DeYoung, 2015; outcomes (Roberts et al., 2007; Soto, 2019), and perceived at Markon et al., 2005; McCrae & Sutin, 2018). 
This hierarchy least somewhat similarly by different observers (Connelly & arises because most Big Few traits inter-correlate, Ones, 2010). Genetically related individuals resemble each suggesting few very general super-traits such as Stability other in personality characteristics, accounting for most of and Plasticity (DeYoung, 2006), although methodological the similarity of family members (Briley & Tucker-Drob, artifacts may contribute to this (Bäckström, Björklund, & 2014), although the specific genetic variants correlated with Larsson, 2009; Riemann & Kandler, 2010). The Big Few the characteristics have remained elusive (Lo et al., 2017). domains also break down into constituents that develop and Changes in personality characteristics barely track with correlate with life outcomes in distinct ways (Jang et al., specific life experiences (Bleidorn et al., 2020; Denissen et 1998; Paunonen & Ashton, 2001). Some models have al., 2019), are similarly distributed across geographically therefore delineated “aspects” (DeYoung et al., 2007) or diverse regions (Allik et al., 2017), but vary systematically “facets” (Costa & McCrae, 1992) for the Big Few. These are across genders (Lee et al., 2020). Recently, research has also more than just different ways in which the Big Few can be started to describe systematic patterns of short-term expressed (Jang et al., 1998): moreover, the hypothesis that variations in personality as another aspect of individual some traits such as the Big Few are more “core” or differences (e.g., Danvers, Wundrack, & Mehl, 2020; “temperamental” than other, ostensibly more “surface” traits Horstmann & Ziegler, 2020; Lazarus et al., 2020; Sosnowska such as facets, has found limited empirical support (Kandler, et al., 2020).2 Zimmermann, & McAdams, 2014). 
However, there has been little systematic research yet to delineate an empirically The trait hierarchy based and comprehensive model of personality facets for researchers to coalesce around (Saucier & Iurino, in press). Within the descriptive kind, a lot of research has been carried Moreover, most personality questionnaire items contain out on the relations between (subjectively perceived) trait unique personality variance beyond the Big Few domains, scores with the aim to reduce personality variation among aspects and facets they were designed to measure. people to as few broad trait dimensions as possible. Therefore, even the most comprehensive of the current facet Summarizing variance with a small number of traits has been models (e.g., Costa & McCrae, 1992) can be broken further a practical approach, both in terms of data collection and down into numerous yet more specific traits, or “nuances”. reporting. For example, accessing sufficient participant Empirically, nuances are every bit as trait-like as the Big numbers and tabulating data can be burdensome, especially Few domains or their aspects and facets, because even the when each trait is measured with numerous items, and unique variance in hundreds of items, reflecting the nuances but not facets, aspects and domains, has essential trait 2 These are all findings of descriptive research, even though correlations with life outcomes are sometimes called prediction and the correlation properties of stability over many years, transcendence across between genetic and phenotypic similarities are sometimes taken as the assessment method such as self- and informant-reports, and former explaining the latter. Description, prediction and explanation 4 heritability; item-specific variances also have distinct Agreeableness, and Conscientiousness but slightly lower in developmental trends and associations with life outcomes Extraversion and Openness than younger adults. 
But some (Mõttus et al., 2017; Mõttus, Sinick, et al., 2019). Of the of the findings are specific to questionnaires (Costa et al., error-free variance of a typical Big Five item, less than half 2019), hinting that age differences are at least in part driven has been estimated to pertain to the domains and their facets, by narrower traits that are sampled in different proportions leaving at least a half for nuances (McCrae, 2015). There are across instruments, and thereby potentially misrepresented also personality traits that are either in the peripheries of the by the broad trait domains. There is indeed ample evidence Big Five domains, as commonly defined, or beyond them that facets of the same Big Few traits vary in their age (e.g., competitiveness, loyalty, jealousy, humour, sexuality, differences (Terracciano et al., 2005; Jackson et al., 2009; or others; Bouchard, 2016; Paunonen & Jackson, 2000). Lucas & Donnellan, 2009). But even facets may not provide These traits are often not well covered in currently popular a full understanding, because items of the same facets— personality measures. The true ubiquity and utility of reflecting nuances within them—often vary in their age nuance-like narrow personality traits is thus yet to be trends, conveying unique developmental information. For properly estimated, as available evidence is based on example, item-level analysis of the Assertiveness facet of questionnaires carefully developed to assess little but the Big the revised NEO Personality Inventory (NEO-PI-R; Costa & Few and their selected facets. This universe of narrow McCrae, 1992) showed that older people were more likely personality traits that forms the basis of the personality to take charge of situations but less likely to make others do hierarchy has also been referred to as the persome (Mõttus, things, and items of the Achievement Striving facets Bates et al., 2017; Revelle, Dworak, & Condon, 2017). 
referring to hard work tended to increase with age while In principle, therefore, there are many ways for researchers items referring to success-motivation trended downwards to describe personality variation such as using different (Mõttus & Rozgonjuk, 2019). Such examples abound levels of the trait hierarchy. In practice, they often default to (Mõttus et al., 2015); for example, Mõttus and Rozgonjuk the Big Few, likely because these trait models appear (2019) reported that items within half of the personality intuitive and familiar, are already widely used and can be facets varied in the directions of their age differences, readily measured with existing instruments. Social pressure leading items to contain over 40% more age-sensitive from peers, reviewers and editors may also play a role. information than facets and over twice as much as the Big Although these are legitimate practical reasons, there is no Five domains. More nuanced investigations into how inherent scientific reason why this level of the trait hierarchy personality is linked with various life outcomes or vary should be a priori and always preferred over others for each across cultures have led to similar conclusions (Achaa- and every research purpose. In fact, this may often be Amankwaa, Olaru, & Schroeders, 2020; Elleman et al., counter-productive, in constraining research choices and 2020; Seeboth & Mõttus, 2018; Wessels, Zimmermann, & inspiring potentially misleading generalizations. Leising, 2020). At which level of a personality hierarchy should descriptive What makes good descriptive research? findings stop? 
The answer will depend on the research To select an appropriate way of representing personality questions under consideration, but the goal should be to variance for a descriptive research question, it helps to represent descriptive findings such as age or gender outline criteria for what would be a good descriptive account differences or links between personality characteristics and of whatever is being described in relation to particular other variables at the level from which going more detailed personality constructs (e.g., other psychological constructs, would not add further useful information. Technically, this different measurements of the same constructs, demographic means the level where the measurable constituents of the variables or life outcomes). We illustrate this with how traits relate to the other variables alike, because traits’ personality varies with age. associations should not depend on which indicators are usedto operationalize them (Mõttus, 2016; Spearman, 1927; Information should be elaborate. Is a good descriptive Gonzales, MacKinnon, & Muniz, 2020). Often this may account simple and parsimonious or comprehensive and mean levels from which we simply cannot go any more detailed? The tension between these priorities can be detailed, such as individual test items, given that personality alleviated by recognizing that parsimonious accounts can is, and possibly will be for some time at least, most always be extracted from detailed ones containing more commonly assessed with questionnaires. On other numerous and less aggregated variables. The reverse, occasions, broader traits such as the Big Few or their facets however, is not possible (Saucier & Iurino, in press). With may turn out to be the most suitable levels of description, remarkable flexibility, many-dimensional findings can be because their constituents follow the same association subsequently zoomed into or summarized with fewer patterns. 
Following this simple principle makes choosing the dimensions, such as for ease of interpretation and appropriate level of the personality hierarchy a defensible communication. empirical question rather than a matter of personal Being able to zoom in rather than a priori aggregating can preference, peer pressure or editorial policy. pay off. For example, age differences in personality are often It is sometimes thought that theories should constrain described using the Big Few traits, showing that older adults research questions. For descriptive research (as well as for tend to be somewhat higher in Emotional Stability, predictive, below), we argue the opposite: theory should be Description, prediction and explanation 5 used to expand rather than constrain the personality construct theoretical constraints on the findings (e.g., Nagel et al., space and thereby descriptive findings. For example, theories 2018; Plomin & von Stumm, 2018) and there is no reason of how personality may relate to the phenomenon of interest why following suit could not help personality scientists. can be used to suggest items to our item pools to make them More detailed findings can be aggregated into any trait more comprehensive and sensitive to the topic at hand. If we construct, either at the time when they are first published or only operationalize personality with the Big Few, we a priori in subsequent research. This flexibility is especially useful, exclude possibilities to uncover additional aspects of because most items represent several traits at different levels personality, and how they develop and co-vary with other of the trait hierarchy or even at the same level; think of the phenomena. International Personality Item Pool as an example of how But we can use theoretical models to help with organizing items are “recycled” to measure disparate constructs our findings (e.g., Bem & Funder, 1978). For example, (Goldberg, 1999). 
For example, to estimate how a (latent) Mõttus and Rozgonjuk (2019) described age differences in trait correlates with a criterion from the correlations of k personality using 300 items (many reflecting unique items with this criterion, the item-criterion correlations can personality nuances), but organized the associations be multiplied by the traits’ loadings on the items (which can according to the Big Five and their facets using a Manhattan be extracted from correlations among items) and the sum plot (Revelle, 2020; Revelle, Dworak, & Condon, 2020). product divided by the sum of the squared factor loadings:3 This allowed them to show the general organisation of age differences in personality (they were wide-spread across k hundreds of items) and how they were distributed across (r (X ,Criterion)∗r (X ,Trait )) particular Big Five domains and their facets (i.e., mean ∑ i ii=1 difference between domains and facets in age-trajectories), r (Trait ,Criterion)= k but also how the age differences deviated from the patterns ∑(r (X i , Trait)2) expected under the Big Five model (i.e., items of the same i=i domains/facets often substantially varied in age differences). Or, item-level findings can be organized according to the The same applies to facet-level findings, of course. degrees to which the items represent affect, behaviour, As a general rule for basic research, thus, comprehensive cognition, or desires/motivation (ABCDs; Wilt & Revelle, and detailed descriptions of personality-related phenomena 2015). For instance, a mental health variable could be most are preferable to those that a priori impose parsimony. But strongly linked with affective items, regardless of which Big this does not mean that each and every study should Few domain or facet they belong to; a physical health necessarily measure hundreds of constructs, nor that each outcome may be mostly linked with behavioural items; and paper reports many hundreds of associations. 
Instead, other outcomes may predominantly track with other types of personality scientists should collectively (across studies) aim items. For a few more examples, findings could be organized towards maximum comprehensiveness. This can be according to the extents to which items reflect universal traits achieved if individual studies a) consider diverse constructs as opposed to contextual adaptations (Henry & Mõttus, rather than focus all on the same trait model (e.g., a Big 2020), social desirability (Wessels, Zimmermann, Biesanz, Few), thereby distributing the workload and pooling & Leising, 2020; Leising et al., 2020), visibility (Funder & findings either in a directed co-ordination or spontaneously, Dobroth, 1987), social maturity (Caspi & Roberts, 2001), and b) provide their findings at various levels of specificity pathology (Vachon et al., 2013; Bleidorn et al., 2019) or and aggregation (including disaggregated, item-level stability, cross-method agreement, and associations with findings). Also needed are accessible tools for integrating other variables (Mõttus, Sinick, et al., 2019). This way, we the findings of different studies (e.g., for meta-analysing can use theory to expand association maps to hundreds of findings for available constructs, collating and publicly variables and still extract intelligible information from these, depositing them). Individual research reports can then especially when we use suitable (e.g., interactive) contribute to, and draw from, a central repository of visualization tools. Large samples and cross-validations are descriptive findings. This is not the default modus operandi vital, but this is no longer an insurmountable barrier in the of current personality research although it is common in current data-centric age. some other fields such as genetics and neuroscience. Patterns in how personality differences relate to the variables Findings should not depend on methodologies. 
When we of interest can also be explored atheoretically. For example, link something to personality constructs, we typically expect item- or facet-level associations can be organized in the that the associations pertain to psychological characteristics descending order of effect size to highlight the strongest that exist independently of how they happen to be assessed associations and find commonalities in them (e.g., Achaa- (Hilbig, Moshagen, Zettler, 2016; Mõttus, 2016; Thielmann Amankwaa et al., 2020; Elleman et al., 2020; Bem & Funder, & Hilbig, 2016). When conclusions reliably differ, say, as a 1978; Block, Block, & Gjerde, 1986; Block, Gjerde, & function of which personality questionnaire was used for Block, 1991). In some fields such as genetics, recent progress has almost entirely resulted from atheoretically 3 If the combinations of items ought to represent summary-traits rather than scanning association patterns rather than imposing shared variance-based latent traits, principal component loadings can be used instead of factor loadings. Description, prediction and explanation 6 assessing the construct (e.g., the associations of Openness Some recommendations for descriptive research and Extraversion with age or that of Neuroticism with Body Mass Index vary across studies; Costa et al., 2019; Vainik, A new trait taxonomy and instruments for it. Besides the Dagher et al., 2019), this points to the association being Big Few, we need a more encompassing trait taxonomy to driven by narrower traits that are captured by differing be able to comprehensively describe associations of degrees across measurement tools. This implies labelling personality traits among themselves and with other issues (or “jingle fallacies”; Block, 1995; Larsen & Bong, phenomena, coupled with instruments for measuring these 2016), whereby investigators mean different things when traits. In other words, we need to sample the persome more invoking the same scale or construct name. 
If so, these narrower traits should be isolated, because generalizing associations beyond them is misleading. Reporting item-level associations in particular can help to reveal jingle as well as jangle fallacies.

Unless there are explicit reasons to the contrary, the associations should also generalize across assessment methods such as, most readily, self- and informant-reports (ideally, the aggregate ratings of multiple informants). Findings that self- and informant-reports are measurement invariant are consistent with this (e.g., Mõttus, Allik, et al., 2019). For some traits, self- and informant-ratings may in part measure different aspects of personality (Vazire, 2010; McAbee & Connelly, 2016), in which case discrepant findings may be expected, and may even hint at what contributes to the observed associations in the first place. For example, associations between personality traits and age tend to be stronger in self- than informant-reports (Costa et al., 2019), possibly because people have clearer perceptions of their own changes than they do of changes in others, or because age differences in self-reports are inflated due to increasing socially desirable responding with age (Soubelet & Salthouse, 2011).

Researchers should explore the generality of associations across contexts and other potential moderators. We should routinely strive to replicate findings in multiple diverse cultures, clarifying the extent to which the observed associations characterise larger populations than our typical study participants (e.g., Henrich, Heine, & Norenzayan, 2010), or even humans in general. Some findings already have been replicated across cultures. For example, age differences in personality are fairly robust across cultures (McCrae et al., 2005), even at the levels of facets and nuances (Mõttus, Sinick, et al., 2019). Other findings may vary systematically across contexts; in these cases, we should establish that the variabilities themselves are replicable and attempt to identify their sources (moderators). For example, the magnitudes (but not the profiles across multiple traits) of gender differences vary systematically between cultures, and we know how: gender differences are larger in more prosperous societies (Schmitt et al., 2008; Mac Giolla & Kajonius, 2019; Lee & Ashton, 2020). It has been reported that the timing of age trajectories may also vary systematically across cultures (Bleidorn et al., 2013), but these findings have not yet been successfully replicated (McCrae et al., in press). One possible benefit of routine attempts to replicate findings across cultures is diversifying the range of researchers participating in personality research, including those from currently less represented regions and backgrounds.

broadly than the available taxonomies allow for. This does not mean doing away with the Big Few, but developing a properly hierarchical model in which traits can be investigated at lower (nuance) levels as well as aggregated into increasingly broad traits, including the Big Few. It may also be that the Big Few models eventually require a revision to account for lower-level traits that are informative but do not easily fit into the current Big Few models (Saucier & Iurino, in press). Likewise, many lower-level traits may belong to more than one of the Big Few.

Such models are neither unrealistic nor impractical. For example, careful item selection – such as avoiding items with low retest reliability and excessive redundancy (Christensen, Golino, & Silvia, 2020; McCrae & Mõttus, 2019) – may allow measuring a usefully comprehensive pool of nuances with one or perhaps two items each. Remember: nuances are narrow, so no broad content sampling is required for them, because measurement breadth comes from the pool of nuances collectively, not from items within individual nuances. If so, a 100- or 200-item test, say, can encompass around 100 nuances that can be aggregated into a few dozen facets, and still fewer aspects and domains. Common psychometric concerns about the use of short scales can be addressed. For example, the typical retest reliability of single items of existing questionnaires over a one-week or two-week interval is around .65 (e.g., Mõttus, Sinick, et al., 2019; Henry & Mõttus, 2020), even though these instruments have rarely been constructed with item-level reliability in mind. Therefore, after careful item selection the majority of items can have reliabilities well above .60, with the average plausibly at about .70.⁴ This means that the retest reliability of most two-item scales can be notably higher, often above .80.⁵

Findings obtained with such multi-nuance tests can be interpreted at any one level of the trait hierarchy or at multiple levels at the same time, as appropriate for the goal at hand. For example, broad-trait associations can be qualified by which specific narrower traits drive them, in the likely case that the associations within the scale are meaningfully heterogeneous. Importantly, the measurement of broader traits themselves could also improve as a result of their encompassing more lower-level traits, because good measures of broad trait domains sample their content broadly. This is therefore a win-win scenario.

⁴ Retest-correlations over shorter testing intervals can be higher still (Lowman, Wood, Armstrong, Harms, & Watson, 2018) and may provide even more accurate reliability estimates.

⁵ For an example of creating a high-dimensional personality trait pool, see Saucier, Iurino, and Thalmeyer (2020).
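The step from single-item to two-item reliability follows the Spearman-Brown prophecy formula, r_k = k·r / (1 + (k − 1)·r), where r is the reliability of one part and k is the number of parts. The Python sketch below is an illustration added for this text, not code from the cited studies; the function names `spearman_brown` and `required_part_reliability` are hypothetical. It assumes the two items of a nuance are parallel (same true score, equal error variance), which real items satisfy only approximately, and it also inverts the formula to show what single-item reliability a two-item scale needs to reach .80.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Projected reliability of a composite of k parallel parts.

    Assumption: every part has the same reliability and measures the same
    true score (parallel parts), as the Spearman-Brown formula requires.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    return k * reliability / (1.0 + (k - 1.0) * reliability)


def required_part_reliability(target: float, k: float) -> float:
    """Single-part reliability needed for a k-part composite to reach `target`.

    Obtained by solving the Spearman-Brown formula for the part reliability:
    r = r_k / (k - (k - 1) * r_k).
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must lie in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    return target / (k - (k - 1.0) * target)


if __name__ == "__main__":
    # Single-item retest reliabilities in the range discussed in the text.
    for item_r in (0.60, 0.65, 0.70):
        print(f"one item {item_r:.2f} -> two items {spearman_brown(item_r, 2):.2f}")
    # Item reliability at which a two-item scale reaches .80.
    print(f"needed for .80 with two items: {required_part_reliability(0.80, 2):.2f}")
```

Run as a script, this prints projected two-item reliabilities of 0.75, 0.79 and 0.82, and a break-even item reliability of 0.67. In other words, under the parallel-items assumption, "often above .80" holds once single items reach roughly .67. The same formula, with raters in place of items, gives the .67 aggregate reliability in the two-rater example of footnote 6 (spearman_brown(0.5, 2)).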
Of course, several of the Big Few instruments already allow for the measurement of their facets, but few authors have provided comprehensive, empirical evidence-based facet taxonomies (but see MacCann et al., 2009; Roberts et al., 2005; Ziegler et al., 2019), and these facet models are by definition constrained to the Big Few that have been defined a priori. Little taxonomic research has yet simultaneously encompassed the Big Few, their aspects and facets, as well as traits beyond them (Condon, 2018; McCrae & Costa, 1996), and there has been virtually no taxonomic research on nuances yet (but see Wood et al., 2010).

Being realistic, it may never be possible to devise the ultimate hierarchical model of personality variance that covers all narrow personality traits in the persome, as somehow carved out by nature. There may be too many of them, their boundaries are likely as inherently fuzzy as those of broader traits, and many might apply to only some individuals and thereby have limited variance across individuals. But it is almost certainly plausible to develop models that sample from among the universe of important traits far more comprehensively than the currently popular, Big Few-centric models do.

Additional sources of information. To validate findings based on self-reports and explore patterns that may not be accurately captured with self-reports, researchers should use alternative sources of information about personality variation, while also being mindful of the limitations of these sources.

For example, technological progress has provided new sources of information (Rauthmann, 2020). Several recent articles describe how personality and its associations with other variables can be assessed through objectively measured behaviour or digital traces of behaviour (e.g., Cooper et al., 2020; Wiernik et al., 2020; Hall & Matz, 2020; Stachl, Au et al., 2020). These approaches offer great potential for non-invasively collecting personality-related information about large numbers of people, possibly over extended periods of time, hence allowing measurement of short- and even longer-term changes in personality. But often these assessment methods have to be given personality-relevant interpretation in relation to subjectively rated personality traits before they become useful. For example, on their own, mobile phone sensor data do not have psychological meaning; they acquire it once we know how they track with self-reported traits (Wiernik et al., 2020; Stachl, Pargent et al., 2020; Hall & Matz, 2020). As a result, these methods often approximate subjectively rated traits rather than provide entirely new information, and any issues with self-reports can spill over to their digital approximations (Tay et al., 2020). Currently, typical correlations between self-reported traits and their digital approximations are in the range from .30 to .40 (Tay et al., 2020; Stachl, Au et al., 2020), so the gap between them remains non-trivial. It may narrow as research progresses, though.

Likewise, many researchers may strive towards objective, laboratory measurements of personality traits, such as asking people to persevere with tedious and boring tasks (e.g., Gniewosz, Ortner, & Scherndl, 2020) or keep their hand in cold water (e.g., Schmeichel & Vohs, 2009) to measure their self-control, or asking them to categorize adjectives to measure their implicit self-concept (Greenwald & Farnham, 2000). But despite circumventing the biases of subjective ratings, these methods may not always enable as comprehensive personality measurements as self-reports do. They may also lack inherent psychological meaning (face validity) comparable to typical questionnaire items. Also, objective measurement approaches may often have poor convergent and discriminant validity (Dreves et al., 2020; Mazza et al., 2020; Schimmack, 2020), possibly in part due to low reliability (e.g., Egloff et al., 2010; Wood & Brumbaugh, 2009).

Measurements with likely greater face validity are direct observations of behaviour and of temporal and cross-situational patterns in it. These may include in situ self- or informant-reports of behaviour (via experience sampling) and visual and/or audio recordings taken in labs or everyday settings (Breil et al., 2019; Geukes et al., 2019; Schmid, Gatica‐Perez, Frauendorfer, Nguyen, & Choudhury, 2015; Wrzus & Mehl, 2015). Indeed, there is a long-standing tradition in personality science of calling for greater use of behavioural observations (e.g., Baumeister, Vohs, & Funder, 2007; Back, 2020; Back, in press), and well-cited articles have discussed suitable methods for this (e.g., Furr, 2009). We fully join these calls and agree that personality psychology that relies exclusively on subjective ratings, especially self-ratings, can only provide understanding of subjectively perceived variations and inevitably ignores anything not detectable, or inaccurately detected, by subjective perceptions. However, direct observations of behaviour have remained comparatively rare in personality research, likely because they are harder to obtain for sufficiently large samples and broad domains of behaviour. We hope that recent technological advances, such as those described in a recent special issue of European Journal of Personality (Rauthmann, 2020), will improve the situation.

Combining self- and informant-reports. Objective and/or in situ measurements of personality variance are, without any doubt, highly desirable and increasingly practical. But it is also likely that subjective and decontextualized ratings will remain among the cost-efficient and ecologically valid methods of measuring stable personality traits, all the more so because the Big Few-centric research strategies have not yet fully exhausted this method's potential (e.g., Wood, Gardner, & Harms, 2015). A well-established but still underused way to improve the reliability and validity of subjective personality ratings is to supplement one rater (e.g., the self) with others (e.g., well-informed other people). With online testing, this is far easier than it was during the paper-and-pencil testing era (e.g., participants can nominate an informant, who is sent an automatic invitation to participate in the study).

Combining multiple raters can reduce systematic idiosyncrasies inherent in only one ratings source (McCrae et al., 2019; Vazire, 2006); indeed, such method-specific effects may make up a large proportion of observed variance (McCrae, 2015). Self-ratings capture self-identity while informant-reports capture reputation; both are likely biased in their own ways, but what is shared between them is more likely to provide valid information. For example, most people have developed an implicit theory of which traits go together and adjust their self-ratings or ratings of someone else accordingly, which can lead to distorted correlations between data-points obtained with one rating source (McCrae et al., 2019). Combining ratings can also reduce random measurement error, especially in single- or few-item nuances, where its proportion is higher than in broad aggregate traits.⁶ This in turn can result in stronger associations with other variables of interest (e.g., Wright et al., 2019). Of course, informants may have different or less information about their target than the targets themselves do, and they may often be biased towards the targets because of being non-randomly selected (Wessels et al., 2020). Likewise, we rarely know how discrepancies between self- and informant ratings arise – from biases in the former, the latter, or both – and thereby how to weigh them in the combined results. No single source of information is perfect – but, again, combining them is very likely to improve data quality in most cases.

⁶ For example, if 50% of an item's variance is free of measurement error and single-source method biases, then combining two raters yields a reliability of .67 for the aggregate, according to the Spearman-Brown formula.

Multiple sources of ratings can sometimes be "triangulated" to estimate associations a) with reduced single-method effects while b) also accounting for imperfect agreement between raters due to different information, rating biases or error (e.g., Biesanz & West, 2004; Eid et al., 2008; Riemann & Kandler, 2010). For example, using cross-trait, cross-twin ratings and cross-trait, cross-time ratings, Mõttus and colleagues (2017) calculated bias-and-error-reduced estimates of heritability and rank-order stability of personality nuances and found that the average estimates were comparable to those of aggregate traits, defying the intuition that broad psychological traits are more "biological" than circumscribed behaviours, feelings, thoughts and motivations.

Combining test-retest data. The reliability of personality trait assessments, and thereby their associations with other variables, can also be substantially improved by measuring presumably enduring traits twice over reasonably short time intervals (e.g., two weeks); besides, the associations can then be corrected for unreliability. Again, with online testing, organizing two or more measurement occasions is not as taxing as it used to be when testing was done on paper and when many of our current assessment practices were set, including the one-assessment-only tradition. It is especially useful if multiple self-ratings can be supplemented with informant-ratings: combining multiple pieces of information allows breaking correlations between variables into several components, such as the association net of single-rater and occasion-specific biases, rater-specific effects and occasion-specific effects (e.g., Koch et al., 2017).

Better use of already existing data. Researchers can help to describe the associations of personality constructs among themselves and their relations with other variables in more detail than has typically been done – in fact, with little additional effort and by using data already collected. For this, we recommend routinely a) using facets of the Big Few and/or b) testing the extent to which associations are driven by narrower-still traits such as nuances (e.g., single items). Where the associations are driven by particular facets or nuances, they should not be automatically generalized beyond these, including to broader domains. Faceted and nuanced association patterns can be as informative and hypothesis-generating as the comfortingly predictable association patterns typical of the Big Few – desirable traits all too often going with desirable outcomes and the other way around, with most "significant" correlations somewhere between .10 and .30. We recommend that facet- and/or item-level findings be routinely published in article supplementary materials; this costs authors (calculation and tabulation of findings) and journals very little, but it adds transparency to findings and facilitates their subsequent re-analysis and (e.g., meta-analytic) integration. This is different from making raw data available, because calculating the correlations of interest from raw data can often be cumbersome, unless very easy-to-use statistical programming code is made available.

Some may think that item-level findings are notoriously unreliable. But as discussed before, items often have retest reliabilities of .65 to .70 or higher (Lowman et al., 2019; Mõttus et al., 2019; Wood et al., 2010; Henry & Mõttus, 2020), which may be higher than many intuitively expect. Higher-than-assumed single-item reliability is also consistent with findings that items out-predict scales for outcomes and other variables (Mõttus & Rozgonjuk, 2019; Seeboth & Mõttus, 2018; Vainik et al., 2015; Achaa-Amankwaa et al., in press; Ellemann et al., 2020). Therefore, the allegedly low reliability of items should not be a reason for not reporting item-level findings. Where reliability is a concern, however, it can be compensated for with large samples, meta-analytic integration of findings, and aggregating or triangulating self- and informant-reports.

A Personality Research Hub. We recommend developing a central repository of descriptive findings. These findings could involve anything from associations among personality traits or their associations with demographic characteristics, life events and outcomes to their heritability, stability, and cross-method agreement estimates. We think that findings are best deposited disaggregated (e.g., at the item level), allowing for flexible aggregation into different scales as well as for analysis at the item level. Centrally and publicly available findings can be tested for robustness across studies, as well as for moderators that help to understand why they vary from study to study or from scale to scale. They can also be meta-analytically combined and used for setting up and testing novel hypotheses (e.g., a routine practice in quantitative genetic research; Lee et al., 2018). Some such datasets have already been published (Mõttus, Sinick, et al., 2019; Condon, Roney, & Revelle, 2017; Goldberg & Saucier, 2016), but there is no central repository of personality research findings yet.

For integrating findings across studies, it is not necessary that all or even most studies use similar instruments. In fact, having all researchers assess the same personality traits may not even be preferable for many research questions, because this would constrain the range of traits for which findings can become available over time. Instead, it is sufficient that studies rely on at least partly overlapping measures, so that their associations can be compared for robustness and integrated into larger association networks. This directly parallels the idea of Synthetic Aperture Personality Assessment (Revelle et al., 2016), which allows calculating "synthetic" correlation matrices from only partly overlapping sets of participants. That is, not only can correlation matrices be based on different participant combinations of the same study, they can also be based on combined (synthetic) correlations from different studies. A similar procedure is routinely used in modern genetic research (e.g., Bulik-Sullivan et al., 2015). For working with such data, it is sufficient if (nearly) identical items and traits share annotation (common labels) – something that also helps against jingle-jangle fallacies.

Readily available descriptive findings, especially if they are not a priori aggregated into the Big Few, would facilitate a currently underused research strategy: setting up and testing hypotheses that rely on systematic variability between personality traits in their attributes such as demographic differences, stability, heritability, or links with outcomes (e.g., Funder & Dobroth, 1987; Block, Block, & Gjerde, 1986; Funder & Sneed, 1993; Mõttus et al., 2017; Vainik, Misic et al., 2019). That is, much like we study differences between people, we can also study quantitative differences between traits such as facets and nuances. This is not possible with only, say, five traits, but becomes increasingly viable as the number of traits increases.

For example, we could numerically test the hypothesis that personality development reflects social maturation (Caspi & Roberts, 2001). If associations between hundreds of items and age are meta-analyzed into reliable estimates, one could select, say, 200 diverse items, quantify the degree to which each reflects social maturity (e.g., using expert ratings or correlations with objective maturity criteria), and expect these degrees to track with empirical age differences in the items. This would be a powerful and quantitative alternative to eyeball-judging that mean-level change patterns in traits such as the Big Few look like people are generally becoming socially more mature. For other examples, Henry and Mõttus (2020) examined whether items that corresponded to the definition of traits, as opposed to characteristic adaptations, demonstrated empirical properties often associated with traits, such as stability, cross-rater agreement, and heritability; Hang, Soto, Lee and Mõttus (under review) studied whether items representing traits with stronger social expectations had larger age differences in means and variances throughout childhood and adolescence; Kööts-Ausmees and colleagues (in preparation) found that more socially desirable traits showed stronger age differences in self-reports than in informant-reports, suggesting that age differences may be inflated in self-reports; and Wood and Wortman (2012) showed that traits which varied least in their desirability across participants were least stable over time.

For a parallel, recent developments in quantitative genetics have been substantially facilitated by a widespread practice of sharing genotype-phenotype associations at the most fine-grained level (millions of single nucleotide polymorphisms) in repositories such as the LD Hub (Zheng et al., 2017). Geneticists routinely (meta-analytically) integrate and re-analyze such data for various research questions, developing novel methodologies in the process. Much of this work is based on examining variabilities between genetic markers in their phenotype-associations or other attributes (e.g., allele frequencies or linkage disequilibrium), exactly as we recommend examining systematic variabilities between personality traits in their quantifiable attributes. The high-dimensional findings are filtered and aggregated in various ways, such as by chromosome or gene expression patterns, to test hypotheses and summarize patterns. This is a fundamentally more flexible approach to data than the a priori aggregation of data-points that has prevailed in personality research.

New data analytic tools. In conjunction with depositing (disaggregated) findings, we recommend that researchers develop tools for collecting, annotating, archiving, processing, and meta-analysing many-dimensional personality data. For example, we can imagine a software package (e.g., in R, possibly in combination with other platforms) that facilitates:

● administering subsets of item pools, selected according to pre-defined criteria;
● scoring them into various scales (e.g., the Big Five, HEXACO, Dark Triad, or well-being);
● uploading and downloading data from a central repository of findings according to specified criteria;
● automatically meta-analyzing new and/or existing findings for user-selected variables;
● cross-validating findings across different subsets of existing data and identifying candidate moderators;
● leveraging existing information (covariances among items) to impute unmeasured variables and to cross-walk from measured scales to (partly) unmeasured scales;
● summarising findings (e.g., personality-outcome correlations) at different levels of aggregation (personality hierarchy);
● identifying the variables (pre-defined scales, individual items, or computer-identified item collections; Ellemann et al., 2020) that uniquely (over and above other variables) drive particular associations (e.g., Vainik et al., 2015);
● testing the extent to which items' or broader traits' associations with particular variables track with their previously established properties such as reliability, social desirability, degrees of reflecting affect, motivation, and other psychological domains, developmental trajectories, and so forth, so as to better understand the associations and detect possible confounders;
● visualizing association patterns according to user-selected filters (e.g., comparing item-outcome correlations according to whether they pertain more to affective or behavioural items).

Some of these functions have already been implemented (e.g., Arslan, 2019; Arslan, Walter, & Tata, 2020; Revelle, 2020; Rosenbusch, Wanders, & Pit, 2020), but there is no comprehensive toolbox yet. Possibly, the main reason why this does not already exist is the lack of suitable databases; to date, personality researchers simply have not pooled their (disaggregated) findings, as some other fields have done to good effect. We hope this will change. For a relevant example in cooperation research, see Spadaro and colleagues (2020).

If personality science is moving towards higher-dimensional representations of phenomena, as we hope, this will also have implications for which skills need to be taught to, and expected from, graduate students pursuing personality research.

Collaborations. Any one researcher or research group can collect only so much data. Individually, even the largest panel studies, often with brief measures of personality traits, may provide diminishing returns when the phenomena they explore are many-dimensional. But there is no rule that all research teams have to rely on the same omnibus model of personality and be constrained by the same practical limitations that prevent them from comprehensive measurement. Instead, we may need collaborations where different researchers explicitly set out to examine different aspects of personality (e.g., different traits) and only subsequently integrate their findings.

Within-individual variance

We have focused on variance between individuals in enduring patterns of thinking, feeling, behaving, and motivation, partly because this is what much of personality science is about. But recent years have seen the emergence of a powerful new stream of research that maps variance within individuals over very short time-periods and across situational experiences, in what are often called personality states (Wendt et al., 2020; Sosnowska et al., 2020; Danvers et al., 2020; Horstmann & Ziegler, 2020), as well as stable individual differences in the distributions of these states. This will likely provide more detailed descriptions of how particular individuals, and people more generally, interact with their environments and differ in this. Here, we do not describe this new and blooming stream of research in detail only because this special issue, as well as another recent special issue (Rauthmann, 2020), already contain papers that do exactly this. We only note two things.

First, much of the research on short-term variance in personality states repurposes the descriptive models developed for summarizing individual differences, such as the Big Few. But the extent to which this is appropriate needs to be studied, not presumed (e.g., Molenaar & Campbell, 2009; Fisher et al., 2018). There is no reason to assume that the personality hierarchy operates the same way for individual difference traits and within-individual variance in states, although sometimes it may. Many trait models are designed and measured with the specific purpose of glossing over temporal and situational variations, because personality is often conceived of as broad and decontextualised patterns of individual differences (Funder, 1991; McCrae & Sutin, 2018). It is useful to recall that the adjective pools that were used to derive the Big Few systematically excluded terms concerning moods or states (Saucier, 1997). For this and other reasons, employing the Big Few-like broad traits in studies on how personality states fluctuate, just because this model is often used in individual differences research, may not be a good idea, just as assuming that narrower traits such as facets or nuances are somehow more contextual-situational and thereby more appropriate candidates for personality states may be ill-conceived (Horstmann & Ziegler, 2020). Being artistic may be a useful narrow trait, but uninformative as a personality state. We suspect that some phenomena – for example, being talkative or sad – may constitute reliable variance units both as traits and states (Zimmermann et al., 2019), whereas others may be appropriate only as one or the other.

Second, many of the recommendations that we propose for descriptive research on individual differences should also apply to descriptive research on within-individual variance in personality states. Among them are the need to develop a flexible descriptive framework that allows measuring phenomena at the most appropriate level of granularity for the purpose at hand, validating findings across methods, measures and contexts, combining self- and informant-reports, developing tools for flexibly working with and efficiently summarizing many-dimensional data, and developing efficient tools for data sharing and collaboration (e.g., Kirtley, 2020).

Predictive personality science

Personality researchers often take pride in how personality traits "predict" life outcomes such as academic performance, relationship satisfaction, or health. Strictly speaking, however, many of these findings – correlations or regression coefficients calculated using the same observations being predicted – are actually descriptive. Truly predictive research aims to create models where characteristics such as personality traits are used to model the best possible predictions of outcomes in data that have not yet been accessed or even collected (out-of-sample prediction). First, this means that the observations that are used to create, or "train", predictive models must not be the same observations that will eventually be predicted (Yarkoni & Westfall, 2017; Stachl, Pargent et al., 2020). Second, such research should explore the limits of predictive accuracy, whereas descriptive models often have other priorities, as we argue below.

Given that the scientific value of personality traits is often said to hinge on their predictive power for important life outcomes (Ozer & Benet-Martínez, 2006; Roberts et al., 2007; Soto, 2019), it may come as a surprise that this power and ways of maximizing it have rarely been directly assessed in empirical studies.⁷ We suspect that this is in part due to a common failure to distinguish predictive research from other kinds of research and a tacit – but often likely mistaken – assumption that priorities and methodologies most suitable for descriptive or explanatory objectives must also be optimal for predictive purposes.

⁷ In fact, many studies linking personality traits with outcomes only report correlations and not R² estimates.

Why do predictive personality research?

Maximizing the out-of-sample predictive utility of personality traits can be an end in itself, sometimes even irrespective of its potential descriptive or explanatory utility. Consider, for example, using personality assessments for candidate selection (Lievens, 2017): what matters most is the accuracy of the estimated probability that the candidates will succeed in the job. Although for transparency it is useful to know which individual traits contribute to these predicted probabilities, the implications of those contributions for our understanding of personality more broadly are less important. Where the most accurate estimates of future job performance are based on the Big Few scores, it makes sense to use them. But where the best predictions are achieved by measuring, say, 100 unrelated personality items and feeding them directly into a predictive model, it may be counterproductive to combine them into broader trait constructs and use these for predictions, however descriptively elegant or comfortingly familiar this may seem. The same applies to using personality traits to decide which products are best advertised to which people (Matz et al., 2016) or to predict important outcomes in medical and academic contexts, among other possible applications.

Maximising predictive accuracy has theoretical importance, too. Quite simply, to the extent that predictive accuracy is one of the main reasons for pursuing personality research, the case for this pursuit will be even stronger if we manage to increase the predictive accuracy. Likewise, one of the main theoretical implications of the pervasive associations between personality traits and life outcomes is that the traits may partly shape everyday experiences linked to these outcomes (e.g., differential education, career and relationship success confer different life trajectories and subsequent experiences) and thereby also shape psychological development more broadly (e.g., Scarr, 1983; Roberts & Nickel, 2017). That is, many psychologically consequential experiences are unlikely to be random but are related to pre-existing psychological characteristics: traits' predictive accuracy is the formal measure of how pervasive this tendency is.

Why is predictive research different from descriptive and explanatory research?

It may not be obvious why descriptive models are not necessarily optimal for prediction. For example, doesn't the R² of a regression model provide a good estimate of its predictive accuracy, even if that model was intended as a descriptive research tool to show how the variables in the model are linked with an outcome? It can, especially when the model comprehensively covers relevant variables at the appropriate level of the personality hierarchy, as we recommended for descriptive research, and was developed on a sufficiently large sample to obtain stable parameter estimates. However, the best descriptive models do not have to be the most predictive ones, because efforts to optimize models for descriptive as well as explanatory appeal often decrease their predictive power, for two reasons.

First, a failure to cross-validate performance estimates (e.g., reporting an adjusted R² estimate derived from the same data the model was trained on) may result in overfitting (Yarkoni & Westfall, 2017; Stachl et al., 2020) and give overly optimistic impressions of predictive accuracy, while estimating how individual variables in models contribute to their cross-validated prediction reduces the models' descriptive simplicity (for examples, see Stachl, Pargent et al., 2020). To be fair, overfitting is probably less of an issue in more recent personality research than in many other fields of psychology, because sufficiently large samples are often used. But even so, an adjusted R² estimates a model's predictive performance in a hypothetical, infinitely large sample that is compositionally identical to the one in which the model was fitted, whereas cross-validation allows one to estimate the robustness of the model across different kinds of samples. Researchers often assume that their findings are robust to variations in sample composition, but R² is insensitive to this.⁸

⁸ One may expect that increasingly common meta-analyses provide average association estimates across different samples that are more generalizable than estimates from individual studies, and therefore less overfit. However, although likely more accurate due to aggregation, meta-analytic estimates may also be inflated due to overfitting in individual samples.

Second, the cognitive constraints of human researchers and their readers introduce a tension between descriptive/explanatory and predictive research objectives, because increased predictive accuracy is often achieved by increasing model complexity, which reduces interpretability and theoretical parsimony. For example, for descriptive and explanatory purposes researchers tend to look for and group correlated variables, whereas sets of variables that capture maximally unique portions of variance likely confer better prediction (Saucier, Iurino, & Thalmeyer, 2020). The increased complexity of predictive models may not only mean including many predictor variables (we do recommend high-dimensional descriptive research!), but also capitalizing on often uninterpretably small differences between already small weights of individual predictors and sometimes also incorporating non-linear associations and/or interactions between the predictors.

For example, Mõttus and Rozgonjuk (2019) reported that age could be out-of-sample predicted (in the statistical, not the substantive, sense) more strongly from 300 individual test items (r = .65) than from 120 items (r = .54), 30 personality facets (r = .44) or the Big Five domains (r = .28). This shows that hundreds of items contain reliable and age-sensitive information about individual differences that is not fully exhausted by a set of 119, or possibly even 299, other items, and that including this information in predictive models makes a material difference in their performance. But from a descriptive/explanatory standpoint, a model with 300 small regression coefficients that are carefully selected to maximize prediction may be suboptimal, because human researchers struggle to reason in so many dimensions and fathom the small differences between the coefficients. The findings have to be filtered or organized somehow to make them useful for descriptive and explanatory purposes. This predictive research just revealed that the Big Five (or any Big Few) may be a particularly suboptimal way of organizing items in their age differences.

For a parallel, the same applies to quantitative genetics,

this. And sometimes comparatively more accurate predictions result from even more counter-intuitive modeling. For example, Mõttus and Rozgonjuk (2019) unsurprisingly found that regularized regression models predicted age from items much better than models based on the zero-order correlations of these items with age (i.e., when the predictions were formed by multiplying the standardized score of each item by its correlation with age in another sample and subsequently summing the products). But using zero-order correlations calculated with items' standardized residuals (i.e., after removing the variance of Big Five domains and facets from them) to create the prediction models improved their performance to levels comparable to regularized regression models. That is, removing the variance of the Big Five domains and facets from items prior to using them in the models increased their ability to predict out of sample, despite these items having been selected to measure the domains and facets in the first place. This surely leaves classical test theorists scratching their heads: how can what is supposed to be error (i.e., left-over variance in items beyond the traits that they were designed to measure) out-predict traits? A plausible explanation is that predictive modeling benefits from uncorrelated predictors and minimizing their redundancy (Saucier, Iurino, & Thalmeyer, 2020).
If so, capturing personality variation where polygenic models based on contributions from more using sparsely placed markers (items) throughout the numerous genetic variations (e.g., 100,000) generally allow persome is more useful for prediction than relying on for stronger out-of-sample predictions of phenotypic intuitive variables such as the Big Few or even their facets variables than models based on fewer genetic variants (e.g., that capitalize on, and aggregate, correlated traits (i.e., 50,000), even though the contributions of individual variants oversample certain areas of the persome). This means a very are mostly far too small to be meaningfully interpretable different measurement philosophy than classical test theory. (Plomin & von Stumm, 2018). Likewise, in fields like It is important to avoid pejoratively calling predictive computer vision and natural language processing, opaque models with predictors and parameters that are not intuitive and complex statistical learning methods such as deep neural or familiar to human researchers “black box” models. They networks (DNNs) vastly outperform simpler, more are not black boxes because, having designed them, humans interpretable statistical models (for review, see LeCun, can understand their working principles (Hasson et al., Bengio, & Hinton, 2015). Many of these models capitalize 2020). Besides, researchers know the data on which the on so many parameters and small variations in them that they models are trained because they designed the measures and may never be fathomable by humans: not because the models collected the data. It is just that the specific parameter values are overly complex per se, but because human minds have that the models develop to do what modellers designed them constraints that models do not have to obey (Hasson, to do are often not interpretable to these modellers, possibly Nastase, & Goldstein, 2020). 
We don’t know yet whether the due to their own cognitive constraints, but possibly also due same will prove true for the prediction of individual to insufficient research and familiarization yet. Personality differences in behavior (e.g., DNNs often require volumes researchers should be open to the possibility that some, and quality of training data rarely available in personality perhaps even many, of their familiar tools may become research), but this is not an unreasonable hypothesis. As it suboptimal when we start to systematically explore the stands, there have simply been too few attempts to limits of real-world predictions. systematically explore the limits of personality traits-based Thus, there may often be an inherent tension between predictions. parsimony and predictive power that forces researchers to But initial evidence does suggest that techniques providing choose between descriptively/theoretically elegant models less human-interpretable model parameters such as that have lower predictive power and better-performing regularized regressions or random forests may at least predictive models that benefit from the contributions of sometimes substantially out-perform more intuitive modeling numerous variables with sometimes very small coefficients approaches (e.g., Elleman et al., 2020). For example, that individually make limited sense. Of course, other things regularized regression models often shrink many coefficients being equal, it is always better to understand how a system to a range that descriptively looks close to zero; even operates than not. 
But sometimes, and maybe even very ordinary regression models with many predictors tend to do often, the true data-generating processes underlying Description, prediction and explanation 13 behaviour are too complex for a model to be simultaneously only minimally (Mõttus & Rozgonjuk, 2019).9 Likewise, a both comprehensible to humans and predictively maximally finding that predictive models allowing for non-linear and/or useful. interactive associations (e.g., recursive partitioning, random forests) do (or do not) out-perform those that only allow for Can predictive models help descriptive and explanatory linear additive associations can be equally informative about ones, and vice versa? possible causal mechanisms, at least when the Predictive modeling can also facilitate progress in other underperformance of complex models is not due to kinds of research, where maximizing out-of-sample measurement error (Jacobucci & Grimm, 2020). Such prediction is not an end in itself (for review, see Yarkoni and findings can also inform intended personality-based Westfall, 2017). interventions, not least about their likely limits in real-lifesettings. First, routine cross-validation can provide researchers with more realistic estimates of not only the predictive, but also Fourth, cross-validation as it is routinely done in predictive the descriptive and explanatory capacity of their models. modeling provides an elegant way of estimating systematic Impressive in-sample performance estimates derived from (lack of) generalizability of results across measurable small-to-medium samples may decrease substantially when factors. For example, one can train a model on only some evaluated in independent samples, whereas the predictive samples (e.g., only for men, North Americans, people power can hold up well with larger samples. 
But regardless younger than 50 years) and evaluate its performance on of this, where predictive models with tens of well-chosen and others (e.g., women, Asians, those aged over 50); if the well-measured predictors are able to account for only a models perform equally well, the factors that differentiate fraction of the variability in the phenomenon of interest, between the samples do not moderate the associations researchers may want to remain humble about being able to captured by the model. map out the causes of this phenomenon, at least using the On the other hand, attending to descriptive and explanatory kinds of explanatory variables approximated by their concerns can also help improve the performance of predictors. That is, because one could argue that something predictive models. Most importantly, researchers can draw can only be mechanistically explained to the extent it on their domain expertise to facilitate better “feature behaves predictably, predictive accuracy may often signal engineering”; that is, choosing which variables are used in the limits of the explanatory powers of causal models. the predictive models and how they are pre-processed Second, predictive models can help researchers understand (Stachl, Pargent et al., 2020). No amount of machine the trade-offs inherent in emphasizing certain goals over learning expertise is likely to produce optimal predictions if others and identify important lacunae in descriptive or the available predictors contain mostly noise (Jacobucci & explanatory models. For example, even if one’s goal is to Grimm, 2020) or lack coverage of the critical features of the develop a readily interpretable prediction equation using only target phenomenon. 
An understanding of the sources and the Big Few domains, quantifying the performance structure of human personality and psychometric expertise improvement one might obtain by using a more expansive set can be particularly helpful for maximising predictive of predictors can help calibrate expectations about what potential and for anticipating issues with generalizing the “good” performance constitutes. It is not uncommon to learn models beyond original training settings. For example, it is that the Big Few are “powerful” predictors of life outcomes: likely that personality trait inventories that contain items comparing the predictive power of the Big Few to other trait with high reliability but relatively little redundancy are models would help to either support or at least qualify such particularly useful for prediction, despite the trait scales claims. The predictive models may also help to identify having lower internal consistencies and thereby potentially additional sources of variance for further putting off users with less or outdated psychometric descriptive/explanatory model development such as facets or expertise (Yarkoni, 2010). nuances that could be included into the Big Few or besides In particular, because accuracy of out-of-sample predictions them. entirely depends on comprehensive, well-measured and Third, predictively comparing different kinds of models can generalizable sets of predictors, theoretical accounts also shed light on the general architecture of personality variation in relation to predicted outcomes. For example, 9 When predicting age from personality test items, Mõttus and Rozgonjuk(2019) tried removing items of several facets that had the strongest models based on hundreds of predictors out-performing those correlations with age. 
Surprisingly, they found that the overall predictive based on the Big Few or their facets would suggest that the capacity of the models decreased minimally, suggesting that the bulk of associations of personality with the outcome could be driven the predictive information was not uniquely concentrated to a smallselection of items or the traits that they were supposed to index. Not by numerous specific processes, rather than a few broad reported in the original paper, but specifically for the current article, we mechanisms – to the extent that causality is involved, of ran additional out-of-sample predictions of age in these data, by dropping course. Among other possibilities, this can be tested by 5%, 10%, 25% and 50% of the most predictive items from the total of 300 dropping the strongest predictors from the model and items: the correlation between predicted and actual ages dropped from .65to .61, .59, .51 and .41, respectively. These predictions were still far more estimating changes in the collective predictive power of the accurate than those provided by the Big Five domains (.28) and mostly remaining predictors: it may be that this changes the results also more accurate than those of the facets (.44), even with these including all their items. This suggests that small amounts of unique age- sensitive information were allocated across many individual items. Description, prediction and explanation 14 elucidating the processes by which personality relates to the data from more people or more data from fewer people. In outcome and descriptive accounts showing how the outcome such cases, larger participant numbers are not always is correlated with personality traits can both be useful for desirable. Instead, prioritizing the coverage of the persome expanding the range of predictors included in predictive by increasing the number of variables at the expense of the models. 
This may go against the intuition of some number of participants may confer substantial predictive researchers to use prior knowledge to constrain models. For advantages (the same likely applies to descriptive research), training predictive models, however, it does not matter how provided that the variables used during training are also many predictors are initially involved or what putative available in the validation data and any future observations personality hierarchy levels they come from, so long as they for which predictions are intended. A large number of help maximize suitably generalizable out-of-sample responses to a short personality questionnaire can be a poor prediction accuracy. As long as the models are not validated substitute for a rich dataset, even if the latter contains fewer using the observations on which they were trained, any observations. For example, a sample of 3,000 participants excesses in predictor selection will become apparent in the measured with 200 items may often enable more predictive model validation phase and can be corrected. (as well as descriptive) models than a sample of 12,000 participants measured with 50 items, and a sample of 60,000 Some recommendations for predictive research measured with 10 items is likely to fare worse still. Cross-validation. For an accurate evaluation of the Ultimately, the predictive information is in the variables and predictive value of personality traits, it is most important to most outcomes are highly multiply determined, with use cross-validation procedures that distinguish between the observations only needed to reliably estimate relevant training sample and the validation sample (Yarkoni & information in the variables. Besides, many statistical Westfall, 2017; Stachl, Pargent et al., 2020). 
These can be estimation methods such as regularized regressions are independent partitions of one larger sample (as in k-folds or designed to help stabilize predictions even in cases where leave-one-out cross validation), but it is even better if they the number of variables exceeds the number of independent are independently collected datasets, potentially with observations. A particularly useful solution to balance somewhat varying demographic characteristics. Cross- participant and item numbers is to collect data with validation helps to mitigate against model overfitting due to massively planned missingness, where each participant random sampling variance as well as due to systematic biases provides responses to a different random subset of variables in sampling (e.g., demographic imbalances), and it can guard (e.g., Revelle et al., 2017; Elleman et al., 2020). against the effects of idiosyncracies in data collection, Flexibility in selecting and transforming predictors. processing, and statistical modeling. It is especially valuable When constructing predictive models from personality data, if the training and validation data were collected by different researchers have flexibility over how, or whether at all, to researchers. transform single data points such as item scores into Sufficiently large datasets. Predictive performance tends to predictive variables; this may involve aggregating, raising to improve with increasing model complexity, so long as the powers or grouping values, for example. In machine training data is sufficiently large to mitigate over-fitting. As learning, this process is termed feature engineering. 
a general rule, the more predictors in a model and/or the Aggregation tends to filter out potentially useful more complex the functional form relating predictors to the information, so measuring many traits with one item each criterion (e.g., allowing for non-linear associations), the can result in more predictive models than measuring few larger the training sample that is required. The incremental traits with many items. But aggregation may be useful when gains associated with larger sample sizes also depend on the this demonstrably improves the generalizability of the effect sizes in question, as large effects require smaller prediction models across contexts and instruments. For samples, and the amount of missing data (Elleman et al., example, it may be that an item-based prediction model 2020). For example, Mõttus & Rozgonjuk, found that vastly out-predicts a model based on fewer aggregate traits prediction models stabilized with a few hundred observations in a given sample, but when the trained model is applied in a when based on up to 30 variables, but required about 3,000 different demographic group, the gap may close or even observations when based on 300 predictors with the smallest reverse. As a general rule, different ways of aggregation individual effect sizes and presumably most measurement could be empirically compared to each other as well as to error. We therefore do not suggest universally “acceptable” completely disaggregated models in their ability to predict sample sizes; instead, this can be estimated with simulations outcomes in independent data (e.g., Mõttus, Bates et al., for individual study designs. For many predictive modeling 2017). applications in personality psychology, it is possible that Comparing statistical models in their performance. increased sample sizes will have diminishing returns beyond Sometimes, well-tuned regularized regression models may a few thousand observations. 
provide far more robust and accurate predictions than But more variables is often preferable to more “standard” (i.e., ordinary least-squares) regression models; participants. Researchers rarely have the luxury of sometimes the latter may work just as well. Also, models acquiring massive samples with many well-measured that allow for non-linear and interactive associations may variables, and often face a choice between collecting less sometimes provide the most accurate predictions, even ifthey require larger training samples. In some circumstances Description, prediction and explanation 15 such as high levels of missing data, less sophisticated and Explanatory personality science less data-hungry models may provide comparably accurate predictions: for example, Elleman and colleagues (2020) Many psychologists are not satisfied with describing and introduced the Best Items Scales that are Cross validated, predicting personality-relevant phenomena (e.g., traits or Unit weighted, Informative, and Transparent (BISCUIT) their correlates; events, actions, affects, goals, life model that allows researchers to create bespoke personality outcomes) and also aspire to explain them (e.g., Baumert et scales for particular outcomes, consisting of as few items as al., 2017). Few would disagree, however, that explaining possible and each contributing exactly the same amount something is harder than describing and predicting it, not towards the prediction for greater interpretability. Our only because of methodological challenges but also because general point is: to date, there has been too little research that of more fundamental questions about the very nature of has systematically explored the ways of maximising the useful explanations. 
In fact, even the authors of this article predictive accuracy of personality variables and therefore we could not entirely agree on some fundamental questions cannot know yet which modeling practices are generally around causes, explanations and their roles in personality preferable. science. Fortunately, there have been other recentcontributions regarding how to explain phenomena that Alternative sources of personality information personality scientists consider as falling into their jurisdiction (e.g., Baumert et al., 2017; Briley et al., 2018; Predictive personality research may not only use personality Grosz et al., 2020), including articles in this issue (e.g., traits as predictors, but also as outcomes. A wealth of recent Quirin et al., 2020; Costantini et al,. 2020; Lukaszewski et research has explored the possibility to extract personality- al., 2020). Here we offer general ideas about how one could relevant information not only from traditional sources like think of causes and useful explanations – and why these are self-reports, but from records that people leave behind such not necessarily the same things. as social media or credit card records, mobile sensor data or Crucially, there are different approaches to personality that diaries (Kosinski et al., 2013; Stachl, Au et al., 2020; Weston vary in what their advocates may consider useful and et al., 2019; Wiernik et al., 2020). Typically, such data is realistic goals of explanation. 
Some conceive of personality given psychological meaning by first collating them into as broad regularities in relatively stable individual scores that approximate self-reported personality traits (e.g., differences, whereas others think of it as a dynamic and using machine learning techniques; Wiernik et al., 2020) and potentially idiosyncratic within-person system, and see the then using these digital records-based self-report- role of personality science as providing an integrative approximations for descriptive or predictive purposes. The account of how the mind and behaviour come together. The standard approach so far has been to predict the Big Few first former approach focuses on decontextualised patterns in and then use these predictions for whatever is their intended naturally occurring, normal and continuous (dimensional) purpose, but recent evidence suggests that predicting variance among individuals (e.g., Funder, 1991; McCrae & narrower traits such as nuances first and using these Sutin, 2018); in this view, personality is a population-level predictions in subsequent analyses may be preferable (Hall et variance phenomenon such as the trait hierarchy. The latter al., 2020). Again, more research is needed before we could approach is primarily about specific processes pertaining to recommend generally preferable research practices, and individuals and resultant variability within them, as well as therefore it may be useful to systematically compare about how individuals may differ in these processes and/or different approaches in their performance. their distant causes (e.g., Quirin et al, 2020). 
In many cases, it is not evident that variability/processes taking place within Competitions among research teams individuals and variability among people arise for similar Prediction is different from description and explanation in reasons (e.g., DNA structure, anthropometry, parental that there is an objective ground truth for assessing socioeconomic status or other possible sources of individual performance: the agreement of predictions with actual differences do not even vary much within individuals), observations. This creates an opportunity for researchers to although sometimes they may (see Lukaszewski et al., 2020; directly compete against one another in developing the best Quirin et al, 2020). But even more importantly, while possible prediction models, which could go a long way advocates of the latter approach may hope to identify towards eventually establishing the best practices for the specific causes of specific phenomena (why a particular field. For example, teams of researchers could be given person reacts to a situation in a particular way) and similar training data with the only instruction to develop the eventually perhaps even individual differences in these, most accurate prediction models for given outcomes, and the advocates of the former approach may prefer explanations submitted models could be compared in their performance in that propose general principles rather than target specific hold-out data that were not available to model developers causes, for reasons that we’ll describe shortly. (e.g., Salganik et al, 2020). 
Causes

Causes can be defined as broad and specific factors (e.g., neurological structures or repeated experiences) or processes (e.g., situation selection or associations among psychological constructs) that play roles in producing particular responses to environments, or vice versa, either psychologically or behaviourally. Even if inferred from comparing individuals, causes and effects pertain to processes and variability within particular individuals in their particular circumstances. Causal relations have boundary conditions, which can range from the exceptionally narrow (e.g., where affecting X should only affect Y in rare circumstances and needs to be studied idiographically) to the very broad (e.g., on Earth, releasing an object almost always causes it to fall toward the Earth). Explanations that target causes thus require specifying (1) the nature of the cause-effect relation or process (such as X→Y or X→M→Y) and (2) the circumstances under which the relation or process is expected to occur.

The gold standard for identifying causes is the potential to control the outcome by experimentally manipulating these conditions and/or processes. For instance, if we have learned that Helen exercises because of X₁, or Tom parties because of X₂, we should be able, at least in principle, to influence the levels of X₁ and X₂ to change Helen’s rate of exercising and Tom’s rate of partying. This involves counterfactual arguments: if X and Y occurred and we assert that X was a cause of Y, then we have to be able to show that, without X, Y would not have happened in the way or to the degree that it did, all else being equal. Formalized models of hypothesized processes that enable controlling them at least conceptually (e.g., Directed Acyclic Graphs, DAGs, with do-operators; Pearl, 2018) can be particularly useful for probing such specific causal relations.

To be clear, causes do not have to be deterministic; for example, smoking causes lung cancer, but not every smoker gets it. But the probabilistic link between the cause and effect has to be consistent and strong enough that changing the former makes a non-trivial difference to the latter. Indeed, the risk of smokers developing lung cancer is about 20 times higher than that of non-smokers (Surgeon General's Reports, 2004), so starting to smoke makes a material difference to the probability of developing lung cancer. In contrast, very small individual causal effects have commensurately small explanatory power.

Defined as such, identifying causes may be a useful target for approaches that see the primary role of personality science as identifying potentially controllable processes that underlie within-individual variance and perhaps subsequently also individual differences in these processes (e.g., Quirin et al., 2020). For example, a therapist may be able to identify causes of a patient’s problematic behaviours and perhaps even help the patient to control them to facilitate desired personality change (Hopwood, 2018; Magidson, Roberts, Collado-Rodriguez, & Lejuez, 2014). Likewise, functionalist and process approaches may attempt to explain how particular beliefs and skills interact to produce certain behaviours or self-perceptions, which can similarly provide ‘levers’ for influencing behaviours or trait change (Wood, Spain, Monroe, & Harms, in press; Metcalfe & Mischel, 1999).

However, it is not self-evident that researchers who think of personality as population-level patterns in naturally occurring individual differences and seek to make sense of these should target their individual causes. This is because these patterns may not have many tractable causes to begin with, at least according to our definition of cause, or their causes may be too numerous and too complex to provide explanations that are interpretable for the human mind and therefore useful. Instead, useful explanations for these patterns could postulate general principles that may or may not apply to potentially controllable processes in particular individuals. We now elaborate on this position, because we feel that it is implicitly adopted by many personality researchers but may cause unrealistic expectations when left unarticulated (Grosz, Rohrer, & Thoemmes, 2020). We will later return to the alternative view according to which personality researchers should hope to reveal the individual causes of personality-relevant phenomena in the strict sense of the term.

Why many causes may be inherently elusive

In part, causes may often remain elusive because the phenomena that personality scientists seek to explain and/or their plausible explanatory variables are, by definition and intentionally, abstract hypothetical constructs that cut across different circumstances within and across individuals (Funder, 1991), with quantitative levels that are inherently relative.

Think of individual differences in neuroticism, self-esteem, agency, trustfulness, or procrastination as quintessential examples of the kinds of personality constructs many researchers work with. As personality constructs rather than just specific instances of behaviour, thoughts, feelings, and desires, they represent individual differences in reactions that integrate across many kinds of situations and over time, and are therefore taken out of their specific circumstances. Unless one commits to the view that they represent singular traits (like height) that exist independently of how and where they are expressed and measured (arguably, most personality researchers do not; e.g., Baumert et al., 2017), this inevitably makes them decontextualized aggregates that correspond to different things in different people and circumstances. Also, individuals’ “raw” scores on them can only be interpreted in comparison to those of others, because there are few if any concrete “anchors” (e.g., specific behaviours) that invariably correspond to specific trait levels and ground these in individuals.10 According to our definition, however, causes need to represent concrete “things” (e.g., thoughts, feelings, behaviours, desires, skills, experiences, brain structures, even genes) that do correspond to specific circumstances and apply to particular individuals, irrespective of other individuals.

10 There have been attempts to create personality rating scales that provide raters with concrete behavioural anchors rather than the typical disagreement-agreement dimension such as a Likert scale (e.g., Muck, Hell, & Höft, 2008). These may be useful for assessing the manifestation of personality traits in specific circumstances in a non-relativistic way, but the measures tend to be too context-specific to be of general use and to allow for comparing individuals from different circumstances.

Of course, although many personality constructs are, by nature, decontextualized and relativistic aggregates, their constituents, such as behaviours measured with questionnaire items, could be concrete enough to also represent situation-specific reactions of particular individuals. If so, we could work backwards from construct levels to what they correspond to in individuals. This may sometimes be the case, especially for narrower constructs that aggregate few constituents; this alone is a good reason to consider lower levels of the trait hierarchy. However, not many personality constructs can boast a well-defined set of concrete constituents: the Act Frequency Approach (e.g., Buss & Craik, 1983) was one prominent attempt to delineate them, but it has largely been abandoned for decades. Even for narrow constructs such as the tendency for aggressive behaviour, researchers often ask those who provide information on it to make abstract inferences (“I tend to get into fights”, “S/he often hits others”) rather than count the frequencies of the specific behaviours involved – because these are too context-bound to be meaningfully comparable across people.

But even if researchers had, or managed to reach, a consensus on what the concrete constituents of specific traits are and how to measure them in a non-relative way, they would face another challenge: there are often so many different configurations of these constituents through which any given aggregate value can arise that it is virtually impossible to connect a specific construct score to the values of its constituents in individuals. Any non-extreme level of a construct with even just a handful of facets or nuances can correspond to hundreds of unique facet/nuance configurations, with even the most common of them remaining rare. Intuitively, we may expect that if a person has a medium score on a construct they must also have a medium level on most of its constituents; in fact, this is generally not the case.11 This is a mathematical and empirical fact that may be greatly underappreciated among researchers.

relative way to serve as causes per our definition? Some already do, and many more may think that they should in order to make progress. For example, we made the case for a greater use of personality nuances in other sections of this article; these are at least somewhat more concrete than broad trait domains. Likewise, we echo those arguing for the importance of moving beyond subjective trait-ratings to objectively measured behaviour (see also Back, 2020). But, again, for many researchers the core of personality science is just something else by definition – broader and decontextualized patterns of individual differences (e.g., McCrae & Sutin, 2018) – so asking them to study only highly specific and contextualized variables instead amounts to asking them to redefine their field of study. The decontextualized nature of personality traits, for example, is often seen as their particular strength (Funder, 1991; McAdams, 1994) and something that makes personality science unique among other fields such as social, developmental, cognitive, or clinical psychology. This is hard to argue with.

But equally importantly, the specificity required of variables that could have causal impacts on personality phenomena such as patterns in naturally occurring individual differences may often mean that they are too numerous to be individually useful as explanations (Yarkoni, 2020). Besides, many causes can have multiple effects, which further complicates disentangling them. For an extreme example, even if individual DNA base pair variations directly cause individual differences in personality constructs, it will take many thousands of them to account for even a small fraction of the variance, because their individual effects are minuscule (e.g., Lo et al., 2017; Nagel et al., 2018). Most of the individual effects are not even statistically significant in any given sample. This is now so well established that it is called the Fourth Law of Behaviour Genetics (Chabris et al., 2015). Likewise, the very same genetic variants pervasively matter for variations in a whole range of behavioural, social and somatic traits,
known as pleiotropy (e.g., Turkheimer, Pettersson, Horn, Given this, why do personality scientists not work with 2014; Mõttus, Realo et al., 2017; Nagel et al., 2018). variables (e.g., individual genes, brain variables, life In many cases, the number of potentially relevant causes experiences, personality nuances, behaviours, or feelings) may be smaller than the very high number of somehow- that are sufficiently concrete and measurable in a non- personality-related genetic variants. But the typical effect sizes in psychology and the pervasive tendency for all things 11 For illustration, we simulated an unrealistically simple construct (N = to correlate (a manifestation of the psychological 10,000,000) that was defined by only five independent constituents, each “pleiotropy” that is sometimes called the crud factor; Orben having only three levels (-1, 0, 1 with 25%, 50%, and 25% probabilities), and a small amount of uniformly distributed “error” (ranging from -1 to 1 & Lakens, 2020) make it unlikely for many personality and accounting for about 12% of variance in construct scores). We then phenomena to have distinct causes that are sufficiently extracted about 20,000 scores of this construct with nearly identical values strong to explain both behaviour and psychological (0 +/- .005) and found that these corresponded to hundreds of processes in particular individuals and a non-trivial amount configurations of their five constituents. By far the most obvious configuration of the five constituents (all 0s) corresponded to only 7% of of normal variability between people. Among other things, the scores and each of the second most prevalent combinations (three 0s, this is consistent with the lack of robust evidence for the one -1, and one 1) corresponded to less than 2% each. 
In the real world, of effects of specific life experiences on personality constructs, course, few personality-related constructs are almost completely defined by only a handful of well-defined constituents, so our ability to deduce even in the most powerful studies to date (e.g., Asselmann from a construct score to what this may represent in particular individuals & Specht, 202012; Chopik et al., 2020; Denissen et al., is much smaller still. If the constituents are not completely independent (e.g., as semantically non-redundant items of a scale), some configurations 12 One may want to adjust the associations reported for personality change in become relatively more likely, but this does not change the conclusion. See this study for multiple testing. Depending on method of adjustment, this also Østergaard, Jensen, and Bech (2011). may result in only one significant association between life events and trait Description, prediction and explanation 18 2019). Bleidorn and colleagues (2020) recently called for far better estimate the extent to which this applies and whether more detailed examinations of the effects of life experiences this generalizes across types of associations (e.g., links on changes in personality constructs than are available to between psychological or behavioural phenomena as well as date (“Longitudinal Experience-Wide Association Studies” their links with physiological, anatomical, and genetic or LEWAS, p. 285). If Genome-Wide Association Studies variables) or levels of the trait hierarchy (Wright & are anything to go by, then linking numerous life experience Zimmermann, 2019). 
“variants” with changes in personality constructs in large Given all this, it may seem sensible to keep explanations samples will indeed account for a fraction of variance in that could apply to what particular individuals do in their them, although the findings should always be cross-validated particular circumstances and that could potentially be in independent samples to avoid overfitting. This would be manipulated separate from explanations of population-level an impressive and important empirical feat, but whether this variability in situation-general patterns such as traits in the could help us towards potentially controllable and personality hierarchy. These may end up being very theoretically meaningful causes of why particular individuals different kinds of explanations. do what they do, or why they differ in this, is another question. Explanations short of specific and potentially modifiable An equally fundamental reason that identifying specific causes cause-effect associations is often impractical is that it requires unrealistic assumptions, in particular that causality Where identifying specific causes is not feasible or runs in only one direction (Pearl, 2018). Naturally occurring reasonable, internally coherent and consistent-with- personality variability represents how free-ranging available-observations narratives of how normal variation in individuals spontaneously differ when left to their own clearly defined phenomena comes about may serve as the devices in largely self-created environments. In fact, the very most useful explanations. A useful explanation may state its essence of personality is the means by which people choose, scope (what kinds of variance patterns are being explained) adapt to, and modify their real-world situations and and premises (what is assumed and not further explained), experiences to suit them (Buss, 1987). 
As a result, what may and specify its observed and unobserved components and be considered causes of personality characteristics often do general principles of relations among them (how they are not happen to people randomly, but are influenced by organized or tend to inter-relate over time, and in which something coming from within them – their personalities, circumstances they are likely to occur or not occur). 13 For potentially including the variables to be explained and other example, abstract narratives about developmental principles variables linked with these. For example, people’s of individual differences (e.g., Caspi & Moffitt, 1993; experiences, not just observable traits, are correlated with Roberts & Nickel, 2017) may be good candidates to become genetic variance among them (Scarr, 1983). Where this useful explanations, despite – and maybe exactly because of applies, there are no clear cause and effect associations and – not attempting to outline the specific causes of the patterns formal models of causality (e.g., DAGs) and counterfactuals that they try to explain. Articulating only a few causes fail: flipping an explanans to its counterfactual state would explain just about nothing, whereas attempting to list automatically means flipping its explanandum as one of its a sufficient number of them, even if feasible at some point, causes, suggesting that we cannot eliminate “back-doors” to could make explanations unintelligible. explanandum (Pearl, 2018). This is also a reason that It is particularly useful if such explanations can be experimental manipulations and other interventions, even if formalized as computational models (Quirin et al., 2020). feasible practically and ethically, could sometimes Although these cannot provide empirical proof and are misrepresent causality in personality science and beyond. 
In unlikely to reveal causes in the strict sense of the term, they real life, people often choose the “manipulations” and allow playing through complex hypotheses that involve “interventions” that suit them and do all they can to avoid large numbers of hypothetical variables with potentially others, in part based on their personalities. many-to-many and bidirectional relationships that can Finally, in some and maybe even many cases, links between unfold over many iterations. Setting up a computational phenomena and their plausible causes exist in such narrow model that runs and produces results that are even broadly circumstances as to be unique to individuals or only small consistent with observations of relevant real-world subsets of them (e.g., Beck & Jackson, 2020), which further phenomena often takes a lot of rigorous thinking and is all complicates connecting them with population-level variance too likely to identify gaps in verbal-only explanations in individual differences constructs (cf. Beltz et al., 2016; (Mõttus, Allerhand, & Johnson, 2020). Examples of the use Dotterer et al., in press; Lazarus et al., 2020; Woods et al., of computational models in personality science include 2020; Wright et al., 2019). The more idiosyncratic the Revelle and Condon's (2015) dynamics of action model, associations are, the less practical and even plausible it is to identify the specific causes of individual differences, at least 13 Besides the ‘how’ part, there may also be a ‘why’ part of an explanation,referring to the function (outcome) of the phenomenon in relation to a as long as these are defined as dimensions along which broader phenomenon (e.g., the function of anger may be to restore equity individuals vary. 
At present, far more research is needed to in social transactions; Lukazsewski et al., 2020); assuming that every explanation involves a function may be problematic, however (e.g., some phenomena are no longer functional or may even appear dysfunctional, change (β = .08 for decrease in emotional stability after divorce). but still require an explanation). Description, prediction and explanation 19 Read and colleagues' (2010) neural network model, tractable unidirectional causes (Yarkoni, 2020). The best Smaldino and colleagues’ model of niche diversity explanations for these phenomena may often hinge on the (Smaldino et al., 2019), or Mõttus and colleagues' (2020) most coherent available narratives that combine many pieces model of person-environment transactions and the of, and patterns in, descriptive findings rather than rely on corresponsive principle. specific and definitive experiments or statistical models. For But are explanations, defined this way, really more than example, whether a particular regression coefficient does or descriptions? We argue that they are if they help to interpret, does not represent a causal effect in a strict sense may often organize, and integrate descriptive observations. That is, if be a moot question and (suppressing) arguments over this they fill in knowledge gaps, help researchers to envisage yet- may simply reflect naivety. Regardless of this, regression to-be made observations, and suggest possible directions for coefficients alongside other findings of descriptive research more detailed explanations. However, we realize that the line can be a useful basis for narrative explanations. between the explanations, defined this way, and descriptive findings is probably far less clear than many would prefer. 
Alternative view: Identifying tractable causes may be a Indeed, what may seem as identifying causal explanations tractable problem after all may often, at a closer look, amount to more detailed and On the other hand, many researchers – including several better organized descriptions (Yarkoni, 2020). If so, well- authors of this article – disagree with the view that attempts documented and detailed basic descriptive findings are and to explain personality may often be best off not targeting its likely will be central parts of many personality scientific specific causes. Instead, they believe that researchers will explanations. Descriptive findings are then not just eventually identify the specific and potentially even uninspired examples of personality research to be replaced controllable causes of key personality phenomena, including with “proper” causal explanations; they are the ingredients naturally-occuring individual differences in them and that useful explanations organize into coherent narratives. broader patterns in these. This will require better methods, For example, theories that seek to explain personality measures, and models. But even more importantly, this variations through social interactions may benefit from a likely entails (a) defocusing from the broad and situation- large-scale project (say N = 10,000) that documents, in both general patterns of variation as the starting points of lab and naturalistic settings, hundreds of objectively explanations in favour of specific and contextualized within- measured behaviours, social interaction processes and their individual processes and (b) tolerating the complex and subjective perceptions, besides including detailed trait ratings potentially phenomenon- and person-specific (idiographic) of the participants (see also Back, in press). Using such data, explanations that result from this shift. 
In what follows, we researchers could look for patterns in behaviour, perception discuss what may be particularly important to facilitate and relationship dynamics and link these to measurements of moving towards causes-based explanations in personality individual differences, possibly being able to account for a science. non-trivial fraction of variation in personality nuances, facets and domains. Almost certainly, however, a large number of Some recommendations for explanatory research that seeks such patterns would uniquely contribute to accounting for to identify causes trait variance. These findings, such as those from LEWAS (Bleidorn et al., 2020), would be descriptive and unlikely to Identifying the right level of analysis for explanation. reveal causes of naturally occurring individual differences in Units at certain levels of analysis may be too far apart to the strict sense of the term. But they could help to identify construct meaningful causal accounts, at least without recurring regularities in behaviour and psychological intermediate steps. For instance, reductionists may argue processes and thereby develop and refine useful explanatory that all psychology can be understood by biology, all models of personality variation. biology by chemistry, and all chemistry by physics. But it is unlikely that we will ever identify a tractable explanation of Grosz, Rohrer, and Thoemmes (2020) have recently argued how a leader’s personality affects her organization’s that there is a widely-spread taboo against causal inference in longevity through particle physics. Instead, explanations non-experimental personality science in that what using units at more proximal levels to the phenomena we researchers are allowed to explicitly claim to have achieved wish to explain may be more useful and appropriate is often not what their findings and interpretations actually (Borsboom, Cramer, & Kalis, 2019; Dennett, 2013; imply – between the lines. 
We suspect that this is in part Hofstadter, 2007; Sperry, 1966). Social cognitive, learning, because of a failure to distinguish useful explanations from or functionalist accounts which explain personality trait causes in the sense that we defined them above and many levels as arising through the interactions of units such as other researchers do as well, at least implicitly. In many goals, expectancies, affordances, and perceptual processes, cases, researchers can hope to achieve explanations, but not may be more appropriate and necessary components of necessarily identify specific causes, because these are either causal accounts of the phenomena than explanations through intractable, unintelligible, or both. specific genes or even specific neurological structures It may help to tackle this taboo to realize and accept that (Back, in press; Baumert et al., 2017).14 many phenomena that personality scientists are focused on may, by their very nature, be distinctly unique in not having 14 Another level of analyses is personal narratives, discussed by Pasupathi and colleagues (2020). Description, prediction and explanation 20 Once armed with proximal causal explanations, however, protein shakes and spending hours lifting weights at the researchers can move on to identify the causes of these gym.15 causes, which ultimately can serve as a strategy for making It is important, however, not to confuse variation within sense of associations across different levels of analysis. For individuals with individual differences. The former may, instance, given the extremely distal relations between genes and in many cases likely does, contribute to the latter. 
But and psychological traits (e.g., Johnson & Edwards, 2002), the individual differences in within-individual processes that identifying genetic variants responsible for between- could contribute to other individual differences have to individual variation in dominance might be aided by first come from somewhere in the first place (Lunansky, identifying the major proximal causes of the variation, and Borkulo, & Borsboom, 2020; Quirin et al., 2020) and, as we then working backwards. Trait dominance tends to be know from well documented behaviour genetics findings elevated among individuals high in physical formidability, (e.g., Briley & Tucker-Drob, 2014), many sources of which in turn tends to be correlated with the individual’s individual differences are a) hardly random and b) often not physical height (Lukaszewski, Simmons, Anderson, & something in which individuals even vary greatly over time Roney, 2016). If so, understanding the genes affecting height (e.g., DNA structure). It may thus be that to a large extent can help to understand the genetic variants affecting the processes reflected in within-individual variance either formidability, which can help to understand some part of the amplify (e.g., corresponsive processes between traits and genes affecting dominance. The large number of specific experiences; Nickel & Roberts, 2007) or dampen/reverse genes affecting height in turn can be organized into smaller (e.g., somebody with maladaptive characteristics seeking to sets of specific genes affecting narrower biological processes change these) pre-existing individual differences, or such as those affecting bone lengths, cartilage production, translate some other traits (e.g., non-psychological hormone production, skeleton morphology, and other characteristics such as height, metabolic, endocrine or other processes (e.g., A. Wood et al., 2014). 
Thus, as we improve traits) into psychological traits, rather than create individual our accounts of the important proximal causes of a differences from scratch. phenomenon of interest, we can in turn identify the most important proximal causes of these variables, at each step Working with cleaner units. As noted above, there is a identifying more specific targets we can place as tendency for personality psychologists to combine diverse, intermediators to bridge the gulf across more distal levels of causally efficacious sets of variables into single aggregates. analysis. Those versed in the structural modeling literature However, excessive emphasis on broad all-purpose domains may think of this strategy as building a series of multiple such as the Big Few impedes representing the personality indicators, multiple causes (MIMIC) models. processes or dynamics underlying the phenomena (e.g.,Block, 1995; Mischel and Shoda, 1995; Cramer et al., 2012; Even if we can eventually identify a tractable number of Wood, Gardner, & Harms, 2015; van der Mass et al., 2006). major proximal causes of our phenomenon of interest, this This is a point that we consistently make throughout this strategy of iteratively identifying the proximal causes of each paper: we should be flexible about how, and whether at all, proximal cause as outlined in this example will likely result we aggregate variables. For instance, we might imagine that in hundreds or thousands of distal causes with miniscule tendencies toward [1] liking and caring about people effects. However, at each end of the long and complex causal increases a person’s likelihood of [2] doing favours for other chains linking one of these distal causes to the outcome of people, which in turn can increase a person’s likelihood of interest, we could be able to identify stronger causal [3] being liked by other people. Averaging such tendencies associations. 
For instance, on one end of the chain linking into a single scale score complicates understanding the specific genes to height or dominance, the NOX4 gene’s nature of the causal relationships that the conceptually association with height is likely mediated through stronger distinguishable attributes have with one another (van der effects on the number of osteoclasts cells produced, which Maas et al., 2006; Wood, Gardner, & Harms, 2015; aid in bone repair and maintenance (Marouli et al., 2017). On Epskamp, Waldorp, Mõttus, & Borsboom, 2018).16 This can the other, physical formidability and other proximal causes also contribute to the view that even moderate (possibly) may each have moderate to large main effects (e.g., r > .30) causal relations among personality variables are hard to find on dominant behaviour (Lukaszewski et al., 2016). This when in fact they are often hiding in plain sight – within our strategy may thus help to organize the legions of variables scales (Afzali et al., 2020). A key recommendation, then, is showing small distal effects by showing how they contribute that researchers a) aim for constructs and their measures that to more proximally related variables and processes, such as prioritize conceptual distinctions between variables (e.g., the psychological mechanisms or systems that calibrate dominant and aggressive behavior (e.g., Balliet, Tybur, Van 15 If the likelihood of increasing formidability turns out to be systematically Lange, 2017; Lukaszewski et al., 2020). The successful linked to its plausible downstream causes such as dominance (lessdominant people may bother less with having physical means of identification of the most proximally related processes in appearing threatening), the situation becomes more complicated, though, turn offers the greatest potential for intentionally affecting because the cause and effect become entangled, as we discussed above. outcomes of interest. 
For instance, a man might try to Scenarios such as this may in fact be uniquely prevalent for personality-related phenomena. facilitate his displays of dominant behaviour by increasing 16 It will also often result in putting indicator items of the outcomes we want his formidability, perhaps by ‘bulking up’ by downing to predict with personality scales directly into the personality scale, making it difficult to rule out that the correlations may reflect uninteresting tautologies (Mõttus, 2016; Nicholls, Licht, & Pearl, 1982). Description, prediction and explanation 21 items that concern self-perceptions of behaviour vs affect or methodological and practical challenges. Descriptive motivation; Wilt & Revelle, 2015; Wood, Gardner, & research aims to delineate associations among personality- Harms, 2015) over purely empirical ones (e.g., average all relevant phenomena and their link with other variables as items with factor loadings over .40) or b) deliberately create comprehensively as possible, while also doing this in ways measures for distinct classes of personality-relevant that allow flexibly summarizing and organizing this phenomena (e.g., Jackson et al., 2010; Costantini, Saraulli, & information; predictive research aims to maximize Perugini, 2020). generalizable out-of-sample predictive power without much Extending our range of methods and models. Establishing regard to the descriptive or explanatory elegance of the causal relations between variables often requires stronger statistical models; and approaches aiming to explain evidence than cross-sectional correlations. It is ultimately personality phenomena need to be clear about their levels of important to provide evidence that manipulating X within a analysis (patterns in naturally occurring individual potential X→Y relationship would alter the level of Y. 
But differences vs psychological processes and behaviour of this is often difficult as many of the X’s that we examine as particular people) and set targets that are appropriate and potential causes of personality phenomena, such as specific realistic for the type of variability or processes that are being genes, or the size or connectivity of neurological areas, do explained. not lend themselves to manipulation and many Y’s also It does not seem to us that these research kinds should strive influence their X’s, entangling the causes with effects. towards homogenization between and even within them, at Meanwhile, what is almost certain to help is greater use of least not any time soon. An approach that aims to achieve all repeated measures designs, over both long (e.g., multi-wave goals may eventually not achieve any of them particularly longitudinal studies such as Denissen et al., 2019) and short well. Descriptively most useful models may not be most measurement windows (e.g., experience sampling studies predictive or provide satisfactory explanations; most such as Sosnowska et al., 2020; Danvers et al., 2020). predictive models may be too complicated to be useful for Within such studies, finding that the levels of X at one time description or explanation; and limiting descriptive research point t are associated with the levels of Y concurrently (at or predictive modeling to variables and associations that time t) or even prospectively (e.g., can predict how Y will make conceptual sense may be counterproductive. change from t to t+1; e.g., Epskamp et al., 2018) is useful That said, it would be equally wrong to suggest that they are for bolstering evidence of causal associations, also allowing in isolation from one another. For example, descriptive to separate within- and between-individual variances. 
Time-series data may be combined with experimental designs, for example by experimentally manipulating the X state (e.g., instructing people to pursue certain goals or to act extraverted) and testing whether the Y state tends to increase in response (e.g., Margolis & Lyubomirsky, 2019; Steiger et al., 2020). However, important questions remain about the extent to which experimentally manipulating psychological states is an ecologically valid means of understanding how these states covary naturally, owing to issues such as self-selection effects (i.e., reverse causality) and the difficulty of finding the ideal time intervals for identifying causal effects (e.g., Jacques-Hamilton et al., 2019). We also encourage designs targeting within-individual variance that estimate idiographic association patterns in addition to nomothetic ones (Beck & Jackson, 2020; Lazarus et al., 2020; Wright, Gates et al., 2019). It is crucial that we understand how far our typical nomothetic models of variance can go in principle, that is, how broad the boundary conditions of possible causal effects are. The broader these boundary conditions and the less idiosyncratic personality processes are, the more useful nomothetic models can be in identifying the causes of personality phenomena, however numerous and multi-leveled these causes turn out to be. Conversely, the narrower the boundary conditions and the more idiosyncratic the processes, the less useful nomothetic models will be.

Concluding remarks

In this article, we discussed three main kinds of personality research – descriptive, predictive, and explanatory – and argued that they involve different priorities and face different […] findings can be the basis for building predictive and explanatory models. Predictive models can help to expand the range of descriptive research and hint at the limits of explanatory models (e.g., how much of the variability among people in a phenomenon models can hope to account for). Explanations can suggest which further descriptive research is needed or what could be included in prediction models. For these reasons, it is important that descriptive, predictive and explanatory approaches rely on at least partly overlapping sets of constructs wherever possible. However, we argue that the commonly used Big Few alone is suboptimal for this purpose. Instead, we need to develop flexible models of personality variance that fully embrace its hierarchical organization and do not confuse patterns of individual differences with variance and processes within individuals. We also need tools for assessing this variance and these processes that draw on different sources and types of information, not just self-reports.

References

Achaa-Amankwaa, P., Olaru, G., & Schroeders, U. (2020). Coffee or Tea? Examining Cross-Cultural Differences in Personality Nuances Across Former Colonies of the British Empire. https://doi.org/10.31234/osf.io/dpqrx
Afzali, M. H., Stewart, S. H., Séguin, J. R., & Conrod, P. (2020). The Network Constellation of Personality and Substance Use: Evolution from Early to Late Adolescence. European Journal of Personality. https://doi.org/10.1002/per.2245
Allik, J., Church, A. T., Ortiz, F. A., Rossier, J., Hřebíčková, M., de Fruyt, F., Realo, A., & McCrae, R. R. (2017). Mean Profiles of the NEO Personality Inventory. Journal of Cross-Cultural Psychology, 48, 402–420. https://doi.org/10.1177/0022022117692100
Arslan, R. C. (2019). How to Automatically Document Data With the codebook Package to Facilitate Data Reuse. Advances in Methods and Practices in Psychological Science. https://doi.org/10.1177/2515245919838783
Arslan, R. C., Walther, M. P., & Tata, C. S. (2020). formr: A study framework allowing for automated feedback generation and complex longitudinal experience-sampling studies using R. Behavior Research Methods, 52, 376–387. https://doi.org/10.3758/s13428-019-01236-y
Ashton, M. C., & Lee, K. (2020). Objections to the HEXACO Model of Personality Structure—And Why Those Objections Fail. European Journal of Personality, 34, 492–510. https://doi.org/10.1002/per.2242
Asselmann, E., & Specht, J. (2020). Taking the ups and downs at the rollercoaster of love: Associations between major life events in the domain of romantic relationships and the Big Five personality traits. Developmental Psychology. https://doi.org/10.1037/dev0001047
Back, M. D. (2020). Editorial: A Brief Wish List for Personality Research. European Journal of Personality, 34, 3–7. https://doi.org/10.1002/per.2236
Back, M. D. (in press). Social interaction processes and personality. In J. Rauthmann (Ed.), The handbook of personality dynamics and processes. Elsevier.
Bäckström, M., Björklund, F., & Larsson, M. R. (2009). Five-factor inventories have a major general factor related to social desirability which can be reduced by framing items neutrally. Journal of Research in Personality, 43, 335–344. https://doi.org/10.1016/j.jrp.2008.12.013
Balliet, D., Tybur, J. M., & Van Lange, P. A. (2017). Functional interdependence theory: An evolutionary account of social situations. Personality and Social Psychology Review, 21, 361–388.
Baumeister, R. F., Vohs, K. D., & Funder, D. C. (2007). Psychology as the Science of Self-Reports and Finger Movements: Whatever Happened to Actual Behavior? Perspectives on Psychological Science, 2, 396–403. https://doi.org/10.1111/j.1745-6916.2007.00051.x
Baumert, A., Schmitt, M., Perugini, M., Johnson, W., Blum, G., Borkenau, P., Costantini, G., Denissen, J. J. A., Fleeson, W., Grafton, B., Jayawickreme, E., Kurzius, E., MacLeod, C., Miller, L. C., Read, S. J., Roberts, B., Robinson, M. D., Wood, D., & Wrzus, C. (2017). Integrating Personality Structure, Personality Process, and Personality Development. European Journal of Personality, 31, 503–528. https://doi.org/10.1002/per.2115
Beck, E. D., & Jackson, J. J. (2020). Idiographic Traits: A Return to Allportian Approaches to Personality. Current Directions in Psychological Science, 29, 301–308. https://doi.org/10.1177/0963721420915860
Beltz, A. M., Wright, A. G. C., Sprague, B., & Molenaar, P. C. M. (2016). Bridging the nomothetic and idiographic approaches to the analysis of clinical data. Assessment, 23, 447–458.
Bem, D. J., & Funder, D. C. (1978). Predicting more of the people more of the time: Assessing the personality of situations. Psychological Review, 85, 485–501. https://doi.org/10.1037/0033-295X.85.6.485
Biesanz, J. C., & West, S. G. (2004). Towards understanding assessments of the big five: Multitrait-multimethod analyses of convergent and discriminant validity across measurement occasion and type of observer. Journal of Personality, 72, 845–876.
Bleidorn, W., Hopwood, C. J., Ackerman, R. A., Witt, E. A., Kandler, C., Riemann, R., Samuel, D. B., & Donnellan, M. B. (2020). The healthy personality from a basic trait perspective. Journal of Personality and Social Psychology, 118, 1207. https://doi.org/10.1037/pspp0000231
Bleidorn, W., Hopwood, C. J., Back, M. D., Denissen, J. J. A., Hennecke, M., Jokela, M., Kandler, C., Lucas, R. E., Luhmann, M., Orth, U., Roberts, B. W., Wagner, J., Wrzus, C., & Zimmermann, J. (2020). Longitudinal Experience-Wide Association Studies—A Framework for Studying Personality Change. European Journal of Personality, 34, 285–300. https://doi.org/10.1002/per.2247
Bleidorn, W., Klimstra, T. A., Denissen, J. J. A., Rentfrow, P. J., Potter, J., & Gosling, S. D. (2013). Personality Maturation Around the World: A Cross-Cultural Examination of Social-Investment Theory. Psychological Science, 24, 2530–2540. https://doi.org/10.1177/0956797613498396
Block, J. (1995). A contrarian view of the Five-Factor Approach to personality description. Psychological Bulletin, 117, 187–215.
Block, J. H., Block, J., & Gjerde, P. F. (1986). The Personality of Children Prior to Divorce: A Prospective Study. Child Development, 57, 827–840. https://doi.org/10.2307/1130360
Block, J. H., Gjerde, P. F., & Block, J. H. (1991). Personality antecedents of depressive tendencies in 18-year-olds: A prospective study. Journal of Personality and Social Psychology, 60, 726–738. https://doi.org/10.1037/0022-3514.60.5.726
Borsboom, D., Cramer, A., & Kalis, A. (2019). Brain disorders? Not really...: Why network structures block reductionism in psychopathology research. Behavioral and Brain Sciences, 42, 1–11.
Bouchard, T. J. (2016). Experience producing drive theory: Personality “writ large.” Personality and Individual Differences, 90, 302–314. https://doi.org/10.1016/j.paid.2015.11.007
Breil, S. M., Geukes, K., Wilson, R. E., Nestler, S., Vazire, S., & Back, M. D. (2019). Zooming into Real-Life Extraversion – how Personality and Situation Shape Sociability in Social Interactions. Collabra: Psychology, 5, 7.
Briley, D. A., & Tucker-Drob, E. M. (2014). Genetic and environmental continuity in personality development: A meta-analysis. Psychological Bulletin, 140, 1303–1331. https://doi.org/10.1037/a0037091
Briley, D. A., Livengood, J., & Derringer, J. (2018). Behaviour Genetic Frameworks of Causal Reasoning for Personality Psychology. European Journal of Personality, 32, 202–220. https://doi.org/10.1002/per.2153
Bulik-Sullivan, B., Finucane, H. K., Anttila, V., Gusev, A., Day, F. R., Loh, P.-R., ReproGen Consortium, Psychiatric Genomics Consortium, Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3, Duncan, L., Perry, J. R. B., Patterson, N., Robinson, E. B., Daly, M. J., Price, A. L., & Neale, B. M. (2015). An atlas of genetic correlations across human diseases and traits. Nature Genetics, 47, 1236–1241. https://doi.org/10.1038/ng.3406
Buss, D. M. (1987). Selection, evocation, and manipulation. Journal of Personality and Social Psychology, 53, 1214–1221.
Buss, D. M., & Craik, K. H. (1983). The act frequency approach to personality. Psychological Review, 90, 105–126. https://doi.org/10.1037/0033-295X.90.2.105
Caspi, A., & Moffitt, T. E. (1993). When Do Individual Differences Matter? A Paradoxical Theory of Personality Coherence. Psychological Inquiry, 4, 247–271. https://doi.org/10.1207/s15327965pli0404_1
Caspi, A., & Roberts, B. W. (2001). Personality Development across the Life Course: The Argument for Change and Continuity. Psychological Inquiry, 12, 49–66. https://doi.org/10.2307/1449487
Chabris, C. F., Lee, J. J., Cesarini, D., Benjamin, D. J., & Laibson, D. I. (2015). The Fourth Law of Behavior Genetics. Current Directions in Psychological Science, 24, 304–312. https://doi.org/10.1177/0963721415580430
Chopik, W. J., Oh, J., Kim, E. S., Schwaba, T., Krämer, M. D., Richter, D., & Smith, J. (2020). Changes in optimism and pessimism in response to life events: Evidence from three large panel studies. Journal of Research in Personality, 88, 103985. https://doi.org/10.1016/j.jrp.2020.103985
Christensen, A. P., Golino, H., & Silvia, P. J. (2020). A Psychometric Network Perspective on the Validity and Validation of Personality Trait Questionnaires. European Journal of Personality. https://doi.org/10.1002/per.2265
Condon, D. M. (2018). The SAPA Personality Inventory: An empirically-derived, hierarchically-organized self-report personality assessment model. https://doi.org/10.31234/osf.io/sc4p9
Condon, D. M., Roney, E., & Revelle, W. (2017). A SAPA Project Update: On the Structure of phrased Self-Report Personality Items. Journal of Open Psychology Data, 5, 3. https://doi.org/10.5334/jopd.32
Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality: Meta-analytic integration of observers’ accuracy and predictive validity. Psychological Bulletin, 136, 1092–1122. https://doi.org/10.1037/a0021212
Cooper, A. B., Blake, A. B., Pauletti, R. E., Cooper, P. J., Sherman, R. A., & Lee, D. I. (2020). Personality Assessment Through the Situational and Behavioral Features of Instagram Photos. European Journal of Psychological Assessment. https://doi.org/10.1027/1015-5759/a000596
Costa, P. T., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Psychological Assessment Resources.
Costa, P. T., McCrae, R. R., & Löckenhoff, C. E. (2019). Personality Across the Life Span. Annual Review of Psychology, 70, 423–448. https://doi.org/10.1146/annurev-psych-010418-103244
Costantini, G., Epskamp, S., Borsboom, D., Perugini, M., Mõttus, R., Waldorp, L. J., & Cramer, A. O. (2015). State of the aRt personality research: A tutorial on network analysis of personality data in R. Journal of Research in Personality, 54, 13–29. https://doi.org/10.1016/j.jrp.2014.07.003
Costantini, G., Saraulli, D., & Perugini, M. (2020). Uncovering the Motivational Core of Traits: The Case of Conscientiousness. European Journal of Personality. https://doi.org/10.1002/per.2237
Cramer, A. O. J., van der Sluis, S., Noordhof, A., Wichers, M., Geschwind, N., Aggen, S. H., Kendler, K. S., & Borsboom, D. (2012). Dimensions of Normal Personality as Networks in Search of Equilibrium: You Can’t Like Parties if You Don’t Like People. European Journal of Personality, 26, 414–431. https://doi.org/10.1002/per.1866
Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64, 391–418. https://doi.org/10.1177/0013164404266386
Danvers, A. F., Wundrack, R., & Mehl, M. (2020). Equilibria in Personality States: A Conceptual Primer for Dynamics in Personality States. European Journal of Personality. https://doi.org/10.1002/per.2239
Denissen, J. J. A., Luhmann, M., Chung, J. M., & Bleidorn, W. (2019). Transactions between life events and personality traits across the adult lifespan. Journal of Personality and Social Psychology, 116, 612–633. https://doi.org/10.1037/pspp0000196
Dennett, D. C. (2013). Intuition pumps and other tools for thinking. Oxford, England: Norton.
DeYoung, C. G. (2006). Higher-order factors of the Big Five in a multi-informant sample. Journal of Personality and Social Psychology, 91, 1138–1151. https://doi.org/10.1037/0022-3514.91.6.1138
DeYoung, C. G. (2015). Cybernetic Big Five Theory. Journal of Research in Personality, 56, 33–58. https://doi.org/10.1016/j.jrp.2014.07.004
DeYoung, C. G., Quilty, L. C., & Peterson, J. B. (2007). Between facets and domains: 10 aspects of the Big Five. Journal of Personality and Social Psychology, 93, 880–896. https://doi.org/10.1037/0022-3514.93.5.880
Dotterer, H. L., Beltz, A. M., Foster, K. T., Simms, L. J., & Wright, A. G. C. (in press). Personalized models of personality disorders: Using a temporal network method to understand symptomatology and daily functioning in a clinical sample. Psychological Medicine. https://psyarxiv.com/bnxkq/
Dreves, P. A., Blackhart, G. C., & McBee, M. T. (2020). Do behavioral measures of self-control assess construct-level variance? Journal of Research in Personality, 88, 104000. https://doi.org/10.1016/j.jrp.2020.104000
Eid, M., Nussbeck, F. W., Geiser, C., Cole, D. A., Gollwitzer, M., & Lischetzke, T. (2008). Structural equation modeling of multitrait-multimethod data: Different models for different types of methods. Psychological Methods, 13, 230–253.
Egloff, B., Schwerdtfeger, A., & Schmukle, S. C. (2005). Temporal Stability of the Implicit Association Test-Anxiety. Journal of Personality Assessment, 84, 82–88.
Elleman, L. G., McDougald, S. K., Condon, D. M., & Revelle, W. (2020). That takes the BISCUIT: A comparative study of predictive accuracy and parsimony of four statistical learning techniques in personality data, with data missingness conditions. European Journal of Psychological Assessment.
Epskamp, S., Waldorp, L. J., Mõttus, R., & Borsboom, D. (2018). The Gaussian Graphical Model in Cross-Sectional and Time-Series Data. Multivariate Behavioral Research, 53, 453–480. https://doi.org/10.1080/00273171.2018.1454823
Fisher, A. J., Medaglia, J. D., & Jeronimus, B. F. (2018). Lack of group-to-individual generalizability is a threat to human subjects research. Proceedings of the National Academy of Sciences, 115, E6106. https://doi.org/10.1073/pnas.1711978115
Funder, D. C. (1991). Global Traits: A Neo-Allportian Approach to Personality. Psychological Science, 2, 31–39. https://doi.org/10.1111/j.1467-9280.1991.tb00093.x
Funder, D. C., & Dobroth, K. M. (1987). Differences between traits: Properties associated with interjudge agreement. Journal of Personality and Social Psychology, 52, 409–418. https://doi.org/10.1037/0022-3514.52.2.409
Funder, D. C., & Sneed, C. D. (1993). Behavioral manifestations of personality: An ecological approach to judgmental accuracy. Journal of Personality and Social Psychology, 64, 479–490. https://doi.org/10.1037/0022-3514.64.3.479
Furr, R. M. (2009). Personality psychology as a truly behavioural science. European Journal of Personality, 23, 369–401. https://doi.org/10.1002/per.724
Geukes, K., Breil, S. M., Hutteman, R., Nestler, S., Küfner, A. C. P., & Back, M. D. (2019). Explaining the longitudinal interplay of personality and social relationships in the laboratory and in the field: The PILS and the CONNECT study. PLoS ONE, 14, e0210424.
Gniewosz, G., Ortner, T. M., & Scherndl, T. (2020). Personality in Action: Assessing Personality to Identify an ‘Ideal’ Conscientious Response Type with Two Different Behavioural Tasks. European Journal of Personality. https://doi.org/10.1002/per.2296
Goldberg, L. R. (1990). An alternative “description of personality”: The Big-Five factor structure. Journal of Personality and Social Psychology, 59, 1216–1229. https://doi.org/10.1037/0022-3514.59.6.1216
Goldberg, L. R. (1999). A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. In I. Mervielde, I. J. Deary, F. De Fruyt, & F. Ostendorf (Eds.), Personality Psychology in Europe (Vol. 7, pp. 7–28). Tilburg University Press.
Goldberg, L. R., & Saucier, G. (2016). ORI Technical Report (Vol. 56, No. 1). Eugene, OR.
Gonzalez, O., MacKinnon, D. P., & Muniz, F. B. (2020). Extrinsic Convergent Validity Evidence to Prevent Jingle and Jangle Fallacies. Multivariate Behavioral Research. https://doi.org/10.1080/00273171.2019.1707061
Gosling, S. D., & Mason, W. (2015). Internet Research in Psychology. Annual Review of Psychology, 66, 877–902. https://doi.org/10.1146/annurev-psych-010814-015321
Greenwald, A. G., & Farnham, S. D. (2000). Using the Implicit Association Test to measure self-esteem and self-concept. Journal of Personality and Social Psychology, 79. https://doi.org/10.1037/0022-3514.79.6.1022
Grosz, M. P., Rohrer, J. M., & Thoemmes, F. (2020). The Taboo Against Explicit Causal Inference in Nonexperimental Psychology. Perspectives on Psychological Science. https://doi.org/10.1177/1745691620921521
Hall, A. N., & Matz, S. C. (2020). Targeting Item-level Nuances Leads to Small but Robust Improvements in Personality Prediction from Digital Footprints. European Journal of Personality. https://doi.org/10.1002/per.2253
Hang, Soto, Lee, & Mõttus (under review). Social expectations and abilities to meet them as possible mechanisms of youth personality development.
Hasson, U., Nastase, S. A., & Goldstein, A. (2020). Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks. Neuron, 105, 416–434. https://doi.org/10.1016/j.neuron.2019.12.002
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61–83.
Henry, S., & Mõttus, R. (2020). Traits and Adaptations: A Theoretical Examination and New Empirical Evidence. European Journal of Personality, 34, 265–284. https://doi.org/10.1002/per.2248
Hilbig, B. E., Moshagen, M., & Zettler, I. (2016). Prediction consistency: A test of the equivalence assumption across different indicators of the same construct. European Journal of Personality, 30, 637–647. https://doi.org/10.1002/per.2085
Hofstadter, D. R. (2007). I am a strange loop. Basic Books.
Hopwood, C. J. (2018). Interpersonal Dynamics in Personality and Personality Disorders. European Journal of Personality, 32, 499–524. https://doi.org/10.1002/per.2155
Horstmann, K. T., & Ziegler, M. (2020). Assessing Personality States: What to Consider when Constructing Personality State Measures. European Journal of Personality. https://doi.org/10.1002/per.2266
Jacques-Hamilton, R., Sun, J., & Smillie, L. (2019). Costs and benefits of acting extraverted: A randomized controlled trial. Journal of Experimental Psychology: General, 148, 1538–1556.
Jackson, J. J., Walton, K. E., Harms, P. D., Bogg, T., Wood, D., Lodi-Smith, J., Edmonds, G. W., & Roberts, B. W. (2009). Not all Conscientiousness Scales Change Alike: A Multimethod, Multisample Study of Age Differences in the Facets of Conscientiousness. Journal of Personality and Social Psychology, 96, 446–459. https://doi.org/10.1037/a0014156
Jackson, J. J., Wood, D., Bogg, T., Walton, K. E., Harms, P. D., & Roberts, B. W. (2010). What do conscientious people do? Development and validation of the Behavioral Indicators of Conscientiousness (BIC). Journal of Research in Personality, 44, 501–511. https://doi.org/10.1016/j.jrp.2010.06.005
Jacobucci, R., & Grimm, K. J. (2020). Machine Learning and Psychological Research: The Unexplored Effect of Measurement. Perspectives on Psychological Science. https://doi.org/10.1177/1745691620902467
Jang, K. L., McCrae, R. R., Angleitner, A., Riemann, R., & Livesley, W. J. (1998). Heritability of facet-level traits in a cross-cultural twin sample: Support for a hierarchical model of personality. Journal of Personality and Social Psychology, 74, 1556–1565. https://doi.org/10.1037/0022-3514.74.6.1556
Johnston, T. D., & Edwards, L. (2002). Genes, interactions, and the development of behavior. Psychological Review, 109, 26–34.
Jonas, K. G., & Markon, K. E. (2016). A descriptivist approach to trait conceptualization and inference. Psychological Review, 123, 90–96.
Kandler, C., Zimmermann, J., & McAdams, D. P. (2014). Core and Surface Characteristics for the Description and Theory of Personality Differences and Development. European Journal of Personality, 28, 231–243. https://doi.org/10.1002/per.1952
Kirtley, O. J., Hiekkaranta, A. P., Kunkels, Y. K., Eisele, G., Verhoeven, D., Van Nierop, M., & Myin-Germeys, I. (2020). The Experience Sampling Method (ESM) Item Repository. https://doi.org/10.17605/OSF.IO/KG376
Koch, T., Schultze, M., Holtmann, J., Geiser, C., & Eid, M. (2017). A Multimethod Latent State-Trait Model for Structurally Different And Interchangeable Methods. Psychometrika, 82, 17–47.
Kööts-Ausmees, L., Kandler, K., McCrae, R. R., Realo, A., Allik, J., Borkenau, P., Hřebíčková, M., & Mõttus, R. (in preparation). Social Desirability and Age Differences in Personality Traits: A Multi-Rater, Multi-Sample Study.
Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110, 827–840. https://doi.org/10.1073/pnas.1218772110
Larsen, K. R., & Bong, C. H. (2016). A tool for addressing construct identity in literature reviews and meta-analyses. MIS Quarterly, 40, 123.
Lazarus, G., Sened, H., & Rafaeli, E. (2020). Subjectifying the Personality State: Theoretical Underpinnings and an Empirical Example. European Journal of Personality. https://doi.org/10.1002/per.2278
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444. https://doi.org/10.1038/nature14539
Lee, K., & Ashton, M. C. (2020). Sex differences in HEXACO personality characteristics across countries and ethnicities. Journal of Personality. https://doi.org/10.1111/jopy.12551
Lee, J. J., Wedow, R., Okbay, A., Kong, E., Maghzian, O., Zacher, M., Nguyen-Viet, T. A., Bowers, P., Sidorenko, J., Linnér, R. K., Fontana, M. A., Kundu, T., Lee, C., Li, H., Li, R., Royer, R., Timshel, P. N., Walters, R. K., Willoughby, E. A., … Cesarini, D. (2018). Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nature Genetics, 50, 1112–1121. https://doi.org/10.1038/s41588-018-0147-3
Leising, D., Vogel, D., Waller, V., & Zimmermann, J. (2020). Correlations between person-descriptive items are predictable from the product of their mid-point-centered social desirability values. European Journal of Personality.
Lievens, F. (2017). Assessing Personality–Situation Interplay in Personnel Selection: Toward More Integration into Personality Research. European Journal of Personality, 31, 424–440. https://doi.org/10.1002/per.2111
Lo, M.-T., Hinds, D. A., Tung, J. Y., Franz, C., Fan, C.-C., Wang, Y., Smeland, O. B., Schork, A., Holland, D., Kauppi, K., Sanyal, N., Escott-Price, V., Smith, D. J., O’Donovan, M., Stefansson, H., Bjornsdottir, G., Thorgeirsson, T. E., Stefansson, K., McEvoy, L. K., … Chen, C.-H. (2017). Genome-wide analyses for personality traits identify six genomic loci and show correlations with psychiatric disorders. Nature Genetics, 49, 152–156. https://doi.org/10.1038/ng.3736
Lowman, G. H., Wood, D., Armstrong, B. F., Harms, P. D., & Watson, D. (2018). Estimating the reliability of emotion measures over very short intervals: The utility of within-session retest correlations. Emotion, 18, 896–901. https://doi.org/10.1037/emo0000370
Lucas, R. E., & Donnellan, M. B. (2009). Age differences in personality: Evidence from a nationally representative Australian sample. Developmental Psychology, 45, 1353–1363. https://doi.org/10.1037/a0013914
Lukaszewski, A. W., Lewis, D. M. G., Durkee, P. K., Sell, A. N., Sznycer, D., & Buss, D. M. (2020). An Adaptationist Framework for Personality Science. European Journal of Personality. https://doi.org/10.1002/per.2292
Lukaszewski, A. W., Simmons, Z. L., Anderson, C., & Roney, J. R. (2016). The Role of Physical Formidability in Human Social Status Allocation. Journal of Personality and Social Psychology, 110, 385–406. https://doi.org/10.1037/pspi0000042
Lunansky, G., Borkulo, C. van, & Borsboom, D. (2020). Personality, Resilience, and Psychopathology: A Model for the Interaction between Slow and Fast Network Processes in the Context of Mental Health. European Journal of Personality. https://doi.org/10.1002/per.2263
Mac Giolla, E., & Kajonius, P. J. (2019). Sex differences in personality are larger in gender equal countries: Replicating and extending a surprising finding. International Journal of Psychology, 54, 705–711. https://doi.org/10.1002/ijop.12529
McAdams, D. P. (1994). A Psychology of the Stranger. Psychological Inquiry, 5, 145–148. https://doi.org/10.1207/s15327965pli0502_12
MacCann, C., Duckworth, A. L., & Roberts, R. D. (2009). Empirical identification of the major facets of Conscientiousness. Learning and Individual Differences, 19, 451–458. https://doi.org/10.1016/j.lindif.2009.03.007
Magidson, J. F., Roberts, B. W., Collado-Rodriguez, A., & Lejuez, C. (2014). Theory-driven intervention for changing personality: Expectancy value theory, behavioral activation, and conscientiousness. Developmental Psychology, 50, 1442–1450. https://doi.org/10.1037/a0030583
Margolis, S., & Lyubomirsky, S. (2019). Experimental manipulation of extraverted and introverted behavior and its effects on well-being. Journal of Experimental Psychology: General. Advance online publication. https://doi.org/10.1037/xge0000668
Marouli, E., Graff, M., Medina-Gomez, C., Lo, K. S., Wood, A. R., Kjaer, T. R., Fine, R. S., Lu, Y., Schurmann, C., Highland, H. M., Rüeger, S., Thorleifsson, G., Justice, A. E., Lamparter, D., Stirrups, K. E., Turcot, V., Young, K. L., Winkler, T. W., Esko, T., … Lettre, G. (2017). Rare and low-frequency coding variants alter human adult height. Nature, 542, 186–190. https://doi.org/10.1038/nature21039
Markon, K. E., Krueger, R. F., & Watson, D. (2005). Delineating the Structure of Normal and Abnormal Personality: An Integrative Hierarchical Approach. Journal of Personality and Social Psychology, 88, 139–157. https://doi.org/10.1037/0022-3514.88.1.139
Matz, S. C., Kosinski, M., Nave, G., & Stillwell, D. J. (2017). Psychological targeting as an effective approach to digital mass persuasion. Proceedings of the National Academy of Sciences, 114, 12714. https://doi.org/10.1073/pnas.1710966114
Mazza, G. L., Smyth, H. L., Bissett, P. G., Canning, J. R., Eisenberg, I. W., Enkavi, A. Z., Gonzalez, O., Kim, S. J., Metcalf, S. A., Muniz, F., III, W. E. P., Scherer, E. A., Valente, M. J., Xie, H., Poldrack, R. A., Marsch, L. A., & MacKinnon, D. P. (2020). Correlation Database of 60 Cross-Disciplinary Surveys and Cognitive Tasks Assessing Self-Regulation. Journal of Personality Assessment. https://doi.org/10.1080/00223891.2020.1732994
McAbee, S. T., & Connelly, B. S. (2016). A multi-rater framework for studying personality: The trait-reputation-identity model. Psychological Review, 123, 569–591.
McCrae, R. R. (2015). A More Nuanced View of Reliability: Specificity in the Trait Hierarchy. Personality and Social Psychology Review, 19, 97–112. https://doi.org/10.1177/1088868314541857
McCrae, R. R., De Bolle, M., Löckenhoff, C. E., & Terracciano, A. (in press). Lifespan trait development: Towards an adequate theory of personality. In J. F. Rauthmann (Ed.), Handbook of personality dynamics and processes. Amsterdam: Elsevier.
McCrae, R. R., & Costa Jr., P. T. (1996). Towards a new generation of personality theories: Theoretical contexts for the five-factor model. In J. S. Wiggins (Ed.), The five-factor model of personality: Theoretical perspectives (Vol. 51, pp. 51–87). Guilford Press.
McCrae, R. R., & John, O. P. (1992). An introduction to the Five-Factor Model and its applications. Journal of Personality, 60, 175–215. https://doi.org/10.1111/j.1467-6494.1992.tb00970.x
McCrae, R. R., & Mõttus, R. (2019). What Personality Scales Measure: A New Psychometrics and Its Implications for Theory and Assessment. Current Directions in Psychological Science, 28, 415–420. https://doi.org/10.1177/0963721419849559
McCrae, R. R., & Sutin, A. R. (2018). A Five-Factor Theory Perspective on Causal Analysis. European Journal of Personality, 32, 151–166. https://doi.org/10.1002/per.2134
McCrae, R. R., Mõttus, R., Hřebíčková, M., Realo, A., & Allik, J. (2019). Source method biases as implicit personality theory at the domain and facet levels. Journal of Personality, 87, 813–826. https://doi.org/10.1111/jopy.12435
McCrae, R. R., Terracciano, A., & 78 Members of the Personality Profiles of Cultures Project. (2005). Universal features of personality traits from the observer’s perspective: Data from 50 cultures. Journal of Personality and Social Psychology, 88, 547–561. https://doi.org/10.1037/0022-3514.88.3.547
Metcalfe, J., & Mischel, W. (1999). A hot/cool-system analysis of delay of gratification: Dynamics of willpower. Psychological Review, 106, 3–19.
Mischel, W., & Shoda, Y. (1995). A cognitive-affective system theory of personality: Reconceptualizing situations, dispositions, dynamics, and invariance in personality structure. Psychological Review, 102, 246–268. https://doi.org/10.1037/0033-295X.102.2.246
Molenaar, P. C. M., & Campbell, C. G. (2009). The new person-specific paradigm in psychology. Current Directions in Psychological Science, 18, 112–117. https://doi.org/10.1111/j.1467-8721.2009.01619.x
Mõttus, R. (2016). Towards more rigorous personality trait-outcome research. European Journal of Personality, 30, 292–303. https://doi.org/10.1002/per.2041
Mõttus, R., Allerhand, M., & Johnson, W. (2020). Computational Modeling of Person-Situation Transactions: How Accumulation of Situational Experiences Can Shape the Distributions of Trait Scores. In D. C. Funder, R. A. Sherman, & J. F. Rauthmann (Eds.), Handbook of Psychological Situations (pp. xx – xx).
Mõttus, R., & Rozgonjuk, D. (2019). Development is in the details: Age differences in the Big Five domains, facets and nuances. Journal of Personality and Social Psychology. https://doi.org/10.1037/pspp0000276
Mõttus, R., Allik, J., & Realo, A. (2020). Do Self-Reports and Informant-Ratings Measure the Same Personality Constructs? European Journal of Psychological Assessment, 36, 289–295. https://doi.org/10.1027/1015-5759/a000516
Mõttus, R., Bates, T. C., Condon, D. M., Mroczek, D., & Revelle, W. (2017). Leveraging a more nuanced view of personality: Narrow characteristics predict and explain variance in life outcomes. https://doi.org/10.31234/osf.io/4q9gv
Mõttus, R., Kandler, C., Bleidorn, W., Riemann, R., & McCrae, R. R. (2017). Personality traits below facets: The consensual validity, longitudinal stability, heritability, and utility of personality nuances. Journal of Personality and Social Psychology, 112, 474. https://doi.org/10.1037/pspp0000100
Mõttus, R., Realo, A., Allik, J., Esko, T., Metspalu, A., & Johnson, W. (2015). Within-trait heterogeneity in age group differences in personality domains and facets: Implications for the development and coherence of personality traits. PLoS ONE, 10, e0119667. https://doi.org/10.1371/journal.pone.0119667
Mõttus, R., Realo, A., Vainik, U., Allik, J., & Esko, T. (2017). Educational attainment and personality are genetically intertwined. Psychological Science, 28, 1631–1639. https://doi.org/10.1177/0956797617719083
Mõttus, R., Sinick, J., Terracciano, A., Hrebickova, M., Kandler, C., Ando, J., Mortensen, E. L., Colodro-Conde, L., & Jang, K. (2019). Personality characteristics below facets: A replication and meta-analysis of cross-rater agreement, rank-order stability, heritability and utility of personality nuances. Journal of Personality and Social Psychology, 117, e35–e50. https://doi.org/10.1037/pspp0000202
Muck, P. M., Hell, B., & Höft, S. (2008). Application of the principles of Behaviorally Anchored Rating Scales to assess the Big Five personality constructs at work. In J. Deller (Ed.), Research contributions to personality at work (pp. 77–97). München, Germany: Rainer Hampp.
Nagel, M., Watanabe, K., Stringer, S., Posthuma, D., & Sluis, S. (2018). Item-level analyses reveal genetic heterogeneity in neuroticism. Nature Communications, 9, 905. https://doi.org/10.1038/s41467-018-03242-8
Nicholls, J. G., Licht, B. G., & Pearl, R. A. (1982). Some dangers of using personality questionnaires to study personality. Psychological Bulletin, 92, 572–580. https://doi.org/10.1037/0033-2909.92.3.572
Orben, A., & Lakens, D. (2020). Crud (Re)Defined. Advances in Methods and Practices in Psychological Science, 3, 238–247. https://doi.org/10.1177/2515245920917961
Østergaard, S. D., Jensen, S. O. W., & Bech, P. (2011). The heterogeneity of the depressive syndrome: When numbers get serious. Acta Psychiatrica Scandinavica, 124, 495–496. https://doi.org/10.1111/j.1600-0447.2011.01744.x
Ozer, D. J., & Benet-Martínez, V. (2006). Personality and the prediction of consequential outcomes. Annual Review of Psychology, 57, 401–421. https://doi.org/10.1146/annurev.psych.57.102904.190127
Pasupathi, M., Fivush, R., Greenhoot, A. F., & McLean, K. C. (2020). Intraindividual Variability in Narrative Identity: Complexities, Garden Paths, and Untapped Research Potential. European Journal of Personality. https://doi.org/10.1002/per.2279
Paunonen, S. V., & Ashton, M. C. (2001). Big Five factors and facets and the prediction of behavior. Journal of Personality and Social Psychology, 81, 524–539. https://doi.org/10.1037/0022-3514.81.3.524
Paunonen, S. V., & Jackson, D. N. (2000). What is beyond the big five? Plenty! Journal of Personality, 68, 821–835. https://doi.org/10.1111/1467-6494.00117
Pearl, J. (2018). The Book of Why: The New Science of Cause and Effect. New York: Basic Books.
Plomin, R., & von Stumm, S. (2018). The new genetics of intelligence. Nature Reviews Genetics, 19, 148–159. https://doi.org/10.1038/nrg.2017.104
Quirin, M., Robinson, M. D., Rauthmann, J. F., Kuhl, J., Read, S. J., Tops, M., & DeYoung, C. G. (2020). The Dynamics of Personality Approach (DPA): 20 Tenets for Uncovering the Causal Mechanisms of Personality. European Journal of Personality. https://doi.org/10.1002/per.2295
Rauthmann, J. (in press). A (More) Behavioral Science of Personality in the Age of Multi-Modal Sensing, Big Data, Machine Learning, and Artificial Intelligence. European Journal of Personality.
Read, S. J., Monroe, B. M., Brownstein, A. L., Yang, Y., Chopra, G., & Miller, L. C. (2010). A neural network model of the structure and dynamics of human personality. Psychological Review, 117, 61–92.
Revelle, W., & Condon, D. M. (2015). A model for personality at three levels. Journal of Research in Personality, 56, 70–81.
Revelle, W. (2020). psych: Procedures for Personality and Psychological Research (Version 2.0.9). Northwestern University. http://CRAN.R-project.org/package=psych
Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Sage handbook of online research methods (2nd ed., pp. 578–595). Sage Publications, Inc.
Revelle, W., Dworak, E. M., & Condon, D. M. (2020). Exploring the persome: The power of the item in understanding personality structure. Personality and Individual Differences, 109905.
Roberts, B. W., & Nickel, L. B. (2017). A critical evaluation of the Neo-Socioanalytic Model of personality. In J. Specht (Ed.), Personality Development Across the Lifespan (pp. 157–177). Academic Press. https://doi.org/10.1016/B978-0-12-804674-6.00011-9
Roberts, B. W., Chernyshenko, O. S., Stark, S., & Goldberg, L. R. (2005). The Structure of Conscientiousness: An Empirical Investigation Based on Seven Major Personality Questionnaires. Personnel Psychology, 58, 103–139. https://doi.org/10.1111/j.1744-6570.2005.00301.x
Roberts, B. W., Kuncel, N. R., Shiner, R., Caspi, A., & Goldberg, L. R. (2007). The power of personality: The comparative validity of personality traits, socioeconomic status, and cognitive ability for predicting important life outcomes. Perspectives on Psychological Science, 2, 313–345. https://doi.org/10.1111/j.1745-6916.2007.00047.x
Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1, 27–42. https://doi.org/10.1177/2515245917745629
Rosenbusch, H., Wanders, F., & Pit, I. L. (2020). The Semantic Scale Network: An online tool to detect semantic overlap of psychological scales and prevent scale redundancies. Psychological Methods, 25, 380–392. https://doi.org/10.1037/met0000244
Salganik, M. J., Lundberg, I., Kindel, A. T., Ahearn, C. E., Al-Ghoneim, K., Almaatouq, A., Altschul, D. M., Brand, J. E., Carnegie, N. B., Compton, R. J., Datta, D., Davidson, T., Filippova, A., Gilroy, C., Goode, B. J., Jahani, E., Kashyap, R., Kirchner, A., McKay, S., … McLanahan, S. (2020). Measuring the predictability of life outcomes with a scientific mass collaboration. Proceedings of the National Academy of Sciences, 117, 8398. https://doi.org/10.1073/pnas.1915006117
Saucier, G. (1997). Effects of variable selection on the factor structure of person descriptors. Journal of Personality and Social Psychology, 73, 1296–1312. https://doi.org/10.1037/0022-3514.73.6.1296
Saucier, G., & Iurino, K. (2019). High-dimensionality personality structure in the natural language: Further analyses of classic sets of English-language trait-adjectives. Journal of Personality and Social Psychology. https://doi.org/10.1037/pspp0000273
Saucier, G., Iurino, K., & Thalmayer, A. G. (2020). Comparing predictive validity in a community sample: High-dimensionality and traditional domain-and-facet structures of personality variation. European Journal of Personality. https://doi.org/10.1002/per.2235
Scarr, S., & McCartney, K. (1983). How people make their own environments: A theory of genotype → environment effects. Child Development, 424–435.
[…] Personality, 32, 186–201. https://doi.org/10.1002/per.2147
Smaldino, P. E., Lukaszewski, A., von Rueden, C., & Gurven, M. (2019). Niche diversity can explain cross-cultural differences in personality structure. Nature Human Behaviour, 3, 1276–1283. https://doi.org/10.1038/s41562-019-0730-3
Sosnowska, J., Kuppens, P., Fruyt, F. D., & Hofmans, J. (2020). New Directions in the Conceptualization and Assessment of Personality—A Dynamic Systems Approach. European Journal of Personality. https://doi.org/10.1002/per.2233
Soto, C. J. (2019). How Replicable Are Links Between Personality Traits and Consequential Life Outcomes? The Life Outcomes of Personality Replication Project. Psychological Science, 30, 711–727. https://doi.org/10.1177/0956797619831612
Soubelet, A., & Salthouse, T. A. (2011). Influence of Social Desirability on Age Differences in Self-Reports of Mood and Personality. Journal of Personality, 79, 741–762. https://doi.org/10.1111/j.1467-6494.2011.00700.x
Spadaro, G., Tiddi, I., Columbus, S., Jin, S., Teije, A. t., & Balliet, D. (2020). The Cooperation Databank.
https://doi.org/10.2307/1129703 https://doi.org/10.31234/osf.io/rveh3 Schimmack, U. (2020). The Implicit Association Test: A Spearman, C. (1927). The abilities of man. Macmillan. Method in Search of a Construct: Perspectives on Psychological Science. Sperry, R. W. (1966). Mind, brain, and humanist values. https://doi.org/10.1177/1745691619863798 Bulletin of the Atomic Scientists, 22, 26.https://doi.org/10.1080/00963402.1966.11454956 Schmeichel, B. J., & Vohs, K. (2009). Self-affirmation and self-control: Affirming core values counteracts Stachl, C., Au, Q., Schoedel, R., Gosling, S. D., Harari, ego depletion. Journal of Personality and Social G. M., Buschek, D., Völkel, S. T., Schuwerk, T., Psychology, 96, 770–782. Oldemeier, M., Ullmann, T., Hussmann, H., Bischl, https://doi.org/10.1037/a0014635 B., & Bühner, M. (2020). Predicting personality frompatterns of behavior collected with smartphones. Schmid, M. M., Gatica‐Perez, D., Frauendorfer, D., Proceedings of the National Academy of Sciences, Nguyen, L., & Choudhury, T. (2015). Social sensing 117, 17680–17687. for psychology: Automated interpersonal behavior assessment. Current Directions in Psychological Stachl, C., Pargent, F., Hilbert, S., Harari, G. M., Science, 24, 154–160. Schoedel, R., Vaid, S., Gosling, S. D., & Bühner, M.(2020). Personality Research and Assessment in the Schmitt, D. P., Allik, J., McCrae, R. R., & Benet- Era of Machine Learning. European Journal of Martinez, V. (2007). The geographic distribution of Personality. https://doi.org/10.1002/per.2257 big five personality traits—Patterns and profiles of human self-description across 56 nations. Journal of Surgeon General's Report (2004). The Health Cross-Cultural Psychology, 38, 173–212. Consequences of Smoking. Retrieved from https://doi.org/10.1177/0022022106297299 https://www.cdc.gov/tobacco/data_statistics/sgr/2004on 14th October 2020. Schmitt, D. P., Realo, A., Voracek, M., & Allik, J. (2008). Why can’t a man be more like a woman? 
Sex Riemann, R., & Kandler, C. (2010). Construct validation differences in big five personality traits across 55 using multitrait‐multimethod‐twin data: The case of a cultures. Journal of Personality and Social general factor of personality. European Journal of Psychology, 94, 168–182. Personality, 24, 258–277. http://dx.doi.org/10.1037/0022-3514.94.1.168 Stieger, M., Wepfer, S., Regger, D., Kowatsch, T., Seeboth, A., & Mõttus, R. (2018). Successful Roberts, B. W., & Allemand, M. (2020). Becoming Explanations Start with Accurate Descriptions: More Conscientious or More Open to Experience? Questionnaire Items as Personality Markers for More Effects of a Two‐Week Smartphone‐Based Accurate Predictions. European Journal of Intervention for Personality Change. European Description, prediction and explanation 30 Journal of Personality. mutualism. Psychological Review, 113, 842861. https://doi.org/10.1002/per.2267 https://doi.org/10.1037/0033-295X.113.4.842 Tay, L., Woo, S. E., Hickman, L., & Saef, R. M. (2020). Vazire, S. (2006). Informant reports: A cheap, fast, and Psychometric and Validity Issues in Machine easy method for personality assessment. Journal of Learning Approaches to Personality Assessment: A Research in Personality, 40, 472–481. Focus on Social Media Text Mining. European https://doi.org/10.1016/j.jrp.2005.03.003 Journal of Personality. Vazire, S. (2010). Who knows what about a person? The https://doi.org/10.1002/per.2290 self-other knowledge asymmetry (SOKA) model. Terracciano, A., Costa, P. T., & McCrae, R. R. (2006). Journal of Personality and Social Psychology, 98, Personality Plasticity After Age 30. Personality & 281–300. https://doi.org/10.1037/a0017908 Social Psychology Bulletin, 32, 999–1009. Wendt, L. P., Wright, A. G. C., Pilkonis, P. A., Woods, https://doi.org/10.1177/0146167206288599 W. C., Denissen, J. J. A., Kühnel, A., & Terracciano, A., McCrae, R. R., Brant, L. J., & Costa, P. Zimmermann, J. (2020). Indicators of Affect T., Jr. (2005). 
Hierarchical linear modeling analyses Dynamics: Structure, Reliability, and Personality of the NEO-PI-R scales in the Baltimore Longitudinal Correlates. European Journal of Personality. Study of Aging. Psychology and Aging, 20, 493–506. https://doi.org/10.1002/per.2277 https://doi.org/10.1037/0882-7974.20.3.493 Wessels, N. M., Zimmermann, J., Biesanz, J. C., & Thielmann, I., & Hilbig, B. E. (2019). Nomological Leising, D. (2020). Differential associations of consistency: A comprehensive test of the equivalence knowing and liking with accuracy and positivity bias of different trait indicators for the same constructs. in person perception. Journal of Personality and Journal of Personality, 87, 715–730. Social Psychology, 118, 149–171. https://doi.org/10.1111/jopy.12428 https://doi.org/10.1037/pspp0000218 Turkheimer, E., Pettersson, E., & Horn, E. E. (2014). A Wessels, N. M., Zimmermann, J., & Leising, D. (2020). phenotypic null hypothesis for the genetics of Who Knows Best What the Next Year Will Hold for personality. Annual Review of Psychology, 65, 515– You? The Validity of Direct and Personality-based 540. https://doi.org/10.1146/annurev-psych-113011- Predictions of Future Life Experiences Across 143752 Different Perceivers. European Journal of Vachon, D. D., Lynam, D. R., Widiger, T. A., Miller, J. Personality. https://doi.org/10.1002/per.2293 D., McCrae, R. R., & Costa, P. T. (2013). Basic Traits Weston, S. J., Gladstone, J. J., Graham, E. K., Mroczek, Predict the Prevalence of Personality Disorder Across D. K., & Condon, D. M. (2019). Who are the the Life Span: The Example of Psychopathy. scrooges? Personality predictors of holiday spending. Psychological Science, 24, 698–705. Social Psychological and Personality Science, 10, https://doi.org/10.1177/0956797612460249 775-782. Vainik, U., Misic, B., Zeighami, Y., Michaud, A., Wiernik, B. M., Ones, D. S., Marlin, B. M., Giordano, Mõttus, R., & Dagher, A. (2019). Obesity has limited C., Dilchert, S., Mercado, B. 
K., Stanek, K. C., behavioural overlap with addiction and psychiatric Birkland, A., Wang, Y., Ellis, B., Yazar, Y., Kostal, phenotypes. Nature Human J. W., Kumar, S., Hnat, T., Ertin, E., Sano, A., Behaviour. https://doi.org/10.1038/s41562-019-0752- Ganesan, D. K., Choudhoury, T., & al’Absi, M. x (2020). Using Mobile Sensors to Study Personality Vainik, U., Mõttus, R., Allik, J., Esko, T., & Realo, A. Dynamics. European Journal of Psychological (2015). Are trait-outcome associations caused by Assessment. scales or particular items? Example analysis of https://doi.org/10.1027/1015-5759/a000576 personality facets and BMI. European Journal of Wilt, J., & Revelle, W. (2015). Affect, Behaviour, Personality, 29, 622–634. Cognition and Desire in the Big Five: An Analysis of https://doi.org/10.1002/per.2009 Item Content and Structure. European Journal of Vainik, U., Dagher, A., Realo, A., Colodro‐Conde, L., Personality, 29, 478–497. Mortensen, E. L., Jang, K., Juko, A., Kandler, C., https://doi.org/10.1002/per.2002 Sørensen, T. I. A., & Mõttus, R. (2019). Personality- Wood, A. R., Esko, T., Yang, J., Vedantam, S., Pers, T. obesity associations are driven by narrow traits: A H., Gustafsson, S., Chu, A. Y., Estrada, K., Luan, J., meta-analysis. Obesity Reviews, 20, 1121–1131. Kutalik, Z., Amin, N., Buchkovich, M. L., Croteau- https://doi.org/10.1111/obr.12856 Chonka, D. C., Day, F. R., Duan, Y., Fall, T., van Der Maas, H. L. J., Dolan, C. V., Grasman, R. P. P. Fehrmann, R., Ferreira, T., Jackson, A. U., … P., Wicherts, J. M., Huizenga, H. M., & Raijmakers, Frayling, T. M. (2014). Defining the role of common M. E. J. (2006). A dynamical model of general variation in the genomic and biological architecture intelligence: The positive manifold of intelligence by of adult human height. Nature Genetics, 46, 1173– 1186. Scopus. https://doi.org/10.1038/ng.3097 Description, prediction and explanation 31 Wood, D., & Brumbaugh, C. C. (2009). 
Using revealed Journal of Research in Personality, 44, 180–198. mate preferences to evaluate market force and https://doi.org/10.1016/j.jrp.2010.01.002 differential preference explanations for mate selection. Yarkoni, T. (2020). Implicit realism impedes progress in Journal of Personality and Social Psychology, 96, psychology: Comment on Fried (2020). 1226–1244. https://doi.org/10.31234/osf.io/xj5uq Wood, D., Gardner, M. H., & Harms, P. D. (2015). How Yarkoni, T., & Westfall, J. (2017). Choosing Prediction functionalist and process approaches to behavior can Over Explanation in Psychology: Lessons From explain trait covariation. Psychological Review, 122, Machine Learning. Perspectives on Psychological 84–11. Science, 12, 1100–1122. Wood, D., Nye, C. D., & Saucier, G. (2010). https://doi.org/10.1177/1745691617693393 Identification and measurement of a more Zheng, et al. (2017). LD Hub: a centralized database and comprehensive set of person-descriptive trait markers web interface to perform LD score regression that from the English lexicon. Journal of Research in maximizes the potential of summary level GWAS Personality, 44, 258–272. data for SNP heritability and genetic correlation https://doi.org/10.1016/j.jrp.2010.02.003 analysis. Bioinformatics, 33, 272-279. Wood, D., Spain, S. M., Monroe, B. M., & Harms, P. D. Ziegler, M., Horstmann, K. T., & Ziegler, J. (2019). (in press). Using functional fields to represent Personality in situations: Going beyond the OCEAN accounts of the psychological processes that produce and introducing the Situation Five. Psychological actions. In J. F. Rauthmann , Handbook of Personality Assessment, 31, 567–580. Dynamics and Processes. San Diego, CA: Academic https://doi.org/10.1037/pas0000654 Press. Zimmermann, J., Woods, W. C., Ritter, S., Happel, M., Wood, D., & Wortman, J. (2012). Trait Means and Masuhr, O., Jaeger, U., Spitzer, C., & Wright, A. G. Desirabilities as Artifactual and Real Sources of C. (2019). 
Integrating structure and dynamics in Differential Stability of Personality Traits. Journal of personality assessment: First steps toward the Personality, 80, 665–701. development and validation of a personality dynamics https://doi.org/10.1111/j.1467-6494.2011.00740.x diary. Psychological Assessment, 31, 516–531. Woods, W.C., Arizmendi, C., Gates, K.M., Stepp, S.D., https://doi.org/10.1037/pas0000625 Pilkonis, P.A., & Wright, A.G.C. (2020). Personalized models of psychopathology as contextualized dynamic processes: An example from individuals with borderline personality disorder. Journal of Consulting and Clinical Psychology, 88, 240-254. https://psyarxiv.com/amdu8/ Wright, A.G.C., Gates, K.M., Arizmendi, C., Lane, S.T., Woods, W.C., & Edershile, E.A. (2019). Focusing personality assessment on the person: Modeling general, shared, and person specific processes in personality and psychopathology. Psychological Assessment, 32, 502-515. https://osf.io/nf5me/ Wright, A. G., Creswell, K. G., Flory, J. D., Muldoon, M. F., & Manuck, S. B. (2019). Neurobiological functioning and the personality-trait hierarchy: Central serotonergic responsivity and the stability metatrait. Psychological Science, 30, 1413-1423 Wright, A.G.C. & Zimmermann, J. (2019). Applied ambulatory assessment: Integrating idiographic and nomothetic principles of measurement. Psychological Assessment, 31, 1467-1480. https://psyarxiv.com/6qc5x/ Wrzus, C., & Mehl, M. (2015). Lab and/or field? Measuring personality processes and their social consequences. European Journal of Personality, 29, 250–271. Yarkoni, T. (2010). The abbreviation of personality, or how to measure 200 personality scales with 200 items.