HUMAN AND COMPUTERIZED PERSONALITY INFERENCES FROM DIGITAL FOOTPRINTS ON TWITTER

by

CORY K. COSTELLO

A DISSERTATION

Presented to the Department of Psychology and the Graduate School of the University of Oregon in partial fulfillment of the requirements for the degree of Doctor of Philosophy

June 2020

DISSERTATION APPROVAL PAGE

Student: Cory K. Costello
Title: Human and Computerized Personality Inferences from Digital Footprints on Twitter

This dissertation has been accepted and approved in partial fulfillment of the requirements for the Doctor of Philosophy degree in the Department of Psychology by:

Sanjay Srivastava, Chair
Nicholas Allen, Core Member
Robert Chavez, Core Member
Ryan Light, Institutional Representative

and Kate Mondloch, Interim Vice Provost and Dean of the Graduate School.

Original approval signatures are on file with the University of Oregon Graduate School. Degree awarded June 2020.

© 2020 Cory K. Costello
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs (United States) License.

DISSERTATION ABSTRACT

Cory K. Costello
Doctor of Philosophy
Department of Psychology
June 2020
Title: Human and Computerized Personality Inferences from Digital Footprints on Twitter

The increasing digitization of our social world has implications for personality, reputation, and their social consequences in online environments. The present dissertation is focused on how personality and reputation are reflected in digital footprints from the popular online social network Twitter, and the broader implications this has for the expression and perception of personality in online spaces. In three studies, I demonstrate that personality is reflected in the language people use in their tweets, the accounts they decide to follow, and how they construct their profile. I further examine moderators of accuracy, including the number of users' tweets, the number of accounts they follow, and the density of their follower networks. Finally, I examine intra- and interpersonal consequences of being perceived accurately or ideally, speaking to the social functions of self-presentation in online environments. This multi-method investigation provides insight into how personality is represented online, how it can be recovered using computers and human judges, and the consequences this has for individuals.

CURRICULUM VITAE

NAME OF AUTHOR: Cory K. Costello

GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:
University of Oregon, Eugene
Wake Forest University, Winston-Salem, NC
New College of Florida, Sarasota, FL

DEGREES AWARDED:
Doctor of Philosophy, Psychology, 2020, University of Oregon
Master of Arts, Psychology, 2014, Wake Forest University
Bachelor of Arts, Psychology, 2012, New College of Florida

AREAS OF SPECIAL INTEREST:
Personality and Interpersonal Perception
Reputation
Data Science

PROFESSIONAL EXPERIENCE:
Graduate Employee, Department of Psychology, University of Oregon, Eugene, OR, 2014-2020

PUBLICATIONS:

Costello, C. K., & Srivastava, S. (2020). Perceiving personality through the grapevine: A network approach to reputations. Journal of Personality and Social Psychology. Advance online publication. https://doi.org/10.1037/pspp0000362

Thalmayer, A. G., Saucier, G., Srivastava, S., Flournoy, J. C., & Costello, C. K. (2019). Ethics-relevant values in adulthood: Longitudinal findings from the Life and Time study. Journal of Personality, 87(6). https://doi.org/10.1111/jopy.12462

Costello, C. K., Wood, D., & Tov, W. (2018).
Examining East-West personality differences indirectly through action scenarios. Journal of Cross-Cultural Psychology, 49, 554-596. https://doi.org/10.1177/0022022118757914

Wood, D., Tov, W., & Costello, C. K. (2015). What a ____ thing to do! Formally characterizing actions by their expected effects. Journal of Personality and Social Psychology, 108, 953-976. http://dx.doi.org/10.1037/pspp0000030

ACKNOWLEDGEMENTS

I'd like to start by thanking my advisor and mentor, Sanjay Srivastava. I'm grateful for your unwavering support and encouragement over the past six years, for the invaluable advice, for the many great conversations, and most of all for training me to conduct rigorous science. I'll always look back fondly on the days sitting in your office discussing Mehl, trying to figure out a particularly tricky path model, or poring over R output.

I'd like to thank my committee members, Rob Chavez, Nick Allen, and Ryan Light, for their helpful feedback in the design and development of this dissertation. I'd like to thank the support staff in the University of Oregon's Psychology department for all of the help over the years, especially Lori Olsen, for making life as a graduate student so much more manageable.

I'd like to thank the members of the Personality and Social Dynamics Lab I had the good fortune of overlapping with - Pooya, Bradley, Cianna, Nicole, John, and Allison - for all of the help and feedback over the years. I always knew an idea had legs if it could survive a round of criticism in a lab meeting. I want to say a special thanks to Pooya, who assisted with collecting the data presented here, and Cianna, who assisted with data collection and blinded screening of the Study 3 data.

I'd like to thank my cohort mates - Grace Binion, Melissa Latham, Rita Ludwig, Adam Kishel (honorary member), and Brett Mercier. Thanks for making this place that is at the opposite end of the country from my family feel like home. The experiences we shared - from Thanksgiving festivities, to grabbing beers at a local brewery, to the trips to conferences - were some of the best of my life, and I know I'll always look back fondly on them.

I'd like to thank my Mom, for always believing in me (maybe too much), supporting me, and helping me get to where I am today. There is too much to say thank you for on this page, but know that I know that none of my accomplishments would have been possible without you. I'd like to thank my Dad for always encouraging me to study hard and for teaching me the value of hard work. I'd like to thank my stepfather Peter for all of his support, and for always providing a word of encouragement when I needed it. I'd like to thank my stepmother Tracy and stepsister Taylor for the fun times over the years, especially the trip out to Oregon. I'd like to thank Keith and Lynn Oglesby for treating me like a member of the family and raising my favorite person. My extended family - but especially my grandma Nancy and late grandfather - also deserve a special thanks for their love, support, and encouragement.

I'd finally like to thank my wife and best friend, Katherine Oglesby. I can't begin to express how grateful I am for our life together, for your willingness to uproot and move across the country with me several times over, and all that you've done to help me achieve this goal. I could fill up all of the pages of this document and hardly scratch the surface of how grateful I am to you and for our relationship.
DEDICATION

For my grandfather, Jean Claude Bensimon, for all the laughs, the insightful conversations, and for teaching me to approach all things - including myself - with an open yet critical mind.

TABLE OF CONTENTS

I. INTRODUCTION
   Digital Footprints, Identity Cues, and Behavioral Residue
   Inferring Personality from Digital Footprints with Machine Learning Algorithms
      Inferring Personality from Language
      Inferring Personality from Network Ties
      Language vs. Ties
   Inferring Personality from Digital Footprints with Human Judges
      Target Self-Presentation and Its Impact on Accuracy
      Audience, Accountability, and Accuracy
      Consequences of Being Perceived Accurately or Ideally
   Overview of Present Studies
      Samples & Procedure
         NIMH Sample
         NSF Sample
      Measures
      Analyses

II. STUDY 1: PREDICTING PERSONALITY FROM TWEETS
   Methods
      Samples & Procedure
      Analytic Procedure
   Results
      Aim 1a: Predictive Accuracy
      Aim 1b: Does activity moderate tweet-based accuracy?
   Discussion

III. STUDY 2: PREDICTING PERSONALITY FROM FOLLOWED ACCOUNTS
   Methods
      Samples & Procedure
      Analytic Procedure
   Results
      Aim 2a: Predictive Accuracy
      Aim 2b: Does activity moderate followed-account-based accuracy?
   Discussion

IV. STUDY 3: PERCEIVING PERSONALITY IN PROFILES
   Methods
      Samples & Procedure
      Measures
      Analyses
   Results
      Consensus
      Accuracy
      Accuracy vs. Idealization
      Accuracy X Density
      Consequences for Accuracy & Idealization on Targets' Well-Being
      Consequences for Accuracy & Idealization on Targets' Likability
   Discussion

V. GENERAL DISCUSSION
   Accuracy and its Implications for Personality Expression Online
      Implications for identity claims and behavioral residue
      The social functions of self-presentation and personality expression on Twitter
   The Utility of Dictionaries and Implications for Selecting and Extracting Psychologically Meaningful Features from Noisy Data
   From Predictive Accuracy to Construct Validation
   Conclusion

REFERENCES CITED

LIST OF FIGURES

1. K-Fold CV Accuracy for Predicting Personality from Tweets (All Model Specifications)
2. K-Fold CV Accuracy for Predicting Personality from Tweets (Best Model Specifications)
3. Importance Scores from Random Forests Predicting Agreeableness with Dictionary Scores
4. Importance Scores from Random Forests Predicting Conscientiousness with Dictionary Scores
5. Importance Scores from Random Forests Predicting Honesty-Propriety with Dictionary Scores
6. Importance Scores from Random Forests Predicting Neuroticism with Dictionary Scores
7. Importance Scores from Random Forests Predicting Extraversion with Dictionary Scores
8. Importance Scores from Random Forests Predicting Openness with Dictionary Scores
9. Out-of-sample Accuracy (R) for Selected and Non-Selected Tweet-Based Predictive Models
10. Out-of-sample Accuracy (R) for Tweet-Based Predictions Compared to Facebook Status Updates
11. Results from Regressing Observed Big Six from All Tweet-Based Scores Simultaneously
12. K-Fold CV Accuracy for Predicting Personality from Followed Accounts (All Model Specifications)
13. K-Fold CV Accuracy for Predicting Personality from Followed Accounts (Best Model Specifications)
14. Out-of-sample Accuracy (R) for Selected and Non-Selected Followed-Account-Based Predictive Models
15. Out-of-sample Accuracy (R) of Followed-Account-Based Predictions Compared to Facebook Likes
16. Results from Regressing Observed Big Six from All Followed-Account-Based Scores Simultaneously
17. Followed-Account-Based Predictive Accuracy Moderated by Activity for Agreeableness
18. Followed-Account-Based Predictive Accuracy Moderated by Activity for Honesty-Propriety
19. Plot of ICC_target for Each Big Six Domain
20. Accuracy vs. Idealization in Perceptions Based on Twitter Profile
21. Accuracy and Well-Being Response Surface Plots
22. Idealization and Well-Being Response Surface Plots
23. Accuracy and Likability Surface Plots
24. Idealization and Likability Surface Plots

LIST OF TABLES

1. Participant Gender for Study 1 and Study 2 Samples
2. Participant Race for NIMH and NSF Samples
3. Participant Ethnicity for NIMH and NSF Samples
4. Specifications for Selected Models for Predicting Personality from Tweets
5. Correlations Between Tweet-Based Predictions and Observed Big Six Scores
6. Tweet-Based Predictive Accuracy Moderated by Activity
7. 15 Most Important Accounts Predicting Agreeableness
8. 15 Most Important Accounts Predicting Conscientiousness
9. 15 Most Important Accounts Predicting Honesty
10. 15 Most Important Accounts Predicting Neuroticism
11. 15 Most Important Accounts Predicting Extraversion
12. 15 Most Important Accounts Predicting Openness
13. Table of Selected Followed-Account-Based Models, their Specifications, and their Training Accuracy
14. Model Specifications with Highest R Predicting Personality from Followed Accounts
15. Model Specifications with Lowest RMSE Predicting Personality from Followed Accounts
16. Correlations Between Followed-Account-Based Predictions and Observed Big Six Scores
17. Followed-Account-Based Predictive Accuracy Moderated by Activity
18. Target Race and Gender
19. Perceiver Race and Gender
20. Accuracy of Profile-Based Perceptions
21. Random Effects for Accuracy Models
22. Random Effects for Accuracy vs. Idealization Models
23. Results from Density X Accuracy Models
24. Random Effects for Density X Accuracy Models
25. Surface Parameters for Accuracy & Self-Reported Well-Being RSA
26. Surface Parameters for Idealization & Self-Reported Well-Being RSA
27. Surface Parameters for Accuracy and Likability MLRSA
28. Surface Parameters for Idealization and Likability MLRSA

I. INTRODUCTION

Our social world is becoming increasingly digitized, with much of our daily behavior and social interactions taking place in online environments. One unique aspect of behaving and interacting in online environments is that much of this behavior is recorded and stored in more or less permanent digital records. People and organizations use these digital footprints to draw inferences about the users that generated them. For example, it is commonplace to look someone up online and form (or update) an impression of them based on what turns up, a practice which has found its way into formal processes like hiring decisions (Grasz, 2016). In addition to inferences made by people, machine learning algorithms are being used to infer psychological characteristics from digital footprints, with some research suggesting that they can outperform knowledgeable human perceivers (Youyou, Kosinski, & Stillwell, 2015). While previous work generally finds some degree of accuracy in human and computerized inferences from online behavior, there is considerable variability (Back et al., 2010; Kosinski, Stillwell, & Graepel, 2013; Park et al., 2015; Qiu, Lin, Ramsay, & Yang, 2012; Youyou et al., 2015). In my dissertation, I build on this work with a multi-method investigation into inferring personality from digital footprints available on Twitter. In three studies, I examine computerized inferences from tweets (Study 1), outgoing network ties (Study 2), and human inferences from profiles (Study 3), furthering our understanding of how personality is manifest in and recoverable from different digital footprints.

Digital Footprints, Identity Cues, and Behavioral Residue

Human- and computer-based personality judgments differ in many ways, but each requires inferring a target person's standing on an unobservable psychological construct (e.g., how extraverted a person is) from observable cues the target produces in a given environment (e.g., a tweet). The Brunswik (1955) Lens Model formalizes this as two underlying processes: cue validity refers to the extent to which the construct produces valid and available cues in a particular environment, and cue utilization refers to the extent to which judges use the cues correctly to render their judgment. Likewise, Funder's (1995) Realistic Accuracy Model (RAM) holds that accurate judgments of a construct require relevant cues be made available to the judge, which the judge then detects and properly utilizes. According to both models, accurate inferences from digital footprints - whether by a human perceiver or a computer - require access to valid cues and knowledge of how cues relate to the underlying psychological characteristics being judged. Cues are often differentiated between those that incidentally vs. intentionally communicate aspects of ourselves to others, generally referred to as behavioral residue and identity claims, respectively (Gosling, Ko, Mannarelli, & Morris, 2002).
Although typically discussed as a property of cues, it may be more fruitful to consider them as two theoretical processes that link underlying psychological characteristics to observed behavior. On the one hand, behaviors have certain predictable effects on the environment, which accumulate in frequented physical or digital spaces. This accumulated behavioral residue incidentally provides insight into the psychological mechanisms that could have produced it and is thus characteristically not self-presentational. On the other hand, people use signals to intentionally communicate aspects of the self to others or to reinforce their own self-views. These identity claims are overt, self-presentational, and part of the broader identity negotiation process (Hogan, 2010; Swann, 1987) in which targets and perceivers mutually determine targets' identities.

Different approaches to inferring personality from digital footprints likely differ with respect to how much they draw on behavioral residue vs. identity claims. Analyzing network ties like followed accounts probably relies more heavily on behavioral residue, since they are not prominently displayed and are a byproduct of following accounts. For example, following the American Psychological Association on Twitter may reflect a user's interest in psychology and even more distal characteristics (e.g., higher levels of a personality characteristic like openness), but it is unlikely that users follow this account specifically to communicate these aspects of their identity. At the other extreme, inferences based on profiles and their constituent parts (e.g., profile picture, bio, etc.) likely rely more heavily on identity claims, given that profiles are displayed prominently and function to communicate users' identities. Indeed, features like the bio exist primarily so that users can provide information about who they are to perceivers. Tweets likely rely on an even mix of behavioral residue (e.g., typos in tweets) and identity claims (e.g., statements of one's values). Thus, differences between judgments made with tweets, network ties, and profiles might reflect different proportions of behavioral residue and identity claims.

Inferring Personality from Digital Footprints with Machine Learning Algorithms

Personality can be effectively inferred from digital footprints common across many OSNs, including the linguistic content of what a user posts online (e.g., Park et al., 2015) and their network ties (e.g., Facebook Like ties; Kosinski et al., 2013). Each is discussed below with a particular eye towards their points of difference.

Inferring Personality from Language. Personality and other psychological constructs (e.g., depression) can be effectively inferred from the language people use online, including Facebook status updates (Park et al., 2015) and tweets (Coppersmith, Harman, & Dredze, 2014; De Choudhury et al., 2013a, 2013b, 2016; Qiu et al., 2012; Dodds, Harris, Kloumann, Bliss, & Danforth, 2011; Golbeck, Robles, Edmondson, & Turner, 2011; Nadeem, 2016; Reece et al., 2017; Schwartz et al., 2013; Sumner, Byers, Boochever, & Park, 2012). Accuracy varies substantially across studies, likely due to several factors, including the use of different techniques for quantifying and analyzing text, differences across platforms due to what the technological architecture affords (e.g., length of Facebook posts vs. tweets), and norms that emerge on different platforms.
Within psychology, the two most common approaches to date for automated text analysis are dictionary-based and open-vocabulary approaches, which are occasionally combined. Dictionary-based approaches generally work by matching the linguistic content a person produced with entries in a dictionary, which typically form one or more higher-order groups of words. For example, the Linguistic Inquiry and Word Count software (LIWC; Tausczik & Pennebaker, 2010) is a commonly used dictionary-based approach which counts up the number of words associated with 69 different psychologically meaningful categories (e.g., first person singular pronouns, positive emotion words, biological processes, etc.).[1] Other examples include sentiment analysis, where words are either counted (like LIWC) or scored for their relative positivity or negativity based on a pre-trained dictionary (Mohammad & Kiritchenko, 2015). While useful, dictionary-based approaches can miss important features of a text if those features aren't in the a priori dictionary. This might be especially concerning in online environments like Twitter, where abbreviations, slang, and terminology unique to the platform may be important features. This could explain the relatively poor accuracy found when predicting personality from tweets using only dictionary-based approaches (e.g., r's from .13 to .18 in Golbeck et al., 2011; see also De Choudhury et al., 2013a, 2013b, 2016; Qiu et al., 2012; Reece et al., 2017; Sumner et al., 2012).

[1] The exact number of categories depends on the version; I use the 2003 version in this dissertation, which has 69 categories.

In contrast, the open-vocabulary approach is a data-driven alternative where words, phrases, and empirically-derived topics (e.g., from probabilistic topic models using Latent Dirichlet Allocation, or LDA; Blei, 2012) are extracted from the text without an a priori dictionary involved. This does require substantially more data than using pre-defined dictionaries, but it has the advantage of discovering non-obvious or unexpectedly important features in the text that might be missed by dictionary-based approaches. This advantage has proven to be worth the increased cost of training. Park et al. (2015), for instance, used an open-vocabulary approach to predict personality from Facebook status updates with considerable accuracy (r's from .38 to .41; see also Coppersmith et al., 2014; Nadeem, 2016), outperforming the dictionary-based work mentioned above.

While a substantial innovation over dictionary-based approaches, open-vocabulary approaches have significant limitations as well. One limitation common to many text analytic approaches is the bag-of-words assumption, which holds that the order of words is irrelevant. This assumption, while absurd on its face, was necessary to make text analysis tractable for most purposes. More advanced techniques have overcome this simplifying assumption by training vector embeddings of words using neural network architectures, a set of techniques which have demonstrated superior performance in a variety of natural language processing tasks (Mikolov, Chen, Corrado, & Dean, 2013; Pennington, Socher, & Manning, 2014). These methods represent relations between words in semantic space with real-valued vectors, based on word-word co-occurrences (e.g., "grad" and "student" often co-occurring) and word-context co-occurrences (e.g., "grad" and "undergrad" often preceding "student").
More recent approaches go even further, taking subword information into account by training n-gram character embeddings and representing words as the sum of the n-grams they contain (Bojanowski, Grave, Joulin, & Mikolov, 2017). The major drawback of vector embeddings is that they require a substantial amount of data for training, which can be circumvented by using pre-trained vector embeddings.

Of course, the extent to which language use online predicts personality might vary across different OSN environments. Differences could emerge due to how the architecture of the platform shapes behavior. For example, the highest predictive accuracy for predicting personality from language use online was observed with Facebook status updates (Park et al., 2015), and one reason for this could be that the stricter character limits imposed by Twitter relative to Facebook make tweets noisier (and therefore less predictive) than status updates. Thus, it's possible that tweets are less predictive of personality even when using more sophisticated analytic techniques.

Inferring Personality from Network Ties. Ties or connections on Twitter are directed, meaning that users can initiate outgoing ties (called "following" on Twitter) and receive incoming ties (called "being followed" on Twitter) which are not necessarily reciprocal. I'll refer to the group of users that a person follows as their followed accounts and the group of users that follow a person as their followers. While both kinds of ties are likely rich in psychological meaning, they almost certainly require different approaches. I'll focus exclusively on followed accounts within the context of inferring personality, treating them as individual features or predictors in predictive models.

Although the psychological meaning of followed accounts is perhaps less immediately obvious than the psychological meaning of tweets, there are several reasons to suspect that it may be rich. One theory anticipating links between individuals' psychology and network ties is homophily, which holds that people like and therefore seek out others who are similar to themselves. For example, relatively extraverted individuals would be anticipated to differentially follow other similarly extraverted individuals or accounts. Homophily has been consistently observed (offline) for individual differences in emotion (Anderson, Keltner, & John, 2003; Watson, Beer, & McDade-Montez, 2014; Watson et al., 2000a, 2000b, 2004), mental health status such as depression (Schaefer, Kornienko, & Fox, 2011), and recently observed for personality among Facebook friends (Youyou, Stillwell, Schwartz, & Kosinski, 2017). We might thus expect some degree of personality homophily on Twitter, where people follow accounts based in part on perceived similarity. We can't examine this directly in the present study, but homophily would promote followed-account-based predictive accuracy.

More generally, following accounts is the primary way users curate their feed, or what they see when they log into the platform. Followed accounts thus likely reflect the kinds of information or experiences people are seeking out on Twitter, a broad expression of interest that likely reflects users' standings on personality characteristics to some degree. For example, Openness might be expressed by following accounts that post intellectually stimulating content - such as artists, scientists, and other public thinkers.
Considering followed accounts as an expression of preferences and interests highlights their similarity to Facebook likes, a digital footprint which has been previously demonstrated to predict psychological characteristics with moderate accuracy (Kosinski et al., 2013; Youyou et al., 2015).

Language vs. Ties. The language in users' tweets and the accounts they follow are both promising predictors that differ in practical and theoretical terms relevant to the present investigation. Two critical theoretical differences are worth pointing out. The first stems from the distinction between active and passive social media use (Burke, Kraut, & Marlow, 2011). Active use refers to using social media to actively provide content, which on Twitter primarily includes tweeting and replying to others' tweets. Passive use refers to using social media to passively consume content provided by others. Active users differ with respect to tweet frequency by definition, and so tweet-based approaches may achieve better accuracy predicting psychological characteristics of active users than passive users. The theoretical distinction between active and passive use does not make predictions about outgoing ties. However, it is possible that users who follow more accounts are more accurately captured by followed-account-based predictions, which would be consistent with prior work on Facebook likes (Kosinski et al., 2013; Youyou et al., 2015). I will examine the extent to which number of tweets and number of followed accounts affect accuracy in Studies 1 and 2, respectively, speaking to the extent to which the predictive accuracy of different cues depends on how target users use the platform.

The second critical theoretical difference between tweet content and followed accounts stems from the distinction between behavioral residue and identity claims. Although inferences from tweets and followed accounts are not strictly the product of behavioral residue or identity claims, it seems likely that tweets would rely more heavily on identity claims than followed accounts. Tweets are more overt and observable; once a user posts a tweet, it will appear in their followers' feeds, it might invite replies or interactions, and it will later be prominently displayed within the timeline feature of their own profile. Moreover, tweets are language, and language is inherently social, intended to serve communicative and social functions (Tomasello, 2010). Tweets are thus intended to be consumed by an audience of perceivers. Followed accounts, on the other hand, are relatively less observable (though still viewable in a user's profile), and aren't generally intended to be consumed by others. Because of these differences, tweeting, relative to following accounts, may heighten public self-awareness, thereby increasing efforts to convey a particular impression via tweets (Leary & Kowalski, 1990). Digital footprints with relatively more identity claims than behavioral residue have been theorized to be better predictors of personality (Gladstone, Matz, & Lemaire, 2019), which would suggest that tweet-based predictions may be more accurate in general than followed-account-based predictions.

Another distinct possibility is that identity claims and behavioral residue are better or worse predictors of different personality domains based on their level of evaluativeness (i.e., the desirability of being higher or lower on the dimension; John & Robins, 1993).
Among the Big Five, Openness, Agreeableness, and Conscientiousness are relatively more evaluative, whereas Extraversion and Neuroticism are relatively less evaluative (John & Robins, 1993); Honesty-Propriety, the added sixth domain in the Big Six, is probably among the most evaluative dimensions. Desires to be seen positively will be expressed across all of the Big Six, but should be heightened for the relatively more evaluative traits. These self-presentation efforts would affect identity claims more than behavioral residue, potentially leading to lower accuracy for tweet-based predictions for more evaluative traits (e.g., Openness). However, differences in accuracy across differently evaluative personality characteristics could also arise because self-reports, the accuracy criterion in this study, are worse indicators of evaluative constructs (Vazire, 2010).

Practically speaking, tweets and followed accounts have a lot in common. For example, they're both relatively sparse and noisy predictors (Kosinski, Wang, Lakkaraju, & Leskovec, 2016; Schwartz et al., 2013). There are also practical differences between them. Followed accounts can be relatively more straightforward to analyze, with the ties either included as individual predictors (e.g., Youyou et al., 2015) or subject to a data reduction technique like Singular Value Decomposition (SVD) or Principal Components Analysis (PCA; Kosinski et al., 2013). As outlined above, methods for quantifying text differ substantially (e.g., dictionary-based, open-vocabulary, embeddings, etc.), and the choice of method can drastically impact the accuracy of the corresponding model.

Inferring Personality from Digital Footprints with Human Judges

Digital footprints also provide a rich source for human perceivers to use in judging others' personalities, an opportunity recognized by the many hiring managers that report using social media searches in their decisions (Grasz, 2016). Indeed, human perceivers achieve considerable consensus and some degree of accuracy when judging targets' personalities based on their Facebook profiles or collections of their tweets (Back et al., 2010; Qiu et al., 2012). Personality judgments from digital footprints are thus moderately reliable and valid. Moreover, judgments based on Facebook profiles are closer to targets' real self (i.e., what they say they're really like) than their ideal self (i.e., how they wish they'd be seen by others), providing further evidence that Facebook profiles provide valid cues to targets' real (offline) personalities. This matches what Back and colleagues (2010) referred to as the extended real-life hypothesis, which holds that people use online social networks as an extension of their offline lives, and thus present themselves relatively accurately online.

Do we expect the extended real-life hypothesis to hold for Twitter? The only work to my knowledge that has examined personality judgments made by human perceivers from digital footprints on Twitter was conducted by Qiu et al. (2012), which does demonstrate accuracy for Big Five Agreeableness and Neuroticism. However, instead of providing perceivers with targets' profiles like the study by Back et al. (2010) on Facebook profiles, Qiu et al. (2012) provided perceivers with pre-processed tweets in a text file. This is a serious shortcoming.
On many OSNs, including Twitter, profiles are the hub of information about a user and include more information than what is available in tweets, including profile and background pictures, screen names, the presence of a link to a professional blog, and other psychologically rich information provided by the target user. Thus, the use of tweets alone is a threat to ecological validity and likely provides lower-bound estimates of accuracy. Additionally, unlike Back et al. (2010), they did not measure how users want to be seen, preventing them from examining the extended real-life hypothesis. Finally, a small methodological issue common to both studies is the use of small samples of undergraduate RAs for perceiver ratings instead of randomly sampling perceivers, which potentially limits the generalizability of their findings. Study 3 will address these shortcomings, examining the extent to which Twitter profiles provide human judges insight into target users' real or ideal selves.

Target Self-Presentation and Its Impact on Accuracy. Various theories hold that individuals want to be seen positively by others, engaging in idealized self-presentation to bolster their reputations and self-esteem (Hogan, 2010; Leary, 2007; Leary & Kowalski, 1990; Paulhus & Trapnell, 2008; Swann, Pelham, & Krull, 1989). As mentioned above, personality dimensions have a more and a less desirable end (John & Robins, 1993), and the desire to present an idealized image would therefore affect how people present their personality. However, self-verification theory holds that people have an even stronger desire to maintain their self-images, even if those images are less positive or desirable (Swann et al., 1989; Swann & Read, 1981). Twitter profiles, like Facebook profiles, might provide insight into users' true personalities, either because they engage more in self-verification than idealized self-presentation or because they fail at presenting their ideal self (e.g., they can't help but make many typos despite their attempt to present as highly conscientious). At the same time, it's possible that the public nature of Twitter heightens individuals' public self-awareness (Leary & Kowalski, 1990), leading them to present an idealized front. Of course, there may be stable individual differences in both the extent and content of self-presentation (Paulhus & Trapnell, 2008). I will examine the extent to which profiles lead human perceivers to inferences more similar to target users' real or ideal self, and the extent to which this varies across targets.

Audience, Accountability, and Accuracy. In addition to targets' self-presentation, accuracy may vary as a function of the context users are in (Funder, 1995). In particular, I'm focusing on a user's followers, which constitute their audience of (known) perceivers online, as a contextual factor that might affect accuracy through its impact on target behavior. Boyd (2007) and Hogan (2010) note that online interactions are unique in that they take place in front of an unimaginably large audience, unbounded by time and space. For example, when a person decides to tweet, what constitutes their audience? If the account is public, then the potential audience consists of anyone who has or ever will have access to the internet (and can reach Twitter's servers), thus far exceeding the largest spatial-temporal boundaries one might encounter in even the most public offline contexts.
Both Boyd (2007) and Hogan (2010) note that this large, unbounded audience has consequences for how people manage impressions online. Hogan (2010) suggests that rather than attempt to understand the scope and boundaries of their audience and how they might negotiate their identity given that audience, people instead consider two groups of perceivers: those whom they want to present an ideal self to, and those who might take issue with it (whom Hogan calls the lowest common denominator). This approach, called the lowest common denominator approach, suggests that understanding identity negotiation online requires considering the relative composition of target users' audiences. Hogan's (2010) lowest common denominator approach, Back and colleagues' (2010) extended real-life hypothesis, and Swann's (1987) identity negotiation all place importance on the audience's role in constraining self-presentation strategies. Moreover, these theories would predict differences, across individuals or OSN platforms, to the extent that the composition of the audience differs. Indeed, it's possible that people use Twitter differently than Facebook, using it to follow news and current events rather than connect with their offline friends and family. This could result in differences in audience composition, and therefore differences in self-presentation strategy. We can examine this indirectly by comparing our findings to those of Back and colleagues. Differences in audience composition across users within a site may relate to self-presentation strategy, which we can examine directly in this study.

We focus presently on the structure (rather than content) of one's audience on Twitter, focusing specifically on the density of users' follower networks as a moderator of the accuracy of human inferences. Density captures the extent of interconnectedness in a network; denser networks are thought to enhance social support and trust, in part because they can more readily rally collective action to offer support or sanction bad behavior (Kadushin, 2012). This sanctioning of bad behavior might include dishonest self-presentation, leading users in denser networks to present themselves more honestly. We will examine this in Study 3 by assessing the relation between density and judgeability (i.e., how accurately people are able to judge a target user).

Consequences of Being Perceived Accurately or Ideally. What are the consequences of being perceived accurately or ideally? In a classic study, Swann and colleagues (1989) demonstrated that people have a desire for self-enhancement and self-verification, meaning they want to be seen positively and self-verifyingly (i.e., consistent with their self-perception), but prioritize self-verification over positivity. Do people have a desire for their profile to convey positive and self-verifying impressions? What happens when this desire is or is not satisfied? One possibility is that being perceived self-verifyingly and positively increases individuals' overall well-being, both by satisfying their identity negotiation goals and by providing the benefits that come along with it (e.g., being treated how one wants and expects to be treated by others).
Indeed, one can easily imagine that being perceived self-verifyingly might be more or less beneficial depending on where one lands in the distribution of a personality trait. For example, being mis-perceived on Agreeableness might have different implications for people higher or lower in Agreeableness. Being perceived accurately or ideally likely has interpersonal consequences as well. One example is likability, where individuals might be perceived as more or less likable in part based on how they’re perceived. Indeed, recent evidence suggests that perceivers like targets that they perceive accurately, supporting the accuracy fosters liking hypothesis (Human, Carlson, Geukes, Nestler, & Back, 2018). However, it’s also possible that accuracy’s relation to liking depends on the target’s personality. For example, it’s possible that accurately perceiving a target is less associated with likability when the target is highly disagreeable. Response surface analysis (RSA; Barranti, Carlson, & Cote, 2017) can be used to examine these potentially complex effects of accurate or idealized perception, providing insight into how different kinds of (in)accuracy and idealization impact targets’ well-being and likability. This will be the focus on Aim 3c in Study 3. Overview of Present Studies In three Studies, I examine personality inferences from digital footprints including computerized inferences from tweets (Study 1), outgoing network ties 17 (Study 2), and human perceivers’ inferences from targets’ profiles (Study 3). All three studies draw upon two samples we’ve collected as part of an NIMH- (Grant # 1 R21 MH106879-01) and an NSF- (GRANT # 1551817) funded project. The methodological details common to the three studies are described next, followed by the specific methods and results of each study. Samples & Procedure. General data collection includes two samples, I’ll refer to as the NIMH sample and the NSF sample. Data collection for the NIMH and NSF samples were approved by the University of Oregon Institutional Review Board (Protocol # 12082014.013 for NIMH; Protocol # 10122017.011 for NSF) and were conducted in a manner consistent with the ethical treatment of human subjects. In both samples, our inclusion criteria required participants to provide an existing unlocked Twitter account, to currently reside in the US, to primarily tweet in English, and to meet minimum thresholds for being an active Twitter user. Minimally active twitter users were defined as having at least 25 tweets, 25 friends, and 25 followers. Using two-stage prescreening, we attempted to first screen participants for eligibility before they completed the main survey; participants had to affirm that they met the inclusion criteria before they proceeded with the main survey. However, since participants could erroneously state that they met the inclusion criteria, each participant was individually screened to verify that they indeed met the criteria, and to further assess whether the Twitter handle belonged to the participant whom provided it. This consisted of manually searching each Twitter account provided, 18 ensuring it met the activity thresholds, and assessing whether the account provided was obviously fake (e.g., one participant provided Lady Gaga’s account and was subsequently excluded). 
When it was especially difficult to verify that the accounts provided belonged to participants, we asked them to confirm that they owned the account they provided by direct messaging our lab's Twitter account from the account they provided. For both samples, we then downloaded each eligible participant's data from Twitter's API, including their full friends list, their user data (i.e., the information displayed in their profile), and up to 3,200 of their most recent tweets, retweets, and replies.

NIMH Sample. The NIMH sample was collected from the Spring of 2016 until the Fall of 2017, recruiting participants primarily from the "r/beermoney" and "r/mturk" Reddit communities, with additional participants from the University of Oregon Human Subjects Pool (UOHSP), Amazon's Mechanical Turk (MTurk), and Twitter advertising (using promoted tweets). In all recruitment methods, participants were able to click a link that took them to the Qualtrics survey where they provided their Twitter handles, answered some questions about their Twitter use, completed several self-report measures (described below), and finally completed basic demographics questions. At the end of the survey, participants were thanked and compensated, either with an Amazon gift card or physical check for $10 or with course credit for participants recruited through the human subjects pool. This process led to a total of n_NIMH-initial = 756 accounts that we were able to verify met our inclusion criteria. Ineligible prescreen participants included a mixture of participants who did not provide an existing Twitter account, participants who provided an account that they did not own (e.g., Lady Gaga's account), participants whose Twitter account did not meet the activity thresholds, and participants who provided an eligible but locked account. Of the 756 eligible accounts, we successfully retrieved tweets for n_NIMH-tweets = 487 and followed accounts for n_NIMH-followeds = 638. Note that these different sample sizes generally arise from being unable to download participants' tweets or followed accounts, because participants either deleted, locked, or changed their account name between the time when they were verified as eligible and when we downloaded their Twitter data (a lag which sometimes extended for months).

NSF Sample. The NSF sample was collected from February 2018 to March 2020. Participants were recruited from the "r/beermoney" Reddit community and consisted of an initial sample of n_NSF-initial = 654 that met inclusion criteria and completed the Big Six questionnaire. Of these participants, we were able to successfully retrieve tweets for n_NSF-tweets = 614 participants and followed accounts for n_NSF-followeds = 639 participants. As with the NIMH sample, the difference in sample sizes reflects participants who either deleted, locked, or changed the name of their account before we downloaded their Twitter data.
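To make the retrieval step concrete, the following is a minimal sketch in R. It assumes the rtweet package and a pre-authorized API token; the handle vector and object names are hypothetical, and the dissertation does not specify which client library was actually used.

    # Minimal sketch of the Twitter API retrieval described above, assuming
    # the rtweet package and a pre-authorized API token. `eligible_handles`
    # is a hypothetical vector of verified participant handles.
    library(rtweet)

    eligible_handles <- c("handle_one", "handle_two")

    # User data: the information displayed in each participant's profile
    user_data <- lookup_users(eligible_handles)

    # Full friends list (the accounts each participant follows)
    friends <- get_friends(eligible_handles)

    # Up to 3,200 most recent tweets, retweets, and replies per account,
    # the maximum Twitter's standard API returns per user timeline
    tweets <- get_timeline(eligible_handles, n = 3200)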
Participants in both samples responded to demographic questions reflecting NIH enrollment reporting standards. Gender, race, and ethnicity for both samples are shown in Tables 1, 2, and 3, respectively. These are broken down by participants used in tweet-based analyses (Study 1) and followed-account-based analyses (Study 2), but keep in mind that these are mostly the same participants. Study 1 participants ranged in age from 14 to 68 with an average age of 27.12. Study 2 participants ranged in age from 14 to 68 with an average age of 26.85.

Table 1
Participant Gender for Study 1 and Study 2 Samples

  Gender                 n_S1   n_S2
  Female                  404    505
  Male                    673    746
  Other                    12     13
  Unknown/not reported     12     15

Note. Targets for Study 3 were also drawn from these samples, but their demographic information is provided in the Study 3 Methods section. The majority of participants provided data for Studies 1 and 2.

Table 2
Participant Race for NIMH and NSF Samples

  Race                                 n_S1   n_S2
  American Indian / Alaskan Native        8      9
  Asian                                 135    147
  Black / African American               70     86
  More than 1 race                       92    107
  Native Hawaiian / Pacific Islander      1      1
  White                                 783    913
  Unknown / not reported                 12     16

Note. Targets for Study 3 were also drawn from these samples, but their demographic information is provided in the Study 3 Methods section. The majority of participants provided data for Studies 1 and 2.

Table 3
Participant Ethnicity for NIMH and NSF Samples

  Ethnicity               n_S1   n_S2
  Hispanic/Latino          138    154
  Not Hispanic/Latino      951   1110
  Unknown/not reported      12     15

Note. Targets for Study 3 were also drawn from these samples, but their demographic information is provided in the Study 3 Methods section. The majority of participants provided data for Studies 1 and 2.

Measures. Both samples completed self-reports of the Big Six personality domains using a combination of two instruments. The Big Five (extraversion, agreeableness, conscientiousness, negative emotionality, and openness) were measured using the Big Five Inventory 2 (Soto & John, 2017b), which consists of 60 short statements rated on a scale from one (disagree strongly) to five (agree strongly) with a neutral point of three (neither agree nor disagree). We used eight items from the Questionnaire Big Six family of measures to measure the sixth domain, honesty-propriety (Thalmayer & Saucier, 2014), rated on the same scale. These scales showed adequate internal consistency, with alphas ranging from a low of .64 for honesty-propriety to a high of .92 for neuroticism. NSF participants completed additional measures relevant to Study 3, described in its Methods section below.

Analyses. Unless otherwise noted, all analyses were conducted in R (Version 4.0.2; R Core Team, 2019) and the R packages broom.mixed (Version 0.2.6; Bolker & Robinson, 2020), caret (Version 6.0.86; Kuhn et al., 2019), dplyr (Version 0.8.5; Wickham et al., 2019), forcats (Version 0.5.0; Wickham, 2019a), ggplot2 (Version 3.3.1; Wickham, 2016), igraph (Version 1.2.5; Csardi & Nepusz, 2006), lattice (Version 0.20.41; Sarkar, 2008), lavaan (Version 0.6.7; Rosseel, 2012), lme4 (Version 1.1.23; Bates, Mächler, Bolker, & Walker, 2015), lmerTest (Version 3.1.2; Kuznetsova, Brockhoff, & Christensen, 2017), Matrix (Version 1.2.18; Bates & Maechler, 2019), papaja (Version 0.1.0.9942; Aust & Barth, 2018), purrr (Version 0.3.4; Henry & Wickham, 2019), quanteda (Version 2.0.1; Benoit et al., 2018), readr (Version 1.3.1; Wickham, Hester, & Francois, 2018), rio (Version 0.5.16; Chan, Chan, Leeper, & Becker, 2018), RSA (Version 0.10.0; Schönbrodt & Humberg, 2018), shiny (Version 1.4.0.2; Chang, Cheng, Allaire, Xie, & McPherson, 2019), stringr (Version 1.4.0; Wickham, 2019b), tibble (Version 3.0.1; Müller & Wickham, 2019), tidyr (Version 1.1.0; Wickham & Henry, 2019), and tidyverse (Version 1.3.0; Wickham, 2017).

II. STUDY 1: PREDICTING PERSONALITY FROM TWEETS

Study 1 examines computerized judgments made from the language people share online in their tweets.
In the first of two aims (Aim 1a), I examine the extent to which tweets can be used to predict self-reported personality, using a cross-validated machine learning approach. This will combine unsupervised machine learning methods for data reduction and supervised machine learning techniques to predict self-reports (from these reduced data). I'll evaluate tweet-based models in terms of their ability to predict self-reports of new users (from only their tweets), and the extent to which the models are consistent with theoretical understandings of the predicted constructs. In the second aim (Aim 1b), I'll examine the extent to which how people use Twitter affects predictive accuracy, examining both number of tweets and number of followed accounts. This study will provide insight into how personality relates to what people talk about online, how accurately we can infer personality from online language, and the extent to which this depends on how people engage with the platform.

Methods

Samples & Procedure. Study 1 used all eligible participants from both the NIMH and NSF samples who completed Big Six personality measures and for whom we were able to successfully retrieve tweets. This resulted in a total sample of N_combined-tweets = 1101 (see Tables 1 to 3 for participant gender, race, and ethnicity).

Analytic Procedure. In Aim 1a, I predicted personality from the language in users' tweets using a procedure designed to minimize overfitting and data leakage in estimating predictive accuracy, while also providing insight into how different analytic decisions (e.g., scoring with dictionaries vs. vector embeddings) affect predictive accuracy.

Data Partitioning. We first split the final sample (N = 1101) into a training and holdout (testing) set using the caret package in R (Kuhn et al., 2019). The training and holdout samples consisted of approximately 80% (n_training = 882) and 20% (n_holdout = 219) of the data, respectively. All feature selection, data reduction, model training, estimation, and selection were determined from the training data. The final model(s), trained and selected within the training data, were tested on the holdout sample to get an unbiased estimate of out-of-sample accuracy.

Preparing & Pre-processing Tweets. Tweets were tokenized into individual words and short (two-word) phrases using an emoji-aware tokenizer from the quanteda package in R (Benoit et al., 2018). Then, they were scored using three techniques: dictionaries, open vocabulary, and vector embeddings.

Tweets were scored using the 2003 version of the LIWC dictionary (Tausczik & Pennebaker, 2010), and a sentiment and emotion dictionary designed for and validated with tweets (Mohammad & Kiritchenko, 2015). LIWC scores are proportions of words from each category relative to the total number of words in users' tweets. The sentiment and emotion dictionaries have continuous scores for each term in their dictionary; sentiment scores in this dictionary theoretically range from negative infinity (maximally negative sentiment) to positive infinity (maximally positive sentiment), and emotion scores range from 0 (not relevant to emotion label) to positive infinity (maximally relevant to emotion label). Each participant received a single score for sentiment and each of the eight specific emotions, corresponding to the average score across all the words in their downloadable tweet history (e.g., the average anger score across every word in their downloadable tweet history).
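To make the dictionary scoring concrete, here is a minimal sketch using quanteda (the package used for tokenization here). The tiny two-category dictionary and the example documents are hypothetical stand-ins for the licensed LIWC 2003 dictionary and participants' full tweet histories.

    # Minimal sketch of dictionary-based proportion scoring with quanteda.
    # The dictionary below is a stand-in for LIWC 2003 (69 categories);
    # documents and object names are illustrative.
    library(quanteda)

    tweet_histories <- c(user_a = "i am happy and my day was great",
                         user_b = "we are sad about the news")

    toks <- tokens(tweet_histories)

    liwc_like <- dictionary(list(
      posemo = c("happy", "great"),  # positive emotion words
      i      = c("i", "me", "my")    # first person singular pronouns
    ))

    # Count dictionary words per user, then convert counts to proportions
    # of each user's total word count, matching the LIWC scoring described
    # above
    counts <- dfm_lookup(dfm(toks), dictionary = liwc_like)
    props  <- as.matrix(counts) / ntoken(toks)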
Open-vocabulary analyses included two types of features: (1) individual words and two-word phrases, and (2) topics extracted using Latent Dirichlet Allocation (LDA; Blei, 2012), a data reduction technique that extracts topics based on the extent to which words co-occur across documents (tweet histories, in this case). There were 4.8 million words and two-word phrases in the training users' tweets, which is far beyond what is computationally feasible or efficient. After some trial and error, we limited individual words and two-word phrases to those used at least once by 25% of the training sample; this reduced the number of individual words and phrases to 3,060. We then scored individual words and phrases as proportions, such that each score represents a word's or phrase's frequency in a user's tweets relative to their total number of words (across all tweets). We performed LDA topic modeling on single words only, using a more generous threshold of 1% (i.e., words had to be used at least once by 1% of our participants to be included in the topic models), and extracted 300 topics. LDA topic modeling results in a continuous score for each word in the corpus on each extracted topic, corresponding to the word's probability of belonging to that topic, analogous to a factor loading for each item in a multi-dimensional scale. Each participant's full tweet history was scored for topics using these continuous scores, analogous to factor-scoring a set of items based on their loadings.

Tweets were also scored with (pre-trained) vector embeddings from two different approaches. GloVe word embeddings trained on tweets by Pennington et al. (2014) were downloaded from their website (https://nlp.stanford.edu/projects/glove/) and applied to participants' tweets. Likewise, word vectors derived from fastText character embeddings trained by Mikolov, Grave, Bojanowski, Puhrsch, and Joulin (2017) were downloaded from their website (https://fasttext.cc/docs/en/english-vectors.html). Then, word vectors were averaged within participants, resulting in a single score per vector dimension for each participant; though this technique is less optimal than training a weighted model, it works reasonably well for short texts and circumvents the need for large training data sets. This resulted in 500 vector scores corresponding to the 200 GloVe and 300 fastText dimensions.
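As a concrete illustration of the vector-score computation just described, the following is a minimal sketch in R. It assumes a pre-trained embedding matrix `embeddings` (vocabulary rows by embedding dimensions, e.g., 200 GloVe dimensions) and a list `user_tokens` of token vectors per user; both objects are hypothetical stand-ins.

```r
# Average pre-trained word vectors across all of a user's tokens,
# yielding one score per embedding dimension per user
embed_user <- function(tokens, embeddings) {
  hits <- tokens[tokens %in% rownames(embeddings)]  # drop out-of-vocabulary tokens
  if (length(hits) == 0) return(rep(NA_real_, ncol(embeddings)))
  colMeans(embeddings[hits, , drop = FALSE])        # frequency-weighted average
}

vector_scores <- t(vapply(user_tokens, embed_user,
                          FUN.VALUE = numeric(ncol(embeddings)),
                          embeddings = embeddings))
```

This unweighted averaging discards word order and any learned weighting, which is the trade-off noted above against training a weighted model.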
Model training. Dictionary scores, word and phrase proportion scores, topic scores, and vector scores were included as predictors (features) in predictive models using two different approaches. Each personality trait was modeled separately, and so the model trained and selected for one construct (e.g., extraversion) could differ in every respect (approach, hyperparameters, parameters) from the model trained and selected for another construct (e.g., conscientiousness). All models were trained, tuned, and evaluated (within-training evaluation) using k-fold cross-validation, which splits the data into k random subsets called folds, trains the model on k − 1 folds, and tests the model's performance on the kth fold; this is repeated until each fold has served as the test fold. We set k to 10, which is commonly recommended (Kosinski et al., 2016). This procedure is an efficient means of reducing overfitting during model training and selection (Yarkoni & Westfall, 2017).

Linguistic Feature Selection. We trained models on different subsets of linguistic features. There were five sets of features in total: (1) dictionary-based scores, (2) all open-vocabulary features (words, phrases, and topics), (3) topic scores, (4) vector scores from GloVe and fastText word embeddings, and (5) all of the features combined (dictionaries, open-vocabulary features, and vector scores).

Modeling Approaches. I compared two different modeling approaches, Ridge Regression and Random Forests, each described in greater detail below.

Mirroring Park et al.'s (2015) approach to predicting personality from Facebook status updates, I trained models predicting self-reported Big Six personality scores from linguistic features with ridge regression. Ridge regression is a penalized regression model that minimizes the sum of squared errors plus the L2 penalty, the sum of squared coefficient values, \(\lambda \sum_{j=1}^{p} \beta_j^2\), where \(\lambda\) is a scaling parameter that determines the weight of the penalty. This has the effect of shrinking coefficients closer to zero. Ridge can provide relatively interpretable solutions when predictors are uncorrelated, but can be misleading in the face of correlated predictors.

The second approach was the Random Forests algorithm. Random Forests works by iteratively taking a subset of observations (or cases) and predictors, building a regression tree (i.e., a series of predictor-based decision rules to determine the value of the outcome variable) with that subset of predictors and observations, and averaging across the iterations. It is thus an ensemble method, which avoids overfitting by averaging across many models trained on different subsets of participants and features. It works well with sparse predictors (Kuhn & Johnson, 2013), making it a promising candidate for tweet-based predictions, especially with the sparser feature sets (e.g., word and phrase proportions). Like ridge regression, interpretation can be difficult in the presence of correlated predictors, though the permutation importance metric (used here) is relatively robust to correlated predictors (Genuer, Poggi, & Tuleau-Malot, 2010).

Model selection. As mentioned above, all models were trained using the training data, and each model's training performance was indexed via root mean squared error (RMSE) and the multiple correlation (R) from 10-fold cross-validation. Although machine learning approaches tend to prioritize predictive accuracy over interpretability (Yarkoni & Westfall, 2017), we aimed to maximize both to the extent possible. As such, we based our model selection on both (quantitative) model performance criteria (minimal RMSE, maximal multiple R) and (qualitative) interpretability. Note that in addition to the RMSE/R of the best-performing model, we also considered the spread of training results (e.g., we might choose a model that did not have the best single performance if it had less variability in performance).

Model evaluation. We selected our candidate models based on the training data, completed an interim registration of our model selection (available at https://osf.io/4xbcd/?view_only=2916632373d3410bbf02f94650e50b1d), and then tested the selected models' accuracy using the (held-out) test data. To guard against overfitting, we selected one candidate model per outcome variable. In addition to our candidate models, we tested the out-of-sample accuracy of the non-selected models as exploratory analyses, but we clearly distinguish selected from non-selected models (which can be verified in our registration). This provides an estimate of accuracy that is unbiased by selection (accuracy from selected models) as well as some insight into the extent to which our selection process resulted in the best model.
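A minimal sketch of this training setup with caret in R follows. The feature matrix `train_features` and the outcome are hypothetical stand-ins, and the tuning grids are illustrative rather than the exact grids searched here.

```r
library(caret)

ctrl <- trainControl(method = "cv", number = 10)  # 10-fold cross-validation

# Ridge regression: glmnet with alpha = 0 is a pure L2 penalty
ridge_fit <- train(x = train_features, y = training$extraversion,
                   method = "glmnet", metric = "RMSE", trControl = ctrl,
                   tuneGrid = expand.grid(alpha = 0,
                                          lambda = 10^seq(-3, 2, length.out = 20)))

# Random forests via ranger; the tuning parameters correspond to the
# mtry, splitrule, and MNS columns reported in Table 4
rf_fit <- train(x = train_features, y = training$extraversion,
                method = "ranger", metric = "RMSE", trControl = ctrl,
                importance = "permutation",   # passed through to ranger
                tuneGrid = expand.grid(mtry = c(2, 39, 77),
                                       splitrule = c("variance", "extratrees"),
                                       min.node.size = 5))

rf_fit$results   # cross-validated RMSE and Rsquared per specification
```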
Aim 1b: moderator analyses. After selecting the models and evaluating them on the holdout set, we used the tweet-based predicted personality scores for all 1,101 participants in a series of OLS moderated multiple regressions. In these analyses, actual self-reported personality scores were regressed on tweet-based scores, number of tweets (or followed accounts), and their interaction, with a significant interaction indicating an effect of number of tweets (followed accounts) on tweet-based predictive accuracy. Each of the Big Six personality domains was examined separately, resulting in 12 total moderator analyses.

Results

Aim 1a: Predictive Accuracy. Below I describe our results from model training, which models we selected for the holdout dataset, and how accurate the selected and non-selected models were in the holdout dataset.

Model Training & Selection. First, I examined the accuracy with which each combination of feature set and modeling approach could predict self-reported Big Six domains, focusing on the average R and RMSE for predicting the holdout folds in the 10-fold cross-validation procedure. Figure 1 shows the average R (Panel A) and RMSE (Panel B) for each combination of feature set (y-axes) and modeling approach (color); each dot represents the average R and RMSE for one set of hyperparameters, and the bar represents the average (of average Rs or RMSEs) across hyperparameter specifications. Big Six domains are shown in separate panels, indicated by the first letter of the domain name. Note that some specifications of ridge with LDA topics are omitted from the RMSE plot because they were an order of magnitude greater and beyond the limits set on the x-axis.

Figure 1 demonstrates that personality can be predicted from linguistic features of tweets with at least some degree of accuracy using different combinations of features, modeling approaches, and hyperparameter specifications. Moreover, it is apparent in Figure 1 that Random Forests outperformed ridge with only a few exceptions. Accuracy was relatively similar across feature sets, with the possible exception of LDA topics (on their own), which tended to be less accurate across domains. This is somewhat surprising given that the dictionary models used 77 predictors and the "all" models used over 3,000 predictors.

Figure 1. K-Fold CV Accuracy for Predicting Personality from Tweets (All Model Specifications).

Figure 2 shows these same metrics for the best hyperparameter specification per modeling approach and set of features, and paints a similar picture: accuracy was considerably higher for random forests, and there was little difference between feature sets. Though feature sets had only marginal differences in accuracy, dictionaries were best for agreeableness, and using all features simultaneously was best for the other five domains.

Figure 2. K-Fold CV Accuracy for Predicting Personality from Tweets (Best Model Specifications).

Interpretability. Judging models strictly by accuracy, random forests achieved greater accuracy and there was little differentiation among feature sets. Interpretability proved helpful in this case, as models did differ in terms of how apparently consistent with prior theory they were.
With the exception of the notably difficult-to-interpret vector embeddings, models trained with different feature sets all had some degree of consistency with prior theory. For example, the two-word phrase "thanks-much" was one of the most important predictors of agreeableness in the model trained with all features, the swear words category from LIWC was a highly important predictor of conscientiousness in the models trained with dictionaries, "anxieti-" stemmed words were the most important predictor of neuroticism in models trained with open-vocabulary features, tagging users was one of the most important predictors of extraversion in the models trained with LDA topics, and the word stem "creativ" was one of the most important predictors of openness using either open-vocabulary or all features in training. However, the models that generally stood out in terms of interpretability were those trained with dictionary scores, which are described in greater detail next.

Figure 3 shows the permutation importance scores (from the best-fitting random forests model) of the dictionary categories in predicting agreeableness. To ease interpretation, bars are colored based on whether they are positive (blue) or negative (red) zero-order correlates, though it is important to keep in mind that their role in the random forests prediction algorithm may be less straightforward (e.g., not linear and additive). Dictionaries include both LIWC and NRC sentiment and emotion scores; NRC sentiment and emotion scores are all prefixed with "m_", which can help differentiate the two dictionaries. It seems that the model is picking up on theoretically relevant content, including swear words, LIWC's negative emotion category (e.g., abandon, abuse), LIWC's anger category (e.g., aggressive, agitate), LIWC's positive feelings (e.g., adore, agree*), inclusive words (e.g., altogether), negations (e.g., can't, don't), and other theoretically relevant content. Together, this seems to capture agreeableness's core content of interpersonal warmth vs. antagonism.

Figure 4 shows the same information for conscientiousness, where important features include NRC's anger category, swear words, time words (e.g., age, hour, day), sexual words, NRC's sentiment score, school-related words, negative emotions, pronouns, and leisure activity. These features may reflect aspects of conscientiousness like industriousness, punctuality, and impulsivity.

Figure 5 shows the same information for honesty-propriety, where important categories included negative emotion, school, leisure activities, metaphysics (e.g., bless, angels), LIWC's anger category, sexual words, positive emotions, and third-person pronouns (labeled "Other") and second-person singular pronouns (labeled "You"). Interestingly, it overlaps somewhat with agreeableness and conscientiousness, but also seems to be picking up on some core moral content with the metaphysics category.

Figure 6 shows the same information for neuroticism, where important categories included core affective content: negative emotions like NRC anger, LIWC anxiety, NRC disgust, and NRC sadness; positive emotion content such as anticipation and surprise; and sentiment (which ranges from negative to positive). Important categories also included time, friends, pronouns, the up category (e.g., high, on, top), and other indirectly relevant content.
Figure 7 shows the same information for extraversion, where important categories included NRC anger, school, discrepancies (e.g., should, would, could), third-person pronouns (labeled "Other"), optimism, exclusive words (e.g., but, without), humans (e.g., boy, woman, adult), tentativeness (e.g., anyhow, ambiguous), and causation (e.g., because, affected). This model was harder to interpret than the others, but it did seem to pick up on an assertiveness vs. tentativeness theme. It is worth noting that extraversion is one case in which the open-vocabulary features seemed to pick up on relevant themes, with highly important words referring to more mainstream or niche cultural interests (sports-related words like team and draw vs. anime).

Figure 8 shows the same information for openness, where important word categories included occupation-related words (accomplish, advance, administration), communication words (e.g., admit, suggest, informs), school words, hearing words (e.g., listening, speaking), insight words (e.g., analyze, understand, wonder), negative emotions, optimism, music, achievement, and other relevant content. This might correspond to pursuing and expressing intellectual and aesthetic interests on Twitter, behavior highly characteristic of high openness.

Selected models. The choice of algorithm was a simple one here: random forests showed consistently greater accuracy in training than ridge regression, and importance scores mapped onto theoretically consistent themes for each domain. Selecting a feature set was more challenging, given the similarity in accuracy achieved with different feature sets. Consequently, we used interpretability as a guiding principle in this selection process and ultimately selected the dictionary-based models. The dictionary-based models were either the most accurate (agreeableness) or a close second or third (difference in Rs ≤ .1), and were often more interpretable than the alternatives. Within this selection, RMSE and R agreed with respect to the most accurate set of hyperparameters, and so we selected these specifications as our final models. The specifications for these final, selected models are shown (alongside corresponding accuracy estimates) in Table 4.

Figure 3. Importance Scores from Random Forests Predicting Agreeableness with Dictionary Scores

Figure 4. Importance Scores from Random Forests Predicting Conscientiousness with Dictionary Scores

Figure 5. Importance Scores from Random Forests Predicting Honesty-Propriety with Dictionary Scores

Figure 6. Importance Scores from Random Forests Predicting Neuroticism with Dictionary Scores

Figure 7. Importance Scores from Random Forests Predicting Extraversion with Dictionary Scores

Figure 8. Importance Scores from Random Forests Predicting Openness with Dictionary Scores

Table 4
Specifications for Selected Models for Predicting Personality from Tweets

domain              Modeling approach   mtry   MNS   splitrule     R      RMSE
agreeableness       Random Forests        2     5    variance     0.19    0.58
conscientiousness   Random Forests       39     5    extratrees   0.21    0.71
honesty             Random Forests        2     5    extratrees   0.22    0.57
neuroticism         Random Forests       77     5    variance     0.35    0.84
extraversion        Random Forests       77     5    extratrees   0.22    0.77
openness            Random Forests        2     5    variance     0.23    0.61

Note. The feature set used in the selected models was the dictionary scores. mtry and MNS are hyperparameter specifications. mtry corresponds to how many predictors the algorithm samples to build each tree in the forest. MNS stands for minimum node size and corresponds to the minimum number of observations in each node, meaning the algorithm will not create a split in the data for fewer observations than MNS.
Model Evaluation. I next evaluated the models by assessing their accuracy in predicting self-reported personality scores in the holdout data. Correlations between predicted scores derived from the trained models and observed scores for the holdout data are shown for both selected (triangles) and non-selected (circles) models in Figure 9. As Figure 9 shows, the model selection procedure did not lead to choosing the model with the highest or nearly highest out-of-sample accuracy. Indeed, the selected model never had the highest R, though it was very close to the highest R for openness and conscientiousness. For the other four domains, it was quite a bit lower than non-selected alternatives and even among the lowest for some domains. Importantly, the accuracy estimates from non-selected models should be taken with a grain of salt; all of these estimates are subject to fluctuation, and taking the non-selected models' accuracy at face value undermines the principle behind using a separate evaluation set, namely, estimating accuracy removed from a further (biasing) selection effect. Moreover, the differences in correlations are not large, and are similar to differences seen in training (differences of approx. .1 or less). Even still, these results may suggest that larger and less restrictive feature sets (e.g., open-vocabulary) are better suited to some domains, perhaps especially when predicting true holdout data.

Figure 10 shows the estimates for selected models compared to the predictive accuracy of models predicting personality from Facebook status updates in Park and colleagues' (2015) study. Tweets predicted conscientiousness, neuroticism, and openness with moderate accuracy, and honesty, agreeableness, and extraversion with little accuracy. Moreover, tweet-based predictive accuracy tended to be lower than the Facebook-status-based counterparts, which could stem from tweets' shorter length, how that length constrains the text (e.g., increased use of slang), or social norms governing what people post on Facebook vs. Twitter.

With the exceptions of agreeableness and extraversion, Big Six personality domains were at least somewhat predictable from tweets. However, it is not clear whether the models are picking up on distinctive information about each domain (e.g., how conscientiousness specifically is reflected in tweets) or some more general information relevant across domains (e.g., how general positivity is reflected in tweets). To speak to these competing possibilities, I first examined the intercorrelations between predicted Big Six domains, which can be seen in Table 5. Correlations between domains were generally stronger among tweet-based predictions than among (observed) self-reported scores, but the pattern of correlations was generally similar. One exception was that, among predicted scores, openness was positively correlated with neuroticism and negatively correlated with conscientiousness, whereas these domains are basically uncorrelated among self-reports. These higher intercorrelations suggest that predicted scores may indeed be picking up on more general information, rather than information specific to each domain.
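For concreteness, the holdout evaluation described at the start of this subsection amounts to a few lines of R, continuing from the hypothetical objects in the earlier training sketch:

```r
# Predict holdout self-reports from the selected model, then index
# out-of-sample accuracy with the correlation (R) and RMSE
holdout_pred <- predict(rf_fit, newdata = holdout_features)
cor(holdout_pred, holdout$extraversion)    # out-of-sample R
RMSE(holdout_pred, holdout$extraversion)   # caret's RMSE helper
```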
Next, I more directly assessed the specificity of tweet-based predictions by regressing each observed domain on all of the predicted scores simultaneously. If personality domains are distinctly reflected in tweets, we should see a significant slope for the matching predicted score and non-significant (near-zero) slopes for the non-matching predicted scores. The results from these regression analyses are shown in Figure 11, where it is apparent that models picked up on distinctive information for openness and conscientiousness, but not so much for the others, which showed less accuracy to begin with (see Figure 10). Together, the results suggest that openness and conscientiousness are reliably and distinctly reflected in the language people use on Twitter, with the other four domains being generally more difficult to predict (agreeableness, extraversion) or more difficult to predict distinctly (honesty, neuroticism).

Figure 9. Out-of-sample Accuracy (R) for Selected and Non-Selected Tweet-Based Predictive Models

Figure 10. Out-of-sample Accuracy (R) for Tweet-Based Predictions Compared to Facebook Status Updates

Table 5
Correlations Between Tweet-Based Predictions and Observed Big Six Scores

1. Obs. A   —
2. Obs. C   .27** [.14, .39]
3. Obs. H   .38** [.26, .49], .42** [.31, .52]
4. Obs. N   -.20** [-.33, -.07], -.49** [-.59, -.39], -.11 [-.24, .02]
5. Obs. E   .15* [.02, .28], .22** [.09, .34], -.27** [-.39, -.14], -.38** [-.49, -.26]
6. Obs. O   .31** [.19, .43], .12 [-.01, .25], .08 [-.05, .21], -.07 [-.20, .06], .29** [.17, .41]
7. Pred. A  .09 [-.05, .22], .11 [-.02, .24], .12 [-.01, .25], -.03 [-.16, .10], -.06 [-.19, .07], .03 [-.11, .16]
8. Pred. C  -.03 [-.16, .11], .24** [.12, .36], .04 [-.09, .17], -.25** [-.37, -.12], -.01 [-.14, .12], -.13 [-.25, .01], .45** [.34, .55]
9. Pred. H  .08 [-.05, .21], .12 [-.01, .25], .16* [.03, .29], -.09 [-.22, .05], -.06 [-.19, .07], .04 [-.09, .17], .71** [.64, .77], .50** [.39, .59]
10. Pred. N .07 [-.06, .20], -.17* [-.29, -.04], .02 [-.11, .15], .21** [.08, .34], -.04 [-.17, .09], .12 [-.01, .25], -.30** [-.42, -.18], -.68** [-.74, -.60], -.21** [-.33, -.08]
11. Pred. E -.02 [-.15, .11], -.03 [-.16, .10], -.05 [-.18, .08], .01 [-.12, .14], .09 [-.04, .22], .03 [-.11, .16], .06 [-.07, .20], .29** [.17, .41], -.11 [-.24, .02], -.44** [-.54, -.33]
12. Pred. O .04 [-.09, .17], -.21** [-.33, -.08], -.05 [-.19, .08], .29** [.17, .41], -.05 [-.18, .08], .21** [.08, .33], -.05 [-.18, .09], -.33** [-.44, -.21], -.14* [-.27, -.01], .27** [.14, .39], .33** [.21, .45]

Note. Each row lists correlations with variables 1 through (row − 1), in order, with 95 percent CIs enclosed in brackets. Pred. are tweet-based predictions and Obs. are (observed) self-reports. *p < .05; **p < .01; ***p < .001.

Figure 11. Results from Regressing Observed Big Six on All Tweet-Based Scores Simultaneously (standardized slopes, β, for same-domain vs. different-domain predictors).

Aim 1b: Does activity moderate tweet-based accuracy? I next examined the extent to which tweet-based predictive accuracy was moderated by how often individuals tweet and how many accounts they follow, by regressing self-reported Big Six scores on tweet-based predicted scores (from the selected models), number of tweets (or followed accounts), and the interaction term.
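A minimal sketch of one such moderated regression in R follows, assuming a hypothetical data frame with observed scores, predicted scores, and activity counts; the outcome and predicted score are standardized so slopes land in the beta units reported below, and the activity moderator is grand-mean-centered:

```r
dat <- transform(full_sample,   # hypothetical combined-sample data frame
                 obs_z    = as.numeric(scale(obs_openness)),
                 pred_z   = as.numeric(scale(pred_openness)),
                 tweets_c = as.numeric(scale(n_tweets, scale = FALSE)))

fit <- lm(obs_z ~ pred_z * tweets_c, data = dat)
summary(fit)  # a reliable pred_z:tweets_c term would indicate that
              # predictive accuracy depends on how much the person tweets
```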
The standardized results from these models are shown in Table 6, which shows that all of the moderator effects were small and statistically indistinguishable from zero. Tweet-based predictive accuracy does not seem to depend on how much a person tweets or how many accounts they follow, assuming they meet the minimum activity threshold(s) of our sample.

Table 6
Tweet-Based Predictive Accuracy Moderated by Activity

domain              moderator   term                         estimate   SE      t       p       CI LL   CI UL
agreeableness       followeds   Intercept                     -0.01     0.02   -0.54    .588    -0.04    0.03
agreeableness       followeds   predicted                      0.83     0.02   45.66   < .001    0.79    0.86
agreeableness       followeds   num. followed                  0.00     0.02   -0.24    .814    -0.04    0.03
agreeableness       followeds   predicted * num. followed     -0.04     0.03   -1.42    .157    -0.09    0.01
conscientiousness   followeds   Intercept                      0.00     0.02    0.13    .895    -0.03    0.03
conscientiousness   followeds   predicted                      0.86     0.02   54.18   < .001    0.83    0.90
conscientiousness   followeds   num. followed                 -0.02     0.02   -1.16    .245    -0.07    0.02
conscientiousness   followeds   predicted * num. followed     -0.01     0.02   -0.85    .393    -0.05    0.02
honesty             followeds   Intercept                     -0.02     0.02   -0.92    .357    -0.05    0.02
honesty             followeds   predicted                      0.84     0.02   46.64   < .001    0.80    0.87
honesty             followeds   num. followed                 -0.04     0.02   -1.76    .079    -0.08    0.00
honesty             followeds   predicted * num. followed     -0.03     0.02   -1.80    .072    -0.07    0.00
neuroticism         followeds   Intercept                      0.00     0.02    0.25    .801    -0.03    0.04
neuroticism         followeds   predicted                      0.84     0.02   49.35   < .001    0.80    0.87
neuroticism         followeds   num. followed                  0.03     0.02    1.35    .179    -0.01    0.08
neuroticism         followeds   predicted * num. followed     -0.01     0.03   -0.30    .764    -0.06    0.04
extraversion        followeds   Intercept                      0.00     0.02    0.13    .893    -0.03    0.03
extraversion        followeds   predicted                      0.86     0.02   53.59   < .001    0.83    0.90
extraversion        followeds   num. followed                  0.00     0.02    0.25    .803    -0.03    0.04
extraversion        followeds   predicted * num. followed     -0.01     0.02   -0.25    .800    -0.05    0.04
openness            followeds   Intercept                      0.01     0.02    0.33    .740    -0.03    0.04
openness            followeds   predicted                      0.80     0.02   42.35   < .001    0.76    0.84
openness            followeds   num. followed                  0.01     0.02    0.39    .698    -0.03    0.04
openness            followeds   predicted * num. followed     -0.01     0.02   -0.40    .686    -0.04    0.03
agreeableness       tweets      Intercept                     -0.01     0.02   -0.60    .546    -0.05    0.02
agreeableness       tweets      predicted                      0.83     0.02   45.42   < .001    0.79    0.87
agreeableness       tweets      num. of tweets                 0.03     0.02    1.33    .183    -0.01    0.06
agreeableness       tweets      predicted * num. of tweets     0.00     0.02   -0.13    .899    -0.05    0.04
conscientiousness   tweets      Intercept                      0.01     0.02    0.41    .683    -0.02    0.04
conscientiousness   tweets      predicted                      0.87     0.02   54.11   < .001    0.84    0.90
conscientiousness   tweets      num. of tweets                 0.03     0.02    1.61    .107    -0.01    0.06
conscientiousness   tweets      predicted * num. of tweets     0.03     0.02    1.94    .053     0.00    0.07
honesty             tweets      Intercept                     -0.02     0.02   -0.90    .367    -0.05    0.02
honesty             tweets      predicted                      0.83     0.02   46.19   < .001    0.80    0.87
honesty             tweets      num. of tweets                 0.02     0.02    0.91    .361    -0.02    0.05
honesty             tweets      predicted * num. of tweets    -0.01     0.03   -0.25    .802    -0.06    0.04
neuroticism         tweets      Intercept                      0.00     0.02   -0.01    .988    -0.03    0.03
neuroticism         tweets      predicted                      0.84     0.02   49.06   < .001    0.81    0.88
neuroticism         tweets      num. of tweets                -0.03     0.02   -1.16    .247    -0.07    0.02
neuroticism         tweets      predicted * num. of tweets     0.02     0.02    1.26    .209    -0.01    0.06
extraversion        tweets      Intercept                      0.00     0.02    0.18    .860    -0.03    0.03
extraversion        tweets      predicted                      0.87     0.02   53.77   < .001    0.83    0.90
extraversion        tweets      num. of tweets                 0.04     0.02    2.41    .016     0.01    0.07
extraversion        tweets      predicted * num. of tweets     0.01     0.02    0.52    .603    -0.02    0.04
openness            tweets      Intercept                      0.01     0.02    0.29    .769    -0.03    0.04
openness            tweets      predicted                      0.80     0.02   42.35   < .001    0.76    0.83
openness            tweets      num. of tweets                 0.00     0.02    0.03    .975    -0.04    0.04
openness            tweets      predicted * num. of tweets    -0.04     0.02   -1.65    .100    -0.08    0.01

Note. num. of tweets and num. of followed accounts were grand-mean-centered. CI LL and CI UL are the lower and upper bounds of the 95 percent CI.

Discussion

Our findings indicate that at least some aspects of personality are reflected in the language people use on Twitter, but there is considerable heterogeneity across domains. Conscientiousness and openness could be predicted from tweets accurately and distinctly, honesty and neuroticism showed some accuracy but little distinctiveness, and agreeableness and extraversion showed little of either. Tweet-based predictive models appeared to use features that are consistent both with prior work (Mehl, Gosling, & Pennebaker, 2006; Park et al., 2015; Qiu et al., 2012) and with how the Big Six are thought to manifest in observed behavior. Indeed, inspecting Figures 3 through 8 paints quite the picture: agreeableness corresponding to swearing angrily vs. expressing positivity and inclusivity, conscientiousness corresponding to topics more or less suited to a workplace, honesty corresponding to a metaphysically-tinged blend of agreeableness and conscientiousness, neuroticism corresponding to greater negative affect, and openness corresponding to talking about one's aesthetic and intellectual interests. Extraversion was notably difficult to interpret and had the least in common with prior work, which, along with the low accuracy estimates, suggests that it is more difficult to predict from what people say on Twitter. Finally, tweet-based predictive accuracy appeared to be completely unaffected by how often people tweet or how many accounts they follow. This could suggest that tweet-based predictions are relatively robust to differences in activity above the minimal threshold used here (at least 25 tweets and followed accounts).

Tweets seem to best capture conscientiousness, openness, and neuroticism, as demonstrated by the higher accuracy in predicting them from tweets and the relevance of the features important for predicting these domains. This may reflect some mixture of what Twitter affords its users. Indeed, Twitter offers a place for people to talk about their interests (openness), share their feelings (neuroticism), and exercise restraint or not (conscientiousness), and all of these behaviors create cues that could be easily captured with the techniques used here. Twitter may simply afford fewer opportunities to express one's level of agreeableness, honesty, and extraversion via tweets, but this doesn't seem entirely likely. A second possibility is that these domains manifest in more complex ways and require more sophisticated tools, a possibility highlighted by the slightly greater accuracy achieved with the more complex and open-ended approaches (e.g., open-vocabulary features, topics, embeddings).
Finally, it is worth considering the possibility that these domains, or at least agreeableness and honesty, are harder to predict via tweets because they are highly desirable (John & Robins, 1993), which could lower accuracy either because people tailor their tweets to convey a more positive impression or because self-reports are a poorer reflection of behavior for these more desirable domains (Vazire, 2010). This would be somewhat at odds with the high accuracy seen for openness, one of the most evaluative Big Six domains.

Interestingly, tweet-based predictions seem to capture something broader and more generic than the Big Six, given the high intercorrelations among predicted scores for different domains. The pattern of intercorrelations was generally similar to self-reports, and corresponds roughly to the higher-order Big Two (Digman, 1997; Saucier & Srivastava, 2015), a structure which has been shown to be more robust across diverse personality lexicons (Saucier et al., 2014) and theorized to correspond to core biological systems (DeYoung, 2015). Despite this, conscientiousness and openness were distinctly recoverable from tweets, which is unsurprising given the accuracy with which they can be predicted. For this reason, it is somewhat surprising that neuroticism was not distinctly recoverable from tweets. However, it seems plausible that cues for neuroticism, like negative emotion words, are highly reliable but not very distinctive, and that the algorithms were unable to differentiate between people who tweet negative affect often because they experience it often (high neuroticism) and people who are less able to inhibit the impulse to tweet about it (low conscientiousness). This is consistent with the fair amount of overlap in important features for neuroticism and the other domains, and with the high correlation between predicted neuroticism and other domains, especially conscientiousness (r = -.68). This, coupled with the unexpected positive correlation between predicted neuroticism and predicted openness, suggests that tweet-based predictions of neuroticism were of questionable validity.

III. STUDY 2: PREDICTING PERSONALITY FROM FOLLOWED ACCOUNTS

The aim of Study 2 was to assess computerized personality judgments from outgoing network ties on Twitter (i.e., the accounts that users follow). In the first of Study 2's aims (Aim 2a), I examine the extent to which these outgoing ties, or followed accounts, predict self-reported personality using a cross-validated machine learning approach, testing combinations of unsupervised and supervised machine learning techniques. As with Aim 1a, I compare models in terms of predictive accuracy and interpretability, ultimately seeking a model that can predict self-reports from followed accounts that are theoretically relevant to the construct being predicted. In Aim 2b, I examine how number of tweets and number of followed accounts relate to followed-account-based accuracy. Together, these analyses provide insight into the extent to which individuals' personalities are reflected in the accounts they follow on Twitter, and whether this depends on how they engage with the platform.

Methods

Samples & Procedure. Study 2 was conducted on all of the eligible participants from the NIMH and NSF samples who completed Big Six questionnaires and for whom we were able to successfully download followed-account lists, which resulted in a total sample of Ncombined = 1,277 participants.
Analytic Procedure. Aim 2a consisted of predicting personality from followed accounts, analogously to Aim 1a, using a procedure designed to reduce overfitting and data leakage in estimating predictive accuracy. This consisted of a multi-stage process, detailed next.

Data Partitioning. Like Study 1, we first split the final sample (Ncombined = 1,277) into a training and holdout (testing) set using the caret package in R (Kuhn et al., 2019). The training and holdout samples consisted of roughly 80% (ntraining = 1,023) and 20% (nholdout = 254) of the data, respectively. All feature selection, data reduction, model training, estimation, and selection were determined in the training data. The final models, trained and selected within the training data, were tested on the holdout sample to get an unbiased estimate of out-of-sample accuracy.

Preparing & Pre-processing Followed Accounts. The followed-accounts data were structured as a user-account matrix, where each row was an individual user, each column was a distinct account followed by some user(s) in the sample, and cells were filled in with 1s or 0s indicating whether (1) or not (0) each distinct user followed each distinct account. The total sample of 1,277 users followed 513,634 distinct accounts, which exceeded what is computationally feasible or efficient. Moreover, many of these accounts were followed by so few users as to be of little use in predictive modeling. At the extreme, uniquely followed accounts are effectively zero-variance predictors and therefore useless for most modeling and data reduction techniques. As such, the first step of our model training consisted of minimal feature selection, pruning followed accounts that had few followers in our data, analogous to Kosinski et al.'s (2013) approach to Facebook likes. The optimal threshold for feature selection in these data is not yet known, so we tried three values, eliminating accounts followed by fewer than 3, 4, and 5 of the participants in our data; the minimum of 3 was chosen through extensive exploratory data analysis in similar data sets. Within the training data, removing followed accounts with fewer than 3, 4, or 5 followers reduced the 513,634 distinct followed accounts to 21,436 accounts, 12,884 accounts, and 8,923 accounts, respectively. Thus, the most precipitous drop occurred when going from no threshold to a threshold of 3 followers; each subsequent increase of the threshold cut the number of distinct accounts almost in half. The impact this filtering decision had on predictive accuracy is discussed below.
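A minimal sketch of this pruning step in R; the binary user-by-account matrix `X_train` is a hypothetical stand-in:

```r
# Keep only accounts followed by at least `min_followers` users in the
# training data; uniquely followed accounts are near-zero-variance noise
prune_accounts <- function(X, min_followers) {
  X[, colSums(X) >= min_followers, drop = FALSE]
}

X3 <- prune_accounts(X_train, 3)  # ~21,436 accounts at this threshold (see text)
X4 <- prune_accounts(X_train, 4)  # ~12,884 accounts
X5 <- prune_accounts(X_train, 5)  # ~8,923 accounts
```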
Modeling Approaches. For followed accounts, we compared four different modeling approaches: relaxed LASSO, Random Forests, Supervised Principal Components Analysis (supervised PCA), and two-step Principal Components Regression (PCR) with ridge regularization. Each is described in greater detail below.

Mirroring Youyou et al.'s (2015) approach to predicting personality from Facebook likes, we trained models predicting each personality variable with a variant of LASSO regression on the raw user-account matrix, treating each distinct followed account as a predictor variable. Classic LASSO is a penalized regression model like ridge that minimizes the sum of absolute (instead of squared) beta weights, i.e., the L1 penalty, \(\lambda \sum_{j=1}^{p} |\beta_j|\), where \(\lambda\) is a scaling parameter that determines the weight of the penalty. However, classic LASSO is known to perform poorly in contexts like these, with many noisy predictors (Meinshausen, 2007). Meinshausen (2007) developed relaxed LASSO to overcome this issue by separating LASSO's variable/feature selection function from its regularization (shrinkage) function. Essentially, it runs two LASSO regressions in sequence: the first performs variable selection, selecting k predictors (where k ≤ the total number of predictors j) based on scaling hyperparameter λ, and the second performs a (LASSO) regularized regression with the remaining k variables, shrinking the parameter estimates for the reduced variable set based on scaling hyperparameter φ. Relaxed LASSO, like classic LASSO, can be difficult to interpret when features are correlated, which may or may not be the case with followed accounts in our data.

The second approach was the Random Forests algorithm on the raw user-account matrix, chosen for its ability to build effective models with sparse and noisy predictors (Kuhn & Johnson, 2013). Details on Random Forests can be found above in the Methods section of Study 1.

The third approach was Supervised Principal Components Analysis (sPCA), which first conducts feature selection by eliminating features below some minimum (bivariate) correlation with the outcome variable, and then performs a Principal Components Regression (PCR) with the remaining features; both the minimum correlation threshold and the number of components to extract are traditionally determined via cross-validation (Bair, Hastie, Paul, & Tibshirani, 2006). Interpretation tends to be relatively straightforward, even with correlated predictors, which is why it was selected as a candidate for the present aims.

Finally, mirroring Kosinski et al. (2013), we conducted a two-step PCR with ridge regularization, first conducting an unsupervised sparse PCA on the user-account matrix and then using the resulting (orthogonal) components as predictors in a ridge regression; we extracted the number of components corresponding to 70% of the variance in the original (filtered) user-account matrices. The analysis section of Study 1 provides further detail on ridge regression.

Model training and selection. All models were trained using the training data, and each model's training performance was indexed via root mean squared error (RMSE) and multiple correlation (R) from 10-fold cross-validation. Like Study 1, we aimed to maximize predictive accuracy and interpretability as much as possible.

Model evaluation. As with Study 1, I selected the candidate models based on the training data, completed an interim registration of model selection (available at https://osf.io/x7tnp/?view_only=e16eb14eec714ac285610543b84cc2e1), and then tested the selected models' accuracy using the (held-out) test data. To guard against overfitting, I selected one candidate model per outcome variable, while also testing the out-of-sample accuracy of the non-selected models as exploratory analyses, distinguishing selected from non-selected models (which can be verified in our registration).

Aim 2b: moderator analyses. After selecting the models and evaluating them on the holdout set, we used the followed-account-based predicted personality scores for all 1,277 participants in a series of OLS moderated multiple regressions. In these analyses, actual self-reported personality scores were regressed on followed-account-based scores, number of tweets (or followed accounts), and their interaction, with a significant interaction indicating an effect of number of tweets (followed accounts) on followed-account-based predictive accuracy. Each of the Big Six personality domains was examined separately, resulting in 12 total moderator analyses.
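Of the four approaches described in this section, relaxed LASSO is perhaps the least familiar. A minimal sketch with glmnet follows (glmnet version 3.0 or later supports relaxed fits, parameterizing the relaxation with γ rather than the φ used above); the pruned matrix `X3`, outcome vector `y`, and holdout matrix `X3_holdout` are hypothetical stand-ins:

```r
library(glmnet)

# relax = TRUE refits the variables selected along the lambda path with
# less shrinkage; gamma mixes the penalized and relaxed fits
cvfit <- cv.glmnet(x = X3, y = y, alpha = 1, relax = TRUE, nfolds = 10)

# Predict with the (lambda, gamma) pair minimizing cross-validated error:
# lambda governs selection, gamma governs the relaxation
preds <- predict(cvfit, newx = X3_holdout,
                 s = "lambda.min", gamma = "gamma.min")
```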
Results

Aim 2a: Predictive Accuracy. Below I describe our results from model training, which models we selected for the holdout dataset, and how accurate the selected and non-selected models were in the holdout dataset. Of the 12 combinations of minimum-follower thresholds and modeling approaches, one combination failed to converge entirely: supervised PCA using followed accounts with at least 3 followers in the data. Thus, the model training and selection results below concern just the 11 other combinations.

Model Training & Selection. First, I examined the accuracy with which each combination of minimum-followers filter and modeling approach could predict self-reported Big Six domains, focusing on the average R and RMSE for predicting the holdout folds in the 10-fold cross-validation procedure. Figure 12 shows the average R (Panel A) and RMSE (Panel B) for each combination of minimum-followers-filter threshold (y-axes) and modeling approach (color); each dot represents the average R and RMSE for one set of hyperparameters, and the bar represents the average (of average Rs or RMSEs) across hyperparameter specifications. Big Six domains are shown in separate panels, indicated by the first letter of the domain name. Note that some specifications of relaxed LASSO are omitted from the RMSE plot because they were an order of magnitude greater and beyond the limits set on the x-axis.

Figure 12 demonstrates that personality can be predicted from followed accounts on Twitter with at least some degree of accuracy using different combinations of feature selection rules, modeling algorithms, and hyperparameter specifications. Moreover, it is apparent in Figure 12 that Random Forests achieved the greatest accuracy (highest R and lowest RMSE) and was relatively robust across hyperparameter specifications (indicated by the tightly clustered dots). Indeed, the worst hyperparameter specifications for Random Forests often outperformed the best specifications of the other algorithms. Figure 13 shows these same metrics for the best hyperparameter specification per modeling approach and minimum-follower-filter threshold; it shows that the best Random Forests model always outperforms the best alternatives, and that the minimum-follower-filter threshold made very little impact. Together, our quantitative criteria unequivocally support Random Forests and further suggest that, within Random Forests, the minimum-follower filter and hyperparameter specifications made little difference.

Figure 12. K-Fold CV Accuracy for Predicting Personality from Followed Accounts (All Model Specifications).

Figure 13. K-Fold CV Accuracy for Predicting Personality from Followed Accounts (Best Model Specifications).

Interpretability. I next considered the interpretability of the models as a criterion for model selection. Interpretability did not strongly differentiate the trained models; each trained model had importance scores or model coefficients for some followed accounts that seemed theoretically relevant to the predicted domain, and for some followed accounts with a less straightforward theoretical connection. Moreover, there was a great deal of overlap in which followed accounts had high model coefficients or importance scores, further highlighting the lack of differentiation according to interpretability.
Domains differed with respect to interpretability, but even domains with less interpretable models (e.g., agreeableness) had some important accounts that seemed theoretically relevant. Reporting the model coefficients and importance values for each subset of followed accounts, predicting each domain, with each modeling approach would be well beyond the scope of this report; it would literally include hundreds of thousands of values. As such, I focus here just on the best-fitting Random Forests models, given their superiority on the more differentiating quantitative criteria. Tables 7 through 12 show the 15 accounts with the highest importance scores from these models, alongside each account's zero-order correlation with the corresponding personality domain.

The 15 most important accounts for predicting agreeableness are shown in Table 7. Agreeableness was perhaps the least straightforward, but the account with the highest importance score – the founder of celebrity news and gossip site TMZ (a negative zero-order correlate) – makes some theoretical sense given the often antagonistic nature of tabloid outlets like TMZ. Otherwise, the list contained a mix of brands (Dove chocolate, PlayStation), celebrities (e.g., Nikolaj Coster-Waldau from HBO's Game of Thrones), and other accounts.

The 15 most important accounts for predicting conscientiousness are shown in Table 8; these tended to be negative zero-order correlates related to entertainment (video games, podcasts) and also included subversive humor accounts (e.g., "notofeminism"), potentially suggesting that lower conscientiousness is expressed by using Twitter for entertainment (rather than work or news), and perhaps especially more subversive entertainment.

The 15 most important accounts for predicting honesty are shown in Table 9. Honesty, like agreeableness, was harder to interpret. Some highly important accounts were associated with more wholesome video games, including the official Pokemon account and the creator of the game "Stardew Valley" ("concernedape"), potentially reflecting a preference for more wholesome media content.

Table 7
15 Most Important Accounts Predicting Agreeableness

followed account   importance       r
harveylevintmz         100.00   -0.12
jhony4942               88.14   -0.11
ossoff                  82.89   -0.03
vizmedia                82.51    0.09
terrydpowell            81.62   -0.11
nikolajcw               79.45    0.09
jaguars                 78.28   -0.02
dovechocolate           77.46    0.03
fancynews24             74.88   -0.13
pierrebouvier           74.78   -0.05
playstation             74.21    0.07
threadless              74.11    0.10
momspark                74.06    0.07
netaporter              73.42    0.09
lootably                72.96   -0.06

Note. Importance scores obtained with the permutation method from the Random Forests model with the highest R (and second lowest RMSE). r corresponds to the zero-order correlation between following that account and self-reported agreeableness.

Table 8
15 Most Important Accounts Predicting Conscientiousness

followed account   importance       r
bts_twt                100.00   -0.14
thezonecast             60.46   -0.14
tobyfox                 55.56   -0.14
wweuniverse             53.02   -0.02
travismcelroy           47.48   -0.13
notofeminism            40.71   -0.13
suethetrex              35.60   -0.06
cia                     33.44    0.10
griffinmcelroy          32.77   -0.11
shitduosays             32.10   -0.05
gselevator              31.50    0.08
louisepentland          30.30   -0.11
zachanner               30.18   -0.10
amazon                  29.20    0.09
usainbolt               28.75    0.10

Note. Importance scores obtained with the permutation method from the Random Forests model with the highest R (and lowest RMSE). r corresponds to the zero-order correlation between following that account and self-reported conscientiousness.
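The importance scores reported in Tables 7 through 12 are permutation importances; with caret and ranger, extracting a top-15 list like these tables takes only a couple of lines. A sketch, reusing the hypothetical `rf_fit` object from the Study 1 training sketch (which requested permutation importance at training time):

```r
# Permutation importance, rescaled so the top account = 100 (as in the tables)
imp <- varImp(rf_fit, scale = TRUE)$importance
head(imp[order(-imp$Overall), , drop = FALSE], 15)
```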
The 15 most important accounts for predicting neuroticism are shown in Table 10; these seemed indirectly related to neuroticism and included several artists known for emotionally evocative music (Taylor Swift, Kid Cudi, Lana Del Rey), American activist/whistleblower Chelsea Manning, and ESPN (a negative zero-order correlate).

The 15 most important accounts for predicting extraversion are shown in Table 11; these included vinecreators (a no-longer-active stream of content from the no-longer-active platform Vine), vinecreators' successor account twittervideo, all-female Korean pop group Loona, postsecret (a site for sharing secrets), an account for a developer that releases content for an anime-inspired rhythm game, ESPN, Kourtney Kardashian, and subversive humor account dril, all of which may suggest that extraversion vs. introversion is reflected in following cultural content that is more mainstream (ESPN, Kourtney Kardashian) vs. niche (anime, k-pop, etc.).

Finally, the 15 most important accounts for predicting openness are shown in Table 12. These were the most straightforward to interpret, with important accounts that include celebrity scientist Neil deGrasse Tyson, comedian Patton Oswalt, the Dalai Lama, musical artists (k-pop band Loona), and the online craft market Etsy, all of which seem to reflect the intellectual and artistic interests characteristic of high openness.

Table 9
15 Most Important Accounts Predicting Honesty

followed account   importance       r
fancynews24            100.00   -0.13
benlandis               74.83    0.07
hughlaurie              73.56    0.13
concernedape            71.29    0.12
badastronomer           71.12    0.12
businessinsider         69.17   -0.09
pokemon                 66.01    0.10
ladygaga                65.23   -0.06
sirpatstew              64.17    0.11
thetweetofgod           60.74    0.06
kanyewest               58.99   -0.11
thesims                 58.90    0.08
chaseiyons              58.75   -0.10
zachlowe_nba            56.91   -0.04
iownjd                  56.88    0.08

Note. Importance scores obtained with the permutation method from the Random Forests model with the second highest R (and lowest RMSE). r corresponds to the zero-order correlation between following that account and self-reported honesty.

Table 10
15 Most Important Accounts Predicting Neuroticism

followed account   importance       r
justinmcelroy          100.00    0.17
taylornation13          87.17    0.10
thezonecast             74.03    0.15
xychelsea               72.20    0.14
espn                    70.06   -0.15
colourpopco             67.26    0.13
lanadelrey              66.20    0.14
kidcudi                 63.34    0.11
griffinmcelroy          63.06    0.15
nickiminaj              60.14    0.08
travismcelroy           58.58    0.15
gilliana                56.26    0.14
notofeminism            51.88    0.13
lin_manuel              50.35    0.14
vinecreators            50.33   -0.06

Note. Importance scores obtained with the permutation method from the Random Forests model with the highest R (and lowest RMSE). r corresponds to the zero-order correlation between following that account and self-reported neuroticism.

Table 11
15 Most Important Accounts Predicting Extraversion

followed account   importance       r
vinecreators           100.00    0.10
postsecret              93.60   -0.08
twittervideo            90.19    0.07
bbcworld                88.21    0.10
loonatheworld           79.90   -0.09
lastweektonight         79.30    0.02
id_536649400            74.52    0.11
taylornation13          71.90   -0.07
espn                    58.82    0.13
rayfirefist             52.14   -0.09
translaterealdt         49.45    0.03
iamjohnoliver           49.14    0.02
askaaronlee             47.67   -0.07
kourtneykardash         47.58    0.07
dril                    45.56   -0.08

Note. Importance scores obtained with the permutation method from the Random Forests model with the highest R (and lowest RMSE). r corresponds to the zero-order correlation between following that account and self-reported extraversion.
Selected Models. Given their superior quantitative performance and sufficient interpretability, we selected random forests as our approach, choosing the minimum-followers threshold and hyperparameter specifications based on training accuracy. Selected models are shown in Table 13. The highest R and lowest RMSE came from the same model specification for conscientiousness, neuroticism, extraversion, and openness, so we selected those specifications. For agreeableness, the model with the lowest RMSE differed from the model with the highest R, though each had almost identical R and RMSE values (see Tables 14 and 15); the difference in R was greater than the difference in RMSE, so we selected the model with the highest R. The same was true for honesty, where the model with the highest R differed from the model with the lowest RMSE, but the difference in each (R and RMSE) was negligible. We thus selected the model with the lowest RMSE, since its minimum-follower filter (4) and its hyperparameters were similar to the selected models for neuroticism and extraversion. It is worth noting that we found moderate accuracy in the training set for all six domains, but it was lowest for agreeableness, highest for neuroticism, and roughly the same for the other four domains.

Table 12
15 Most Important Accounts Predicting Openness

followed account   importance       r
neiltyson              100.00    0.09
pattonoswalt            80.57    0.14
fancynews24             61.77   -0.09
dalailama               55.43    0.11
loonatheworld           54.84   -0.04
officialjaden           53.79    0.11
actuallynph             50.28    0.14
jcrasnick               48.23   -0.12
thefakeespn             44.19   -0.12
mirandalambert          44.14   -0.08
andyrichter             40.39    0.13
etsy                    39.60    0.10
cashapp                 39.06    0.09
zaynmalik               38.89   -0.01
gameofthrones           38.77    0.09

Note. Importance scores obtained with the permutation method from the Random Forests model with the highest R (and lowest RMSE). r corresponds to the zero-order correlation between following that account and self-reported openness.

Table 13
Selected Followed-Account-Based Models, Their Specifications, and Their Training Accuracy

domain              Modeling approach   Filter   mtry   MNS     R      RMSE
agreeableness       Random Forests        3       207    5     0.19    0.58
conscientiousness   Random Forests        3       207    5     0.26    0.70
honesty             Random Forests        4       160    5     0.25    0.56
neuroticism         Random Forests        4       160    5     0.34    0.85
extraversion        Random Forests        4       160    5     0.24    0.77
openness            Random Forests        3       207    5     0.25    0.61

Note. Filter refers to the minimum-followers threshold used to filter out followed accounts. mtry and MNS are hyperparameter specifications. mtry corresponds to how many predictors the algorithm samples to build each tree in the forest. MNS stands for minimum node size and corresponds to the minimum number of observations in each node, meaning the algorithm will not create a split in the data for fewer observations than MNS.

Model Evaluation. I next evaluated the models by assessing their accuracy in predicting self-reported personality scores in the holdout data. Correlations between predicted scores derived from the trained models and observed scores for the holdout data are shown for selected (triangles) and non-selected (circles) models in Figure 14. As Figure 14 shows, the model selection procedure tended to lead to choosing the model with the highest or nearly highest out-of-sample accuracy.
Importantly, the accuracy estimates from non-selected models should be taken with a grain of salt; all of these estimates are subject to fluctuation, and taking the non-selected models' accuracy at face value undermines the principle behind using a separate evaluation set, namely, estimating accuracy removed from a further (biasing) selection effect.

Table 14
Model Specifications with Highest R Predicting Personality from Followed Accounts

domain              Modeling approach   Filter   mtry   MNS     R      RMSE
agreeableness       Random Forests        3       207    5     0.19    0.58
conscientiousness   Random Forests        3       207    5     0.26    0.70
honesty             Random Forests        5       133    5     0.25    0.56
neuroticism         Random Forests        4       160    5     0.34    0.85
extraversion        Random Forests        4       160    5     0.24    0.77
openness            Random Forests        3       207    5     0.25    0.61

Note. Filter refers to the minimum-followers threshold used to filter out followed accounts. mtry and MNS are hyperparameter specifications. mtry corresponds to how many predictors the algorithm samples to build each tree in the forest. MNS stands for minimum node size and corresponds to the minimum number of observations in each node, meaning the algorithm will not create a split in the data for fewer observations than MNS.

Table 15
Model Specifications with Lowest RMSE Predicting Personality from Followed Accounts

domain              Modeling approach   Filter   mtry   MNS     R      RMSE
agreeableness       Random Forests        5         2    5     0.16    0.58
conscientiousness   Random Forests        3       207    5     0.26    0.70
honesty             Random Forests        4       160    5     0.25    0.56
neuroticism         Random Forests        4       160    5     0.34    0.85
extraversion        Random Forests        4       160    5     0.24    0.77
openness            Random Forests        3       207    5     0.25    0.61

Note. Filter refers to the minimum-followers threshold used to filter out followed accounts. mtry and MNS are hyperparameter specifications. mtry corresponds to how many predictors the algorithm samples to build each tree in the forest. MNS stands for minimum node size and corresponds to the minimum number of observations in each node, meaning the algorithm will not create a split in the data for fewer observations than MNS.

Figure 15 shows the estimates for selected models compared to the predictive accuracy of models predicting personality from Facebook likes in Kosinski and colleagues' (2013) study. As seen in Figure 15, followed accounts predict openness with considerable accuracy; neuroticism, extraversion, and honesty with moderate accuracy; and conscientiousness and agreeableness with little accuracy. Followed accounts predict openness and neuroticism about as well as Facebook likes, they predict extraversion with just slightly less accuracy than Facebook likes, and they predict agreeableness and conscientiousness with considerably less accuracy than Facebook likes.

The accuracy achieved by the models in predicting each of the Big Six suggests that, with the possible exceptions of conscientiousness and agreeableness, personality is reflected in the accounts people choose to follow. However, it is an open question whether the models are picking up on distinctive information about each domain (e.g., how openness in particular is reflected in followed accounts) or some more general information relevant across domains (e.g., how general positivity is reflected in followed accounts). Mirroring Study 1, I assessed these possibilities first by examining the intercorrelations among all followed-account-based predicted and observed scores, which are shown in Table 16.
The accuracy achieved by the models in predicting each of the Big Six suggests that, with the possible exceptions of conscientiousness and agreeableness, personality is reflected in the accounts people choose to follow. However, it is an open question whether the models are picking up on distinctive information about each domain (e.g., how openness in particular is reflected in followed accounts) or some more general information relevant across domains (e.g., how general positivity is reflected in followed accounts). Mirroring Study 1, I assessed these possibilities first by examining the intercorrelations among all followed-account-based predicted and observed scores, which are shown in Table 16. As with tweet-based predictions, followed-account-based predictions were more strongly intercorrelated than (observed) self-reported scores, though the difference was less pronounced than with tweet-based predictions. Unlike with tweet-based predicted scores, the pattern of correlations among followed-account-based predicted scores looked quite different from the intercorrelations among observed scale scores, with predicted conscientiousness, for example, showing virtually no correlation with predicted agreeableness and honesty-propriety. Likewise, predicted neuroticism correlated positively with predicted openness, agreeableness, and honesty, which are either uncorrelated or negatively correlated among observed scores. The structure of followed-account-based predicted scores was thus less differentiated, like tweet-based predictions, but also showed a relatively distinct pattern of intercorrelations, unlike tweet-based scores.

The high intercorrelations between followed-account-based predicted scores could suggest that followed accounts are not differentiating between cues for different Big Six domains, a possibility we examine more directly by regressing each observed domain on all of the predicted scores simultaneously. If personality domains are distinctly reflected in followed accounts, we should see a significant slope for the matching predicted score and non-significant (near-zero) slopes for the non-matching predicted scores. These results are shown in Figure 16, where it is apparent that the models picked up on distinctive information for all of the Big Six except agreeableness and conscientiousness, for which there was only a small degree of accuracy to begin with (see Figure 15). This suggests that followed accounts distinctly reflect specific personality domains when they achieve any appreciable accuracy.

Figure 14. Out-of-sample Accuracy (R) for Selected and Non-Selected Followed-Account-Based Predictive Models

Figure 15. Out-of-sample Accuracy (R) of Followed-Account-Based Predictions Compared to Facebook Likes
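The follow-up test just described amounts to one multiple regression per domain. A minimal sketch, assuming a hypothetical data frame scores holding observed (obs_) and predicted (pred_) scores for each domain:

```r
# Regress each observed domain on all six predicted scores at once;
# distinctive prediction shows up as a non-zero slope on the matching
# predictor only (cf. Figure 16).
scores_z <- as.data.frame(scale(scores))  # standardize so slopes are betas

fit_open <- lm(obs_o ~ pred_a + pred_c + pred_h + pred_n + pred_e + pred_o,
               data = scores_z)
summary(fit_open)
```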
Table 16
Correlations Between Followed-Account-Based Predictions and Observed Big Six Scores

1. Obs. A
2. Obs. C    .26** [.14, .37]
3. Obs. H    .42** [.31, .51]  .42** [.32, .52]
4. Obs. N    -.19** [-.30, -.07]  -.43** [-.53, -.33]  -.09 [-.21, .03]
5. Obs. E    .07 [-.05, .19]  .18** [.05, .29]  -.30** [-.41, -.19]  -.41** [-.50, -.30]
6. Obs. O    .23** [.11, .35]  .14* [.02, .26]  .08 [-.04, .21]  .01 [-.11, .13]  .17** [.05, .28]
7. Pred. A   .07 [-.05, .19]  .06 [-.06, .18]  .13* [.01, .25]  .12 [-.00, .24]  -.08 [-.20, .04]  .08 [-.05, .20]
8. Pred. C   -.00 [-.13, .12]  .13* [.01, .25]  .05 [-.07, .17]  -.20** [-.32, -.08]  .10 [-.02, .22]  -.28** [-.39, -.16]  -.05 [-.17, .08]
9. Pred. H   .09 [-.04, .21]  .09 [-.03, .21]  .24** [.12, .35]  .18** [.06, .30]  -.22** [-.33, -.10]  .13* [.01, .25]  .61** [.53, .68]  -.04 [-.16, .08]
10. Pred. N  .06 [-.06, .18]  -.14* [-.26, -.02]  .01 [-.11, .14]  .30** [.19, .41]  -.15* [-.26, -.02]  .25** [.13, .36]  .35** [.23, .45]  -.66** [-.72, -.59]  .39** [.28, .49]
11. Pred. E  -.05 [-.17, .08]  .05 [-.07, .17]  -.16** [-.28, -.04]  -.23** [-.34, -.11]  .28** [.17, .39]  -.12* [-.24, -.00]  -.22** [-.34, -.10]  .55** [.46, .63]  -.43** [-.52, -.32]  -.62** [-.69, -.54]
12. Pred. O  .14* [.02, .26]  -.03 [-.15, .09]  .05 [-.07, .17]  .11 [-.02, .23]  -.01 [-.14, .11]  .45** [.35, .54]  .20** [.08, .32]  -.18** [-.30, -.06]  .21** [.09, .32]  .28** [.16, .39]  -.03 [-.15, .09]

Note. Pred. are followed-account-based predictions and Obs. are (observed) self-reports; entries in each row are correlations with variables 1 through 11, in order. *p < .05; **p < .01; ***p < .001; 95 percent CIs are enclosed in brackets.

Figure 16. Results from Regressing Observed Big Six on All Followed-Account-Based Predicted Scores Simultaneously (standardized slopes, β, for same-domain vs. different-domain predictors)

Aim 2b: Does activity moderate followed-account-based accuracy?

I next examined the extent to which predictive accuracy was moderated by how often individuals tweet and how many accounts they follow, by regressing self-reported Big Six scores on followed-account-based predicted scores, the number of tweets (or the number of followed accounts), and their interaction. The standardized results from these models are shown in Table 17, which shows that number of followed accounts moderates accuracy for agreeableness and honesty, and number of tweets moderates accuracy for agreeableness. However, these moderation effects were quite small, as seen in Figures 17 and 18, which show moderator results for agreeableness and honesty-propriety respectively. Indeed, the significant moderation in the left-hand panel of Figure 18 looks hardly distinguishable from the non-significant moderation on the right-hand side. Thus, followed accounts are similarly accurate for Twitter users across different rates of tweeting and following accounts.

Figure 17. Followed-Account-Based Predictive Accuracy Moderated by Activity for Agreeableness

Figure 18. Followed-Account-Based Predictive Accuracy Moderated by Activity for Honesty-Propriety

Table 17
Followed-Account-Based Predictive Accuracy Moderated by Activity

domain  moderator  term  estimate  SE  t  p  CI LL  CI UL
agreeableness  followeds  Intercept  0.00  0.02  0.13  .900  -0.04  0.05
agreeableness  followeds  predicted  0.62  0.03  23.98  < .001  0.57  0.67
agreeableness  followeds  num. followed  -0.07  0.02  -3.09  .002  -0.12  -0.03
agreeableness  followeds  predicted * num. followed  -0.08  0.01  -6.32  < .001  -0.10  -0.05
conscientiousness  followeds  Intercept  0.01  0.01  0.47  .638  -0.02  0.04
conscientiousness  followeds  predicted  0.86  0.01  58.77  < .001  0.83  0.89
conscientiousness  followeds  num. followed  -0.02  0.02  -1.23  .219  -0.06  0.01
conscientiousness  followeds  predicted * num. followed  -0.02  0.02  -1.13  .259  -0.05  0.01
honesty  followeds  Intercept  0.00  0.01  0.02  .982  -0.03  0.03
honesty  followeds  predicted  0.87  0.01  58.25  < .001  0.84  0.90
honesty  followeds  num. followed  -0.05  0.02  -3.04  .002  -0.08  -0.02
honesty  followeds  predicted * num. followed  -0.04  0.02  -2.30  .022  -0.07  -0.01
neuroticism  followeds  Intercept  0.01  0.01  0.51  .613  -0.02  0.04
neuroticism  followeds  predicted  0.86  0.01  59.10  < .001  0.83  0.89
neuroticism  followeds  num. followed  -0.01  0.02  -0.41  .682  -0.05  0.03
neuroticism  followeds  predicted * num. followed  -0.01  0.02  -0.57  .568  -0.05  0.03
extraversion  followeds  Intercept  0.01  0.01  0.54  .592  -0.02  0.03
extraversion  followeds  predicted  0.88  0.01  64.22  < .001  0.86  0.91
extraversion  followeds  num. followed  -0.01  0.01  -1.00  .319  -0.04  0.01
extraversion  followeds  predicted * num. followed  -0.02  0.01  -1.62  .105  -0.05  0.00
openness  followeds  Intercept  0.01  0.01  0.44  .659  -0.02  0.03
openness  followeds  predicted  0.89  0.01  66.67  < .001  0.86  0.91
openness  followeds  num. followed  0.01  0.01  0.41  .680  -0.02  0.03
openness  followeds  predicted * num. followed  -0.01  0.01  -1.36  .173  -0.03  0.01
agreeableness  tweets  Intercept  0.01  0.02  0.35  .726  -0.04  0.05
agreeableness  tweets  predicted  0.59  0.02  24.09  < .001  0.54  0.64
agreeableness  tweets  num. of tweets  -0.05  0.02  -2.26  .024  -0.10  -0.01
agreeableness  tweets  predicted * num. of tweets  -0.08  0.02  -4.80  < .001  -0.12  -0.05
conscientiousness  tweets  Intercept  0.01  0.01  0.38  .701  -0.02  0.03
conscientiousness  tweets  predicted  0.86  0.01  58.49  < .001  0.83  0.89
conscientiousness  tweets  num. of tweets  -0.01  0.02  -0.60  .549  -0.04  0.02
conscientiousness  tweets  predicted * num. of tweets  -0.02  0.01  -1.89  .059  -0.04  0.00
honesty  tweets  Intercept  0.00  0.01  0.06  .952  -0.03  0.03
honesty  tweets  predicted  0.87  0.01  57.91  < .001  0.84  0.90
honesty  tweets  num. of tweets  -0.01  0.02  -0.61  .544  -0.04  0.02
honesty  tweets  predicted * num. of tweets  -0.02  0.02  -1.42  .156  -0.06  0.01
neuroticism  tweets  Intercept  0.01  0.01  0.48  .631  -0.02  0.04
neuroticism  tweets  predicted  0.86  0.01  58.76  < .001  0.83  0.89
neuroticism  tweets  num. of tweets  -0.01  0.02  -0.65  .519  -0.05  0.02
neuroticism  tweets  predicted * num. of tweets  -0.01  0.02  -0.34  .731  -0.03  0.02
extraversion  tweets  Intercept  0.01  0.01  0.57  .569  -0.02  0.03
extraversion  tweets  predicted  0.88  0.01  63.96  < .001  0.86  0.91
extraversion  tweets  num. of tweets  0.02  0.01  1.52  .129  -0.01  0.05
extraversion  tweets  predicted * num. of tweets  0.00  0.01  -0.06  .950  -0.02  0.02
openness  tweets  Intercept  0.01  0.01  0.43  .664  -0.02  0.03
openness  tweets  predicted  0.89  0.01  66.71  < .001  0.86  0.91
openness  tweets  num. of tweets  0.01  0.01  1.00  .318  -0.01  0.04
openness  tweets  predicted * num. of tweets  0.01  0.02  0.83  .405  -0.02  0.05

Note. num. of tweets and num. of followed accounts were grand-mean-centered. CI LL and CI UL are the lower and upper bounds of the 95 percent CI.

Discussion

The results of Study 2 suggest that personality is indeed reflected in the accounts that people follow on Twitter, though there was considerable variability in the extent of accuracy across domains. Moreover, the different results appear to converge on several key findings. First, openness is the most predictable from followed accounts. Models achieved considerable accuracy during training and evaluation (with the holdout data), the selected model appeared to use theoretically relevant followed accounts in its predictions, and the follow-up analyses demonstrated that these accounts appeared to distinctively reflect openness. Neuroticism, honesty, and extraversion were similarly, though slightly less, predictable and interpretable from followed accounts, and each appeared to be distinctly reflected in followed accounts. Agreeableness and conscientiousness were at the other extreme, with relatively poor performance in training, even poorer performance in evaluation, and with potentially the least theoretically consistent model parameters. With the exception of agreeableness and conscientiousness, followed-account-based predictions were similar in accuracy to predictions from Facebook likes (Kosinski et al., 2013), which is especially impressive given the slightly more conservative design of the present study - namely, the use of a holdout sample for model evaluation.
Finally, followed-account-based accuracy was virtually unaffected by how much people tweet or how many accounts they follow, suggesting that this approach is relatively robust to differences in activity and use of Twitter. Followed accounts are thus a relatively robust predictor of personality, with the notable exceptions of agreeableness and conscientiousness.

In some ways, it is unsurprising that followed accounts seem to reflect openness more than the other personality domains. Indeed, following accounts on Twitter is the primary way people can curate their timeline - what they see when they log into Twitter - and it thus makes sense that this appears to be most related to openness, the personality domain most centrally concerned with aesthetic and intellectual interests. Likewise, extraversion was fairly predictable, and the model appeared to achieve this by picking up on more mainstream (high extraversion) vs. niche (low extraversion) cultural interests, a finding consistent with Park et al.'s (2015) work on predicting personality from Facebook status updates and with some of the open-vocabulary results from Study 1. Thus, one reason for the heterogeneity of predictive accuracy across domains could be the extent to which personality domains are expressed in interests; agreeableness and (to a lesser extent) conscientiousness may simply have few systematic relations to the kinds of interests people can seek out on Twitter.

The intercorrelations among followed-account-based predictions were quite a bit stronger than among (observed) self-report scales, suggesting that followed-account-based predictions may be picking up on some broader, less specific personality information. However, unlike tweet-based scores, these did not appear to map as cleanly onto the Big Two (Digman, 1997; Saucier & Srivastava, 2015), which is somewhat puzzling. One possible explanation is that the structure could be obscured by the seemingly poor predictions for conscientiousness and agreeableness, though these were in the same ballpark as the lowest estimates (agreeableness and extraversion) for tweet-based predictive accuracy. Another possibility is that the true structure of followed-account-based predictions does not correspond to the Big Two, either reflecting deeper psychological truths (e.g., personality-relevant interests may not have the same correlation structure as personality adjectives) or methodological factors specific to Twitter (e.g., Twitter's recommendation algorithm could introduce bias and noise).

IV. STUDY 3: PERCEIVING PERSONALITY IN PROFILES

Study 3 focuses on judgments made by human perceivers from users' profiles and has three specific aims. First, I examine the extent to which people reach consensus and accuracy in their judgments of targets' personalities after viewing their (targets') Twitter profiles, providing insight into the extent to which profiles convey consistent and accurate information about target users' personalities (Aim 3a). Second, I examine the extent to which consensus and accuracy are affected by targets' self-presentation goals and the density of target users' follower networks, speaking to the process of (in)accurate personality judgment on Twitter (Aim 3b). Third, I examine the intra- and interpersonal consequences of accuracy and idealization by examining their impact on targets' well-being and likability (Aim 3c). Together, these aims elucidate the social functions of personality expression and interpersonal perception online.
Methods

Samples & Procedure. Study 3 used two samples of participants. The target sample consisted of ntargets = 100 participants from the NSF sample who provided self-reports of their personality, self-presentation goals, and access to their Twitter data. Target participants were 27.22 years old on average; race and gender breakdowns are shown in Table 18. In addition to the data collection described above, we collected additional data for our target sample. We obtained screencaps of each of these 100 participants' Twitter profiles, which served as the stimuli for our sample of perceivers. We also downloaded each target's full follower list and their followers' followed-account lists; these data were used to calculate follower-network density. The perceiver sample consisted of an initial sample of 308 participants drawn from the UO Human Subjects Pool. Data first underwent a blinded screen wherein another PhD student in the lab screened a masked dataset (i.e., where the links between targets, perceivers, and ratings were broken) for random responding, leading to the removal of 10 participants and a final sample of nperceivers = 298. Perceiver participants were 19.67 years old on average; race and gender breakdowns are shown in Table 19. Data collection was approved by the University of Oregon Institutional Review Board (Protocol # 10122017.011) and was conducted in a manner consistent with the ethical treatment of human subjects.

Perceivers were shown a random block of five profile screencaps from the target sample and instructed to rate target participants "based only on the information included in the profile" and to "give [their] best answer, even if it is just a guess." Participants rated the targets' standing on the Big Five using the 10-item Big Five Inventory (Rammstedt & John, 2007) and a single item for honesty. Relevant to Aim 3c, perceivers rated the extent to which they think the target is likable. After they completed their ratings of the five targets, they were thanked and compensated for their time with course credit. The questionnaire includes several other ratings not examined here (intelligent, self-esteem, trustworthy, funny, lonely, assertive, modest, arrogant, and physically attractive, as well as perceived race, perceived gender, and perceived socio-economic status).

Table 18
Target Race and Gender

                                  female  male  Not Reported  other
American Indian or Alaska Native  1       0     0             0
Asian                             8       8     0             0
Black or African American         3       7     0             0
Not Reported                      0       0     1             0
Other                             0       2     0             0
White                             25      44    0             1

Note. Demographic questions were based on NIH enrollment reporting categories.

Measures. For this study, we measured self-reported Big Six personality domains and well-being, reports of how targets wish to be seen on Twitter, and perceived Big Six personality domains, and we calculated targets' follower-network density using Twitter API data.

Self-reported Big Six. Target participants completed the self-reported Big Six measure described in the overview section (prior to Study 1). To summarize, the Big Five were measured with the 60-item BFI-2 (Soto & John, 2017b), and honesty-propriety was measured with 8 items from the Questionnaire Big Six family of measures (Thalmayer & Saucier, 2014). Internal consistency was adequate, with alphas ranging from a low of .64 for honesty-propriety to a high of .92 for neuroticism.

Self-reported well-being. Target participants completed the single-item satisfaction with life measure of well-being (Cheung & Lucas, 2014).
Table 19
Perceiver Race and Gender

                                           female  male  other
American Indian or Alaska Native           4       5     0
Asian                                      26      18    0
Black or African American                  9       5     0
Native Hawaiian or Other Pacific Islander  1       3     0
Not Reported                               1       0     0
Other                                      7       4     1
White                                      137     78    1

Note. Demographic questions were based on NIH enrollment reporting categories.

Self-presentational Big Six. Target participants in the NSF sample provided self-reports of how they present themselves on Twitter. We asked participants to indicate "what impression [they] would like to make on people who see [their] Twitter profile" using the 15-item extra-short BFI-2 (BFI-2-XS; Soto & John, 2017a) and three items to measure honesty-propriety. Alphas were much lower for these scales - as is typical for short measures - and ranged from 0.18 for honesty to 0.71 for neuroticism.

Perceiver-rated Big Six. Perceiver participants rated targets using the 10-item Big Five Inventory (Rammstedt & John, 2007) and a single item for honesty. Alphas ranged from 0.44 for neuroticism to 0.68 for extraversion.

Follower-network density. Targets' follower-network density was calculated by taking each target's network of followers (i.e., all users that follow the target), downloading those followers' followed-account lists, and then scoring each network for density using the igraph library (Csardi & Nepusz, 2006). Each target's score thus represents the proportion of edges (relative to the total number of possible edges) among their follower network.
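As a sketch of this computation, assuming a hypothetical follower_edges data frame listing who-follows-whom ties among one target's followers:

```r
library(igraph)

# Build the directed follower network for one target and compute its
# density: observed edges over the total number of possible edges.
g <- graph_from_data_frame(follower_edges, directed = TRUE)
edge_density(g)
```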
Analyses

Aims 3a and 3b concern the extent of consensus, accuracy, and idealization, and moderators of these effects, in profile-based perceptions. All of these analyses consist of a series of cross-classified random effects models (Bryk & Raudenbush, 2002). In this design, ratings are cross-classified by perceivers and targets, which are nested in blocks. We examine consensus and accuracy for each trait separately by conducting a series of mixed effects models (Bryk & Raudenbush, 2002). We start with an intercept-only model from which we can estimate consensus, and subsequently add self-reports (for accuracy), self-presentation reports (for idealization), and follower-network density and its interaction with self-reports (to examine whether density moderates accuracy). Specific details, including equations, are given in the results section as relevant.

Aim 3c concerns the extent to which accurate or idealized perceptions affect targets' self-reported well-being and perceived (i.e., perceiver-rated) likability, using a technique called response surface analysis (RSA; Barranti et al., 2017). RSA consists of running a polynomial regression predicting an outcome from two predictors, their quadratic effects, and their interaction. This equation is used to define the response surface, the shape of which can be used to test several different questions about whether and how matches or mismatches between predictors relate to the outcome. This approach is considered the most comprehensive method for examining the consequences of accuracy in interpersonal perception (see Barranti et al., 2017). Since target well-being is a single- (target-)level variable, response surfaces for well-being will be defined using single-level regressions. Since likability is a target-perceiver dyadic variable, response surfaces for likability will be defined using cross-classified mixed effects models and multi-level RSA (Nestler, Humberg, & Schönbrodt, 2019).

RSA simultaneously estimates five parameters, each of which has a meaningful interpretation. First, the slope of the line of congruence (a1) captures the extent to which matching at high values is associated with different outcomes than matching at low values. Second, the curvature of the line of congruence (a2) captures the extent to which matching at extreme values is associated with different outcomes than matching at less extreme values. Third, the slope along the line of incongruence (a3) captures whether one direction of mismatch is better or worse than the other. Fourth, the curvature of the line of incongruence (a4) captures the extent to which matches or mismatches are better. Finally, Humberg, Nestler, and Back (2019) suggest testing whether the first principal axis (also called the ridge) of the surface is positioned at the line of congruence by testing a5, which provides a strict test of congruence hypotheses (i.e., that matching leads to the highest value of the outcome).

Results

I start by examining consensus, accuracy, and idealization, then examine whether density moderates accuracy, and finally examine the consequences of accuracy and idealization for well-being and perceived likability.

Consensus. Consensus was estimated using an intercept-only model (per domain). At level 1, we regressed scale scores for each rating of target i by perceiver j in block k on a random intercept. Random effects for target and perceiver were included at level 2, and a random effect for block was included at level 3. This is shown in Equation (1) below:

$$
\begin{aligned}
\text{Level 1:}\quad & Y_{ijk} = \pi_{0ijk} + e_{ijk} \\
\text{Level 2:}\quad & \pi_{0ijk} = \beta_{00k} + r_{0ik} + r_{0jk} \\
\text{Level 3:}\quad & \beta_{00k} = \gamma_{000} + u_{00k}
\end{aligned}
\tag{1}
$$

This decomposed each rating into the grand mean (γ000), variance explained by the target (Var(r0ik), or σ²target), variance explained by the perceiver (Var(r0jk), or σ²perceiver), variance explained by the block (Var(u00k), or σ²block), and residual variance (Var(eijk), or σ²resid; Kenny, 1994). Consensus was estimated from these baseline models by computing the target intraclass correlation coefficient (ICCtarget). The ICCtarget is defined as the target variance over the total variance (see Kenny, 1994), as shown in Equation (2) below:

$$
\text{ICC}_{target} = \frac{\sigma^2_{target}}{\sigma^2_{target} + \sigma^2_{perceiver} + \sigma^2_{block} + \sigma^2_{resid}}
\tag{2}
$$

The ICCtarget measures the percentage of variance in ratings explained by the target being rated, or the percent agreement in ratings from different perceivers rating the same target. It is also equivalent to the expected correlation between ratings made by two randomly sampled perceivers, and is thus a straightforward metric of single-judge (rather than average) agreement. ICCtarget and bootstrapped 95% confidence intervals for each Big Six domain are shown in Figure 19. You can see in Figure 19 that perceivers reach consensus about all of the Big Six after viewing targets' profiles. Consensus was substantial for openness and extraversion, moderately large for agreeableness, conscientiousness, and neuroticism, and low but distinguishable from chance guessing for honesty. These results suggest that perceivers do agree about targets' personalities based on Twitter profiles, but they do not speak to the accuracy of these judgments.

Figure 19. Plot of ICCtarget for Each Big Six Domain
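A minimal sketch of this intercept-only model and the ICC computation, assuming a hypothetical long-format ratings data frame (columns rating, target, perceiver, block) and the lme4 package:

```r
library(lme4)

# Equation (1): ratings cross-classified by target and perceiver, nested in block
m0 <- lmer(rating ~ 1 + (1 | target:block) + (1 | perceiver:block) + (1 | block),
           data = ratings)

vc <- as.data.frame(VarCorr(m0))
v  <- setNames(vc$vcov, vc$grp)

# Equation (2): target variance over total variance
icc_target <- v["target:block"] /
  (v["target:block"] + v["perceiver:block"] + v["block"] + v["Residual"])
```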
Accuracy. Accuracy was estimated by adding target self-reports (SRik) as a level-1 predictor in the mixed effects models, allowing the accuracy slope to vary randomly over targets, perceivers, and blocks, as shown in Equation (3) below:

$$
\begin{aligned}
\text{Level 1:}\quad & Y_{ijk} = \pi_{0ijk} + \pi_{1ijk}SR_{ik} + e_{ijk} \\
\text{Level 2:}\quad & \pi_{0ijk} = \beta_{00k} + r_{0ik} + r_{0jk} \\
& \pi_{1ijk} = \beta_{10k} + r_{1ik} + r_{1jk} \\
\text{Level 3:}\quad & \beta_{00k} = \gamma_{000} + u_{00k} \\
& \beta_{10k} = \gamma_{100} + u_{10k}
\end{aligned}
\tag{3}
$$

The accuracy slope's intercept, γ100, corresponds to the average accuracy across targets and perceivers. The fixed effects for this model are shown in Table 20 and random effects are shown in Table 21. Accuracy was relatively low across the board, though the CIs generally range from no accuracy to moderate accuracy. Only agreeableness has a CI which excludes 0, meaning it is the only domain for which accuracy is distinguishable from chance guessing. The rest were in a similar ballpark, with the exceptions of the somewhat lower estimates for conscientiousness and honesty.

Table 20
Accuracy of Profile-Based Perceptions

domain             term      γ100  SE    t     df     p     CI LL  CI UL
agreeableness      accuracy  0.19  0.07  2.53  38.15  .016  0.04   0.34
conscientiousness  accuracy  0.04  0.06  0.67  26.44  .508  -0.08  0.17
extraversion       accuracy  0.13  0.08  1.69  31.18  .100  -0.02  0.27
honesty            accuracy  0.06  0.06  1.06  19.12  .300  -0.05  0.18
neuroticism        accuracy  0.09  0.05  1.82  91.81  .072  0.00   0.18
openness           accuracy  0.13  0.07  1.89  46.23  .065  -0.01  0.26

Note. Effect sizes are unstandardized. CI LL and CI UL correspond to the lower and upper limits of the 95 percent CI, respectively.

Moreover, the target- and perceiver-level variance in accuracy slopes tended to be quite low, with the possible exception of target-level variance in accuracy for agreeableness (Var(u1ik) = .11), suggesting only small individual differences in accuracy across targets and perceivers. The results are thus consistent with a small degree of accuracy that is indistinguishable from chance guessing in many cases and that varies little across targets and perceivers.
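A sketch of the corresponding lme4 specification, again assuming the hypothetical ratings data frame with a self_report column for the target's self-reported score on the domain being modeled:

```r
# Equation (3): self-reports as a level-1 predictor, with the accuracy
# slope varying randomly over targets, perceivers, and blocks
m1 <- lmer(rating ~ self_report +
             (self_report | target:block) +
             (self_report | perceiver:block) +
             (self_report | block),
           data = ratings)

fixef(m1)["self_report"]  # average accuracy slope (gamma_100, cf. Table 20)
```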
Table 21
Random Effects for Accuracy Models

domain  term  effect  estimate
agreeableness  var(u0jk)  intercept | perceiver:block  0.59
agreeableness  var(u1jk)  accuracy slope | perceiver:block  0.05
agreeableness  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  -0.17
agreeableness  var(u0ik)  intercept | target:block  1.85
agreeableness  var(u1ik)  accuracy slope | target:block  0.11
agreeableness  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  -0.45
agreeableness  var(u00k)  intercept | block  0.00
agreeableness  var(u10k)  accuracy slope | block  0.00
agreeableness  cov(u00k, u10k)  intercept | block, accuracy slope | block  0.00
agreeableness  var(eijk)  residual  0.47
conscientiousness  var(u0jk)  intercept | perceiver:block  0.11
conscientiousness  var(u1jk)  accuracy slope | perceiver:block  0.01
conscientiousness  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  -0.04
conscientiousness  var(u0ik)  intercept | target:block  0.20
conscientiousness  var(u1ik)  accuracy slope | target:block  0.00
conscientiousness  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  -0.01
conscientiousness  var(u00k)  intercept | block  0.05
conscientiousness  var(u10k)  accuracy slope | block  0.01
conscientiousness  cov(u00k, u10k)  intercept | block, accuracy slope | block  -0.02
conscientiousness  var(eijk)  residual  0.51
honesty  var(u0jk)  intercept | perceiver:block  1.05
honesty  var(u1jk)  accuracy slope | perceiver:block  0.07
honesty  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  -0.26
honesty  var(u0ik)  intercept | target:block  0.21
honesty  var(u1ik)  accuracy slope | target:block  0.01
honesty  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  -0.05
honesty  var(u00k)  intercept | block  0.11
honesty  var(u10k)  accuracy slope | block  0.01
honesty  cov(u00k, u10k)  intercept | block, accuracy slope | block  -0.04
honesty  var(eijk)  residual  0.55
neuroticism  var(u0jk)  intercept | perceiver:block  0.02
neuroticism  var(u1jk)  accuracy slope | perceiver:block  0.00
neuroticism  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  0.00
neuroticism  var(u0ik)  intercept | target:block  0.11
neuroticism  var(u1ik)  accuracy slope | target:block  0.00
neuroticism  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  0.00
neuroticism  var(u00k)  intercept | block  0.00
neuroticism  var(u10k)  accuracy slope | block  0.00
neuroticism  cov(u00k, u10k)  intercept | block, accuracy slope | block  0.00
neuroticism  var(eijk)  residual  0.45
extraversion  var(u0jk)  intercept | perceiver:block  0.45
extraversion  var(u1jk)  accuracy slope | perceiver:block  0.02
extraversion  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  -0.10
extraversion  var(u0ik)  intercept | target:block  0.15
extraversion  var(u1ik)  accuracy slope | target:block  0.00
extraversion  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  0.02
extraversion  var(u00k)  intercept | block  0.20
extraversion  var(u10k)  accuracy slope | block  0.01
extraversion  cov(u00k, u10k)  intercept | block, accuracy slope | block  -0.05
extraversion  var(eijk)  residual  0.71
openness  var(u0jk)  intercept | perceiver:block  0.02
openness  var(u1jk)  accuracy slope | perceiver:block  0.00
openness  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  0.01
openness  var(u0ik)  intercept | target:block  0.48
openness  var(u1ik)  accuracy slope | target:block  0.02
openness  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  -0.09
openness  var(u00k)  intercept | block  0.00
openness  var(u10k)  accuracy slope | block  0.00
openness  cov(u00k, u10k)  intercept | block, accuracy slope | block  0.00
openness  var(eijk)  residual  0.49

Note. perceiver:block refers to perceivers nested in blocks; target:block refers to targets nested in blocks.

Accuracy vs. Idealization. I next examined the extent to which targets' self-presentation goals affect the accuracy of Twitter-profile-based perceptions. To do so, I added target self-presentation (SPik; i.e., how they wish they'd be seen on Twitter) to the model, allowing its effect to vary randomly across targets, perceivers, and blocks, as shown in Equation (4) below:

$$
\begin{aligned}
\text{Level 1:}\quad & Y_{ijk} = \pi_{0ijk} + \pi_{1ijk}SR_{ik} + \pi_{2ijk}SP_{ik} + e_{ijk} \\
\text{Level 2:}\quad & \pi_{0ijk} = \beta_{00k} + r_{0ik} + r_{0jk} \\
& \pi_{1ijk} = \beta_{10k} + r_{1ik} + r_{1jk} \\
& \pi_{2ijk} = \beta_{20k} + r_{2ik} + r_{2jk} \\
\text{Level 3:}\quad & \beta_{00k} = \gamma_{000} + u_{00k} \\
& \beta_{10k} = \gamma_{100} + u_{10k} \\
& \beta_{20k} = \gamma_{200} + u_{20k}
\end{aligned}
\tag{4}
$$

Evidence for self-idealization corresponds to the magnitude of the self-presentation slope, γ200, analogous to Back et al. (2010). If profiles communicate how people are, not how they wish to be seen, then adding self-presentation to the model should result in virtually no change to the accuracy slope (γ100) and a near-zero estimate for the self-presentation slope (γ200). At the other extreme, if profiles communicate how people wish to be seen, we should see the accuracy slope reduce to near zero and the self-presentation slope rise above zero. The results of these models can be seen in Figure 20, which shows accuracy (circles) and idealization (triangles) for each of the Big Six. Table 22 shows the random effects around these estimates. Although most of these slopes did not cross the threshold for significance, perceptions were closer to targets' ideal personality for conscientiousness and honesty, similarly influenced by both real and ideal personality for agreeableness and extraversion, and more influenced by targets' real personality for neuroticism and openness. Random effects were generally small, suggesting little systematic variability in these effects across targets and perceivers. The results thus suggest profile-based perceptions are influenced both by what targets say they're like and by how they'd ideally like to be seen on Twitter, with the relative contribution of each differing across domains.
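A sketch of this accuracy-vs.-idealization specification, extending the earlier model with a hypothetical self_pres column for the target's self-presentation report:

```r
# Equation (4): self-reports and self-presentation goals entered together
m2 <- lmer(rating ~ self_report + self_pres +
             (self_report + self_pres | target:block) +
             (self_report + self_pres | perceiver:block) +
             (self_report + self_pres | block),
           data = ratings)

# Accuracy (gamma_100) and idealization (gamma_200) slopes, cf. Figure 20
fixef(m2)[c("self_report", "self_pres")]
```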
Figure 20. Accuracy vs. Idealization in Perceptions Based on Twitter Profiles

Accuracy X Density. I examined the extent to which the density of targets' follower network affects accuracy by including density and the interaction between density (dik) and self-reported personality domains as predictors in a mixed effects model, creating Equation (5).

Table 22
Random Effects for Accuracy vs. Idealization Models

domain  term  effect  estimate
agreeableness  var(u0jk)  intercept | perceiver:block  0.65
agreeableness  var(u1jk)  accuracy slope | perceiver:block  0.08
agreeableness  var(u2jk)  idealization slope | perceiver:block  0.04
agreeableness  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  -0.15
agreeableness  cov(u0jk, u2jk)  intercept | perceiver:block, idealization slope | perceiver:block  -0.04
agreeableness  cov(u1jk, u2jk)  accuracy slope | perceiver:block, idealization slope | perceiver:block  -0.03
agreeableness  var(u0ik)  intercept | target:block  2.02
agreeableness  var(u1ik)  accuracy slope | target:block  0.09
agreeableness  var(u2ik)  idealization slope | target:block  0.01
agreeableness  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  -0.39
agreeableness  cov(u0ik, u2ik)  intercept | target:block, idealization slope | target:block  -0.11
agreeableness  cov(u1ik, u2ik)  accuracy slope | target:block, idealization slope | target:block  0.01
agreeableness  var(u00k)  intercept | block  0.06
agreeableness  var(u10k)  accuracy slope | block  0.08
agreeableness  var(u20k)  idealization slope | block  0.04
agreeableness  cov(u00k, u10k)  intercept | block, accuracy slope | block  -0.07
agreeableness  cov(u00k, u20k)  intercept | block, idealization slope | block  0.05
agreeableness  cov(u10k, u20k)  accuracy slope | block, idealization slope | block  -0.06
agreeableness  var(eijk)  residual  0.46
conscientiousness  var(u0jk)  intercept | perceiver:block  0.05
conscientiousness  var(u1jk)  accuracy slope | perceiver:block  0.03
conscientiousness  var(u2jk)  idealization slope | perceiver:block  0.01
conscientiousness  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  -0.04
conscientiousness  cov(u0jk, u2jk)  intercept | perceiver:block, idealization slope | perceiver:block  0.02
conscientiousness  cov(u1jk, u2jk)  accuracy slope | perceiver:block, idealization slope | perceiver:block  -0.02
conscientiousness  var(u0ik)  intercept | target:block  0.50
conscientiousness  var(u1ik)  accuracy slope | target:block  0.00
conscientiousness  var(u2ik)  idealization slope | target:block  0.03
conscientiousness  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  0.00
conscientiousness  cov(u0ik, u2ik)  intercept | target:block, idealization slope | target:block  -0.11
conscientiousness  cov(u1ik, u2ik)  accuracy slope | target:block, idealization slope | target:block  0.00
conscientiousness  var(u00k)  intercept | block  0.08
conscientiousness  var(u10k)  accuracy slope | block  0.00
conscientiousness  var(u20k)  idealization slope | block  0.00
conscientiousness  cov(u00k, u10k)  intercept | block, accuracy slope | block  -0.02
conscientiousness  cov(u00k, u20k)  intercept | block, idealization slope | block  -0.01
conscientiousness  cov(u10k, u20k)  accuracy slope | block, idealization slope | block  0.00
conscientiousness  var(eijk)  residual  0.50
honesty  var(u0jk)  intercept | perceiver:block  0.66
honesty  var(u1jk)  accuracy slope | perceiver:block  0.19
honesty  var(u2jk)  idealization slope | perceiver:block  0.05
honesty  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  -0.33
honesty  cov(u0jk, u2jk)  intercept | perceiver:block, idealization slope | perceiver:block  0.18
honesty  cov(u1jk, u2jk)  accuracy slope | perceiver:block, idealization slope | perceiver:block  -0.10
honesty  var(u0ik)  intercept | target:block  0.72
honesty  var(u1ik)  accuracy slope | target:block  0.01
honesty  var(u2ik)  idealization slope | target:block  0.01
honesty  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  -0.09
honesty  cov(u0ik, u2ik)  intercept | target:block, idealization slope | target:block  -0.10
honesty  cov(u1ik, u2ik)  accuracy slope | target:block, idealization slope | target:block  0.01
honesty  var(u00k)  intercept | block  0.03
honesty  var(u10k)  accuracy slope | block  0.06
honesty  var(u20k)  idealization slope | block  0.09
honesty  cov(u00k, u10k)  intercept | block, accuracy slope | block  0.04
honesty  cov(u00k, u20k)  intercept | block, idealization slope | block  -0.05
honesty  cov(u10k, u20k)  accuracy slope | block, idealization slope | block  -0.07
honesty  var(eijk)  residual  0.53
neuroticism  var(u0jk)  intercept | perceiver:block  0.06
neuroticism  var(u1jk)  accuracy slope | perceiver:block  0.00
neuroticism  var(u2jk)  idealization slope | perceiver:block  0.01
neuroticism  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  0.00
neuroticism  cov(u0jk, u2jk)  intercept | perceiver:block, idealization slope | perceiver:block  -0.02
neuroticism  cov(u1jk, u2jk)  accuracy slope | perceiver:block, idealization slope | perceiver:block  0.00
neuroticism  var(u0ik)  intercept | target:block  0.25
neuroticism  var(u1ik)  accuracy slope | target:block  0.01
neuroticism  var(u2ik)  idealization slope | target:block  0.03
neuroticism  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  -0.03
neuroticism  cov(u0ik, u2ik)  intercept | target:block, idealization slope | target:block  -0.03
neuroticism  cov(u1ik, u2ik)  accuracy slope | target:block, idealization slope | target:block  0.00
neuroticism  var(u00k)  intercept | block  0.07
neuroticism  var(u10k)  accuracy slope | block  0.00
neuroticism  var(u20k)  idealization slope | block  0.01
neuroticism  cov(u00k, u10k)  intercept | block, accuracy slope | block  0.00
neuroticism  cov(u00k, u20k)  intercept | block, idealization slope | block  -0.03
neuroticism  cov(u10k, u20k)  accuracy slope | block, idealization slope | block  0.00
neuroticism  var(eijk)  residual  0.44
extraversion  var(u0jk)  intercept | perceiver:block  0.36
extraversion  var(u1jk)  accuracy slope | perceiver:block  0.04
extraversion  var(u2jk)  idealization slope | perceiver:block  0.03
extraversion  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  -0.09
extraversion  cov(u0jk, u2jk)  intercept | perceiver:block, idealization slope | perceiver:block  0.01
extraversion  cov(u1jk, u2jk)  accuracy slope | perceiver:block, idealization slope | perceiver:block  -0.02
extraversion  var(u0ik)  intercept | target:block  0.22
extraversion  var(u1ik)  accuracy slope | target:block  0.04
extraversion  var(u2ik)  idealization slope | target:block  0.07
extraversion  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  0.09
extraversion  cov(u0ik, u2ik)  intercept | target:block, idealization slope | target:block  -0.12
extraversion  cov(u1ik, u2ik)  accuracy slope | target:block, idealization slope | target:block  -0.05
extraversion  var(u00k)  intercept | block  0.22
extraversion  var(u10k)  accuracy slope | block  0.05
extraversion  var(u20k)  idealization slope | block  0.01
extraversion  cov(u00k, u10k)  intercept | block, accuracy slope | block  -0.11
extraversion  cov(u00k, u20k)  intercept | block, idealization slope | block  0.05
extraversion  cov(u10k, u20k)  accuracy slope | block, idealization slope | block  -0.02
extraversion  var(eijk)  residual  0.70
openness  var(u0jk)  intercept | perceiver:block  0.12
openness  var(u1jk)  accuracy slope | perceiver:block  0.01
openness  var(u2jk)  idealization slope | perceiver:block  0.01
openness  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy slope | perceiver:block  -0.04
openness  cov(u0jk, u2jk)  intercept | perceiver:block, idealization slope | perceiver:block  0.01
openness  cov(u1jk, u2jk)  accuracy slope | perceiver:block, idealization slope | perceiver:block  0.00
openness  var(u0ik)  intercept | target:block  0.19
openness  var(u1ik)  accuracy slope | target:block  0.03
openness  var(u2ik)  idealization slope | target:block  0.01
openness  cov(u0ik, u1ik)  intercept | target:block, accuracy slope | target:block  -0.04
openness  cov(u0ik, u2ik)  intercept | target:block, idealization slope | target:block  0.02
openness  cov(u1ik, u2ik)  accuracy slope | target:block, idealization slope | target:block  -0.02
openness  var(u00k)  intercept | block  0.06
openness  var(u10k)  accuracy slope | block  0.00
openness  var(u20k)  idealization slope | block  0.00
openness  cov(u00k, u10k)  intercept | block, accuracy slope | block  0.00
openness  cov(u00k, u20k)  intercept | block, idealization slope | block  -0.01
openness  cov(u10k, u20k)  accuracy slope | block, idealization slope | block  0.00
openness  var(eijk)  residual  0.48

Note. perceiver:block refers to perceivers nested in blocks; target:block refers to targets nested in blocks.

$$
\begin{aligned}
\text{Level 1:}\quad & Y_{ijk} = \pi_{0ijk} + \pi_{1ijk}SR_{ik} + \pi_{2ijk}d_{ik} + \pi_{3ijk}SR_{ik} \times d_{ik} + e_{ijk} \\
\text{Level 2:}\quad & \pi_{0ijk} = \beta_{00k} + r_{0ik} + r_{0jk} \\
& \pi_{1ijk} = \beta_{10k} + r_{1ik} + r_{1jk} \\
& \pi_{2ijk} = \beta_{20k} + r_{2ik} + r_{2jk} \\
& \pi_{3ijk} = \beta_{30k} + r_{3ik} + r_{3jk} \\
\text{Level 3:}\quad & \beta_{00k} = \gamma_{000} + u_{00k} \\
& \beta_{10k} = \gamma_{100} + u_{10k} \\
& \beta_{20k} = \gamma_{200} + u_{20k} \\
& \beta_{30k} = \gamma_{300} + u_{30k}
\end{aligned}
\tag{5}
$$

The interaction term, γ300, is the critical test of the hypothesized effect of density on accuracy. The fixed effects from these models are shown in Table 23 and random effects are shown in Table 24. The interaction term was not significant for any of the Big Six, and most of the CIs ranged from large negative to large positive values. We thus found no evidence in favor of density moderating accuracy, though the CIs are large enough to be consistent with a moderate positive effect, a moderate negative effect, or no effect.
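A sketch of this specification, assuming a hypothetical grand-mean-centered density column; mirroring Equation (5), the density terms are allowed to vary randomly even though density is a target-level variable:

```r
# Equation (5): does follower-network density moderate the accuracy slope?
m3 <- lmer(rating ~ self_report * density +
             (self_report * density | target:block) +
             (self_report * density | perceiver:block) +
             (self_report * density | block),
           data = ratings)

fixef(m3)["self_report:density"]  # gamma_300, the critical interaction
```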
Consequences for Accuracy & Idealization on Targets' Well-Being. To examine the consequences that being perceived accurately (ideally) on Twitter has on well-being, I ran a series of response surface analyses, predicting targets' self-reported well-being from their self-reported personality (self-presentation goals) and average perceiver-rated personality, separately for each Big Six domain. For idealization, we controlled for self-reports to mirror the idealization effects shown previously. The surface parameters for accuracy are shown in Table 25 and surface plots are shown in Figure 21. Surface parameters were small and indistinguishable from zero for agreeableness and openness. Conscientiousness was characterized by large a1 and a4 values, though the latter's CI did overlap with zero, which together suggest well-being is higher for targets that are higher (vs. lower) in self-reported and perceived conscientiousness (a1), but that accuracy (matching self- and perceived conscientiousness) is generally associated with lower well-being (a4).

Table 23
Results from Density X Accuracy Models

domain  effect  term  estimate  SE  t  df  p  CI LL  CI UL
agreeableness  accuracy  γ100  0.18  0.07  2.53  32.68  .016  0.04  0.34
agreeableness  density  γ200  1.04  10.44  0.10  7.21  .923  -17.76  24.76
agreeableness  accuracy * density  γ300  14.25  15.40  0.93  5.71  .392  -18.29  47.36
conscientiousness  accuracy  γ100  0.01  0.07  0.11  17.95  .917  -0.14  0.14
conscientiousness  density  γ200  -3.35  9.80  -0.34  6.17  .744  -26.31  30.28
conscientiousness  accuracy * density  γ300  -17.05  23.99  -0.71  3.97  .517  -87.80  47.85
extraversion  accuracy  γ100  0.12  0.08  1.51  0.18  .708  -0.05  0.31
extraversion  density  γ200  -5.15  26.56  -0.19  0.03  .975  -68.50  62.08
extraversion  accuracy * density  γ300  2.87  36.93  0.08  0.02  .988  -90.08  92.34
honesty  accuracy  γ100  0.08  0.06  1.38  15.05  .187  -0.05  0.21
honesty  density  γ200  -18.29  12.22  -1.50  0.87  .400  -48.19  8.10
honesty  accuracy * density  γ300  28.05  27.46  1.02  0.76  .533  -47.28  99.09
neuroticism  accuracy  γ100  0.08  0.05  1.63  95.40  .106  -0.03  0.18
neuroticism  density  γ200  15.42  8.58  1.80  3.81  .150  -4.57  36.42
neuroticism  accuracy * density  γ300  -14.99  8.92  -1.68  2.96  .193  -36.83  6.46
openness  accuracy  γ100  0.12  0.06  1.95  65.08  .056  -0.01  0.25
openness  density  γ200  1.17  7.34  0.16  2.16  .887  -20.56  23.92
openness  accuracy * density  γ300  16.51  14.83  1.11  1.99  .382  -18.35  53.47

Note. Effect sizes are unstandardized. CI LL and CI UL correspond to the lower and upper limits of the 95 percent CI, respectively.

Table 24
Random Effects for Density X Accuracy Models

domain  term  effect  estimate
agreeableness  var(u0jk)  intercept | perceiver:block  0.02
agreeableness  var(u1jk)  accuracy | perceiver:block  0.05
agreeableness  var(u2jk)  density | perceiver:block  33.23
agreeableness  var(u3jk)  accuracy * density | perceiver:block  135.52
agreeableness  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy | perceiver:block  0.02
agreeableness  cov(u0jk, u2jk)  intercept | perceiver:block, density | perceiver:block  0.02
agreeableness  cov(u0jk, u3jk)  intercept | perceiver:block, accuracy * density | perceiver:block  -0.01
agreeableness  cov(u1jk, u2jk)  accuracy | perceiver:block, density | perceiver:block  -1.01
agreeableness  cov(u1jk, u3jk)  accuracy | perceiver:block, accuracy * density | perceiver:block  2.05
agreeableness  cov(u2jk, u3jk)  density | perceiver:block, accuracy * density | perceiver:block  -67.10
agreeableness  var(u0ik)  intercept | target:block  0.05
agreeableness  var(u1ik)  accuracy | target:block  0.10
agreeableness  var(u2ik)  density | target:block  684.42
agreeableness  var(u3ik)  accuracy * density | target:block  2,699.84
agreeableness  cov(u0ik, u1ik)  intercept | target:block, accuracy | target:block  -0.01
agreeableness  cov(u0ik, u2ik)  intercept | target:block, density | target:block  -4.40
agreeableness  cov(u0ik, u3ik)  intercept | target:block, accuracy * density | target:block  4.03
agreeableness  cov(u1ik, u2ik)  accuracy | target:block, density | target:block  6.11
agreeableness  cov(u1ik, u3ik)  accuracy | target:block, accuracy * density | target:block  -15.67
agreeableness  cov(u2ik, u3ik)  density | target:block, accuracy * density | target:block  -1,200.52
agreeableness  var(u00k)  intercept | block  0.00
agreeableness  var(eijk)  residual  0.47
conscientiousness  var(u0jk)  intercept | perceiver:block  0.03
conscientiousness  var(u1jk)  accuracy | perceiver:block  0.01
conscientiousness  var(u2jk)  density | perceiver:block  263.06
conscientiousness  var(u3jk)  accuracy * density | perceiver:block  1,315.21
conscientiousness  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy | perceiver:block  0.01
conscientiousness  cov(u0jk, u2jk)  intercept | perceiver:block, density | perceiver:block  -0.07
conscientiousness  cov(u0jk, u3jk)  intercept | perceiver:block, accuracy * density | perceiver:block  0.96
conscientiousness  cov(u1jk, u2jk)  accuracy | perceiver:block, density | perceiver:block  1.33
conscientiousness  cov(u1jk, u3jk)  accuracy | perceiver:block, accuracy * density | perceiver:block  -2.60
conscientiousness  cov(u2jk, u3jk)  density | perceiver:block, accuracy * density | perceiver:block  -582.57
conscientiousness  var(u0ik)  intercept | target:block  0.13
conscientiousness  var(u1ik)  accuracy | target:block  0.00
conscientiousness  var(u2ik)  density | target:block  6.85
conscientiousness  var(u3ik)  accuracy * density | target:block  2,172.48
conscientiousness  cov(u0ik, u1ik)  intercept | target:block, accuracy | target:block  -0.02
conscientiousness  cov(u0ik, u2ik)  intercept | target:block, density | target:block  0.65
conscientiousness  cov(u0ik, u3ik)  intercept | target:block, accuracy * density | target:block  -16.47
conscientiousness  cov(u1ik, u2ik)  accuracy | target:block, density | target:block  -0.13
conscientiousness  cov(u1ik, u3ik)  accuracy | target:block, accuracy * density | target:block  2.67
conscientiousness  cov(u2ik, u3ik)  density | target:block, accuracy * density | target:block  -97.10
conscientiousness  var(u00k)  intercept | block  0.01
conscientiousness  var(u10k)  accuracy | block  0.01
conscientiousness  var(u20k)  density | block  5.85
conscientiousness  var(u30k)  accuracy * density | block  23.51
conscientiousness  cov(u00k, u10k)  intercept | block, accuracy | block  0.01
conscientiousness  cov(u00k, u20k)  intercept | block, density | block  0.25
conscientiousness  cov(u00k, u30k)  intercept | block, accuracy * density | block  -0.50
conscientiousness  cov(u10k, u20k)  accuracy | block, density | block  0.24
conscientiousness  cov(u10k, u30k)  accuracy | block, accuracy * density | block  -0.49
conscientiousness  cov(u20k, u30k)  density | block, accuracy * density | block  -11.63
conscientiousness  var(eijk)  residual  0.50
honesty  var(u0jk)  intercept | perceiver:block  0.10
honesty  var(u1jk)  accuracy | perceiver:block  0.07
honesty  var(u2jk)  density | perceiver:block  6.78
honesty  var(u3jk)  accuracy * density | perceiver:block  12.12
honesty  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy | perceiver:block  0.01
honesty  cov(u0jk, u2jk)  intercept | perceiver:block, density | perceiver:block  0.83
honesty  cov(u0jk, u3jk)  intercept | perceiver:block, accuracy * density | perceiver:block  0.66
honesty  cov(u1jk, u2jk)  accuracy | perceiver:block, density | perceiver:block  0.10
honesty  cov(u1jk, u3jk)  accuracy | perceiver:block, accuracy * density | perceiver:block  0.69
honesty  cov(u2jk, u3jk)  density | perceiver:block, accuracy * density | perceiver:block  5.43
honesty  var(u0ik)  intercept | target:block  0.02
honesty  var(u1ik)  accuracy | target:block  0.03
honesty  var(u2ik)  density | target:block  238.03
honesty  var(u3ik)  accuracy * density | target:block  12.84
honesty  cov(u0ik, u1ik)  intercept | target:block, accuracy | target:block  0.01
honesty  cov(u0ik, u2ik)  intercept | target:block, density | target:block  2.23
honesty  cov(u0ik, u3ik)  intercept | target:block, accuracy * density | target:block  0.39
honesty  cov(u1ik, u2ik)  accuracy | target:block, density | target:block  0.65
honesty  cov(u1ik, u3ik)  accuracy | target:block, accuracy * density | target:block  0.16
honesty  cov(u2ik, u3ik)  density | target:block, accuracy * density | target:block  23.44
honesty  var(u00k)  intercept | block  0.00
honesty  var(eijk)  residual  0.55
neuroticism  var(u0jk)  intercept | perceiver:block  0.04
neuroticism  var(u1jk)  accuracy | perceiver:block  0.00
neuroticism  var(u2jk)  density | perceiver:block  459.91
neuroticism  var(u3jk)  accuracy * density | perceiver:block  481.88
neuroticism  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy | perceiver:block  0.01
neuroticism  cov(u0jk, u2jk)  intercept | perceiver:block, density | perceiver:block  -0.74
neuroticism  cov(u0jk, u3jk)  intercept | perceiver:block, accuracy * density | perceiver:block  1.34
neuroticism  cov(u1jk, u2jk)  accuracy | perceiver:block, density | perceiver:block  -0.12
neuroticism  cov(u1jk, u3jk)  accuracy | perceiver:block, accuracy * density | perceiver:block  0.22
neuroticism  cov(u2jk, u3jk)  density | perceiver:block, accuracy * density | perceiver:block  -465.71
neuroticism  var(u0ik)  intercept | target:block  0.10
neuroticism  var(u1ik)  accuracy | target:block  0.00
neuroticism  var(u2ik)  density | target:block  315.16
neuroticism  var(u3ik)  accuracy * density | target:block  30.96
neuroticism  cov(u0ik, u1ik)  intercept | target:block, accuracy | target:block  0.01
neuroticism  cov(u0ik, u2ik)  intercept | target:block, density | target:block  -5.48
neuroticism  cov(u0ik, u3ik)  intercept | target:block, accuracy * density | target:block  1.71
neuroticism  cov(u1ik, u2ik)  accuracy | target:block, density | target:block  -0.33
neuroticism  cov(u1ik, u3ik)  accuracy | target:block, accuracy * density | target:block  0.10
neuroticism  cov(u2ik, u3ik)  density | target:block, accuracy * density | target:block  -98.51
neuroticism  var(u00k)  intercept | block  0.01
neuroticism  var(u10k)  accuracy | block  0.00
neuroticism  var(u20k)  density | block  8.94
neuroticism  var(u30k)  accuracy * density | block  55.85
neuroticism  cov(u00k, u10k)  intercept | block, accuracy | block  0.00
neuroticism  cov(u00k, u20k)  intercept | block, density | block  -0.26
neuroticism  cov(u00k, u30k)  intercept | block, accuracy * density | block  -0.66
neuroticism  cov(u10k, u20k)  accuracy | block, density | block  0.04
neuroticism  cov(u10k, u30k)  accuracy | block, accuracy * density | block  0.10
neuroticism  cov(u20k, u30k)  density | block, accuracy * density | block  22.34
neuroticism  var(eijk)  residual  0.44
extraversion  var(u0jk)  intercept | perceiver:block  0.05
extraversion  var(u1jk)  accuracy | perceiver:block  0.02
extraversion  var(u2jk)  density | perceiver:block  41.37
extraversion  var(u3jk)  accuracy * density | perceiver:block  250.47
extraversion  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy | perceiver:block  -0.03
extraversion  cov(u0jk, u2jk)  intercept | perceiver:block, density | perceiver:block  1.22
extraversion  cov(u0jk, u3jk)  intercept | perceiver:block, accuracy * density | perceiver:block  3.40
extraversion  cov(u1jk, u2jk)  accuracy | perceiver:block, density | perceiver:block  -0.83
extraversion  cov(u1jk, u3jk)  accuracy | perceiver:block, accuracy * density | perceiver:block  -2.31
extraversion  cov(u2jk, u3jk)  density | perceiver:block, accuracy * density | perceiver:block  97.08
extraversion  var(u0ik)  intercept | target:block  0.27
extraversion  var(u1ik)  accuracy | target:block  0.00
extraversion  var(u2ik)  density | target:block  133.91
extraversion  var(u3ik)  accuracy * density | target:block  39.15
extraversion  cov(u0ik, u1ik)  intercept | target:block, accuracy | target:block  0.03
extraversion  cov(u0ik, u2ik)  intercept | target:block, density | target:block  5.68
extraversion  cov(u0ik, u3ik)  intercept | target:block, accuracy * density | target:block  2.57
extraversion  cov(u1ik, u2ik)  accuracy | target:block, density | target:block  0.56
extraversion  cov(u1ik, u3ik)  accuracy | target:block, accuracy * density | target:block  0.26
extraversion  cov(u2ik, u3ik)  density | target:block, accuracy * density | target:block  67.21
extraversion  var(u00k)  intercept | block  0.01
extraversion  var(u10k)  accuracy | block  0.01
extraversion  var(u20k)  density | block  133.77
extraversion  var(u30k)  accuracy * density | block  2.89
extraversion  cov(u00k, u10k)  intercept | block, accuracy | block  -0.01
extraversion  cov(u00k, u20k)  intercept | block, density | block  1.13
extraversion  cov(u00k, u30k)  intercept | block, accuracy * density | block  -0.08
extraversion  cov(u10k, u20k)  accuracy | block, density | block  -1.19
extraversion  cov(u10k, u30k)  accuracy | block, accuracy * density | block  0.09
extraversion  cov(u20k, u30k)  density | block, accuracy * density | block  -9.99
extraversion  var(eijk)  residual  0.71
openness  var(u0jk)  intercept | perceiver:block  0.11
openness  var(u1jk)  accuracy | perceiver:block  0.00
openness  var(u2jk)  density | perceiver:block  242.56
openness  var(u3jk)  accuracy * density | perceiver:block  1,418.78
openness  cov(u0jk, u1jk)  intercept | perceiver:block, accuracy | perceiver:block  0.02
openness  cov(u0jk, u2jk)  intercept | perceiver:block, density | perceiver:block  1.29
openness  cov(u0jk, u3jk)  intercept | perceiver:block, accuracy * density | perceiver:block  0.39
openness  cov(u1jk, u2jk)  accuracy | perceiver:block, density | perceiver:block  0.19
openness  cov(u1jk, u3jk)  accuracy | perceiver:block, accuracy * density | perceiver:block  0.06
openness  cov(u2jk, u3jk)  density | perceiver:block, accuracy * density | perceiver:block  -563.33
openness  var(u0ik)  intercept | target:block  0.13
openness  var(u1ik)  accuracy | target:block  0.00
openness  var(u2ik)  density | target:block  178.43
openness  var(u3ik)  accuracy * density | target:block  140.94
openness  cov(u0ik, u1ik)  intercept | target:block, accuracy | target:block  0.00
openness  cov(u0ik, u2ik)  intercept | target:block, density | target:block  -4.83
openness  cov(u0ik, u3ik)  intercept | target:block, accuracy * density | target:block  -4.16
openness  cov(u1ik, u2ik)  accuracy | target:block, density | target:block  0.07
openness  cov(u1ik, u3ik)  accuracy | target:block, accuracy * density | target:block  0.06
openness  cov(u2ik, u3ik)  density | target:block, accuracy * density | target:block  153.16
openness  var(u00k)  intercept | block  0.00
openness  var(u10k)  accuracy | block  0.00
openness  var(u20k)  density | block  10.70
openness  var(u30k)  accuracy * density | block  145.86
openness  cov(u00k, u10k)  intercept | block, accuracy | block  0.00
openness  cov(u00k, u20k)  intercept | block, density | block  0.00
openness  cov(u00k, u30k)  intercept | block, accuracy * density | block  0.00
openness  cov(u10k, u20k)  accuracy | block, density | block  0.00
openness  cov(u10k, u30k)  accuracy | block, accuracy * density | block  -0.01
| block, accuracy * density | block -0.01 openness cov(u20k, u30k) density | block, accuracy * density | block -39.25 openness var(eijk) residual 0.48 136 Table 24 continued domain term effect estimate Note. perceiver:block refers to perceivers nested in blocks; target:block refers to targets nested in blocks. 137 Honesty was characterized by a large, positive a4, suggesting that well-being is higher the more self-reports and perceptions of targets’ honesty depart from one another; a5 was large and significant, suggesting that a s trict ( in)congruence hypothesis i s not, however, met. Neuroticism had a large negative a1, a large positive a2, and a large negative a3, suggesting well-being is higher when both self-reported and perceived neuroticism are lower (rather than higher; a1), that accuracy is associated with greater well-being at the scale extremes (vs. middle; a2), and that well-being is higher when perceived neuroticism is higher than self-reported neuroticism. Together with the graph in Figure 21, it is apparent that well-being is lowest for people who are high in neuroticism but come across as low in neuroticism; virtually every other combination is similarly high in well-being. Extraversion was characterized by large, positive a1 and a3 values, suggesting that well-being is higher when both self-reported and perceived extraversion are higher (rather than lower; a1) and that well-being is higher when self-reported extraversion is greater than perceived extraversion. Together with Figure 21, these results suggest a strong main e˙ect of self- reported extraversion, with a small benefit for being (accurately) perceived as higher in extraversion. Turning to idealization, the surface parameters for these e˙ects are shown in Table 26 and surface plots are shown in Figure 22, where it is apparent that these e˙ects were generally small and indistinguishable from zero, with honesty being the major exception. Honesty was characterized by a large, positive a2 value and a large negative a4 value, suggesting that well-being is associated with idealization at more extreme values (a2) and well-being increases as idealization increases (a4). 138 Table 25 Surface Parameters for Accuracy & Self-Reported Well-Being RSA Surface Parameter estimate CI LL CI UL p agreeableness a1 0.26 -0.50 1.02 .499 agreeableness a2 -0.20 -1.85 1.45 .812 agreeableness a3 0.59 -0.28 1.47 .185 agreeableness a4 -1.17 -2.81 0.48 .165 agreeableness a5 -0.32 -1.77 1.13 .666 conscientiousness a1 0.70 0.17 1.22 .010 conscientiousness a2 -0.68 -1.58 0.22 .138 conscientiousness a3 0.27 -0.26 0.80 .321 conscientiousness a4 1.11 -0.06 2.27 .062 conscientiousness a5 0.21 -0.56 0.98 .590 honesty a1 0.48 -0.49 1.45 .332 honesty a2 1.16 -0.22 2.54 .099 honesty a3 -0.29 -1.28 0.69 .557 honesty a4 2.80 0.66 4.93 .010 honesty a5 -2.09 -3.50 -0.68 .004 neuroticism a1 -0.75 -1.19 -0.31 .001 neuroticism a2 1.02 0.02 2.02 .045 neuroticism a3 -0.95 -1.49 -0.40 .001 neuroticism a4 -0.17 -1.59 1.26 .820 139 Table 25 continued Surface Parameter estimate CI LL CI UL p neuroticism a5 -0.24 -1.23 0.75 .633 extraversion a1 0.82 0.47 1.17 < .001 extraversion a2 -0.04 -0.57 0.48 .871 extraversion a3 0.85 0.40 1.31 < .001 extraversion a4 -0.14 -0.97 0.68 .731 extraversion a5 -0.20 -0.72 0.33 .460 openness a1 0.09 -0.51 0.68 .777 openness a2 -0.19 -0.87 0.49 .584 openness a3 0.05 -0.68 0.78 .889 openness a4 0.12 -1.49 1.73 .883 openness a5 -0.37 -1.20 0.45 .376 Note. CI LL and CI UL are the lower and upper limits of the 95 percent CI. 
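For reference, the a1 through a5 parameters reported in Tables 25 through 28 follow the standard second-degree polynomial parameterization used in response surface analysis (e.g., Barranti, Carlson, & Cote, 2017; Humberg, Nestler, & Back, 2019). As a sketch, with X denoting the self-report (or self-presentation goal), Y the perceiver rating, and Z the outcome, the fitted surface and derived parameters are:

Z = b0 + b1*X + b2*Y + b3*X^2 + b4*X*Y + b5*Y^2

a1 = b1 + b2           (slope of the surface along the line of congruence, Y = X)
a2 = b3 + b4 + b5      (curvature along the line of congruence)
a3 = b1 - b2           (slope along the line of incongruence, Y = -X)
a4 = b3 - b4 + b5      (curvature along the line of incongruence)
a5 = b3 - b5           (rotation of the surface's ridge away from the line of congruence)

Under this parameterization, a1 and a2 describe the surface where perceptions match self-reports, a3 and a4 describe it as perceptions and self-reports diverge, and a5 indexes whether the ridge of the surface is rotated away from the line of congruence, which is why a large a5 speaks against a strict (in)congruence pattern.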
Consequences for Accuracy & Idealization on Targets' Likability. To examine the consequences that being perceived accurately (or ideally) on Twitter has on likability, I ran a series of multilevel response surface analyses, predicting each perceiver i's rating of target j's likability from target j's self-reported personality (or, for the idealization analyses, target j's self-presentation goals) and perceiver i's rating of target j's personality, separately for each Big Six domain. For idealization, we also controlled for self-reports to mirror the idealization effects shown previously.

Figure 21. Accuracy and Well-Being Response Surface Plots

Figure 22. Idealization and Well-Being Response Surface Plots

Table 26
Surface Parameters for Idealization & Self-Reported Well-Being RSA

domain             parameter   estimate   CI LL   CI UL        p
agreeableness      a1              0.01   -0.68    0.70     .982
agreeableness      a2             -0.07   -1.47    1.33     .920
agreeableness      a3              0.67   -0.39    1.72     .214
agreeableness      a4              0.13   -1.69    1.95     .891
agreeableness      a5              0.62   -0.28    1.52     .178
conscientiousness  a1              0.20   -0.31    0.72     .443
conscientiousness  a2             -0.38   -1.17    0.40     .338
conscientiousness  a3             -0.05   -0.75    0.65     .886
conscientiousness  a4             -0.48   -1.52    0.57     .372
conscientiousness  a5              0.75    0.31    1.18     .001
honesty            a1              0.31   -0.61    1.22     .507
honesty            a2              1.50    0.16    2.84     .029
honesty            a3             -0.58   -1.60    0.44     .266
honesty            a4             -2.68   -4.73   -0.64     .010
honesty            a5              0.72   -0.49    1.93     .243
neuroticism        a1             -0.01   -0.62    0.59     .961
neuroticism        a2              0.77   -0.28    1.82     .151
neuroticism        a3             -0.25   -0.86    0.36     .414
neuroticism        a4             -0.50   -1.95    0.94     .496
neuroticism        a5             -0.08   -0.47    0.30     .676
extraversion       a1             -0.08   -0.49    0.34     .723
extraversion       a2             -0.01   -0.66    0.64     .987
extraversion       a3             -0.04   -0.52    0.43     .854
extraversion       a4             -0.33   -1.15    0.49     .430
extraversion       a5              0.20   -0.25    0.65     .383
openness           a1              0.08   -0.78    0.94     .860
openness           a2             -0.32   -1.54    0.89     .604
openness           a3             -0.17   -0.93    0.58     .652
openness           a4             -0.18   -1.60    1.25     .808
openness           a5              0.25   -0.64    1.14     .579

Note. CI LL and CI UL are the lower and upper limits of the 95 percent CI.

The surface parameters for accuracy are shown in Table 27 and the corresponding surface plots are shown in Figure 23. With the exception of openness, the pattern of results is the same across the Big Six, with a positive a1 and negative a3 (directions are reversed for neuroticism), suggesting that perceivers like targets more when they are on the more desirable end of the personality domain according to both self- and perceiver-reports (a1), and also like targets more when they misperceive them as being on the more desirable end (a3). This pattern of results is effectively a main effect of perceived personality, suggesting that perceivers liked targets more if they perceived them more desirably, whether that perception was accurate (a1) or not (a3).

Table 27
Surface Parameters for Accuracy and Likability MLRSA

domain             parameter   estimate   CI LL   CI UL        p
agreeableness      a1              0.56    0.44    0.68   < .001
agreeableness      a2             -0.16   -0.34    0.02     .073
agreeableness      a3             -0.41   -0.55   -0.28   < .001
agreeableness      a4             -0.11   -0.32    0.09     .281
agreeableness      a5             -0.06   -0.22    0.09     .440
conscientiousness  a1              0.28    0.17    0.40   < .001
conscientiousness  a2              0.00   -0.14    0.13     .961
conscientiousness  a3             -0.32   -0.45   -0.19   < .001
conscientiousness  a4             -0.07   -0.22    0.09     .401
conscientiousness  a5              0.07   -0.04    0.18     .197
honesty            a1              0.31    0.15    0.47   < .001
honesty            a2             -0.07   -0.27    0.12     .449
honesty            a3             -0.33   -0.50   -0.17   < .001
honesty            a4             -0.04   -0.26    0.17     .682
honesty            a5              0.00   -0.18    0.17     .963
neuroticism        a1             -0.29   -0.41   -0.18   < .001
neuroticism        a2              0.06   -0.09    0.21     .459
neuroticism        a3              0.23    0.09    0.36     .002
neuroticism        a4             -0.01   -0.18    0.16     .911
neuroticism        a5             -0.13   -0.27    0.00     .050
extraversion       a1              0.25    0.15    0.35   < .001
extraversion       a2             -0.04   -0.15    0.07     .509
extraversion       a3             -0.18   -0.30   -0.05     .005
extraversion       a4              0.00   -0.16    0.15     .954
extraversion       a5              0.01   -0.09    0.12     .827
openness           a1              0.22    0.08    0.36     .002
openness           a2              0.00   -0.17    0.16     .956
openness           a3             -0.10   -0.26    0.06     .215
openness           a4             -0.08   -0.28    0.11     .387
openness           a5              0.11   -0.04    0.26     .152

Note. CI LL and CI UL are the lower and upper limits of the 95 percent CI.

Figure 23. Accuracy and Likability Surface Plots

Turning to idealization, the surface parameters are shown in Table 28 and the corresponding surface plots are shown in Figure 24. We saw virtually the same pattern of results for idealization that we did for accuracy, with all but openness having positive a1 and negative a3 values (reversed for neuroticism). This, as with accuracy, suggests a main effect whereby more positive perceptions are associated with greater liking, whether those perceptions match targets' ideals (a1) or not (a3). Openness again breaks from this pattern, but in this case is characterized by a moderate positive a1 and a negative a4. This suggests that idealization is associated with liking when targets are self-presenting as higher in openness (a1) and that liking is higher the more perceptions match targets' self-presentation goals (a4).

Table 28
Surface Parameters for Idealization and Likability MLRSA

domain             parameter   estimate   CI LL   CI UL        p
agreeableness      a1              0.53    0.40    0.66   < .001
agreeableness      a2              0.00   -0.16    0.15     .976
agreeableness      a3             -0.41   -0.56   -0.26   < .001
agreeableness      a4              0.07   -0.10    0.24     .418
agreeableness      a5              0.11   -0.02    0.24     .108
conscientiousness  a1              0.30    0.17    0.43   < .001
conscientiousness  a2              0.02   -0.11    0.16     .731
conscientiousness  a3             -0.31   -0.45   -0.18   < .001
conscientiousness  a4             -0.11   -0.24    0.02     .096
conscientiousness  a5              0.07   -0.04    0.18     .197
honesty            a1              0.33    0.17    0.49   < .001
honesty            a2             -0.11   -0.31    0.08     .262
honesty            a3             -0.31   -0.47   -0.14   < .001
honesty            a4             -0.03   -0.25    0.19     .784
honesty            a5              0.00   -0.17    0.18     .975
neuroticism        a1             -0.25   -0.38   -0.11   < .001
neuroticism        a2              0.06   -0.07    0.19     .385
neuroticism        a3              0.26    0.12    0.40   < .001
neuroticism        a4              0.05   -0.11    0.21     .517
neuroticism        a5             -0.12   -0.24   -0.01     .041
extraversion       a1              0.15    0.04    0.27     .009
extraversion       a2             -0.05   -0.15    0.05     .348
extraversion       a3             -0.26   -0.39   -0.13   < .001
extraversion       a4             -0.07   -0.19    0.06     .295
extraversion       a5             -0.03   -0.12    0.07     .593
openness           a1              0.29    0.11    0.48     .002
openness           a2              0.02   -0.16    0.20     .820
openness           a3             -0.03   -0.22    0.16     .776
openness           a4             -0.21   -0.41   -0.02     .033
openness           a5              0.09   -0.07    0.25     .248

Note. CI LL and CI UL are the lower and upper limits of the 95 percent CI.

Figure 24. Idealization and Likability Surface Plots
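To make the structure of these analyses concrete, below is a minimal sketch of one such multilevel polynomial model in lme4 (Bates, Mächler, Bolker, & Walker, 2015). All variable and data frame names here are hypothetical placeholders, and the sketch omits the random slopes and the confidence intervals for the surface parameters that the reported analyses include (cf. the multilevel RSA approach of Nestler, Humberg, & Schönbrodt, 2019).

# Sketch of a multilevel polynomial response surface model for one domain.
# `liking` is perceiver i's liking of target j in block k; `self_c` and
# `perc_c` are the midpoint-centered self-reported and perceived scores.
library(lme4)

fit <- lmer(
  liking ~ self_c + perc_c +                       # linear terms (b1, b2)
    I(self_c^2) + self_c:perc_c + I(perc_c^2) +    # quadratic terms (b3, b4, b5)
    (1 | perceiver:block) +                        # perceivers nested in blocks
    (1 | target:block) +                           # targets nested in blocks
    (1 | block),
  data = dat
)

# Surface parameters are combinations of the fixed effects, for example:
b  <- fixef(fit)
a1 <- unname(b["self_c"] + b["perc_c"])            # slope along line of congruence
a3 <- unname(b["self_c"] - b["perc_c"])            # slope along line of incongruence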
Discussion

Study 3 was aimed at examining the extent to which Twitter profiles communicate a consistent, accurate, and/or idealized impression of individuals' personalities, and additionally at providing insight into the social functions of how people present themselves on Twitter. Findings indicate an appreciable degree of consensus, suggesting that perceivers largely agree about what targets are like based on their profiles. However, these impressions reached only a small degree of accuracy across the board, indistinguishable from chance guessing for most domains. The lack of accuracy is not explained by idealization, as perceptions were not overwhelmingly influenced by how targets want to be seen. Furthermore, accuracy appears to be unaffected by follower-network density, and showed very little systematic variation across targets and perceivers more generally. Finally, we found that profile-based perceptions relate to well-being and likability in some more and less straightforward ways. These findings have implications for how people present themselves on Twitter and in online environments more generally, and why they do so.

Our findings are somewhat opposed to the work by Back et al. (2010), which found that perceptions based on Facebook profiles are more similar to targets' real (rather than ideal) personalities. Our findings are somewhat murkier, suggesting that perceptions are more similar to what targets are really like for some domains (openness, neuroticism), more like how targets want to be seen for others (conscientiousness, honesty), and a roughly even mix of the two for the rest (agreeableness and extraversion). At first glance, there might be little theoretical sense to these results: they do not track with overall evaluativeness (i.e., openness is perceived more accurately, honesty more ideally), for example. However, some of these results may be better understood by considering them in conjunction with the results of the RSAs. The RSAs for neuroticism suggested that well-being is higher when people are accurately perceived as higher in neuroticism. This might be why accuracy is greater than idealization for neuroticism: presenting an idealized front might have an intrapersonal cost in the way of lowered well-being that most are not willing to pay. Indeed, coupled with the RSA suggesting a negative main effect of perceived neuroticism on likability, these results point to an interesting tension between people's intrapersonal needs to enhance well-being by expressing more negative affect and their interpersonal needs to enhance liking by expressing less negative affect. More generally, our results highlight that being perceived more positively has interpersonal benefits, but occasionally has intrapersonal costs. Interestingly, idealization for openness appears to be associated with greater liking, which makes it puzzling why we see greater accuracy than idealization.
However, there appears to be such a clear signal of openness on Twitter - based both on consensus and accuracy in this study as well as predictive accuracy in Studies 1 and 2 - that it might be too challenging to fake, even if participants are motivated to do so. For conscientiousness and honesty-propriety, response surface results suggested that accuracy was associated with lower well-being, whereas idealized perceptions of honesty were associated with greater well-being. It is not clear why well-being is associated with inaccuracy for conscientiousness and honesty, but it is interesting that these are the domains that show the most idealization and least accuracy.

V. GENERAL DISCUSSION

To what extent are our personalities reflected in and ultimately recoverable from digital footprints? Perhaps unsurprisingly, the answer to this question appears to vary across personality domains, types of digital footprints (i.e., language, network ties, profiles), and whether one is using machine learning algorithms or human judges. Indeed, openness was among the more accurately predicted or inferred domains across studies, predictions made from followed accounts were generally more accurate than those made from tweets, and machine learning algorithms tended to reach greater accuracy than human judges. It is of course possible that differences in accuracy within and across studies reflect specific features of the technologies and methods used presently, and that future work using different technologies or a different design might find different results. At the same time, they may reflect something deeper and more enduring, possibly reflecting differences in how personality is manifest in the behaviors afforded by online social networks like Twitter and the extent to which the records of those behaviors can be used to infer personality for basic or applied purposes.

Accuracy and its Implications for Personality Expression Online

Accuracy varied considerably both within and across studies, with accuracy estimates ranging from a high of r = .45 for predicting openness from followed accounts to a low of r = .04 for human judges' perceptions of conscientiousness from profiles. Though all of these estimates are far from perfectly accurate (i.e., from an r of 1), some met or exceeded their benchmarks (e.g., openness and neuroticism from followed accounts; see Figure 15), others were close but lower (e.g., conscientiousness from tweets; see Figure 10), and others were substantially lower (e.g., agreeableness from tweets or followed accounts; see Figures 10 and 15). Broadening out, the higher-end estimates of accuracy are approximately as high as meta-analytic estimates of the accuracy achieved by family members and close friends (r's from approximately .3 for judgments of agreeableness by close friends to .5 for judgments of extraversion by family members; see Table 5 in Connelly & Ones, 2010), and the lower-end estimates are approximately as low as meta-analytic estimates of the accuracy achieved by strangers from a variety of information sources (r's from approximately .1 for strangers judging neuroticism to .2 for strangers judging extraversion; see Table 5 in Connelly & Ones, 2010). Thus, while no prediction or judgment exhibited perfect accuracy (r of 1) or inaccuracy (r of 0), they tended to range from approximately as (in)accurate as a stranger to about as accurate as a close friend or family member.
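As a concrete illustration of how such accuracy estimates are computed and situated against benchmarks, the following sketch calculates domain-level accuracy as the correlation between self-reported and predicted scores. The data frame (scores) and its column names are hypothetical placeholders; the reference ranges are the approximate Connelly and Ones (2010) values just discussed.

# Domain-level predictive accuracy: correlation between self-reports and
# predictions, computed per domain. `scores` has hypothetical columns
# self_<domain> and pred_<domain>, one row per user.
domains <- c("openness", "conscientiousness", "extraversion",
             "agreeableness", "neuroticism", "honesty")

accuracy <- sapply(domains, function(d) {
  cor(scores[[paste0("self_", d)]],
      scores[[paste0("pred_", d)]],
      use = "pairwise.complete.obs")
})

# Approximate meta-analytic reference points (Connelly & Ones, 2010):
# strangers roughly r = .1 to .2; close others roughly r = .3 to .5.
round(accuracy, 2)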
Accuracy tended to be highest for openness across studies, which is consistent with prior work examining Facebook (Back et al., 2010; Kosinski et al., 2013; Park et al., 2015). This is especially interesting considering that openness is typically considered a less observable and more evaluative trait, two features of personality domains that are thought to depress consensus and accuracy (Connelly & Ones, 2010; John & Robins, 1993; Vazire, 2010). One possibility is that openness is much more observable in online spaces than it is offline, and that this is part of why it can be predicted by machine learning algorithms and inferred by people with relatively greater accuracy. Indeed, consider what online social networks like Twitter afford their users: they provide people a place to consume and express their interests. On Twitter, this might include tweeting about an essay one finds interesting, following favorite artists or public intellectuals associated with one's interests, or mentioning those interests in the bio field of one's profile. In this way, individual differences in openness may be more relevant to the behaviors that Twitter affords, resulting in relatively more information about users' degree of openness in the digital records of those behaviors.

On the other hand, accuracy tended to be lower for agreeableness across studies, though human judges inferred it from profiles in Study 3 with accuracy falling approximately between meta-analytic estimates of judgments made by strangers and by work colleagues (Connelly & Ones, 2010). Agreeableness, like openness, is generally considered lower in observability and higher in evaluativeness, but differs from openness in being highly interpersonal in nature. Indeed, individuals higher in agreeableness tend to treat others more kindly and be more trusting of others, and lower scorers tend to be more critical of others, less considerate, and more rude. This may make agreeableness a poor match for the approaches taken here, especially in Studies 1 and 2, which likely had trouble with the nuances of how agreeableness is expressed behaviorally online. For example, swear words may have been useful indicators of agreeableness when a user was cussing someone out, but those same words may have also been used as words of support or encouragement in another conversational context. This could be one reason human judges were somewhat accurate in inferences of agreeableness from profiles despite the very poor performance of the machine learning algorithms in Studies 1 and 2 - human judges may have been able to pick up on some of these nuances, even in the relatively small amount of information contained in users' profiles.

The specificity, or lack thereof, with which the machine learning algorithms predicted different personality domains likewise speaks to how behavior is manifest online. Although some predictions appeared specific to the corresponding domain - like openness in both Studies 1 and 2 - many did not, and the correlations between predicted scores were higher than would be expected based on the observed (self-reported) scores. The lack of specificity is not altogether surprising given that personality-behavior relations are thought to be complex many-to-many, rather than one-to-one, mappings (Wood, Gardner, & Harms, 2015), and so it is unlikely that cues relate to one and only one personality domain.
However, it is interesting to note that the correlations between predicted scores roughly approximated the Big Two (Digman, 1997; Saucier & Srivastava, 2015) for tweet-based predictions, possibly suggesting that tweets may be better captured by this broader structural model of personality than by the Big Six. This could be examined in future work by training models to predict the broader Big Two and comparing the accuracy with which they can predict those broad factors to the accuracy for the relatively narrower Big Six.

More generally, the present work highlights many of the difficulties inherent in predicting personality from digital footprints. Indeed, behaviors are multiply determined and can have vastly different psychological meanings in different social contexts and for different people. The fact that we can apply relatively blunt tools to these noisy records and infer any aspects of someone's personality with some degree of accuracy – let alone at accuracy similar to judgments made by a close friend or family member – is somewhat surprising and promising. With time and refinement, such techniques might even become useful for answering basic scientific questions or for applied purposes. Presently, however, many of the predictions and judgments examined here have yet to pass even the most basic requirement of predictive accuracy. Moreover, even those that do will require further validity research before they can be interpreted in scientific research or application.

Implications for identity claims and behavioral residue. To what extent are differences in accuracy within and between studies due to differences in the contribution of identity claims and behavioral residue? Although it is difficult to say, it is worth speculating about the possibility that this could underpin some of the present findings. Theoretically, profile-based judgments should rely most on identity claims, followed-account-based predictions should rely most on behavioral residue, and tweet-based predictions should rely on cues generated by both processes. Some prior work suggests that predictions made with identity claims are more accurate than those made with behavioral residue, suggesting that tweet-based predictions should be more accurate than followed-account-based predictions (Gladstone et al., 2019). At the outset of this dissertation, I proposed a slightly more nuanced possibility whereby identity claims are less accurate for more evaluative traits due to being more subject to (potentially inaccurate) self-presentation efforts. Our findings are at odds with both possibilities. If anything, followed-account-based predictions were generally more accurate, suggesting that predictions made from behavioral residue might be more accurate than predictions made from identity claims. With respect to the second possibility, no systematic relation between accuracy and evaluativeness emerged within or across studies. Accuracy for the highly evaluative domain of openness was greater than for other, less evaluative domains (e.g., extraversion) across studies, and accuracy was often similarly high (and similarly low) for domains that differ in terms of evaluativeness. One possible explanation is that we were wrong about which cues contain more identity claims or behavioral residue, but it seems difficult to explain why followed accounts would contain more identity claims than tweets.
Alternatively, it is possible that we were correct about which cues contain more or less behavioral residue and identity claims, but failed to consider how these cue types interact with features of the judgment procedure. That is, it could be that machine learning algorithms primarily achieve accuracy through behavioral residue rather than identity claims, and human judges primarily achieve accuracy through identity claims rather than behavioral residue, such that the machine-behavioral-residue and human-identity-claim combinations look more similar to one another than the alternative combinations. This is an interesting possibility that future research should evaluate in a design better suited to examine it directly (e.g., a fully crossed design of cue category and judgment procedure). It is worth considering the possibility that the relative presence of identity claims and behavioral residue may have less straightforward implications for accuracy than previously thought. Indeed, careful consideration of how people navigate complex and sometimes competing social motives in online spaces may provide clearer insight into how cues that are more or less subject to self-presentation affect the accuracy of different judgment procedures.

The social functions of self-presentation and personality expression on Twitter. One goal of this project, particularly Study 3, was to examine the extent to which people present an idealized front on Twitter and why they might decide to present themselves more accurately or ideally. Evidence here was mixed, with human-based perceptions showing evidence of both accuracy and idealization, and computerized predictions seeming relatively robust to differences in evaluativeness across domains. However, we did find evidence that perceivers like others more if they perceive them more positively, which could provide the motivation to manage impressions that is central to self-presentation according to Leary and Kowalski (1990). At the same time, this motivation to be seen positively and reap the interpersonal reward of greater likability might be at odds with the intrapersonal gains in well-being associated with expressing one's self accurately and being self-verified (Swann et al., 1989). We only found evidence of this tension for neuroticism, and even found that being perceived less accurately was beneficial for conscientiousness and honesty, a pattern of results that may explain why we saw more idealization for some domains (conscientiousness and honesty), more accuracy for others (neuroticism and openness), and a roughly even split for the rest (extraversion and agreeableness). More generally, this, more than evaluativeness per se, might clarify the findings across studies, where some highly evaluative traits (openness) were predicted accurately and other, less evaluative traits (extraversion) were difficult to predict accurately. Put differently, accuracy may be less affected by evaluativeness per se and more affected by the interplay between the more externally motivated desire to be seen positively and the more internally motivated desire to express one's less desirable characteristics. This would be broadly consistent with the work by Swann et al. (1989) on the interplay between self-verification and self-enhancement motives in interpersonal behavior.
We found little to no evidence that the density of targets' networks of followers moderated profile-based accuracy, a hypothesis that stemmed from considering the constraining role audiences are thought to have on targets' behavior in the identity negotiation process (Back et al., 2010; Boyd, 2007; Hogan, 2010; Swann, 1987). One plausible explanation for this is that follower-network density is a poor proxy for the true constraining factors, either the lowest-common-denominator audience that Hogan (2010) considered or the presence of offline friends in one's network considered important in Back and colleagues' (2010) extended real-life hypothesis. This seems plausible, and future work could more directly measure the features these theories consider important, such as the presence of people who would take issue with unrealistic self-presentation (for the lowest-common-denominator approach) or how many of one's Twitter followers one knows offline (for the extended real-life hypothesis). It is also worth considering the possibility that people with a public account, a prerequisite for being included in this study, have in mind the possible audience (which is basically anyone with access to Twitter) rather than the likely audience (one's followers). This too could be examined more directly in future work by asking participants whom they have in mind when they post on Twitter.

The Utility of Dictionaries and Implications for Selecting and Extracting Psychologically Meaningful Features from Noisy Data

One of the single most surprising results across studies was the relatively high accuracy achieved by models using dictionary-based scores of tweets. Indeed, models trained with the 77 dictionary scores, including the 68 LIWC categories (Tausczik & Pennebaker, 2010), sentiment, and the eight specific affect categories, were able to predict personality domains nearly as well as those trained with much more exhaustive and much more advanced sets of linguistic features. Moreover, the importance scores suggest that the LIWC scores were especially important for predicting personality (relative to the sentiment and affect dictionaries), which is even more surprising given that they were developed for a very different context and kind of text (personal essays). This highlights a potentially important implication of this work, namely, the utility of domain-specific expertise in creating tools useful for extracting meaningful features from otherwise noisy digital footprints.

How were dictionary-based models able to achieve performance similar to models trained with a far greater number of predictors or with far more advanced features? One possibility is that dictionaries like LIWC are an effective filter, and that this filtering capacity is especially useful when working with highly noisy data like tweets. On the technical side, this could increase accuracy by offloading feature selection from the machine learning algorithm, removing one (potentially substantial) source of error and variability from algorithm training. More theoretically interesting, it may be that the expertise that went into the development of LIWC - both the psychological expertise that went into its initial design and development and its refinement with psychologically informed empirical studies - makes it especially useful for predicting psychological constructs like personality.
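To illustrate what dictionary-based filtering looks like in practice, here is a minimal sketch using quanteda (Benoit et al., 2018). The two toy categories are hypothetical stand-ins (LIWC itself is proprietary and far richer), and the input data frame and its columns are likewise assumed names; the per-user category scores it produces are the kind of low-dimensional features the dictionary-based models described above were trained on.

# Hedged sketch of dictionary-based feature extraction with quanteda.
# `tweets` is a hypothetical data frame with columns user_id and text.
library(quanteda)

# Toy stand-in for a psychologically informed (LIWC-like) dictionary:
dict <- dictionary(list(
  posemo = c("happy", "great", "love*"),
  social = c("friend*", "family", "talk*")
))

corp <- corpus(tweets, text_field = "text")
toks <- tokens(corp, remove_punct = TRUE)
dict_dfm <- dfm_lookup(dfm(toks), dictionary = dict)

# Aggregate dictionary-category counts to one row of features per user:
features <- convert(dfm_group(dict_dfm, groups = tweets$user_id),
                    to = "data.frame")

The design appeal, as discussed above, is that the dictionary does the feature selection up front, leaving the learning algorithm a handful of theory-driven scores rather than thousands of raw token counts.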
More generally, the present work suggests that a little bit of domain expertise in the design of a tool can go a fairly long way. One interesting implication of this finding is that it highlights the promise of continued work refining tools for predicting psychological constructs from digital footprints. Dictionaries like LIWC could be refined to work even better in a domain like Twitter by including linguistic features that are unique to Twitter but relevant to LIWC categories (e.g., adding more Twitter slang to LIWC). Perhaps even more promising, expertise-driven scoring algorithms could be developed for features where none exist, such as followed accounts, by combining discovery-oriented work like the present with careful theorizing and focused experimentation. Tools like this will likely take much more work and time to become useful for basic or applied science, but the present work provides a jumping-off point for such efforts.

From Predictive Accuracy to Construct Validation

Predictive accuracy was quite high in some cases, sometimes matching or exceeding benchmarks. This raises the question of whether those models are presently useful in basic or applied scientific pursuits. To use a concrete example, the prediction of openness from followed accounts was moderately accurate by most standards, slightly exceeded its closest benchmark (Facebook likes), and was even slightly higher than meta-analytic estimates of the accuracy achieved by family members (r's of .45 vs. .43; Connelly & Ones, 2010). Should we start using this model as a measure of openness in psychological research? This is tempting, as it would open up new possibilities, allowing us to obtain "openness" scores from millions of people passively (i.e., without them having to actively fill out a questionnaire), cheaply, and rapidly. However tempting this possibility is, it would almost certainly be jumping the gun. Indeed, it would be quite misguided to focus solely on predictive accuracy in evaluating whether or not an algorithm is ready for use in basic or applied science, and even the most promising models from the present study should undergo even more rigorous evaluation before inferences from them are used. Models that demonstrate sufficient predictive accuracy will need to be subjected to a rigorous program of construct validation research (Cronbach & Meehl, 1955). Indeed, the present work (especially Studies 1 and 2) can be viewed as following the tradition of criterion validity; the self-reports are treated as a gold-standard criterion that we are attempting to predict with a new technique. As recently pointed out, prediction is a worthwhile goal and might be one of the more realistic goals for a research area or program, like the present one, that is still in its infancy (Yarkoni & Westfall, 2017). It is a worthwhile and important step, but it is merely a first step. Moving forward, it will become necessary to begin formulating and testing the relations – the so-called nomological network – considered important for the constructs we think these predicted scores are capturing.
This might include showing similar longitudinal stability and change as is reported in work with the relatively well-validated self- and observer-reported personality measures (Roberts & DelVecchio, 2000; Roberts, Walton, & Viechtbauer, 2006), similar levels of agreement with peer reports at different levels of acquaintanceship as seen with typical self-reports (Connelly & Ones, 2010), and a lack of bias across groups, similar to measurement invariance research (Stark, Chernyshenko, & Drasgow, 2006). Work like this will take considerable effort, but it is ultimately necessary to move inferences from digital footprints from a passing curiosity to a tool useful for scientific inquiry and intervention.

Conclusion

The increasing digitization of our social world presents new opportunities for people to interact, to produce and consume content they find interesting, and to express their thoughts, feelings, and identities. Likewise, it presents new opportunities for studying these interpersonal processes. Our findings indicate that human perceivers and machine learning algorithms can infer personality with some degree of accuracy using the different cues available to them, simultaneously speaking to how personality is expressed and perceived online. Moreover, the convergence across very different kinds of cues and very different kinds of "judges" suggests that some personality domains are more related than others to behavior on Twitter, and to behavior in online environments more generally. These findings thus provide an incremental increase in understanding how personality is expressed, perceived, and ultimately recoverable from digital footprints, and the consequences these processes have for individuals' well-being and social standing. While promising, the findings also emphasize the long road ahead before inferences from digital footprints could be used for either basic or applied purposes.

REFERENCES CITED

Anderson, C., Keltner, D., & John, O. P. (2003). Emotional convergence between people over time. Journal of Personality and Social Psychology, 84(5), 1054–1068. https://doi.org/10.1037/0022-3514.84.5.1054

Aust, F., & Barth, M. (2018). papaja: Create APA manuscripts with R Markdown. Retrieved from https://github.com/crsh/papaja

Back, M. D., Stopfer, J. M., Vazire, S., Gaddis, S., Schmukle, S. C., Egloff, B., & Gosling, S. D. (2010). Facebook Profiles Reflect Actual Personality, Not Self-Idealization. Psychological Science, 21(3), 372–374. https://doi.org/10.1177/0956797609360756

Bair, E., Hastie, T., Paul, D., & Tibshirani, R. (2006). Prediction by Supervised Principal Components. Journal of the American Statistical Association, 101(473), 119–137. https://doi.org/10.1198/016214505000000628

Barranti, M., Carlson, E. N., & Cote, S. (2017). How to Test Questions about Similarity in Personality and Social Psychology Research: Description and Empirical Demonstration of Response Surface Analysis. Social Psychological and Personality Science, 8, 806–817. https://doi.org/10.1177/1948550617698204

Bates, D., & Maechler, M. (2019). Matrix: Sparse and dense matrix classes and methods. Retrieved from https://CRAN.R-project.org/package=Matrix

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). quanteda: An R package for the quantitative analysis of textual data.
Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774

Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77. https://doi.org/10.1145/2133806.2133826

Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5, 135–146. https://doi.org/10.1162/tacl_a_00051

Bolker, B., & Robinson, D. (2020). broom.mixed: Tidying methods for mixed models. Retrieved from https://CRAN.R-project.org/package=broom.mixed

Boyd, D. (2007). Why Youth (Heart) Social Network Sites: The Role of Networked Publics in Teenage Social Life. MacArthur Foundation Series on Digital Learning - Youth, Identity, and Digital Media, 7641(41), 1–26. https://doi.org/10.1162/dmal.9780262524834.119

Bryk, A., & Raudenbush, S. (2002). Hierarchical linear modeling: Applications and data analysis methods. Newbury Park, CA: Sage.

Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review, 62(3), 193–217. https://doi.org/10.1037/h0047470

Burke, M., Kraut, R., & Marlow, C. (2011). Social capital on Facebook: Differentiating uses and users. Conference on Human Factors in Computing Systems - Proceedings, 571–580. https://doi.org/10.1145/1978942.1979023

Chan, C.-h., Chan, G. C., Leeper, T. J., & Becker, J. (2018). rio: A Swiss-army knife for data file I/O.

Chang, W., Cheng, J., Allaire, J., Xie, Y., & McPherson, J. (2019). shiny: Web application framework for R. Retrieved from https://CRAN.R-project.org/package=shiny

Cheung, F., & Lucas, R. E. (2014). Assessing the validity of single-item life satisfaction measures: Results from three large samples. Quality of Life Research, 23(10), 2809–2818. https://doi.org/10.1007/s11136-014-0726-4

Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality: Meta-analytic integration of observers' accuracy and predictive validity. Psychological Bulletin, 136(6), 1092–1122. https://doi.org/10.1037/a0021212

Coppersmith, G. A., Harman, C. T., & Dredze, M. H. (2014). Measuring Post Traumatic Stress Disorder in Twitter. In Proceedings of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM), 2(1), 23–45.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957

Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695. Retrieved from http://igraph.org

De Choudhury, M., Counts, S., & Horvitz, E. (2013a). Predicting postpartum changes in emotion and behavior via social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI '13 (p. 3267). New York, NY: ACM Press. https://doi.org/10.1145/2470654.2466447

De Choudhury, M., Gamon, M., Counts, S., & Horvitz, E. (2013b). Predicting depression via social media. In Seventh International AAAI Conference on Weblogs and Social Media (pp. 128–137). Retrieved from http://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/viewFile/6124/6351

DeYoung, C. G. (2015). Cybernetic Big Five Theory. Journal of Research in Personality, 56, 33–58. https://doi.org/10.1016/j.jrp.2014.07.004

Digman, J. M. (1997). Higher-order factors of the Big Five. Journal of Personality and Social Psychology, 73(6), 1246–1256.
https://doi.org/10.1037/0022-3514.73.6.1246

Dodds, P. S., Harris, K. D., Kloumann, I. M., Bliss, C. A., & Danforth, C. M. (2011). Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter. PLoS ONE, 6(12), e26752. https://doi.org/10.1371/journal.pone.0026752

Funder, D. C. (1995). On the accuracy of personality judgment: A realistic approach. Psychological Review, 102(4), 652–670. https://doi.org/10.1037/0033-295X.102.4.652

Genuer, R., Poggi, J. M., & Tuleau-Malot, C. (2010). Variable selection using random forests. Pattern Recognition Letters, 31(14), 2225–2236. https://doi.org/10.1016/j.patrec.2010.03.014

Gladstone, J. J., Matz, S. C., & Lemaire, A. (2019). Can Psychological Traits Be Inferred From Spending? Evidence From Transaction Data. Psychological Science. Advance online publication. https://doi.org/10.1177/0956797619849435

Golbeck, J., Robles, C., Edmondson, M., & Turner, K. (2011). Predicting Personality from Twitter. In 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing (pp. 149–156). IEEE. https://doi.org/10.1109/PASSAT/SocialCom.2011.33

Gosling, S. D., Ko, S. J., Mannarelli, T., & Morris, M. E. (2002). A room with a cue: Personality judgments based on offices and bedrooms. Journal of Personality and Social Psychology, 82(3), 379–398. https://doi.org/10.1037/0022-3514.82.3.379

Grasz, J. (2016). Number of Employers Using Social Media to Screen Candidates Has Increased 500 Percent over the Last Decade. Retrieved from http://www.careerbuilder.com/share/aboutus/pressreleasesdetail.aspx?sd=5/14/2015&id=pr893&ed=12/31/2015

Henry, L., & Wickham, H. (2019). purrr: Functional programming tools. Retrieved from https://CRAN.R-project.org/package=purrr

Hogan, B. (2010). The Presentation of Self in the Age of Social Media: Distinguishing Performances and Exhibitions Online. Bulletin of Science, Technology & Society, 30(6), 377–386. https://doi.org/10.1177/0270467610385893

Human, L. J., Carlson, E. N., Geukes, K., Nestler, S., & Back, M. D. (2018). Do Accurate Personality Impressions Benefit Early Relationship Development? The Bidirectional Associations Between Accuracy and Liking. Journal of Personality and Social Psychology. https://doi.org/10.1037/pspp0000214

Humberg, S., Nestler, S., & Back, M. D. (2019). Response Surface Analysis in Personality and Social Psychology: Checklist and Clarifications for the Case of Congruence Hypotheses. Social Psychological and Personality Science, 10(3), 409–419. https://doi.org/10.1177/1948550618757600

John, O. P., & Robins, R. W. (1993). Determinants of Interjudge Agreement on Personality Traits: The Big Five Domains, Observability, Evaluativeness, and the Unique Perspective of the Self. Journal of Personality, 61(4), 521–551.

Kadushin, C. (2012). Understanding social networks: Theories, concepts, and findings. Oxford University Press.

Kenny, D. A. (1994). Interpersonal perception: A social relations analysis. Guilford.

Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15), 5802–5805. https://doi.org/10.1073/pnas.1218772110

Kosinski, M., Wang, Y., Lakkaraju, H., & Leskovec, J. (2016). Mining big data to extract patterns and predict real-life outcomes. Psychological Methods, 21(4), 493–506. https://doi.org/10.1037/met0000105

Kuhn, M., with contributions from Wing, J.,
Weston, S., Williams, A., Keefer, C., Engelhardt, A., … Hunt, T. (2019). caret: Classification and regression training. Retrieved from https://CRAN.R-project.org/package=caret

Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (Vol. 26). Springer.

Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26. https://doi.org/10.18637/jss.v082.i13

Leary, M. R. (2007). Motivational and Emotional Aspects of the Self. Annual Review of Psychology, 58(1), 317–344. https://doi.org/10.1146/annurev.psych.58.110405.085658

Leary, M. R., & Kowalski, R. M. (1990). Impression management: A literature review and two-component model. Psychological Bulletin, 107(1), 34–47. https://doi.org/10.1037/0033-2909.107.1.34

Mehl, M. R., Gosling, S. D., & Pennebaker, J. W. (2006). Personality in its natural habitat: Manifestations and implicit folk theories of personality in daily life. Journal of Personality and Social Psychology, 90(5), 862–877. https://doi.org/10.1037/0022-3514.90.5.862

Meinshausen, N. (2007). Relaxed Lasso. Computational Statistics & Data Analysis, 52(1), 374–393. https://doi.org/10.1016/j.csda.2006.12.019

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2006). Distributed Representations of Words and Phrases and their Compositionality. Neural Information Processing Systems, 1, 1–9.

Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., & Joulin, A. (2017). Advances in pre-training distributed word representations. Retrieved from http://arxiv.org/abs/1712.09405

Mohammad, S. M., & Kiritchenko, S. (2015). Using Hashtags to Capture Fine Emotion Categories from Tweets. Computational Intelligence, 31(2), 301–326. https://doi.org/10.1111/coin.12024

Müller, K., & Wickham, H. (2019). tibble: Simple data frames. Retrieved from https://CRAN.R-project.org/package=tibble

Nadeem, M. (2016). Identifying Depression on Twitter. CoRR, 1–9. Retrieved from http://arxiv.org/abs/1607.07384

Nestler, S., Humberg, S., & Schönbrodt, F. D. (2019). Response surface analysis with multilevel data: Illustration for the case of congruence hypotheses. Psychological Methods, 24(3), 291–308. https://doi.org/10.1037/met0000199

Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., … Seligman, M. E. P. (2015). Automatic personality assessment through social media language. Journal of Personality and Social Psychology, 108(6), 934–952. https://doi.org/10.1037/pspp0000020

Paulhus, D. L., & Trapnell, P. D. (2008). Self-Presentation of Personality: An Agency-Communion Framework. In Handbook of personality psychology (pp. 492–517).

Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543). Retrieved from http://www.aclweb.org/anthology/D14-1162

Qiu, L., Lin, H., Ramsay, J., & Yang, F. (2012). You are what you tweet: Personality expression and perception on Twitter. Journal of Research in Personality, 46(6), 710–718. https://doi.org/10.1016/j.jrp.2012.08.008

Rammstedt, B., & John, O. P. (2007). Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German. Journal of Research in Personality, 41(1), 203–212.
https://doi.org/10.1016/j.jrp.2006.02.001

R Core Team. (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Reece, A. G., Reagan, A. J., Lix, K. L. M., Dodds, P. S., Danforth, C. M., & Langer, E. J. (2017). Forecasting the onset and course of mental illness with Twitter data. Scientific Reports, 7(1), 13006. https://doi.org/10.1038/s41598-017-12961-9

Roberts, B. W., & DelVecchio, W. F. (2000). The Rank-Order Consistency of Personality Traits From Childhood to Old Age: A Quantitative Review of Longitudinal Studies. Psychological Bulletin, 126(1), 3–25. https://doi.org/10.1037/0033-2909.126.1.3

Roberts, B. W., Walton, K. E., & Viechtbauer, W. (2006). Patterns of mean-level change in personality traits across the life course: A meta-analysis of longitudinal studies. Psychological Bulletin, 132(1), 1–25. https://doi.org/10.1037/0033-2909.132.1.1

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. Retrieved from http://www.jstatsoft.org/v48/i02/

Sarkar, D. (2008). Lattice: Multivariate data visualization with R. New York: Springer. Retrieved from http://lmdvr.r-forge.r-project.org

Saucier, G., & Srivastava, S. (2015). What makes a good structural model of personality? Evaluating the big five and alternatives. In M. Mikulincer, P. R. Shaver, M. L. Cooper, & R. J. Larsen (Eds.), Handbook of personality and social psychology. Vol. 3: Personality processes and individual differences (pp. 283–305). Washington, DC.

Saucier, G., Thalmayer, A. G., Payne, D. L., Carlson, R., Sanogo, L., Ole-Kotikash, L., … Zhou, X. (2014). A Basic Bivariate Structure of Personality Attributes Evident Across Nine Languages. Journal of Personality, 82(1), 1–14. https://doi.org/10.1111/jopy.12028

Schaefer, D. R., Kornienko, O., & Fox, A. M. (2011). Misery Does Not Love Company. American Sociological Review, 76(5), 764–785. https://doi.org/10.1177/0003122411420813

Schönbrodt, F. D., & Humberg, S. (2018). RSA: An R package for response surface analysis (version 0.9.13). Retrieved from https://cran.r-project.org/package=RSA

Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., … Ungar, L. H. (2013). Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. PLoS ONE, 8(9), e73791. https://doi.org/10.1371/journal.pone.0073791

Soto, C. J., & John, O. P. (2017a). Short and extra-short forms of the Big Five Inventory-2: The BFI-2-S and BFI-2-XS. Journal of Research in Personality, 68, 69–81. https://doi.org/10.1016/j.jrp.2017.02.004

Soto, C. J., & John, O. P. (2017b). The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology, 113(1), 117–143. https://doi.org/10.1037/pspp0000096

Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91(6), 1292–1306. https://doi.org/10.1037/0021-9010.91.6.1292

Sumner, C., Byers, A., Boochever, R., & Park, G. J. (2012). Predicting dark triad personality traits from twitter usage and a linguistic analysis of tweets. In 2012 11th International Conference on Machine Learning and Applications (pp. 386–393). IEEE. https://doi.org/10.1109/IRI.2012.6302998

Swann, W. B.
(1987). Identity negotiation: Where two roads meet. Journal of Personality and Social Psychology, 53(6), 1038–1051. https://doi.org/10.1037/0022-3514.53.6.1038

Swann, W. B., Pelham, B. W., & Krull, D. S. (1989). Agreeable fancy or disagreeable truth? Reconciling self-enhancement and self-verification. Journal of Personality and Social Psychology, 57(5), 782–791. https://doi.org/10.1037/0022-3514.57.5.782

Swann, W. B., & Read, S. J. (1981). Self-verification processes: How we sustain our self-conceptions. Journal of Experimental Social Psychology, 17(4), 351–372. https://doi.org/10.1016/0022-1031(81)90043-3

Tausczik, Y. R., & Pennebaker, J. W. (2010). The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology, 29(1), 24–54. https://doi.org/10.1177/0261927X09351676

Thalmayer, A. G., & Saucier, G. (2014). The questionnaire big six in 26 nations: Developing cross-culturally applicable big six, big five and big two inventories. European Journal of Personality, 28(5), 482–496. https://doi.org/10.1002/per.1969

Tomasello, M. (2010). Origins of human communication. MIT Press.

Vazire, S. (2010). Who knows what about a person? The self-other knowledge asymmetry (SOKA) model. Journal of Personality and Social Psychology, 98(2), 281–300. https://doi.org/10.1037/a0017908

Watson, D., Beer, A., & McDade-Montez, E. (2014). The Role of Active Assortment in Spousal Similarity. Journal of Personality, 82(2), 116–129. https://doi.org/10.1111/jopy.12039

Watson, D., Hubbard, B., & Wiese, D. (2000a). General Traits of Personality and Affectivity as Predictors of Satisfaction in Intimate Relationships: Evidence from Self- and Partner-Ratings. Journal of Personality, 68(3), 413–449. https://doi.org/10.1111/1467-6494.00102

Watson, D., Hubbard, B., & Wiese, D. (2000b). Self-other agreement in personality and affectivity: The role of acquaintanceship, trait visibility, and assumed similarity. Journal of Personality and Social Psychology, 78(3), 546–558. https://doi.org/10.1037/0022-3514.78.3.546

Watson, D., Klohnen, E. C., Casillas, A., Nus Simms, E., Haig, J., & Berry, D. S. (2004). Match Makers and Deal Breakers: Analyses of Assortative Mating in Newlywed Couples. Journal of Personality, 72(5), 1029–1068. https://doi.org/10.1111/j.0022-3506.2004.00289.x

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. Retrieved from https://ggplot2.tidyverse.org

Wickham, H. (2017). tidyverse: Easily install and load the 'tidyverse'. Retrieved from https://CRAN.R-project.org/package=tidyverse

Wickham, H. (2019a). forcats: Tools for working with categorical variables (factors). Retrieved from https://CRAN.R-project.org/package=forcats

Wickham, H. (2019b). stringr: Simple, consistent wrappers for common string operations. Retrieved from https://CRAN.R-project.org/package=stringr

Wickham, H., François, R., Henry, L., & Müller, K. (2019). dplyr: A grammar of data manipulation. Retrieved from https://CRAN.R-project.org/package=dplyr

Wickham, H., & Henry, L. (2019). tidyr: Easily tidy data with 'spread()' and 'gather()' functions. Retrieved from https://CRAN.R-project.org/package=tidyr

Wickham, H., Hester, J., & Francois, R. (2018). readr: Read rectangular text data. Retrieved from https://CRAN.R-project.org/package=readr

Wood, D., Gardner, M. H., & Harms, P. D. (2015). How functionalist and process approaches to behavior can explain trait covariation. Psychological Review, 122(1), 84–111.
https://doi.org/10.1037/a0038423

Yarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393

Youyou, W., Kosinski, M., & Stillwell, D. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112(4), 1036–1040. https://doi.org/10.1073/pnas.1418680112

Youyou, W., Stillwell, D., Schwartz, H. A., & Kosinski, M. (2017). Birds of a Feather Do Flock Together. Psychological Science, 28(3), 276–284. https://doi.org/10.1177/0956797616678187