Selecting the Number and Labels of Topics in Topic Modeling: A Tutorial
Date
2023-05-25
Authors
Weston, Sara J.
Shryock, Ian
Light, Ryan
Fisher, Phillip A.
Publisher
Sage Journals
Abstract
Topic modeling is a type of text analysis that identifies clusters of co-occurring words, or latent topics. A challenging
step of topic modeling is determining the number of topics to extract. This tutorial describes tools researchers can use to
identify the number and labels of topics in topic modeling. First, we outline the procedure for narrowing down a large
range of models to a select number of candidate models. This procedure involves comparing the large set on fit metrics,
including exclusivity, residuals, variational lower bound, and semantic coherence. Next, we describe the comparison
of a small number of models using project goals as a guide and information about topic representativeness and solution
congruence. Finally, we describe tools for labeling topics, including frequent and exclusive words, key examples, and
correlations among topics.
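
As an illustration of one of the fit metrics named above, the following is a minimal sketch of a semantic coherence score for a single topic. It assumes the common co-document-frequency definition, in which each pair of a topic's top words contributes log((D(w_i, w_j) + 1) / D(w_j)); whether this matches the tutorial's exact computation is an assumption. The function name `semantic_coherence` and the toy documents are hypothetical and do not come from the article.

```python
import math


def semantic_coherence(top_words, documents):
    """Score one topic from its top words (ordered most to least probable).

    Assumed definition: sum over pairs (i > j) of
    log((D(w_i, w_j) + 1) / D(w_j)), where D counts the documents
    that contain all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            denom = doc_freq(w_j)
            if denom == 0:
                # Skip words that never occur, to avoid division by zero.
                continue
            score += math.log((doc_freq(w_i, w_j) + 1) / denom)
    return score


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["sleep", "baby", "night"],
    ["sleep", "baby", "nap"],
    ["school", "teacher", "class"],
    ["school", "class", "homework"],
]

coherent = semantic_coherence(["sleep", "baby"], docs)
mixed = semantic_coherence(["sleep", "school"], docs)
print(coherent > mixed)
```

Under this definition, higher (less negative) scores indicate that a topic's top words tend to appear in the same documents. When comparing models with different numbers of topics, the score would typically be averaged over the topics in each model and read alongside exclusivity, since coherence alone tends to favor solutions with fewer topics.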
Description
13 pages
Keywords
Child, Development, Health, Infant, Natural language processing, Topic modeling, Structural topic modeling
Citation
Weston SJ, Shryock I, Light R, Fisher PA. Selecting the Number and Labels of Topics in Topic Modeling: A Tutorial. Advances in Methods and Practices in Psychological Science. 2023;6(2). doi:10.1177/25152459231160105