Cataloging & Classification Quarterly ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/wccq20 Musings on Faceted Search, Metadata, and Library Discovery Interfaces Kelley McGrath To cite this article: Kelley McGrath (2023) Musings on Faceted Search, Metadata, and Library Discovery Interfaces, Cataloging & Classification Quarterly, 61:5-6, 439-490, DOI: 10.1080/01639374.2023.2222120 To link to this article: https://doi.org/10.1080/01639374.2023.2222120 © 2023 The Author(s). Published with license by Taylor & Francis Group, LLC. Published online: 21 Jun 2023. Submit your article to this journal Article views: 1777 View related articles View Crossmark data Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=wccq20 Cataloging & ClassifiCation Quarterly 2023, Vol. 61, nos. 5–6, 439–490 https://doi.org/10.1080/01639374.2023.2222120 Musings on Faceted Search, Metadata, and Library Discovery Interfaces Kelley McGrath university of oregon libraries, eugene, or, usa ABSTRACT ARTICLE HISTORY Faceted search is a powerful tool that enables searchers to Received March 2023 easily and intuitively take advantage of controlled vocabu- Revised May 2023 laries and structured metadata. Faceted search has been Accepted May 2023 widely implemented in library discovery interfaces and has KEYWORDS provided many benefits to library users. The effectiveness of Faceted search; faceted facets in library catalogs depends on a complex interaction vocabularies; metadata between facet vocabularies, metadata quality and structure, quality; library discovery and the library discovery interface’s capabilities. This article interfaces; library provides a holistic overview of challenges for optimally catalogs; usability issues implementing facets in library catalogs. This supports a sys- tematic approach to refining and enhancing the capacity of faceted search to improve searching and exploring bib- liographic metadata. Overview Faceted search is a powerful tool that enables searchers to easily and intuitively take advantage of controlled vocabularies and structured meta- data. It can make searches more efficient, help users better understand options, and guide users who are browsing or exploring. It has been widely implemented in library discovery interfaces and has provided many benefits to library users. Tunkelang says that the essence of faceted search is the combination of text search of unstructured content and faceted navigation of structured content organized into component facets.1 Facets are familiar to most users as they frequently encounter them on commercial websites. The Nielsen Norman Group notes the ubiquity of facets on contemporary retail websites and states that many users are frustrated when they are absent. They point out that only the largest ecommerce websites with the most diverse products, such as Amazon and Walmart, retain formerly common alternatives like scoped search and advanced search in addition CONTACT Kelley Mcgrath kelleym@uoregon.edu university of oregon libraries, eugene, or 97403, usa. © 2023 the author(s). Published with license by taylor & francis group, llC. this is an open access article distributed under the terms of the Creative Commons attribution-nonCommercial-noDerivatives license (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way. the terms on which this article has been published allow the posting of the accepted Manuscript in a repository by the author(s) or with their consent. 440 K. MCGRATH to facets.2 As will be discussed below, libraries also have very large and diverse inventories. When implemented well, faceted search enables users to more quickly and accurately perform successful searches. It increases both precision and recall. Faceted search is ubiquitous in contemporary library discovery interfaces, but there remain challenges for optimizing the implementation of faceted search in library catalogs. These include scale, computational constraints, user interface design, metadata quality and complexity, the diversity of library resources, and the theoretical and practical impossibility of completely populating consistent, accurate facets. The effectiveness of facets in library catalogs depends on a complex interaction between facet vocabularies, metadata quality and structure, and the library discovery interface’s capabilities. There are two main areas of difficulty. The first relates to systems design. This can further be subdivided into problems stemming from technical challenges and computational limits, and the challenges of design- ing user interfaces to present complex data. The second relates to the data that populates the facets. This includes characteristics of bibliographic metadata, such as the complexity and heterogeneity of information that libraries attempt to record about bibliographic resources. It also includes the design of controlled vocabularies and the definitions and data struc- tures used for recording bibliographic information. Limits on the time, funding, and expertise available to populate bibliographic metadata, as well as theoretical and practical limits on the data it is possible to record, also present challenges. System design issues and data issues are deeply intertwined and interact in complex ways. They cannot be considered in isolation. For example, some of the shortcomings of using the Library of Congress Subject Headings (LCSH) for faceted navigation are due to the fact that it was designed to work in the card catalog environment where users access the subject headings only through a left-anchored, alphabetical list. It is not possible to evaluate the effectiveness of a controlled vocab- ulary or metadata structure independently of the technology that is used to interact with it. This article provides a holistic overview of challenges for optimally implementing facets in library catalogs. It also includes illustrative examples and case studies, covering topics such as the complexities of dates and musical medium of performance, as well as challenges for using LCSH to populate topical subject facets. This overview is intended to support a systematic approach to refining and enhancing the capacity of faceted search to improve the process of searching and exploring bibliographic metadata. It is hoped that this exploration will generate ideas and encour- age experimentation that will enable the development of more powerful and easier to use library discovery interfaces. CATAloGinG & ClAssiFiCATion QuARTeRly 441 The many benefits of facets and faceted navigation Faceted search is a flexible, powerful, and user-friendly way to explore result sets. It enables users to choose limiters that are relevant to their initial search. Facets support an interactive experience where users have the opportunity to progressively refine their search while avoiding zero result sets. The ability to independently combine facets at whim lets users intuitively and effectively explore result sets to find the most pertinent resources. Facets also provide an overview of the search space and the resources that may potentially be relevant. Some challenges for implementing facets and faceted navigation in library discovery interfaces When libraries moved from card catalogs to online public access catalogs (OPACs), the functionality of the catalog was improved in many ways. In particular, keyword search provides access to many parts of the catalog record, such as free text notes, that could not be searched in a card cat- alog. However, many of the capabilities of the card catalog that supported browsing and provided context have never been successfully reproduced in online catalogs. From the early days of computerized library catalogs, there have been calls for modifying OPAC functionality and adjusting cataloging practice to improve the discovery process. In 2023, Coyle could still note that contemporary library discovery interfaces do not provide as much context as card catalogs did nor do they exhibit the “modicum of conversation” or back-and-forth that a card catalog could.3 There are two main ways that library catalog functionality could be improved. The first is to change the system functionality, so that it works better with the existing data. Massicotte represents one of many calls to rethink the way that we present subject headings to users.4 The second is to change our controlled vocabularies or cataloging practices to better utilize the affor- dances of computerized interfaces. For example, Nahotko advocates for new knowledge organization structures that take better advantage of fea- tures of digital interfaces, including faceted navigation.5 Both approaches have potential for improving the usability of facets in library catalogs and thus the overall functionality of our discovery interfaces. Scale Maintaining accurate, consistent, and completely-populated facet values and presenting them effectively at scale can require significant resources. Tunkelang lists three aspects to consider when scaling faceted search: 1) number of documents; 2) number of facet values per document; and 3) searchable text 442 K. MCGRATH per document.6 Although the number of characters and amount of searchable metadata in most MARC bibliographic records is relatively small, most library catalogs contain a large number of records, each of which potentially has many facet values associated with it. A search interface with a large number of records combined with a large number of facets creates user interface design challenges. It also requires significant effort to create and maintain vocabularies and to populate records with accurate metadata to support facets. Finally, it comes with high computational costs. It is challenging to present large numbers of complex facets in a user interface that is still easy to comprehend and utilize. The Norman Nielsen Group says that there is a tradeoff where the “extra power of faceted navigation also adds interaction cost by presenting users with more options to comprehend and manipulate.”7 No arrangement of a long list of possible facets will be optimal for all users. Vocabulary maintenance and metadata creation and quality control to support faceted navigation consume significant financial and human resources. Many libraries found that when they introduced faceted navigation in their OPACs and discovery interfaces, it exposed much inconsistent, incorrect, and missing metadata that required remediation. Over the past fifteen years, the Library of Congress has developed several new faceted vocabularies, such as the Library of Congress Genre/Form Terms for Library and Archival Materials (LCGFT),8 the Library of Congress Medium of Performance Terms for Music (LCMPT),9 and the Library of Congress Demographic Group Terms (LCDGT).10 Multiple MARC fields have been added or expanded that can support structured data from these vocabularies or other sources and that can be used to populate facets. Although this expands the possibilities for using faceted search in library discovery interfaces, it also requires cat- alogers to take the time to add these values to new records. In addition, to improve recall, there must be some sufficiently accurate and scalable method for populating existing records with relevant facet values. Even with quality metadata and the best possible user interface design, there are challenges for implementing facets on a large scale. As Tunkelang explains, facets make the presentation of search results much more com- putationally demanding. In a faceted search system, query processing consists of two steps. The first step of retrieving the set of records that matches the query is common to information retrieval systems and can be performed relatively efficiently. It takes significantly more computational resources to identify the associated facets. Many faceted search interfaces also compute the count associated with each facet value, which increases the load even further.11 This constraint has led some major library dis- covery interfaces to provide incomplete recall both in terms of the number of facet values presented to users and in terms of the percentage of records that are used to generate the associated count. CATAloGinG & ClAssiFiCATion QuARTeRly 443 User interface design There are many challenges for populating and presenting facets in a way that is easy to understand and use. Some user interface design decisions are unrelated to the metadata used for faceting, such as whether the facets should be placed on the left, the right, or the top. However, in many cases, design decisions interact with the available metadata and users’ anticipated needs. Such decisions include what facets to provide, how many facet values to display, and whether and how to implement Boolean search strategies. Number and choice of facets Tunkelang discusses this in terms of information overload and users’ scarce attention. He states that both the number of different facets and the number of facet values in a given facet have the potential to over- whelm users.12 There are many different facets that could potentially be useful to some subset of library users and many facets generated from library metadata have long lists of potential values. Tunkelang suggests three considerations to help decide which facets to display.13 The first is to favor displaying facets that have what he calls high coverage, i.e., there are values associated with the facet for most or all records in the collection. In a library context, the vast majority of records have some sort of date of publication coded in the 008 fixed field, so this is an example of a facet with high coverage. In contrast, only a small number of records have the date of creation of the work coded in the 046 field. His second recommendation is to favor facets that produce what he calls a “high-entropy distribution of values” in the result set. By this he means that the facet values are more evenly distributed rather than lop- sidedly clustering around a single value or few values. In the context of many libraries in the U.S., language would be a low information facet because the vast majority of materials are in English. The values in a date of publication facet would likely have a more balanced distribution. Finally, he recommends consolidating facets with similar or overlapping values, particularly if the distinctions are sometimes made arbitrarily. He gives the example of merging authors, editors, and other contributors into a single facet, which is the approach commonly taken in library discovery interfaces. Tunkelang more recently suggested criteria for selecting facets to display dynamically in response to a query.14 The first is popularity, defined as the facets most often wanted by users doing the same or a similar search. On an existing site this can be measured, but is also influenced by the current presentation of the facets and cannot necessarily be obtained for less common searches. Coverage is mentioned again and also utility, which is defined as having a meaningful effect on the search results. 444 K. MCGRATH Tunkelang also discusses several ways to mitigate information overload for facets that generate a high number of values.15 His first suggestion is to exploit hierarchy when possible. Some types of facets lend themselves to this approach more than others. For example, a geographic facet might allow users to nav- igate between the continental, country, state, and city levels. However, intro- ducing too many layers of hierarchy creates its own usability problems. It also makes maintenance of the facet vocabulary more complicated, as it is necessary to keep track not just of the terms, but also the hierarchy. Some potential vocabularies, such as LCSH, do not have an existing hierarchy that is fit for this purpose and it is not easy to retrofit a large vocabulary with a suitable hierarchy. Lacking an existing hierarchy, Tunkelang proposes that facets could either be grouped arbitrarily (e.g., alphabetically A-D, E-H) or statistically by clustering similar values.16 An approach taken in many library catalogs is to limit the facet values shown to the ones that are most common in a given result set. For example, Ex Libris’s Primo Back Office presents the top 20–50 values for each dynamic facet. The facet values displayed “are derived from the values stored in the Facets section of the 2,000 top-ranked PNX records in the search results. Once the system determines which values to display for each category, it will count the matching records from the first 50,000 results per slice and display the count next to the facet value.”17 Alternatively, Tunkelang says it might be possible to only show the facet values that occur more frequently in the result set than in the overall collection.18 The labeling of facets may also be challenging. Some facets may contain similar values and be difficult to differentiate in a way that is easily understood by the casual user. For example, a language facet could com- bine all languages associated with the manifestation into a single language facet as is typically done. Alternatively, multiple types of languages could be distinguished, such as written languages, spoken or sung languages, subtitle languages, and caption languages. A separate facet could also be created for the original language of the work. There may be a disconnect between labels used in library catalogs and user vocabulary. It is also not clear that users think of some categories, such as genres, in the same way as librarians tend to. Crowdsourced sites often to use tags that cover a multiplicity of uses without distinction or in overlapping ways. For example, Upton describes the tag “dark romance” as more of a “content warning.”19 Ordering of facet values In addition to decisions about how many facets and facet values to display, decisions must be made about how to order the values within each facet. There are three main approaches. One is to present the facet values in a fixed order, commonly alphabetical order. CATAloGinG & ClAssiFiCATion QuARTeRly 445 Figure 1. facets for MesH shown in alphabetical order. Figure 2. facets for MesH shown in ranked order. 446 K. MCGRATH An  alternative is to present the facets in a ranked order where the values that are most common in the result set are shown first. Finally, a hierarchical or nested approach may be used to group facet values. The optimal decision depends on the facet values and user needs and is subject to system constraints. For example, a topical Medical Subject Heading (MeSH) facet might be easier to scan if it were organized alphabetically (Figure 1), but if only the top five facet values are displayed by default, it will potentially hide or deemphasize the most frequent values that would be highlighted in ranked results (Figure 2). Even if the default order is optimal for most use cases, it is desirable to allow users to toggle between these two options. In this particular case, a nested presentation with the top-level terms in ranked order might be better than either of the options shown, but this option is not supported by the discovery system used. Boolean operators Some use cases for facets call for more sophisticated strategies that emulate the Boolean operators AND, OR, and NOT. These are more difficult to present to users in an easily understood way. Ex Libris’s Primo supports all three in some situations. Selection from different facets always performs an AND search. Sequential selection of values from a single facet by clicking hyperlinked terms performs an AND search. If a user searches for “biography” and then clicks the topical facet value for “United States,” the user gets just the biographies that have a subject term for United States. At this point, the user can then look at the topical facet again and select another subject, such as “presidents.” They might then select an additional facet, such as “generals,” which will produce a list of resources about American presidents who were also generals (Figure 3). On the other hand, simultaneous selection of values from a single facet using checkboxes performs an OR search (Figure 4). If the user has searched for biography and selected the topical facet “United States” as before and then selects both “presidents” and “generals” simultaneously using the checkboxes, they will get biographies about either presidents or generals or people who were both (Figure 5). Primo follows good user design in that all of the facets exhibit the same behavior. Nevertheless, as Tunkelang points out, “Users are notoriously bad at inferring Boolean logic from subtle cues.”20 There are a couple of situa- tions where library metadata exacerbates this challenge. One is the hetero- geneity of terms that are included in some library facets. A notorious example is the common implementation of topical subject facets where different types of topical aspects are intermixed in a single facet list. Returning to the biography example, a user might naïvely select three of the first four facet values presented, “1900–1999,” “United States,” and “politicians.” CATAloGinG & ClAssiFiCATion QuARTeRly 447 Figure 3. results of a search for biographies narrowed sequentially to the topic facet for united states, presidents and generals to retrieve biographies of americans who were both presidents and generals. Figure 4. Multi-select option in Primo topical subject facet. The user might expect to get biographies of twentieth century American politicians, but what they will actually get are all the biographies of twentieth century persons plus all the biographies of Americans plus all the biographies of politicians with only a subset exhibiting all three characteristics. This can potentially be remedied by splitting the topical facet into subcategories, such as topic, place, and time period, at the cost of increasing information load by presenting a larger number of facets. It is also not possible to split these categories cleanly in some vocabularies, such as LCSH. For example, LCSH marks many named chronological periods, such as “Middle Ages,” in a way that is indistinguishable from standard topics. 448 K. MCGRATH Figure 5. results of a search for biographies narrowed by the topic facet for united states, after which the topic facet values for both presidents and generals were selected simultane- ously. includes biographies of americans who were either presidents, generals, or both. Another difficulty is that some facets are never or almost never useful as OR searches. For example, it is difficult to come up with a use case where selecting “violin” OR “piano” in a medium of perfor- mance facet makes sense. Rather than checking the boxes for violin and piano and clicking apply facets, the user must know that they need to first select “violin” and then select “piano” after the results from their first selection have updated. This will find records that include pieces with a piano part and pieces with a violin part with the caveat that in the case of aggregates, they may refer to two different works in the same publication rather than a piece that has both a piano and a violin part. There are some rare cases where one can envision a user preferring an OR search in a medium of performance facet, such as a vocalist who wishes to retrieve “soprano voice OR high voice OR singer” in order to compensate for varying levels of specificity in descriptions. This particular use case, though, might be better met by functionality that leverages the hierarchy of the underlying vocabulary. CATAloGinG & ClAssiFiCATion QuARTeRly 449 Figure 6. Boolean not option in Primo resource type facet. Tunkelang proposes the heuristic that “facets that are typically singly assigned to documents (e.g., brand, document type) work well with dis- junctive [OR] selection, whereas facets that are often multiply assigned to documents (e.g., consumer electronics features, topic) work well with conjunctive [AND] selection.” However, he simultaneously recommends consistent behavior within a given search interface, which is impossible to align with facets that individually benefit most from inconsistent behav- ior.21 In an ideal interface, users would be able to intuitively choose to use AND or OR to combine facet values in the way that best supports their information need, but this would be complex to communicate. Ex Libris’s Primo also supports the Boolean NOT operator in the form of a red checkbox with a slash through it that appears on mouseover (Figure 6). This functionality is not common on commercial websites and anecdotally public services librarians say that users have to be taught to use this function. It is particularly useful to remove unwanted formats, such as microforms or government documents. It works less well in sit- uations where the facet value being removed applies to only part of the resource. For example, if a user is searching for fugues, but wants some- thing new and decides to NOT out Bach from a creator facet, they will simultaneously remove any fugues by other composers that are part of compilations that include a piece by Bach. Exploratory search and search-free browsing In library catalogs and databases, a distinction is often made between known-item searches and other types of searches, such as subject searches. These other types of searches can be grouped in the overlapping categories of browsing and exploratory search. Facets can support users seeking 450 K. MCGRATH known items by helping them narrow large result sets to the item sought. This is especially useful when the query consists of short, generic search strings, such as “nature” for the journal Nature, or very commonly occur- ring terms, such as when a user is seeking a specific performance of Beethoven’s Ninth Symphony. However, faceted search has even greater potential to improve the browsing and exploratory search process. Browsing and exploratory search encompass a variety of use cases rang- ing from users seeking a particular type of resource, such as recent fantasy novels or textbooks for learning Python, to users exploring an information space that is new to them. McKay, Buchanan, and Chang list a number of use cases for browsing, including “meeting loosely specified information needs, meeting well specified but hard to describe needs with recognition strategies, … refining information needs in view of constructing a query, … ‘social’ information seeking, and … serendipitous discovery.”22 Kules and Capra state that “uncertainty, ambiguity and discovery” are common characteristics of exploratory tasks.23 McKay, Buchanan, and Chang conducted a user study to try to empir- ically determine requirements for effective online browsing systems. They advocate for “purpose-built systems designed to facilitate serendipity and browsing” and emphasize the importance of search-free browsing.24 A faceted interface that works well for both searching and browsing could present the user with an initial screen that features facets as well as a search box. It should also enable a user who has performed a key- word search and then selected relevant facets to remove their search terms, so that they can see all the resources that fall into the category or cate- gories that they selected. Implementation of search-free browsing in library discovery interfaces faces several significant challenges, particularly if the discovery interface aggregates large amounts of metadata from many sources. These obstacles include computational constraints on scaling, heterogeneous facet values that are not consistently present or do not conform to a standardized vocabulary, either because the values come from nonstandardized sources like author keywords or because they come from multiple conflicting vocabularies that cover the same concepts, and the challenge of providing users with large numbers of options without overwhelming them. Library resources are heterogeneous, so it may be difficult to pick facets to display on the initial page of a search-free browsing interface. McGrath points out that LCSH has far too many top-level terms to be useful for topical browsing in this fashion, but classification schemes are more prom- ising.25 Search-free browsing interfaces may work better for smaller, clearly defined subsets of library resources. In 2018 McKay, Buchanan, and Chang published a review of two interfaces that offer search-free browsing: WhichBook, which covers fiction and poetry, and BookFish for young CATAloGinG & ClAssiFiCATion QuARTeRly 451 readers.26 In 2011 McGrath, Kules, and Fitzpatrick developed a prototype for moving images.27 Complexity and multiple entity types One of the most significant difficulties for successfully implementing fac- eted search in library discovery interfaces is the scale and complexity of library metadata. Several structural features of library data make it chal- lenging to implement easily understood facets that produce precise, accu- rate results. These include what Tunkelang describes as “multiple entity types,”28 such as aggregates or compilations and the multiple levels described by the Functional Requirements for Bibliographic Records (FRBR)29 and the IFLA Library Reference Model (LRM),30 which are popularly known as the WEMI (work, expression, manifestation, item) stack. WEMI leads to superficially similar sets of facet values with different meanings, which are either conflated in a single facet list or produce a longer list of similar facets that must have easily distinguished and interpretable labels. Multiple entity types may also cause unexpected and undesirable results when certain combinations of facets are selected. Multiple entity types Tunkelang says that faceted search interfaces are normally based on sets of records for a single type of entity, such as books, which are associated with facets. If a second type of entity, such as authors, is introduced and multiple instances of the new entity (authors) can be associated with a single instance of the original entity (books), it can become problematic. The difficulty arises if one wants to provide access to facets associated with the second entity, such as the nationality or institutional affiliation of the authors.31 Unless one has a way to make sure that the facets related to authors are forced to apply to the same author, faceting for Canadian authors associated with the University of Oregon may also bring back results where the facet values apply to different authors. For example, the result set might include a paper coauthored by an American working at the University of Oregon and a Canadian working at the University of Washington. It is much harder and more complex to design a search interface that provides effective faceting for multiple entity types. This is further complicated by the fact that some attributes of creators of bib- liographic resources are not static and are associated only with a subset of that creator’s works. For example, if a user is looking for symphonies by children, they do not want all the Mozart symphonies, only those that he wrote during his childhood. 452 K. MCGRATH Aggregates Aggregates, such as compilations or anthologies, suffer from a version of the multiple entity type problem. The resource as a whole has a certain set of attributes, such as an editor and date of publication, which can be used as facets. The individual works included in the anthology have a different set of attributes, such as author and date of creation, which can also potentially be used as facets. Because each anthology contains multiple individual works and those works have multiple characteristics, users may get misleading results. The Music Library Association’s Music Discovery Requirements gives the example of a user searching for Beethoven sym- phonies and getting a CD that contains a Beethoven overture and a Mozart symphony, but no Beethoven symphony.32 Aggregates are not an easy problem to solve in our current flat MARC record environment and the current push for indefinite roundtripping between MARC and BIBFRAME prevents any near-term solution. A subfield for “materials specified” ($3) has been introduced for many MARC fields. This allows catalogers to include a free text label for metadata that applies to only part of the resource. It may be helpful for humans looking at a catalog record, but it is not suitable for machine manipulation. The values are not standardized and are prone to typo- graphical errors. The MARC 21 format does include a subfield that is intended to support this type of linking, the “field link and sequence number” ($8). Over two decades ago, McBride advocated for the use of linking subfields to improve access to music,33 but no systems that make input easy or that use these linking fields for discovery have been developed since then. Even if tools and models are developed that allow catalogers to accurately associate metadata with the appropriate works within a compilation and also enable discovery systems to present this data usefully, the immense corpus of existing records remains an obstacle to accurate retrieval as any automated remediation would be extremely challenging and error prone. McGrath has referred to this as the “Humpty Dumpty problem.”34 All the pieces may be present in the record, but it is hard to imagine how they can all be linked up again without manual human intervention. WEMI (work, expression, manifestation, item) There is yet another multiple entity lurking within bibliographic metadata. The IFLA Library Reference Model,35 building on the Functional Requirements for Bibliographic Records or FRBR36 describes four entities related to bibliographic resources: work, expression, manifestation, and item (WEMI). These are defined as follows. CATAloGinG & ClAssiFiCATion QuARTeRly 453 • Work: “The intellectual or artistic content of a distinct creation.” • Expression: “A distinct combination of signs conveying intellectual or artistic content.” • Manifestation: “A set of all carriers that are assumed to share the same characteristics as to intellectual or artistic content and aspects of phys- ical form. That set is defined by both the overall content and the pro- duction plan for its carrier or carriers.” • Item: “An object or objects carrying signs intended to convey intellec- tual or artistic content.”37 All of these entities have multiple attributes which end users may be interested in and which could be presented as facets in a library discovery interface. For example, each entity has a date of creation, as well as one or more creators. As mentioned above, those creators also have relevant attributes. While troubleshooting some facets, the author once encountered a book that came up under a search for poetry with the creator demo- graphic facet limited to African Americans and the original language facet limited to old French. This seems an unlikely combination, but the book contained a translation authored by an African American (an expression) of a text originally in old French (the work). This is further complicated by the fact that not all bibliographic models use all of the entities separately as described above and some introduce additional entities. Resource Description & Access (RDA) uses all of the LRM WEMI entities.38 BIBFRAME, the proposed successor to MARC, posits somewhat different entities: Hub (which has no exact WEMI equiv- alent), Work (which includes both the LRM work and expression entities), Instance (or LRM manifestation), and Item.39 Share-VDE’s BIBFRAME implementation replaces Hub with a different entity called Opus.40 Other variations are conceivable, such as a combination of entities for Work plus Representative expression, Expression plus Manifestation and, where needed, a separate Expression entity for bundled expression attributes, as used in the prototype moving image search interface described in a 2011 Joint Conference on Digital Libraries presentation.41 In the MARC bib- liographic records used by most current systems, all of these types of information are stored in flat records. Reconciling all these different per- spectives is difficult. Coyle notes that the disjoint nature of the LRM WEMI entities “makes it difficult to combine metadata using different entity definitions, such as the difference between BIBFRAME’s work-in- stance-item and RDA’s work-expression-manifestation-item.”42 She advocates for “a minimally constrained set of classes and relationships that could form the basis for a useful model of created works” to help mitigate this clash in worldviews.43 454 K. MCGRATH Dates: a case study in complexity Dates are a good example of the challenges of presenting facets for com- plex information in a useful, comprehensible way. Dates may be related to all three of the categories of multiple entity types described above. In addition to dates associated with the whole resource described by the record, there are multiple dates potentially associated with the parts of the resource, with the WEMI entities related to the resource and its parts and with the agents related to the resource and its parts. It can be chal- lenging to identify accurate, consistent date information. Some date infor- mation is impossible for catalogers to know. Some dates are impossible or impractical to ascertain with a level of specificity that aligns with the majority of library metadata. It is difficult to encode the complexities of dates in machine interpretable form and it is also difficult to unpack that data again to interpret it for discovery interfaces. The gradual, undirected evolution of the MARC 21 format in response to the needs of the moment has created additional difficulties for interpreting date data in MARC records. Finally, developing a user interface design that presents all of the date information associated with bibliographic resources in a way that is easy to understand and use poses its own challenges. First of all, there are many types of dates that can potentially be asso- ciated with bibliographic resources, many of which involve multiple entity types. For example, different dates may be associated with the different levels of the WEMI stack. The most commonly used date facet in library catalogs is the date of publication or creation of the manifestation. Manifestation date is also the date that is most reliably encoded in machine-processable form in existing bibliographic records. A common use case for faceting on manifestation date would be to find the most recently published materials on some topic. However, for many date-related use cases, the date of the work more accurately embodies what users are seeking. A reprint of an older book may have a recent publication date, but will not meet a user’s need for up-to-date information. Most users are interested in the date of release of a movie not of the DVD. Users may also be interested in the date of an expression. For example, a user might want a recent annotated edition of Shakespeare or be interested in early recordings of performances of Verdi operas. Expression dates are further complicated by the potential for multiple layers. An annotated edition of an English translation of Homer’s Iliad is potentially associated with the date of the translation and the date of the annotated edition, both of which may be different from each other and earlier than the date of publication of the manifestation owned by the library. A recording of a performance of an arrangement for orchestra of a piano piece is asso- ciated with both a date of arrangement and a date of performance. There CATAloGinG & ClAssiFiCATion QuARTeRly 455 may also be cases where users are interested in the date of production of a particular item that belongs to a particular manifestation. In both MARC bibliographic and authority records, the date of the work or expression may be encoded in machine-actionable form in the 046 (special coded dates) field. These dates may also be encoded in textual form in the 388 (time period of creation) field. The 046 field is an excel- lent example of a jerry-rigged, Rube Goldbergesque MARC field. It has been modified and expanded over time to deal with various use cases and to accommodate an increasing variety of dates in increasingly nuanced ways. The history of the 046 field and attempts to use it to record dates for works and expressions is discussed in more detail in the appendix. The net result of this agglomerative approach to developing metadata schemas is that the 046 field does not always clearly answer the question: what are the creation date or dates of the work or works contained in this book or other resource? It is not structured to answer this question and this situation is further complicated because this data has been recorded according to different standards at different times. Therefore, it is difficult to create a facet with values that are clearly defined and mean the same thing. If users are most interested in the dates of creation of the works within resources rather than the dates of aggregation or com- pilation, existing MARC metadata does not easily meet that user need due to a combination of inconsistent metadata and the need for complex logic to isolate the correct date or dates. The challenges of using dates from the 046 field are compounded by the use of the Extended Date/Time Format (EDTF), a complex standard capable of conveying detailed information about approximate dates, date ranges, and degrees of certainty.44 This data can be difficult to translate into a clear form for display and even more difficult to integrate into a list of facet values for dates. Beyond these structural and technical problems that are theoretically resolvable, there are more intractable problems surrounding metadata quality and type. An ideal date facet would have complete coverage with accurate exact dates. These dates should be determined in a consistent manner, such that different catalogers will independently arrive at the same value. The dates should also be at the same level of specificity. In library discovery interfaces, dates in facets are most commonly specified at the level of the year. Facets work best when precise values can be known and supplied for all the resources being described. This is much easier to achieve in most retail applications than in library catalogs. For example, there are many cases when a precise date is not known or is impractical to determine. Particularly for older titles, this may be because the information has been lost. For certain types of works, such as works that began as oral literature, 456 K. MCGRATH it is not clear that an exact date of creation even makes sense. For some works, it is only possible to determine an approximate date. For short time spans, such as a play written in 1667 or 1668, it is reasonable just to facet on both dates. Longer date spans could potentially be handled by creating a value in the date facet for each year of the time span. However, this creates usability and technical challenges. It is more difficult to see how to effectively incorporate such assertions as “written in or before 1984,” which are also permitted in the 046 field. It is also not clear whether and under what circumstances a value of “unknown” might be usefully included in a date facet. Even when research could determine specific dates, the time and effort involved may be prohibitive. This is particularly true for aggregates, such as The Norton Anthology of Poetry, which may contain a very large number of aggregated works. In order to provide accurate information, the cata- loger would have to determine and record the dates of creation of all the individual works within the anthology. Unfortunately, this information may be difficult to ascertain and time-consuming to record. Alternatives to recording precise dates include recording a date range or a named time period. For some resources, particularly literary anthologies, the creation dates of the aggregated works may be referred to only by a named time span, such as the Renaissance or the Middle Ages. The 2014 MARC pro- posal for the creation of field 388 (Time Period of Creation) listed five situations where it could be helpful to record a named time span. • Historical or cultural periods are often difficult to date exactly, and specific dates may differ from location to location (e.g., the Renaissance has no exact beginning and ending dates and began earlier in Italy than in other parts of Europe). • There is overlap in time periods (e.g., the dates of the late Middle Ages, the Renaissance, and the early modern period overlap and more information is needed for context). • The significance of date spans differs from place to place or culture to culture (e.g., Middle Ages in Europe, Song dynasty in China). • It is not possible to precisely date the creation of some works (e.g., Beowulf, the Iliad). • Editors and publishers of aggregate works are often intentionally vague (e.g., it might be difficult and time-consuming to identify exact dates of creation for an anthology of fiction written during the period of World War I broadly defined to include the buildup to the war and postwar reconstruction).45 The difficulty with recording only a named time span is that it is not easy to see how to integrate those periods into the same facet as numerical CATAloGinG & ClAssiFiCATion QuARTeRly 457 dates. Alternatively, creating two facets is likely to be confusing to users and both facets will suffer from a lack of recall. Catalogers could also choose to record a numerical date range that is approximately associated with the named time span or the dates covered by an aggregate. This approach comes with its own drawbacks and poten- tial to produce misleading results. The fundamental problem is that for a date facet that is populated by years, there is no optimal way to map date ranges where not all dates in the range correspond to work creation dates. An anthology of 1950s science fiction stories that includes stories from every year in the decade, could accurately be mapped to each year of the decade (e.g., 1950, 1951, etc.). However, if an anthology that says it is a collection of twentieth century science fiction short stories does not include any stories from the 1910s, providing facet values for every year or even every decade of the twentieth century would not produce accurate results. A book that calls itself an anthology of Elizabethan drama might be given the date range of 1500–1600 based on the Library of Congress subject heading “English drama $y Early modern and Elizabethan, 1500–1600.” However, if the contents were only from the late sixteenth century, a user selecting 1400–1550 would not actually want this book. Depending on the system-supplied tools, it may also be difficult or impossible to map all the dates in a date range to individual years. For example, Ex Libris’s Primo has the ability to perform simple transforma- tions of metadata values, including the use of regular expressions. This has enabled the Orbis Cascade Alliance to create an original date facet populated by date ranges such as decades or centuries (Figure 7). However, Primo’s tools do not reproduce the capabilities of a true programming language with features such as mathematical functions and loops. This makes it difficult or impossible to map date ranges to separate values for all the individual years within a range. In addition to lack of information or imprecise information, there may also be multiple ways of defining and calculating a particular date, such as the date of a work. For a film, the date of production, date of original release, and copyright date could all potentially be used as the date associated with a work. There may be variation in what date is recorded based on what information exists, what information is available to the cataloger, and what sources are preferred. In film reference sources, it is not uncommon to see a one-year discrepancy in the date associated with a title, which is presumably based on the method used to calculate the date. There are less common situations where the defi- nition used can have a significant impact. For example, Eisenstein’s film Ivan the Terrible, Part II was completed in 1946, but not released until 1958.46 458 K. MCGRATH Figure 7. orbis Cascade alliance original date facet featuring predefined date ranges. Figure 8. example of Primo tool for manually entering date of publication ranges. Finally, there are user interface considerations. There are two main ways that dates can be presented to users. Users may be asked to select the start and end date of a date range. This can be implemented either by having users manually type in their desired beginning and end dates (Figure 8) or by providing some sort of slider or other widget for selecting a customized date range (Figure 9). Alternatively, dates or date ranges may be presented as links for predefined ranges (Figures 7 and 10). The amount of granularity supported by a facet depends on the range of values in the data. Library metadata generally specifies dates at the level of years or sometimes in terms of a range of years. Because a long list of years would be unwieldy, facets are usually presented as ranges of years. Some interfaces combine these approaches. For example, WorldCat provides predefined date ranges for recent time spans, such as the last five years. For users for whom this is not sufficient, a cus- tomizable date limiter where years can be manually input is provided (Figure 10). CATAloGinG & ClAssiFiCATion QuARTeRly 459 Figure 9. example of date slider for date of publication ranges from stanford university libraries’ Blacklight interface. Heterogeneity of library metadata Taylor’s well-known definition states that facets are “clearly defined, mutu- ally exclusive, and collectively exhaustive aspects, properties, or character- istics of a class or specific subject.”47 Facets are most effective when populated by values that share these characteristics. This requires quality metadata with the attributes defined by Park: completeness, accuracy, and consistency.48 Despite the best efforts of metadata professionals, several forces work against achieving the necessary level of data quality to support accurate facets. There are, of course, fiscal constraints that limit the amount of money that is available to hire catalogers or others to spend the time to add or perform quality control on metadata to populate facets. Due to these resource constraints, completeness will always be aspirational in any large database. Clearly defined facet definitions improve accuracy of assign- ment and increase ease of user understanding of the terms used. Consistency, in particular, is also hobbled by the wide variety of approaches to recording library metadata and the sources of that data. This diversity is found both within standard MARC bibliographic records handcrafted according to cataloging rules and with metadata generated by other processes, such as vendor records and metadata from institu- tional repositories and digital asset management systems. Although 460 K. MCGRATH Figure 10. example of combination of named date ranges and manual input for date of pub- lication from WorldCat. MARC bibliographic records created according to cataloging standards are more consistent with each other than records from other sources, there are nevertheless many variations. Cataloging rules and practices have changed over time and cataloger judgment varies. Although most MARC bibliographic records are created according to RDA/AACR, there are also records created following numerous other standards. For many fields in MARC records, there are also numerous potential controlled vocabulary sources. This leads to duplicative variants and the presence of near synonyms in lists of facet values presented to users. Figure 11 shows some of the subject facets retrieved with a search for Tolstoy. Tolstoy’s name appears in two forms, Tolstoy and Tolstoj. The list includes both “Literature” and “Literatur,” as well as both “Fiction” and “Nouvelles.” It also includes some headings that lack context, such as “1900 1990” from FAST and “18 53 Russian literature,” a term from the Dutch Basic Classification.49 The top four subject facet values that result from a search for the keywords “climate change” scoped to consortial records in the University of Oregon’s Primo include “Climatic changes” from LCSH, “Climate change” from MeSH and “Climat--Changements” from Répertoire de vedettes-matière. These inconsistencies only multiply when library CATAloGinG & ClAssiFiCATion QuARTeRly 461 Figure 11. Heterogeneous list of topical subject facet values resulting from a search for tolstoy in Primo. discovery interfaces incorporate data from non-MARC sources. Expanding the search for “climate change” to include external metadata from Ex Libris’s Central Discovery Index, mostly from articles, causes the subject facet list to shift to include many broad categories that are not present as subjects in typical MARC records, such as the new top match of “Science & technology.” 462 K. MCGRATH The usability of the Primo topical subject facet, as shown in Figure 11, is impaired because it mixes terms from different vocabularies with dif- ferent structures, as well as uncontrolled keywords. These vocabularies are in multiple languages and may have different meanings or be at different levels of specificity. This lack of consistency leads to lists of terms within a facet that are not mutually exclusive and perhaps not collectively exhaus- tive. Lack of consistency reduces both precision and recall and makes the facet list more confusing and less effective for users. Overlapping and duplicative headings waste prime space at the top of the heading list and require users to select multiple terms if they want comprehensive results. They also reduce variety in the top results and prevent users from easily noticing other aspects or terms that would be helpful. This problem is exacerbated in systems like Ex Libris’s Primo that only display a limited number of facet values, commonly twenty. Keeping only a preferred vocabulary and suppressing the others reduces redundancy and increases variety, although it does not solve the recall problem when not all records contain terms from the selected vocabulary. Populating a single facet with terms from multiple vocabularies can also conceal differences in the meaning of the same term, since terms from different vocabularies may have different definitions. For example, a genre facet could include both the term “Drama” from LCGFT, which is limited to stage drama, as well as “Drama” from FAST, which is used both for stage drama and for film, television, and radio dramas. This conflation means that users are unable to isolate stage drama because it is combined with other forms of drama. Terms may also be used in different ways. The Library of Congress subject heading “children” covers birth through 12 years of age, but the superficially equivalent MeSH term “child” covers only ages six through twelve. The differing structure of the two vocabu- laries also reduces precision. LCSH is precoordinated and “children” by itself is only used for works that are narrowly focused on children as a topic. MeSH is a post-coordinated vocabulary. For the types of resources where MeSH terms are assigned, children as a concept rarely or never occurs as the main topic of a resource. Rather “child” is used to mark that a resource discusses some disease, treatment, or other topic in rela- tionship to children. This problem can be mitigated, at the cost of a potential loss of recall, by populating facets with values from only a single controlled vocabulary. In some contexts, libraries may use different vocabularies for different purposes, such as supplementing a general-purpose subject vocabulary like LCSH with additional vocabularies that provide more granular access to certain topics or that support diversity, equity, and inclusion goals. In this case, some sort of mapping or other process to coordinate the vocabularies, so that the union of the vocabularies produces a coherent list of facet CATAloGinG & ClAssiFiCATion QuARTeRly 463 values that are not redundant and do not conflict, is desirable. Metadata remediation to ensure the presence of standardized terms can also improve the situation. For example, Ma describes a project to clean up variations in facet values in a digital collection of oral histories.50 Heterogeneity of library resources Despite the popularity of collections of tools, board games, or other objects in some libraries, the stock held by libraries is not as diverse as that of a giant ecommerce site like Amazon. Nevertheless, there are many cate- gories of resources held by libraries that could benefit from specialized subsets of facets. In addition to a wide variety of textual resources, most libraries also hold at least some other types of materials, including scores and musical recordings, films and videos, maps, games and objects, images and pictures, or computer software. Textual resources also vary and include not just mainstream published monographs, but journals, articles, and primary sources. Many genres, such as literature, biographies, and dance, also have specialized characteristics. On many ecommerce websites, this problem is addressed by mapping the items in inventory to categories. When a user searches, their search terms are matched to one or more of these categories. In some cases, the user is able to select a category. For example, a search on Amazon for “Beethoven” brings up facets for departments, such as music and books. Selecting the book department brings up categories appropriate to books, including subcategories like biography and history, formats such as paper- back and board books and language. Selecting music brings up some similar categories, including subcategories and formats, but also some different categories like edition and an option to exclude explicit lyrics. There are many cases where a single search in library catalogs across formats and genres is optimal, but it would be beneficial to be able to also offer users with more specialized needs the ability to hone in on the characteristics of the resources that they are seeking. Many of the cate- gories of library resources discussed above could benefit from focused lists of facets. In addition to developing user interfaces with this capability, it is also necessary to be able to identify the resources that fall into a given category and to ensure that a sufficient percentage of the records for those resources contains adequate metadata to support those facets. For many types of resources, this is challenging. For example, it is not easy using existing MARC metadata to definitively identify records that describe resources that consist of or contain literary works. Much of the metadata that would be useful as facets, such as genre, date of work, original language, and creator demographic characteristics, has not tradi- tionally been recorded and is not consistently added to new records. 464 K. MCGRATH Retrospective work is challenging due to the large number of records and inconsistent practices. Other potential categories are easier to identify. Records for some types of resources are more amenable to retrospective data remediation as past and current cataloging practice has been to more consistently record information as structured data that can be used for facets. Perhaps the most promising category of this sort is music. Correctly coded records for scores and audio recordings can be identified by the record type in the leader of the MARC record. Video recordings of musical performances are more challenging, but many of them can probably be identified from subject or genre headings and marked with an additional RDA content type of “performed music.” Incorporating actual musical instruments or resources about music would be more challenging. Below I will discuss some issues with providing faceted access focused only on musical resources themselves. Music: a case study Musical resources are complex and have many unique characteristics. There is also a long history of detailed and relatively consistent cataloging by metadata professionals with deep domain knowledge. Many of the search needs of users of musical resources cannot be effectively met with keyword searching in contemporary discovery interfaces. Music specific facets have the potential to make searching for music resources simultaneously more powerful and easier. Medium of performance is a prime example of an unmet information need. Musicians and music researchers often wish to search by instrumen- tation or voices in a piece of music. Historically, this has been difficult to do in library catalogs. With the introduction of the Library of Congress Medium of Performance Terms (LCMPT) vocabulary and the implemen- tation of the 382 (Medium of Performance) field in MARC 21, it has become more common to record this information in a structured manner in library metadata. Catalogers find it easier to enter than the previously available coded fields and the Music Library Association in conjunction with Gary L. Strawn has developed a widely used macro that aids in accurately entering this data into new and existing records.51 More com- pletely populated data has created incentives for discovery interfaces to provide access to this information. For example, the Orbis Cascade Alliance has created several facets based on LCMPT in the 382 field of the MARC record.52 There are three main challenges for using medium of performance from the 382 field in discovery interfaces. One is the complexity of the structure of the field and the need in many cases to manipulate the data before it can be presented to users. The second is the complexities associated with CATAloGinG & ClAssiFiCATion QuARTeRly 465 certain types of medium of performance information that cannot be accu- rately encoded in the 382 field. The third is the problem of false drops associated with aggregates or compilations. The Orbis Cascade Alliance has created three facets based on medium of performance information from 382. Creating the values to populate these facets ranges from straightforward pulling of the data from MARC to presenting facet values that are significantly transformed from the data in the underlying MARC record. The medium of performance facet is populated with the individual names of each instrument given in the 382 field (Figure 12). The data is not manipulated for faceting other than mapping repeated subfields that occur in a single instance of 382 inde- pendently and double posting instruments and voices that are identified as solos both under the plain instrument name (“piano”) and under the instrument name qualified by the word solo (“piano (solo)”). The number of performers facet is similar (Figure 13). The data is only slightly manip- ulated for display. The word “performer” or “performers” is added after the bare number given in the MARC data and any number of ensembles is mapped to the generic term “ensemble(s).” The final facet is called medium of performance statement and attempts to represent the complete instrumentation for a piece (Figure 14). This facet requires the most manipulation, which necessitates a system that is capable of transforming data before use and is subject to the constraints of such systems. Alternative and doubled instrumentation is dropped from the medium of performance statement facet because it cannot be presented usefully with the tools available. To promote readability, the number of performers is dropped when it equals one. This also improves consistency by compensating for cases where the cataloger has omitted the “1.” This approach does, how- ever, also conflate single performers with instances where the number of performers is not specified or varies and thus is not recorded. Higher numbers of performers are surrounded with parentheses and each instru- ment is separated with a semicolon. Some inconsistencies in the under- lying data impact performance of this facet. In particular, there is no prescribed order for listing instruments in 382. In Figure 14, “violin; cello; piano” and “piano; violin; cello” are listed separately because they have been entered differently in the MARC record. Compensating for this would require more tools than Primo provides and ideally would be done by making the original metadata more consistent. The medium of performance statement facet also suffers from an occasional mismatch between the atomized terms used in 382 and a collective term in common use. For example, users may be seeking a piece for “string quartet,” while the facet used is “violin (2); viola; cello.” Although the 382 field for medium of performance is complex and includes numerous subfields that enable catalogers to record nuances 466 K. MCGRATH Figure 12. the orbis Cascade alliance music medium of performance facet. Figure 13. the orbis Cascade alliance music number of performers facet. of the instrumentation and voices used, it nevertheless falls short of being able to accommodate all situations accurately and with sufficient granularity. One example of this is the number of performers, pianos, and hands involved in the performance of certain piano music. Another situation that presents challenges is the way that a single per- cussion player may use multiple instruments within a single piece. This can be recorded as either a single percussion player or a CATAloGinG & ClAssiFiCATion QuARTeRly 467 Figure 14. the orbis Cascade alliance music medium of performance statement facet. if the instruments are listed in different orders in the metadata, they do not collocate, as seen in the two entries for violin, cello and piano. single performer doubling on multiple specific percussion instruments. The 382 field can be repeated to bring out both aspects, but this is not always done, which leads to data inconsistency. The need to repeat the field also creates extra work for catalogers. Problems with and potential solutions for more complex medium of 468 K. MCGRATH Figure 15. selecting facets for violin and piano may find compilations with a piece for piano and a separate piece with a violin part. performance situations are discussed in Szeto (2022), Lee and Robinson (2018), and Lee (2017).53 The potential for misleading combinations of facets related to different components of aggregates is particularly common when dealing with musi- cal recordings. Most musical recordings include more than one piece. Each individual piece is commonly described at the level of the piece rather than with a single, broader term that is meant to encompass the shared characteristics of the whole, as is commonly done with literature. For example, a user who searches for violin and piano and then limits with those facets, may still retrieve resources that include some pieces for piano and some different pieces for violin with no overlap (Figure 15). There are many other types of data unique to music that could be used to create helpful facets. The Orbis Cascade Alliance has created facets for musical key, music number, composer, and performer. The composer and performer facets consist of names from the MARC 1XX and 7XX fields that are associated with relator codes or relator terms for those roles.54 The composer facet includes additional logic where any 1XX on a score record or any 1XX on a record for a musical recording with a uniform title in a 240 field is mapped to the composer facet. Names in 7XX name-title fields with second indicator 2 for component part or a rela- tionship in $i indicating that the name and title represent a work within the resource are also mapped to the composer facet on records in the music format. The Music Library Association issued a report on music discovery requirements that makes extensive recommendations on how facets can profitably be used to improve access to music materials, which influenced some of the work of the Orbis Cascade Alliance.55 Assorted metadata-related issues Hierarchy, nesting, specificity Tunkelang points out that hierarchical or nested facets can be an effective way to present facets with long lists of values to users. However, if each top-level facet value has a large number of values immediately under it or, alternatively, the facet hierarchy is many layers deep, this approach potentially introduces usability issues because of increased information load and the complexity of navigating a hierarchical tree.56 In addition, the vocabulary used to populate the facet must have a hierarchical CATAloGinG & ClAssiFiCATion QuARTeRly 469 Figure 16. nested facets for resource types in WorldCat. structure and the library discovery interface must be able to interpret and present this structure in a useful way. Most library catalogs make little or no use of hierarchical or nested facets. One example of nested facets is WorldCat’s format facet (Figure 16). McGrath notes that being able to navigate to different layers of a hier- archy in facets gives users the power to adjust their searches depending on their needs and the number of resources in a given category. She gives the example of searches for “everything about communicable diseases (broad) in Kenya (narrow) or AIDS (narrow) in Africa (broad),” which demonstrates the need to support differing levels of specificity based on users’ require- ments.57 Supporting multiple levels of specificity also helps users perform more effective searches depending on the number of resources available. The combination of facets for nineteenth century works originally in English will be more effective if it can be combined with specific genres of poetry rather than just the broad category of poetry. However, if the user has additionally selected works by children or if they are looking for nineteenth century poetry originally in Polish, they may be better served by a facet value for poetry that combines all types of poetry, including subgenres. Flexibility for users in navigating hierarchies can either be provided by discovery interfaces that leverage the syndetic structure of controlled vocab- ularies or by selectively assigning applicable terms from different levels of the hierarchy. For example, if a user is seeking nineteenth century novels written by Americans and set in Mexico, but only the more specific creator demographic group terms such as New Yorkers and Illinoisans are present in the bibliographic record, without system support, the user must manually check for authors from every state and city in the U.S. In the Orbis Cascade Alliance’s former Primo Back Office discovery interface, they compensated for the lack of support for nested facets in Primo in their Medical Subject Headings (MeSH) facet through double posting. Base headings with no subdivisions (e.g., “Kidney Diseases”) were 470 K. MCGRATH posted under both “Kidney Diseases” and “Kidney Diseases (general).” Base headings with subdivisions (e.g., “Kidney Diseases $x therapy”) were posted under both the full string “Kidney Diseases--therapy” and the base heading “Kidney Diseases.” This resulted in a list like the one shown below. The display would have been improved if all of the headings with the base heading of kidney diseases could have been nested under plain “Kidney Diseases” with the heading qualified by general coming first in the list when the hierarchy is expanded. • Kidney Diseases (2) • Kidney Diseases (general) (1) • Kidney Diseases--therapy (1) Many potentially useful facets in library discovery interfaces would benefit from hierarchical or nested navigation, including formats, topics, genres, geographic areas, time periods, and languages. LCSH is particularly challenging in this regard due to its large size; high number of top-level terms; and idiosyncratic, incomplete syndetic structure. LCSH was not developed in accordance with modern guidelines for developing thesauri and much of its syndetic structure was retrofitted. As Svenonius points out, the Library of Congress took the “quick-fix” approach and used an automated process to convert all of its historically inconsistent see and see also references to broader and narrower terms in one fell swoop, with only “a few ‘Band-Aid’ reparations … to fix some of the more egregious structural deficiencies.”58 Lead-in terms and cross-references Facets make it easier for users to take advantage of controlled vocabularies in some ways. For example, facets enable users to recognize relevant terms without having to come up with them themselves. Users may not neces- sarily know in advance what terms might be used in the library catalog to describe the resources they are seeking. When facets are populated with terms from a consistent controlled vocabulary, they guide users toward search terms that are most likely to lead to their success. As Buckland notes, it is “easier for a person to recognize pertinent terms than to predict them.”59 However, unless they are performing a search in a system that supports browse without search, users may not retrieve all relevant results if their initial search terms do not include the standardized term in the relevant facet. For instance, a user might search for “World War I” and then select “World War, 1914–1918” from an LCSH-based facet. Without removing the initial keyword search, the user will not retrieve everything with the subject heading “World War, 1914–1918,” since all of those records will not include the string “World War I.” This type of situation can be CATAloGinG & ClAssiFiCATion QuARTeRly 471 mitigated by query expansion where synonyms or terms based on LCSH cross-references are included in the initial result set. There may also be a disconnect between phrases entered by users (e.g., “African-American lesbian poets) and the atomistic terms used in a faceted vocabulary (e.g., “African Americans” plus “Lesbians” plus “Poets”). Query expansion based on stemming can help with many of these situations. However, sometimes the mismatch is less straightforward. Users may seek musical works for “string quartets,” but LCMPT describes this by listing the instrumentation separately (“violin” plus “viola” plus “cello” or, alternatively as something like “violin (2); viola; cello”). It is theoretically possible to design a search interface that will compensate for known instances of these mismatches, but it is more complex. Coverage, recall, and retrospective enhancement of bibliographic metadata As described above, for good coverage in facets, comprehensive structured metadata describing attributes of interest is necessary. Because the MARC format was created in order to print catalog cards in the late 1960s, it originally featured only a small amount of structured metadata mostly in the form of single characters that stand for values rather than in a form that is comprehensible by humans without a key. Some other data is in the form of structured strings that are prone to typos and errors. Over time MARC 21 has shifted away from its original focus on printing and emulating catalog cards and more and more structured elements have been added. Many factors have led to the addition of more fields and subfields to the MARC 21 format that are intended to be machine actionable. These include an increased understanding of the value of machine-actionable metadata, the development of newer and more complex cataloging stan- dards, the creation of a number of new faceted Library of Congress vocab- ularies and the demands of various user communities. This means that there are many more attributes related to bibliographic resources recorded in MARC records that can potentially be used to populate facets. However, current library discovery interfaces do not make optimal use of them. Presenting users with a large number of choices without overwhelming them is, of course, one challenge. Another frequent reason given for not providing access to these potential facets in discovery interfaces is concern that lack of coverage will mislead users. No large-scale information retrieval system will have perfect recall, but in many catalogs legacy records that do not include data in newer fields and subfields far outnumber newer records that include this metadata. Tunkelang lists two situations where recall is critical for user success.60 One is when searches would otherwise return few or no results and sparsely-populated facets cause users to believe that the library does not provide access to relevant resources. He notes 472 K. MCGRATH that it is also important when “searchers care about aggregate information about the results, such as the total number of results or the distribution of attributes of those results.” The only practical remedy for this situation is automated or semi-au- tomated methods for adding this data to existing records. As discussed previously, there has been progress in this area for records describing scores and musical recordings through a macro-based approach.61 However, many of the other enhancements that would be desirable will be much more challenging. The music formats have two significant advantages as targets for this kind of metadata enhancement. First of all, the target set of records can be easily identified. The macro works only on scores and musical sound recordings. These can be reliably identified by the record type in the MARC leader, which is required and is rarely incorrect except for some older records for electronic resources or for scores put on book records. Some spoken recordings and recordings of sounds may be incor- rectly coded as musical sound recordings, but the vast majority of records coded as scores and musical sound recordings describe resources that actually fall into those categories. The second factor that makes data remediation for music records a more tractable problem is that music catalogers have traditionally recorded more information in a more con- sistent fashion, so in most cases the data that the macro needs to generate the new fields is present in the record. Much of the data is in the form and of a type that can be accurately parsed and transformed without human intervention. The new fields generated may be incorrect or incom- plete, but they will reflect the existing metadata and be no less accurate. Records may also contain conflicting metadata in different fields, such as genre and medium of performance in Library of Congress subject headings vs. what’s found in 047 (Form of Musical Composition Code) and 048 (Number of Musical Instruments or Voices Code) coded data. An auto- mated process has no way to resolve these discrepancies unless it were sophisticated enough to identify and access external authoritative data. The American Library Association’s (ALA) Subject Analysis Committee’s (SAC) Subcommittee on Faceted Vocabularies (SSFV) has begun working on the problem of retrospectively enhancing bibliographic records with data to support faceting, such as genre/form terms from LCGFT. They have written a paper providing an overview of the issues and listing many types of information in bibliographic records that would benefit from metadata remediation.62 SSFV has developed provisional mappings from LCSH form/genre subdivisions to LCGFT and from selected fixed fields to both LCGFT and LCDGT.63 The subcommittee has begun work on mapping LCSH for literature to LCGFT. This is more challenging for several reasons. It is harder to reliably identify records that consist of or contain literary works. Records CATAloGinG & ClAssiFiCATion QuARTeRly 473 for literature, especially older records, are also less likely to contain genre or form information in any form than records for music, which means that there is no data for an automated process to work with. The Subject Headings Manual tells catalogers not to record genre/form terms in MARC field 650 (topical subject headings) for individual works of fiction and literature, so if LCSH is correctly applied, these terms are only recorded for anthologies.64 Even when a record contains an LCSH term for genre or form, it may be difficult to reliably determine whether the resource consists of or contains that genre or form or if the resource only contains criticism. For both users and catalogers, the difference can be very subtle. In LCSH, “Symphonies” is used to describe resources that contain symphonies while “Symphony” is used for resources about symphonies. Often the same base term is used with some sort of sub- division appended to indicate that something is or contains criticism, for example “Poetry” for poetry and “Poetry--History and criticism” for works about poetry. This distinction is not always reliably made in practice. In particular, some older records describing critical resources may lack the subdivision while records that describe resources that con- tain both poetry and criticism of poetry do not necessarily include both subject headings. Although this method is more error-prone, data is sometimes available in the record in free text fields, such as notes, and work is being done to find automated methods to map it to structured data. Progress in machine learning and natural language processing in combination with the ability to match entities described in bibliographic records with exter- nal, trusted descriptions of those entities elsewhere also represents a pos- sible approach for enhancing library metadata. Structured metadata and metadata preparation Even when structured fields exist to record attributes of interest and data in these fields is sufficiently populated within the dataset, the data in some fields requires additional processing and transformation before it can be presented to end users. For example, the 008/22 target audience fixed field in the book format contains data like “a,” which must be trans- formed into its meaning of “preschool” before it can be used in a facet. Some fields, such as 382 medium of performance, may require even more transformation before they are suitable for display or faceting. Some vocab- ularies would also benefit from more complex mapping. For example, the Program for Cooperative Cataloging (PCC) has recently issued guidelines for recording ISO 639-3 language codes in MARC records.65 However, few or no library discovery systems are capable of mapping these codes to words out-of-the-box. For an optimal user experience, some of these codes 474 K. MCGRATH should be mapped to more than one facet value. For example, ISO 639-3 includes both a code for Chinese as a collective macro language for all varieties of Chinese and codes for individual varieties, such as Mandarin and Cantonese. In order to have good recall for users who select the facet value for Chinese that facet value must also bring up the records that are coded for Mandarin and Cantonese. This means that effective implementation of facets in library discovery interfaces depends on the capabilities of the software being used in com- bination with deep understanding of the metadata and likely user needs. Local control over the data being displayed, indexed, and faceted in library discovery interfaces varies greatly. With some online catalogs, the library may have no control over display and facets or the library may only be able to choose which fields and subfields to use without being able to manipulate them in any way. Other products, such as Ex Libris’s Primo, support much more powerful manipulation of the metadata by local insti- tutions for use in their discovery interface. Open-source discovery inter- faces, such as Blacklight, with sufficient investment offer even greater flexibility and power. Even in these cases, there are limits that may prevent some desired transformations. An alternative approach would be to alter the underlying metadata in some way, but this may lead to problems with nonstandard metadata. One question that is not asked enough in the library metadata world is what question or questions is this metadata trying to answer? Related to this, does the way that the field for this metadata is defined and struc- tured, as well as the way that the metadata is entered in practice, enable it to answer that question or questions? Evolving needs and historical contingencies sometimes mean that the answer is no. For example, the path of evolution of subfields associated with dates of creation in 046 means that data in some subfields cannot be accurately interpreted without evaluating it in association with what other subfields are present. This makes the field much more difficult for catalogers to understand how to use correctly while also making it more complicated for systems to use. Useful facets require not only consistent, structured metadata, but also that the metadata be structured in such a way that it can either be used in facets in its raw form or be accurately transformed in as straightforward a way as possible. The unknown, the unknowable, the vague, and the inconsistent Many challenges for incorporating facets into library discovery interfaces are technical or could be solved with sufficient time, trained personnel, and funding, but some issues are less tractable. These include missing values, imprecise values, and inconsistent values. CATAloGinG & ClAssiFiCATion QuARTeRly 475 Certainly, no large database will ever be clean enough and complete enough to have perfect recall. Older bibliographic records are often missing data that would be common to find in more recent records. Even in contemporary records, data may be missing because it is impractical to spend the time and effort to identify it. In some cases, no amount of time or effort will uncover the correct value. There are other reasons for incom- plete coverage, such as cases where an appropriate subject heading has not yet been or cannot be established at the time of cataloging. Jahnke gives the example of Eve Sedgwick’s Epistemology of the Closet, which was written by a founder of queer theory, but was published before there was literary warrant or the idea had coalesced into a namable entity.66 With currently available resources and technology, it is impractical at best to later identify all these cases and go back to add the new information. However, fixing sizable gaps that are of clear interest to users, such as reliably identifying fiction, should be a priority. The messiness and fuzzy boundaries of the real world are often at odds with the clearly defined categories required for optimal facets. There has been much recent criticism in the library world of the practice of cate- gorizing gender as binary with sharp boundaries, but the problem of what philosophers call vague predicates pervades the categories used in bib- liographic metadata. As previously discussed, many named time periods have fuzzy boundaries. This is also true of places, classes of persons, languages and most other topics described by library metadata. Based on a project to map statements about responsibility in moving image records to standardized role designations, McGrath discusses additional situations where it is difficult to map information provided by a resource to clearly defined categories. For example, it may be difficult to interpret cases where language use has changed over time, such as the earlier use of the credit “art director” for what is now called “production designer.”67 Inconsistent cataloging practices or lack of inter-indexer consistency potentially has negative effects on facet usefulness. Cataloging practices have changed significantly over time. Newer records are often fuller and contain more structured metadata. Older records, vendor-created records, or records created according to minimal standards may lack useful meta- data. Metadata values reflect the information available at the time, as well as the prejudices and perspectives of the era. Different catalogers bring differing amounts of expertise, time, and inclinations to their work. For example, there is often a conflict between catalogers who take a maximalist approach and prefer to add any potentially relevant value and catalogers who emphasize precision. To take one example, the RDA content type “still image” is added inconsistently to records for books. Some catalogers add “still image” even if there is just one portrait on a frontispiece on the principle that images are present. Others include “still image” only if 476 K. MCGRATH the book contains a significant number of images that are topically rele- vant, such as in a graphic novel or book of photographs. When catalogers do not agree, the metadata, and any facets it generates, will answer neither the question “Does the book contain any images at all?” nor the question “Does the resource contain interesting, useful images related to the topic of the book?” Variation in cataloger judgment and practice is exacerbated by the fact that most values in controlled vocabularies used by libraries are subject to the paradox of the heap68 where there are inevitably situa- tions where there is not consensus about what value should be recorded in the metadata. LCSH and topics: a case study One of the most important and challenging types of information to incor- porate into facets in library discovery interfaces is topical or subject infor- mation. The largest and most widely used controlled vocabulary for subjects in library catalogs is LCSH. A number of characteristics of LCSH make it challenging to present as facet values. These include its origin as a pre-coordinated and nonsystematic vocabulary combined with an inability to easily deconstruct it into more granular facets and its historical use for recording information about both topical and non-topical aspects of a resource. One fundamental challenge is that LCSH was designed to be used in multifaceted, precoordinated strings that attempt to collocate all the sig- nificant aspects of what a resource is about. It was also designed to be browsed in a left-anchored, alphabetical list. Both the LCSH strings as a whole and the individual parts of the strings combine different types of information. Some of this is distinguished in MARC records by subfield coding, such as chronological aspects in $y and geographical aspects in $z and can be easily separated. In other cases, different kinds of infor- mation may be encoded in a single subfield, which makes it harder to separate. For example, “Waterloo, Battle of, Waterloo, Belgium, 1815” is recorded in a single topical 650 $a, but includes chronological and geo- graphical information that is not separately subfielded. Some topical subject headings in LCSH include prepositions that relate more than one term and conflict with the atomistic concepts that are optimal for faceted search. In some cases, compound terms seem to merely present synonyms, near synonyms or opposites (e.g., “Ambushes and surprises,” “Belief and doubt”). In other cases, they combine related terms that are different types of things, which should properly be separated for faceting (e.g., “Boats and boating,” “Collectors and collecting”). Structurally similar headings may also be used to present the relationship between two distinct things (e.g., “Age and sports,” “Artists and architects”). Conversely, parallel meanings CATAloGinG & ClAssiFiCATion QuARTeRly 477 may be represented by different structures (e.g., “Africa $x In motion pictures” vs. “African American cowboys in motion pictures”). This last example actually represents three concepts: African Americans, cowboys, and portrayal in movies. Young notes that a given structural pattern in LCSH does not always have the same meaning, which, in addition to potentially confusing users, impedes mapping to a more faceted presen- tation. She gives the example of “Children’s diaries,” which is used for diaries written by children and “Children’s films,” which is used for films made for children.69 “African Americans in motion pictures” (the portrayal of African Americans in movies) and “African Americans in the motion picture industry” (African Americans working in the movie industry) do not imply the same relationship between the two nouns. These concepts cannot be further combined into one long string, such as “African Americans in the motion picture industry in motion pictures,” but a thor- oughly faceted vocabulary should enable these sorts of novel combinations without requiring that they be precomposed. McGrath points out that “terms, such as ‘Cookery, Japanese’ or ‘Adult children of alcoholics, Writings of,’ that incorporate more than one facet or aspect of a concept reduce the power and flexibility of faceting by preventing users from limiting by the individual aspects separately.”70 This is further complicated by the fact that LCSH has historically been used to encode some non-topical information as well, such as genre and audience. Young writes that LCSH combines the topical, genre/form, creator, audience, and medium of perfor- mance facets in contradictory and sometimes unpredictable ways, and even headings that are similarly formatted may denote quite different facets. Those problems are only exacerbated by the fact that form headings can also usually be used as topics.71 There have been two main attempts to improve the suitability of LCSH for faceting. Both have as goals simplifying metadata creation and making it easier for users to discover library resources.72 One is OCLC’s develop- ment of FAST, which is derived from LCSH, but breaks it down into more post-coordinate categories, such as topical, geographic, chronological, and form/genre aspects. The other is the Library of Congress’s creation of several new vocabularies to accommodate non-topical information currently contained in LCSH. FAST is a largely post-coordinate vocabulary that can be assigned inde- pendently or automatically derived from LCSH strings. FAST takes advan- tage of MARC field and subfield codes to create nine separate facets for topics, personal names, corporate names, meetings, named events, uniform titles, chronological information, geographic areas, and form and genre.73 However, it does not merely perform a naïve mapping of the components 478 K. MCGRATH of LCSH strings to these categories, but rather transforms the data in a number of ways that make it clearer and more amenable for use in faceted search. The FAST Quick Start Guide notes that FAST introduces useful distinctions that are not made in LCSH, such as named events, which are described as “events associated with a particular date, and possibly a par- ticular geographic location, and that are well known by a recognized name,” such as particular battles or earthquakes.74 In some areas, such as its chronological facet, FAST introduces flexibility that is not available in LCSH. Time spans are recorded in FAST using explicitly coded beginning and end years. Chronological information can therefore be coextensive with the coverage of the resource, although in practice, much chronological information is automatically derived from more generic information found in time spans given in LCSH topical headings. Some relationships are more explicitly encoded in FAST. Geographic headings are given in indirect order and include the relationship to the larger place up to the country or state level. Geographic headings are given in a consistent form without abbreviations or inversion. FAST uses “Illinois--Chicago” rather than LCSH’s use of both “Illinois--Chicago” and “Chicago (Ill.).” This makes searching by place names more predictable. It also supports a certain amount of hierarchical access, but does not always include the country name and does not include the continent level. However, FAST authority records for geographical areas do include the MARC geographical area code that does map to higher levels and could be employed to generate a fuller hierarchy. FAST is not a completely post-coordinated vocabulary since it allows combination of terms from the same facet category. In particular, main topics and topical subdivisions continued to be pre-coordinated in FAST. This improves precision in some cases (e.g., “History--Philosophy” vs. “Philosophy--History”) while reducing flexibility in others. For example, the topical subdivision “Economic aspects” that follows topics remains precoordinated while the topical subdivision “Economic conditions” that follows places is given separately. In a more fully faceted system, these two topical subdivisions could probably be profitably combined into a single term. FAST also makes no effort to create separate categories for types of topical headings. For example, one can imagine that a separate facet for classes of persons would better support browsing for biographies or types of characters in literary works. In 2007, the Library of Congress began a project to develop LCGFT, a vocabulary to separately describe the form or genre of resources. Their intent is to remove form and genre from LCSH and record it only using LCGFT in a separate field designated for form and genre information. Form and genre terms remaining in LCSH would only be used for works about those forms and genres. As the Library of Congress worked on this CATAloGinG & ClAssiFiCATion QuARTeRly 479 project, they realized that other non-topical aspects of resources are recorded in LCSH that need to be moved elsewhere in order to limit LCSH and topical fields to topical content. The Library of Congress has since developed vocabularies for musical medium of performance and for demographic group terms that can be used to describe creators and intended audiences. McGrath describes a number of problems that arise when trying to use LCSH in a faceted search interface that are not necessarily resolved by FAST or the new Library of Congress vocabularies. For example, there are situations where aspects of an LCSH string are implicit, so there is no data to populate facets.75 Practices around implicit information in LCSH often have their origin in its roots as a vocabulary designed for left-an- chored, alphabetical browsing where shorter strings may be desirable and lead to less fragmentation. For example, the subject heading “National socialism” is used both for national socialism in general and national socialism in Germany as a whole. The subject heading is only geograph- ically subdivided for works about Nazism in smaller places within Germany (e.g., Berlin) or for works about allied countries, such as Austria. This means that even if the geographic subdivision is presented in a separate facet in a library discovery interface, not all the works about Nazism in Germany will include the term for Germany in this facet, resulting in incomplete recall. “African Americans--United States” is a cross-reference for “African Americans,” so a naïve automated system for splitting LCSH strings into multiple facets based on subfield data will not create a geo- graphic facet value for United States from “African Americans--History.” This means that a user who searches for a historical topic and then limits by United States in the geographic facet will not retrieve a resource based on this subject heading. Some topical subdivisions are also omitted on the basis that the topic is clear from the main heading. For example, the topical subdivision “Law and legislation” is not used under topics such as “Human rights.” In addition to metadata values that are not explicitly recorded because they were deemed not useful in left-anchored, alphabetical browsing, LCSH use in practice reduces recall in some areas. A prime example is a genre facet that includes “Fiction” from the $v genre/form subdivision that fol- lows headings for what a novel or short story is about. These kinds of headings are not found on most older records or on records where the work does not have an easily identifiable topical aspect. Even when the facet value is supplemented with information from the literary form fixed field in MARC 008, many older records or minimally coded records lack the correct fixed field coding. Chronological information can be difficult to extract from LCSH or may describe a less precise time span than the resource covers. McGrath 480 K. MCGRATH describes a number of situations where chronological information is not given explicitly in LCSH strings or where it may not be coextensive with the time period covered by the work.76 FAST is a significant improvement on LCSH here because time periods are always encoded separately as numbers, either as a single year or as the beginning and ending date of a time span. Some chronological information in LCSH is not subfielded separately and is not easily accessible to populate a facet. This situation is improved in FAST, where dates contained within topical headings have been mapped to explicit dates or date ranges, although FAST does not provide access to time spans shorter than a year. For example, “Chile Earthquake, Chile, 2010 (February 27)” maps to 2010. This mapping does not work when the date is not recorded or implied somewhere in an existing LCSH string, but could be recorded proactively at the time of metadata creation. It also remains difficult to map named or imprecise time periods with fuzzy boundaries to specific beginning and ending dates in a way that works for all search queries and resources. Nested, hierarchical facets are a good way to present multiple levels of geographic information. However, it is not always possible to identify the type and level of geographic entity being described in LCSH in an auto- mated way. In some cases, authority records could be used to identify the type of geographic entity and the related broader and narrower places to create a hierarchy. It might also be desirable for a geographic facet to include certain places that are not currently marked as places in LCSH. These are generally names that identify something that can both be a place and act as an agent. The Subject Headings Manual section H405, often colloquially known as “the division of the world,” provides guidance on how to treat these entities.77 Some of these, such as “Buckingham Palace (London, England),” do have broader terms that indicate the type of entity and its location. In this case, the broader term is “Palaces-- England.” Other examples do not have broader terms, but in some cases, such as “Museo Guggenheim Bilbao,” the type of entity and its location are encoded elsewhere in the authority record. Increasing the number of cases where entity types can be explicitly modeled could help improve this situation. As mentioned above, there are places where LCSH uses the exact same construction to mean more than one thing. Some of these have to do with geographic information and often reflect a conflation of nationality and place. For example, the subject heading “Prisoners of war” may be geographically subdivided. Unfortunately, the authority record instructs catalogers to use a place both to designate the current location of prisoners of war and the place of origin of the prisoners. Subject heading strings make no distinction between prisoners of war being held in France and French prisoners of war being held anywhere. CATAloGinG & ClAssiFiCATion QuARTeRly 481 Language, nationality, and ethnicity are also entangled in many literary headings in such a way that it is difficult for users to build a coherent mental model. For example, all of the following are legitimate LCSH strings: “Nigerian drama (English),” “English drama--Irish authors,” “American drama,” and “Hispanic American drama (Spanish).” For literary works themselves, the introduction of LCDGT and an associated MARC field for creator demographic terms promises the ability to clearly identify the nationality and ethnicity of authors. In combination with the expanded definition of MARC 041 $h to allow the recording of the original language of the work regardless of whether or not there is a translation involved, use of LCDGT will enable the disentangling of these concepts. However, it is less clear how to resolve the problem of disentangling and clarifying headings for resources consisting of criticism and other types of works about literature that will remain in LCSH. There are several decisions that must be made when incorporating LCSH facets into a library discovery system. First is the question of whether each heading should be displayed as a complete string or whether the headings should be split based on the subfield markers. If the headings are split, a further question is whether to combine them in a single topical LCSH facet or to divide them into separate facets for topic, time period, geographic focus, and genre/form. Although full LCSH strings are not designed for faceting, they may potentially increase precision. However, a significant drawback of presenting full LCSH strings is that it greatly increases the number of facet values while decreasing the number of records associated with each individual facet value. This means that the number of records that a user is exposed to through the top ten or twenty facet values is greatly reduced. On the other hand, if every subfield is mapped to a separate facet value, there is loss of accuracy. “Philosophy--History” (history of philosophy) is not the same as “History--Philosophy” (philosophy of history). McGrath suggested that many headings would be clearer to users (and catalogers) if the relation- ship between the parts of an LCSH string were made more explicit than the use of double dashes.78 Thus “United States $x Geography” could be displayed as “Geography of the United States” rather than “United States--Geography” and “Geography $z United States,” could be displayed as “Geography (discipline) in the United States” rather than “Geography--United States.” This is more difficult to do in situations where an identical string can mean more than one thing, but introducing a way to explicitly encode prepositions to show a relationship is a potential way to reduce ambiguity. However, it is not obvious how to make these rela- tionships clear to users when the individual terms are in different facets. When placed into separate topical and geographic facets, “United States $x Geography” and “Geography $z United States” look the same. 482 K. MCGRATH Recommendations for future work From the discussion above, it is possible to extract a number of recom- mendations for improvements to faceted search in library catalogs. These are listed below. Note that many of these recommendations could poten- tially be listed under more than one category. Computational efficiency Library discovery interfaces should maximize the efficiency of the under- lying systems that index and retrieve their metadata and related facets in order to increase the speed of response and expand the number of facets and facet values that can be presented to users. User interface design Usability testing and user needs analysis should be done to better deter- mine what facets users are interested in and what challenges they face when using facets. There should be more experimentation with and assess- ment of number and order of facets and facet values. Functionality should be developed to allow users to customize the order and number of facet values displayed. User studies should be performed to identify the most easily understood labels for facets and facet values. It would also be useful to investigate whether there are ways that Boolean operations with facets can be made more transparent and understandable for users. Where pos- sible, methods should be developed to present facets most likely to be relevant to the user’s search in the way that Amazon presents book-related facets if it detects that a user appears to be searching for a book title. For example, a search for Beethoven could highlight music-related facets. Interfaces should be developed that allow users to take advantage of the hierarchical structure of many controlled vocabularies. Exploratory search and recall Library discovery interfaces could better support exploratory search and browsing by allowing users to select facet values without first doing a keyword search, i.e., search-free browsing. It should be possible for users to obtain a complete list of facet values in a particular facet that are related to their search. A user looking for novels should be able to get a comprehensive list of genres or creator demographics not just the top twenty results. Users should be able to remove keywords while retaining the facet values that they have selected. This will enable them to do a keyword search to find relevant facet values and then remove the keywords CATAloGinG & ClAssiFiCATion QuARTeRly 483 in order to get complete recall for the facet value. There should be inves- tigation into what facets and vocabularies are most useful for search-free browsing both for a general-purpose interface and for specialized views focused on particular types of resources, such as music, moving images, or literature. Multiple entity types This is a thorny problem. First of all, the metadata must be structured in such a way that information is clearly associated with specific entities in a consistent, machine-actionable way. Although existing systems take little advantage of them, authority records for persons and corporate bodies could be leveraged for this purpose. However, much work remains to be done before information related to the parts of the WEMI stack or for aggregated and aggregating works can be cleanly specified in this way. When properly structured metadata is available, faceted search interfaces that utilize this information in a more accurate manner should be developed. Heterogeneity of library metadata and resources Work should be done to integrate values in facets that will be drawing on multiple vocabularies in order to present users with a more coherent list that minimizes redundancy and maximizes variety and relevance. This may include both vocabulary mapping work and system design. Experimentation should be done with customized interfaces for subsets of library resources, as well as developing systems that select the most relevant facets for display based on a user’s search. Cross-references and facets Systems should allow users to remove their initial keyword search term or terms once they have identified the relevant controlled vocabulary term as a facet. There should be user studies and system design experimentation to identify ways to handle mismatches between the atomistic nature of facets (e.g., a list of individual instruments) and the phrases that users might be seeking (e.g., string quartets, piano trios). Coverage, recall, and retrospective metadata enhancement Tools that make it easier for catalogers to add structured data should be developed and improved. Cooperative projects, such as the work of ALA’s SAC Subcommittee on Faceted Vocabularies, to identify and implement 484 K. MCGRATH strategies and processes to retrospectively enhance bibliographic metadata should be undertaken. Other metadata issues Fields intended to populate facets should be examined to make sure that they meet the conditions of being clearly defined, mutually exclusive, and collectively exhaustive inasmuch as is possible. The fields and facet values should have clearly defined operational definitions that lead to consistent application. The definitions and metadata structure should unambiguously answer the question or questions that they are intended to answer. Facets and facet values should undergo user testing to make sure that they are easily understood and meet relevant information needs. Values should be recorded in a way that makes them easy to use as facets without complex processing. Acknowledgments The author is grateful to Chew Chiat Naun, Casey Mullin, and Adam Schiff for helpful and insightful comments on a draft of this article. ORCID Kelley McGrath http://orcid.org/0000-0002-5524-6417 Notes 1. Daniel Tunkelang, Faceted Search (San Rafael, CA: Morgan & Claypool Publishers, 2009). 2. Kate Moran, “The State of Ecommerce Search,” Nielsen Norman Group, June 24, 2018, https://www.nngroup.com/articles/state-ecommerce-search. 3. Karen Coyle, “KO is KO’d,” Coyle’s Information, January 10, 2023, https://kcoyle.blogspot. com/2023/01/ko-is-kod.html. 4. Mia Massicotte, “Improved Browsable Displays for Online Subject Access,” Information Technology and Libraries 7, no. 4 (1988): 373–80. 5. Marek Nahotko, “Knowledge Organization Affordances in a Faceted Online Public Access Catalog (OPAC),” Cataloging & Classification Quarterly 60, no. 1 (2022): 86–111, doi: 10.1080/01639374.2021.2015734. 6. Tunkelang, Faceted Search. 7. Kathryn Whitenton, “Filters vs. Facets: Definitions,” Nielsen Norman Group, March 16, 2014, https://www.nngroup.com/articles/filters-vs-facets. 8. “Library of Congress Genre/Form Terms,” Library of Congress, accessed March 1, 2023, https://www.loc.gov/aba/publications/FreeLCGFT/freelcgft.html. 9. “Library of Congress Medium of Performance Thesaurus for Music,” Library of Congress, accessed March 1, 2023, https://www.loc.gov/aba/publications/FreeLCMPT/freelcmpt.html. CATAloGinG & ClAssiFiCATion QuARTeRly 485 10. “Library of Congress Demographic Group Terms,” Library of Congress, accessed March 1, 2023, https://www.loc.gov/aba/publications/FreeLCDGT/freelcdgt.html. 1 1. Tunkelang, Faceted Search, 48. 12. Ibid. 1 3. Ibid. 14. Daniel Tunkelang, “Facets of Faceted Search,” Query Understanding, November 23, 2020, https://medium.com/@dtunkelang/facets-of-faceted-search-38c3e1043592. 15. Tunkelang, Faceted Search. 16. Ibid. 1 7. “Facets,” Ex Libris Knowledge Center, accessed March 1, 2023, https://knowledge.exlibrisgroup.com/Primo/Product_Documentation/Primo/Back_ Office_Guide/100Facets. 1 8. Tunkelang, Faceted Search, 52. 19. Chels Upton, “The Backlash Against America’s Most Popular Novelist Is Way Less Satisfying Than I’d Hoped,” Slate, February 2, 2023, https://slate.com/culture/2023/02/ colleen-hoover-domestic-violence-ends-with-us.html. 2 0. Tunkelang, Faceted Search, 65. 21. Ibid., 66. 22. Dana McKay, George Buchanan, and Shanton Chang, “It Ain’t What You Do, It’s the Way That You Do It: Design Guidelines to Better Support Online Browsing,” Proceedings of the Association for Information Science and Technology 55, no. 1 (2018): 347–56, doi: 10.1002/pra2.2018.14505501038. 2 3. Bill Kules and Robert Capra, “Creating Exploratory Tasks for a Faceted Search Interface,” in Proceedings of 2nd Workshop on Human–Computer Interaction, 2008, 18–21. 2 4. McKay, Buchanan, and Chang, “It Ain’t What You Do, It’s the Way That You Do It,” 355. 25. Kelley McGrath, “Facet-Based Search and Navigation with LCSH: Problems and Opportunities,” The Code4Lib Journal, no. 1 (2007). 2 6. McKay, Buchanan, and Chang, “It Ain’t What You Do, It’s the Way That You Do It.” 27. Kelley McGrath, Bill Kules, and Chris Fitzpatrick, “FRBR and Facets Provide Flex- ible, Work-Centric Access to Items in Library Collections,” in Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, 2011, 49–52, doi: 10.1145/1998076.1998085. 2 8. Tunkelang, Faceted Search, 54. 2 9. IFLA Study Group on the Functional Requirements for Bibliographic Records, “Functional Requirements for Bibliographic Records: Final Report, As Amended and Corrected through February 2009” (International Federation of Library Asso- ciations and Institutions, February 2009), https://repository.ifla.org/handle/ 123456789/811. 3 0. Pat Riva, Patrick Le Bœuf, and Maja Žumer, “IFLA Library Reference Model: A Conceptual Model for Bibliographic Information, As Amended and Corrected through December 2017” (Den Haag, IFLA, January 2018), https://repository.ifla.org/ handle/123456789/40. 31. Tunkelang, Faceted Search, 54. 32. Music Discovery Requirements Update Task Force, “Music Discovery Requirements,” Version 2 (Music Library Association, August 2017), https://www.musiclibraryassoc. org/resource/resmgr/mdr/MusicDiscoveryRequirements2.pdf. 33. Jerry L. McBride, “Faceted Subject Access for Music through USMARC: A Case for Linked Fields,” Cataloging & Classification Quarterly 31, no. 1 (2000): 15–30, doi: 10.1300/J104v31n01_03. 486 K. MCGRATH 34. Kelley McGrath, “Will RDA Kill MARC?” (American Library Association Midwinter Meeting, San Diego, CA, January 8, 2011), http://hdl.handle.net/1794/23939. 3 5. Riva, Le Bœuf, and Žumer, “IFLA Library Reference Model.” 3 6. IFLA Study Group on the Functional Requirements for Bibliographic Records, “Func- tional Requirements for Bibliographic Records.” 37. Riva, Le Bœuf, and Žumer, “IFLA Library Reference Model,” 21-27. 38. “RDA Toolkit,” ALA Publishing, accessed March 1, 2023, https://access.rdatoolkit.org. 39. “BIBFRAME 2 List View,” Library of Congress, accessed March 1, 2023, https://id.loc. gov/ontologies/bibframe.html. 40. “Share-VDE Model (Simplified Version),” Casalini Libri, accessed March 1, 2023, https://docs.google.com/presentation/d/1cTf6UC_wSj-C43OxGj0du47HwOGl8FQbVL VF3goNUG8/edit#slide=id.g18c8243e708_0_0. 4 1. McGrath, Kules, and Fitzpatrick, “FRBR and Facets Provide Flexible, Work-Centric Access to Items in Library Collections.” 42. Karen Coyle, “Works, Expressions, Manifestations, Items: An Ontology,” The Code4Lib Journal, no. 53 (2022), https://journal.code4lib.org/articles/16491. 43. Ibid. 44. “Extended Date/Time Format (EDTF) Specification,” Library of Congress, accessed March 1, 2023, https://www.loc.gov/standards/datetime. 45. “MARC Proposal No.: 2014-06: Defining New Field 388 for Time Period of Creation Terms in the MARC 21 Authority and Bibliographic Formats,” Library of Congress, accessed March 1, 2023, https://www.loc.gov/marc/mac/2014/2014-06.html. 46. Joan Neuberger, “Not a Film but a Nightmare: Revisiting Stalin’s Response to Eisen- stein’s Ivan the Terrible, Part II,” Kritika 19, no. 1 (2018): 115–142, doi: 10.1353/ kri.2018.0005. 47. Daniel N. Joudrey, Arlene G. Taylor, and David P. Miller, Introduction to Cataloging and Classification, 11th ed. (Santa Barbara, CA: Libraries Unlimited, 2015), 679. 48. Jung-Ran Park, “Metadata Quality in Digital Repositories: A Survey of the Current State of the Art,” Cataloging & Classification Quarterly 47, no. 3–4 (2009): 213–28, doi: 10.1080/01639370902737240. 49. “Dutch Basic Classification,” BARTOC, last modified July 14, 2021 https://bartoc.org/ en/node/745. 50. Xiaoli Ma, “One Concept, One Term, Good Practice but How to Achieve? – Im- proving Facet Values Quality for Samuel Proctor Oral History Collection, Hosted by the University of Florida Digital Collections,” Journal of Library Metadata 22, no. 3–4 (2022): 167–83, doi: 10.1080/19386389.2022.2096385. 51. “New OCLC Music Toolkit for Generating Faceted Music Data,” Music Library Associ- ation Cataloging and Metadata Committee, April 20, 2018, https://cmc.wp.musiclibraryassoc. org/2018/04/20/new-oclc-music-toolkit-for-generating-faceted-music-data. 52. Kelley McGrath and Lesley Lowery, “Getting More out of MARC with Primo: Strat- egies for Display, Search and Faceting,” The Code4Lib Journal, no. 41 (2018), https:// journal.code4lib.org/articles/13600. 53. Kimmy Szeto, “Ontology for Voice, Instruments, and Ensembles (OnVIE): Revisiting the Medium of Performance Concept for Enhanced Discoverability,” The Code4Lib Journal, no. 54 (August 29, 2022), https://journal.code4lib.org/articles/16608; Deborah Lee and Lyn Robinson, “The Heart of Music Classification: Toward a Model of Classifying Musical Medium,” Journal of Documentation 74, no. 2 (March 12, 2018): 258–77, https://doi.org/10.1108/JD-08-2017-0120; Deborah Lee, “Numbers, Instru- ments and Hands: The Impact of Faceted Analytical Theory on Classifying Music Ensembles,” Knowledge Organization 44, no. 6 (2017): 405–15. CATAloGinG & ClAssiFiCATion QuARTeRly 487 54. McGrath and Lowery, “Getting More out of MARC with Primo.” 55. Music Discovery Requirements Update Task Force, “Music Discovery Requirements.” 56. Tunkelang, Faceted Search. 5 7. McGrath, “Facet-Based Search and Navigation with LCSH.” 58. Elaine Svenonius, “LCSH: Semantics, Syntax and Specificity,” Cataloging & Classifi- cation Quarterly 29, no. 1–2 (2000): 17–30, doi: 10.1300/J104v29n01_02,22. 59. Michael Buckland et  al., “Mapping Entry Vocabulary to Unfamiliar Metadata Vo- cabularies,” D-Lib Magazine 5, no. 1 (January 1999), https://doi.org/10.1045/january99- buckland. 60. Daniel Tunkelang, “The 3 Rs of Search: Relevance, Recall, and Ranking,” Query Understanding, December 21, 2020, https://dtunkelang.medium.com/the-3-rs-of-search- relevance-recall-and-ranking-c9a785578653. 6 1. “New OCLC Music Toolkit for Generating Faceted Music Data.” 62. ALA Core Subject Analysis Committee, Subcommittee on Faceted Vocabularies, “Retrospective Implementation of Library of Congress Faceted Vocabularies: Best Practices for Librarians and Programmers,” last updated March 25, 2022, http://hdl.handle.net/11213/17998. 63. Casey A. Mullin, “Iteration, Not Perfection: The ‘Long Game’ of Retrospective Im- plementation of Faceted Vocabularies” (IFLA Subject Analysis and Access webinar: “Fascinating Facets,” May 19, 2022), https://cdn.if la.org/wp-content/ uploads/2CaseyMullin_IFLA-Webinar-220519-Mullin.pdf 64. “H 1775: Literature: General,” in Subject Headings Manual (Washington, DC: Library of Congress, 2015), https://www.loc.gov/aba/publications/FreeSHM/H1775.pdf 65. “Guidelines for the Use of ISO 639-3 Language Codes in MARC Records,” (Program for Cooperative Cataloging, January 12, 2023), https://loc.gov/aba/pcc/scs/documents/ ISO-639-3-guidelines.pdf. 66. Lori M. Jahnke, Kyle Tanaka, and Christopher A. Palazzolo, “Ideology, Policy, and Practice: Structural Barriers to Collections Diversity in Research and College Librar- ies,” College & Research Libraries 83, no. 2 (March 3, 2022): 166, doi: 10.5860/ crl.83.2.166. 67. Kelley McGrath, “Ostriches, Minotaurs, Ghosts and Fossils in the Brave New Meta- data World: Categorization & Linked Data” (Online Northwest, Portland, OR, May 31, 2017), http://hdl.handle.net/1794/23941. 68. “Sorites paradox,” Stanford Encyclopedia of Philosophy, last modified March 26, 2018, https://plato.stanford.edu/entries/sorites-paradox/. 6 9. Janis L. Young, “Unlimited Opportunities for Enhanced Access to Resources: The Library of Congress’ Faceted Vocabularies” (Subject Access: Unlimited Opportunities, Columbus, Ohio, USA, 2017), http://library.ifla.org/2074/. 70. McGrath, “Facet-Based Search and Navigation with LCSH.” 7 1. Young, “Unlimited Opportunities for Enhanced Access to Resources,” 3. 72. Rebecca J. Dean, “FAST: Development of Simplified Headings for Metadata,” Cataloging & Classification Quarterly 39, no. 1/2 (2004): 331–52, doi: 10.1300/J104v39n01_03; Young, “Unlimited Opportunities for Enhanced Access to Resources.” 73. Chew Chiat Naun, Kerre Kammerer, Kim Mumbower, and Dean Seeman of the FAST Policy and Outreach Committee, “FAST Quick Start Guide,” (OCLC, April 2022), https://www.oclc.org/content/dam/oclc/fast/FAST-quick-start-guide- 2022.pdf 7 4. Ibid., 13. 75. McGrath, “Facet-Based Search and Navigation with LCSH.” 76. Ibid. 488 K. MCGRATH 77. “H 405: Establishing Certain Entities in the Name or Subject Authority File,” in Subject Headings Manual (Washington, DC: Library of Congress, 2021), https://www. loc.gov/aba/publications/FreeSHM/H0405.pdf. 78. McGrath, “Facet-Based Search and Navigation with LCSH.” 79. “046 – Special Coded Dates,” Library of Congress, accessed March 1, 2023, https:// www.loc.gov/marc/bibliographic/bd046.html. 80. “Proposal No.: 2002-03: Expanding Field 046 for Other Dates in the MARC 21 Bibliographic Format,” Library of Congress, accessed March 1, 2023, https://www. loc.gov/marc/marbi/2002/2002-03.html. 81. “Best Practices for Cataloging DVD-Video and Blu-ray Discs Using RDA and MARC21,” Version 1.1, OLAC, accessed March 1, 2023, https://cornerstone.lib.mnsu. edu/olac-publications/4. 82. “MARC Proposal No.: 2013-07: Defining Encoding Elements to Record Chronolog- ical Categories and Dates of Works and Expressions in the MARC 21 Bibliograph- ic and Authority Formats,” Library of Congress, accessed March 1, 2023, https:// www.loc.gov/marc/marbi/2013/2013-07.html. 8 3. “MARC Proposal No.: 2016-03: Clarify the Definition of Subfield $k and Expand the Scope of Field 046 in the MARC 21 Bibliographic Format,” Library of Congress, accessed March 1, 2023, https://www.loc.gov/marc/mac/2016/2016-03.html. 84. “MARC Proposal No.: 2021-06: Accommodating Work and Expression Dates, and Related Elements, in Bibliographic and Authority Field 046,” Library of Congress, accessed March 1, 2023, https://www.loc.gov/marc/mac/2021/2021-06.html. 85. Best Practices for Recording Faceted Chronological Data in Bibliographic Records, Version 1.0 http://hdl.handle.net/11213/16710 Appendix  The tangled history of MARC field 046 and work creation dates The 046 field was originally created to record dates that could not be accommodated in the date fixed fields, such as BCE dates.79 Additional subfields were added in 2002 for the pur- pose of recording data about internet resources.80 Moving image catalogers long wanted a place to unambiguously record the original release date of a film, which is important to users and often unrelated to the date of publication. The existing subfield 046 $k (Beginning or single date created) was repurposed to meet this need. For some time, it was informally recommended in the audiovisual cataloging community. It was first officially recommended in 2017 in OLAC’s “Best Practices for Cataloging DVD-Video and Blu-ray Discs Using RDA and MARC21.”81 Meanwhile in 2013, ALA’s SAC Subcommittee on Genre/Form Implementation proposed two new subfields for a related but different use case.82 These sub- fields, $o and $p, are intended to encode the beginning and ending dates of aggregated content where the genre or form is described using LCSH strings, such as “Operas $y Eighteenth century.” This supports the Library of Congress’s plan for disaggregating non-top- ical information traditionally found in LCSH. Later, the MARC documentation was modified to adjust some wording that seemed to conflict with effectively using $k (date created) to record the date of the work.83 The original definition stated that dates recorded in $k could not be recorded elsewhere in the same record. This worked for the original use case for “a data element for creation date not recorded elsewhere” but is incompatible with unambigu- ously recording the original date of the work when it is the same as the date of the mani- festation. Finally, in 2021 the 046 was again modified to incorporate new indicators that distinguish between dates associated with a work and dates associated with an expression.84 CATAloGinG & ClAssiFiCATion QuARTeRly 489 When aggregates are introduced, dates become even more complicated. In addition to the publication date of the manifestation, there are expression and work dates associated with both the aggregating work and expression and the individual works and expressions that are being aggregated. Although the 046 field has been modified and expanded in order to support distinctions between work and expression dates, as well as between dates asso- ciated with the aggregating work and the works being aggregated, this was not designed into the field as it was constructed. The result is not intuitive for catalogers to apply. It has also led to recommended practices changing over time. This evolutionary legacy compli- cates and compromises the reliability of machine interpretation of this data. At the time that OLAC began recommending the use of 046 $k to record the date of the work for moving images, they only anticipated using this data for the date of work of the movie or movies contained in the resource and not for the aggregating work. The OLAC documentation recommended, and continues to recommend, coding dates for in- dividual works that are part of compilations either individually in multiple $k or as a range in $o and $p. This is not a problem, so long as only the dates of the individual works are recorded. However, as interest in the 046 field expanded for other uses, such as music or literature, some catalogers began to record the date of the aggregating work in $k and the dates of the aggregated works in $o and $p. This meant that $k began to be used for two purposes. If there is only a single work, the date of the work is recorded in 046 $k. However, in the case of an aggregate if a cataloger wants to record the original date of the aggregating work, 046 $k will contain the date of the aggregating work. Users are likely to find it confusing to have these two types of dates mixed up in a single facet. A user seeking twentieth century poetry will probably be unhappy if an anthology of seventeenth century poetry published in 1995 comes up in their results. In response to this problem, the ALA SAC Subcommittee on Faceted Vocabularies rec- ommended always coding the dates of aggregated works and expressions in $o and $p, even if only a series of single dates rather than a range is being recorded and even if the date of the aggregating work is not being recorded.85 However, two unresolved problems remain. For a date of creation of the work facet that only includes the contents of resources and not aggregating works, $k should only be included conditionally, which requires pre- processing tools that not all systems have. In addition, if catalogers ever use $k for an aggregating work without corresponding $o for the aggregated works, there is no possible logic to distinguish it from the date of creation of a single work. Specific systems may have additional limitations. For example, it is not possible to make the necessary logic work in Primo VE if $k and $o are recorded in different instances of field 046. Table A1 shows common practices and recommendations. Although the dates of aggre- gated works ($o and $p) can be consistently interpreted, dates in $k and $l cannot. This greatly increases the preprocessing required to generate a coherent facet. In order to exclude the dates of aggregating works, it is necessary to only include single dates or ranges of dates of creation ($k and $l) when the dates of any aggregated works ($o and $p) are not present. The ability to do this depends on the affordances of a particular system. For example, in Primo VE, this is possible if all the relevant subfields are in the same instance of field 046. However, it turns out to be impossible to do this in the situation given on line four of the table where $k and $o are in separate instances of field 046. It is also impossible to distin- guish the situation where only a single date of creation is reported in 046 $k (line 2 of the table) and the situation where only the date of the aggregating work is recorded and thus only 046 $k exists (line 6 of the table). It might almost be better to abandon field 046 for this purpose and record numeric dates in 388 where the distinction between aggregating and aggregated works is made more clearly. It would be necessary to add an indication of whether the date or dates apply to a work or expression to field 388, though. 490 K. MCGRATH Table A1. Comparison of 046 original date coding practices for a single work and an aggregate. Common practice olaC best when including ssfV practices aggregating work recommendations single work Date of work: 1995 $k 1995 $k 1995 $k 1995 aggregating work (compilation): 2023 aggregated work: 1983 aggregated work: 1995 aggregated work: 2008 Work dates in an aggregate recorded separately $k 1983 $k 2023 $k 2023 $k 1995 $o 1983 $o 1983 $k 2008 $o 1995 $o 1995 $o 2008 $o 2008 Work dates in an aggregate recorded as a range $o 1983 $k 2023 $k 2023 $p 2008 $o 1983 $o 1983 $p 2008 $p 2008 aggregated works only $k 1983 $k 1983 $o 1983 $k 1995 $k 1995 $o 1995 $k 2008 $k 2008 $o 2008 aggregating work only $k 2023 $k 2023