The problem of fiction in library cataloging
Though fiction makes up a significant portion of libraries’ circulation, it gets short shrift in cataloging because its subject matter is inherently difficult to determine. Characteristics that are not obvious from a novel’s cover or marketing materials often go uncataloged.
Another problem is the lack of a detailed subject heading vocabulary for fiction. A proliferation of third-party readers’ advisory databases has stepped in to fill this gap, leaving the library metadata landscape littered with proprietary and incompatible systems.
Readers’ advisory literature focuses on affective characteristics such as moods, pacing, and appeal terms. Within reader communities, we have observed that requests for recommendations often focus on more objective facets: characters’ ages, specific settings, and cross-genre tropes such as friends-to-lovers romance or heist plots. This data, we think, lends itself to crowdsourcing.
Some library cataloging systems have incorporated user tagging in the past, but none has supported tag curation or treated it as integral to the tagging process in the way AO3 does. As a result, the messiness of unfiltered user tags has left many people in the library world with unfavorable impressions of crowdsourced tagging as a source of metadata.
When we talked about fiction in libraries with people who were also familiar with fanfiction, all of them said, “I wish library search was more like AO3!” We set out to demonstrate what that might look like.
What’s AO3?
The Archive of Our Own (AO3 for short) is a fanfiction archive built as an open source software project by its own users (thus the name, with a nod to Virginia Woolf). Its tagging and search system is legendary. Writing about the system for Wired, Gretchen McCulloch says, “When I tweeted about AO3’s tags a while back, I received many comments from people wishing that their professional tagging systems were as good, including users of news sites, library catalogs, commercial sales websites, customer help-desk websites, and PubMed (the most prominent database of medical research).”
AO3 employs a unique hybrid tagging system in which volunteer tag wranglers create a thesaurus out of user-submitted tags, arranging preferred terms into hierarchies and adding synonyms. Wranglers sign up to manage the tags of fandoms they know well, serving as domain experts in their niche fields.
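To make that model concrete, here is a minimal sketch of the data structure such a system maintains: preferred terms arranged in a hierarchy, with user-submitted variants recorded as synonyms that resolve to the preferred term at search time. The class and field names are our own illustration, not AO3’s actual schema.

```python
# A minimal sketch of a wrangled tag thesaurus; names and structure are
# our own illustration, not AO3's actual schema.
from collections.abc import Sequence
from dataclasses import dataclass, field

@dataclass
class Tag:
    name: str                                            # preferred form
    parents: list["Tag"] = field(default_factory=list)   # broader terms
    synonyms: set[str] = field(default_factory=set)      # user variants

class Thesaurus:
    def __init__(self) -> None:
        self.canonical: dict[str, Tag] = {}      # preferred name -> Tag
        self.synonym_index: dict[str, str] = {}  # variant -> preferred name

    def add_canonical(self, name: str, parents: Sequence[str] = ()) -> Tag:
        tag = Tag(name, [self.canonical[p] for p in parents])
        self.canonical[name] = tag
        return tag

    def add_synonym(self, variant: str, canonical: str) -> None:
        self.canonical[canonical].synonyms.add(variant)
        self.synonym_index[variant.casefold()] = canonical

    def resolve(self, user_tag: str) -> str | None:
        """Map a user-entered tag to its preferred term, if wrangled."""
        if user_tag in self.canonical:
            return user_tag
        return self.synonym_index.get(user_tag.casefold())

# A search for any recorded variant lands on the preferred term:
t = Thesaurus()
t.add_canonical("Romance")
t.add_canonical("Friends to Lovers", parents=["Romance"])
t.add_synonym("friends2lovers", "Friends to Lovers")
assert t.resolve("Friends2Lovers") == "Friends to Lovers"
```

The key point is that searchers never have to guess which variant a tagger happened to use: every synonym funnels into one preferred term, and the hierarchy lets a broad search (Romance) surface works filed under narrower ones (Friends to Lovers).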
We believe a vocabulary supported by a tag wrangling system would transform search and discovery for fiction in libraries.
Our project: crowdsourced tagging demo
We are building a demo site where we will collect user tags for novels and assemble them into a thesaurus. Our workflow will differ from AO3’s in several significant ways, owing to the shift from fanfiction to published works and the more limited scope of our Capstone project, but we aim to produce a proof of concept showing how a hybrid system of user tags could improve search and discovery for fiction.
We recognized that none of the existing English-language vocabularies for fiction is sufficiently detailed for our purposes (neither LCSH nor LCGFT includes tropes, for example), so we are building a new vocabulary based on our own research into terms for subgenres, tropes, character traits, and settings. We will seed our demo site with these terms and encourage our testers to build on the vocabulary. Behind the scenes, we will apply the tag wrangling model to incorporate their contributions, as sketched below. Our Capstone presentation will include both the new fiction vocabulary and the tag wrangling workflow behind it.
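As an illustration of that behind-the-scenes step, the hypothetical sketch below (reusing the Thesaurus class from the earlier sketch; none of these names are final) models the wrangling queue we have in mind for the demo site: submitted tags the vocabulary already knows resolve immediately, while anything unrecognized is held for a wrangler to either promote to a new preferred term or merge as a synonym of an existing one.

```python
# Hypothetical wrangling queue for the demo site, reusing the Thesaurus
# sketch above. Recognized tags resolve immediately; unrecognized ones
# wait for a wrangler's decision.
class WranglingQueue:
    def __init__(self, thesaurus: Thesaurus) -> None:
        self.thesaurus = thesaurus
        self.pending: list[str] = []    # unwrangled user submissions

    def submit(self, user_tag: str) -> str | None:
        """Called when a tester tags a novel; returns the preferred term."""
        canonical = self.thesaurus.resolve(user_tag)
        if canonical is None:
            self.pending.append(user_tag)   # hold for wrangler review
        return canonical

    # Wrangler decisions, applied while working through the pending list:
    def promote(self, user_tag: str, parents: Sequence[str] = ()) -> None:
        """Accept the tag as a new preferred term in the vocabulary."""
        self.thesaurus.add_canonical(user_tag, parents)
        self.pending.remove(user_tag)

    def merge(self, user_tag: str, canonical: str) -> None:
        """Record the tag as a synonym of an existing preferred term."""
        self.thesaurus.add_synonym(user_tag, canonical)
        self.pending.remove(user_tag)
```

Either decision immediately improves retrieval for every work already carrying the same variant, which is the advantage of curated tagging over a raw folksonomy.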
Future research
Questions that will not be answered by our project:
- How would domain experts be assigned? Genres are much broader than fandoms, and that is before we even consider nonfiction.
- Would tags be submitted by library staff only, or would a volunteer training program need to be created? How much time would be required of staff?
- Would tags be managed on a per-library basis, or centrally through a vendor or consortium? How might we centralize the vocabulary into one community-maintained project?
- Readers often consider author demographics when choosing books, but recording this data in authority files is fraught with ethical questions. What characteristics might reasonably be added, and how could citations be recorded to justify their inclusion?
- How might crowdsourcing be used to supply metadata other than subject terms in library records?
Additional reading
Ludi Price’s “Fandom, Folksonomies and Creativity: the case of the Archive of Our Own” offers a brief introduction to the tag wrangling process.
The AO3 workflow is described in more detail in Julia Bullard’s dissertation and her paper “Curated Folksonomies,” with diagrams taken from the tag wranglers’ internal documentation. (While the wranglers’ general guidelines are public, the training documentation is not.) Bullard also has a paper on tag wrangling volunteers’ motivations that may be useful in thinking through potential library implementations.
Among the available controlled vocabularies for fiction in English, the NoveList Guide to Story Elements is the closest to our ideal. We have also drawn from the Video Game Metadata Schema, created by the GAMER Group at the UW iSchool.