Professor Michelle Kazmer
Analysis of Class Folksonomy
Chapter 10: Systems For Vocabulary Control
As Taylor and Joudrey (2009) point out at the beginning of this chapter, “…there is evidence that people writing about the same concepts often do not use the same words to express them, and people searching for the same concept do not think of the same words to search for it” (p. 333). With a class of 30 students, all in the same branch of study (similar audience) and assigned to the same tasks (similar information needs), it is interesting to see how the creation of a “flat classification system, using tags as descriptors,” or folksonomy, reflects the truth of that statement (Kakali and Papatheodorou, 2010, p. 192).
Different Frames of Mind. Each of us brought a distinct motivation and semantic framework to identifying the characteristics of a given journal article. The terms we chose to describe each article were motivated by a wide range of needs: describing the aboutness of the article, following an instruction, relying on the originating source’s controlled vocabulary terms, or simply making the article personally discoverable through a set of terms most familiar to us. All of this funnels down to which descriptors or tags we choose to connect our mental concept of the item to the actual resource (Guy and Tonkin, 2006).
Controlled Vocabulary versus Folksonomy. Is our class set of descriptors a controlled vocabulary? No. A controlled vocabulary requires a list of terms to have structure and relation to one another, and it may exert some form of authority over those terms. Kakali and Papatheodorou (2010) make it clear that a group of tags has “no authority control, nor are there selection criteria and instructions for tag generation” (p. 192). In other words, no one is responsible for which tag descriptor is linked to which item or concept (with the exception of the Required Descriptors). Since the tags lack authority, do they maintain structure? Taylor and Joudrey (2009) divide controlled vocabularies into three groups: subject heading lists, thesauri, and ontologies. Subject heading lists and thesauri are made up of a hierarchy of terms that are broader or narrower and in some way related to one another, and they require authority control over the terms used to represent a concept. Ontologies do not use an “authorized term” but do carry a hierarchical structure (p. 334). Lacking both authority and hierarchy, our class descriptors remain a folksonomy.
Collectively, the class created a total of 255 unique descriptors (see Appendix 1) used to describe 135 articles drawn from the FSU Library web site as part of Contribute Assignments 1, 2, 4, and 5. For each article, the student was instructed to include 6 descriptors, 3 of which were required as part of the assignment: paper name (Resource and Description or Subjects), assignment name (Contribute 1, 2, 4, or 5), and z-name descriptor (to identify the student who uploaded the article). The 3 remaining descriptors were chosen by the student.
Tag Morphology. A review of all 255 unique descriptors revealed key variations in term makeup: 190 terms were singular and 65 plural; 204 were word phrases while only 51 were single terms; and 26 were abbreviations.
Methodology. To best illustrate which descriptors were used most often and under what circumstances, a word cloud was created using the infogr.am site. All 255 unique descriptors were copied from RefWorks into a Numbers spreadsheet, and each descriptor was linked to the number of times it was used to tag articles in the RefWorks bibliography list. The spreadsheet contained five sub-sheets titled: All Descriptors, Required Descriptors, Resource and Description Descriptors, Subject Descriptors, and Tag-Related Descriptors. It was then exported as an XLS file and imported to the infogr.am site as a data set. Once uploaded, the word cloud was published as an interactive data visualization tool.
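The frequency counts behind the word cloud can also be produced programmatically. The following is a minimal sketch of that tallying step; the descriptor list here is illustrative, not the actual class data:

```python
from collections import Counter

# Hypothetical sample of descriptors as exported from RefWorks:
# one entry per article-tag pairing (illustrative values only).
descriptors = [
    "Subjects", "Tags (Metadata)", "Subjects",
    "controlled vocabularies", "Tags (Metadata)",
    "Resource and Description", "metadata tags",
]

# Count how many times each unique descriptor was applied.
counts = Counter(descriptors)

# List terms from most- to least-used -- the same ordering a word
# cloud generator relies on when scaling term sizes.
for term, n in counts.most_common():
    print(f"{term}: {n}")
```

A word cloud tool such as infogr.am then maps each count to a font size, so the most frequent terms dominate the image.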
All Descriptors. For a bird’s-eye view of the list of descriptors, all 255 unique descriptors were uploaded to a word cloud (See Figure 1.). Here we can view Golder and Huberman’s (2005) “collective sensemaking” at work, whereby the students’ collaboration to create a large classification system demonstrates both “idiosyncratically personal categories” and those “widely agreed upon” (p. 201).
The most used terms were the most visible and prominent in the cloud, while the lesser used terms were harder to discern. The majority of terms appeared only once and account for most of the cloud’s makeup. The Z-Name descriptors were not incorporated into this word cloud, since they were used to link students to their submitted articles rather than to describe the articles themselves.
Required Descriptors. Of all the descriptors, this set of terms was the most prominent (See Figure 2.). The reasoning behind this, of course, is that without these descriptors one could not receive full credit for the assignment. These descriptors also played an important role in discoverability, since they helped the professor see who submitted an article reference (Z-Name Descriptors) and for which assignment (Resource and Description, Subjects, and Contribute Descriptors).
Resource and Description & Subjects Descriptors. If a controlled vocabulary hierarchy were created from this folksonomy, these word clouds would provide good examples for illustrating the difference between broader, narrower, and related terms (See Figure 3 and Figure 4). The authority term would be the largest term in each cloud (Resource and Description and Subjects), with the smaller text representing narrower or related entry terms.
There is also a visual difference between the two word clouds. The Resource and Description Descriptors word cloud had more unique terms than the Subjects Descriptors word cloud. It can be deduced that in the first paper assignment students relied more on their own lexicon than in the second. By the second assignment, students would have been inclined to take advantage of “the social tagging aspect of tagging services…as a kind of feedback mechanism for the folksonomy,” whereby one person “observes how others have tagged a resource, [and is] more likely to adopt a similar tagging vocabulary” for similar resources (Sinclair & Cardew-Hall, 2008, p. 16). Once terms had been created in the first assignment, and as users grew accustomed to the tagging mechanics, more reliance was placed on others and on the system, increasing the likelihood that terms would be used more than once.
Tag-Related Descriptors. Tagging tags about tagging is about as meta as one can get in an LIS class (See Figure 5). This word cloud shows the difference that word order makes in descriptors, even when they define the same exact concept. The term “Tags (Metadata)” was used 16 times, while its sibling term “metadata tags” was used once. The choice between inverted and direct order is part of the ongoing challenge of constructing the terms that make up a controlled vocabulary schema. Most institutions are doing away with inverted-order terms, since evidence shows that users prefer direct order when searching by key terms (Taylor and Joudrey, 2009, p. 338). It is interesting to note that this example shows the opposite preference.
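A controlled vocabulary would collapse such variant forms onto a single authorized term. As a minimal sketch, assuming a naive rule that treats a parenthetical qualifier like “Tags (Metadata)” as an inverted form of a direct-order phrase:

```python
import re

def normalize(term):
    """Naively rewrite a qualified heading such as 'Tags (Metadata)'
    into direct order ('metadata tags'), then lowercase it so variant
    surface forms collapse onto one string."""
    m = re.match(r"^(.+?)\s*\((.+)\)$", term)
    if m:
        # Move the parenthetical qualifier in front of the main term.
        term = f"{m.group(2)} {m.group(1)}"
    return term.lower()

# Both sibling descriptors from the class folksonomy now map to the
# same normalized form.
print(normalize("Tags (Metadata)"))
print(normalize("metadata tags"))
```

Under a rule like this, the 16 uses of “Tags (Metadata)” and the single use of “metadata tags” would be counted together rather than as two separate descriptors.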
Subject Access Systems: A Comparison
Similar to the RefWorks tagging scheme, many social tagging sites utilize what is called a “flat folksonomy,” one that lacks structure or hierarchy, where “users are free to attach a tag (or tags) to their web content…according to their own needs…[where] no predefined tags exist to categorize the content” (Yoo, Choi, Suh & Kim, 2013, p. 594). User-generated tagging schemes are flexible in that they allow the user to create tags suitable for various purposes, as seen in the RefWorks descriptor schema. Below are examples of three subject access systems that share characteristics with RefWorks in their ability to support user-constructed folksonomies.
Subject Access System – CiteULike
CiteULike is quite similar to RefWorks in its intended purpose. Both sites allow users to import article citations to create large bibliographies. RefWorks requires a paid subscription or institutional access, while CiteULike is free.
All tags are user-generated and can be used to define the aboutness of an item, pull key terms, optimize discoverability within a large library of article references, or bookmark articles under a series (e.g., “lis5703paper”). Unlike RefWorks, however, CiteULike does not adhere very well to “tagging as is.” Rather, the user uploads a new article citation and then submits a list of descriptor tags that are hyperlinked back to the article in alphabetical order (See Figure 6.). In other words, the system overthinks how tags are displayed: it effectively splits tags that are word phrases, unlike RefWorks, which keeps word phrases together. Phrases like “controlled vocabularies” and “recommendation engines,” used to describe Lev Grossman’s Time Magazine article on Pandora and Netflix, are reorganized and recategorized under the single terms “controlled” or “vocabularies,” “recommendation” or “engines.” The system thus serves users best when they tag with single terms, which is simple in nature but limited in scope compared with citation management tools like RefWorks.
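The contrast between the two systems’ handling of phrase tags can be sketched in a few lines. The function names below are illustrative labels, not APIs of either system:

```python
def refworks_style(tags):
    """Keep each word phrase intact as a single descriptor,
    as RefWorks does."""
    return set(tags)

def citeulike_style(tags):
    """Split each phrase on whitespace into individual single-term
    tags, mimicking how CiteULike displays them."""
    return {word for tag in tags for word in tag.split()}

# Phrase tags from the Grossman article example in the text.
tags = ["controlled vocabularies", "recommendation engines"]

print(sorted(refworks_style(tags)))   # two intact phrase descriptors
print(sorted(citeulike_style(tags)))  # four disconnected single terms
```

Splitting doubles the number of access points here, but each fragment (“controlled,” “engines”) carries far less meaning than the original phrase.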
Subject Access System – Instagram
Whereas RefWorks and CiteULike are citation systems for academic publications, Instagram has become the social tagging phenomenon with a panoramic view of the world. Instagram also incorporates a flat folksonomy through its collection of tags known as “hashtags.” Users take photos and upload them to the site in order to gain visibility or popularity (an argument can be made for both) for the image. These tags, or hashtags as they are called on the site, begin with a pound sign (“#”) and can be written as single terms and/or word phrases. The user then tags the photo with any number of what Lawson (2009) would classify as “objective tags” that describe the content, like “#beach” when photographing the shoreline, or “subjective tags” that are not content related, like “#wishyouwerehere” for the same photo (p. 577). There are even tag generator apps such as “TagsForLikes,” which help establish tag consistency by creating a predetermined set of hashtags based on the subject matter of the photo or a desire to increase followers on the site.
Subject Access System – Pandora
Unlike CiteULike and Instagram, Pandora is a closed tagging system. It utilizes the Music Genome Project as its controlled vocabulary system. Within the Music Genome Project are a set of tags known as “attributes” that are assigned to each song by a Pandora music indexer. Lev Grossman (2010) of Time Magazine described the process as:
“Every time a new song comes out, someone on Pandora’s staff — a specially trained musician or musicologist — goes through a list of possible attributes and assigns the song a numerical rating for each one. Analyzing a song takes about 20 minutes.”
Because the system is closed, it helps retain the tag quality and discoverability of each song. Since this system depends on getting the right song to the right user every time, there is a strong argument for avoiding user-generated tagging features that would interfere with song discovery. When a song plays, a short sample list of attributes is shown to the user, but not all attributes are displayed, since the controlled vocabulary used by Pandora is proprietary.
Golder, S. & Huberman, B. (2005). Usage Patterns of Collaborative Tagging Systems. Journal of Information Science, 32(2), 198-208.
Grossman, L. (May 27, 2010). How Computers Know What We Want – Before We Do. Time Magazine.
Guy, M. & Tonkin, E. (2006). Folksonomies: Tidying up tags? D-Lib Magazine, 12(1).
Kakali, C. & Papatheodorou, C. (2010). Exploitation of Folksonomies in Subject Analysis. Library & Information Science Research, 32, 192-202.
Lawson, K. G. (2009). Mining Social Tagging Data for Enhanced Subject Access for Readers and Researchers. The Journal of Academic Librarianship, 35(6), 574-582.
Sinclair, J. & Cardew-Hall, M. (2008). The Folksonomy Tag Cloud: When is it Useful? Journal of Information Science, 34(1), 15-29.
Taylor, A. G., & Joudrey, D. N. (2009). Chapter 10: Systems for Vocabulary Control. The Organization of Information. (3rd ed.). Westport, CT: Libraries Unlimited.
Yoo, D., Choi, K., Suh, Y., & Kim, G. (2013). Building and Evaluating a Collaboratively Built Structured Folksonomy. Journal of Information Science, 39(5), 593-607.