Evidence of Term-Structure Differences among Folksonomies and Controlled Indexing Languages

Benjamin Good and Joseph Tennis

ASIS&T 2008 Annual Meeting (AM08 2008)
Columbus, Ohio, October 24-29, 2008


With the advent of Internet-based technologies for information organization, many groups have constructed their own indexing languages. Biologists, library and information science practitioners, and, more recently, social taggers have each collaborated to create large and sometimes complex indexing languages. Two questions arise: (1) what are the measurable characteristics of these indexing languages, and (2) do these indexing languages speciate, that is, separate into distinct groups, along these characteristics? This poster presents data from exploratory work addressing these questions.

A number of theoretical works have compared folksonomies to controlled indexing languages. However, little empirical work has examined the anatomy of these languages or compared their similarities and differences.

Working primarily in the Semantic Web environment, we harvested 25 indexing languages, normalized them, and analyzed them statistically: we made comparative counts on seven variables, plotted the results as radar graphs, and performed a cluster analysis on those variables. The radar graphs of the normalized indexing languages show that each group of indexing languages investigated here displays a particular shape that corresponds, to varying extents, to the kind of system from which it arose. For example, we observe distinct shapes for folksonomies and for subsets of the controlled languages. Interestingly, although it retains the basic folksonomy shape, the Connotea folksonomy appears much more similar to the controlled vocabularies than the other folksonomies do.
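
Radar graphs only allow languages to be compared when every variable is on a common scale. The poster does not state which normalization was used, so the following is a minimal sketch assuming per-variable min-max scaling to the range [0, 1] across all harvested languages. The function name `min_max_normalize` and the dictionary layout (language name mapped to a list of raw measurements, one per variable) are hypothetical, not the authors' code.

```python
from typing import Dict, List


def min_max_normalize(profiles: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Rescale each variable (column) to [0, 1] across all indexing languages.

    Hypothetical helper: assumes min-max scaling, which the poster does not specify.
    `profiles` maps a language name to its raw measurements, one value per variable.
    """
    names = list(profiles)
    if not names:
        return {}
    n_vars = len(profiles[names[0]])
    if any(len(profiles[name]) != n_vars for name in names):
        raise ValueError("every language needs the same number of variables")

    normalized = {name: [0.0] * n_vars for name in names}
    for j in range(n_vars):
        column = [profiles[name][j] for name in names]
        lo, hi = min(column), max(column)
        span = hi - lo
        for name in names:
            # A constant column cannot separate languages; map it to 0.0.
            normalized[name][j] = (profiles[name][j] - lo) / span if span else 0.0
    return normalized
```

Each normalized vector can then be drawn as one polygon on a radar graph with one spoke per variable. Min-max scaling is sensitive to outliers; a rank-based or z-score normalization would be a reasonable alternative if one very large vocabulary dominates a variable.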

The cluster analysis of the 25 normalized indexing languages over the seven selected variables also indicates, as expected, that the folksonomies form a group separate from the controlled indexing languages. In addition, it shows that two distinct subsets exist within the group of controlled languages.
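
The poster does not name the clustering algorithm or distance measure. As an illustration only, the following sketch assumes agglomerative hierarchical clustering with Euclidean distance and average linkage over already-normalized variable vectors, merging the two closest clusters until `k` groups remain. The names `euclidean` and `average_linkage_clusters` are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two normalized variable vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(profiles: Dict[str, Sequence[float]], k: int) -> List[Set[str]]:
    """Agglomerative clustering (average linkage) stopped at k clusters.

    Assumption: the poster does not specify its clustering method; this is one
    common choice, not necessarily the one the authors used.
    """
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of languages")
    # Start with every indexing language in its own cluster.
    clusters: List[Set[str]] = [{name} for name in profiles]

    def linkage(c1: Set[str], c2: Set[str]) -> float:
        # Average of all pairwise distances between members of the two clusters.
        dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest average-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

With the seven normalized variables as input, asking for two or three clusters is one way to check whether folksonomies separate from controlled languages and whether the controlled languages themselves split into subsets. Recording the full merge order (a dendrogram) rather than stopping at a fixed `k` would show how robust such groupings are.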

The primary contribution of this work is the development of a framework for the empirical comparison of the terms from different indexing languages.

