
Quantifying literature citations, index terms, and Gene Ontology annotations in the Saccharomyces Genome Database to assess results-set clustering utility

W. John MacMullen

ASIS&T Annual Meeting - 2006 (ASIS&T 2006)
Austin, Texas, November 3-9, 2006


A set of 37,325 unique literature citations was identified from 120,078 literature-focused annotations in the Saccharomyces Genome Database (SGD). The citations, gene products, and related Gene Ontology (GO) annotations were analyzed to quantify unique articles, journals, and genes, and to rank them by publication year, language, and GO term frequency. GO terms, MeSH indexing terms, MeSH Journal Descriptors, and SGD Literature Topics were quantified and analyzed to assess their potential utility for results-set clustering. Results: Bradford’s Law of Scattering was shown to hold for the citations, journals, and gene products. Only the MeSH terms and article title/abstract pairs had significant numbers of term co-occurrences. Multiple term types may be useful for faceted searching and clustered results-set browsing if the strengths of each are leveraged.
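As an illustration of what a Bradford's Law check might look like, the sketch below ranks journals by article count, splits the ranked list into zones holding roughly equal numbers of articles, and reports how many journals fall in each zone. Under Bradford's Law those journal counts grow roughly as 1 : n : n². The function name `bradford_zones` and the toy data are assumptions made for illustration; they are not the SGD data or the procedure used in the study, which the abstract does not describe.

```python
from collections import Counter


def bradford_zones(journal_per_citation, n_zones=3):
    """Count journals per Bradford zone (hypothetical helper, not from the paper).

    journal_per_citation: iterable with one journal name per citation.
    Returns a list with the number of journals in each zone, where each
    zone covers roughly total_articles / n_zones articles.
    """
    # Articles per journal, most productive journal first.
    counts = sorted(Counter(journal_per_citation).values(), reverse=True)
    total = sum(counts)

    journals_in_zone = [0] * n_zones
    cumulative = 0  # articles already assigned to higher-ranked journals
    for articles in counts:
        # A journal belongs to the zone in which its first article falls.
        zone = min(cumulative * n_zones // total, n_zones - 1)
        journals_in_zone[zone] += 1
        cumulative += articles
    return journals_in_zone


# Toy data (invented): one core journal with 9 articles, three journals
# with 3 articles each, and nine journals with 1 article each.
toy_citations = (
    ["Core"] * 9
    + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
    + [f"Peripheral{i}" for i in range(9)]
)
print(bradford_zones(toy_citations))  # → [1, 3, 9]
```

In this toy case each zone holds 9 articles, and the journal counts of 1, 3, and 9 follow the 1 : n : n² pattern with n = 3. Real citation data would only approximate such a ratio.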

