In This Issue



Bert Boyce


The Concept of Relevance in IR
Pia Borlund
Published online 7 May 2003

Through her literature review, Borlund shows that a concept of relevance can be presented that is consistent and compatible with the views already expressed. In information retrieval, relevance is both multidimensional and dynamic, and thus must be situational. By situational, Borlund means a user-centered, empirically based concept that expresses the relationship between the user's perception of the usefulness of a retrieved object and a specific work task situation. This work task situation appears to be a way of expressing the information need apart from any information request.










The Changing Face of Scientific Discourse: Analysis of Genomic and Proteomic Database Usage and Acceptance
Cecelia Brown
Published online 20 May 2003

Brown points out that the early practice of molecular biologists, reporting DNA strings, protein structure data, and amino acid sequences in peer-reviewed journals as they were identified, has given way to deposit in public web-based databases such as GenBank and the Protein Data Bank, known collectively as genomic and proteomic databases (GPD). Journals in the area often no longer accept such data because of its huge and growing volume, and instead require deposit in a GPD prior to publication of work based upon the data. Biologists, biochemists, and biomedical scientists at the University of Oklahoma - Norman were surveyed by e-mail concerning their usage and acceptance of GPD. Willing participants were then interviewed and observed in a think-aloud session with a GPD. Occurrences of citations to GenBank in ISI's Web of Science were also recorded. Medline and CAPlus were searched for occurrences of the term "GenBank," and the 23 journals with the largest number of hits were chosen for analysis of their "instructions to authors" material over a 20-year period. GenBank is used routinely and without concern for the validity of the non-peer-reviewed data. The participants believe such data should be freely available and do not view it as intellectual property. The use of GPD is indicated by an increase in occurrences of the keyword "GenBank" over time and by changes in instructions to authors. GPDs have become an integral part of the communication cycle in molecular biology.







Multidimensional Data Model and Query Language for Informetrics
Timo Niemi, Lasse Hirvonen, and Kalervo Järvelin
Published online 20 May 2003

Niemi, Hirvonen, and Järvelin are interested in a multidimensional data structure for use in On-Line Analytical Processing (OLAP), where summary data generated from, and updated by, data warehousing applications provides the material for on-line analysis, which in turn constructs differing views of the data for those presenting queries. In particular, they wish to use such a structure for informetric analysis. The model structure consists of multidimensional data cubes, which are multidimensional arrays of facts and dimensions (ways of viewing facts), together with collections of dimension tables and hierarchy tables. A query language is developed that allows the definition of a result table based upon dimension conditions and the definition of desired columns.
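The idea of a result table defined by dimension conditions and desired columns can be illustrated with a small sketch. This is not the authors' query language: the fact rows, the hierarchy table, and the `query` function below are all hypothetical, made-up stand-ins chosen only to show how facts might be filtered on dimensions and rolled up along a hierarchy.

```python
from collections import defaultdict

# Hypothetical fact rows (toy data): (journal, year, citation count).
# "journal" and "year" are the dimensions; the count is the measure.
FACTS = [
    ("Journal A", 2001, 10),
    ("Journal A", 2002, 14),
    ("Journal B", 2001, 6),
    ("Journal C", 2002, 9),
]

# Hypothetical hierarchy table: maps each journal to a coarser level.
JOURNAL_TO_FIELD = {
    "Journal A": "Field X",
    "Journal B": "Field X",
    "Journal C": "Field Y",
}


def query(facts, condition, columns):
    """Sum the measure over rows that satisfy `condition`, grouped by `columns`.

    `condition` is a predicate on (journal, year); `columns` is a list of
    functions, each mapping (journal, year) to one column of the result key.
    """
    result = defaultdict(int)
    for journal, year, count in facts:
        if condition(journal, year):
            key = tuple(col(journal, year) for col in columns)
            result[key] += count
    return dict(result)


# Roll citations up from journal level to field level for 2001 onward.
by_field = query(
    FACTS,
    condition=lambda journal, year: year >= 2001,
    columns=[lambda journal, year: JOURNAL_TO_FIELD[journal]],
)
print(by_field)
```

Changing the `columns` list (for example, to group by year as well as field) yields a different view of the same facts, which is the sense in which OLAP constructs differing views for different queries.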











An Emerging View of Scientific Collaboration: Scientists' Perspectives on Collaboration and Factors that Impact Collaboration
Noriko Hara, Paul Solomon, Seung-Lye Kim, and Diane H. Sonnenwald
Published online 21 May 2003

Hara, Solomon, Kim, and Sonnenwald conduct an intensive study of a geographically distributed chemistry and chemical engineering research center spanning four universities, with the intent of identifying patterns of collaborative behavior that might lead to theories whose generality could later be confirmed. About one third of the roughly 100 members of the center are faculty; the remainder are students, postdoctoral fellows, and research associates. Data were collected through interviews with one research group and, for all participants, through surveys and observation of video conferences and meetings. Many scientists believed that collaboration requires each party to bring unique and sufficient expertise, whereas student participation in research projects is viewed as education. However, collaboration may mean a student working for both scientists and acting as a bridge between them. The data suggest that collaboration is influenced by compatibility, work connections, incentive, and socio-technological infrastructure, and that a continuum of levels of collaboration exists, beginning with low-level sharing of knowledge, where one scientist's work complements another's, and moving to a completely shared project, where the collaborators work closely together throughout the process. A necessary foundation is a perception of scientific and professional compatibility and interpersonal trust.











Multi-Agent Information Classification Using Dynamic Acquaintance Lists
Snehasis Mukhopadhyay, Shengquan Peng, Rajeev Raje, and Mathew Palakal
Published online 27 May 2003

Mukhopadhyay, Peng, Raje, Palakal, and Mostafa use a vector space model with term frequency-inverse document frequency weighting on various disparate document collections to produce classification agents with differing vocabularies; each agent classifies new documents by their similarity to generated centroids. If an agent generates a null vector, the document remains unclassified by that agent but might be classified by an agent with a different vocabulary. An 81-term computer science vocabulary was broken into nine disjoint sub-vocabularies, creating agents that attempt to classify their own document sets and, time permitting, try to assist other remote agents. In the multi-opinion model all remote agents try to classify unclassified documents, but as the number of available agents increases a saturation point is reached beyond which additional agents yield only a small incremental increase in successful classification, while response time increases linearly with the number of agents. Thus proper selection of a small number of remote agents could achieve high performance at low response time, and this could be done by creating a small acquaintance list for each agent using a Pursuit Learning algorithm. On the basis of quickest return or highest similarity value returned, a best acquaintance is chosen and given a positive ranking weight, which modifies the probability that choosing it in the future will result in a reward. Judged against the four best agents chosen off line, the algorithm's performance was 39% better than that of a random selection and 363% better than that of the worst four agents.
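The two mechanisms described above, vocabulary-restricted centroid classification and probability-based acquaintance selection, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: plain term counts stand in for TF-IDF weights, the vocabularies and centroids are toy values, and `pursuit_update` follows a generic textbook form of pursuit learning (track a running reward estimate per acquaintance, then shift selection probability toward the one with the highest estimate). All function names are hypothetical.

```python
import math


def tf_vector(text, vocabulary):
    """Term-count vector restricted to one agent's vocabulary."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0 else sum(a * b for a, b in zip(u, v)) / norm


def classify(text, vocabulary, centroids):
    """Return (label, similarity), or (None, 0.0) when the vector is null."""
    vec = tf_vector(text, vocabulary)
    if not any(vec):
        return None, 0.0  # no vocabulary term occurs: agent cannot classify
    label = max(centroids, key=lambda name: cosine(vec, centroids[name]))
    return label, cosine(vec, centroids[label])


def pursuit_update(probs, estimates, counts, chosen, reward, rate=0.1):
    """One pursuit-learning step over a list of candidate acquaintances.

    Updates the running mean reward of `chosen` in place, then returns new
    selection probabilities moved toward the currently best-estimated agent.
    """
    counts[chosen] += 1
    estimates[chosen] += (reward - estimates[chosen]) / counts[chosen]
    best = max(range(len(probs)), key=lambda i: estimates[i])
    return [p + rate * ((1.0 if i == best else 0.0) - p)
            for i, p in enumerate(probs)]


# Toy agents with disjoint vocabularies (assumed values).
doc = "the router drops packets on the network"
local = classify(doc, ["compiler", "parser"], {"languages": [1.0, 1.0]})
remote = classify(doc, ["network", "router"], {"networking": [1.0, 1.0]})
print(local, remote)

# The remote agent answered, so reward it and update selection probabilities.
probs, estimates, counts = [0.5, 0.5], [0.0, 0.0], [0, 0]
probs = pursuit_update(probs, estimates, counts, chosen=1, reward=1.0)
print(probs)
```

In this sketch the local agent produces a null vector and returns no label, while the remote agent with a different vocabulary classifies the document; rewarding that remote agent raises the probability that it is consulted again, which is the intuition behind keeping a short acquaintance list.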


