A new unified probabilistic model
David Bodoff, Stephen Robertson 
Published Online 21 Jan 2004

Bodoff and Robertson see the difficulty with current probabilistic models as lying in how to combine the document-oriented view with the query-oriented view when estimating probabilities, and in how to handle situations where data is not available for either approach. Relevance assessment will be a source of error in both approaches, but if one has only relevance error and error in query term assignment, a query view will provide a proper estimate for both error sources, while if error exists in document term assignment, a document view will be correct for both. In a combined approach it is not clear how the error is to be distributed and thus how to combine the probabilities. Instead of proceeding directly from indexing and relevance data to contingency tables for probability prediction, their method assumes and estimates true indexing. It hypothesizes the form of the document, query, and relevance functions, constructs a likelihood function over all the data, and then uses observed document and query terms with a training sample of relevance judgments to estimate the unknown document and query parameter vectors, allowing the relevance of document-query pairs to be predicted. If evidence is unavailable, the initial term vector is left unmodified. A pilot experiment on 1400 Cranfield documents and 225 queries, using a heuristic search to maximize the likelihood function, yielded an improvement in average precision from .23 to .48.









Properties-based retrieval and user decision states: User control and behavior modeling
Gerald Benoît 
Published Online 21 Jan 2004

Benoît selects a population of 20 graduate student researchers in biology to interact with a retrieval system that generates a term/property matrix, with both collection and property weights, utilizing 300 Dialog documents and an XML-tagged journal providing documents in four languages. Query terms could be entered alone or in combination with their assigned properties, resulting in ranked lists whose membership and order could be modified by using slide bars to adjust the weight that term-property combinations exert. Using recorded think-aloud searches and a post-search survey, satisfaction was measured in terms of whether the mapping was close to the user's understanding, whether transitions among levels of detail met expectations, the availability of help to the user, and adaptation to the user's current state of knowledge. With this small sample, t-tests indicate that users find property-based retrieval more useful than normal ranking. Users emphasized different properties in their final manipulation of results, and a gradient space analysis of this data may indicate user cognitive states.






Design of cataloging rules using conceptual modeling of cataloging process
Shoichi Taniguchi 
Published Online 18 Dec 2003

Taniguchi constructs a conceptual model of the cataloging process that leads to a procedure for designing cataloging rules. The model consists of "event patterns" that trigger actions, associated "action patterns," and "orientation," an inclination toward application to a particular objective or function such as "identity," "relationships," or "contents." Triples are propagated for each data element, allowing a choice of event, action, and orientation triples for any situation. These are then verbalized as rules. A rule system so designed should have the advantage of consistency and scalability.










A classification of author co-citations: Definitions and search strategies
Ronald Rousseau, Alesia Zuccala 
Published Online 8 Jan 2004 

Rousseau and Zuccala see author co-citation analysis, which is used for understanding change in a subject area over time, for finding convergence in research traditions, and for understanding how scholars seek and use information, as dependent upon a co-citation definition that is based upon first authors. They provide four distinct classes of author co-citation: pure first-author co-citation, where an article by A as a first or sole author appears in a paper's reference list where an article by B as a first or sole author also appears; pure author co-citation, where an article with A as co-author and an article with B as a co-author appear in the same paper's reference list, with papers co-authored by the two excluded; general author co-citation, where an article with A as co-author and an article with B as a co-author appear in the same paper's reference list but co-citation of the co-authored articles themselves is excluded; and finally special co-author/co-citation, where A and B as co-authors may be counted as co-cited. "Pure first author" is suitable for developing a picture of a topic, "pure author" is suitable for evaluation of research, "general author" gives a more accurate picture of an author's contribution, and "co-author/co-citation" shows a more precise picture of intellectual similarity among authors. Techniques using current Dialog functions for the creation of the various classes of co-citation data are described.










Measuring retrieval effectiveness: A new proposal and a first experimental validation
Vincenzo Della Mea, Stefano Mizzaro 
Published Online 2 Feb 2004

Della Mea and Mizzaro see traditional recall and precision measures as hopelessly tied to a binary view of relevance. They suggest that a user relevance weight and a system relevance weight be assigned to each document in the file, and that an average distance measure (ADM) be based upon the average of the differences between each weight pair, subtracted from one to give a zero minimum value. Small differences in system relevance weights can result in large differences in precision and recall, but will not affect the average distance measure. Big differences in system relevance weights may not greatly affect recall and precision but will affect the average distance measure. If a single measure is not desired, a precision-like measure can be constructed by computing the average distance measure only on documents above the ideal system relevance - user relevance line, and a recall-like measure on only those below that line. Using TREC-8 data with a document's rank as its system relevance weight, they tried to provide user relevance both from the binary qrel data and from a continuous modification of the qrels incorporating the best system relevance data. Using Kendall's rank correlation, both ADM measures correlate strongly with each other and with the number of relevant documents among those retrieved, with average precision, and with R-Precision. Calculation on subsets of relevant documents gives results very similar to those on the whole set.
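
The average distance measure described above can be illustrated with a minimal sketch. It assumes that both relevance weights lie between 0 and 1 and that the "difference" for each pair is the absolute difference; the function name and the sample weights are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def average_distance_measure(system: Sequence[float], user: Sequence[float]) -> float:
    """Return 1 minus the mean absolute gap between paired relevance weights.

    Assumes weights in [0, 1], so the result also lies in [0, 1].
    """
    if len(system) != len(user) or not system:
        raise ValueError("need two non-empty sequences of equal length")
    total_gap = sum(abs(s - u) for s, u in zip(system, user))
    return 1.0 - total_gap / len(system)


# Hypothetical weights for four documents (illustrative values only).
user_weights = [1.0, 0.5, 0.0, 0.0]
close_system = [0.75, 0.5, 0.25, 0.0]  # system estimates near the user's
far_system = [0.0, 0.5, 1.0, 1.0]      # system estimates far from the user's

adm_close = average_distance_measure(close_system, user_weights)
adm_far = average_distance_measure(far_system, user_weights)
print(adm_close, adm_far)
```

Under these assumptions, a system whose weights track the user's closely scores near one, and a system whose weights are far from the user's scores near zero, with no binary relevance threshold involved.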







Long-term influences of interventions in the normal development of science: China and the Cultural Revolution
Bihui Jin, Ling Li, Ronald Rousseau 
Published Online 3 Feb 2004

Jin, Li, and Rousseau investigate the relationship between age and scientific productivity in China, using the Chinese Science Citation Database to access 185,486 first authors' ages in the 1995 to 2000 period. The proportion of papers authored by people in each age group was calculated for each year. In 1995 the curve is bimodal, with peaks in the late thirties and at about 60, but by 2000 only the late thirties peak remains, and it moves to the right over time. The lack of scientific training between 1966 and 1976 means that the older scientists who would be expected to be productive are not present. Authors born after 1972 show a rapid increase in productivity.











European Research Letter

A report on the first year of the INitiative for the Evaluation of XML retrieval (INEX'02)
Gabriella Kazai, Mounia Lalmas, Norbert Fuhr, Norbert Gövert
Published Online 21 Jan 2004

Kazai, Lalmas, Fuhr, and Gövert provide a review of INEX'02 in our European Research Letter. INEX's purpose is to provide an infrastructure for the evaluation of content-oriented XML retrieval systems which support search by both content and structure, often attempting retrieval of relevant structural components rather than complete documents. Thus success involves identification of the most specific relevant document components that exhaust the topic of the request, in a file environment where structural constraints may be specified and where both topical relevance and, since large components include small components, component coverage of a topic are meaningful. A four-point relevance scale is utilized, as is a four-point coverage scale including no coverage, overly large coverage, overly small coverage, and exact coverage. The collection is made up of the full text of 12,107 documents from 18 IEEE publications from 1995 to 2002, where the average paper contains 1,532 XML nodes with an average depth of 6.9. Sixty queries were chosen, 30 requiring structural query constraints, and were searched by participants who provided 51 runs for assessment. The merged pools were between 300 and 900 papers per query, and were assessed by the submitting teams for 54 queries. Values between 0 and 1 were assigned based upon combined relevance and coverage assessments, and Raghavan's expression for probability of retrieval given relevance.


Brief Communication





Bowling alone together: Academic writing as distributed cognition
Blaise Cronin 
Published Online 17 Dec 2003

While philosophers are not often co-authors, 94% of the philosophers' papers studied from 1999 included an acknowledgment of the intellectual contribution of others to the work. Cronin argues that all scholarship is the result of cognitive interactions with a socio-technical system of people, texts, and other aids and artifacts, and that bibliographies and acknowledgments prove scholarship is an instance of distributed cognition.


Book Review


Democracy and New Media
John P. Renaud
Published Online 26 Jan 2004

