Bulletin, June/July 2006
What's New?
Selected Abstracts from JASIST
Authors who choose to do so prepare and submit these summaries to the editor of the Bulletin.
From JASIST v. 57 (3)
Elovici, Y., Shapira, B., & Kantor, P.B. (2006). A decision theoretical approach to combining
information filters: Analytical and empirical evaluation, 306-320.
Study and Results:
This paper asks how an information professional can make the best use of multiple
search engines when filtering streams of data. The hypothesis, which is
verified, is that different ways of combining the filters' outputs suit users
with different value schemes; specifically, different logical fusion rules
should be used for different value schemes. The result is verified by
experiments using the TREC collection and two different filtering methods.
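The abstract does not spell out the fusion rules or the value schemes, so the following is only a toy illustration of the idea, not the authors' decision-theoretic model. It assumes two filters that each accept or reject a document, a hypothetical value scheme that awards `gain` for each relevant document delivered and charges `cost` for each non-relevant one, and a small hypothetical labeled sample. It then picks whichever logical fusion rule (AND, OR, or either filter alone) scores best under that scheme.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labeled sample: (filter A accepts, filter B accepts, document is relevant)
Sample = List[Tuple[bool, bool, bool]]
Rule = Callable[[bool, bool], bool]

# Logical fusion rules that turn two binary filter decisions into one delivery decision.
RULES: Dict[str, Rule] = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "A_ONLY": lambda a, b: a,
    "B_ONLY": lambda a, b: b,
}


def rule_value(rule: Rule, sample: Sample, gain: float, cost: float) -> float:
    """Total value of a rule: +gain per relevant document delivered, -cost per non-relevant one.

    Documents that are not delivered contribute nothing (an assumption of this sketch).
    """
    total = 0.0
    for a, b, relevant in sample:
        if rule(a, b):
            total += gain if relevant else -cost
    return total


def best_rule(sample: Sample, gain: float, cost: float) -> str:
    """Name of the fusion rule with the highest total value under the given value scheme."""
    return max(RULES, key=lambda name: rule_value(RULES[name], sample, gain, cost))


sample: Sample = [
    (True, True, True),     # both filters accept a relevant document
    (True, False, True),    # only A accepts a relevant document
    (False, True, True),    # only B accepts a relevant document
    (True, False, False),   # only A accepts a non-relevant document
    (False, True, False),   # only B accepts a non-relevant document
    (False, False, False),  # both reject a non-relevant document
]

print(best_rule(sample, gain=1, cost=5))  # precision-oriented scheme → AND
print(best_rule(sample, gain=5, cost=1))  # recall-oriented scheme → OR
```

With this sample, a scheme that penalizes false alarms heavily favors AND, while a scheme that rewards each relevant hit heavily favors OR: the same filters call for different fusion rules depending on the user's values.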
What's New?
Careful application of this approach could improve the value of filtering systems
used in support of the financial industry, the intelligence community, and other
areas.
Limitations:
The work is mathematically rigorous, being an extension of Blackwell's Theorem,
but the size of the effect will depend on the specific collection and standing
queries.
Asonuma, A., Fang, Y., & Rousseau, R. (2006). Reflections on the age distribution of Japanese scientists, 342-346.
Study and Results:
The age distribution of Japanese scientists is investigated to determine whether
major non-demographic events such as World War II had an appreciable effect on
its features. Unlike in China, where the effects of the Cultural Revolution are
clearly visible, no such effect was found. It was found, however, that the
baby-boom generation, born after World War II, dominates the scientific landscape.
What’s New?
It is shown that World War II itself had no influence on university enrollments
in
Limitations:
Results of the population census held in the year 2000 were not yet available at
the time of writing.
Sun, A., & Lim, E.-P. (2006). Web-unit based mining of homepage relationships, 394-407.
Study and Results:
We study the problem of mining the relationships among homepages on the same
website. Homepages are usually the main targets for searching and browsing.
Homepages, together with the relationship instances among them, facilitate
semantic-based information retrieval on websites. In this research, we adopt a
classification approach to homepage-relationship mining and investigate the
features to be used. We identify three types of inter-homepage features, namely,
navigation, relative-location, and common-item features. We also propose
deriving for each homepage a set of support pages. A homepage together with its
support pages is known as a Web unit. Our experiments on the WebKB dataset showed
that extracting inter-homepage features from Web units achieves higher
homepage-relationship mining accuracy than using features derived from
individual homepages.
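The abstract names the three feature types but not their exact definitions, so the sketch below uses hypothetical, simplified versions: a navigation feature (does one unit link to the other's homepage?), a relative-location feature (how many leading URL path segments the two homepages share), and a common-item feature (Jaccard overlap of "items", such as names, extracted from the pages). The `Page` and `WebUnit` classes and all URLs are illustrative. The point is only that pooling support pages into a Web unit can surface evidence that a lone homepage lacks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union
from urllib.parse import urlparse


@dataclass
class Page:
    url: str
    links: Set[str] = field(default_factory=set)  # outgoing link URLs
    items: Set[str] = field(default_factory=set)  # hypothetical extracted items (e.g., names)


@dataclass
class WebUnit:
    homepage: Page
    support_pages: List[Page] = field(default_factory=list)

    def all_links(self) -> Set[str]:
        """Union of outgoing links over the homepage and its support pages."""
        out = set(self.homepage.links)
        for page in self.support_pages:
            out |= page.links
        return out

    def all_items(self) -> Set[str]:
        """Union of extracted items over the homepage and its support pages."""
        out = set(self.homepage.items)
        for page in self.support_pages:
            out |= page.items
        return out


def path_segments(url: str) -> List[str]:
    return [seg for seg in urlparse(url).path.split("/") if seg]


def pair_features(u: WebUnit, v: WebUnit) -> Dict[str, Union[bool, int, float]]:
    """Hypothetical inter-homepage features for the ordered pair (u, v)."""
    # Navigation: does either unit link to the other's homepage URL?
    nav_u_to_v = v.homepage.url in u.all_links()
    nav_v_to_u = u.homepage.url in v.all_links()
    # Relative location: number of leading URL path segments the homepages share.
    shared_depth = 0
    for a, b in zip(path_segments(u.homepage.url), path_segments(v.homepage.url)):
        if a != b:
            break
        shared_depth += 1
    # Common items: Jaccard overlap of the items found in the two units.
    iu, iv = u.all_items(), v.all_items()
    union = iu | iv
    jaccard = len(iu & iv) / len(union) if union else 0.0
    return {
        "nav_u_to_v": nav_u_to_v,
        "nav_v_to_u": nav_v_to_u,
        "shared_path_depth": shared_depth,
        "common_item_jaccard": jaccard,
    }


prof_home = Page("http://cs.example.edu/people/smith/", items={"Smith"})
prof_teaching = Page(
    "http://cs.example.edu/people/smith/teaching.html",
    links={"http://cs.example.edu/courses/cs101/"},
    items={"CS101"},
)
course_home = Page("http://cs.example.edu/courses/cs101/", items={"CS101", "Smith"})

print(pair_features(WebUnit(prof_home), WebUnit(course_home)))
print(pair_features(WebUnit(prof_home, [prof_teaching]), WebUnit(course_home)))
```

With the homepage alone, the professor's page shows no link to the course and only half of the items in common; adding the teaching support page yields a navigation link and full item overlap.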
What’s New?
The problem of homepage-relationship mining is formally defined and three types of
inter-homepage features are carefully studied. These features can be derived
from either individual homepages or Web units that contain more complete
information about the homepages.
Limitations:
Experiments were conducted on the WebKB dataset, which is small relative to the
size of the Web today. Experiments on larger datasets and on datasets from
different domains would yield more informative results.
Yoon, Y., Lee, C., & Lee, G. (2006). An effective procedure for constructing a hierarchical text
classification system, 431-442.
Study and Results:
Hierarchical classification can address effectiveness and efficiency problems in
practical text classification tasks. We devised a new evaluation technique for
internal classifiers (nodes) that guarantees the lower classifiers (nodes or
leaves) in the hierarchy more opportunity to classify documents, thereby
improving overall classification performance. In experiments using the
20 Newsgroups and OHSUMED test collections, our method achieved higher
classification accuracy than the other methodologies compared.
What’s New?
Our method is based on the new evaluation scheme for internal classifiers and is
systematic and well defined in its classification procedure. Therefore, it can be
applied effectively to practical classification tasks with very large numbers of
documents and categories. In addition, at the cost of a slight decrease in
accuracy, training time can be reduced dramatically.
Limitations:
Our hierarchical classification system adopts the top-down, level-based approach
to classifying hierarchically and thus cannot be applied to other hierarchical
methods, such as the big-bang approach, which determines classes in a single
classification run.
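For readers unfamiliar with the top-down, level-based approach, the sketch below shows its basic control flow: starting at the root, each internal node chooses one of its children, and the document descends until it reaches a leaf. The keyword-overlap "classifiers" and the category tree are hypothetical stand-ins for trained classifiers, and the sketch does not reproduce the authors' evaluation scheme for internal nodes.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    keywords: Set[str] = field(default_factory=set)  # hypothetical stand-in for a trained model
    children: List["Node"] = field(default_factory=list)


def local_score(node: Node, tokens: Set[str]) -> int:
    """Toy classifier score: how many of the node's keywords appear in the document."""
    return len(node.keywords & tokens)


def classify_top_down(root: Node, text: str) -> List[str]:
    """Descend level by level, making one local decision per level; return the path taken."""
    tokens = set(text.lower().split())
    node, path = root, [root.name]
    while node.children:
        # Only the current node's children compete; ties go to the first child listed.
        node = max(node.children, key=lambda child: local_score(child, tokens))
        path.append(node.name)
    return path


tree = Node("root", children=[
    Node("comp", {"computer", "software", "graphics", "windows"}, [
        Node("comp.graphics", {"graphics", "rendering", "image"}),
        Node("comp.os", {"windows", "kernel", "driver"}),
    ]),
    Node("rec", {"sport", "hockey", "baseball", "game"}, [
        Node("rec.hockey", {"hockey", "puck", "goalie"}),
        Node("rec.baseball", {"baseball", "pitcher", "inning"}),
    ]),
])

print(classify_top_down(tree, "New graphics card renders image fast on Windows"))
# → ['root', 'comp', 'comp.graphics']
print(classify_top_down(tree, "The goalie stopped the puck in the hockey game"))
# → ['root', 'rec', 'rec.hockey']
```

In this greedy sketch, a wrong decision near the root cannot be corrected at lower levels. A big-bang classifier, by contrast, would choose among all leaf categories in a single run.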
From JASIST v. 57 (4)
Shiri, A., & Revie, C. (2006). Query expansion behaviour within a thesaurus-enhanced search
environment: A user-centred evaluation, 462-478.
Study and Results:
The query expansion behavior of end-users interacting with a thesaurus-enhanced
search system on the Web was investigated. Thirty searchers (academic staff and
postgraduates) at a university were recruited to perform search tasks based on
their own genuine information requests. The results indicated that thesauri can
assist end-users in selecting search terms for query formulation and expansion,
in particular by providing new terms and ideas. In 50% of the searches in which
additional terms were suggested from the thesaurus, users stated that they had
not been aware of those terms at the beginning of the search. This was
particularly noticeable among postgraduate students.
What’s New?
The main contribution of the study lies in the finding that academic searchers
representing various levels of subject knowledge can benefit from thesauri for
search term selection and query expansion. The results also have implications
for online user education: online and database-searching courses should
incorporate training on thesaurus-based search options to help users conduct
high-quality searches.
Limitations:
This study employed a commercial retrieval system. Therefore, the search states
defined to represent typical thesaurus-based search stages were restricted to
the features that system offered.
Cheung, C.M.K., & Lee, M.K.O. (2006). Understanding consumer trust in Internet shopping:
A multidisciplinary approach, 479-492.
Study and Results:
A survey of 278 university students in Hong Kong, aged 18-20, found that the
trustworthiness of Internet vendors (competence, integrity, and security
control), the legal framework, and third-party recognition (e.g., TRUSTe,
BBBOnline, and VeriSign) are important factors in building trust in the online
environment.
What’s New?
That trust is a fundamental influence on Internet shopping comes as no surprise.
This study goes further, synthesizing diverse theories of trust into a framework
that has significant explanatory power for trust and offers important insights
into trust-formation strategies. The main lesson is that third-party recognition
is the central element in developing trust in the online environment. Internet
vendors should affiliate with trusted third-party bodies and acquire their seals
of approval to endorse the vendors' security policies. Interestingly, consumers'
propensity to trust plays no role in building trust. Our findings are consistent
with the view of societal culture in which the propensity to trust is lower among
individuals from collectivist cultures
(e.g.,
Limitations:
Relatively homogeneous student samples were used in this study. Only by
replicating the study with different sampling units can we assess whether the
results apply to other Internet users in other cultures.
Shen, X., Li, D., & Shen, C. (2006). Evaluating
Study and Results:
This paper applies correspondence analysis to analyze five aspects of
15 university libraries in
What's New?
Libraries differ from business companies in many ways; however, library websites,
which have no geographic-position advantage, become competing, independent
entities in cyberspace. This evaluation found clear differences among them.
Moreover, this study is the first to apply correspondence analysis in the
library field in a way that contributes to library website construction. We also
found that building
Limitations:
There are limitations in the data provided by Alexa. For example, the Alexa
toolbar presently supports only Internet Explorer, so visits to the websites from
non-IE browsers are underrepresented in its calculations. Also, some other
evaluation criteria, such as authority, are not included in the evaluation
model.
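Correspondence analysis, the technique this study applies, decomposes a two-way table (here, presumably libraries by evaluation aspects) into a few dimensions that place similar rows and columns near each other. A minimal NumPy sketch of the classical computation follows; the input table is hypothetical, not the study's Alexa-based data, and every row and column must have a positive total.

```python
import numpy as np


def correspondence_analysis(table):
    """Classical correspondence analysis of a nonnegative two-way table.

    Returns row principal coordinates, column principal coordinates,
    and the principal inertias (squared singular values).
    """
    N = np.asarray(table, dtype=float)
    n = N.sum()
    P = N / n                      # correspondence matrix
    r = P.sum(axis=1)              # row masses
    c = P.sum(axis=0)              # column masses
    expected = np.outer(r, c)      # masses expected under row-column independence
    # Standardized residuals: (P - r c^T) / sqrt(r c^T)
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]     # D_r^{-1/2} U Sigma
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]  # D_c^{-1/2} V Sigma
    return row_coords, col_coords, sv ** 2


# Hypothetical table: 3 library websites (rows) scored on 3 aspects (columns).
table = [
    [10, 20, 5],
    [15, 5, 25],
    [5, 10, 10],
]
rows, cols, inertia = correspondence_analysis(table)
print(np.round(rows[:, :2], 3))               # first two dimensions for each library
print(np.round(inertia / inertia.sum(), 3))   # share of total inertia per dimension
```

The total inertia (the sum of the returned values) equals the table's chi-square statistic divided by its grand total; the share printed for each dimension shows how much of that row-column association the corresponding axis captures.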
From JASIST v. 57 (6)
Wu, Y.-F.B., Li, Q., Bot, R.S., & Chen, X. (2006). Finding nuggets in documents: A machine
learning approach, 740-752.
Study and Results:
Document keyphrases are highly useful; they can serve as metadata for documents or
be used to develop a glossary. This paper describes a Keyphrase Identification
Program (KIP). The logic of our algorithm is that the more keywords a candidate
phrase contains and the more significant those keywords are, the more likely the
candidate phrase is a keyphrase. KIP maintains a system glossary database of prior
positive samples (human-identified phrases), which are used to assign weights to
the candidate phrases. The evaluation results show that KIP outperforms the
systems we compared it with.
What's New?
Besides KIP’s methodology, this paper introduces two other distinctive features of
KIP: the learning function, which can enrich the system glossary database by
automatically adding newly identified keyphrases, and the personalization
feature, which can help a user build a glossary database tailored specifically to
his/her area of interest.
Limitations:
The coverage of the prior positive samples (human-identified inputs) influences
performance. However, enabling the learning function can rectify deficiencies in
the samples.
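The scoring logic described for KIP (more keywords, and more significant ones, make a candidate more likely to be a keyphrase) can be illustrated with the toy scorer below. It is not KIP's actual weighting. As a hypothetical stand-in, a word's significance is simply the number of glossary phrases containing it, a candidate's score is the sum of its words' significances, and `learn` mimics the learning function by adding candidates that clear a hypothetical `threshold` to the glossary.

```python
from collections import Counter
from typing import Iterable, List, Set


class ToyKeyphraseScorer:
    """Glossary-weighted keyphrase scoring in the spirit of the KIP abstract (hypothetical)."""

    def __init__(self, glossary: Iterable[str]):
        self.glossary: Set[str] = set()
        self.word_weights: Counter = Counter()
        for phrase in glossary:
            self.add_to_glossary(phrase)

    def add_to_glossary(self, phrase: str) -> None:
        phrase = phrase.lower()
        if phrase not in self.glossary:
            self.glossary.add(phrase)
            # Weight of a word = number of distinct glossary phrases that contain it.
            self.word_weights.update(set(phrase.split()))

    def score(self, candidate: str) -> int:
        # Summing means more glossary words, and more heavily weighted ones, raise the score.
        return sum(self.word_weights[w] for w in candidate.lower().split())

    def rank(self, candidates: Iterable[str]) -> List[str]:
        return sorted(candidates, key=self.score, reverse=True)

    def learn(self, candidates: Iterable[str], threshold: int) -> List[str]:
        """Add candidates scoring at least `threshold` to the glossary; return them."""
        accepted = [c for c in candidates if self.score(c) >= threshold]
        for c in accepted:
            self.add_to_glossary(c)
        return accepted


scorer = ToyKeyphraseScorer(
    ["information retrieval", "query expansion", "information filtering", "text classification"]
)
candidates = ["information retrieval system", "retrieval", "annual report", "text classification"]
print(scorer.rank(candidates))
# → ['information retrieval system', 'text classification', 'retrieval', 'annual report']
print(scorer.learn(candidates, threshold=2))
# → ['information retrieval system', 'text classification']
```

After `learn` runs, words from newly accepted phrases (here "system") gain weight, which is one simple way a learning function could fill gaps in the original glossary.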
Chen, L., Zeng, J., & Tokuda, N. (2006). A "stereo" document representation for textual
information retrieval, 768-774.
Study and Results:
Human experience shows that stereo audio and video have always been beneficial in
acoustic and visual recognition, which leads us to believe that perceiving objects
from two or more perspectives is always an advantage. The purpose of this article
is to determine whether "stereo" views of a textual object (i.e., a document) are
helpful for information retrieval. Experiments on two standard corpora showed
that both the standard term-vector method and the latent semantic indexing
method achieve significant improvements when the stereo representation model is
adopted.
What’s New?
By analogy with stereo audio and video, the concept of "stereo" document
representation is proposed. Although the concept is applied only in limited
experiments in this article, we expect it to be applicable as a general principle
to many textual retrieval approaches.
Limitations:
Experiments on large corpora will be needed to further verify the observed
improvements. For an arbitrary corpus, it is not yet clear what general condition
determines the optimal overlapping rate for multiple "stereo" views of each
document.
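The abstract does not say how the "stereo" views are built, so the sketch below is only one hypothetical reading, suggested by the mention of an overlapping rate. It splits a document's tokens into two views that overlap by roughly a fraction `overlap`, builds a term-frequency vector for each, and averages the query's cosine similarity with the two views. The view construction, the averaging, and the parameter name are assumptions, not the authors' model.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def stereo_views(tokens: Sequence[str], overlap: float) -> Tuple[List[str], List[str]]:
    """Split tokens into a leading and a trailing view sharing about `overlap` of the text."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    n = len(tokens)
    view_len = math.ceil(n * (1 + overlap) / 2)  # two views of this length overlap by 2*view_len - n
    return list(tokens[:view_len]), list(tokens[n - view_len:])


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def stereo_score(query: Sequence[str], doc: Sequence[str], overlap: float) -> float:
    """Average cosine similarity between the query and each of the two views."""
    q = Counter(query)
    views = stereo_views(doc, overlap)
    return sum(cosine(q, Counter(view)) for view in views) / len(views)


doc = "stereo views of a document may help retrieval of the document".split()
print(stereo_views(doc, 0.0))
print(round(stereo_score(["retrieval", "document"], doc, overlap=0.2), 3))
```

Setting `overlap=1.0` makes both views the whole document, so the score reduces to ordinary whole-document cosine similarity; choosing a good value in between is the open question the Limitations paragraph raises.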