The Ecological Approach to Text Visualization
James A. Wise
The Spatial Paradigm for Information Retrieval and Exploration (SPIRE) converts digitized text documents into vector-space
representations: 280-element vectors whose elements are produced by a neural net trained on the domain of the documents. The vectors are clustered with a similarity measure and projected onto a
two-dimensional plane using a modification of multidimensional scaling that uses document-to-centroid distances rather than pairwise document distances. The visualization shows the recurrence of a concept as a height
on a projection that resembles a terrain map.
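The idea of positioning documents from their distances to cluster centroids, rather than from all pairwise distances, can be illustrated with a minimal Python sketch. It is not the SPIRE algorithm: the inverse-distance weighting, the function names, and the assumption that the centroids already have two-dimensional coordinates are all hypothetical choices made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def place_documents(doc_vectors, centroids, centroid_xy, eps=1e-9):
    """Place each document on the plane from its distances to the centroids.

    Assumption: a document's position is the inverse-distance-weighted
    average of the centroids' 2-D positions. This stands in for the
    modified multidimensional scaling step and is NOT the SPIRE method.
    """
    positions = []
    for vec in doc_vectors:
        # A closer centroid gets a larger weight; eps avoids division by zero.
        weights = [1.0 / (euclidean(vec, c) + eps) for c in centroids]
        total = sum(weights)
        x = sum(w * cx for w, (cx, _) in zip(weights, centroid_xy)) / total
        y = sum(w * cy for w, (_, cy) in zip(weights, centroid_xy)) / total
        positions.append((x, y))
    return positions


# Toy data: two cluster centroids in a 3-element space (not 280 elements).
toy_centroids = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
toy_centroid_xy = [(0.0, 0.0), (10.0, 0.0)]
toy_docs = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_positions = place_documents(toy_docs, toy_centroids, toy_centroid_xy)
```

One practical motivation for a centroid-based layout is cost: with n documents and k clusters it needs n × k distance computations, whereas pairwise distances need n(n − 1)/2.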
A Hybrid Method for Abstracting Newspaper Articles
James Liu, Yan Wu, and Lina Zhou
Liu, Wu, and Zhou begin their abstract extraction from
Chinese text by comparing character pairs with user-chosen keywords for exact, partial, or variable character matches. The frequency of each word is compared with a standard word-frequency table, and nouns and verbs whose
frequency is at variance with the standard are extracted. High-variance words are used to select sentences until the required length of text is extracted. In a combined method, matching is supplemented by weighted
extraction. An additional level uses parts of speech, pronoun referents, and syntactic rules as well as syntactic markers explicit in Chinese text.
Thirty-five users were surveyed, and 60% found keyword and percentage
extraction to be useful. The extraction of summaries was not well received.
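The frequency-based selection step described above can be sketched roughly as follows. This is an illustrative Python outline, not the authors' implementation: it assumes pre-segmented sentences (lists of words), omits the noun/verb filter, treats words missing from the standard table as having frequency zero, and uses a hypothetical `threshold` cutoff and a `max_words` length budget.

```python
from collections import Counter


def salient_words(sentences, standard_freq, threshold=0.1):
    """Words whose relative frequency in the document differs from the
    standard table by more than `threshold` (a hypothetical cutoff)."""
    counts = Counter(word for sentence in sentences for word in sentence)
    total = sum(counts.values())
    return {
        word
        for word, count in counts.items()
        if abs(count / total - standard_freq.get(word, 0.0)) > threshold
    }


def extract(sentences, standard_freq, max_words, threshold=0.1):
    """Select sentences rich in salient words within a word budget."""
    salient = salient_words(sentences, standard_freq, threshold)
    scores = [sum(word in salient for word in s) for s in sentences]
    # Visit sentences from highest to lowest score (ties keep document order).
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, length = [], 0
    for i in ranked:
        if scores[i] == 0:
            break  # remaining sentences contain no salient words
        if length + len(sentences[i]) > max_words:
            continue  # skip a sentence that would exceed the budget
        chosen.append(i)
        length += len(sentences[i])
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Returning the sentences in document order, rather than score order, is a design choice made here so that the extract reads as continuous text.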
Formal Features of Cyberspace: Relationships between Web Page Complexity and Site Traffic
Erik P. Bucy, Annie Lang, Robert F. Potter, and Maria Elizabeth Grabe
From a sample of 5,000 Web sites top-ranked by hits according to 100hot's InSite Pro service, Bucy, Lang, Potter, and Grabe randomly selected
500 sites. Of these, 496 home pages were coded to reflect domain name, rank, and average number of page views over six weeks, and the banner, body, and advertisements were analyzed for features and links. Banners occurred on 75% of
sites and were most commonly white. One-fifth featured movement. Home pages averaged 2.4 screens in length, and 79% used one or more frames. The dominant background color was white. A graphical element occurred on 95% of
the pages, with a logo being the most common. Movement was present in about one-third of the pages. Asynchronous elements (links, surveys, contact information) occurred in 98.9% of pages, with an average of 27.1 such
elements per page. Just 15.9% used real-time interactive elements, such as audio or video links or chat rooms (which were the most common of these). Over half the pages exhibited advertisements of some kind, but fewer
than one-third of these had dynamic features.
For commercial sites, high visitation correlates with high graphics use and less strongly with asynchronous interactive elements. For noncommercial sites, there is a strong
correlation between visits and asynchronous interactive elements. Real-time interactive elements are rare. Advertising is prominent, but pages are not generally over-designed.
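A correlation of this kind, between a coded page feature and site traffic, could be computed along the following lines. The summary does not say which statistic the authors used, so the Pearson coefficient is an assumption here, and the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumption: Pearson's r is used only for illustration; the summary
    does not name the correlation measure.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)
```

In use, `xs` would hold a per-site feature count (for example, the number of asynchronous elements on the home page) and `ys` the average page views for the same sites, in the same order. The function raises a `ZeroDivisionError` if either sequence is constant.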