Hyperlink Extraction Improves State of Illinois Website Identification
Larry S. Jackson, Yan Zhang, Jian Wu
ASIS&T Annual Meeting - 2006 (ASIS&T 2006)
Austin, Texas, November 3-9, 2006
Abstract
In state government web archiving, new state websites are continually discovered. The largely manual techniques used to date to locate them are supplemented here with automated hyperlink extraction. The rate of discovering new state websites quadrupled, at the cost of having to evaluate very long lists of hyperlinks. Newly discovered websites tended to be smaller, and were often adjuncts to agencies for which other websites were already known. Beginning website discovery with what is already known or easily found manually, and then following with a more thorough analysis of hyperlinks, seems likely to find the largest percentage of state government websites quickly.
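
The abstract does not say how the hyperlink extraction was implemented. As a rough illustration of the general idea only, the following minimal Python sketch pulls the links out of one page and lists the hosts that are not already known. The class name LinkExtractor, the function candidate_hosts, the sample page, and the ".example" host names are all hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def candidate_hosts(html, base_url, known_hosts):
    """Return the sorted hosts linked from the page that are not in known_hosts."""
    parser = LinkExtractor()
    parser.feed(html)
    hosts = set()
    for href in parser.hrefs:
        # Resolve relative links against the page's own URL.
        parsed = urlparse(urljoin(base_url, href))
        # Keep web links only; mailto:, javascript:, etc. are skipped.
        if parsed.scheme in ("http", "https") and parsed.hostname:
            hosts.add(parsed.hostname)  # .hostname is already lowercased
    return sorted(hosts - known_hosts)


# Hypothetical sample page and known-host list, for illustration only.
page = """
<a href="/about.html">About</a>
<a href="http://www.example-agency.example/programs/">Programs</a>
<a href="http://WWW.Known-Agency.example/">Home</a>
<a href="mailto:info@known-agency.example">Contact</a>
"""
known = {"www.known-agency.example"}
new_hosts = candidate_hosts(page, "http://www.known-agency.example/index.html", known)
print(new_hosts)
```

A list produced this way is only a set of candidates. As the abstract notes, the extracted hyperlinks still have to be evaluated to decide which ones are actually state government websites.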