Hyperlink Extraction Improves State of Illinois Website Identification

Larry S. Jackson, Yan Zhang, Jian Wu

ASIS&T Annual Meeting - 2006 (ASIS&T 2006)
Austin, Texas, November 3-9, 2006


In state government web archiving, new state websites are continually being discovered. The largely manual techniques used to date to locate them are supplemented here with automated hyperlink extraction. The rate of discovering new state websites quadrupled, at the cost of having to evaluate very long lists of hyperlinks. Newly discovered websites tended to be smaller, and were often adjuncts to agencies for which other websites were already known. Beginning the discovery process with what is already known or easily found manually, and following it with a more thorough analysis of hyperlinks, seems likely to find the largest percentage of state government websites quickly.
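
As an illustration only, hyperlink extraction of this general kind might be sketched as follows. This is not the authors' implementation: the names `LinkCollector`, `extract_hosts`, and `candidate_hosts` are hypothetical, and so is the assumption that a set of already-known hostnames is at hand. The sketch collects the hostnames linked from one page and keeps those not yet known, which is the list a person would then have to evaluate.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hosts(html, base_url):
    """Return the set of hostnames linked from one page."""
    collector = LinkCollector()
    collector.feed(html)
    hosts = set()
    for href in collector.hrefs:
        # Resolve relative links against the page's own URL.
        parsed = urlparse(urljoin(base_url, href))
        # Skip non-web links such as mailto: or javascript:.
        if parsed.scheme in ("http", "https") and parsed.hostname:
            hosts.add(parsed.hostname.lower())
    return hosts


def candidate_hosts(html, base_url, known_hosts):
    """Return linked hostnames that are not already known, sorted for review."""
    return sorted(extract_hosts(html, base_url) - set(known_hosts))
```

The output is only a list of candidates. Deciding whether each hostname is in fact a state government website remains a manual evaluation step, which is the cost the abstract describes.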

