Bulletin of the American Society for Information Science and Technology, Vol. 29, No. 1, October/November 2002




Re-Engineering the Immigration System: A Case for Data Mining and Information Assurance to Enhance Homeland Security

Part II: Where Do We Go from Here?

by Lee S. Strickland and Jennifer Willard

Lee S. Strickland, visiting professor, College of Information Studies, University of Maryland, College Park, MD 20742; phone: 301-314-4342; e-mail: lesss@ucia.gov

Jennifer Willard can be reached at jennoelle@smith.alumnae.net

In the first article in this two-part series, we considered the current environment of immigration information and the issues it raises. In this concluding piece, we turn first to current initiatives and then to some additional proposals for effective improvements that could be quickly implemented.

A Review of New and Current Initiatives

A number of executive branch and legislative initiatives are in place or are in process to improve the status quo. The primary accomplishments to date include a series of new INS regulations, several initiatives from the Attorney General, and the recently enacted Border Security Act.

New INS Regulations. Issued on April 8, 2002, these regulations address a range of issues but not the core problem: Is admission documentably in the national security interests of the United States? A number of the provisions reduce the period of admission for various classes of visa; others provide that if an alien refuses to surrender after a final order of deportation, that person may be barred from obtaining future immigration benefits such as permanent resident status or citizenship. Neither of these provisions will deter or impede terrorists. Only one change may improve security, by minimizing flagrant student visa abuse. Effective immediately, aliens who want to study here must first obtain a student visa. Previously an alien with any class of visa could arrive in the United States, enroll in a school and then apply for and receive an automatic student visa.

The Attorney General's Initiatives. Three days after the INS announcement, Attorney General Ashcroft directed the law enforcement and intelligence agencies within the Department of Justice to improve their data sharing. But while the directive calls for the development of a Web-based system for the secure sharing of data with state and local governments, it limits that sharing to unclassified data. Moreover, it speaks to policies and guidelines but not to deadlines and mandates; as such, the order is expected to have minimal operational impact in the short term.

More recently, on June 5, the AG also announced a limited registration plan for certain high-risk non-immigrant visa holders on arrival and at intervals thereafter. While the plan is an incremental step in improving our knowledge base, it has critical shortcomings: there is no proposed technology to facilitate collection and integration (for example, smart cards), and there are no proposed institutional changes within INS to utilize the new data.

The Border Security Act (S. 1749). Officially termed the Enhanced Border Security and Visa Entry Reform Act, and signed into law by President Bush on May 14, 2002, this act addresses a number of the weaknesses in our immigration and visa processes that resulted in the legal entry of the September 11th terrorists.

    1. It recognizes that INS and the State Department require better and more comprehensive access to intelligence and criminal information and that there are bureaucratic and technology impediments and, as such, it directs a review of the specific information requirements and authorizes funds for technology upgrades. It also directs several other studies to determine the feasibility of (a) developing an interoperable database with Visa Waiver Program countries so that information collected by those countries may be made available in real time to INS, Customs and State Department, (b) pre-clearing as well as pre-inspection of foreign passengers by U.S. officials at the point of foreign departure and (c) synchronizing the immigration systems of the United States, Canada and Mexico given the realities of the open border and the importance of free trade and movement.

    2. It creates new layers of security in the immigration process by requiring extra scrutiny for visa applicants from countries designated as state sponsors of terrorism; requiring extra training for consular officers in screening for security threats; requiring that airlines transmit passenger and crew lists to U.S. authorities prior to arrival in order to permit more thorough data checks, and eliminating the now-current 45-minute clearance rule for foreign arrivals.

    3. It begins to establish a requirement for secure identification documents with an October 2003 deadline for State Department to issue machine-readable visas with biometric identifiers, for the Justice Department to deploy readers and for Visa Waiver countries to issue machine-readable passports with biometric identifiers. It is mandated that any such system would allow the collection of the same biometric data at the point of entry and comparison with the encoded biometric data on the travel document.

    4. It makes some improvements in the ability of the INS to monitor (but not track) aliens in the United States. Once funded and implemented, it will require confirmation that a student has been accepted by an approved U.S. school before a visa is issued and will require the school to notify the INS of non-enrollment; similarly, it will require that a correlated record be maintained of each foreign national who enters and leaves the United States.

However, there are many important things that the act does not do. First, it does not change the paradigm by which visas are granted: the process will still look only to the existence of negative information indicating links to international terrorism, and any individual not specifically identified as a terrorist in government databases will be granted entry. Second, it does not conclusively address the issue of identity fraud. And third, it does not require self-disclosure or otherwise allow the government to know where an alien is or even intends to be; as such, subsequently developed information as to terrorist relationships or evidence of non-compliance with visa terms cannot be acted upon.

It also does not address a range of information access issues, including authorizing State Department Consular Affairs access to the FBI's NCIC database, to foreign government data or to commercial credit data. This is of significant concern in that such databases can provide a wealth of wide-ranging positive information compiled over time and by processes that would be difficult to falsify. In addition, such access would make feasible the use of commercial information technology solutions for data mining to develop knowledge in order to defeat terrorism. For example, Accenture has proposed the use of its profiling system and data for airport security. Although federal privacy laws would likely require amendment to permit government access to commercial information, the objective would be to use commercial information and commercial predictive software to develop a "threat index" for each passenger. Indeed, it is possible that such a system could be improved by incorporating government data, although privacy concerns would be multiplied. In sum, there are two salient points: first, our information problems are not intractable, and very similar issues have been addressed in the commercial environment; second, cost and time demand that we build on existing solutions.

Ninety Days to an Effective Solution

In the previous article we reviewed the many problems with the current system and some of the barriers to improving it. One is certainly the inclusion of the unstructured information that is so critical to intelligence functions and the development of evaluated knowledge (see box: An Important Note About Electronic Unstructured Databases). However, what could happen in the short term is an information revolution, indeed a knowledge revolution, that arrives much like today's Internet.

If we think back just a few years and consider any knowledge activity that may be of interest to the average citizen, there has indeed been a revolution: investing in the stock market, genealogy research or simply communication with each other has fundamentally changed. We submit that the single most successful knowledge management system is the public Internet combined with search engines with knowledge-based enhancements like Google (see box): decentralized input without the constraints of data standards, and inherently flexible use and knowledge development by millions of users.

Indeed, the government has pursued a little-noticed initiative based on this example. Intelink is a classified model of the public Internet for the ad hoc sharing of general intelligence-related information. What could and must be done is the implementation of a new Internet-like system for counter-terrorism information that would allow all concerned federal and local government agencies to web-publish relevant information in the particular format of their choice, and would permit all users to data mine as their missions demand. We would not be concerned with integrating data among legacy systems and the lengthy and expensive planning and development effort that such integration requires. We would be able to protect yet share data since the publishing responsibility would be decentralized and flexible (e.g., a law enforcement agency could publish only a selective extract from a sensitive investigation that might simply alert users to certain names of interest and protect even the fact of an on-going investigation). We would ensure the fastest possible dissemination because the publication would be decentralized, and we would have an essential data warehouse and a data-mining tool (Google) in place in the briefest possible time.

But more would also be required given that the keys to solving the terrorist dilemma are not only data sharing and data mining, but also data analysis and, hence, knowledge development. These latter steps are a human-centric business that can, however, benefit from effective IT tools, provided that we recognize that such steps are complex: the relevant data is largely not structured, and the task of intelligence analysis will require more than traditional data integration efforts that focus on structured data. This is why watch lists have proven to be so ineffectual. Even if the sharing of specific names of known terrorists is perfect, the nuanced information that can lead to the development of detailed knowledge on terrorism and terrorists is lacking. Thus our new information-sharing environment must be teamed with human analysts as well as analytical data mining tools (as done in the commercial credit environment) if we are to develop the most effective knowledge base for visa decision-making. Our objective would be a knowledge base that could support a range of required activities, from developing effective questions to be posed to validate the identity of individuals to identifying (by profile comparison) those applicants who pose a potential security risk.

Are there developing commercial models for the integrated data mining and analysis system that we propose? The answer is "yes," as evidenced by the Regulatory DataCorp International LLC formed earlier this year by a number of financial services companies to screen new and existing customers and identify those who pose a risk of fraud or may otherwise be involved in criminal or terrorist activity. While many companies maintain demographic and transaction records about individuals, this effort will be substantively different in that such records will be integrated with those of like companies and analyzed for activities suggesting criminal activity.

What knowledge could such a system develop? From commercial and governmental information, a "reliability score" conceptually similar to the various proprietary consumer credit scores now in use could be computed. From credit bureau data, there might be a greater focus on the duration and extent of community relationships rather than credit worthiness per se. Such factors might include the length of time at, and number of, documented residences cross-referenced as to general validity, known intelligence links and even community demographics. Other factors could include the number and details of records for relatives and references. Access to educational records would permit validation of assertions on the visa application as well as consistency with credit and other documentary information. National insurance numbers could be validated, and data records as to earnings and claims in those national insurance systems could be cross-checked for consistency with other developed or asserted data. Other identity documents (e.g., passports and identity cards) could also be validated to eliminate the use of counterfeit and stolen blanks; the identity should also require clearance from the police and security services of the home country. These are merely examples, but the intent of the system would be to identify as many indicia of lawful community presence and permanence as possible.
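A minimal sketch of how such a score might be computed follows. The field names, caps and weights are hypothetical and chosen only for illustration; no agency or vendor formula is implied, and any real system would need validated factors and human review of the result.

```python
from dataclasses import dataclass


@dataclass
class ApplicantRecord:
    """Hypothetical indicators drawn from the factors discussed above."""
    years_at_documented_residences: float
    verified_references: int
    education_claims_verified: bool
    insurance_records_consistent: bool
    identity_document_validated: bool


# Illustrative weights (assumed, not from any real scoring model); they sum to 100.
WEIGHTS = {
    "residence": 30,
    "references": 20,
    "education": 15,
    "insurance": 15,
    "identity": 20,
}


def reliability_score(record: ApplicantRecord) -> float:
    """Combine the indicators into a 0-100 score; higher means more corroboration."""
    indicators = {
        # Cap each open-ended factor so that no single one dominates the score.
        "residence": min(max(record.years_at_documented_residences, 0.0), 10.0) / 10.0,
        "references": min(max(record.verified_references, 0), 5) / 5.0,
        "education": 1.0 if record.education_claims_verified else 0.0,
        "insurance": 1.0 if record.insurance_records_consistent else 0.0,
        "identity": 1.0 if record.identity_document_validated else 0.0,
    }
    score = sum(WEIGHTS[name] * value for name, value in indicators.items())
    return round(score, 1)


applicant = ApplicantRecord(
    years_at_documented_residences=5,
    verified_references=0,
    education_claims_verified=False,
    insurance_records_consistent=False,
    identity_document_validated=True,
)
print(reliability_score(applicant))
```

A low score in this sketch would signal only that little corroborating information was found; it is a prompt for additional scrutiny by an analyst, not a determination in itself.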

Is it possible that terrorists could develop backgrounds that would pass muster? Certainly, just as the former USSR did when their intelligence service inserted "sleeper agents" (also known as "illegals") into this country before and during the Cold War, but there is great difficulty and cost in so doing.

Are there privacy issues? Yes, but just as in the commercial world, there would be consent by virtue of the voluntary application for a U.S. visa. Could predictive or profiling software improve the decision-making? Certainly yes; although the number of terrorists is small and thus complicates profile development, there are indicators that would warrant additional scrutiny, for example, the existence of prior transit visas to high-risk countries or the submission of a reissued passport.

Concerns. Certainly there are concerns to be addressed by our knowledge-base proposal for immigration management, but they do not include information and information technology issues. One is required personnel resources, given the need for an increase in information analysts by user agencies at the federal as well as state and local level. Users would not generally query the system as they do NCIC for a name check and receive a definitive response, since this is not the nature of the terrorist threat or of the unstructured relevant information. Users would interact with the system in an analytical capacity, but this is neither difficult nor unduly costly. Indeed, our interviews with senior officials in state and local governments have indicated an enthusiasm for such an approach and an understanding that the nature of terrorism is different from that of traditional crime. It requires a proactive approach to sharing that focuses on the broadest possible exchange of information that is relevant to the subject of terrorism in general and not to specific terrorism cases. And it eliminates a subtle problem: even where there is good sharing, that sharing is generally case specific, and local police simply don't know what they don't know.

Another is the need for an enhanced identification system. We have discussed in some depth the current identification problems, from fraud in the issuance to the inability to identify visa holders subsequently. The mandate of the new Border Security Act for secure identification documents by October 2003 (i.e., machine-readable visas with biometric identifiers and INS readers at points of entry) is a positive step but worrisome in its lack of specificity and absence of technical coordination with state and local authorities. New electronic visa documents that cannot be validated by our first line of defense (local police and airport security forces) are simply not an optimum solution. But there are answers here also. A number of American airlines have initiated efforts to deploy a "trusted" passenger identification system that would be based on background validation and include digital biometric identification that could be read and re-validated at desired checkpoints. According to various media reports, the federal government has had different responses: the Transportation Security Administration is negative given the abstract potential for terrorists' subterfuge, while Governor Ridge and the Office of Homeland Security have shown support. Clearly, such a solution would present a vast improvement over the present system that is inexorably inefficient, without standards and makes little or no use of information.

As with any solution to any problem, there are issues to be resolved, including who would conduct the background validation and challenges from those concerned with civil liberties. To some degree, the privacy issue is resolved by recognizing that this is a voluntary, contractual matter between a service provider (the airline) and a service user (the passenger). More specifically, however, it would be a relatively direct matter to establish federal rules on the use, maintenance and dissemination of such information, perhaps patterned on the federal Privacy Act of 1974, which concerns only information maintained by the federal government. The critical point, however, is that the current system relies on highly intrusive physical searches of the body while ignoring a vast resource of information that would permit better decision making.

Our challenge is that we must ensure that the broadest array of information, from definitive structured data to lead-type (i.e., intelligence) information, is made available so that it can be exploited by elements of our defense. Today in the world of business, we would not likely hire employees or enter into partnerships if we knew only that the person was not a convicted felon; rather we require and analyze a range of data to make a knowledge-based decision. But the immigration system relies on the simple check that a person is not a known terrorist. The challenge to the federal government to change the decision-making process and invest in realistic knowledge development is clear.

For Further Reading

The following are a few of the many resources available that address the issues discussed in this paper and that are recommended as representative and informative.

"A Single National Security Database," Larry Ellison, CEO Oracle, Inc., The International Herald Tribune, January 31, 2002 (available on the Internet at www.iht.com/articles/46455.html).

"Backgrounder: The Enhanced Border Security and Visa Entry Reform Act," National Immigration Forum, Washington, DC 2002 (available on the Internet at www.immigrationforum.org/currentissues/articles/bordersecurity.pdf).

"Knowledge Related to a Purpose: Data-Mining to Detect Terrorism," William Stanley Hawthorne, PriceWaterhouseCoopers LLP, presented at Syracuse University, January 18, 2002 (available on the Internet at www.maxwell.syr.edu/campbell/Governance_Symposium/hawthorne.pdf).

"Terrorism: Automated Lookout Systems and Border Security Options and Issues," Congressional Research Service, Library of Congress, RL 31019, June 18, 2001 (available on the Internet as reprint at www.fas.org/irp/crs/RL31019.pdf).

For continuing coverage of the many issues presented by the war on terrorism, especially in the context of homeland security, we highly recommend the Journal of Homeland Security, published by ANSER, a nonprofit, public sector research institute. It may be found on the Internet at www.homelandsecurity.org/journal/.

For detailed discussions of the concept of NetWAR, the asymmetric conflict of today between the United States and the terrorist networks, see www.rand.org.

An Important Note About Electronic Unstructured Databases

Many discussions in the media relating to data mining in this context tend to focus on structured databases: sharing and mining for knowledge much as occurs in the commercial credit world. As we discuss in this article, however, much relevant information in the federal government is contained in unstructured, intelligence-type records, for example, e-mail systems, case files or personality dossiers. And the sharing of such information (e.g., the e-mail documenting the findings and hypotheses of the Phoenix FBI office) is just as important as the sharing of the structured data (e.g., a watch list compiled by a federal agency). But the effective sharing of this unstructured information presupposes that there is the ability to identify and access such documents subsequent to creation. It is this failing that led to the de facto disappearance most recently of the Phoenix memo as well as the disappearance of dozens of documents relevant to the Timothy McVeigh prosecution and the resultant stay in the imposition of his sentence.

What is required is the adoption of electronic record keeping systems (ERKS) that assure the capture, maintenance and subsequent access of unstructured documentary material. Whether that access is through metadata or full text search is not critical; what is critical is that each agency have in place a system and process that ensures the effective management of all electronic documentary materials in accordance with National Archives-approved schedules. As information professionals often remark, "Space on a disk drive is not a record system."
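The two access paths mentioned above, metadata and full text, can be illustrated with a very small sketch. The class name, method names and sample records below are hypothetical and exist only for this example; an actual ERKS would add retention schedules, access controls and audit trails that are omitted here.

```python
import re
from collections import defaultdict


class RecordStore:
    """Toy record store: every captured document is findable by text or metadata."""

    def __init__(self):
        self._records = {}               # record_id -> {"text": ..., "metadata": ...}
        self._index = defaultdict(set)   # word -> set of record_ids containing it

    @staticmethod
    def _tokens(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def capture(self, record_id, text, **metadata):
        """Store the document and index every word at the moment of capture."""
        self._records[record_id] = {"text": text, "metadata": metadata}
        for token in set(self._tokens(text)):
            self._index[token].add(record_id)

    def search_text(self, query):
        """Return ids of records containing every word in the query."""
        tokens = self._tokens(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._index.get(t, set()) for t in tokens))
        return sorted(hits)

    def search_metadata(self, **criteria):
        """Return ids of records whose metadata matches all given fields."""
        return sorted(
            record_id
            for record_id, record in self._records.items()
            if all(record["metadata"].get(k) == v for k, v in criteria.items())
        )


store = RecordStore()
store.capture("memo-001", "Field memo on unusual flight school enrollments",
              office="office-a", year=2001)
store.capture("memo-002", "Quarterly summary of school visa applications",
              office="office-b", year=2001)
print(store.search_text("flight school"))
print(store.search_metadata(office="office-b"))
```

The point of the sketch is that indexing happens when the record is captured, so a document cannot be filed without also becoming retrievable later.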


Google is an effective knowledge management system, in our judgment, because it mimics to a substantial degree the manner by which humans validate printed and other information in the real world. Using an approach known as PageRank, the Google system evaluates, first, the link structure as an indicator of an individual page's value (the more links to a given page, the higher the importance of that page, weighted by the importance of the linking pages) and, second, the content of a given page through a sophisticated text-matching algorithm. In our scenario, the knowledge value of a terrorism watch system would grow as users accessed data and inserted links to specific documents deemed of high value to them in their law enforcement, immigration or intelligence work.
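The link-weighting idea can be sketched in a few lines. What follows is a simplified, textbook-style iteration of PageRank, not Google's production algorithm; the damping factor of 0.85, the iteration count and the sample document names are assumptions made for the example.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Estimate the importance of each document from the link structure.

    `links` maps a document name to the list of documents it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with equal importance for every document.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every document receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A document passes its importance on to the documents it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A document with no outgoing links spreads its weight evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical documents: two memos cite the same report; the report cites memo_a.
sample_links = {
    "memo_a": ["report_c"],
    "memo_b": ["report_c"],
    "report_c": ["memo_a"],
}
for name, score in sorted(page_rank(sample_links).items(), key=lambda item: -item[1]):
    print(name, round(score, 3))
```

In this toy example the report cited by both memos ranks highest, and the memo that it cites in turn ranks above the memo that nothing cites, which is the behavior the watch-system scenario relies on.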


American Society for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810

Copyright © 2002, American Society for Information Science and Technology