Bulletin of the American Society for Information Science and Technology, Vol. 29 No. 1, October/November 2002


Re-Engineering the Immigration System: A Case for Data Mining and Information Assurance to Enhance Homeland Security

Part I: Identifying the Current Problems

by Lee S. Strickland and Jennifer Willard

Lee S. Strickland is visiting professor in the College of Information Studies at the University of Maryland, College Park, MD 20742. He can be reached by phone at 301-314-4342 or by e-mail at lesss@ucia.gov.

Jennifer Willard can be reached at jennoelle@smith.alumnae.net

The most significant improvement in homeland security can be achieved by reducing the threat at its source through a fundamental review of the process by which foreign nationals are admitted to and managed while in this country. All 19 hijackers involved in the attacks on 9/11/01 entered this country legally, and unknown numbers of similar enemy combatants have done or can do the same. No information was identified in the visa process to link the hijackers to international terrorism, nor was there any information (raw, validated or assessed) to establish that their entry into the United States would be consistent with our security interests.

This dual deficit suggests two critical failings. The first, failing to detect their terrorist backgrounds, might be explained in part by the secret and shadowy world of terrorism, the ease with which new identity documents can be procured abroad and the realities of the intelligence business: watch lists will never contain the names of all terrorists. But the second, not requiring positive information, is impossible to justify. Why should the visa process not require the existence of positive information that establishes the identity of the applicant by virtue of a validated history? Why would we not resort to the abundance of extant information that is compiled daily on individuals in our global economy in this information age? In short, why should the process for the issuance of a visa be so much less rigorous than for a VISA credit card?

To compound the problem, the Immigration and Naturalization Service (INS) immediately lost track of the plotters once they cleared U.S. immigration and customs, as it does for all alien visitors. The INS cannot generally facilitate the arrest of alien visitors who violate the terms of their visa or are later determined to pose a security threat, as was the case with some of the 9/11 hijackers.

What is required for effective, preventive homeland security is nothing less than a fundamental re-engineering of the immigration system predicated on the concept of information superiority and achieved through effective data mining and information assurance processes. This equates to fundamental changes in the involved data, processes and tools. A visa must not be issued unless there is sufficient, analyzed positive information to establish that entry is in our national security interest. We also propose a rapid prototyping effort to web-publish and analyze the requisite range of terrorism-related information (federal and state government, commercial and foreign government data) in a security-tiered environment that would avoid the data integration problems involved with numerous legacy systems and the security clearance problems that have plagued data sharing efforts in the past.

These proposals are neither revolutionary nor expensive, but aggressive data sharing as well as cost-effective IT solutions leveraged from the commercial environment have little political constituency. It is neither information classification nor IT that is the roadblock; rather it is the bureaucratic power that flows from the possession and control of information. The system can be re-engineered, but it will require the assistance of the commercial world and willingness inside the government to accept that guidance and vision.

This article will examine the current immigration system and its shortcomings. In Part II: Where Do We Go from Here? (immediately following this article) we examine recent legislative and regulatory changes and our proposals for improving the system.

The Current Operating Environment

Procedures for processing non-immigrant visas vary among consular posts. Some work on a first-come, first-served basis, others require applicants to make appointments in advance for the application process, and still others rely on businesses and travel agencies to act as intermediaries between applicants and consular offices. In general, however, the basic elements remain the same. First, an applicant must complete an application. The non-immigrant visa application asks for basic information, such as name, place and date of birth, passport number, address and occupation information, as well as more detailed questions such as "purpose of trip," "who will pay for the trip," and "have you ever been issued a U.S. visa." A photograph is required, as is a current passport. Furthermore, a consular official must interview applicants unless the interview is waived. Perhaps the most critical part of the processing involves the obligation of consular officials to perform basic background checks (or request additional information from the applicants) in order to confirm the applicants' bona fides, a requirement imposed by Congress after a visa was issued to Sheik Rahman, a known terrorist, who subsequently masterminded and was convicted of the 1993 bombing of the WTC garage.

What the media generally call the watch list is actually a collection of databases from various federal agencies containing a variety of information and presenting a number of critical issues. Each database has a significantly different focus reflecting agency-specific interests, each has different standards for inclusion even of similar information, and there is limited integration. Where data sharing exists, it is often partial. But certainly the critical issue is that these systems are name-based: they identify "wanted" or "excludable" people; they do not contain more general intelligence information on people, places and things that permits analysis and judgment. We discuss the primary databases below.

State Department. CLASS (Consular Lookout and Support System) is the primary watch list and is a name-based system intended for use by consular officials overseas as part of the visa process. CLASS includes records of approximately 5.7 million individuals who "have been or should be denied U.S. visas." Most of the data originates with the visa application process, although some comes from INS and the Drug Enforcement Administration (DEA). Some limited information on stolen blank foreign passports is included, but only since 1999. CLASS also includes records generated by TIPOFF, an intelligence-related system containing the names of suspected terrorists collected and managed by the Bureau of Intelligence and Research (INR) at the Department of State from intelligence and diplomatic reports. There is limited sharing between CLASS and the Customs Service's IBIS (see below). There is a current effort to initiate limited sharing of NCIC (National Crime Information Center) records maintained by the FBI.

There are two additional systems of note at the State Department. The first is CCD (Consolidated Consular Database), which is also a name-based tracking system and contains records of all visas issued and refused worldwide, including some photographs and biometric data. It also includes records of overseas issuance of U.S. passports and consular reports of birth abroad to U.S. citizens. CCD and CLASS records overlap to some degree; CCD does not interface with any other identification system, but limited CCD data is being shared with INS in a variety of pilot projects. Full INS deployment would allow inspectors to verify that the holder of a visa at a point of entry was indeed the same person who applied for the visa. The other system of note is CLASP, a tracking database that records U.S. passports reported lost or stolen. It is being pilot tested on a limited basis.

U.S. Customs Service. IBIS (Interagency Border Inspection System) is a port-of-entry name-check system on individuals who should be denied entry and includes an interface to APIS (see below), CLASS and NAILS (see below under INS). It integrates data from Customs, the FBI, the DEA and the Animal and Plant Health Inspection Service. It also contains very limited records on stolen passports. IBIS is accessible to some 20 federal agencies in addition to some consular offices, but not to commercial airlines. Recently, limited access to the NCIC has been added, although full NCIC screening is currently done only by the FBI on immigrant visas.

Of additional note at Customs is APIS (Advance Passenger Information System), a new system that analyzes biographical data on individual air passengers (contained in the aircraft flight manifest) prior to their arrival in the United States. Specifically, it permits a single name check against several law enforcement databases, including IBIS and NCIC. Under the Aviation and Transportation Security Act, signed into law by President Bush last November, air carrier participation in APIS is mandatory under a three-phase plan.

Immigration and Naturalization Service. NAILS (National Automated Immigration Lookout System) is the primary INS database and contains records of persons "of interest" to INS and law enforcement agencies, as well as biographical and case data. NAILS records are transferred electronically to the U.S. Customs Service and thus are also available to IBIS users. NAILS also contains the State Department's classified TIPOFF list, some leads from other agencies and data from INS' Central Index System, which maintains data on all aliens who have been assigned alien-numbers or "A-numbers" (identifiers similar to social security numbers for U.S. citizens).

Other INS systems exist and present significant issues. The first is IDENT, an automated fingerprint detection system that is generally used only during secondary inspections and that does not interact in any way with the FBI's fingerprint system. Focus and inclusion are major issues: recent reports indicate that INS actively entered into IDENT fewer than two-thirds of aliens who were apprehended along the U.S.-Mexican border, and the use of the system by INS personnel is inconsistent and ineffective. The second system is NIIS (Non-Immigrant Information System), which was intended to track the arrivals and departures of non-immigrant foreign nationals. According to the Office of the Inspector General, NIIS does not contain reliable data by virtue of missing and/or improperly entered records. Third is RIPS (Record of Intercepted Passenger System), another non-integrated INS system, used on stand-alone PCs at ports of entry to collect information on aliens denied entry into the United States at a particular port. And lastly, there is SEVIS (Student and Exchange Visitor Information System), formerly known as CIPRIS, a planned system, not yet implemented, that will include information on foreign national students studying in the United States.

Federal Bureau of Investigation. The NCIC (the National Crime Information Center) was introduced in 1967, most recently upgraded in 2000, and is perhaps the world's most famous "watch list." It is also a name check-based system with no analytical component. Local and state law enforcement have access to NCIC, and the State Department has also recently been granted some access to FBI criminal history records for use as background checks on visa applicants.

As shown in Figure 1, the path of information sharing through the systems described previously is complex, fragmentary and all-too-often manual. According to recent estimates, there are in excess of 10 terrorist watch lists and more than 50 databases containing terrorist-related information in the federal government. According to the Washington Post, the Office of Homeland Security has spent months identifying such databases.

System Shortcomings

There are substantive and pervasive shortcomings in our immigration system, all relating to the role of information in the process. From a system perspective, the algorithm is fatally flawed in that permission to enter is denied only in the remote event that one is a known terrorist and has been foolish enough to apply in true name. From an information management perspective, relevant data is badly fragmented among various federal agencies while information sharing with foreign governments, as well as state and local governments, is non-existent. From a knowledge management perspective there are few analytical tools to assess structured data and little ability to access and assess non-structured data. And from a case management perspective, there is no oversight of individuals admitted. In sum, effective information assurance as well as knowledge development and application are lacking; and the net result is a system that does not protect the United States from a recognizable and significant threat, which will not be remedied by reorganizations of the INS. We begin our examination with a consideration of the substantive shortcomings in the immigration system from both a systems and information perspective.

The first and most pervasive problem is the flawed decision-making process. Today, visas are granted based on the absence of derogatory information indicating ties to terrorist activities and not on the existence of positive information to confirm identity and to determine that admission would be in the interests of the United States. Our watch list check is simply a check for one specific item of derogatory information. Positive information, on the other hand, would include the same type and extent of data compiled over time that is required, for example, for granting credit as well as the logical validation of that data by software systems to confirm the applicant's true identity and activities.
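The difference between the two decision rules can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than a description of any actual consular system: the `Applicant` type, the `validated_records` count (standing in for independently validated history such as credit or employment records) and the `min_records` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    # Hypothetical: number of independently validated records
    # (credit, employment, travel) that confirm identity over time.
    validated_records: int = 0


def derogatory_only_check(applicant: Applicant, watch_list: set[str]) -> bool:
    """Current model: admit unless the name appears on a watch list."""
    return applicant.name not in watch_list


def positive_information_check(
    applicant: Applicant, watch_list: set[str], min_records: int = 3
) -> bool:
    """Proposed model: no derogatory hit AND enough positive information."""
    if applicant.name in watch_list:
        return False
    return applicant.validated_records >= min_records


watch_list = {"known terrorist"}
fresh_alias = Applicant("fresh alias", validated_records=0)
established = Applicant("long-time traveler", validated_records=5)

# A newly minted identity passes the derogatory-only rule
# but fails the positive-information rule.
alias_old_rule = derogatory_only_check(fresh_alias, watch_list)
alias_new_rule = positive_information_check(fresh_alias, watch_list)
established_new_rule = positive_information_check(established, watch_list)
```

Under these assumptions, an applicant with a new false name clears the first rule simply because nothing is known about him, while the second rule rejects him for exactly that reason.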

Some metrics on the scope of this problem are instructive. There are more than 50 categories of non-immigrant visas, ranging from documents for ships' crews to diplomats. In the year 2000 the Department of State considered some 10 million visa applications at some 200 consular offices. The Department issued 7.1 million visas. In many countries, even where terrorism is a concern, the rejection rate is relatively low; for example, it was only 12% in Saudi Arabia. The fact that all of the 19 hijackers held the most common visas (tourist, student or business) illustrates both the ease of acquiring visas and the current difficulty of identifying the few dangerous individuals among the millions of legitimate foreign arrivals. Moreover, and as a further measure of the threat picture, we must note that an even larger number of arrivals, some 17 million per year, come from 29 countries where no visa is required, such as Canada, England and France.

Beyond establishing a complete information profile on applicants, we must also ensure that specific admissions are in the general national security interests of the United States. This translates to a finding that a visa applicant will not acquire technology, information or skills that could be used against the United States. At an obvious level, it is clear that allowing commercial jet pilot training, not sponsored by a foreign government or airline, is ill-advised. But the problem is much larger given the list of technologies that have been identified by the U.S. government as sensitive to our national security, including nuclear and missile technology, aircraft propulsion and information security. Fortunately, the White House recently announced the formation of a new body, the Interagency Panel on Advanced Science Security (IPASS), that will screen student applicants for such study. According to James Griffin of the White House Office of Science and Technology Policy, this initiative is a critical element of our security.

The second and related problem is failure to achieve the needed levels of information assurance through effective data integration and data mining (Figure 1). Initially this is a problem of data fragmentation and quality within the federal government. Secondarily it is the serious shortcomings in data sharing between the U.S. government and other holders of relevant information, including commercial entities, state and local governments and foreign governments. Lastly it is the absence of analytical tools to allow the most effective data mining of both structured and unstructured information repositories.

In addition to data fragmentation, there is the issue of data completeness and quality. Even when data is entered into computer databases, it is not always possible to cross-check the data. Different alphabets, languages and spelling variations complicate the matter. Since the data is matched letter for letter, it must appear in exactly the same form every time it is entered, or else it will be missed in a cross-check. This becomes especially difficult when the cross-checking is over several non-integrated systems.
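The weakness of letter-for-letter matching can be illustrated with a short sketch that compares it with a more tolerant comparison. This is not how CLASS, IBIS or any other system named here works; the normalization steps, the use of Python's standard `difflib.SequenceMatcher` and the 0.85 threshold are assumptions chosen only for illustration, and the names are invented.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Strip accents, case and punctuation, then sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def exact_match(a: str, b: str) -> bool:
    """Letter-for-letter comparison, as described in the text."""
    return a == b


def tolerant_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Compare normalized names by similarity ratio (threshold is an assumption)."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


on_list = "Mohamed Al-Rashid"      # invented watch list entry
on_form = "Mohammed al Rashid"     # invented spelling on an application

missed_by_exact = not exact_match(on_list, on_form)
caught_by_tolerant = tolerant_match(on_list, on_form)
```

A tolerant comparison trades missed matches for false alarms, so any real threshold would have to be tuned against actual data; the point of the sketch is only that a single extra letter or a hyphen defeats an exact comparison.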

Beyond these issues within the federal government, one of the biggest gaps in our national knowledge base, perhaps even the biggest gap, is the current lack of information sharing between the various federal agencies and the other holders of relevant data, including state and local agencies, foreign governments and commercial entities. The barriers to better information sharing include cultural traditions of agencies keeping tight guards on their own information, legal impediments and, of course, technological incompatibilities with information exchange. By way of example, until very recently, the INS did not share with the FBI's national crime database the names of more than 300,000 illegal immigrants who have been ordered deported but have absconded. State and local governments thus lacked the knowledge necessary either to identify such individuals in the course of general police work or to assist in their apprehension if information should later identify them as posing a terrorist risk. Similarly, the airlines do not automatically know whether an individual from a watch list has purchased an airline ticket.

But the collection of information, however optimized, is not the end state. True data mining includes the ability to analyze that data and to generate statistically and logically valid judgments. Data alone, even complete data, does not drive the real world of commerce; analysis does. This simple observation compels that we apply these commercial practices here.

The third problem is fraud in the issuance process, which has several aspects including most significantly the easy availability of false documentation. Even if an individual's name is on the watch list of terrorists, those lists are rendered useless if that individual can obtain a foreign passport or other foreign identity documents in a new, false name. Between credible forgeries and the theft of millions of blank passports from thousands of embassies around the world, the acquisition of false documentation requires only a few hundred dollars. And, since there is minimal ability to validate the authenticity of a foreign passport in real time, a properly completed but stolen passport provides carte blanche to enter the United States.

In fact, the problem of false documentation is endemic both overseas and in the United States and is very difficult to detect. Entirely counterfeit documents may be detected, although government printers and the counterfeiters are in a technology race, and the majority probably escape detection. Similarly, altered stolen documents may be detected, but many escape notice. However, the most critical problem is the theft of blank official identity documents, including passports, a problem of immense but uncertain proportions. There is little likelihood that U.S. immigration officials will identify such documents because there is only very limited, voluntary reporting by foreign governments and there is little focus on the problem by U.S. officials.

Another way to evade the watch lists with a false identity is to use criminal organizations established for the purpose of defrauding the visa system. They may establish sham companies to justify business travel to the United States or establish religious facilities that are little more than fronts for enabling foreign nationals to enter this country with a religious visa. Still others have been known to bribe or otherwise defraud the education system in order to enter the United States with a student visa. Currently, little is done to track, much less address, these types of fraud.

There can also be corruption in the issuance of visas, and the proliferation of special-purpose visas that require little in the way of documentation other than money is also an avenue for abuse. Finally, there is the 245(i) provision that allows current visa holders, even if they are out-of-status, to pay a $1,000 fee and gain a current, valid status without leaving the United States and reapplying for a visa with a new background check. It is estimated that one million illegal aliens may take advantage of this provision that has been extended to November 2002 by legislation (H.R. 1885).

The critical element to combat fraud is biometric identification. Although a full discussion of this issue is beyond the scope of this paper, it is important to note that there are simple, low-cost but secure solutions.

Our fourth and final problem area is case management from the time an application is made overseas for a non-immigrant visa to the time of presence in the United States pursuant to the visa terms. Some problems can be traced to personnel issues including understaffing, overwork, lack of training and reliance on contractors who approach their tasks as mechanical exercises. There are, for example, over 20 categories of non-immigrant visas, each with separate requirements, and it is all too easy for a consular official to miss a requirement. Others relate to the fact that the determination of an applicant's bona fides is largely dependent upon a consular official's judgment as to the veracity of the applicant's answers to a few interview questions, the applicant's supporting documents and the consular official's own intuition. Since consular offices are usually required to interview many applicants daily, this part of the process may be rushed. At best, this can be a difficult, tedious and exhausting task, not conducive to detailed information analysis.

However, the most critical issue in case management is that there is no mechanism to track foreign nationals who legally enter this country and thus no means to identify those who subsequently violate the terms of their visas or who are subsequently determined to pose a national security threat. How is that possible? The answer is that the I-94 system is non-functional for a majority of arrivals. More specifically, the INS requires the completion of a two-part form (I-94) upon arrival and departure and, although the INS has had automation efforts underway since 1995, the process remains a manual one today (except for a prototype automation effort involving one airline and four airports). I-94 cards are manually distributed, completed, collected and then passed to a contractor for manual data entry.

But the process is much worse than this suggests: collection, completion and entry are so haphazard that no reliability can be attributed to the resulting database. Indeed, it is estimated that the INS collects only about 10% of arrival and departure data on foreign nationals. Moreover, the data collected is only name and address; there is no collection of visa data, biometric data or planned U.S. presence and activities. As a result, the INS is unable even to characterize the overstay populations (countries of origin, workplaces, etc.), much less identify any individual overstay.
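To show what reliable arrival and departure data would make possible, the following sketch flags travelers whose authorized stay has expired with no recorded departure. The `I94Record` fields are hypothetical; as noted above, the actual form captures far less, and nothing here describes an existing INS system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class I94Record:
    # Hypothetical fields for illustration only.
    traveler_id: str
    arrival: date
    authorized_until: date
    departure: Optional[date] = None  # None means no departure was recorded


def find_overstays(records: list[I94Record], today: date) -> list[str]:
    """Return IDs of travelers still present after their authorized stay ended."""
    return [
        rec.traveler_id
        for rec in records
        if rec.departure is None and today > rec.authorized_until
    ]


records = [
    # Departed before the authorized stay ended.
    I94Record("A", date(2002, 1, 5), date(2002, 7, 5), date(2002, 3, 1)),
    # No departure recorded and the authorized stay has lapsed.
    I94Record("B", date(2002, 1, 5), date(2002, 7, 5)),
    # No departure recorded, but still within the authorized stay.
    I94Record("C", date(2002, 9, 1), date(2003, 3, 1)),
]

overstays = find_overstays(records, today=date(2002, 10, 1))
```

The logic is trivial once the records exist; the sketch depends entirely on every arrival and departure being captured, which is precisely what the current manual process fails to do.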

However, this situation is not entirely the fault of the INS. In 1996 Congress enacted the Illegal Immigration Reform and Immigrant Responsibility Act (IIRIRA). Section 110 directed the INS to establish an automated entry and exit control system as discussed here. However, in short order, special interests, including the travel industry, Canada and certain other countries, and many members of Congress, began to oppose any implementation in the strongest terms, largely on economic impact and cost grounds. As a result, Congress enacted the INS Data Management and Improvement Act of 2000, which eviscerated Section 110 by prohibiting any solution that required the collection of additional information or documentation. The world of immigration management continues to exist in this bureaucratic limbo today: a mandate, inadequate IT and wholly insufficient data.

There are effective answers to the problem of entry-exit control. For example, Australia has created an electronic visa incorporated into airline tickets that substantially reduces airline administrative costs and ensures effective data transmission to the government. In addition, the current "trusted traveler" program in experimental use in dedicated road lanes at the U.S.-Mexican border could be expanded. Here, regular travelers are pre-approved and given secure documents that allow automated crossing, much like the EZ Pass modules in use in some American states. The benefits are twofold. First, a significant proportion of crossings are by frequent travelers and thus the initial clearance is leveraged into multiple secure clearances without additional cost or delay. Second, it allows scarce resources to be more effectively deployed against those travelers posing a potentially greater risk.

However, there are additional problems to address even if the I-94 system is re-engineered. Specifically, there is no process to ensure that individuals are in compliance with the terms of their specific visa or should be granted a change in visa status. The most notorious problem is presented by the I-20 student visa, as evidenced by the fact that at least two of the 9/11 hijackers and one of the terrorists involved in the 1993 WTC bombing were present on student visas but had never attended their scheduled school. How is this possible? The simple answer is that there is no system to validate that a student's entry is consistent with U.S. national security or that the student in fact attends the designated school. The more complex answer is that although the INS has had an interest in at least checking enrollment, such plans have died in a firestorm of criticism by educational institutions that view foreign students as a major source of revenue and object to any inquiry or data sharing on the grounds of student privacy. Ostensibly, the Student and Exchange Visitor Information System (SEVIS) will help in this regard, but completion before a congressionally mandated deadline of 2003 is very doubtful.

The lack of case management has the most serious impact on our homeland security. Quite simply the United States can neither identify visa violators nor seek their removal from the country. Indeed, estimates of the number of illegal overstays in this country range from 500,000 to several million; no one knows. Again, there are relatively simple management solutions available from current technology. One such solution starts with a digital visa in the form of a smart card and requires (at a minimum) an automated telephone or Web-based contact on a regular basis. Indeed, the smart card visa could be considered an end-to-end solution where magnetically coupled readers at points of entry and exit would automate the I-94 process and at other locations manage access to sensitive areas and facilities.

What should we conclude from the problem areas detailed above? The clear answer is the critical need for new information tools and processes to support a new focus for our immigration system as follows:

    1. The focus itself: the requirement for positive information to support entry, as well as a requirement for alien tracking within the United States

    2. Full and seamless data integration (sharing) among the various federal systems, vastly more than the current limited interfaces

    3. Access to commercial and other non-federal government sources of information where we are most likely to see information indicative or suggestive of terrorist affiliations

    4. Deployment of an effective biometric identification system for all alien visitors, a step that may happen with the passage of recent legislation

    5. Use of knowledge development tools to mine most effectively the new data sources and identify visa applicants that warrant admission

Together, these steps are a mechanism by which visa applicants can be identified and profiled as to whether admission is in the national security interests of the United States. Together, these steps will enable the United States to identify more certainly known terrorists, those individuals using newly-minted identities or fraudulent documentation and those whose profile reasonably suggests a security risk. Such a system is possible, as we shall demonstrate in Part II of this discussion.

American Society for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810

Copyright © 2002, American Society for Information Science and Technology