Annual Meeting Contributed Papers 2009
In this paper we outline an algorithm for disambiguating the author names of publications via deterministic clustering, based on well-defined similarity measures between the publications on which the names appear. The algorithm is designed for constructing a collaboration network, i.e., a graph of author nodes and co-author links. In this context, the goal is to produce a co-authorship graph whose network characteristics are close to those of the “true” collaboration network, so that meaningful network metrics can be derived from it. The algorithm is easy to comprehend because it does not depend on any black-box AI techniques. This matters in policy studies, where we have applied it successfully, because it enables policy makers to judge the soundness of the methodology with considerable confidence. It is also fast: large-scale analyses (here, on the order of a hundred thousand publications and a million names to be disambiguated) can be run on a moderately sized desktop computer within a few days.
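To make the general idea concrete, the following is a minimal sketch of deterministic, similarity-based clustering of author-name mentions. It is not the algorithm of the paper, whose similarity measures are not specified in this abstract. The blocking key (surname plus first initial), the similarity measure (number of shared co-authors), the threshold, and the names `name_key`, `similarity`, and `disambiguate` are all assumptions made for illustration.

```python
from itertools import combinations


def name_key(name):
    """Hypothetical blocking key: lowercase surname plus first initial."""
    surname, _, given = name.partition(",")
    return (surname.strip().lower(), given.strip()[:1].lower())


def similarity(pub_a, pub_b, key):
    """Assumed measure: number of co-author keys shared by two publications,
    not counting the name under consideration."""
    co_a = {name_key(n) for n in pub_a["authors"]} - {key}
    co_b = {name_key(n) for n in pub_b["authors"]} - {key}
    return len(co_a & co_b)


def disambiguate(pubs, threshold=1):
    """Group name mentions (publication index, author position) into clusters,
    each cluster standing for one presumed author."""
    mentions = [(i, j) for i, p in enumerate(pubs)
                for j in range(len(p["authors"]))]
    parent = {m: m for m in mentions}

    def find(m):
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Only mentions with the same blocking key are ever compared.
    blocks = {}
    for i, j in mentions:
        blocks.setdefault(name_key(pubs[i]["authors"][j]), []).append((i, j))

    for key, block in blocks.items():
        for a, b in combinations(block, 2):
            if similarity(pubs[a[0]], pubs[b[0]], key) >= threshold:
                ra, rb = find(a), find(b)
                if ra != rb:
                    # Always keep the smaller root so results are reproducible.
                    parent[max(ra, rb)] = min(ra, rb)

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return sorted(clusters.values())


if __name__ == "__main__":
    example = [
        {"authors": ["Smith, J", "Lee, K"]},
        {"authors": ["Smith, John", "Lee, K."]},
        {"authors": ["Smith, J", "Wong, A"]},
    ]
    for cluster in disambiguate(example):
        print(cluster)
```

In this sketch every step is an explicit rule, so a reader can trace why two mentions were merged. Merging is transitive (single linkage), and each resulting cluster would become one author node in the co-authorship graph. Comparing all pairs within a block is quadratic in block size, so a large-scale run would need a more careful comparison strategy than shown here.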