Since the 1960s, citation counts have been the standard for judging scholarly contributions and status, but growing awareness of the strategy’s limitations should lead to acceptance of alternative metrics. The drawbacks of citation analysis include lack of timeliness, self-citation and citations that are superfluous, negative or incomplete; moreover, traditional counts reflect only a small fraction of actual usage. A better categorization of scholarly impact would cover usage, captures, mentions and social media in addition to citations. Metrics should include mentions in blogs and other nontraditional formats, open review forums, electronic book downloads, library circulation counts, bookmarks, tweets and more. Such alternative metrics provide a more complete view of peer response to scholarly writings and better demonstrate the relative position of a research grant applicant and the applicant’s potential for influential work. Altmetrics are readily available, and their value for evaluating scholarly work should be recognized.
Bulletin, April/May 2013
Are Alternative Metrics Still Alternative?
by Mike Buschman and Andrea Michalek
Citation counts have long been the tried and true measure of academic research usage and impact. Specifically, published articles in prominent journals citing other published articles in other prominent journals equate to prestige and tenure. This scheme for determining impact was developed in the 1960s, and while so much else about collecting and disseminating information has changed since that time, the citation count mechanism continues to dominate the way research is evaluated. Yet, there are many well-known problems with this system.
The most obvious problem is that, as the pace of scholarly communication and scientific advancement has increased, citation analysis has become a lagging indicator of prestige. Brody and Harnad found that it takes five years for a paper in physics to receive half of the cited-by references that it will ever acquire. Another issue is self-citation. While authors often have very good reasons to cite themselves, the practice has also been criticized as a tool for increasing citation counts and thus artificially inflating prestige and influence. A related problem is the known practice of publications pressuring academics to pad their papers with superfluous citations. This pressure is applied for a variety of reasons, the most nefarious being that publishers can elevate the status of their own journals with increased citations. Then there is the problem of negative citations. Just because a paper is cited does not mean that it is cited positively; yet citation counts make no distinction between positive and negative references.
A further problem with citation analysis is that not all influences are cited in an article, which leaves the whole measure incomplete. In fact, MacRoberts and MacRoberts reported in their 2010 study that only 30% of influences are typically cited. There are several reasons for these omissions, including authors not citing informal influences or citing a review paper instead of the original work. Citations are also subject to the Matthew effect, in which the rich get richer: for a variety of reasons, authors tend to cite well-cited material from well-cited journals and ignore other valid work.
An analysis by PLOS found that citation counts represent only a small fraction of how a paper is used; in fact, they account for less than 1% of an article’s usage. An article therefore reaches many people, but citation counts do not begin to capture the extent of that reach.
Figure 1. Chart showing relative reuse metrics for PLOS papers. Used with permission.
Five Categories of Impact
It is not surprising that a metric created in the pre-digital world of the 1960s misses a lot of impact and usage. That failure does not make citation analysis inherently bad; it is still a useful tool. But it does make citation analysis inadequate for a complete picture of the usage and impact of both research articles and other research artifacts. To create that complete picture, Plum Analytics studied all of the ways that research artifacts, from articles to videos and everything in between, are made available and used. That research led to the following categorization of impact (Table 1).
|Category|Example|
|Mentions|Blog posts|
|Social Media|Tweets|
|Citations|Citation count (PubMed)|
Table 1. Categorization of impact of scholarly research
These are example sources only; the full list of metrics supported by Plum Analytics can be found at www.plumanalytics.com/metrics.html.
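As a rough illustration of how such a categorization could be applied, the sketch below rolls raw metric counts up into the five categories named in this article. It is a minimal sketch, not Plum Analytics' implementation: the function name, the metric keys and the assignment of downloads to usage and bookmarks to captures are assumptions made for the example.

```python
# The five category names come from the article. The metric keys are
# hypothetical, and the "downloads" and "bookmarks" assignments are assumed;
# only the last three pairs appear in Table 1.
CATEGORIES = ("usage", "captures", "mentions", "social_media", "citations")

CATEGORY_OF_METRIC = {
    "downloads": "usage",            # assumed assignment
    "bookmarks": "captures",         # assumed assignment
    "blog_posts": "mentions",        # Table 1
    "tweets": "social_media",        # Table 1
    "citation_count": "citations",   # Table 1
}


def summarize_by_category(raw_counts):
    """Sum raw metric counts per category.

    Returns (summary, uncategorized): summary has one entry per category,
    and uncategorized keeps any metric the mapping does not know about.
    """
    summary = dict.fromkeys(CATEGORIES, 0)
    uncategorized = {}
    for metric, count in raw_counts.items():
        category = CATEGORY_OF_METRIC.get(metric)
        if category is None:
            uncategorized[metric] = count
        else:
            summary[category] += count
    return summary, uncategorized
```

Keeping unknown metrics in a separate dictionary, rather than dropping them, makes it visible when a new source reports something the mapping does not yet cover.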
By capturing valuable metrics in all of these categories and creating a more complete representation of research and researchers, Plum is able to provide a more holistic picture than traditional citation analysis. While many will claim that these newer metrics are “alternative,” it is our position that all these metrics are anything but alternative. They are readily available, abundant and essential.
Era of Increased Competition
The world of scholarly research is getting more and more competitive. Research budgets are tightening, and funding sources are not meeting the increased demand. Figure 2 shows how applications for NIH grants have been steadily increasing and success rates have dropped by over 43% in the last 10 years.
Figure 2. NIH Research Grants: Applications, Awards and Success Rates 
When applying for grants, researchers need to show why their use of the award will provide the greatest impact possible. Currently, researchers rely on classic citation analysis to demonstrate impact, generally for work that is several years old in order to collect the maximum number of cited-by references. Their latest work, however, is often most relevant to the grant application at hand and may not have had the time to acquire the requisite citation counts. There is also the possibility that highly cited work that is several years old may have already spurred the most interesting new related research, hurting the chances for award success (for good reason). This paradox exposes another problem of relying only on traditional impact metrics for this purpose.
If researchers can show that their recent research is generating a lot of interaction in the scholarly community, that information can provide an advantage in this tight funding environment. A large number of downloads, views, plays and so forth can show not only early interaction with research, but also how open and accessible the scientists are making their research, an increasingly important indicator for funding bodies. In addition, if peers are following, saving and bookmarking a researcher’s output, it may portend future citations. Early adopters of these newer impact metrics can gain a noticeable advantage by standing out before the full range of impact metrics becomes universal.
Metrics for Funders
The funding bodies themselves can gain new understanding and better measure their own success with more timely and holistic metrics about the research they fund. As Bill Gates wrote in his 2013 annual letter, “Given how tight budgets are around the world, governments are rightfully demanding effectiveness in the programs they pay for. To address these demands, we need better measurement tools to determine which approaches work and which do not.”  As the success rates for grant funding go down, funding bodies will need to make sure they are making the right choices and are able to defend their decisions over time.
Negative Results and Other Forms of Research Output
Most researchers agree that both positive and negative results help advance science. Not sharing negative results can lead to unnecessary duplication and incomplete understanding of positive results. The current promotion system, however, discourages publishing research with negative results. Fanelli finds that articles showing negative results have declined in the literature, and positive articles have grown 22% across all disciplines and geographies in the last 20 years. In reaction, new journals have cropped up, such as Journal of Negative Results in Biomedicine (http://jnrbm.com), All Results Journal (www.arjournals.com/ojs/) and Journal of Pharmaceutical Negative Results (http://pnrjournal.com). By their very nature, however, these journals make the findings difficult to discover, because they are grouped by negativity rather than by scientific niche. Many scientists are using blogs to show more details of their research, including negative results. Being able to measure the impact of this output in non-traditional venues and formats will encourage scientists to share more details of their research.
Besides negative results, blogs are being used in other ways to communicate research output. Descriptions of methods and settings are increasingly being posted. A good example can be found on Nicholas Pyenson’s blog post (http://nmnh.typepad.com/pyenson_lab/nature-rorquals-organ.html) about a recent article published in Nature. It contains additional content that makes the study more accessible to lay people, as well as discussing what was not included in the study. Maps and photographs of archeological and paleontological sites as well as other visual artifacts from field study can be widely found these days. Humanities researchers are creating open-review manuscripts using WordPress with the CommentPress Core Plug-in so that the community can comment as the work is coming together and, in the process, create something altogether new, where the comments become preserved peer-review and can themselves lead to new avenues of research. Writing History in the Digital World (http://writinghistory.trincoll.edu) and Subjecting History (http://subjectinghistory.org) are two examples of this in action.
With so much research relying on large data coming from sources such as sensor networks, telescopes, instruments, surveys and simulations, the datasets themselves are often the new “output.” Being able to measure interaction with, and future science based on, a researcher’s dataset gives scientists an incentive to share their datasets. Sites such as figshare (http://figshare.com), Dryad (http://datadryad.org) and DataCite (http://datacite.org) now allow datasets to be better hosted, shared and found.
As impact measurements become more accepted and researchers receive credit for these other forms of research output, researchers will be freer to utilize the right scholarly communications format for their output without having to conform to a publication model with the limitations of a print journal.
Challenges of Disciplines Where Journal Articles Do Not Apply
Classic citation analysis has been applied most readily to those disciplines where journal articles have been the dominant format of research output. In disciplines where books and book chapters prevail, however, it is more difficult to impose the cited-by reference model. And while Elsevier’s Scopus and Web of Science from Thomson Reuters have recently added some book and data citation sets, citation analysis does not properly offer defensible impact metrics for these disciplines.
Metrics that take into account usage, such as library holdings, library circulation, course readings and eBook downloads, add a layer of impact that is more meaningful for these disciplines. Another category of impact comes with reviews – published and informal – as well as comments and other mentions.
Web Scale Is a Must
It is easy to see that a system that can support collecting, analyzing and calculating the plethora of metrics for the world’s scholarly research output requires web-scale architecture. There are millions of researchers with hundreds of millions of pieces of research output. A core technology challenge in this space is combining metrics for the same research artifact when it appears in many separate digital locations. For example, the same article can exist in a preprint repository, on the final publisher’s website or in open access repositories, as well as being directly downloadable from a researcher’s homepage. A full representation of the use of this article should capture and algorithmically combine metrics from each of these locations. The problem gets even more complicated because, in order to capture the sharing of links to the article, it is necessary to determine all of the URLs that might take a user to that article. There can be multiple valid URLs on each website that hosts the article. The process of identifying all of these disparate sources of the same article is called identity resolution. Although linking articles together by well-known identifiers such as DOI provides partial coverage, this method is insufficient for a full identity resolution solution.
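The combination step described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a description of any production system: the class and function names are hypothetical, the URL normalization rule is deliberately simple, and the table of known aliases is assumed to have been built elsewhere (for example from DOIs and repository records).

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to a comparable key: lowercase host plus path.

    Assumption for this sketch: scheme, a leading "www.", query string,
    fragment and trailing slash never distinguish two different articles.
    That does not hold for every host.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


class IdentityResolver:
    """Maps the many URLs of one research artifact to a single artifact id."""

    def __init__(self):
        self._artifact_for_url = {}

    def register(self, artifact_id, urls):
        """Record every known URL (alias) for an artifact."""
        for url in urls:
            self._artifact_for_url[normalize_url(url)] = artifact_id

    def resolve(self, url):
        """Return the artifact id for a URL, or None if it is unknown."""
        return self._artifact_for_url.get(normalize_url(url))


def combine_metrics(resolver, observations):
    """Sum (url, metric, count) observations per resolved artifact.

    Returns (totals, unresolved): totals maps artifact id to a Counter of
    metrics, and unresolved lists URLs that matched no known artifact.
    """
    totals = defaultdict(Counter)
    unresolved = []
    for url, metric, count in observations:
        artifact_id = resolver.resolve(url)
        if artifact_id is None:
            unresolved.append(url)
        else:
            totals[artifact_id][metric] += count
    return totals, unresolved
```

Registering a preprint URL and a publisher URL under the same artifact id means that downloads observed at either location are added into one total. The list of unresolved URLs shows where the alias table is still incomplete, which is the part a DOI lookup alone does not solve.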
People Not Papers
The greatest opportunity for applying these new metrics is when we move beyond just tracking article-level metrics for a particular artifact and on to associating all research outputs with the person that created them. We can then underlay the metrics with the social graph of who is influencing whom and in what ways. We can further examine and compare sets of cohorts, whether such a set is a particular lab, institution or set of researchers. Of course this data is valuable to administrators seeking to get a picture about how groups compare to one another and to their peer groups in other institutions. It is also extremely useful for giving researchers the context of their impact with groups meaningful to them. For example, some researchers might be more interested in how they stack up against others in their discipline, their institution as a whole or other researchers at a similar career stage.
In 1903, with only 150 miles of paved roads in the entire country, Horatio Nelson Jackson was the first to drive a car across the United States. He did not wait for highways to be built. In the same way the popularity of the car created the demand for better highways, the availability of more complete impact metrics for research will surely change the current system. However, even before the system as a whole changes, new metrics are already available to those conducting, supporting or funding research today.
There is a temptation to see this new paradigm for measuring impact as a passing fad: interesting, but too early, or simply not serious with regard to scientific research. The question arises: Does the process for granting tenure need to be changed in order for these measures to be accepted? A better question is why a demonstrably sub-standard process whose faults and drawbacks are so well known has persisted for so long. The easy answer is that it is all we have had for five decades, but the truth is that decision-makers want quantifiable data for making decisions. Promotion, hiring and grant funding processes will continue to evolve, but those changes will not be prerequisites for including more holistic measurements.
Resources Mentioned in this Article
Brody, T., & Harnad, S. (2005). Earlier web usage statistics as predictors of later citation impact. Retrieved March 5, 2013, from http://arxiv.org/ftp/cs/papers/0503/0503020.pdf
MacRoberts, M. H., & MacRoberts, B. R. (2010). Problems of citation analysis: A study of uncited and seldom-cited influences. Journal of the American Society for Information Science & Technology, 61(1), 1-12.
Public Library of Science. (November 2012). Citations are only a small fraction of how a paper is reused [slide]. Article Level Metrics: Analyzing Value in the Scholarly Content. Presentation by Richard Cave at the Charleston Conference 2012, Charleston, SC, November 7-10, 2012.
National Institutes of Health. (2012). Research project grants: Applications, awards, and success rates [chart]. Research Portfolio Online Reporting Tools (RePORTS). Retrieved March 5, 2013, from http://report.nih.gov/NIHDatabook/Charts/Default.aspx?showm=Y&chartId=20&catId=2
Gates, B. (2013). 2013 Annual Letter from Bill Gates. Gates Foundation. Retrieved March 6, 2013, from http://annualletter.gatesfoundation.org/
Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90, 891–904. Retrieved March 6, 2013, from http://eloquentscience.com/wp-content/uploads/2012/02/Fanelli12-NegativeResults.pdf
Mike Buschman and Andrea Michalek are the co-founders of Plum Analytics. Mike has worked at Microsoft as a librarian and program manager for Microsoft Academic and Book Search and most recently was the director of product management for ProQuest’s Summon Discovery Service. Mike lives in Seattle and can be reached at mike<at>plumanalytics.com.
Andrea is a serial entrepreneur with a focus on search and information retrieval products. She ran a technology consulting firm and, prior to founding Plum Analytics, was the director of technology for ProQuest’s Summon Discovery Service. Andrea lives in Philadelphia and can be reached at andrea<at>plumanalytics.com.