Bulletin, August/September 2006

Metadata: Practical, Painless, Profitable

by Christine Connors

Christine Connors has an MSLIS from Simmons. Intending to go the route of the reference librarian, she found herself instead in IT, in charge of taxonomies, metadata and website management. Reach her at Christine<at>sw2sw.net

Sometimes I’m Pinky, sometimes Brain. My inner “Brain,” eager to graft librarian DNA everywhere possible, is ecstatic that thousands have been bitten by the tagging trend. Slowly but surely metadata is gaining the appreciation it deserves. Teenagers and geeks aren’t the only ones who’ve been bitten. Executives and judges are coming around to the value of metadata too. 

Taking Over the World!
Jessica Vascarello, in her January 24, 2006, Wall Street Journal article, called tagging “The Next Big Thing in Searching.” IDC published a white paper in May 2006 calling the management of metadata “essential for success” (“Managing Metadata in the Coherent Information Environment: Essential for Success in the 21st-Century Enterprise,” IDC Executive Brief, May 2006, www.idc.com). In September 2005, Judge David Waxse of the US District Court for Kansas determined in Williams v. Sprint that electronic documents should be produced with their metadata intact unless the parties agree to remove the metadata or the producing party requests a protective order (www.ksd.uscourts.gov/opinions/032200JWLDJW-3333.pdf). This decision was heavily influenced by the Sedona Principles, produced by a group of legal experts concerned with intellectual property rights and other issues (http://www.thesedonaconference.org/).

Practical applications of metadata can improve the bottom line. We are beginning to see evidence that can make the most skeptical manager supportive of quality metadata. Soft dollar savings are easier to calculate, though harder to sell. Cost avoidance is good, but not nearly as interesting to management as hard dollar savings. Allow me to put forth some practical ideas for consideration.

Good metadata, including elements for rights and security management, can help a company avoid millions of dollars in intellectual property infringement fines. While the judge in the Williams v. Sprint case chose not to assess fines, the legal fees alone would have been worth saving. Other infamous cases come to mind, including Tasini v. New York Times (http://laws.findlaw.com/us/000/00-201.html) and American Geophysical Union v. Texaco (37 F.3d 882 (2d Cir. 1994), http://fairuse.stanford.edu/primary_materials/cases/texaco/settlement.html).

Content management and search are applications where metadata are critical. Content re-use is becoming important for companies as they attempt to streamline operations. Reducing duplicates is also desirable: better server management means lower costs, and having fewer versions to review to find the right one saves employee time. Reducing near duplicates (documents that are only slightly different) is even more important, because reading multiple copies to determine which has the precise data you need, or which is the authentic, legal version, is time-consuming. Requiring authors to spend a few extra minutes applying correct metadata, or employing catalogers in the content publishing workflow, increases the relevance of an information object. Employees then waste less time searching, and time and material costs are saved when they don’t have to recreate a document. 

Two Examples That Demonstrate Strong Positive ROI for Metadata

  1. We use metadata to free up wasted employee time so we can accomplish our goals.
    Let’s look at a time-wasting scenario. Here are the employee constants: 

      The organization’s burdened rate is $100,000 a year for a full-time employee (FTE).
      FTEs are paid for 2,080 hours a year (40 hours/week for 52 weeks).
      Our imaginary organization has 25,000 employees.

    If we save each employee just 15 minutes per week (less than 1% of their time), the organization saves $15.6 million a year: 15 minutes a week is 13 hours a year per employee, at roughly $48 an hour ($100,000 ÷ 2,080), across 25,000 employees. Susan Feldman of IDC has estimated the wasted time to be much higher. That is no surprise to many of us, but it is a hard sell to management (see “The High Cost of Not Finding Information” by Susan Feldman of IDC, in the March 2004 issue of KMWorld, www.kmworld.com/Articles/ReadArticle.aspx?ArticleID=9534).

    In my work on enterprise search applications, I have found that an average of 30% of searches are abandoned each month. How much time does that represent? Moreover, how many frustrated attempts at finding information lead to recreating existing data? When I worked for the libraries at Raytheon, a Six Sigma study by my colleagues in the Integrated Defense Systems (IDS) Research Library determined that the average savings per object borrowed or acquired by a librarian was $2,200. The savings estimates included lost employee time, time to research and recreate the information, and costs for purchasing information.

  2. We use metadata to free up space on computer networks for our critical data.

    If information is easier to find, then its rate of duplication will be lower. If a 20% reduction can be had simply by not re-creating information, what further impact is achieved by not having to store it?

      Let us again consider a scenario: 
      An organization is storing 100 terabytes (TB) of data.
      Twenty percent (20%) of that data is duplicates.

    A mid-range Sun server holds 1.7 TB and costs $31,000.
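    Removing the 20 TB of duplicates frees the equivalent of roughly 12 servers. If fewer servers can be purchased, or some returned, the savings would be approximately $365,000 ((20 TB ÷ 1.7 TB per server) × $31,000 ≈ $364,706).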

    Further, we could also consider a tiered storage solution. As data becomes less active and/or less valuable, it can be moved to lower-cost storage systems. The moves can be done intelligently, based on metadata such as date of creation, date last accessed, security rules, records management rules or retention schedules. 

    Let’s assume the following:

      We start with 100 TB of data on Tier 1 – the top of the line storage.

      Tier 1 costs $3,000/TB/month; Tier 2, $1,500/TB/month; Tier 3, $800/TB/month.

      Suppose that analysis determines that we could usefully reorganize the data such that:

      20% is on Tier 1, 30% is on Tier 2, and 50% is on Tier 3

    Under our original configuration, costs are $300,000 per month (100 TB × $3,000), while the tiered configuration would cost $145,000 per month (20 TB × $3,000 + 30 TB × $1,500 + 50 TB × $800), a savings of $155,000 per month. Imagine the savings going forward if the amount of data grows by 60% per year!
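
The arithmetic in both examples can be checked with a short script. This is a minimal Python sketch that uses only the scenario figures given above; the function names are hypothetical, and the three-year projection is an assumption that keeps the same 20/30/50 tier split while data grows 60% a year.

```python
# Sketch: the ROI arithmetic from the two examples above, using the
# article's scenario figures. The projection assumes (hypothetically) that
# the 20/30/50 tier split holds while data grows 60% a year.

HOURS_PER_YEAR = 2080                      # 40 hours/week x 52 weeks
BURDENED_RATE = 100_000                    # dollars per FTE per year
EMPLOYEES = 25_000
TIER_COST = {1: 3000, 2: 1500, 3: 800}     # dollars per TB per month
TIER_SPLIT = {1: 0.20, 2: 0.30, 3: 0.50}   # share of data on each tier


def time_savings(minutes_per_week: float) -> float:
    """Annual value of the time saved across the whole workforce."""
    hourly_rate = BURDENED_RATE / HOURS_PER_YEAR        # about $48.08
    hours_per_employee = minutes_per_week / 60 * 52     # 13 hours/year at 15 min/week
    return EMPLOYEES * hours_per_employee * hourly_rate


def server_savings(total_tb: float, duplicate_share: float,
                   tb_per_server: float = 1.7,
                   server_cost: float = 31_000) -> float:
    """Cost of the server capacity occupied by duplicates (fractional servers, as in the text)."""
    return total_tb * duplicate_share / tb_per_server * server_cost


def monthly_cost(total_tb: float, split: dict) -> float:
    """Monthly storage bill when total_tb is spread across tiers by split."""
    return sum(TIER_COST[tier] * total_tb * share for tier, share in split.items())


def tiered_savings(total_tb: float) -> float:
    """Monthly saving of the tiered layout versus keeping everything on Tier 1."""
    return monthly_cost(total_tb, {1: 1.0}) - monthly_cost(total_tb, TIER_SPLIT)


if __name__ == "__main__":
    print(f"Time savings:   ${time_savings(15):,.0f} per year")
    print(f"Server savings: ${server_savings(100, 0.20):,.0f}")
    tb = 100.0
    for year in range(1, 4):
        print(f"Year {year}: {tb:,.0f} TB, tiering saves ${tiered_savings(tb):,.0f} per month")
        tb *= 1.6                                       # assumed 60% annual growth
```

Run as a script, it prints $15,625,000 a year in time savings and $364,706 in server savings, followed by monthly tiering savings of $155,000, $248,000 and $396,800 for the first three years. The server figure uses fractional servers, as the text does; rounding up to 12 whole servers would give $372,000.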
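
The metadata-driven moves described for tiered storage can be expressed as a small rule function. The sketch below is hypothetical: the record fields, the 90- and 365-day thresholds, the `classified` flag and the retention check are illustrative assumptions, not rules taken from any particular storage or records management product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class StoredObject:
    """Hypothetical metadata record for one stored item."""
    name: str
    created: date
    last_accessed: Optional[date] = None    # falls back to `created` if unknown
    retention_until: Optional[date] = None  # from a records retention schedule
    classified: bool = False                # set by a security rule


def assign_tier(obj: StoredObject, today: date) -> Optional[int]:
    """Return the target tier (1-3), or None if the retention period has expired.

    Illustrative policy only; the thresholds are assumptions.
    """
    # Records past their retention date go to the records manager for
    # disposition instead of being moved to cheaper storage.
    if obj.retention_until is not None and obj.retention_until < today:
        return None
    # Assumed security rule: sensitive items stay on Tier 1.
    if obj.classified:
        return 1
    reference = obj.last_accessed or obj.created
    idle_days = (today - reference).days
    if idle_days <= 90:
        return 1      # active data stays on top-of-the-line storage
    if idle_days <= 365:
        return 2      # cooling data
    return 3          # inactive data moves to the cheapest tier


if __name__ == "__main__":
    today = date(2006, 8, 1)
    items = [
        StoredObject("q2-report.doc", date(2006, 6, 1), date(2006, 7, 20)),
        StoredObject("2005-budget.xls", date(2005, 1, 10), date(2006, 1, 1)),
        StoredObject("old-proposal.pdf", date(2002, 3, 5), date(2003, 2, 1)),
        StoredObject("expired-contract.pdf", date(1998, 5, 1), date(1999, 1, 1),
                     retention_until=date(2005, 12, 31)),
    ]
    for item in items:
        print(item.name, assign_tier(item, today))
```

Keeping the policy in one function makes it easy to rerun the analysis that produced a 20/30/50 split whenever the thresholds or retention schedules change.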

Are You Pondering What I’m Pondering?
Try social tagging behind the firewall. It gives employees the means to label content in personally meaningful ways. Because the tags stay inside the firewall, it helps protect against security leaks and against the competitive intelligence gathering that is possible when links are posted on public sites. It can also help boost the usage of existing data silos inside the organization, such as directories and best-practices or lessons-learned databases. 

Incrementally add value with metadata-based services. Start small and aim at high-value content. Add Suggested Sites or Suggested People to search engine results pages. Get forward-thinking website developers to use RDF/A (www.w3.org/2001/sw/BestPractices/HTML/2006-01-24-rdfa-primer) or Microformats (http://microformats.org/) to tag sections of their content. [RDF/A is a collection of attributes for layering RDF (Resource Description Framework) on XML languages. It is a World Wide Web Consortium (W3C) internal working document.] Get hold of an LDAP (Lightweight Directory Access Protocol) feed and convert it to FOAF. FOAF stands for Friend-of-a-Friend and describes people, the links between them and the things they create and do (www.foaf-project.org). LDAP already contains great contact information, so take it to the next level. Discover the appropriate schemas for your data, add your terms and start small. Find the supporters in your organization. 
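
As a starting point for the LDAP-to-FOAF conversion, here is a minimal Python sketch. It assumes the directory entries have already been exported (for example, from an LDAP query or an LDIF dump) into dictionaries keyed by standard inetOrgPerson attribute names (`cn`, `mail`, `telephoneNumber`); the sample records and function names are hypothetical. It writes FOAF in Turtle syntax using only the core `foaf:name`, `foaf:mbox` and `foaf:phone` properties and does not validate addresses or numbers.

```python
# Hypothetical LDAP export: attribute names follow the standard inetOrgPerson
# schema; values are lists because LDAP attributes can be multi-valued.
DIRECTORY = [
    {"cn": ["Jane Doe"], "mail": ["jdoe@example.com"],
     "telephoneNumber": ["+1 617 555 0100"]},
    {"cn": ['Sam "Lib" Smith'], "mail": ["ssmith@example.com"]},
]

FOAF_PREFIX = "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"


def turtle_literal(text: str) -> str:
    """Quote text as a Turtle string literal, escaping backslashes, quotes and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def entry_to_foaf(entry: dict) -> str:
    """Map one directory entry to a FOAF Person (a blank node) in Turtle."""
    props = ["a foaf:Person"]
    for name in entry.get("cn", []):
        props.append(f"foaf:name {turtle_literal(name)}")
    for mail in entry.get("mail", []):
        props.append(f"foaf:mbox <mailto:{mail}>")
    for phone in entry.get("telephoneNumber", []):
        # tel: URIs cannot contain spaces; hyphens are allowed as visual separators.
        props.append(f"foaf:phone <tel:{phone.replace(' ', '-')}>")
    return "[] " + " ;\n   ".join(props) + " ."


def directory_to_foaf(entries: list) -> str:
    """Serialize all entries as one Turtle document."""
    return FOAF_PREFIX + "\n" + "\n\n".join(entry_to_foaf(e) for e in entries) + "\n"


if __name__ == "__main__":
    print(directory_to_foaf(DIRECTORY))
```

Each person becomes a blank node to keep the sketch short; a real directory feed would more likely mint a stable URI per employee so that documents, projects and `foaf:knows` links can point at it.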

Above all, keep metrics! Vital research continues to come from academic organizations. Executives, however, want to benchmark against other organizations in their industry or of their size. We need to collectively contribute to a body of knowledge on the value of metadata to support our business cases. Let’s work together to evolve – and take over the world!