Bulletin, December 2008/January 2009


The Development and Usage of the Greenstone Digital Library Software

by Ian H. Witten

Ian H. Witten is in the Department of Computer Science at the University of Waikato, Hamilton, New Zealand. He can be reached by phone at +64 7 838-4246 or by email at ihw<at>cs.waikato.ac.nz.

Greenstone is a suite of software for building and distributing digital library collections. It is not a digital library but a tool for building digital libraries. It provides a way of organizing information and publishing it on the Internet (or on removable media) in the form of a fully searchable, metadata-driven collection. It has been used to create searchable and browsable collections of all kinds of material: documents, books, photographs, newspaper images, metadata such as library catalogues of MARC records, audio (MP3 files) and video – as well as mixed collections. Most are distributed on the web, but several collections of humanitarian information have been produced on CD-ROM for distribution in developing countries. Greenstone collections range in size from a handful of documents to several million newspaper articles (20 GB of raw text, 2 billion words, 60 million unique terms).

The New Zealand Digital Library project began 13 years ago. Two years later the name Greenstone was adopted for the software that was produced. It has been developed and distributed in cooperation with UNESCO and the Human Info NGO in Belgium. It is distributed under the GNU General Public License and runs on all popular operating systems. For more details see the book How to Build a Digital Library [1] and the website www.greenstone.org. Today Greenstone’s user base hails from 70 countries, and the reader’s interface has been translated into between 50 and 60 languages. Downloads from SourceForge exceed 200 per day. 

This article recounts how Greenstone’s international, humanitarian focus arose from a few essentially serendipitous events involving a local collection in New Zealand’s Māori language, a chance contact with a small humanitarian organization and a formal link with UNESCO. In retrospect, these events conspired to set the project’s direction. We then review the immense importance of digital libraries in developing countries and the special requirements imposed by the conditions that prevail there. Finally we discuss efforts to establish regional support organizations for Greenstone in India and Africa.

First, however, let us begin by summarizing salient aspects of this open source software package and its user population.

Platforms. Greenstone runs on all popular operating systems: all Windows versions, Linux, Mac – even the iPod. It is very easy to install. For the default Windows installation absolutely no configuration is necessary, and end users routinely install Greenstone on their personal laptops or workstations. Institutional users run it on their main web server, where it interoperates with standard web server software such as Apache.

Table 1

User base. As with most open source projects, the user base for Greenstone is unknown. It is distributed on SourceForge, a leading distribution center for open source software. Table 1 gives relevant download statistics; it also shows the number of people who contribute to the Greenstone mailing lists and the volume of traffic. The website www.greenstone.org points to a representative selection of examples of public Greenstone collections. The institutions they belong to are shown in Table 2. A survey of Greenstone users was undertaken in 2004-2005 [2].

Educational usage. Greenstone forms a popular basis for practical work in U.S. library and information science training programs, and several leading institutions employ it for this purpose. Indeed the book How to Build a Digital Library, which contains extensive material on Greenstone, is the most frequently assigned text in U.S. digital library courses [3].

Interfaces. Greenstone has separate interactive interfaces for readers and librarians. End users access the digital library through the reader interface, which operates within a web browser. The librarian interface is a Java-based graphical user interface (also available as an applet) that makes it easy to gather material for a collection (downloading it from the web where necessary), enrich it by adding metadata, design the searching and browsing facilities that the collection will offer the user, and build and serve the collection. There is also a separate editor for creating new metadata sets and adding elements to them.

Table 2

Standards. Greenstone is strongly standards-compliant. It incorporates a server that can serve any collection over the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), Z39.50 and SRW (Search/Retrieve Web Service). Greenstone can harvest documents over any of these protocols and include them in a collection. Collections can be exported to METS (Metadata Encoding and Transmission Standard) in the Greenstone METS Profile, approved by the METS editorial board, and Greenstone can ingest documents in METS form. Any collection can be exported to DSpace [4] ready for DSpace’s batch import program, and any DSpace collection can be imported into Greenstone [5]. There is also a close connection with Fedora [6], and a modified form of the librarian interface can be used to build Fedora collections [7].
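
To make the harvesting idea concrete, here is a minimal sketch of how an OAI-PMH request is formed. It is not Greenstone code. The verbs and the metadataPrefix argument come from the OAI-PMH 2.0 protocol, while the function name build_oai_request and the base URL are assumptions made purely for illustration.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url, verb, **arguments):
    """Return the URL for an OAI-PMH request (hypothetical helper).

    The verb comes first, followed by any extra protocol arguments
    (for example metadataPrefix or set) in alphabetical order.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = [("verb", verb)] + sorted(arguments.items())
    return base_url + "?" + urlencode(params)


# Hypothetical repository address; substitute a real OAI-PMH endpoint.
EXAMPLE_BASE_URL = "http://example.org/oaiserver"

# Ask for all records in unqualified Dublin Core (oai_dc).
request_url = build_oai_request(
    EXAMPLE_BASE_URL, "ListRecords", metadataPrefix="oai_dc"
)
```

A harvester would fetch this URL and parse the XML response. The sketch stops at building the request so that it makes no assumptions about any particular server.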

Ingesting documents. An extensible plugin scheme is used to ingest documents. There are plugins for most common formats of textual documents, listed in Table 3, including PowerPoint and Excel documents. There are also plugins for most image formats and some audio and video formats. There is a generic plugin that can be configured for other multimedia formats such as MPEG and MIDI.
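
The general shape of an extensible plugin scheme can be sketched as follows. This is an illustration under stated assumptions, not Greenstone's own plugin interface: the Plugin and PluginPipeline classes, the choice of file extension as the matching rule and the sample plugins are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Callable, FrozenSet, List, Optional


@dataclass
class Plugin:
    """A hypothetical ingest plugin that claims a set of file extensions."""
    name: str
    extensions: FrozenSet[str]
    ingest: Callable[[str], dict]


class PluginPipeline:
    """Offers each file to the registered plugins in order."""

    def __init__(self) -> None:
        self._plugins: List[Plugin] = []

    def register(self, plugin: Plugin) -> None:
        self._plugins.append(plugin)

    def ingest(self, filename: str) -> Optional[dict]:
        # Match on the lower-cased extension; the first plugin that
        # claims it processes the file.
        suffix = PurePath(filename).suffix.lower()
        for plugin in self._plugins:
            if suffix in plugin.extensions:
                return plugin.ingest(filename)
        return None  # no plugin recognised the file


# Two illustrative plugins that only record what they were given.
text_plugin = Plugin(
    name="text",
    extensions=frozenset({".txt"}),
    ingest=lambda f: {"source": f, "format": "text"},
)
image_plugin = Plugin(
    name="image",
    extensions=frozenset({".jpg", ".png"}),
    ingest=lambda f: {"source": f, "format": "image"},
)

pipeline = PluginPipeline()
pipeline.register(text_plugin)
pipeline.register(image_plugin)
```

Supporting a new format in a design like this means registering one more plugin; the pipeline itself does not change.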

Table 3

Metadata. The librarian interface includes flexible facilities for adding metadata to documents. Where externally prepared metadata is available it can be ingested using plugins. Plugins for about 10 widely used standard metadata formats are listed in Table 3. (There are, in addition, some plugins for non-standard metadata).

Metadata sets. Four predefined metadata sets are provided with the software (Table 3). New metadata sets can be created interactively using Greenstone’s metadata set editor. 

Languages. One of Greenstone’s unique strengths is its multilingual nature. The reader’s interface is available in the 50 languages shown in Table 4, with another 10 in progress. The librarian interface is available in 10 languages, and the full Greenstone documentation (which is extensive) is available in English, French, Spanish, Russian and Kazakh.

Table 4

International, humanitarian focus. Three formative, serendipitous events described in the following paragraphs had a major impact in making Greenstone the system of choice for internationalized collections of indigenous and humanitarian information.

Niupepa: The Māori Newspapers. Early on we embarked on a large collection of Māori-language newspapers (“Niupepa”), sourced from New Zealand’s Turnbull Library [8]. We made an initial demonstration with a full Māori interface and sought funding from the NZ Ministry of Education to continue the work. This activity had two formative effects: we focused from the beginning on multiple-language interfaces, and the ability to very quickly build small but fully functional demo collections became a valued feature of Greenstone. The full Niupepa, which was officially launched in March 2002, is still the largest collection of online Māori-language documents. It is extensively used for historical, social, legal and linguistic research, and in a moving ceremony in November 2000 the Māori people presented the Greenstone project with a ceremonial toki (adze) as a gift in recognition of our contributions to indigenous language preservation.

Humanitarian collections. Also in the early days, Human Info NGO sought help for producing fully searchable CD-ROM collections of humanitarian information. These collections were the vision of a Belgian medical doctor who had worked in Africa, witnessed a desperate need for such information in developing countries and hit upon electronic distribution as the solution. Unfortunately he had encountered difficulties in developing appropriate software. To bring Greenstone into line with his needs we had to make our server (and in particular its full-text search engine), which had been developed under Linux, run on Windows machines – including Windows 3.1 and 3.11 because, although by then obsolete, they were still prevalent in developing countries. This task was demanding but largely uninteresting technically: we had to develop expertise in long-forgotten software systems. However, it focused our attention on the need to run on all platforms, not just Linux. 

The first Humanity Development Library CD-ROM was issued in 1998, closely followed by UNESCO’s Sahel point Doc. In the latter all documents, along with the entire interface, help text and full-text search mechanism, are in French, further underscoring the project’s focus on multilingual interfaces and processing non-English documents. The first multilingual collection soon followed: a Spanish/English Biblioteca Virtual de Desastres/Virtual Disaster Collection aimed at South America. We also began to develop interfaces in non-European languages such as Chinese and Russian. To date, about 40 humanitarian CD-ROM collections, listed in Table 5, have been published. We were heavily involved with the first few and then transferred the technology to Human Info’s people in Romania so that they could proceed independently.

Table 5

The UNESCO connection. Human Info introduced us to UNESCO. Although UNESCO supports the idea of producing humanitarian CD-ROMs and distributing them in developing countries, they are really more interested in sustainable development. They stress the value of empowering non-technical people in a more primitive computing milieu than ours to produce and distribute their own digital library collections, following that old Chinese proverb about giving a man a fish versus teaching him to fish. We had transferred our collection-building technology to Human Info, but UNESCO envisaged a completely different proposition: to put the power to build collections into the hands of librarians and other non-IT specialists the world over. (In New Zealand by the way, they say, “Give a man a fish, and he will eat for a day. Teach a man to fish, and he’ll sit in a boat and drink beer for the rest of his life.”)

We began by packaging and documenting our Perl scripts, and slowly and painfully came to terms with the fact that operating at this level is anathema for librarians. In 2001 we produced a web-based system for building digital libraries called the Collector [9]. However, this system was never a great success. Today web-based submission to institutional repository systems (including Greenstone collections) is commonplace, but back then we were trying to allow users to design and configure digital libraries over the web as well as populate them. Shortly thereafter we began a Java development that became known as the Greenstone Librarian Interface, which grew over the years into a comprehensive system for designing and building collections [10].

Requirements for Digital Libraries in Developing Countries
Digital libraries are the killer app for computing technology in developing countries. Priorities here include health, agriculture, nutrition, hygiene, sanitation and safe drinking water. Computers are not a priority, but simple, reliable access to targeted information meeting these basic needs certainly is [11]. UNESCO taught us the importance of recognizing the special conditions that prevail throughout the developing world.

Working without the Web. Digital library projects invariably presuppose usage over the web. But Internet access varies widely across the globe. Schools and hospitals in the developing world are poorly connected; even universities often have appalling access by western standards. In this environment, Greenstone’s ability to create CD-ROM-based collections is crucial. However, we had to develop our own installer, for we could not distribute the commercial installer that Human Info and we used to produce these CD-ROMs – and this development was before the days of comprehensive open-source installers.

Software distribution. From the outset, UNESCO’s goal was to lead by example, producing CD-ROMs containing the entire Greenstone software (not just individual collections plus the run-time system, as in Human Info’s products) for use by those without ready access to the Internet. Although we continue to produce them annually, they are more of symbolic than actual significance because they become outdated by new releases of the software that appear on the Internet.

Instructional material. When we and others started to give workshops, tutorials and courses on Greenstone, we adopted a policy of putting all instructional material – PowerPoint slides, exercises, sample files for projects – on a workshop CD-ROM, and we began to include this auxiliary material on the UNESCO distributions too.

Multilingual documentation. UNESCO saw good documentation as crucial. They helped us make the entire Greenstone technology available in Spanish, French and Russian – and, later, in the other two official UNESCO languages, Arabic and Chinese. We already had versions of the interface in these (and many other) languages, but UNESCO wanted everything to be translated – not just the documentation, which was extensive (four substantial manuals), but all the installation instructions, README files, example collections, warning messages from Perl scripts, etc. We might have demurred had we realized the extent to which such a massive translation effort would threaten to hobble the potential for future development, and we have since suffered mightily in getting everything – including last-minute interface tweaks – translated for each upcoming CD-ROM release.

Interface translation. The cumbersome process of maintaining up-to-date translations in the face of continual evolution of the software – which is, of course, to be expected in open source systems – necessitated a scheme for maintaining all language fragments [12]. We chose to use a version control system in order to automatically determine what needs updating. This decision resulted in the Greenstone Translator’s Interface, a web portal where officially registered translators can examine the status of the language interface for which they are responsible and update it. Today the interface has been translated into many languages (see Table 4), most of which have a designated volunteer maintainer.
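
The idea of using version information to decide what needs updating can be sketched as follows. The data model here is an assumption for illustration only and does not describe how the Greenstone Translator’s Interface is implemented: each language fragment is identified by a key, and each key maps to the revision number at which its text last changed.

```python
def fragments_needing_update(source_revisions, translation_revisions):
    """Return the keys whose translation is missing or out of date.

    Both arguments map a fragment key to a revision number
    (hypothetical model). A translation is stale when it was last
    updated before the most recent change to the source text.
    """
    stale = []
    for key, source_rev in source_revisions.items():
        translated_rev = translation_revisions.get(key)
        if translated_rev is None or translated_rev < source_rev:
            stale.append(key)
    return sorted(stale)


# Illustrative data: the keys and revision numbers are made up.
english = {"search.button": 12, "help.title": 7, "home.welcome": 3}
french = {"search.button": 10, "help.title": 7}

todo = fragments_needing_update(english, french)
```

In this example the search button text changed after it was last translated and the welcome text has never been translated, so those two fragments are the ones a translator would be asked to revisit.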

Training workshops. Training is a bottleneck for widespread adoption of any digital library software. With UNESCO’s encouragement and sponsorship we have worked to enable people to take advantage of digital library technology by running hands-on workshops. Many Greenstone workshops have been given in developing countries, ranging from half a day to six days. Table 6 lists ones given by people closely associated with the project, but there have been many others. 

Table 6

Other training. The United Nations Food and Agricultural Organization (FAO) and UNESCO’s Institute for Information Technology in Education have also produced training material on Greenstone. Furthermore, we have been active in conducting Greenstone tutorials at major digital library conferences – JCDL, ECDL, ICADL, ICDL (on several occasions in each case) – and library conferences such as the LITA, DLF and ALA conferences. The Payson Institute of International Development at Tulane University has run courses that use Greenstone collections as a resource in dozens of locations in Africa (Burkina Faso, Cameroon, Cote d’Ivoire, Democratic Republic of Congo, Ghana, Rwanda, Senegal, Sierra Leone, Togo) and Latin America (Argentina, Bolivia, Colombia, Ecuador, Guatemala).

Regional Support 
Recognizing that devolution is essential for sustainability, we are striving to distribute Greenstone training, maintenance and support by establishing regional support groups. There are around 60 registered volunteer language translators who provide a natural focus for informal local support. User groups for Spanish and French users have existed for some time, and a Greenstone blog for Arabic users (in Arabic) has appeared recently. However, more formal support is needed, especially in developing countries. We have concentrated our efforts so far on South Asia – in particular India – and on Africa, particularly southern Africa. Figure 1 shows where Greenstone workshops and tutorials have been held in these parts of the world.

Figure 1

South Asia support group. Greenstone has been widely adopted in India, a country that is renowned for its strong library community. One of the earliest training courses was held (in 2003) at the Indian Institute of Science, and many other centers emerged and began to arrange locally run training courses as well as ones led by project members from New Zealand. A tutorial at the International Conference on Asian Digital Libraries in Bangalore attracted 200 attendees, and robust interest has been shown at international conferences on digital libraries in New Delhi.

In April 2006 a Greenstone Support Group for South Asia was launched, centered in Kozhikode, India. It has established its own website and an accompanying electronic discussion and support forum. It is coordinating the development of a network of Greenstone users under the supervision of an advisory committee. It has organized several workshops in the region, most recently an advanced training workshop at the Indian Institute of Science in Bangalore, and a further workshop is planned for early 2009 as part of an intensified Greenstone skill-development program in different parts of India.

The support group is surveying Greenstone usage in India and studying the feasibility of a more comprehensive and sustainable organization that relies less on individual volunteers and will function as a springboard for a broader participative collaboration in South Asia. In the meantime, it has initiated the production of a basic Greenstone handbook for use in schools of library and information science and is making efforts to bring other Asian countries into the network.

An important activity, particularly in this region, concerns the use of local languages. Greenstone interfaces have been established in several Indian languages, including Hindi, Tamil, Malayalam, Marathi and Telugu, with Nepalese and Sinhalese interfaces underway. The support group is also promoting the acquisition and processing of digital library content in local languages.

The South Asia support group is seen as a model for Greenstone organizations in other regions, and although it has not yet reached full self-sustainability it is probably not far off.

Southern Africa support group. In 2005 UNESCO sponsored a study of the feasibility of setting up a Greenstone Support Organization for Africa [13]. This study began with a survey questionnaire that was widely circulated to African professionals, which underlined what we already suspected: Africa is far less developed in terms of basic awareness of the need for digital libraries than India. In order to foster the construction of digital libraries we decided to take a proactive approach. We were fortunate to receive funding from a private foundation, and we began a one-year pilot project. Greenstone User Support in Southern Africa began in mid-2007, coordinated by eIFL.net, a not-for-profit organization that advocates the wide availability of electronic resources to library users in transitional and developing countries.

The project, which focuses primarily on Namibia, Malawi, Zimbabwe, Lesotho and neighboring countries, is just approaching completion. Workshops were run at the University of Namibia Library, which was designated the regional coordination center, and at three national coordination centers: Bunda College Library in Malawi, the National University of Science and Technology Library in Zimbabwe, and in Lesotho where the National University of Lesotho Library and the Lesotho College of Education Library share responsibility. A total of 66 specialists from 10 countries were trained in basic Greenstone use and digital library techniques, with further advanced training for 13 of them from four countries. Each of these workshops has been a learning experience for the organizers and the southern African resource persons, so that there is now a pool of technical and methodological expertise to extend the training effort throughout the region.

The national centers are in the process of developing their own initial digital library applications and organizing basic Greenstone training to support the development of national Greenstone networks in their own and neighboring countries. The regional center has set up a website for the project and is hosting an electronic user support service. This is developing well, with the user discussion list providing a lively forum for technical exchange.

As the pilot project comes to an end, the national centers are being helped to evolve as digital library centers of excellence, and the participating institutions and specialists are further developing their capacities through cooperation and self-help. A survey of potential and actual Greenstone users is being organized. The next step is for users to consider how to continue and reinforce their cooperation as a sustainable regional support network. There is some way to go before the network reaches the same degree of potential self-sustainability as the South Asian group, and we have secured funding for a follow-up project to continue the work.

Conclusions
We would like to underscore the enormous importance of digital libraries for the developing world. Most digital library research is conducted in libraries whose purpose is scholarship, and from most people’s perspective such libraries often seem esoteric. But they are not necessarily so. Digital libraries are the killer app for information technology in developing countries: they provide a low-cost way of distributing organized information widely throughout the vast Internet-challenged regions of the world. In comparison, digital library technology is relatively unimportant in developed countries, according to some, because there are so many alternative sources of information.

Sustainability is one of the greatest challenges for open-source projects with international user populations – particularly when the users are not programmers and when much of the usage is in poor countries. Our approach is to foster the establishment of regional support centers that provide a variety of functions: training, technical support, documentation, software internationalization and localization, discussion, inspiration and visibility. Whether these efforts will result in a truly sustainable infrastructure for Greenstone has yet to be seen.

We have been asked the secret to success when striving to build a community around a piece of open source software. Some think we must have found it since we have a wealth of activity spread around the globe, volunteers who translate the interface into local languages, associations with UNESCO and other international organizations, and support activities moving into autonomous regional centers. In fact, luck and serendipity played a major role. And, indeed, there have been many failures, too: initiatives that did not come off, sources of support that we have been unable to tap, opportunities that knocked but were unheeded. The secret of success (as a thousand self-help books on financial management will tell you) is that there is no secret of success. Open source software projects are built upon a compelling vision and an excellent implementation. But equally important is the need to communicate that vision and what the software can do – clearly, widely and enthusiastically. Perhaps that necessity is the hardest part.

Acknowledgements
I warmly acknowledge the entire New Zealand Digital Library Project team for their unstinting work in providing an environment that makes this kind of research meaningful – and enjoyable. John Rose, formerly of UNESCO, has been an inspiration. Eric Morgan contributed a key insight. And thanks to our users for making it all worthwhile.

Resources Mentioned in the Article
[1] Witten, I.H., & Bainbridge, D. (2003). How to build a digital library. San Francisco, CA: Morgan Kaufmann.

[2] Sheble, L. (2006). Greenstone user survey. Detroit, MI: Wayne State University.

[3] Pomerantz, J., Oh, S., Yang, S., Fox, E.A., & Wildemuth, B.M. (2006, November). The core: Digital library education in library and information science programs. D-Lib Magazine, 12(11). Retrieved October 21, 2008, from www.dlib.org/dlib/november06/pomerantz/11pomerantz.html

[4] Smith, M., Bass, M., McClellan, G., Tansley, R., Barton, M., Branschofsky, M., Stuve, D., & Walker, J. (2003, January). DSpace: An open source dynamic digital repository. D-Lib Magazine, 9(1). Retrieved October 21, 2008, from www.dlib.org/dlib/january03/smith/01smith.html

[5] Witten, I.H., Bainbridge, D., Tansley, R., Huang, C.Y., & Don, K. (2005, September). StoneD: A bridge between Greenstone and DSpace. D-Lib Magazine, 11(9). Retrieved October 21, 2008, from www.dlib.org/dlib/september05/witten/09witten.html

[6] Lagoze, C., Payette, S., Shin, E., & Wilper, C. (2006). Fedora: An architecture for complex objects and their relationships. International Journal on Digital Libraries, 6(2).

[7] Bainbridge, D., & Witten, I.H. (2008). A Fedora librarian interface. Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries, 8, 407-416.

[8] Apperley, M., Keegan, T.T., Cunningham, S.J., & Witten, I.H. (2002). Delivering the Māori-language newspapers on the Internet. In J. Curnow et al. (Eds.). Rere atu, taku manu! Discovering history, language and politics in the Māori-language newspaper (pp. 211-232). Auckland, New Zealand: Auckland University Press.

[9] Witten, I. H., Bainbridge, D., & Boddie, S.J. (2001). Power to the people: End-user building of digital library collections. Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries, 1, 94-103.

[10] Bainbridge, D., Thompson, J., & Witten, I.H. (2003). Assembling and enriching digital library collections. Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries, 3, 323-334.

[11] Witten, I.H., Loots, M., Trujillo, M.F., & Bainbridge, D. (2002). The promise of digital libraries in developing countries. The Electronic Library, 20(1), 7–13.

[12] Bainbridge, D., Edgar, K.D., McPherson, J.R., & Witten, I.H. (2003). Managing change in a digital library system with many interface languages. Proceedings of the European Conference on Digital Libraries ECDL2003, 7, 350-361.

[13] Peters, D.P. (2006, January). Feasibility study on the establishment of a Greenstone Support Organization for Africa. Retrieved October 21, 2008, from www.greenstone.org/docs/GSOA%20Feasibility%20Study.pdf