The IMPACT Center of Competence for Digitization Project
Results and Future Path of Practice and Research
Wednesday, October 12, 2011, Full Day 9:00am-5:00pm (seminar fee)
In recent years, mass digitization has become one of the most prominent issues in the library world. This workshop provides a unique opportunity to learn about the challenges major European libraries face when digitizing their large collections of historical material, and about the solutions they developed within the IMPACT research project. Several IMPACT speakers will present innovative solutions and demonstrate tools for various stages of the digitization workflow.
IMPACT (http://www.impact-project.eu/) focuses on improving access to historical text by innovating OCR software and language technology. Its results include new approaches to image enhancement, segmentation, OCR correction through crowdsourcing, quality assurance through document profiling, and document structuring through structure parsing, along with work on existing and experimental OCR engines. Tools for language technology and historical lexica for nine European languages have been developed to improve both OCR processing and information retrieval. The IMPACT framework also incorporates testing and evaluation tools, which enable current and future developers to verify their progress against existing state-of-the-art methods. Another valuable output is the IMPACT dataset (scanned material and ground truth from ten European libraries), which formed the basis for much of this research and will continue to inspire future research activities.
From 2011 onwards, IMPACT will continue as the IMPACT Centre of Competence for Text Digitization, bringing together experts from all sides of the digitization process. Workshop participants are invited to share their ideas on the IMPACT Centre of Competence, their possible involvement in this community of experts, and their own digitization experiences.
- Keynote speech by Laura Mandell (Texas A&M University) on the value of digital resources for humanities research
- The challenges for content holders in making their content available to researchers (Aly Conteh - The British Library)
- Solutions of the IMPACT project: Brief overview (Hildelies Balk - KB National Library of the Netherlands and IMPACT Project Director)
- Demonstration of a number of IMPACT tools with evidence of improvement as tested on real-life library material (Clemens Neudecker - KB National Library of the Netherlands, Christoph Ringlstetter - University of Munich, Katrien Depuydt - INL and Aly Conteh - The British Library)
- Future practice and research: how content holders and researchers will continue to improve access to text through the IMPACT Centre of Competence (Hildelies Balk)
Laura Mandell, Professor of English Literature and affiliate of Armstrong Interactive Media Studies at Miami University of Ohio, has published Misogynous Economies: The Business of Literature in Eighteenth-Century Britain (1999), a Longman Cultural Edition of The Castle of Otranto and Man of Feeling, and numerous articles primarily about eighteenth-century women writers. Her recent article in New Literary History describes how digital work can be used to conduct research into conceptions informing the writing and printing of eighteenth-century poetry. That article forms part of a book manuscript in progress: “Carved in Breath: Technology and Affect in Gothic Fiction and Romantic Poetry.” She is Editor of the Poetess Archive, an online scholarly edition and database of women poets, 1750-1900 (http://unixgen.muohio.edu/~poetess); Associate Director of NINES (http://www.nines.org); and is currently participating in the development of 18thConnect, a similar online network for eighteenth-century scholars. Her current research involves developing new methods for visualizing poetry (http://miamichat.wordpress.com), developing software that will allow all scholars to deep-code documents for data mining, and improving OCR software for early modern and eighteenth-century texts via high performance and cluster computing.
Hildelies Balk - Pennington de Jongh
Hildelies Balk - Pennington de Jongh is Head of the section European Projects for Research and Development in the department of Innovation and Research of the KB National Library of the Netherlands. Her section acquires and runs research projects on interoperability, digital preservation, digitization and access with partners in and outside Europe. Hildelies holds a PhD in the History of Art and is an experienced researcher and manager in the field of cultural heritage. She joined the KB in 2006 as head of the national programmes for digitization. The obvious need to improve access to the digital content created in these programmes led to the formation of a European consortium to address the challenges of OCR in the mass digitization of historical text. Hildelies coordinated the formation of this consortium and the writing of the proposal, resulting in the IMPACT project, led by the KB. She is project director of IMPACT and responsible for sustaining the results and the expertise of this project in a Centre of Competence to be launched at the end of 2011.
Aly Conteh is Digitization Programme Manager at the British Library. He has been involved in many digitization projects at the British Library, including projects to digitise 25 million pages of 19th-century books, 4 million pages of pre-1900 newspapers and significant numbers of manuscript volumes. He serves on the Executive Board for the IMPACT project and is a member of the European Commission’s Member States’ Expert Group on Digitization and Digital Preservation.
Clemens Neudecker holds an M.A. in Philosophy, Computer Science and Political Science. He was a member of the Munich Digitisation Centre (MDZ) from 2003 to 2009, mostly involved with OCR processing workflows in various national digitization projects. Since December 2009 he has worked at the KB National Library of the Netherlands, currently as the Technical Project Manager for IMPACT.
Katrien Depuydt is Head of the Dutch Language Bank at the INL (Institute for Dutch Lexicology) in Leiden. She is a historical linguist and lexicographer. She has worked on two major historical dictionaries and has many years of experience in managing electronic publishing and content management projects. In IMPACT she leads the work packages on language resources and on tools for building and applying language resources.
Christoph Ringlstetter is a Research Associate at the Centrum für Informations- und Sprachverarbeitung, University of Munich. He holds a Ph.D. in Computational Linguistics from the University of Munich. From 2006 to 2008, he was a postdoctoral fellow at the Alberta Ingenuity Center for Machine Learning (AICML), University of Alberta, Canada. In 2008 he joined IMPACT as a researcher for the work packages on Text Recognition and Language Resources. Christoph is a co-chair of the workshop series on Analytics of Noisy Unrestricted Text Data (AND). His current research interests center on document post-processing, information retrieval in noisy environments and semantic search.
Members $80, non-members $90