Building the Universal Library: The Promise and Challenges of HathiTrust
HathiTrust is a multi-institutional effort to create the universal library – to bring together as comprehensive a body of works as possible and to do it in a way that ensures access, permanence, content preservation, and an advanced environment for research. In short, HathiTrust is an effort born of libraries, working to bring the lasting contributions of libraries to bear on the growing body of digital materials available to students and researchers. Much has been said and written about the silo effect of digital libraries, the way that our early technological efforts Balkanized content and failed to capitalize on economies of scale. With the creation of HathiTrust, many of the world's great research libraries will work together to create a single, comprehensive library without walls. Our partners will work to coordinate their investments both in curating content and in building services, to create a whole greater than the sum of its parts.
“Able To Develop Much Larger and More Ambitious Projects”: An Exploration of Digital Projects Teams
Siemens, L., Duff, W., Cunningham, R., & Warwick, C.
This paper reports on a research project which is exploring the prevalence and nature of research teams undertaking digital projects. Drawing upon interview and survey data, it aims to identify the methods and patterns of interaction used by collaborating digital projects teams and provide recommendations to support effective and efficient teams.
Building Australia’s eResearch Capability: The Challenge of Data Management
Burton, A. & Henty, M.
The creation of the Australian National Data Service (ANDS) provides an opportunity to devise a national approach to the provision of skills for improving both institutional and individual eResearch capability. One of the four proposed ANDS programs is called Building Capabilities. This will have three broad areas of activity: curriculum development, stakeholder engagement, and the development and implementation of an audit and certification framework. This paper discusses the first of these: its target groups and constituency, skills areas and levels, and delivery strategy.
Capturing a Plurality of Perspectives: A Framework for Developing Culturally Sensitive Curriculum and Digital Repositories
White, K.L. & Abbas, J.
This paper presents a framework that was developed from a study in 2008 of archival education in Mexico. Based on the results of survey data, semi-structured interviews, and ethnographic data, a framework was developed consisting of six elements (conceptual expansion; embeddedness; collaboration; leadership, activism and ethics; sustainability; and reflexivity), which are useful for capturing a plurality of perspectives when developing a culturally sensitive graduate-level curricular framework and course modules to prepare students for digital curation in various environments. A case study of the applicability of the framework within digital curation education is presented to illustrate further the necessity of this conceptual approach.
Communicating Archives of Cultural Institutions: Venice as a Case Study
Niero, M. & Urbani, C.
In this paper, we describe the “pilot project” of archival cooperation between cultural institutions in the digitalization process.
Creating Metadata for a Digital Database: A Case Study
She, J. & Chace, M.B.
Digitizing a complex collection that contains a variety of content types creates a great challenge to make the entire
contents of the documents fully accessible to meet researchers’ needs. The authors provide a case study that describes
an innovative approach to create metadata for a complex legislative history digital database with several features: (1)
Focus on users’ legislative history research needs to select search fields beyond the traditional access points; (2)
Innovatively apply the Dublin Core Metadata Standards and the concept of Functional Requirements for Bibliographic
Records (FRBR) to contextually design the database and user interface; (3) Contextually create metadata for
legislative documents in order to meet complex research needs: locating all the documents of a particular legislative
history; finding related documents, or a particular document; searching a particular author/sponsor’s information;
searching multiple legislative histories. The authors also provide sample metadata records and a sample user interface screen
with search features. This case study shows that digital technology helps not only to convert information from a variety
of source materials to a single accessible (digital) format, but also offers multiple access points to match complex
research needs because digital technology can contextually organize information in an active form.
Data Access and Long Term Data & Knowledge Preservation for Earth Science: An Overview on Some ESA Initiatives
Albani, S., Beruti, V., Fusco, L., & Giaretta, D.
Earth Observation Space Missions provide continuous surveillance of the Earth producing huge amounts of data every year that need to be processed, elaborated, appraised and archived by dedicated systems.
The ability to preserve these data in the long term and to provide easy access in order to facilitate their exploitation and utilisation is a fundamental issue and a major challenge at programmatic, technological and operational levels.
This contribution will provide an overview on some ESA initiatives carried out in collaboration with European entities and institutions, with the objective of guaranteeing long term data preservation.
In particular the paper will focus on the ESA participation and contribution in the CASPAR project, the PARSE.Insight project and the Alliance for Permanent Access to the Records of Science coalition.
Digital Curation and the Citizen Archivist
The increasing array and power of personal digital recordkeeping systems promises both to make it more difficult for established archives to acquire personal and family archives and less likely that individuals might wish to donate personal and family digital archives to archives, libraries, museums, and other institutions serving as documentary repositories. This paper provides a conceptual argument for how projects such as the Digital Curation one ought to consider developing spinoffs for archivists training private citizens how to preserve, manage, and use digital personal and family archives. Rethinking how we approach the public, which will increasingly face difficult challenges in caring for their digital archives, also brings with it substantial promise in informing them about the nature and importance of the archival mission. Can the Digital Curation project provide tools that can be used for working with the public?
A Digital Library Service for the Small
Angelis, S., Constantopoulos, P., Gavrilis, D., & Papatheodorou, C.
In this paper, we present MOPSEUS, a lightweight digital library service based on the Fedora system. This service was created to address the needs of small libraries without support from technical staff. Hence, MOPSEUS attempts to balance flexibility against ease of installation, configuration and use. The service is available as a standard Java Web servlet, uses no external databases or other systems and can easily be deployed on top of any Fedora installation. Preliminary tests concerning the ease of installation and use are encouraging. We contend that facilitating the introduction of digital library infrastructures in the small may contribute to spreading digital curation practices.
Documentation Evaluation Model for Social Science Data: An Empirical Test
Niu, J. & Hedstrom, M.
This paper builds on our prior research in which a Documentation Evaluation Model (DEM) for social science data was constructed and hypotheses about impacting factors of perceived documentation quality were proposed. In this paper, results from interviews and a survey were used to validate the model and test those hypotheses. We found that the DEM is valid, and that perceived documentation quality varies with several characteristics of data and is weakly affected by users’ absorptive capacity.
Educating Archivists about Copyright: How Can We Do It Better?
A recent study that looked at the sources and quality of Canadian archivists’ copyright knowledge in relation to repository practices in digitizing archival holdings for Internet access found that Canadian archivists get their copyright knowledge from a range of sources, not all of which are authoritative or up-to-date. Consequently, copyright issues are not always addressed correctly, with adverse effects on access to, and use of, digital resources. If a good knowledge of copyright is an essential component of digital curation, this study suggests that there are some problems in copyright education, and proposes possible ways of addressing them.
Effective Access to Digital Assets: An XML-based EAD Search System
Zhang, J., Fachry, K.N., & Kamps, J.
This paper focuses on the question of effective access methods, by developing novel search tools that
will be crucial on the massive scale of digital asset repositories. We illustrate concretely why XML
matters in digital curation by describing an implementation of a baseline digital asset search system
that is fully XML-driven. The system aims to provide better access to archival material through digital
finding aids in the Encoded Archival Description (EAD) standard. Relevant (parts of) archival descriptions
within often lengthy and complexly organized digital archival finding aids can be found faster and with more
ease. A succinct walk-through of the process of design and implementation of such a system is given, from a
higher-level conceptual and generic view, where we start from the actual digital archival finding aid to the
eventual delivery of the fonds to the user. Beyond this baseline, we propose a method for automatically providing
extra archival context through automatic link detection between archival finding aids. We relate our efforts to
the Encoded Archival Context (EAC) initiative.
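As a rough illustration of what an XML-driven search over a finding aid can look like, the following minimal sketch walks a tiny EAD-like tree and returns the components whose titles match a search term, together with their hierarchical context. It is not the authors' system: the sample finding aid, the function name search_finding_aid, and the plain substring matching are assumptions made for the example. Only the element names (ead, archdesc, dsc, c01, c02, did, unittitle) come from the EAD standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified finding aid; real EAD files are far larger.
SAMPLE_EAD = """
<ead>
  <archdesc level="fonds">
    <did><unittitle>Papers of a Hypothetical Family</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters on harbour trade, 1890-1895</unittitle></did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Photographs</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def search_finding_aid(xml_text, term):
    """Return (title path, level) for each component whose title contains term."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    hits = []

    def walk(node, trail):
        # A component's own title lives in its direct <did>/<unittitle> child.
        title_el = node.find("did/unittitle")
        title = (title_el.text or "").strip() if title_el is not None else ""
        here = trail + [title] if title else trail
        if title and term in title.lower():
            # Keep the ancestor titles so the hit is shown in archival context.
            hits.append((" > ".join(here), node.get("level")))
        for child in node:
            if child.tag != "did":
                walk(child, here)

    walk(root.find("archdesc"), [])
    return hits


for path, level in search_finding_aid(SAMPLE_EAD, "harbour"):
    print(level, "|", path)
```

Returning the chain of ancestor titles rather than the bare hit reflects the point made in the abstract: a relevant part of a long, deeply nested finding aid is only useful if the user can see where it sits in the hierarchy.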
Everyone is a Curator: Human-Assisted Preservation for ORE Aggregations
McCown, F., Nelson, M.L., & Van de Sompel, H.
The Open Archives Initiative (OAI) has recently created the Object Reuse and Exchange (ORE) project that defines
Resource Maps (ReMs) for describing aggregations of web resources. These aggregations are susceptible to
many of the same preservation challenges that face other web resources. In this paper, we
investigate how the aggregations of web resources can be preserved outside of the typical
repository environment and instead rely on the thousands of interactive users in the web
community and the Web Infrastructure (the collection of web archives, search engines, and personal archiving
services) to facilitate preservation. Inspired by Web 2.0 services such as digg, del.icio.us, and
Yahoo! Buzz, we have developed a lightweight system called ReMember that attempts to harness the
collective abilities of the web community for preservation purposes instead of solely placing the burden of
curatorial responsibilities on a small number of experts.
Identifying and Implementing Modular Repository Services
In recent work at the Library of Congress, we have been identifying requirements for digital repositories for locally created collections and collections received from partner institutions. Our most basic needs are not surprising: How do we know what we have, where it is, and who it belongs to? How do we get files – new and legacy – from where they are to where they need to be? And how do we record and track events in the life cycle of our files? This paper describes current work at the Library in implementing tools to meet these needs as a set of modular services -- Transfer, Transport, and Inventory -- that will fit into a larger scheme of repository services to be developed.
An Implementation of the Audit Control Environment (ACE) to Support the Long Term Integrity of Digital Archives
Smorul, M., Song, S., & JaJa, J.
In this paper, we describe the implementation of the Audit Control Environment (ACE) system that provides a scalable, auditable platform for ensuring the integrity of digital archival holdings. The core of ACE is a small integrity token issued for each monitored item, which is part of a larger, externally auditable cryptographic system. Two components that describe this system, an Audit Manager and Integrity Management Service, have been developed and released. The Audit Manager component is designed to be installed locally at the archive, while the Integrity Management Service is a centralized, publicly available service. ACE allows for the monitoring of collections on a variety of disk and grid based storage systems. Each collection in ACE is subject to monitoring based on a customizable policy. The released ACE Version 1.0 has been tested extensively on a wide variety of collections in both centralized and distributed environments.
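The general idea of issuing a small token per item and auditing against it later can be sketched with plain cryptographic digests. This is only an illustration of fixity checking in general, not ACE's token format or protocol: the abstract describes a larger, externally auditable cryptographic system, which this sketch does not attempt to reproduce. The function names and the in-memory dictionary standing in for an archive are assumptions.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest of an item's bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def issue_tokens(items):
    """Record one digest per monitored item at registration time."""
    return {name: digest(content) for name, content in items.items()}


def audit(items, tokens):
    """Compare current content against the recorded digests."""
    report = {}
    for name, expected in tokens.items():
        if name not in items:
            report[name] = "missing"
        elif digest(items[name]) != expected:
            report[name] = "corrupt"
        else:
            report[name] = "ok"
    return report


# Hypothetical holdings: names and contents are made up for the example.
holdings = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}
tokens = issue_tokens(holdings)

holdings["b.txt"] = b"bet4"   # simulate silent corruption
del holdings["c.txt"]         # simulate loss

print(audit(holdings, tokens))
```

A digest kept next to the data can be altered along with it, which is presumably why the abstract stresses that the tokens belong to an externally auditable system rather than being trusted on their own.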
Integrating Metadata into the NARA Transcontinental Persistent Archive Prototype via the OAI-PMH
Ward, J., de Torcy, A., Mantooth, J., Chua, M., & Crabtree, J.
The H.W. Odum Institute for Research in Social Science (Odum), the Renaissance Computing Institute (RENCI), the School of Information and Library Science (SILS), and the National Center for Data Intensive Cyber Environments (NC-DICE), all part of the University of North Carolina at Chapel Hill (UNC-CH), are collaborating on an extension of the National Archives and Records Administration's (NARA) transcontinental persistent archive prototype (TPAP) data grid with the new integrated Rule Oriented Data System (iRODS). The goal of the project is to enable collection interoperability among preservation environments around the TPAP data grid using iRODS. This paper presents the results of one part of that project, which is the development of a prototype service by which metadata can be transferred into the NARA TPAP metadata catalogue (iCAT) via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), using the Odum Institute Data Archive’s (OIDA) Data Documentation Initiative (DDI) metadata as the test data. The successful development of this prototype will enable bibliographic metadata-based queries in iRODS, as well as enable any digital library or data archive that is an OAI-PMH-compliant Data Provider (DP) to upload their metadata into the preservation grid. Future work will include ingesting the digital objects, as well.
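To make the harvesting step concrete, here is a minimal sketch of the harvester side of OAI-PMH: building a ListRecords request and reading identifiers, titles, and the resumption token out of a response. It is not the prototype described above. The base URL, the sample response, and the function names are hypothetical, and the example uses the protocol's mandatory oai_dc format rather than the DDI metadata used in the project. Loading the result into iCAT is outside its scope.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument: when it is
    present, no other argument except the verb is sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((ident, title))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None


# Hypothetical response from a Data Provider, trimmed to the parts used here.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:study-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical survey study</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

records, token = parse_records(SAMPLE_RESPONSE)
print(records, token)
print(list_records_url("https://example.org/oai", resumption_token=token))
```

A real harvester would fetch each URL over HTTP and repeat until the response carries no resumption token; the sketch parses a canned response so that it runs without network access.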
Lessons Learned from the DISC-UK DataShare and Data Audit Framework Implementation Projects
This paper discusses some of the outcomes from two JISC-funded data curation projects in the UK from 2007-2009: DISC-UK DataShare and the Edinburgh Data Audit Framework Implementation project. DISC-UK DataShare involved investigating deposit of research data in institutional repositories including metadata solutions and policy development; the second was about understanding and improving data management practice through partnering with academic departments in the use of the Data Audit Framework.
Invited Paper: MoReq2: a European Contribution to the Preservation of Electronic Records
MoReq and its successor MoReq2 are European model specifications of requirements for Electronic Records Management systems, often referred to as ERM or EDRM systems. There are several other specifications that set out to define model requirements for EDRM systems, notably the US DoD 5015.2 standard. However, almost all of the other specifications are designed expressly for government bodies in one country. MoReq and MoReq2 are differentiated in three ways:
- they are designed to apply to all sectors (public, private and not-for-profit alike);
- they apply to all member states of the European Union;
- they include features that have been found to be valuable in practice, even though they are not strictly required for the theoretical management of records.
This paper concentrates on the third differentiator, and specifically on features that address digital preservation, notably:
- automated rendition;
- import and export;
- XML schema.
Pathways to Preservation: Digital Curation Strategies in North Carolina State Government
Eubank, K., Ricker, J., & Rudersdorf, A.
This paper discusses the impact of digital publishing, e-mail, and electronic records management on the North Carolina State Archives and the State Library of North Carolina, the entities responsible for gathering, providing access to, and permanently storing state agency electronic publications and records in North Carolina. In addition to outlining the current state of digital government information in North Carolina, as evidenced by recent survey data, this article touches on future plans and collaborative efforts, both within state government and with other states, as well as some of the challenges to successful implementation that the State Archives and State Library must overcome.
Preservation Workflows, Strategies and Infrastructure
The OAIS Reference Model provides the de facto model for digital archives and forms the basis for the effort to produce an international standard for audit and certification of digital repositories. OAIS is, however, an abstract model; digital repositories must have concrete tools, strategies and appropriate support to implement the requirements which derive from OAIS.
This paper introduces and briefly describes a number of fundamental strategies, workflows, and a support infrastructure to enable repositories to follow OAIS and to help them be better positioned for international certification.
Qualification & Education in Digital Curation: the nestor Experience in Germany
Neuroth, H., Osswald, A., & Strathmann, S.
Being a relatively new topic in research and education, digital curation is, for a number of reasons, currently not very well covered by university curricula. Within the project "nestor", a transnational partnership of academic institutions in Germany, Switzerland, and Austria has established a comprehensive qualification program based not only on e-learning tutorials but also on schools, seminars and an (open access) encyclopedia in digital curation.
Reconstructing the Digital Past: A Case Study of the Reconstruction of the Lost Pittsburgh Project
The Web-based Pittsburgh Project, aka Functional Requirements for Evidence in Recordkeeping, was administered by the University of Pittsburgh’s School of Information Sciences between 1992 and 1996. The site disappeared in 2000 when the School switched servers. Although partial versions of the project could be recovered through the Wayback Machine, graduate students in the School’s 2008 Digital Preservation course reconstructed the entire site, and added documentation on the reconstruction process. The reconstructed site is now available at http://www.sis.pitt.edu/~bcallery/pgh/index.htm. This case study discusses educational strategies used in the reconstruction process, particularly the introduction of issues of the completeness and authenticity of the restored site, and considers the effectiveness of collaborative tools in the management of a group project.
The Russian Doll Effect: A Case Study in Digital Artifact Recontextualization
This paper explores a specific project to digitize and make available artworks of the Ball State University Museum of Art. By establishing partnerships and maintaining flexible metadata, the portability and recontextualization of digital artifacts (termed by the author as the ‘Russian Doll Effect’) has been maximized. The case study details the primary context of the digital assets, current recontextualizations, and future directions.
Speech Acts and Electronic Records
All written records of human activity involve actions expressed in records. Archivists identify these acts when they describe records and when they review records for possible restrictions on disclosure. This paper reports the results of an analysis of Presidential records to determine the speech acts conveyed by the records and the role of these speech acts in describing the records. It also proposes a method for automatically recognizing these acts for use in support of archival description and review.
The Survival of Records (and Records Management) in the Twenty-First Century
This paper discusses the nature and role of records and records management in relation to the contemporary cyber-landscape, and describes how the principles of records management inform the retention of electronic records and what changes in perspective and method are needed given the digital domain.
Thinking Like a Digital Curator: Creating Internships in the Cognitive Apprenticeship Model
Yakel, E., Conway, P., & Krause, M.G.
Effective formal learning about digital curation must take place both in the classroom and in the field. This paper discusses how the cognitive apprenticeship model is being applied in the new Preservation of Information specialization at the University of Michigan School of Information to foster learning inside and outside the classroom. By adopting this approach, our philosophy is that pedagogy is as important as content to achieve the goal of the specialization, which is to help students learn to ‘think like a digital curator’ while imparting specific skills. The opportunity to create synergy between courses and internships was made possible by a grant, "Engaging Communities to Foster Internships for Preservation and Digital Curation" from the US Institute of Museum and Library Services.
Use of Computer Forensics in the Digital Curation of Removable Media
The purpose of this paper is to encourage the discussion of the potential place and value of digital forensics techniques when dealing with acquisitions on removable media in the field of digital curation. It examines a basic computer forensics process, discusses a typical file system for removable media, and raises questions about necessary processes and incentives for addressing data capture in the field of digital curation.
Web Access for the Museum of Anthropology’s Collections
Whittington, S.L., Bryner, K.E., Hancock, B.H., & Vidrine, M.R.
This paper is a case study of a phased project to digitally curate archaeological and ethnographic collections and associated archival materials of the Museum of Anthropology at Wake Forest University and
make them freely accessible through the World Wide Web. Multiple federal grants and infrastructure support from the museum’s parent organization have been essential for successful project implementation.
Eaton, F., Chapman, S., & Crabtree, J.
This session on change management in the digital curation environment will have three speakers representing
different perspectives. Fynnette Eaton served as the Change Management Officer for the Electronic Records
Archives Program at the National Archives from 2002-2007 and will provide an overview of types of issues
that anyone introducing major changes in an organization will face. She will use examples from her experiences
both at the Smithsonian and the National Archives. The emphasis will be on dealing with a system that will
change how staff performs its work. Stephen Chapman from Harvard University will discuss the experiences in the Open
Collections Program at his institution as they deal with changes in digitization mandates that will change how staff
performs its work. One example is the shift from a mass digitization mandate (for published materials) to one that
focuses upon production digitization for unique materials (archives, manuscripts and rare books) which
required across-the-board adoption of workflows and systems. All of which underscores the point that adaptability
and other skills are intrinsic to the ongoing success of digitization programs.
Jonathan Crabtree will share some experiences that Odum has had during the DataPASS project and migration to the
current Dataverse archive distribution and management software. The major change is the shift to a federated approach
for preservation and the IT infrastructure that comes with it. Changes in workflow are important, but so are
the social structures of the federated environment and the efforts placed on partnership building. This has had
significant impact on collection development and the acquisitions process. In addition the shift in user patterns
causes concern and consequences in the collection of usage statistics. These changes require attention on both the staff
development front as well as administrative expectations.
Common Workflows: Health and Social Science Data Curation Collaborations
Granda, P., Thomas, D., Grasso, C., & Teixeira, C.
This panel session will describe collaborations between different organizations to establish common technologies and procedures to process and preserve quantitative data in health and the social sciences for dissemination to the research community.
Comparing Curricula for Digital Library and Digital Curation Education
Pomerantz, J., Oh, S., Wildemuth, B.M., Hank, C., Tibbo, H.R., Fox, E.A., & Yang, S.
Two related curriculum development projects are currently underway, one concerning digital libraries and one
concerning digital curation. This paper explores the convergence and divergence of these two federally-funded projects’ approaches
to curriculum development.
Cooperative Approaches to Digital Preservation: Panel
Halbert, M., Walters, T., Trehub, A., Pearce-Moses, R., & Crabtree, J.
In this panel, representatives from four archives – the MetaArchive Cooperative,
Alabama Digital Preservation Network, Data-PASS, and the Persistent Digital Archives and
Library System – discuss the versatility, low cost, and compelling benefits of using cooperative
distributed digital preservation networks to safeguard categories of digital content that define our
culture, identity, and history and that might otherwise be lost as a result of natural disaster, human
error, or neglect.
Curation of Scientific Datasets: Trends, Current Initiatives, and Solutions
Dreyer, M., Neuroth, H., Carrier, S., Greenberg, J., Abrams, S., Cruse, P., Kunze, J., Day, M., Neilson, C., Ball, A., & Russell, R.
E-Science and cyberinfrastructure developments present information professionals and researchers with significant
curation challenges relating to the management of scientific datasets. Among pressing questions are: What data should be collected for data curation? How can quality control be maintained?
And, how can metadata be generated effectively? These and other challenges are made complex, given the diversity of
methods by which data are produced, their heterogeneity, and the increasing scale and scope of scientific research
projects. Available literature on the topic of data stewardship provides grounding for approaches addressing these
problems, yet more work specifically relating to cyberinfrastructure and repository frameworks is required.
This international panel will report on current initiatives addressing the management of scientific data,
focusing on advances and solutions in the curation of datasets. The reporting will take place in the context
of recommendations from funding agencies and international councils, and models for data curation such
as the DCC Curation Lifecycle Model. The panel will provide recommendations for the scope and form of the
effort required to address the challenge of scientific data curation and the implications for digital curation education.
Digital Curation and Preservation Training and Education: A Panel to Consider Options and Intersections
McGovern, N.Y., Tibbo, H.R., Cragin, M.H., Davidson, J., & Hofman, H.
This panel will present a variety of continuing education initiatives in digital curation, digital preservation, and data curation. Panelists will discuss the potential intersections of these initiatives and the opportunities for enhancing synergies and extending resources. The speakers represent curriculum development from a range of digital disciplines and organizational settings and will discuss the implications for sustainability of educational programs within these settings.
Digital Curation of Humanistic, Multimedia Materials: Lessons Learned and Future Directions
Winget, M., Frick, C., McDonough, J., Renear, A., & Lowood, H.
Library and archival literature often points to the audio-visual and other non-text preservation communities as leaders in envisioning new, digitally driven curation methods and practices. A closer look, however, reveals that the majority of such institutions and key practitioners – both commercial and not-for-profit – remain committed to 19th century conservation theory and models. This panel features interactive case studies that illustrate participants' practical and theoretical experiences with multimedia and new media collections, and will provide alternative approaches to traditional models of appraisal, collection development, access, and preservation. In this panel discussion we will discuss what it means to curate a collection of multimedia and interactive media, how such collections might be used in the humanities, and what role curators play in creating, preserving, and promoting their use.
Digital Curation Vignettes: Personal, Academic, And Organizational Digital Information
Beaudoin, J.E., Esteva, M., & Japzon, A.
This panel presents variations on the theme of digital curation by examining the digital information management
and preservation practices of three different populations.
Personal digital information management, personal collections transferred to institutional repositories, and a
digital archiving case in a private organization, offer a wide view of the types of contexts in which
digital material is being produced “in the wild.” Across the cases we found that digital record-keeping
and preservation practices are not well understood or established, and that a vast amount of digital content
created currently is at risk. Other issues, such as an individual’s perception of digital information value,
and the feasibility of preservation beyond an individual’s or organization’s lifetime, surfaced as determinants
of the current situation. The findings have important implications for appraisal and post-custodial archival
strategies. They are also useful for identifying critical decision points when digital curation issues are best
Digital Curation Policy Issues
Smith, C.A., Agnew, G., Cragin, M.H., Dryden, J., & Eschenfelder, K.R.
This panel’s speakers will discuss policy issues associated with the curation of digital materials including privacy, copyright/intellectual property, cultural sensitivity and trustworthiness. Several collection types will be included in the discussion including health information, multimedia collections, scientific data collections and digital library, museum and archival cultural collections.
Distributed Custodial Frameworks for Archival Preservation
Marciano, R., Wojcik, C., Wilczek, E., Conrad, M., & Tibbo, H.R.
Building end-to-end digital repositories is a task few institutions can currently accomplish. The Distributed Custodial Archival Preservation Environments (DCAPE) project addresses this problem by offering a preservation service that includes a trusted digital repository infrastructure assembled from state-of-the-art rule-based data management systems, commodity storage systems, and sustainable preservation services. Presenters discuss case studies of preservation environments they are jointly building. These include TPAP (Transcontinental Persistent Archives Prototype), DCAPE, Fedora / iRODS integration, and iRODS (integrated Rule-Oriented Data Systems) prototypes and represent distributed custodial frameworks from the federal, state, university, and cyberinfrastructure perspectives.
Extending the Data Curation Curriculum to Practicing LIS Professionals
Cragin, M.H., Smith, L.C., Palmer, C.L., & Heidorn, P.B.
In this panel we will present an overview of and outcomes from the inaugural Summer Institute on Data Curation held at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. The Institute addresses a growing need for continuing professional development in data curation. Panelists will present their experiences attending the Institute, and discuss these in relation to the current and ongoing data curation activities at their own universities.
Wurl, J., Ray, J., Grindley, N., & Williams, K.
This panel brings together leaders from four funding agencies in the US and the UK -- the National Endowment
for the Humanities (NEH); the Institute of Museum and Library Services (IMLS); National Historical Publications and Records Commission (NHPRC); and the Joint Information Systems Committee (JISC). The panelists will speak on the ways in which their organizations are supporting digital curation initiatives at their respective institutions, and future directions for funding.
Gaps and Persistent Challenges
Lynch, C., Sawyer, D., Ashley, K., & van Diessen, R.J.
Where have we been? Where do we need to go to advance the principles
and goals of digital curation? This panel brings together leading researchers and
thinkers in the areas of digital curation and digital preservation to respond to this
critical question. Panelists will offer their unique and informed perspectives on the
scope, extent, relevance, and quality of current digital curation research, practices
Moving Web Archiving into the Classroom
Bragg, M., Fox, E.A., Hedstrom, M., & Lee, C.A.
Web archiving is a professional activity that manages the risk of information loss by identifying, selecting, collecting, managing and providing ongoing access to resources from the Web that are deemed to have sufficient continuing value. Graduate-level professional education has only recently begun to address the role of web archiving within digital collection and digital library development.
In order to promote and cultivate such “citizen web archiving,” it will be beneficial for information professionals to provide opportunities for training and guided hands-on experience with available tools. The members of this session panel have led efforts to introduce web archiving into university and K-12 classrooms. Based on their experiences from the educational activities discussed above, members of the panel will discuss the following questions: What are the main principles, concepts and skills required to appropriately archive web resources? What are the most effective strategies for and biggest challenges associated with incorporating web archiving activities into the classroom?
Personal Digital Archiving
John, J.L., Marshall, C.C., Pearson, D., & Rauber, A.
This panel addresses personal digital archiving, which is an essential, but often neglected, arena of digital curation. Panelists will discuss their efforts to understand, support and document personal digital archiving activities, and will explore general challenges and opportunities for digital curation professionals.
Skills for Significant Properties: Debating Pragmatics and Philosophy in an Area of Digital Curation
Grace, S., Anderson, S., & Lee, C.A.
A debate at DigCCurr 2009 will allow educators and practitioners the chance to reflect on the balance of practical and theoretical skills required of digital curators. The area of significant properties is used as a case study for drawing out some of these skills in research and teaching environments. A panel will debate issues, with the audience invited to contribute to the discussion.
Snapshot of Digital Preservation in Federal Libraries
Keller, D., Murphy, P., Dunn, C., Huffine, R., & Crawford, C.
In this panel, we represent the variety within the federal library sector as seen through the digital projects being pursued. Panelists include librarians and digital project coordinators from both large and small federal libraries containing diverse subject collections, and they address the challenges presented by digitizing different physical media. Each panelist’s comments will focus on three areas: First, the presenter will frame their comments by providing the purpose or impetus for pursuing a digital preservation project in their library. Second, the presenter will discuss the approach the library has taken in pursuing the project, including considerations such as scanning, repository software, and other technical requirements, as well as the availability and uses of human and financial resources. Finally, the presenter will analyze the project, highlighting steps or decisions that have led to successes and pointing out areas of difficulty that have been overcome as well as those that present ongoing challenges in the project. A moderator will facilitate discussion between panelists and field questions from the audience. Although federal libraries support organizations whose missions and areas of interest differ considerably, this snapshot will illustrate that their digital projects experience many of the same constraints and that regardless of the library or project size, librarians often are faced with similar challenges.
Technology Learning for Digital Curators
Botticelli, P., Bradley, J., & Fulton, B.
In this panel, we will explore current needs for technology learning in digital curation, and we will examine the role of hands-on learning methods in training digital curators.
Tools and Demos
ContextMiner - Collect Different
We present ContextMiner, a web-based service for collecting contextual information for digital objects
from a variety of sources. ContextMiner lets users run campaigns, each comprising a set of
queries that it executes against various sources, such as YouTube and blogs. It keeps extracting and
adding contextual information to the collected objects based on their usage. Such contextual information can help to make sense of digital objects and better preserve them.
Creating a Preservation Plan using the Preservation Planning Tool Plato
Kulovits, H., Becker, C., Kraxner, M., & Rauber, A.
The rapid technological changes in today’s information landscape
have turned the preservation of digital information into a pressing challenge. Many different strategies, i.e. preservation actions,
have been proposed to tackle this challenge. However, choosing a strategy, and subsequently selecting the tools to implement it, is a non-trivial task. Creating a concrete plan for
preserving an institution’s collection of digital objects requires the evaluation of available tools against clearly defined and measurable criteria.
Preservation planning aids in this decision making process to find the best preservation strategy considering the institution’s requirements, the
planning context and possible actions applicable to the objects contained in the repository. Performed manually, this evaluation of possible
solutions against requirements takes a good deal of time and effort. In this demonstration, we present Plato, an interactive software tool aimed at
supporting institutions in the process of creating preservation plans.
Digital Curation Tools and Demos I
Rauber, A., van der Hoeven, J., van Diessen, R.J., Pearce-Moses, R., Bowden, H., & Pomerantz, J.
This invited demonstration panel session brings together an international collection of tools for performing and facilitating digital curation in the practice setting, as well as for use in the education setting. This is the first of a two-part tools and demo session. Tools to be demonstrated in this session include: Plato,
the Planets preservation planning tool; Hoppla
(Home and Office Painless Persistent Long-Term Archiving) system; Dioscuri,
a modular emulator; the Universal Virtual Computer (UVC);
the Preservation Manager;
PeDALS (Persistent Digital Archives and Library System); and the DCE
(Digital Curation Exchange). Additionally, two invited papers included in these Proceedings provide further
information on Plato and Hoppla.
Digital Curation Tools and Demos II
Pearson, D., Shah, C., Moore, R., Ingram, G.B., McHugh, A., & Hofman, H.
This invited demonstration panel session brings together an international collection of tools for performing and
facilitating digital curation in the practice setting, as well as for use in the education setting.
This is the second of a two-part tools and demo session. Tools to be demonstrated in this session
include: Prometheus, a digital preservation workbench;
Mediapedia, a prototype web-based resource on carriers;
ContextMiner, a framework to collect, analyze, and present contextual
information along with the data;
iRODS (Integrated Rule-Oriented Data System), a data grid software system
developed by the Data Intensive Cyber Environments (DICE) group and collaborators;
CONTENTdm, a complete software solution for the storage, management, and
delivery of multi-format digital collections to the Web; and
DRAMBORA (Digital Repository Audit Method Based on Risk Assessment),
a toolkit to facilitate internal audit by providing repository administrators with a means to assess capabilities,
weaknesses, and strengths. Additionally, four invited papers included in these Proceedings provide further
information on Prometheus, Mediapedia,
ContextMiner and CONTENTdm.
Digital Curation Exchange (DCE)
The Digital Curation Exchange (DCE) website, digitalcurationexchange.org, is designed to serve as an international online community center for digital curation practitioners, researchers, educators, and students. The DCE will allow visitors to create login accounts and then add resources (e.g. course materials, academic program information, information about file formats and applications). The DCE is designed to allow collaboration, information sharing, and community building related to the rapidly-evolving field of digital curation. The demo will walk through the ways one can interact with the site: how to create an account, how to view the resources housed on the site, how to contribute to the discussion of the resources via comments and discussion forums, how to create your own material such as blogs, groups, and group wiki posts, and how to add material to the general collection of digital curation practice and education resources.
van der Hoeven, J.
Dioscuri is an x86 computer hardware emulator written in Java. It is designed by the digital preservation community to ensure documents and programs from the past can still be accessed in the
future. The Dioscuri emulator has two key features: it is durable and flexible. Because it is implemented in Java, it can be ported to any computer platform which supports the Java Virtual Machine (JVM), without any extra effort. This reduces the risk that emulation will fail to work on a single architecture in the future,
as it will continue to work on another architecture. Dioscuri is flexible because it is completely component-based. Each hardware component is emulated by a software surrogate called a module. Combining several modules allows the user to configure any computer system, as long as these modules are compatible. New or upgraded modules can be added to the software library, giving the emulator the capability to run these.
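The component-based design described above can be illustrated with a minimal sketch. This is not Dioscuri's actual Java API: the class names (`Module`, `Emulator`), the `requires` field, and the dependency check used here as a stand-in for "compatibility" are all hypothetical, and Python is used only for brevity. The sketch shows how independent modules might be combined into a machine and how a new module can be added later.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """A software surrogate for one hardware component (hypothetical model)."""
    name: str
    # Names of other modules this one needs; an assumed notion of compatibility.
    requires: frozenset = frozenset()


@dataclass
class Emulator:
    """A machine assembled from modules; illustrative only."""
    modules: dict = field(default_factory=dict)

    def add(self, module: Module) -> None:
        # Adding a module under an existing name replaces (upgrades) it.
        self.modules[module.name] = module

    def missing_dependencies(self) -> dict:
        """Map each module name to the required modules that are absent."""
        present = set(self.modules)
        return {
            m.name: sorted(m.requires - present)
            for m in self.modules.values()
            if m.requires - present
        }

    def is_configured(self) -> bool:
        # The configuration is usable once no module lacks a dependency.
        return not self.missing_dependencies()


pc = Emulator()
pc.add(Module("cpu", frozenset({"memory"})))
pc.add(Module("video", frozenset({"cpu", "memory"})))
print(pc.missing_dependencies())  # both modules still lack "memory"
pc.add(Module("memory"))
print(pc.is_configured())
```

Keeping each component behind a small, uniform interface is what would let a configuration be extended or upgraded without touching the other modules.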
Hoppla - Digital Preservation Support for Small Institutions
This demo presents the Hoppla archiving system, which provides digital preservation solutions specifically for small institutions and offices.
It hides the technical complexity of digital preservation challenges by providing automated services
based on established best practice examples. Appropriate preservation strategies and required tools for the collection
are delivered via a web service, effectively outsourcing the required digital preservation expertise.
Strodl, S., Motlik, F., & Rauber, A.
Small businesses (small office/home office, SOHO) hold tremendous amounts of digital information. At the same time, they
have little to no expertise in managing it, let alone in ensuring its long-term preservation; even simple
backup strategies already pose drastic challenges.
Mediapedia: Managing the Identification of Media Carriers
del Pozo, N., Elford, D., & Pearson, D.
All digital information is stored on physical carriers. Given the variations in carrier types, the quantity produced and in circulation, along with the potential importance of the content being stored on them, not taking any steps to document and preserve the characteristics of different carrier types will make it much more difficult, and eventually impossible, to extract content even in the short-term.
The Mediapedia is intended to provide a sustainable way of facilitating carrier type identification as well as documenting their technical requirements and general preservation information. By enabling a community of specialist individuals and organizations to collaborate in the documentation of these carriers it will hopefully create a sustainable body of knowledge which can be centrally and persistently accessed via the web. From a preservation and risk management perspective, we can either approach this problem as a community or ignore it at our individual peril.
Policy-based Distributed Data Management
Moore, R.W. & Marciano, R.
Data management applications primarily differ in the set of management policies that are enforced. The underlying mechanisms for managing data, supporting queries, and validating assessment criteria are usually generic infrastructure that is provided by traditional data grids. iRODS, the integrated Rule-Oriented Data System, is a data grid that explicitly implements management policies as computer-actionable rules, implements management processes as computer-executable procedures, and consistently updates state information that results from the application of the policies and procedures. The iRODS framework provides generic infrastructure for interacting with remote storage systems, while enforcing the management policies. Time-dependent assertions are validated through the parsing of audit trails of all operations performed within the data grid.
The iRODS technology is being used as infrastructure for sharing data (data grids), infrastructure for publishing data (digital libraries), infrastructure for preserving data (persistent archives), infrastructure for analyzing data (processing pipelines), and infrastructure for managing real time data streams (federation). The types of collections range from web pages, to office products, to images, to observational data, to experimental data, to output from simulations.
Policy-based systems promise to alleviate three major challenges when managing collections that scale to hundreds of millions of files and petabytes of data:
- Enforcement of management policies. Example policies include retention, disposition, distribution, replication, integrity validation, authenticity validation, time-dependent access controls, human-subject access approval flags.
- Automation of administrative functions to minimize labor support requirements. Examples include automated extraction of metadata, creation of derived data products, recovery from corruption events.
- Validation of assessment criteria. Every collection is assembled for a purpose, typically expressed as a set of criteria that need to be periodically validated. Examples include the ISO MOIMS-rac repository assessment criteria, the DRAMBORA risk mitigation criteria.
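To make the idea of policies as computer-actionable rules concrete, here is a minimal sketch in Python. It does not use the iRODS rule language or any iRODS API; `DataObject`, `replicate`, `replication_policy`, `validate_replication`, the storage location names, and the audit-trail format are all hypothetical. It shows a policy enforced through a procedure, state information updated as a result, and an assessment criterion validated afterwards by parsing the audit trail.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    name: str
    replicas: list = field(default_factory=list)  # locations holding a copy


audit_trail = []  # every operation is recorded here (assumed tuple format)


def replicate(obj: DataObject, location: str) -> None:
    """Procedure: copy the object to a location and update state information."""
    obj.replicas.append(location)
    audit_trail.append(("replicate", obj.name, location))


def replication_policy(obj: DataObject, locations: list, minimum: int = 2) -> None:
    """Policy: each object must have at least `minimum` replicas."""
    for location in locations:
        if len(obj.replicas) >= minimum:
            break
        if location not in obj.replicas:
            replicate(obj, location)


def apply_policies(collection: list, locations: list) -> None:
    for obj in collection:
        replication_policy(obj, locations)


def validate_replication(collection: list, minimum: int = 2) -> bool:
    """Assessment: check the criterion from the audit trail alone.

    Assumes every replica was created through `replicate`.
    """
    counts = {}
    for operation, name, _location in audit_trail:
        if operation == "replicate":
            counts[name] = counts.get(name, 0) + 1
    return all(counts.get(obj.name, 0) >= minimum for obj in collection)


collection = [DataObject("survey.csv"), DataObject("scan-001.tif")]
apply_policies(collection, ["site-a", "site-b", "site-c"])
print(validate_replication(collection))
```

Because the policy is data-driven, changing `minimum` or the list of locations changes the behavior without modifying the procedure that performs the copy.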
A demonstration of the iRODS data grid will be given that emphasizes the extensibility and modularity of the system. The policies controlling the management of data will be dynamically changed, and advanced interfaces will be demonstrated.
Prometheus: Managing the Ingest of Media Carriers
del Pozo, N., Elford, D., & Pearson, D.
The National Library of Australia has a relatively small but important collection of digital material stored on common carriers such as floppy disks, CDs and DVDs. This includes both published material and unpublished manuscripts in digital form. In the past, preservation of the Library’s physical format digital collection has been taken care of manually, on a case-by-case basis, but this approach is insufficient to deal effectively with the increasing volume of material requiring preservation.
The Library has produced an application called Prometheus, which provides a semi-automated, scalable process for transferring data from carriers to preservation-managed digital storage. This is helping the Library to mitigate the major risks associated with storing the content on physical carriers: deterioration of the media and obsolescence of the hardware required to access them. Prometheus makes it easier to process the majority of carriers commonly encountered in the Library and to collect and manage metadata about their content. Although not perfect, Prometheus is helping the Library to save digital content before it is too late.
Teaching with CONTENTdm in the Digital Curation Curriculum
One challenge for LIS programs is to develop digital curators who demonstrate a sound grasp of best practices and core principles in information management as they make use of modern digital collection management tools. Selection and description, presentation and preservation—at every stage in the lifecycle of electronic materials management, no less than with traditional librarianship, highly refined curation skills begin in the LIS program. Except for those curricular tracks that focus on tool design and development, library and information studies programs must leverage as-built curation tools as they teach core concepts—acquisition, cataloging, information retrieval, and preservation of the record of knowledge. During this demo session, we will look at how faculty are meeting their teaching and learning objectives while leveraging CONTENTdm in the LIS program.
Contextual Information from Blogs in Video Digital Curation
Capra, R.G., Clemens, R., Lee, C.A., & Sheble, L.
In this study, we examined the extent to which blog postings that provide links to digital videos also provide contextual information about those videos. Our analysis of blog entries that linked to YouTube videos about the U.S. presidential election revealed that most of the blog entries do discuss the videos to which they link but also contain content that is not directly related to the videos. Most of the blog entries provide some additional contextual information. This suggests that crawling and capturing blog pages that link to YouTube videos can serve as a means to gather contextual information about those videos. We suggest future research that will (1) focus on the types of contextual information provided by blog entries; (2) determine whether there is a small subset of blog entries that provide the majority of contextual information about videos to which they link; and (3) identify aspects of blog entries that have the potential to positively impact more efficient collection of the entries. Answers to these questions will help curators of digital collections to better identify and collect content from the blogosphere to provide contextual information that will help future users to make sense of videos in their collections, while minimizing the resources required to capture additional content from the Web.
Data Management and Curation of Research Data in Academic Scientific Research Environments
Hayes, B.E., Harroun, J.L., & Temple, B.
The Structural Bioinformatics Core Facility at the University of North Carolina at Chapel Hill (SBI Core) assists researchers university-wide with computational structural biology techniques and with incorporating structural biology/bioinformatics into their grants and publications. The SBI Core works with a diverse population of researchers from numerous departments and provides support to an ever-changing body of research. The computational biology services provided by the SBI Core are data-intensive and use a diverse and distributed set of applications for processing, data storage, and data management.
As the amount of data and number of projects have increased, the SBI Core requires an effective strategy for managing data and facilitating
data sharing between the SBI Core and the researchers it assists.
The UNC-CH Health Sciences Library (HSL) has begun a collaborative project
with the SBI Core to identify the crucial data management needs and to envision new
roles for the library in e-science and data management. In partnership, the SBI Core and the
HSL have identified major obstacles in data sharing, data management, and data access. Furthermore,
the SBI Core and the HSL will develop solutions in which the library facilitates collaboration among
campus resources and matches unmet needs to external resources. One of the library’s goals in this proof-of-concept project with the
SBI Core is to become a central campus resource for research support and data management.
Extending an LIS Data Curation Curriculum to Include Humanities Data
Renear, A.H., Teffeau, L.C., Hswe, P., Dolan, M., Palmer, C.L., Cragin, M.H., & Unsworth, J.M.
We describe an IMLS-funded project to extend an existing data curation curriculum to include humanities data.
Federal Libraries Digital Preservation Census
Keller, K. & Harrison, A.
In this poster session, we share the planning and early data collection stages of a multi-year effort to survey the federal library community about its digitization efforts. We will discuss the overlapping sub-communities within the federal library sector which make it challenging to break this survey project into manageable pieces. We will discuss efforts to market this survey initiative so that we can encourage participation and show benefits to libraries for assisting with this information collection. We will also describe strategies being developed for achieving maximum response rates and avoiding duplicate or unsanctioned responses. We will share survey questions that probe not only what digital projects are being undertaken in libraries, but the methods, funding, staffing, and work practices that are involved in these initiatives. While we intend to present some of the preliminary data collected from the first round of libraries surveyed, the final goal, an online directory of digital projects taking place in federal libraries, will not yet be complete at the time of this presentation.
Getting the Tar Off Our Heels: Moving Forward with Archiving University of North Carolina at Chapel Hill Websites
During the 2008-2009 school year, the author, a DigCCurr fellow assigned to The University of North Carolina at Chapel Hill (UNC) University Archives and Records Management Services (UARMS), is exploring the feasibility of integrating website archiving into the UARMS workflow. Because this is a relatively new area for the UARMS, it will be vital to discover what is currently being done in the field to help UARMS situate its goals and resources into workflows that already have been developed and to learn from the difficulties that others have overcome. From there, test implementation of selected tools will be carried out to determine their feasibility for long-term use at UARMS. This poster will present the results of the author’s research and test implementation of different methods for archiving websites. As such, it will be of interest to those curious about how an institution can survey its resources and the field of available tools, and begin a web archiving program tailored to its needs.
It will also showcase open source tools that the UNC UARMS explores/implements on a trial basis.
Preserving Electronic Mailing Lists: The H-Net Archive
This poster illustrates an NHPRC-funded project to evaluate and improve upon preservation practices for the H-Net academic e-mail list archive.
Residential Data Curation Internship: Opportunities and Challenges
During the summer of 2008, while a student at San Jose State University’s School of Library and Information Science, the author completed a residential data curation internship at the Cornell Biological Field Station. The internship entailed preparing and documenting a long term dataset collected by researchers at the field station. Internships such as this one present learning opportunities not only for students interested in the field of data curation, but for researchers as well. They also provide the opportunity to facilitate collaborations between library staff and researchers.
Sustaining Digital Preservation Organizations: What Discourse Analysis Can Tell Us about Market Demand and Long-Term Survival
In this poster, I show the results of a study of how the coevolution of community discourse and organizational structures in preservation and archiving has influenced the current ecology of the field. In particular, I use latent semantic analysis (LSA), a computational technique originating in the Information Retrieval field, to show how the historical trends, technological developments, values and professional issues within a community lead to a demand for organizational attributes that are eventually instantiated by the emergence of new forms and sub-forms of organization. The results of the analysis can then provide the historical and evolutionary basis for explaining why particular types of digital preservation forms have emerged and what types of hypotheses about the future sustainability of these forms can be derived.
What Should We Teach about METS in a Digital Preservation Course?
Waters, J. & Allen, R.B.
We describe a METS (Metadata Encoding and Transmission Standard) teaching assignment which forms part of a foundational Digital Preservation course for an MS(LIS) degree. The assignment requires students to both critically evaluate this important framework and apply it practically to metadata management for digital objects. The results indicate that it was a valuable assignment for LIS and IS students who could conceptually grasp METS readily but that many have trouble with integrating external metadata schemes and with XML syntax. These results are informing a redesign of the assignment.
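One of the difficulties reported above, integrating an external metadata scheme into METS, can be sketched with Python's standard library. The example wraps a Dublin Core title inside a METS descriptive metadata section and links one file through the structural map. It is not taken from the course assignment: the identifiers (`DMD1`, `FILE1`), the title, and the file URL are invented for illustration, and the output is not validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
DC = "http://purl.org/dc/elements/1.1/"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("dc", DC), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def q(namespace: str, tag: str) -> str:
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{namespace}}}{tag}"


mets = ET.Element(q(METS, "mets"))

# Descriptive metadata: the external scheme (Dublin Core) is wrapped
# inside mdWrap/xmlData rather than mixed directly into METS elements.
dmd = ET.SubElement(mets, q(METS, "dmdSec"), {"ID": "DMD1"})
wrap = ET.SubElement(dmd, q(METS, "mdWrap"), {"MDTYPE": "DC"})
xml_data = ET.SubElement(wrap, q(METS, "xmlData"))
ET.SubElement(xml_data, q(DC, "title")).text = "Example digitized letter"

# File inventory: one file, located by an (invented) URL.
file_sec = ET.SubElement(mets, q(METS, "fileSec"))
group = ET.SubElement(file_sec, q(METS, "fileGrp"), {"USE": "master"})
file_el = ET.SubElement(
    group, q(METS, "file"), {"ID": "FILE1", "MIMETYPE": "image/tiff"}
)
ET.SubElement(
    file_el,
    q(METS, "FLocat"),
    {"LOCTYPE": "URL", q(XLINK, "href"): "http://example.org/letter.tif"},
)

# Structural map: ties the description and the file together by ID.
struct_map = ET.SubElement(mets, q(METS, "structMap"), {"TYPE": "physical"})
div = ET.SubElement(struct_map, q(METS, "div"), {"DMDID": "DMD1"})
ET.SubElement(div, q(METS, "fptr"), {"FILEID": "FILE1"})

print(ET.tostring(mets, encoding="unicode"))
```

The two points students reportedly found hard are visible here: each vocabulary keeps its own namespace, and the sections refer to one another only through ID attributes.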