AMeGA Quarterly Report 1
Quarterly Report 1 for Section 4.2 of the LC Bibliographic Control Action Plan
Submitted by Jane Greenberg, Principal Investigator (contact: janeg@ils.unc.edu)
Friday, February 20, 2004
Recommended Functionalities for
Automatic Metadata Generation Applications,
for Section 4.2, LC Bibliographic Control Action Plan:
The AMeGA (Automatic Metadata Generation Applications) Project
Status Report Summary
Work on the “Recommended Functionalities for Automatic Metadata Generation Applications” proposal for Section 4.2 of the LC Bibliographic Control Action Plan was launched in October 2003, following the Dublin Core conference. The goal of the project is to identify recommended functionalities for automatic metadata generation and guide vendors in the design of future applications that will take advantage of computing technology and aid in the creation of high-quality metadata. This work is now being called the AMeGA (Automatic Metadata Generation Applications) project, and is an extension of the Metadata Generation Research (MGR) project (http://ils.unc.edu/~janeg/mgr), funded by Microsoft Research; OCLC, the Online Computer Library Center, Inc.; and the University of North Carolina’s University Research Council.
This document is the first quarterly report for the AMeGA project. The AMeGA project is on track with the proposed work plan submitted to LC this past summer. A summary list of accomplishments follows:
- A project Web site has been launched at: http://www.ils.unc.edu/~janeg/mgr/amega.htm.
- Project staff members have been hired.
- The Metadata Generation Task Force (MGTF) has been formed.
- A project listserv for staff and MGTF members has been set up, although it has not yet been officially launched. The launch will likely take place next week. The address is: Amega@listserv.unc.edu.
- Investigative tasks have been undertaken to examine and identify metadata-creation functionalities that are currently supported, or could potentially be supported, via automatic and semi-automatic means in online library catalogs, document creation software (e.g., word-processing software, Web editing software, etc.), and state-of-the-art automatic metadata generation applications.
- Background research has been conducted on survey design.
- Background research has been conducted on Requests for Proposals for online catalogs, and the ALA archives have been searched, although unsuccessfully, for information on the research methodology and documentation underlying CONDOC.
- Preliminary work has been conducted on the survey that will be implemented to gather feedback from metadata creation professionals about the functionalities desired in automatic metadata generation applications.
Several of these accomplishments are elaborated on below to provide more context.
1. Project Web site
The project Web site (http://ils.unc.edu/~janeg/mgr/amega) provides an overview of the project, lists project staff and Metadata Generation Task Force (MGTF) members, and will serve as the host site for the survey on automatic metadata generation application functionalities.
2. Project Staff
Current project staff include the Principal Investigator (Dr. Jane Greenberg) and the following students at the School of Information and Library Science, University of North Carolina at Chapel Hill (SILS/UNC-CH): Abe Crystal (doctoral student), Michelle Cronquist (master’s student), and Amanda Wilson (master’s student). All three have an interest in metadata. Abe Crystal has worked on the Metadata Generation Research (MGR) project for the last year and a half. Michelle Cronquist was an intern in music cataloging at the Library of Congress this past summer and is currently enrolled in my metadata class. Amanda Wilson is completing a metadata internship at Duke University, Special Collections, where she is working with Encoded Archival Description (EAD).
3. Metadata Generation Task Force (MGTF)
The MGTF has been created to assist with project tasks and give input on project plans. The MGTF is chaired by the Principal Investigator (PI). It currently includes five professional librarians, one vendor/software developer, one researcher from OCLC, and one LC representative. Another professional librarian will likely be added from the UNC library system, and we still hope to add a representative from Microsoft. Current Metadata Generation Task Force members include the following people:
Judy Ahronheim
Metadata Specialist
Graduate Library
University of Michigan
e-mail: jaheim@umich.edu
Brad Allen
President
Siderean Software
e-mail: ballen@siderean.com
Priscilla Caplan
Assistant Director, Digital Library Services
Florida Center for Library Automation
e-mail: pcaplan@ufl.edu
Tim Cole
Mathematics Librarian & Professor of Library Administration
University of Illinois at Urbana-Champaign
e-mail: t-cole3@uiuc.edu
Ed O’Neill
Consulting Research Scientist
OCLC
e-mail: oneill@oclc.org
Robin Wendler
Metadata Analyst
Office for Information Systems
Harvard University Library
e-mail: r_wendler@harvard.edu
David Williamson
Cataloging Automation Specialist
Library of Congress
e-mail: dawi@loc.gov
Mary Woodley
Social Sciences Librarian
California State University, Northridge
e-mail: mary.woodley@csun.edu
To date, the MGTF has not been called upon for feedback. Members will be asked to provide feedback on the survey draft sometime in the next three to four weeks. Each MGTF member will recruit at least ten people from his or her institution to complete the official survey once it is implemented.
4. Investigative Tasks
Investigative tasks have been undertaken to identify metadata-creation functionalities for the survey on “desired functionalities” of automatic metadata generation applications. The three areas investigated are online library catalogs, document creation software (e.g., word-processing software, Web editing software, etc.), and state-of-the-art automatic metadata generation applications.
4.1. Online catalogs.
The online catalog analysis identified approximately 30 functionalities that will be incorporated into the survey. The investigation involved a focus group discussion among project staff and analysis of literature, with special attention given to ENCompass Solutions and Endeavor Information Systems. A site visit to North Carolina State Library is being planned to examine first-hand the functionalities identified in the ENCompass Solutions system. It is anticipated that the MGTF will be able to evaluate and add to this list of functionalities. We are awaiting a copy of Cataloger’s Desktop from Kathryn Mendenhall, Acting Director, LC Cataloging Distribution Service, which may further inform research efforts in this area.
4.2. Document creation / presentation software.
Approximately ten different types of software in this class have been analyzed to identify the metadata elements they generate automatically. Examples include Adobe, MS Word, and blog software. We felt it was very important to include blog software, given that these resources can now receive official ISSNs. Approximately ten metadata elements were identified. This work is being incorporated into the survey.
4.3. Automatic Metadata Generation Applications.
A small-scale evaluation of two state-of-the-art automatic metadata generation applications has been conducted as an extension of the Metadata Generation Research (MGR) project. The generators examined were DC-dot and Klarity. The study’s findings are informing the design of the survey on desired functionalities for automatic metadata generation applications. The results are reported formally in a forthcoming publication entitled “Metadata Extraction and Harvesting: A Comparison of Two Automatic Metadata Generation Applications” (Greenberg, 2004, Journal of Internet Cataloging, vol. 6, no. 2). A sample of ten metadata records has been collected from OCLC WorldCat to assist with the potential testing of additional generators. These OCLC records are high-quality records and can serve as comparison measures if we elect to test additional applications. Project staff do not think this next step would add to the small-scale study at the moment, so we are concentrating on the initial survey design instead and will revisit this issue in the coming weeks.
5. Survey Design
Background research has been conducted on survey design, and work is underway on the survey’s framework and the presentation of its questions. The School of Information and Library Science, University of North Carolina at Chapel Hill shares a building with the University’s Institute for Research in Social Science (IRSS). This institute includes research experts who advise on research design. Project staff plan to meet with an IRSS researcher to review the survey design. Participants will be recruited from a range of libraries (e.g., academic/research, small college, special, and public libraries) and from other disciplines where metadata creation is becoming an important task. A participant-profile questionnaire will accompany the survey, in order to gain background information on those who respond.
6. Immediate Next Steps
Immediate next steps (within the next two to four weeks) include connecting with MGTF members and updating them on project accomplishments; finalizing the MGTF member list; completing a first draft of the survey; and submitting research documentation to the University’s Institutional Review Board to verify that subjects participating in the study are not at risk, that their participation will remain confidential, and that we will uphold high-standard research practices. This work will be followed by piloting of the survey, and then implementation.
This concludes the first Quarterly Report for the AMeGA project. Please let me know if you have any questions or comments.
Respectfully Submitted by Jane Greenberg.