Metadata Marathon

On December 7th, 2011, from 12:30 to 3:15 PM, the SILS Metadata Research Center is celebrating its fifth anniversary. This event also celebrates the 80th anniversary of the School of Information and Library Science at the University of North Carolina at Chapel Hill. We’ll be holding a “Metadata Marathon” and sponsoring a lunch. Metadata experts will give lightning talks highlighting research topics and areas; breakout groups will discuss current and long-term metadata-related research needs and approaches; and the final reporting will fold into a collective publication. The celebration will close with a special report from Thomas Baker, Chief Information Officer, Dublin Core Metadata Initiative (DCMI), and co-chair of the W3C Library Linked Data Incubator Group.

Event information:

Location: Toy Lounge, Dey Hall, UNC Campus
Date: Wednesday, December 07, 2011, 12:30 PM - 3:15 PM

Sign-up

Registration for the Metadata Marathon is full. You may still join the waitlist. We will let you know if there is a cancelation.

Share

Share Your Metadata-ness and read other Metadata Marathon participants’ burning metadata questions!

Schedule

Welcome, Jane Greenberg, Director of the Metadata Research Center

12:30 - 1:00 PM: Lightning Talks
(Click on the participant’s name to view a brief bio and description of their topic).

Speakers and topics:

1.) Derek Rodriguez, Program Officer, Triangle Research Libraries Network. Topic: Metadata interoperability and next generation discovery
2.) Jim Balhoff, Research Programmer, National Evolutionary Synthesis Center (NESCent). Topic: Ontology
3.) Joseph Busch, Senior Principal, Information Management, Knowledge Lead for Metadata and Taxonomy, Project Performance Corporation/AE Group. Topic: Metadata and social media
4.) Seth Shaw, Electronic Records Archivist, University Archives, Duke University. Topic: Metadata and born-digital archival records
5.) Jill Sexton, Library Systems, UNC Libraries. Topic: Institutional repositories
6.) Ryan Shaw, School of Information and Library Science (SILS), UNC. Topic: Digital humanities
7.) Thomas Baker, Chief Information Officer, Dublin Core Metadata Initiative. Topic: Library Linked Data
8.) Jenn Riley, Head, Carolina Digital Library and Archives (CDLA), UNC Libraries. Topic: Future of metadata and metadata models

1:00 - 2:00 PM: Lunch and breakout group discussions

Lightning talks will be followed by a sponsored lunch with breakout groups for participants (SILS students, metadata experts, and attendees). Each breakout group will discuss its designated topic and consider research needs.

The groups will discuss their topic freely, but they will also explore a set of questions relating to their topic.

Target questions:

  1. What are the pressing research needs in your group’s designated topic/area? (Identify chief research motivators.)
  2. What research addressing these needs is feasible short term and long term?
  3. Which research approaches should be explored?

Time permitting: brainstorm and identify other research topics beyond your designated area.

2:00 - 2:10 PM: Break
2:10 - 2:40 PM: Breakout group reporting

One person from each breakout group will report highlights, noting group consensus and any areas of non-consensus. Each group will be encouraged to write up a one- to two-page briefing report summarizing the activity and discussion; the report will be published with all participants as authors on the group piece.
2:40 - 3:15 PM: Special talk

Thomas Baker, Chief Information Officer, DCMI, and co-chair of the W3C Library Linked Data Incubator Group.

Participant Bios and Topics

Jim Balhoff

Jim Balhoff is a research programmer in the Informatics group at the National Evolutionary Synthesis Center (NESCent), where his work aims to develop applications applying semantic technology within evolutionary biology. His primary work is in the context of the Phenoscape project, where he is the lead developer of the Phenex annotation tool and the Phenoscape Knowledgebase, and is also a developer with the Hymenoptera Anatomy Ontology project, developing semantic approaches for taxonomic descriptions. He is an active contributor to various open source projects and working groups related to biological ontologies and phyloinformatics.

Taxonomic descriptions (publications that present new species) contain a wealth of qualitative descriptive information about the physical characteristics of the vast diversity of organisms on Earth. However, the content in these descriptions is composed in natural language, scattered across thousands of publications and not available for bulk query, and therefore consumed almost exclusively by other taxonomists. We would like to alter the practice of taxonomy to make these original descriptions available as computable data, providing for cross-publication linkages and intelligent queries. We are applying Semantic Web technologies such as RDF and OWL ontologies to taxonomy to make this wealth of data accessible to all areas of biology. Several hurdles must be overcome, including educating taxonomists about semantic technologies, developing required resources such as anatomy ontologies, and building software tools facilitating new taxonomic workflows.

Joseph Busch

Mr. Busch helps organizations develop metadata frameworks and taxonomy strategies to ensure that content realizes its highest value through re-use and re-purposing. He has extensive knowledge and experience developing content architectures consisting of metadata frameworks, taxonomies and other information management methods to implement effective information management applications such as search engines, portals, web sites, content management systems, digital asset management systems, document management systems, knowledge management systems, e-learning, e-government and e-commerce. Mr. Busch was President of the American Society for Information Science and Technology in 2001, and was a member of the Dublin Core Metadata Initiative Board of Directors from 2002-2008.

Research goals:

In 2008 I visited UNC and gave a talk about the simple YouTube taxonomy that makes it possible to browse the huge video archive. Social networking services are based on well-structured descriptions of individuals and organizations from which potential relationships can be inferred and made explicit via transactions, all based on named-entity triples with simple, commonly understood and standardized relationship labels. The clever hashtag (#tag) indexing method that evolved on Twitter has been adapted by social media websites and has proven especially effective in activism, politics and merchandising. What patterns and practices are emerging across social metadata? How are people and organizations using social metadata to maximize their messaging? How are individuals finding content that they are interested in?

Jenn Riley

Jenn Riley is the Head of the Carolina Digital Library and Archives at the University of North Carolina at Chapel Hill. In this position she leads a department that combines digital technologies with library and archival collections to support the work of scholars, students, and librarians at UNC and beyond. In this role, Jenn also works to enhance faculty digital research and scholarship, builds partnerships to advance the state of the art in digital libraries, and develops sustainable and streamlined workflows for the publication of digital content. Prior to arriving at the University of North Carolina in 2010, Jenn was the Metadata Librarian with the Indiana University Digital Library Program.

At the Metadata Marathon, Jenn will stand up on a soapbox and alternately preach, lecture, and rant about where library metadata efforts need to go in order to provide useful services to our patrons in the 21st century.

Derek Rodriguez

Derek Rodriguez serves as a Program Officer with the Triangle Research Libraries Network. In almost two decades in the field of library automation he has worked on a wide variety of catalog and metadata related projects. Since 2007 he has served as the project manager for the TRLN Endeca Project. Derek is also a Ph.D. student at the School of Information and Library Science. His research and teaching interests include the value and impact of libraries, methods for exploring information seeking and use behaviors, academic libraries and information services for higher education, and managing information technologies in libraries.

Metadata interoperability and next generation discovery: In recent years, libraries have replaced legacy online catalogs with next generation discovery systems. ‘Next gen’ systems can harvest and index metadata in a wide variety of formats from across the network, unlike their legacy predecessors, which were limited to indexing local databases of MARC records. In this lightning talk, I will present ways the Triangle Research Libraries Network has used its next generation discovery system, Search TRLN, to share metadata across institutions within the consortium. Benefits of sharing metadata include expanded discovery options for library users and reduced metadata maintenance costs within the consortium.

Potential research questions related to metadata sharing include:
- Under what conditions can libraries share metadata?
- What lessons from this experience can inform network-level metadata sharing initiatives?
- What are the implications of sharing the responsibility for metadata quality within a consortium?
- What skill-sets are needed among metadata managers who maintain or make decisions to use shared metadata?

Jill Sexton

Jill Sexton is Information Infrastructure Architect at UNC Chapel Hill’s Davis Library. She manages the technical development of the Carolina Digital Repository and UNC’s online catalog. Prior to this, she held various positions within the UNC Libraries, including Applications Analyst, Systems Librarian, and Digitization Projects Librarian at Documenting the American South. Jill’s professional interests include interoperable systems development, digital preservation, information discovery, and IT project management.

The selection of a metadata schema for a repository system is influenced by several factors: the nature of the collections and resources to be managed; the access needs of end users; staff expertise; and system limitations. When the objects to be managed are similar in nature, or the repository software can only handle a specific metadata schema, the selection process is usually simple. When the nature of the content is diverse and the system does not limit you to a single schema, the choice becomes less clear. The Carolina Digital Repository (CDR) collects heterogeneous materials with varying levels of organization and description, divergent access requirements, and different metadata schemas. I will speak about the challenges we encounter as we work to normalize these data within our repository and ensure interoperability with other systems in the future.

Ryan Shaw

I’m an assistant professor in the School of Information and Library Science at the University of North Carolina at Chapel Hill. My research focuses on how events and periods are used as conceptual structures for digitally organizing narrative information across different types of media, and how information systems might better support the expression of multiple perspectives on events and their relationships.

The word “metadata” implies something separate from the “data.” In the case of digitized books, “metadata” is the kind of stuff typically found in MARC records, while the “data” is the actual book text and/or page images. Yet recent trends in humanist scholarship using algorithmic text analysis suggest that this separation is artificial, and may actually create a barrier to certain kinds of study. Using these trends as a point of departure, I would like to consider how far we can expand or abuse our traditional notions of what metadata is.

Seth Shaw

Seth Shaw is the Electronic Records Archivist for Duke University Archives, where he is responsible for everything born-digital in both the University Archives and Rubenstein Rare Book & Manuscript Library. He received his Bachelor of Science in Information Systems from Brigham Young University – Idaho in 2005 and his Master of Science in Information, Archives & Records Management from the University of Michigan’s School of Information in 2007. Seth teaches the “Managing Electronic Records in Archives & Special Collections” workshop for SAA and has presented at various conferences on several topics. He is also the creator of Duke’s Data Accessioner software.

Metadata and born-digital archival records:

Despite the advances in metadata for digital materials in libraries and archives, the metadata standards and practices for born-digital electronic records have not kept pace. Technical metadata is well understood for discrete file objects, whereas access control, relationship/ecosystem, and aggregate content description metadata are not. Current metadata standards provide a strong foothold but may be incomplete, incompatible, or not scalable. We need research that addresses the issues above, provides proofs of concept within this domain, establishes benchmarks, and guides us toward practical “best practice.”

Thomas Baker

Thomas Baker, a member of DCMI administrative committees since 1998, served from May 2005 to January 2009 as the DCMI Director of Specifications and Documentation and currently serves as the Chief Information Officer of DCMI Ltd. He is the chair of the DCMI Usage Board, which he founded in 2001, and co-chair of the DCMI Architecture Forum. He has previously worked as a digital library researcher at the German National Research Center for Informatics, GMD (later Fraunhofer Society) in Bonn and the State and University Library in Göttingen. He holds an M.S. in library science from Rutgers University and a Ph.D. in anthropology from Stanford University, and taught for two years at the Asian Institute of Technology in Bangkok. He has worked as an activity lead in projects funded by the EU (e.g., SCHEMAS, CORES, DELOS Network of Excellence) and the German National Science Foundation, DFG (KIM). From 2006 to 2009 he co-chaired the Semantic Web Deployment Working Group of the World-Wide Web Consortium (W3C) and currently co-chairs the W3C Library Linked Data Incubator Group.