INLS 235

Day 3

1/22/2003

 

1. One minute papers (none from last week)

Points

Sharium concept

DLs as sharing services

DLs expand/change notions of collection and reference

DLs seem to depend on R&D in other areas  (e.g., IR)

 

Questions

Augmentation? (vs amplification)

Collective intelligence?

How can a DL publish without charging for service?  Free vs fee?

Quality control? Public contributions?

Agile views examples?

 

2. Collection Management (Product Management) and Collection Development

            Basic functions: Add, Delete, Maintain/update  (DBMS 101), compare: Davis Library, BLS, Wal-Mart

            Management includes: policy, selection, acquisition, inventory control, evaluation/updating

            How do libraries decide what to acquire?

                        User recommendations (e.g., patron requests, faculty recs)

                        Bibliographer actions

                                    Ask users

                                    Do usage analysis, citation analysis

                                    Participate in communities of interest (peer recommendations)

                                    Read reviews

                                    Develop and maintain profiles

                        Jobbers:  input profile, acquire and deliver new materials (including adding value such as MARC records)

            Constraints on collection development?  (e.g., space, $, IP rights, time to process, mission, user base)

            In DLs, same basic functions, decisions, and constraints, but different parameters.  E.g., space_d < space_p (digital vs. physical space).  Agree?
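The three basic functions (add, delete, maintain/update) can be sketched against a small database. This is only an illustration: the table layout, column names, and function names are assumptions made for the example, not a description of any library system mentioned in these notes.

```python
import sqlite3


def open_collection():
    """Create an in-memory catalog; the 'items' schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "format TEXT NOT NULL, "
        "location TEXT)"
    )
    return conn


def add_item(conn, title, fmt, location):
    """Add: insert a new item and return its id."""
    cur = conn.execute(
        "INSERT INTO items (title, format, location) VALUES (?, ?, ?)",
        (title, fmt, location),
    )
    conn.commit()
    return cur.lastrowid


def update_location(conn, item_id, location):
    """Maintain/update: record a new shelf or server location."""
    conn.execute("UPDATE items SET location = ? WHERE id = ?", (location, item_id))
    conn.commit()


def delete_item(conn, item_id):
    """Delete: withdraw an item from the collection."""
    conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
    conn.commit()


def list_items(conn):
    """Return all items as (id, title, format, location) tuples."""
    return conn.execute(
        "SELECT id, title, format, location FROM items ORDER BY id"
    ).fetchall()
```

The same calls serve a physical or a digital item; only the parameter values differ (a shelf mark versus a server path), which is the point made above about DLs.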

 

 

Some collection development issues

User driven vs. collection driven  (all libraries—physical and digital—have elements of both)

            Most public libraries, academic libraries are user driven.

            Many special libraries are collection driven

            Many DL projects are collection driven  (funding drives much of this)

 

How do we decide WHAT to include?  (CD policy, the Khoo paper and review policies related to conceptions of DL)

            Content (evaluation? Popularity? Webs of trust?)

            Metadata

How do we decide HOW to provide access?  (e.g., closed/open stacks; search, format, display options)

            I suggest that in today’s state of evolution, DLs face many more decisions about the HOW than physical libraries do

 

The acquisition process.  Businesses exist to serve physical libraries; what about DLs?

IP issues are the strongest constraint on DL development

 

Compare docsouth, ibiblio, and Perseus DLs

 

Ibiblio  www.ibiblio.org

Documenting the American South  http://docsouth.unc.edu/index.html

Perseus www.perseus.tufts.edu

 

DS                          Ibiblio                       Perseus
Embedded                    self-contained                self-contained
Library model               Internet model                Hypertext model
  Ed board, strong eval     open                          ed board + convenience
  standard bib records      minimal metadata, post hoc    custom metadata
  persistence high          ephemeral                     persistence promising
Added value minimal         Added value minimal           Added value high
  Access, indexing, spell   access                        access, translation, text/images,
                                                          custom concordances, tools, maps

 

3. Discuss readings as examples of special CD issues:

Khoo:  quality and authority

review policy situated within negotiations about what a DL is

librarian as ‘community delegated curator’ in DLESE as library

librarian as manager of automated technologies in DLESE as digital artifact

various kinds of quality (content, usability, reviewers/community)

the DLESE solution as a compromise between ideal and practical

DLs not simply converting library structures and functions; not simply building tools to do library-like functions.

 

Bergmark: automatic collection building

Find and integrate the hand crafted collections

Web crawler as CD tool (beyond its other uses)

NSDL description (compare to a book jobber)

Apply IR techniques (searching and clustering): seed URLs, analyze to build centroids, merge to create dictionary that directs the focused crawl

Note many CD policy decisions (use TF/IDF weights, ignore documents with <4 overlaps with centroids, when to stop crawl, etc.)
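The centroid and threshold decisions noted above can be made concrete with a small sketch. This is not Bergmark's implementation: the function names, the smoothed IDF formula, the whitespace tokenizer, and the reading of "overlap" as the number of terms a page shares with a centroid are all assumptions made for illustration. Only the use of TF/IDF weights, centroids built from seed pages, and the cutoff of 4 overlaps come from the notes.

```python
import math
from collections import Counter

MIN_OVERLAP = 4  # from the notes: ignore documents with <4 overlaps with centroids


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real text processing."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per document (smoothed IDF, an assumption)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {}
        for term, count in counts.items():
            tf = count / len(toks)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def centroid(vectors, top_k=10):
    """Average the term weights of the seed pages and keep the top_k terms."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    averaged = {term: weight / len(vectors) for term, weight in total.items()}
    ranked = sorted(averaged.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


def overlap(doc_text, centroid_terms):
    """Count distinct terms the document shares with the centroid."""
    return len(set(tokenize(doc_text)) & set(centroid_terms))


def accept(doc_text, centroid_terms, min_overlap=MIN_OVERLAP):
    """Policy decision: keep a crawled page only if it overlaps enough."""
    return overlap(doc_text, centroid_terms) >= min_overlap


if __name__ == "__main__":
    # Toy seed pages standing in for the text fetched from seed URLs.
    seeds = [
        "plate tectonics moves continental crust and causes earthquakes",
        "earthquakes occur where plates of crust collide",
        "volcanoes and earthquakes mark the boundaries of crust plates",
    ]
    topic = centroid(tfidf_vectors(seeds), top_k=12)
    page = "earthquakes and volcanoes shape the crust near plate boundaries"
    print(overlap(page, topic), accept(page, topic))
```

Each constant here (top_k, MIN_OVERLAP, the IDF smoothing) is a collection development policy choice expressed as code, which is why the crawler can be compared to a book jobber working from a profile.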

 

4. Acquisition and Digitization

Digitization and management processes (and associated costs) flow from the CD and Acquisition policies and procedures 

Digitize AND markup (index/tag/catalog)

 

See http://www.stoa.org/guides/ for guides to photography/images, GPS coding, QTVR etc.

See http://www.oasis-open.org/cover/sgml-xml.html for Open Standards information (SGML, XML etc.)

See http://sunsite.berkeley.edu/SGML/ for an intro/overview of SGML

See http://www.tei-c.org for Text Encoding Initiative

See http://lcweb.loc.gov/ead/ for Encoded Archival Description

 

Doc South outsources digitization: double key rather than OCR.  ~$2/page with markup (=> about $1K/book)

16 mm film to VHS about $0.20/foot

VHS to AVI or MOV, need computer and card and time
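The cost figures above can be turned into a quick estimator. This is a rough sketch: the rates come from the notes, but the function names are made up for the example, and the figure of about 500 pages per book is only what the $2/page and $1K/book numbers imply, not something stated directly.

```python
COST_PER_PAGE_USD = 2.00  # from the notes: ~$2/page, double-keyed with markup
COST_PER_FOOT_USD = 0.20  # from the notes: 16 mm film to VHS


def book_cost(pages, per_page=COST_PER_PAGE_USD):
    """Estimated cost to double-key and mark up one book."""
    return pages * per_page


def implied_pages(total_cost, per_page=COST_PER_PAGE_USD):
    """Page count implied by a total cost at the given per-page rate."""
    return total_cost / per_page


def film_transfer_cost(feet, per_foot=COST_PER_FOOT_USD):
    """Estimated cost to transfer a reel of 16 mm film to VHS."""
    return feet * per_foot


if __name__ == "__main__":
    print(implied_pages(1000))      # pages per book implied by ~$1K/book
    print(book_cost(500))           # cost of a 500-page book
    print(film_transfer_cost(400))  # cost of a hypothetical 400-foot reel
```

The VHS to AVI or MOV step has no per-unit rate in the notes (it needs a computer, a capture card, and staff time), so it is left out of the estimator.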

 

5. Possible projects

Doc South Projects (from Natasha Smith)

 

1. We have several oral history interviews (in North Carolina Experience project.) Right now they are just humongous MP3 files associated with encoded transcriptions. Would anybody be interested in experimenting with SMIL or something better?

14 interviews are available at:

http://docsouth.unc.edu/nc/econ.html#oralhist

(http://docsouth.unc.edu/nc/aaron/menu.html as one example)

 

2. If anybody is interested in learning more about Web users, they might think about something like "Towards Web Site User's Profile or understanding DocSouth users: log file analysis."

 

 

UNC Digital Library Projects (from Hugh Cayless)

 

1) Usage tracking

                This would involve working with diglib staff to add logging hooks to existing components and to write code that would harvest logged events. The logging could be done in the database or in a combination of the database and logfiles.  We need, for example, views of how many people access the site per day, in what patterns, etc.  We'd be flexible in the format of the project.  Java is preferable, but anything that will run under Apache and that isn't proprietary would probably be acceptable.
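As a starting point for the "how many people access the site per day" view, here is a minimal sketch that summarizes Apache access logs in Common Log Format. Several things are assumptions: that the server writes Common Log Format, that distinct client hosts are an acceptable rough proxy for people, and the function name daily_usage. Python is used for brevity even though the project description prefers Java.

```python
import re
from collections import defaultdict
from datetime import datetime

# Apache Common Log Format: host ident authuser [time] "request" status size
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def daily_usage(lines):
    """Return {ISO date: {"requests": n, "distinct_hosts": m}} for parsable lines."""
    requests = defaultdict(int)
    hosts = defaultdict(set)
    for line in lines:
        match = CLF.match(line)
        if not match:
            continue  # skip lines that are not in Common Log Format
        try:
            stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # skip lines with an unreadable timestamp
        day = stamp.date().isoformat()
        requests[day] += 1
        hosts[day].add(match.group("host"))
    return {
        day: {"requests": requests[day], "distinct_hosts": len(hosts[day])}
        for day in sorted(requests)
    }


if __name__ == "__main__":
    # Made-up log lines for illustration only.
    sample = [
        '10.0.0.1 - - [22/Jan/2003:09:15:02 -0500] "GET /index.html HTTP/1.0" 200 5120',
        '10.0.0.2 - - [22/Jan/2003:09:20:41 -0500] "GET /index.html HTTP/1.0" 200 5120',
        '10.0.0.1 - - [23/Jan/2003:11:00:00 -0500] "GET /index.html HTTP/1.0" 304 -',
    ]
    for day, counts in daily_usage(sample).items():
        print(day, counts)
```

Access patterns (paths followed within a visit, repeat visitors) would need session reconstruction on top of this, which is where the database logging hooks described above would help.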

 

2) UI building

                We will need a generic guest UI that allows searching / browsing but lacks the full functionality of the application.  This would pretty much have to be done in Java, preferably Cocoon, but I'm happy to work with students to get them started.  There's also the strong possibility of department-specific interfaces, in which a department that owns a diglib collection will be able to access their digital objects within a framework that either lives on a platform they host, or which just looks like their website.  Another aspect of this might be an interface that is accessible to the visually impaired.

 

Southern Oral History Program (from Kerry Taylor)

 

In late September I recorded fifteen interviews and some of the proceedings of the Convention of the United Electrical Radio and Machine Workers Union (UE) in Raleigh. It was the first time a national union has held its convention in the South (excluding Miami) since 1886, when the Knights of Labor met in Richmond. The SOHP thought it would be a good idea to document this historic occasion, and UE150, the North Carolina local, was most supportive.

 

 

What we'd like to do (with the union's encouragement and support) is to make some of the information available via a website--that might include documents, transcripts, soundclips, photos and possibly video from the convention. Beth Millwood and I were wondering whether a SILS student might be interested in working on a project like this. We have all of the content and a few ideas, but none of the web expertise.

 

The Ferris Collection (as we discussed in class)

 

1. Use the 'Give my poor heart ease' video as the core of an 'exhibit' for a DL

 

2. Other themes (Jean Ferguson is a SILS student in Dr. Ferris' storytelling class this semester and will develop a project for the story telling materials in the collection)

 

Open Video Project

 

1. We have focused on text and video and would like to do more with the audio in video.  An exploratory project would be welcome.

 

2. Meng Yang has been developing a tool to help indexers catalog videos. An evaluation of the tool would be a good project.

 

Government Statistics Project

1.             A project could define the special requirements of statistics as digital objects in a DL

 

 

2.             Ways to develop and test dynamic help in DLs (work with the team doing this already, or adapt their techniques to another DL)

 

6. Readings for next week

 

1. Wactlar, et al. (1999). Lessons learned from building a terabyte digital video library.  IEEE Computer, 32(2), 66-73.

2. Smeaton, A., Murphy, N., O’Connor, N., Marlow, S., Lee, H., McDonald, K., Browne, P. & Ye, J. (2001). The Fischlar digital video system: A digital library of broadcast TV programmes.  Proceedings of JCDL 2001, p. 312-313.  (ACM DL)

Optional: Myers, B., Casares, J., Stevens, S., Dabbish, L., Yocum, D., & Corbett,  A.  (2001). A multi-view intelligent editor for digital video libraries.  Proceedings of JCDL 2001 p. 106-115.  (ACM DL).

 

7. One-minute paper

      What was the main point you learned in class today?

      What is the main, unanswered question you leave class with today?