INLS 210-89

Day 5

2/21/00

 

1. One-minute papers

Big Point

Media object characteristics

MM indexing requires an integrated approach

New challenges for non-linguistic indexing

Compression (need for it)

Memory and how tech advances have altered it (writing, photography, audio, etc.)

Perceptual abilities and characteristics are also important (not only semantics)

 

Questions

How much of the preference for text-based expression is learned? (Can we shift to alternatives? What about other cultures?)

Can we develop visual languages?

Is the modeling of semantics really just complex math and lots of memory—i.e., do we behave systematically?

What is the difference between electronic and digital?

What is the relationship between memory and intelligence?  Would people with good memories still excel in a DL-augmented environment?

How will copyright be managed in an online environment?

 

2. Project updates: I have email from most of you.

Reminder: DL reviews are due March 20. Send me email about which DL you will review.

 

3. Indexing, Storage, Maintenance and Access

3.1. Basics of retrieval (T set on G drive)

3.2. Discuss Thomas paper.

                        Database

                        Indexing (note the many

                        Usage

                        Term weighting

                        Look at current version www.loc.gov
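A small illustration for the term weighting item above: a minimal sketch of one common scheme, TF-IDF (term frequency times inverse document frequency). This is not necessarily the weighting described in the Thomas paper; the function name `tf_idf` and the three sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in the document)
             * log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical mini-collection (illustrative only).
docs = [
    "library of congress legislation",
    "digital library indexing",
    "library catalog",
]
w = tf_idf(docs)
```

A term that occurs in every document ("library" here) gets weight zero, while a term found in only one document ("congress") gets the largest weight. That is the basic intuition: rare terms discriminate between documents better than common ones.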

 

3.3. The Digital Librarian’s Toolkit

Michael Levi's BLS wish list for backend tools

            Guiding principles: don't release early, be correct, don't release late, release equitably

1. Hardware: automatic failure detection and switch-over (need cheap, easy-to-configure solutions)

2. Database: data replication across machines & backups, data loading schedules, query optimization (many concurrent users running complex queries)

3. Configuration management: testing tools; version control (system AND apps) including fixes/patches; cross-platform!; installation tools (all or nothing--finish all machines or back out), uninstall tools

4. Security: intrusion prevention; intrusion detection & analysis; safer defaults (how did they get in and what did they change?  Right now, 3-7 logs must be examined manually)

5. Site analysis tools: log analysis; session tracking; site map creation; search analysis (e.g., parse queries)
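To make the "search analysis (e.g., parse queries)" item concrete, here is a minimal sketch that pulls search queries out of web server request lines and counts them. The request lines, the `/search` path, and the `q` parameter are all assumptions for illustration, not BLS's actual log format.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Hypothetical request lines as they might appear in a server log.
SAMPLE_REQUESTS = [
    "GET /search?q=consumer+price+index HTTP/1.0",
    "GET /search?q=unemployment+rate HTTP/1.0",
    "GET /cpi/home.htm HTTP/1.0",
    "GET /search?q=consumer+price+index HTTP/1.0",
]


def extract_queries(request_lines, path="/search", param="q"):
    """Return the lowercased query strings submitted to the search page."""
    queries = []
    for line in request_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip malformed lines
        url = urlsplit(parts[1])
        if url.path != path:
            continue  # not a search request
        # parse_qs decodes "+" and %-escapes in the query string.
        for value in parse_qs(url.query).get(param, []):
            queries.append(value.lower())
    return queries


queries = extract_queries(SAMPLE_REQUESTS)
query_counts = Counter(queries)                                # whole queries
term_counts = Counter(t for q in queries for t in q.split())   # single terms
```

Counting whole queries shows what people ask for most often; counting individual terms hints at vocabulary the site's own indexing should cover.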

 

Komlodi, Marchionini, & Plaisant wish list

1.      Objects/items

CD tools: filtering, validating accuracy, authority, authentication

Loading, exporting

Digitization: scanning, OCR, keyframe extraction, imaging

Object naming and addressing

Redundancy checking

Storage/refreshing/migrating

File format helps (e.g., Unicode)

File helps: format (e.g., GIF vs. TIFF), version number, item format (a GIF can be an image of text or a picture), item level (bib record, note, picture, etc.)
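One simple way to approach the redundancy checking item above is to compare content hashes, so that byte-identical objects are flagged even when their names differ. This is only a sketch: the item names and byte strings are invented, and exact hashing will not catch near-duplicates such as a rescanned page.

```python
import hashlib
from collections import defaultdict


def find_duplicates(items):
    """Group item names by the SHA-256 digest of their bytes.

    Returns a list of name groups that share identical content.
    """
    by_digest = defaultdict(list)
    for name, data in items.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical collection: name -> raw bytes of the object.
items = {
    "scan_001.tif": b"page image bytes",
    "scan_001_copy.tif": b"page image bytes",
    "scan_002.tif": b"another page",
}
dupes = find_duplicates(items)
```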

2.      Working with objects and collections

Directory structure tools (e.g., IBM DL separates object server from metadata server); WebToc

Browsers for special types (image browser, page image browser)

Tools for special types (key frame extraction, speech to text, text to speech)

Document conversion: GIF converters, SGML to HTML, etc.

Indexing (text, multimedia)

Link metadata with primary data (multiple layer dbms)

3.      Metadata

Standards (e.g., Dublin Core)

Conversion (e.g., EAD to MARC, PostScript to PDF, RTF)

Self-describing objects

4.      Users

Needs assessment tools and procedures

User profile builders/manager

Logging tools, client side? Standard formats? Analysis tools

Reference services

            FAQ

            FAQ with updates

            Listserv scanners for local, community service

            Help/suggestions

            Tours/paths/guides

            Public scheduler

            Query parsers and forwarding schedules

            Referral tools

            User communication (online discussion, collab filtering, suggestions, shared ps)

5.  Management (backend)

editors (HTML, SGML, XML, etc.)

templates for style guides

style checkers

automatic platform simulators (browsers, settings, etc.)

item gathering and labeling tools

site mapping with alternative views: relationships such as function, in and out links, user behavior

version control (backups, new versions, auto what's new, old versions, archives, broken links, etc.)

link checker (broken, updated)

bug reporter (email, auto content analysis?)

move web sites across servers

log analysis (summary + sequential)

renaming pages, moving pages (auto update all related)

site reorg tools

alerts for errors

garbage collection storage routines

encryption/decryption

watermarking tools

authority control tools (names, dates)
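As an illustration of the link checker item in the management list above, here is a minimal offline sketch: it collects the `href` targets on each page and reports site-relative links that point to pages not in the collection. The `site` dictionary and the page paths are hypothetical; a real checker would also fetch external URLs and handle relative paths.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def broken_links(pages):
    """pages maps a site-relative path to its HTML.

    Returns (page, target) pairs whose target starts with "/" but is
    not itself a key in pages.
    """
    broken = []
    for path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for target in collector.links:
            if target.startswith("/") and target not in pages:
                broken.append((path, target))
    return broken


# Hypothetical two-page site with one dangling link.
site = {
    "/index.html": '<a href="/about.html">About</a> <a href="/old.html">Old</a>',
    "/about.html": '<a href="/index.html">Home</a>',
}
result = broken_links(site)
```

The same page-to-links map could also feed the "renaming pages, moving pages (auto update all related)" item, since it records which pages refer to which.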

 

4. One-minute paper

What was the main point you learned in class today?

What is the main, unanswered question you leave class with today?