INLS 210-89
Day 5
2/21/00
1. One-minute papers
Big points
Media object characteristics
MM indexing requires an integrated approach
New challenges for non-linguistic indexing
Compression (need for it)
Memory and how tech advances have altered it (writing, photography, audio, etc.)
Perceptual abilities and characteristics are also important (not only semantics)
Questions
How much preference for text-based expression is learned? (Can we shift to alternatives? What about other cultures?)
Can we develop visual languages?
Is the modeling of semantics really just complex math and lots of memory—i.e., do we behave systematically?
What is the difference between electronic and digital?
What is the relationship between memory and intelligence? Would people with good memories still excel in DL-augmented env?
How will copyright be managed in online env?
Reminder: DL reviews are due March 20. Send me email about which DL you will review.
3. Indexing, Storage, Maintenance and Access
3.1 Basics of retrieval (T set on G drive)
3.2. Discuss Thomas paper.
Database
Indexing (note the many
Usage
Term weighting
Look at the current version of www.loc.gov
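Term weighting, sketched: tf-idf is one common scheme, used here only as an illustration; these notes do not say which weighting Thomas uses. The toy documents and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in the document)
             * log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy collection, not real Thomas data.
docs = [
    "bill on library funding",
    "bill on highway funding",
    "library of congress",
]
weights = tf_idf(docs)
```

In this sketch a term that appears in only one document ("highway") gets a higher weight than one shared across documents ("bill"), and a term that appears in every document gets weight zero.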
3.3. The Digital Librarian’s Toolkit
Michael Levi's BLS wish list for backend tools
Guiding principles: don't release early, be correct, don't release late, release equitably
1. Hardware: automatic failure detection and switch-over (need cheap, easy-to-configure solutions)
2. Database: data replication across machines & backups, data loading schedules, query optimization (many concurrent users running complex queries)
3. Configuration management: testing tools; version control (system AND apps) including fixes/patches; cross-platform!; installation tools (all or nothing: finish all machines or back out); uninstall tools
4. Security: intrusion prevention; intrusion detection & analysis; safer defaults (how did they get in and what did they change? Right now, 3-7 logs must be examined manually)
5. Site analysis tools: log analysis; session tracking; site map creation; search analysis (e.g., parse queries)
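Search analysis from item 5, sketched: pull the query terms out of web server log lines and count them. This assumes Common Log Format lines and a hypothetical search URL of the form `/search?q=...`; the sample lines are made up and are not BLS data.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Request part of a Common Log Format line: "METHOD /path?query PROTOCOL"
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) [^"]*"')

def query_terms(log_lines, search_path="/search", param="q"):
    """Count the search terms users typed, taken from logged request URLs."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # not a request line this sketch understands
        url = urlsplit(match.group(1))
        if url.path != search_path:
            continue  # some other page, not a search
        for query in parse_qs(url.query).get(param, []):
            counts.update(query.lower().split())
    return counts

# Hypothetical log lines for illustration only.
sample = [
    '10.0.0.1 - - [21/Feb/2000:10:00:00 -0500] "GET /search?q=consumer+price+index HTTP/1.0" 200 5120',
    '10.0.0.2 - - [21/Feb/2000:10:00:05 -0500] "GET /index.html HTTP/1.0" 200 2048',
    '10.0.0.1 - - [21/Feb/2000:10:01:12 -0500] "GET /search?q=price+data HTTP/1.0" 200 4096',
]
terms = query_terms(sample)
```

Session tracking would need more than this (grouping lines by host and time gap, for example); the sketch only covers the "parse queries" part.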
Komlodi, Marchionini, & Plaisant wish list
1. Objects/items
CD tools: filtering, validating accuracy, authority, authentication
Loading, exporting
Digitization: scanning, OCR, keyframe extraction, imaging
Object naming and addressing
Redundancy checking
Storage/refreshing/migrating
File format helps (e.g., Unicode)
File helps: format (e.g., gif vs tiff), version number, item format (gif can be image of text or picture), item level (bib record, note, picture, etc.)
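Redundancy checking, sketched: hash each object's bytes and group the names that share a digest. The object names and the function `find_duplicates` are hypothetical. This only catches byte-identical copies; a GIF and a TIFF of the same picture would not match.

```python
import hashlib
from collections import defaultdict

def find_duplicates(objects):
    """Group object names whose byte content is identical.

    `objects` maps a name to its raw bytes. Returns a list of groups,
    each group being a sorted list of names with the same SHA-256 digest.
    """
    by_digest = defaultdict(list)
    for name, data in objects.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]

# Hypothetical collection: two of the three items hold the same bytes.
collection = {
    "scan_001.gif": b"GIF89a...page one",
    "scan_001_copy.gif": b"GIF89a...page one",
    "scan_002.gif": b"GIF89a...page two",
}
duplicates = find_duplicates(collection)
```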
2. Working with objects and collections
Directory structure tools (e.g., IBM DL separates object server from metadata server); WebToc
Browsers for special types (image browser, page image browser)
Tools for special types (key frame extraction, speech to text, text to speech)
Document conversion: GIF converters, SGML to HTML, etc.
Indexing (text, multimedia)
Link metadata with primary data (multiple layer dbms)
3. Metadata
Standards (e.g., Dublin Core)
Conversion (e.g., EAD to MARC, PostScript to PDF, RTF)
Self-describing objects
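One way to make an HTML object self-describing, sketched: embed Dublin Core elements as `<meta>` tags using the `DC.element` naming convention. The record below is a hypothetical example and `dc_meta_tags` is an illustrative helper, not part of any standard toolkit.

```python
from html import escape

def dc_meta_tags(record):
    """Render a {element: value or list of values} record as HTML <meta> tags."""
    lines = []
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-item list
        for value in values:
            content = escape(value, quote=True)  # keep quotes from breaking the attribute
            lines.append('<meta name="DC.%s" content="%s">' % (element, content))
    return "\n".join(lines)

# Hypothetical record using Dublin Core element names.
record = {
    "title": 'Notes on "digital" libraries',
    "date": "2000-02-21",
    "format": "text/html",
    "subject": ["indexing", "metadata"],
}
tags = dc_meta_tags(record)
```

Repeated elements (two subjects here) simply become repeated tags.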
4. Users
Needs assessment tools and procedures
User profile builders/manager
Logging tools, client side? Standard formats? Analysis tools
Reference services
FAQ
FAQ with updates
Listserv scanners for local, community service
Help/suggestions
Tours/paths/guides
Public scheduler
Query parsers and forwarding schedules
Referral tools
User communication (online discussion, collab filtering, suggestions, shared ps)
5. Management (backend)
editors (HTML, SGML, XML, etc.)
templates for style guides
style checkers
automatic platform simulators (browsers, settings, etc.)
item gathering and labeling tools
site mapping with alternative views: relationships such as function, in and out links, user behavior
version control (backups, new versions, auto what's new, old versions, archives, broken links, etc.)
link checker (broken, updated)
bug reporter (email, auto content analysis?)
move web sites across servers
log analysis (summary + sequential)
renaming pages, moving pages (auto update all related)
site reorg tools
alerts for errors
garbage
collection storage routines
encryption/decryption
watermarking tools
authority control tools (names, dates)
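The link checker item above, sketched for internal links only: extract every `href` from each page and report the ones that point to a page the site does not have. The site is modeled as a hypothetical dict of path to HTML; external links are skipped because checking them would need network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def broken_links(site):
    """Return (page, href) pairs whose internal target is missing from `site`."""
    broken = []
    for page, html in site.items():
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            parts = urlsplit(href)
            if parts.scheme or parts.netloc:
                continue  # external link (http:, mailto:, ...): not checked here
            # Resolve the link relative to the page and drop any #fragment.
            target = urlsplit(urljoin(page, href)).path
            if target not in site:
                broken.append((page, href))
    return broken

# Hypothetical two-page site.
site = {
    "/index.html": '<a href="docs/a.html">A</a> <a href="missing.html">M</a> '
                   '<a href="http://www.loc.gov/">LoC</a>',
    "/docs/a.html": '<a href="../index.html">home</a> <a href="b.html">B</a> '
                    '<a href="#top">top</a>',
}
report = broken_links(site)
```

The same extracted links could feed the site mapping item (in and out links per page) or the "auto update all related" step after a page is renamed.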
4. One-minute paper
What was the main point you learned in class today?
What is the main, unanswered question you leave class with today?