INLS 210-89
Day 6
2/28/00
1. One-minute papers
Big Point
Auto-indexing
Auto-summarization
Questions
Ongoing questions about roles/balances of humans and machines
Given the ambiguities of language, why move away from controlled vocabs?
Do the vectors need to be updated every time you add a document? (tf/idf weights yes)
Problem with titles with “catchy” words?
What is WordNet?
What are the effects of proprietary search engines on overall progress in the field (open source?)
More about semantic indexing
Confirm that terms in doc vector are unordered (i.e., bag of words—we lose order)
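The two vector questions above (re-weighting when a document is added, and loss of word order) can be illustrated with a minimal sketch. It assumes a naive whitespace tokenizer and the plain log(N/df) form of idf; real systems typically use other tokenizers and weighting variants, and the function names here are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()

def idf(term, collection):
    """log(N / df): N documents in the collection, df of them contain the term."""
    df = sum(1 for doc in collection if term in doc)
    return math.log(len(collection) / df) if df else 0.0

def tfidf_vector(doc, collection):
    """Map each term to tf * idf. A dict of terms carries no word order."""
    tf = Counter(doc)
    return {term: count * idf(term, collection) for term, count in tf.items()}

# Two-document collection: "library" occurs in both, so its idf is log(2/2) = 0.
collection = [tokenize("digital library search"), tokenize("library catalog")]
before = tfidf_vector(collection[0], collection)

# Adding a third document changes N and df, so the first document's weights shift.
collection.append(tokenize("search engine index"))
after = tfidf_vector(collection[0], collection)

# Same words in a different order give exactly the same vector (bag of words).
same_bag = tfidf_vector(tokenize("search library digital"), collection)
```

Only the idf part depends on the rest of the collection; the raw term counts for a document stay fixed, which is why the tf/idf weights (not the term lists) need refreshing.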
2. The Digital Librarian’s Toolkit
Michael Levi's BLS wish list for backend tools
Guiding principles: don't release early, be correct, don't release late, release equitably
1. Hardware: automatic failure detection and switch-over (need cheap, easy-to-configure solutions)
2. Database: data replication across machines & backups, data loading schedules, query optimization (many concurrent users running complex queries)
3. Configuration management: testing tools; version control (system AND apps) including fixes/patches; cross-platform!; installation tools (all or nothing--finish all machines or back out), uninstall tools
4. Security: intrusion prevention; intrusion detection & analysis; safer defaults (how did they get in and what did they change? Right now, 3-7 logs must be examined manually)
5. Site analysis tools: log analysis; session tracking; site map creation; search analysis (e.g., parse queries)
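As a rough illustration of the search analysis item (parsing queries out of server logs), here is a small sketch. It assumes Common Log Format lines and a search URL whose query parameter is named "q"; the sample log lines, the /search path, and the function name are made up for the example and are not BLS data.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def query_terms(log_line, param="q"):
    """Return the search terms in one log line, or [] if no query is present."""
    try:
        request = log_line.split('"')[1]   # e.g. 'GET /search?q=cpi+data HTTP/1.0'
        url = request.split()[1]
    except IndexError:
        return []                          # malformed line: skip it
    values = parse_qs(urlsplit(url).query).get(param, [])
    return [term.lower() for value in values for term in value.split()]

# Hypothetical sample lines in Common Log Format.
log = [
    '10.0.0.1 - - [28/Feb/2000:10:00:00 -0500] "GET /search?q=CPI+data HTTP/1.0" 200 512',
    '10.0.0.2 - - [28/Feb/2000:10:01:00 -0500] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.1 - - [28/Feb/2000:10:02:00 -0500] "GET /search?q=cpi+tables HTTP/1.0" 200 498',
]

# Tally how often each term was searched for.
term_counts = Counter(term for line in log for term in query_terms(line))
```

Counting terms is only the simplest summary; session tracking would additionally group lines by client and time.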
Komlodi, Marchionini, & Plaisant wish list
1. Objects/items
CD tools: filtering, validating accuracy, authority, authentication
Loading, exporting
Digitization: scanning, OCR, keyframe extraction, imaging
Object naming and addressing
Redundancy checking
Storage/refreshing/migrating
File format helps (e.g., Unicode)
File helps: format (e.g., gif vs tiff), version number, item format (gif can be image of text or picture), item level (bib record, note, picture, etc.)
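One possible approach to the redundancy checking item in this list is to group objects by a content hash. The sketch below assumes that "redundant" means byte-for-byte identical and that objects are available as bytes in memory; the object names and the function name are hypothetical.

```python
import hashlib
from collections import defaultdict

def find_duplicates(items):
    """items: mapping of object name -> bytes.
    Returns lists of names whose content is byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, data in items.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]

# Hypothetical collection: two identical files and one distinct file.
items = {
    "scan001.gif": b"same image bytes",
    "scan001-copy.gif": b"same image bytes",
    "scan002.tiff": b"different image bytes",
}
duplicates = find_duplicates(items)
```

A hash check like this will not catch the same picture stored in two formats (gif vs tiff), which is where format and item-level information about each file becomes necessary.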
2. Working with objects and collections
Directory structure tools (e.g., IBM DL separates object server from metadata server); WebToc
Browsers for special types (image browser, page image browser)
Tools for special types (key frame extraction, speech to text, text to speech)
Document conversion: GIF converters, SGML to HTML, etc.
Indexing (text, multimedia)
Link metadata with primary data (multiple layer dbms)
3. Metadata
Standards (e.g., Dublin core)
Conversion (e.g., EAD to MARC, postscript to PDF, RTF)
Self-describing objects
4. Users
Needs assessment tools and procedures
User profile builders/manager
Logging tools, client side? Standard formats? Analysis tools
Reference services
FAQ
FAQ with updates
Listserv scanners for local, community service
Help/suggestions
Tours/paths/guides
Public scheduler
Query parsers and forwarding schedules
Referral tools
User communication (online discussion, collab filtering, suggestions, shared ps)
5. Management (backend)
editors (HTML, SGML, XML, etc.)
templates for style guides
style checkers
automatic platform simulators (browsers, settings, etc.)
item gathering and labeling tools
site mapping with alternative views: relationships such as function, in and out links, user behavior
version control (backups, new versions, auto what's new, old versions, archives, broken links, etc.)
link checker (broken, updated)
bug reporter (email, auto content analysis?)
move web sites across servers
log analysis (summary + sequential)
renaming pages, moving pages (auto update all related)
site reorg tools
alerts for errors
garbage
collection storage routines
encryption/decryption
watermarking tools
authority control tools (names, dates)
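To make the link checker item in the list above concrete, here is a minimal sketch that looks for broken internal links. It assumes the site is held in memory as a mapping from absolute paths to HTML, and it only checks links that begin with "/"; external links and relative paths are ignored. The page names and the broken_links function are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def broken_links(site):
    """site: mapping of path -> HTML. Returns (page, href) pairs for
    internal links that point at a path not present in the site."""
    broken = []
    for page, html in site.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            if href.startswith("/") and href not in site:
                broken.append((page, href))
    return broken

# Hypothetical two-page site with one dangling internal link.
site = {
    "/index.html": '<a href="/about.html">About</a> <a href="/old.html">Old</a>',
    "/about.html": '<a href="/index.html">Home</a> <a href="http://www.htdig.org/">ht://Dig</a>',
}
report = broken_links(site)
```

The same link table could also feed the renaming/moving item: once every (page, href) pair is known, pages pointing at a moved path can be found and updated.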
3. Search
Lots of tools
Htdig http://www.htdig.org/
IRISWeb http://ils.unc.edu/iris/
What is the BEST search service? Why?
There is a trend toward hybrid query and selection services
Netscape provides a service plus options for Excite (how is that different from the Netscape service that uses the Excite engine?), Lycos, Snap, InfoSeek, etc.
Also specialized services such as:
Search the Web
HotBot
AOL NetFind
WebCrawler
Find Web Sites
GoTo.com
Explore by Topic
100hot: Stars, Pics, Jokes...
Electric Library
FindLaw: Free Case Law
Mining Co. Expert Guides
Virtual Job Fair: 25,000 jobs
Yahoo!
Find a Product
AUTOWEB.COM Low Cost Autos
eBay - Buy, Sell, Collect.
CNET's Most Popular Products
Thomas Register American Mfrs.
Find a Service
Mortgage Quotes: Today's Rates
Work-At-Home 24,000 JOBS
HomeSmart Real Estate Center
Find a Person
Netscape People Finder
WhoWhere? People Finder
Bigfoot People Search
Find a Business
YellowPages AtHand
Netscape Yellow Pages
How is Infonautics' Electric Library different from Netscape?
http://www.elibrary.com/id/250/250/
4. One-minute paper
What was the main point you learned in class today?
What is the main, unanswered question you leave class with today?