INLS 210-89

Day 6

2/28/00

 

1. One-minute papers

Big Point

Auto-indexing

Auto-summarization

 

Questions

Ongoing questions about roles/balances of humans and machines

Given the ambiguities of language, why move away from controlled vocabs?

Do the vectors need to be updated every time you add a document? (tf/idf weights yes)

Problem with titles with “catchy” words?

What is WordNet?

What are the effects of proprietary search engines on overall progress in the field (open source?)

More about semantic indexing

Confirm that terms in doc vector are unordered (i.e., bag of words—we lose order)
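A minimal sketch of the two vector questions above, assuming the textbook weighting tf * log(N / df) (real engines use many variants). The function names and the three toy documents are made up for illustration. It shows that a document vector ignores word order, and that adding one document to the collection changes the idf part of the weights for documents already indexed.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Return term counts for a text; word order is discarded."""
    return Counter(text.lower().split())

def idf(term, docs):
    """Inverse document frequency log(N / df); 0.0 if no document has the term."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tfidf_vector(doc, docs):
    """Weight each term of one document by tf * idf over the whole collection."""
    return {term: tf * idf(term, docs) for term, tf in doc.items()}

# Toy collection (hypothetical documents).
docs = [bag_of_words("digital library tools"),
        bag_of_words("library search tools")]

# Same words in a different order give the same bag.
same_bag = bag_of_words("tools library digital") == docs[0]

# Weights for the first document before and after a new document arrives.
before = tfidf_vector(docs[0], docs)
docs.append(bag_of_words("digital video search"))
after = tfidf_vector(docs[0], docs)

weights_changed = before != after
```

In this toy case "library" appears in every document at first, so its weight is zero; once a document without "library" is added, the weight becomes positive. That is why the idf component has to be recomputed (or refreshed periodically) as the collection grows.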

 

2. The Digital Librarian’s Toolkit

Michael Levi's BLS wish list for backend tools

            Guiding principles: don't release early, be correct, don't release late, release equitably

1. Hardware: automatic failure detection and switch-over (need cheap, easy-to-configure solutions)

2. Database: data replication across machines & backups, data loading schedules, query optimization (many concurrent users running complex queries)

3. Configuration management: testing tools; version control (system AND apps) including fixes/patches; cross-platform!; installation tools (all or nothing--finish all machines or back out), uninstall tools

4. Security: intrusion prevention; intrusion detection & analysis; safer defaults (how did they get in and what did they change?  Right now, 3-7 logs must be examined manually)

5. Site analysis tools: log analysis; session tracking; site map creation; search analysis (e.g., parse queries)
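A rough sketch of the "parse queries" idea in item 5, using only the Python standard library. The request paths and the parameter name q are hypothetical; a real site would pull the paths from its own server logs, and its search form may use a different parameter.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Hypothetical request paths, as they might appear in a web server log.
requests = [
    "/search?q=consumer+price+index",
    "/search?q=unemployment+rate",
    "/about.html",
    "/search?q=Unemployment+Rate&page=2",
]

def extract_query(path, param="q"):
    """Return the lowercased search string from a request path, or None."""
    values = parse_qs(urlsplit(path).query).get(param)
    return values[0].lower() if values else None

# Keep only the requests that carried a search string.
queries = [q for q in map(extract_query, requests) if q]

# Most frequent whole queries and most frequent individual terms.
query_counts = Counter(queries)
term_counts = Counter(term for q in queries for term in q.split())
```

Counting whole queries shows what users ask for most often; counting individual terms hints at vocabulary the site's indexing should cover.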

 

Komlodi, Marchionini, & Plaisant wish list

1.      Objects/items

CD tools: filtering, validating accuracy, authority, authentication

Loading, exporting

Digitization: scanning, OCR, keyframe extraction, imaging

Object naming and addressing

Redundancy checking

Storage/refreshing/migrating

File format helps (e.g., Unicode)

File helps: format (e.g., gif vs tiff), version number, item format (gif can be image of text or picture), item level (bib record, note, picture, etc.)

2.      Working with objects and collections

Directory structure tools (e.g., IBM DL separates object server from metadata server); WebToc

Browsers for special types (image browser, page image browser)

Tools for special types (key frame extraction, speech to text, text to speech)

Document conversion: GIF converters, SGML to HTML, etc.

Indexing (text, multimedia)

Link metadata with primary data (multiple layer dbms)

3.      Metadata

Standards (e.g., Dublin core)

Conversion (e.g., EAD to MARC, postscript to PDF, RTF)

Self-describing objects

4.      Users

Needs assessment tools and procedures

User profile builders/manager

Logging tools, client side? Standard formats? Analysis tools

Reference services

            FAQ

            FAQ with updates

            Listserv scanners for local, community service

            Help/suggestions

            Tours/paths/guides

            Public scheduler

            Query parsers and forwarding schedules

            Referral tools

            User communication (online discussion, collab filtering, suggestions, shared ps)

5.  Management (backend)

editors (HTML, SGML, XML, etc.)

templates for style guides

style checkers

automatic platform simulators (browsers, settings, etc.)

item gathering and labeling tools

site mapping with alternative views: relationships such as function, in and out links, user behavior

version control (backups, new versions, auto what's new, old versions, archives, broken links, etc.)

link checker (broken, updated)

bug reporter (email, auto content analysis?)

move web sites across servers

log analysis (summary + sequential)

renaming pages, moving pages (auto update all related)

site reorg tools

alerts for errors

garbage collection storage routines

encryption/decryption

watermarking tools

authority control tools (names, dates)
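A small sketch of the link checker item in the list above, assuming the whole site is available locally as a mapping from page name to HTML and that all links are plain relative page names. The two-page site is invented for illustration; checking external links would need actual HTTP requests, which are left out here.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Return all link targets found in one HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

def broken_links(site):
    """Return sorted (page, target) pairs whose target is not a page in the site."""
    return sorted(
        (page, link)
        for page, html in site.items()
        for link in extract_links(html)
        if link not in site
    )

# Hypothetical two-page site: page name -> HTML.
site = {
    "index.html": '<a href="tools.html">Tools</a> <a href="search.html">Search</a>',
    "tools.html": '<a href="index.html">Home</a> <a href="old.html">Old page</a>',
}

report = broken_links(site)
```

The same extracted links could also feed the site-mapping item (in and out links per page), since the link checker already builds that table as a side effect.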

 

3. Search

Lots of tools

Htdig http://www.htdig.org/

 IRISWeb http://ils.unc.edu/iris/

 

What is the BEST search service?  Why?

There is a trend toward hybrid query and selection services

 

Netscape provides its own service plus options for Excite (how is that different from the Netscape service that uses the Excite engine?), Lycos, Snap, InfoSeek, etc.

 

Also specialized services such as:

Search the Web

HotBot

AOL NetFind

WebCrawler

Find Web Sites

GoTo.com

                                                   

Explore by Topic

100hot: Stars, Pics, Jokes...

Electric Library

FindLaw: Free Case Law

Mining Co. Expert Guides

Virtual Job Fair: 25,000 jobs

Yahoo!

 

Find a Product

AUTOWEB.COM Low Cost Autos

eBay - Buy, Sell, Collect.

CNET's Most Popular Products

Thomas Register American Mfrs.

                           

Find a Service

Mortgage Quotes: Today's Rates

Work-At-Home 24,000 JOBS

HomeSmart Real Estate Center

 

Find a Person

Netscape People Finder

WhoWhere? People Finder

Bigfoot People Search

 

Find a Business

YellowPages AtHand

Netscape Yellow Pages

 

How is Infonautics' Electric Library different from Netscape?

http://www.elibrary.com/id/250/250/

 

4. One-minute paper

      What was the main point you learned in class today?

What is the main, unanswered question you leave class with today?