Levels of Representation (How it works)

INLS 525: Managing Electronic Records

Week 3 (1/29)

Presenter Notes

Digital Materiality of Digital Culture

  • Mini-Assignment 2: You & Your Devices/Media

Which matters (more) to an Archivist?

Looking "...beyond the magic to the mechanism."

Looking at an "'opaque technology,' whereby people take the computer at 'interface value'"

What, after all, is being collected?

How Computers Remember

Bits will be Bits (But not for Long)

  • Physical media should be stored in appropriate environmental conditions.
  • Take care in handling of media.
  • Maintain integrity of bitstream through security, checksums, periodic audits & other validation.
  • Bit rot & advantages of newer media both call for periodic refresh & reformatting.
  • Ensuring the integrity of the bitstream in such transfers is extremely important.
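
One common way to check bitstream integrity across a transfer is to compare checksums computed before and after the copy. The following is a minimal sketch, assuming SHA-256 as the checksum; the function names `sha256_of_file` and `copy_and_verify` are illustrative, not part of any particular preservation tool.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then report whether both files have the same checksum."""
    checksum_before = sha256_of_file(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(65536), b""):
            fout.write(chunk)
    return checksum_before == sha256_of_file(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "original.bin")
        dst = os.path.join(tmp, "refreshed.bin")
        with open(src, "wb") as f:
            f.write(b"record" * 1000)
        print("copy verified:", copy_and_verify(src, dst))
```

Storing the checksum alongside the file also supports the periodic audits mentioned above: recompute it later and compare against the stored value.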

"Errors typically occur at the juncture between analog and digital states, such as when a drive's magnetoresistive head assigns binary symbolic value to the voltage differentials it has registered, or when an e-mail message is reconstituted from independent data packets moving across the TCP/IP layer of the Internet, itself dependent on fiber-optic cables and other hardwired technologies. All forms of modern digital technology incorporate hyper-redundant error-checking routines that serve to sustain an illusion of immateriality by detecting error and correcting it, reviving the quality of the signal, like old-fashioned telegraph relays, such that any degradation suffered during a subsequent interval of transmission will not fall beyond whatever tolerances of symbolic integrity exist past which the original value of the signal (or identity of the symbol) cannot be reconstituted." (p.12, emphasis mine)

Kirschenbaum, Matthew G. Mechanisms: New Media and the Forensic Imagination. Cambridge, MA: MIT Press, 2008.

Where and how does a computer store information?

  • Processor
  • CPU cache
  • ROM (Read-Only Memory)
  • RAM (Random-Access Memory)
  • Hard drive – includes both persistent & temporary or cached files (e.g. browser cache)
  • Elsewhere on a network
  • Near-line (local)
  • Off-line (local)
  • Near-line (remote)
  • Off-line (remote)

Motivations for Storage Hierarchy

  • Different forms of memory/storage have significantly different costs & performance
  • Store recent data close by, in fast, expensive, volatile storage
  • Store data that has not been used recently & is rarely used in slower, cheaper, less volatile storage

Tiered Storage

  • Matching data to appropriate storage medium, based on cost-benefit analysis and risk management, informed by knowledge of:
    • Reliability
    • Cost
    • Capacity
    • Availability
    • Size
    • Speed
  • Typically based on 3 tiers:
    • Online – "enterprise-class"
    • Near-line – "desktop-class"
    • Off-line – "archival"

Caching

  • Storing a copy of a subset of data from a slower data source to a faster (more readily available) data source
  • Examples:
    • CPU cache from main memory
    • Main memory cache from hard disk
    • Hard disk cache from CD-ROM
    • Proxy server cache from web sites

How Disks Work

Low-Level – Sectors and Clusters

  • Your computer’s processor manipulates data in the form of bitstreams, and data is stored on your computer’s hard drive as bitstreams
  • But moving the data from the hard drive to the processor depends on higher-level chunks: sectors and clusters
  • Think of mail sent to a member of a family who all live in the same house – the envelope will indicate the house address but won’t identify where that person’s bedroom is located within the house

Sectors

  • Smallest unit of storage that can be assigned an address (i.e. can be directly identified & found by the computer system)
  • Have specified size, depending on the type of storage (e.g. 2048 bytes on CD-ROM, often 512 bytes on floppy or hard drive)
  • Created when disk is low-level formatted (usually by manufacturer) with bad sectors identified by disk controller so data won’t be written to them
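
Because every sector has an address and a fixed size, finding one in a raw disk image is simple arithmetic: the byte offset is the sector number times the sector size. The sketch below assumes 512-byte sectors and an in-memory stand-in for a disk image; `read_sector` and `make_demo_image` are illustrative names.

```python
import io

SECTOR_SIZE = 512  # bytes; a typical floppy/hard drive value, 2048 on CD-ROM


def read_sector(image, sector_number, sector_size=SECTOR_SIZE):
    """Return the bytes of one sector from a file-like disk image."""
    image.seek(sector_number * sector_size)
    data = image.read(sector_size)
    if len(data) != sector_size:
        raise ValueError(f"sector {sector_number} is beyond the end of the image")
    return data


def make_demo_image(sector_count=4, sector_size=SECTOR_SIZE):
    """Build an in-memory image in which sector n is filled with byte value n."""
    content = b"".join(bytes([n]) * sector_size for n in range(sector_count))
    return io.BytesIO(content)


if __name__ == "__main__":
    image = make_demo_image()
    sector = read_sector(image, 2)
    print(len(sector), sector[:4])
```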

Clusters

  • Groups of sectors
  • Smallest unit of storage that can be tracked by the operating system
  • Size depends on operating system, type & size of storage device – often 2048 bytes (4 sectors of 512 bytes)
  • Defined during high-level formatting performed by operating system
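
Since the operating system allocates whole clusters, a file almost never fills its last cluster exactly. A small worked example, assuming the 2048-byte cluster mentioned above (4 sectors of 512 bytes); the leftover bytes in the last cluster are what the next slide calls file slack.

```python
def clusters_needed(file_size, sector_size=512, sectors_per_cluster=4):
    """Number of whole clusters the OS must allocate for a file of file_size bytes."""
    cluster_size = sector_size * sectors_per_cluster
    return -(-file_size // cluster_size)  # ceiling division


def slack_bytes(file_size, sector_size=512, sectors_per_cluster=4):
    """Bytes allocated to the file but not occupied by its content."""
    cluster_size = sector_size * sectors_per_cluster
    allocated = clusters_needed(file_size, sector_size, sectors_per_cluster) * cluster_size
    return allocated - file_size


if __name__ == "__main__":
    for size in (1, 2048, 5000):
        print(size, "bytes ->", clusters_needed(size), "cluster(s),",
              slack_bytes(size), "bytes of slack")
```

A 5000-byte file, for instance, needs three 2048-byte clusters (6144 bytes), leaving 1144 bytes unused at the end of the third cluster.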

File Slack

File slack description

Carrier, Brian. File System Forensic Analysis. Boston, MA: Addison-Wesley, 2005.

Magnetic Disk (e.g. Hard Drive or Floppy)

  • Bits stored as magnetic fields of different polarity
  • Magnetized surface of disk rotates under a read/write head
  • Divided into tracks (like rings of a tree)
  • Tracks divided into sectors and clusters
  • Windows: File Allocation Table (FAT) or Master File Table (for NTFS) indicates, for given file, what clusters contain its content
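
To illustrate how an allocation table says which clusters hold a file's content, here is a toy model in which the table maps each cluster to the next cluster of the same file. The dictionaries, the `END_OF_CHAIN` marker, and the function names are assumptions made for this sketch; real FAT volumes use reserved numeric values and an on-disk layout that is not reproduced here.

```python
END_OF_CHAIN = -1  # hypothetical marker; real FAT variants use reserved values


def cluster_chain(fat, first_cluster):
    """Follow the table from a file's first cluster to the end of its chain."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen:
            raise ValueError("allocation table contains a loop")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def read_file(fat, clusters, first_cluster, size):
    """Concatenate the file's clusters in chain order and trim to its real size."""
    data = b"".join(clusters[c] for c in cluster_chain(fat, first_cluster))
    return data[:size]


if __name__ == "__main__":
    fat = {2: 5, 5: 3, 3: END_OF_CHAIN}  # the file occupies clusters 2 -> 5 -> 3
    clusters = {2: b"ABCD", 5: b"EFGH", 3: b"IJ\x00\x00"}
    print(cluster_chain(fat, 2))
    print(read_file(fat, clusters, 2, 10))
```

The example also shows that a file's clusters need not be adjacent or in numeric order on the disk.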

Hard Drive structure.

Optical Media - CD-ROM

Optical media layers.

Volumes and Partitions

  • Volume = a storage area defined at the logical OS level, which has a single filesystem & usually resides on one disk partition
  • Partition = a division of a disk that exists at a lower level, used e.g. to set up multiple operating systems on the same computer

File System

  • Access controls
  • File names & identifiers
  • File size (length)
  • Where to find files in storage (sectors and clusters)
  • MAC times
    • Modified – when the content was last changed
    • Accessed – time file was last accessed (by person or software)
    • Changed – last time metadata changed
    • Created – (implemented inconsistently, if at all, across different file systems)
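
These timestamps can be read programmatically. The sketch below uses Python's `os.stat`; the helper name `mac_times` is illustrative. Note the platform caveat labeled in the code: `st_ctime` is the metadata-change time on Unix-like systems but has historically reported the creation time on Windows, which echoes the inconsistency noted for "Created" above.

```python
import os
import tempfile
from datetime import datetime, timezone


def mac_times(path):
    """Return a file's timestamps as timezone-aware UTC datetimes."""
    st = os.stat(path)

    def to_utc(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)

    return {
        "modified": to_utc(st.st_mtime),  # content last changed
        "accessed": to_utc(st.st_atime),  # last access (may be disabled or delayed by the OS)
        # Unix: last metadata change; Windows: historically the creation time
        "changed_or_created": to_utc(st.st_ctime),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w") as f:
            f.write("hello")
        for label, value in mac_times(path).items():
            print(label, value.isoformat())
```

Simply opening or copying a file can alter some of these values, which is one reason to capture them before working with the material.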

File System Examples

  • ext, ext2, ext3 (Extended File System) - Linux
  • FAT16 - MS-DOS
  • FAT32 (VFAT) - Windows 95/98 - Supported by most systems; popular for flash drives
  • HFS (Hierarchical File System) - Macintosh System 4-8
  • HFS+ - Macintosh System 8.1-X
  • HPFS (High Performance File System) - OS/2
  • ISOFS (ISO 9660) - Any OS that reads data from a CD
  • JFS1 (Journaled File System) - AIX (IBM)
  • MFS (Macintosh File System) - Macintosh System 1-3
  • NTFS - Windows NT through Windows 8
  • ReiserFS - Several Linux distributions
  • UFS (Unix File System) aka FFS (Fast File System) - Various flavors of Unix

Microsoft: FAT & NTFS

FAT16

Example of FAT16's structure

Ranish, Mikhail. “Partitioning Primer.” August 5, 1998.

What "Deletion" Does

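The first character of the filename is replaced with the byte 0xE5, and the FAT entries pointing to the file's clusters are marked "free" in turn. The clusters themselves are untouched, so the content remains on disk until it is overwritten.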

Duong, Duc. "I/O devices and File systems." Vietnam OpenCourseWare. November 18, 2008.
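
The effect of deletion can be sketched with a toy model. Everything here (the dictionaries standing in for the directory, the table, and the clusters, plus the `FREE` and `END_OF_CHAIN` values) is an assumption made for illustration and is not the on-disk FAT layout; only the 0xE5 deletion marker comes from the description above.

```python
DELETED_MARK = 0xE5  # byte written over the first character of the filename
FREE = 0             # hypothetical "free" value for a table entry
END_OF_CHAIN = -1    # hypothetical end-of-file marker


def delete_file(directory, fat, name):
    """Mark a directory entry as deleted and free its chain; leave the data alone."""
    for entry in directory:
        if bytes(entry["name"]) == name:
            entry["name"][0] = DELETED_MARK
            cluster = entry["first_cluster"]
            while cluster != END_OF_CHAIN:
                next_cluster = fat[cluster]
                fat[cluster] = FREE
                cluster = next_cluster
            return True
    return False


if __name__ == "__main__":
    directory = [{"name": bytearray(b"NOTES.TXT"), "first_cluster": 2, "size": 6}]
    fat = {2: 3, 3: END_OF_CHAIN}
    clusters = {2: b"SECR", 3: b"ET\x00\x00"}
    delete_file(directory, fat, b"NOTES.TXT")
    print(directory[0]["name"], fat, clusters)
```

After the call, the name no longer matches and the table shows the clusters as available, yet the bytes in `clusters` are exactly what they were before, which is why recovery tools can often retrieve "deleted" files.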

NTFS

  • Directory and FAT functions are combined in Master File Table ($MFT)
  • Each MFT record assigned a unique number
  • Good for forensic discovery:
    • For small files (< about 600 bytes), content is stored directly in MFT itself & remains until overwritten by another MFT record
  • Not so good for forensic discovery:
    • After deletion of a file, NTFS replaces (overwrites) MFT record next time a new file is created

File Systems for Unix

Directory list -> inode metadata -> blocks

Farmer, Dan, and Wietse Venema. Forensic Discovery. Upper Saddle River, NJ: Addison-Wesley, 2005. Figure 3.2: Simplified structure of the UNIX file system
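
The three-step lookup (directory list, then inode metadata, then blocks) can be mimicked with three small tables. This is a simplified sketch with made-up inode numbers, block numbers, and an 8-byte block size; `build_demo_fs` and `read_by_name` are illustrative names, not a real file system API.

```python
def build_demo_fs():
    """Return toy directory, inode, and block tables (all values are made up)."""
    directory = {"report.txt": 12, "same-report.txt": 12, "notes.txt": 27}
    inodes = {
        12: {"size": 11, "mode": 0o644, "blocks": [4, 9]},
        27: {"size": 3, "mode": 0o600, "blocks": [6]},
    }
    blocks = {
        4: b"hello wo",
        9: b"rld\x00\x00\x00\x00\x00",
        6: b"abc\x00\x00\x00\x00\x00",
    }
    return directory, inodes, blocks


def read_by_name(directory, inodes, blocks, name):
    """Resolve name -> inode number -> block list, then assemble the content."""
    inode = inodes[directory[name]]
    raw = b"".join(blocks[number] for number in inode["blocks"])
    return raw[:inode["size"]]


if __name__ == "__main__":
    directory, inodes, blocks = build_demo_fs()
    print(read_by_name(directory, inodes, blocks, "report.txt"))
    print(read_by_name(directory, inodes, blocks, "same-report.txt"))
```

In this model the filename lives only in the directory list, so two names can point to the same inode and therefore the same content and metadata.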

"Archive" Formats - Portable File Systems

  • Most popular: zip and tar [+gzip]
  • Retain important metadata that was in the original file system, but add a layer of representation information (compression & packaging) that software needs to understand
  • Compression also reduces robustness in the face of bit loss (any given bit flip is more likely to prevent recovery/rendering of content)
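
The loss of robustness can be demonstrated with a small experiment. This sketch uses Python's `zlib` module as a stand-in for the compression used in zip and gzip, with made-up sample data: it flips one bit in an uncompressed copy, then tries every possible single-bit flip in the compressed copy and counts how many prevent exact recovery. The helper names are illustrative.

```python
import zlib


def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def bytes_changed(a, b):
    """Count positions at which two equal-length byte strings differ."""
    return sum(x != y for x, y in zip(a, b))


def try_decompress(blob):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


def unrecoverable_flips(packed, original):
    """Count single-bit flips in packed that prevent exact recovery of original."""
    return sum(
        1
        for i in range(len(packed) * 8)
        if try_decompress(flip_bit(packed, i)) != original
    )


if __name__ == "__main__":
    original = b"".join(b"record %d: accession note\n" % i for i in range(200))
    packed = zlib.compress(original)
    print("uncompressed: bytes damaged by one flip =",
          bytes_changed(original, flip_bit(original, 800)))
    print("compressed: flips preventing recovery =",
          unrecoverable_flips(packed, original), "of", len(packed) * 8)
```

In the uncompressed copy a flipped bit damages one byte and leaves the rest readable, whereas in the compressed copy a flip typically makes the stream fail its checks or decode to different content.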

Linux filesystem hierarchy

Linuxconfig.org – Filesystem Basics.

Temporary Data Locations

  • Files on disk used for virtual memory management – e.g. "swap files" in Windows 95/98, "page files" in Windows NT/2000/XP
  • Temp files
  • Various caches - e.g. browser cache, which includes copies of recently downloaded files
  • "Recent Documents" in Windows
  • Cookies – "expires" attribute can indicate quick deletion or long-term retention
  • History files – e.g. browsing & download history

Configuration & Log Files

  • Often contain information about where files are located, when last opened, user preferences, state of files when last used
  • In Windows, much of this happens in the Registry, e.g. Most Recently Used (MRU) lists, various other details in USER.DAT or NTUSER.DAT and SYSTEM.DAT
  • Internet Explorer example - Index.dat:
    • RSS feeds
    • URLs visited
    • search queries
    • recently opened files

Examine a Flash Drive

Right-click the drive letter and select "Properties."

Note the file system says "FAT"

Go up to "Tools" and select "Folder Options...".

Note the "Show hidden files and folders" option which is off by default.

Now you will see two new files and two new folders listed with "ghost" icons.

CMD directory listing.

CMD directory listing using "/a" flag

Spotlight directory

Files with "._" are Mac resource fork entries.

Hex view of resource fork.

Another hex view of a resource fork.

Forms of “Hidden Data”

Not just what you see when you open a file in its native application.

Listed roughly in order of difficulty of identification & retrieval.

Sanitization Taxonomy

Sanitization Taxonomy levels 0-5

Garfinkel, Simson L., and Abhi Shelat. "Remembrance of Data Passed: A Study of Disk Sanitization Practices." IEEE Security and Privacy 1 (2003): 17-27.

How Computers Communicate

Layers of Protocols...

A communications protocol is a system of digital message formats and rules for exchanging those messages in or between computing systems and in telecommunications.
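
As a concrete case of "message formats and rules," here is a minimal sketch of the text a browser sends to request a page over HTTP/1.1 and of how the first line of a server's reply can be read. It builds and parses the messages only and does not open a network connection; the host name and function names are placeholders chosen for the example.

```python
def build_get_request(host, path="/"):
    """Format a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, target, protocol version
        f"Host: {host}",         # header required by HTTP/1.1
        "Connection: close",
        "",                      # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Split the first line of an HTTP response into version, code, and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_get_request("example.org", "/index.html"))
    print(parse_status_line(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"))
```

In practice this message is itself carried inside lower protocol layers (such as TCP and IP), each of which wraps it with its own format and rules.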

Request a Webpage

  • Let me 'splain...
  • No, there is too much. Let me sum up.

Mini-Assignment 3

Examine the files on your own computer.

Use one of the following TreeMap applications:

  • WinDirStat (Windows) - preferred for Windows but requires install privileges.
  • SpaceSniffer (Windows) - secondary but does not need to be installed.
  • Disk Inventory X (Mac)
  • KDirStat (Linux) - preferred for Linux; requires KDE.
  • GD Map (Linux) - secondary if you don't have KDE.

We will have small-group discussions about your results.
