Levels of Representation (How it works)
INLS 525: Managing Electronic Records
Week 3 (1/29)
Digital Materiality of Digital Culture
- Mini-Assignment 2: You & Your Devices/Media
Which matters (more) to an Archivist?
Looking "...beyond the magic to the mechanism."
Looking at an "'opaque technology,' whereby people take the computer at 'interface value'"
What, after all, is being collected?
How Computers Remember
Bits will be Bits (But not for Long)
- Physical media should be stored in appropriate environmental conditions.
- Take care in handling of media.
- Maintain integrity of bitstream through security, checksums, periodic audits & other validation.
- Bit rot & advantages of newer media both call for periodic refresh & reformatting.
- Ensuring the integrity of the bitstream in such transfers is extremely important.
"Errors typically occur at the juncture between analog and digital states, such as when a drive's magnetoresistive head assigns binary symbolic value to the voltage differentials it has registered, or when an e-mail message is reconstituted from independent data packets moving across the TCP/IP layer of the Internet, itself dependent on fiber-optic cables and other hardwired technologies. All forms of modern digital technology incorporate hyper-redundant error-checking routines that serve to sustain an illusion of immateriality by detecting error and correcting it, reviving the quality of the signal, like old-fashioned telegraph relays, such that any degradation suffered during a subsequent interval of transmission will not fall beyond whatever tolerances of symbolic integrity exist past which the original value of the signal (or identity of the symbol) cannot be reconstituted." (p.12, emphasis mine)
Kirschenbaum, Matthew G. Mechanisms: New Media and the Forensic Imagination. Cambridge, MA: MIT Press, 2008.
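One way to check bitstream integrity after a transfer is to compare checksums of the original and the copy. The sketch below is only an illustration, assuming SHA-256 as the checksum; the function names and the temporary files are hypothetical and do not come from the lecture.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_path, copy_path):
    """True when both files hash to the same value, i.e. the bitstreams agree."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)


def demo():
    """Compare an original with a faithful copy and with a copy that has one flipped bit."""
    payload = b"electronic records " * 1000
    damaged = bytearray(payload)
    damaged[500] ^= 0b00000001  # flip a single bit to simulate bit rot
    with tempfile.TemporaryDirectory() as workdir:
        paths = {}
        for name, data in (("original", payload),
                           ("good_copy", payload),
                           ("bad_copy", bytes(damaged))):
            paths[name] = os.path.join(workdir, name + ".bin")
            with open(paths[name], "wb") as handle:
                handle.write(data)
        return (copies_match(paths["original"], paths["good_copy"]),
                copies_match(paths["original"], paths["bad_copy"]))


if __name__ == "__main__":
    print(demo())
```

Storing the digest alongside the record at ingest allows the same comparison to be repeated during later audits.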
Where and how does a computer store information?
- Processor
- CPU cache
- ROM (Read-Only Memory)
- RAM (Random-Access Memory)
- Hard drive – includes both persistent & temporary or cached files (e.g. browser cache)
- Elsewhere on a network
- Near-line (local)
- Off-line (local)
- Near-line (remote)
- Off-line (remote)
Motivations for Storage Hierarchy
- Different forms of memory/storage have significantly different costs & performance
- Store recent data close by, in fast, expensive, volatile storage
- Store data that has not been used recently & is rarely used in slower, cheaper, less volatile storage
Tiered Storage
- Matching data to appropriate storage medium, based on cost-benefit analysis and risk management, informed by knowledge of:
- Reliability
- Cost
- Capacity
- Availability
- Size
- Speed
- Typically based on 3 tiers:
- Online – "enterprise-class"
- Near-line – "desktop-class"
- Off-line – "archival"
Caching
- Storing a copy of a subset of data from a slower data source to a faster (more readily available) data source
- Examples:
- CPU cache from main memory
- Main memory cache from hard disk
- Hard disk cache from CD-ROM
- Proxy server cache from web sites
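The idea of caching can be sketched in a few lines. This is a simplified illustration, assuming a least-recently-used eviction rule; the class names, the capacity of two items, and the sample records are all hypothetical.

```python
class SlowSource:
    """Stand-in for a slower data source (e.g. a disk or a remote web site)."""

    def __init__(self, records):
        self._records = dict(records)
        self.reads = 0  # counts how often the slow source had to be consulted

    def read(self, key):
        self.reads += 1
        return self._records[key]


class Cache:
    """Keeps copies of a small subset of the source's data in a fast dictionary."""

    def __init__(self, source, capacity=2):
        self._source = source
        self._capacity = capacity
        self._items = {}  # dicts keep insertion order: first key is least recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self.hits += 1
            value = self._items.pop(key)  # re-insert to mark as most recently used
            self._items[key] = value
            return value
        self.misses += 1
        value = self._source.read(key)
        if len(self._items) >= self._capacity:
            oldest = next(iter(self._items))
            del self._items[oldest]  # evict the least recently used copy
        self._items[key] = value
        return value


def demo():
    source = SlowSource({"a": "alpha", "b": "beta", "c": "gamma"})
    cache = Cache(source, capacity=2)
    for key in ("a", "a", "b", "c", "a"):
        cache.get(key)
    return cache.hits, cache.misses, source.reads


if __name__ == "__main__":
    print(demo())
```

Only the second request for "a" is served from the cache; by the time "a" is requested a third time it has already been evicted to make room for "c".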
How Disks Work
Low-Level – Sectors and Clusters
- Your computer’s processor manipulates data in the form of bitstreams, and data is stored on your computer’s hard drive as bitstreams
- But moving the data from the hard drive to the processor depends on higher-level chunks: sectors and clusters
- Think of mail sent to a member of a family who all live in the same house – the envelope will indicate the house address but won’t identify where that person’s bedroom is located within the house
Sectors
- Smallest unit of storage that can be assigned an address (i.e. can be directly identified & found by the computer system)
- Have specified size, depending on the type of storage (e.g. 2048 bytes on CD-ROM, often 512 bytes on floppy or hard drive)
- Created when disk is low-level formatted (usually by manufacturer) with bad sectors identified by disk controller so data won’t be written to them
Clusters
- Groups of sectors
- Smallest unit of storage that can be tracked by the operating system
- Size depends on operating system, type & size of storage device – often 2048 bytes (4 sectors of 512 bytes)
- Defined during high-level formatting performed by operating system
File Slack

Carrier, Brian. File System Forensic Analysis. Boston, MA: Addison-Wesley, 2005.
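Because storage is allocated in whole clusters, a file rarely fills its last cluster exactly; the unused remainder is the slack. The arithmetic can be sketched as follows, assuming the 512-byte sectors and 4-sector clusters mentioned above. The function name and the 2600-byte example file are hypothetical.

```python
def slack_report(file_size, sector_size=512, sectors_per_cluster=4):
    """Return allocation figures (in bytes) for a file of the given size."""
    cluster_size = sector_size * sectors_per_cluster
    clusters = -(-file_size // cluster_size)      # ceiling division
    sectors_used = -(-file_size // sector_size)   # sectors that hold some file data
    allocated = clusters * cluster_size
    return {
        "cluster_size": cluster_size,
        "clusters_allocated": clusters,
        "bytes_allocated": allocated,
        "slack_total": allocated - file_size,
        # Unused bytes at the end of the last sector that holds file data.
        "slack_in_last_sector": sectors_used * sector_size - file_size,
        # Whole sectors in the last cluster that hold no file data at all.
        "slack_in_unused_sectors": allocated - sectors_used * sector_size,
    }


if __name__ == "__main__":
    for name, value in slack_report(2600).items():
        print(name, value)
```

For a 2600-byte file this gives 2 clusters (4096 bytes allocated) and 1496 bytes of slack, which may still hold remnants of whatever was stored there before.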
Magnetic Disk (e.g. Hard Drive or Floppy)
- Bits stored as magnetic fields of different polarity
- Magnetized surface of disk rotates under a read/write head
- Divided into tracks (like rings of a tree)
- Tracks divided into sectors and clusters
- Windows: File Allocation Table (FAT) or Master File Table (for NTFS) indicates, for given file, what clusters contain its content
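The role of the File Allocation Table can be illustrated with a toy model: the directory entry gives a file's first cluster and its size, and the table says which cluster comes next. This is a deliberately simplified sketch, not the on-disk FAT format; the 4-byte clusters, the -1 end-of-chain marker, and the file name are assumptions made for illustration.

```python
END_OF_CHAIN = -1  # stand-in; real FAT variants use reserved table values


def clusters_for_file(first_cluster, allocation_table):
    """Follow the table from the first cluster to the end-of-chain marker."""
    chain = []
    seen = set()
    current = first_cluster
    while current != END_OF_CHAIN:
        if current in seen:
            raise ValueError("allocation table contains a loop")
        seen.add(current)
        chain.append(current)
        current = allocation_table[current]
    return chain


def read_file(name, directory, allocation_table, clusters):
    """Reassemble a file's content from its (possibly scattered) clusters."""
    first_cluster, size = directory[name]
    data = b"".join(clusters[n] for n in clusters_for_file(first_cluster, allocation_table))
    return data[:size]  # drop the slack at the end of the last cluster


# A fragmented file: its content lives in clusters 2, 5 and 3, in that order.
DIRECTORY = {"MEMO.TXT": (2, 10)}
TABLE = {2: 5, 5: 3, 3: END_OF_CHAIN}
CLUSTERS = {2: b"Dear", 5: b" Ali", 3: b"ce\x00\x00"}

if __name__ == "__main__":
    print(clusters_for_file(2, TABLE))
    print(read_file("MEMO.TXT", DIRECTORY, TABLE, CLUSTERS))
```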

Optical Media - CD-ROM

Volumes and Partitions
- Volume = a storage area defined at the logical OS level, which has a single filesystem & usually resides on one disk partition
- Partition = exists at lower level, used e.g. to set up multiple operating systems on same computer
File System
- Access controls
- File names & identifiers
- File size (length)
- Where to find files in storage (sectors and clusters)
- MAC times
- Modified – when the content was last changed
- Accessed – time file was last accessed (by person or software)
- Changed – last time metadata changed
- Created – (implemented inconsistently, if at all, across different file systems)
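These timestamps can be read with Python's standard `os.stat` call. The sketch below is a minimal illustration; the helper names and the temporary file are hypothetical. Note that `st_ctime` is the metadata-change time on Unix-like systems but has traditionally reported creation time on Windows, which reflects the inconsistency noted above.

```python
import os
import tempfile
from datetime import datetime, timezone


def mac_times(path):
    """Return a file's timestamps as timezone-aware UTC datetimes."""
    info = os.stat(path)

    def as_utc(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)

    return {
        "modified": as_utc(info.st_mtime),
        "accessed": as_utc(info.st_atime),
        # Metadata change on Unix-like systems; creation time on Windows.
        "changed_or_created": as_utc(info.st_ctime),
    }


def demo():
    """Create a file, set its access and modification times explicitly, read them back."""
    modified_at = 978307200  # 2001-01-01 00:00:00 UTC
    accessed_at = 978393600  # 2001-01-02 00:00:00 UTC
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "memo.txt")
        with open(path, "w") as handle:
            handle.write("draft")
        os.utime(path, (accessed_at, modified_at))  # order is (atime, mtime)
        return mac_times(path)


if __name__ == "__main__":
    for label, moment in demo().items():
        print(label, moment.isoformat())
```

The demo also shows that modified and accessed times can be set by any program with write access, so they are evidence to be weighed rather than proof.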
File System Examples
- ext, ext2, ext3 (Extended File System) - Linux
- FAT16 - MS-DOS
- FAT32 (VFAT) - Windows 95/98 - Most systems support this format; it is popular for flash drives
- HFS (Hierarchical File System) - Macintosh System 4-8
- HFS+ - Macintosh System 8.1-X
- HPFS (High Performance File System) - OS/2
- ISOFS (ISO 9660) - Any OS that reads data from a CD
- JFS1 (Journaled File System) - AIX (IBM)
- MFS (Macintosh File System) - Macintosh System 1-3
- NTFS - Windows NT through Windows 8
- ReiserFS - Several Linux distributions
- UFS (Unix File System) aka FFS (Fast File System) - Various flavors of Unix
Microsoft: FAT & NTFS
FAT16

Ranish, Mikhail. “Partitioning Primer.” August 5, 1998.
What "Deletion" Does

Duong, Duc. "I/O devices and File systems." Vietnam OpenCourseWare. November 18, 2008.
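As a rough illustration of commonly documented FAT behavior: deleting a file marks its directory entry (the first byte of the name becomes 0xE5) and frees its entries in the allocation table, but leaves the clusters themselves untouched until they are reused. The toy volume below is a simplified sketch, not the on-disk format; the dictionary layout, the 4-byte clusters, and the -1 end-of-chain marker are assumptions.

```python
FREE = 0
END_OF_CHAIN = -1  # stand-in; real FAT variants use reserved table values
DELETED_MARKER = b"\xe5"


def make_volume():
    """Build a tiny pretend volume holding one three-cluster file."""
    return {
        "directory": {b"MEMO.TXT": (2, 10)},  # name -> (first cluster, size)
        "table": {2: 5, 5: 3, 3: END_OF_CHAIN, 4: FREE},
        "clusters": {2: b"Dear", 5: b" Ali", 3: b"ce\x00\x00", 4: b"\x00" * 4},
    }


def delete(volume, name):
    """Mark the directory entry as deleted and free the file's table entries."""
    first_cluster, size = volume["directory"].pop(name)
    current = first_cluster
    while current != END_OF_CHAIN:
        following = volume["table"][current]
        volume["table"][current] = FREE  # cluster becomes available for reuse
        current = following
    # The entry survives under a marked name; the cluster contents are not erased.
    volume["directory"][DELETED_MARKER + name[1:]] = (first_cluster, size)


def demo():
    volume = make_volume()
    delete(volume, b"MEMO.TXT")
    return volume


if __name__ == "__main__":
    after = demo()
    print(after["directory"])
    print(after["table"])
    print(after["clusters"][2], after["clusters"][5], after["clusters"][3])
```

In this model the text of the memo is still present after deletion; only the pointers to it are gone, which is why recovery tools can often reconstruct deleted files.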
NTFS
- Directory and FAT functions are combined in Master File Table ($MFT)
- Each MFT record assigned a unique number
- Good for forensic discovery:
- For small files (< about 600 bytes), content is stored directly in MFT itself & remains until overwritten by another MFT record
- Not so good for forensic discovery:
- After deletion of a file, NTFS replaces (overwrites) MFT record next time a new file is created
File Systems for Unix

Farmer, Dan, and Wietse Venema. Forensic Discovery. Upper Saddle River, NJ: Addison-Wesley, 2005. Figure 3.2: Simplified structure of the UNIX file system
"Archive" Formats - Portable File Systems
- Most popular: zip and tar [+gzip]
- Retains important metadata that was in original file system, but does add a layer of representation information (compression & packaging) that software needs to understand
- Compression also reduces robustness in the face of bit loss (any given bit flip is more likely to prevent recovery/rendering of content)

Linuxconfig.org – Filesystem Basics.
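The point about compression and bit loss can be tried out directly. The sketch below is an illustration only, assuming zlib (the compression used by gzip and by most zip files) and a made-up payload. It flips one bit in an uncompressed copy and one bit in a compressed copy, then compares the damage.

```python
import zlib


def flip_bit(data, byte_index, bit=0):
    """Return a copy of data with a single bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def try_decompress(blob):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


def demo():
    payload = b"".join(b"Record %d: retained permanently.\n" % i for i in range(200))

    # Uncompressed: one flipped bit damages exactly one byte of content.
    plain_damaged = flip_bit(payload, len(payload) // 2)
    bytes_changed = sum(a != b for a, b in zip(payload, plain_damaged))

    # Compressed: one flipped bit in the middle of the stream.
    compressed = zlib.compress(payload)
    recovered = try_decompress(flip_bit(compressed, len(compressed) // 2))

    return payload, bytes_changed, recovered


if __name__ == "__main__":
    original, changed, result = demo()
    print("bytes changed in uncompressed copy:", changed)
    if result is None:
        print("compressed copy: decompression failed")
    else:
        print("compressed copy recovered intact:", result == original)
```

In the uncompressed copy the rest of the content remains readable; in the compressed copy the same amount of damage typically makes the whole stream fail its integrity check.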
Temporary Data Locations
- Files on disk used for virtual memory management – e.g. "swap files" in Windows 95/98, "page files" in Windows NT/2000/XP
- Temp files
- Various caches - e.g. browser cache, which includes copies of recently downloaded files
- "Recent Documents" in Windows
- Cookies – "expires" attribute can indicate quick deletion or long-term retention
- History files – e.g. browsing & download history
Configuration & Log Files
- Often contain information about where files are located, when last opened, user preferences, state of files when last used
- In Windows, much of this happens in the Registry, e.g. Most Recently Used (MRU) lists, various other details in USER.DAT or NTUSER.DAT and SYSTEM.DAT
- Internet Explorer example - Index.dat:
- RSS feeds
- URLs visited
- search queries
- recently opened files
Examine a Flash Drive

Forms of “Hidden Data”
Not just what you see when you open a file in its native application.
Listed roughly in order of difficulty of identification & retrieval.
Sanitization Taxonomy

Garfinkel, Simson L., and Abhi Shelat. "Remembrance of Data Passed: A Study of Disk Sanitization Practices." IEEE Security and Privacy 1 (2003): 17-27.
How Computers Communicate
Layers of Protocols...
A communications protocol is a system of digital message formats and rules for exchanging those messages in or between computing systems and in telecommunications.
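Both halves of that definition, a message format and rules for exchange, can be shown in a toy example. The format below is invented for illustration and is not a real protocol: each layer prepends a 3-byte header (1-byte layer number, 2-byte payload length) on the way down, and the receiver removes the headers in reverse order on the way up.

```python
import struct

# Hypothetical header: 1-byte layer number, 2-byte payload length, network byte order.
HEADER = struct.Struct("!BH")
APPLICATION, TRANSPORT, NETWORK = 1, 2, 3


def encode(layer, payload):
    """Format rule: every message is a header followed by its payload."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 2-byte length field")
    return HEADER.pack(layer, len(payload)) + payload


def decode(frame):
    """Parse one header and return (layer, payload), checking the stated length."""
    layer, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("frame is shorter than its header claims")
    return layer, payload


def send_down(message):
    """Sender: wrap the message once per layer, application first."""
    frame = message
    for layer in (APPLICATION, TRANSPORT, NETWORK):
        frame = encode(layer, frame)
    return frame


def receive_up(frame):
    """Receiver: unwrap in the opposite order, rejecting unexpected layers."""
    for expected in (NETWORK, TRANSPORT, APPLICATION):
        layer, frame = decode(frame)
        if layer != expected:
            raise ValueError("unexpected layer %d" % layer)
    return frame


if __name__ == "__main__":
    wire = send_down(b"GET /index.html")
    print(len(wire), receive_up(wire))
```

Each layer treats everything handed to it from above as opaque payload, which is what lets the layers be designed and replaced independently.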
Request a Webpage
- Let me 'splain...
- No, there is too much. Let me sum up.
Mini-Assignment 3
Examine the files on your own computer.
Use one of the following TreeMap applications:
- WinDirStat (Windows) - preferred for Windows but requires install privileges.
- SpaceSniffer (Windows) - secondary but does not need to be installed.
- Disk Inventory X (Mac)
- KDirStat (Linux) - preferred for Linux; requires KDE.
- GD Map (Linux) - secondary if you don't have KDE.
We will have small-group discussions about your results.
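For a rough idea of the numbers these tools visualize, the sketch below totals file sizes under each top-level folder of a directory. It is a simplified illustration and not how any of the listed applications is implemented; the function names and the sample folder layout are hypothetical.

```python
import os
import tempfile


def size_by_top_level(root):
    """Return (folder, total bytes) pairs for root's top-level folders, largest first."""
    totals = {}
    for current, _subdirs, files in os.walk(root):
        relative = os.path.relpath(current, root)
        top = "." if relative == "." else relative.split(os.sep)[0]
        for name in files:
            path = os.path.join(current, name)
            try:
                if os.path.islink(path):
                    continue  # skip symbolic links to avoid double counting
                totals[top] = totals.get(top, 0) + os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file; ignore it
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def demo():
    """Build a small sample tree in a temporary folder and summarize it."""
    layout = {
        "a.txt": 10,
        os.path.join("docs", "b.txt"): 300,
        os.path.join("docs", "sub", "c.txt"): 200,
        os.path.join("img", "d.bin"): 50,
    }
    with tempfile.TemporaryDirectory() as root:
        for relative, size in layout.items():
            path = os.path.join(root, relative)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as handle:
                handle.write(b"x" * size)
        return size_by_top_level(root)


if __name__ == "__main__":
    for folder, total in demo():
        print(folder, total)
```

To try it on real data, call `size_by_top_level` with the path of a folder on your own computer; the totals reflect logical file sizes, not the clusters actually allocated on disk.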