Research interests

I’m a classificationist: I study the selection, description, and arrangement of collections. The term “collections” often brings to mind physical things, like collections of stamps or paperweights, and some collections do bring together physical things. But other collections bring together digital things: Netflix is a collection of time-based visual media; the World Wide Web is a collection of documents encoded in HTML. And sometimes, the “things” that are being collected are data, in physical or digital form. A library is a collection of books, and a library catalog is a collection of data about books. A library catalog is a dataset. In primary school, when I first started using the library, the catalog of book data was instantiated on paper cards, placed inside a special card cabinet. Today, the catalog of book data is digital. But the card catalog and the online catalog are both datasets.

Another way to describe myself as a scholar, then, is: I study the design and implementation of datasets.

Over the past ten years, the language of data has become ubiquitous and urgent. The algorithms that mediate online interactions rely on data. Universities have been rushing to create data science programs, training students in statistical techniques to wrest economically productive patterns from the data that seems to have sprung up everywhere, arising from our digital activities like shoots from magic beans.

The link between classification and data design is vital. My expertise as a classificationist is what distinguishes me as a scholar. The conceptual foundation that underpins my research (that there is a fundamental similarity between collections of physical things, collections of digital things, and collections of data about things) enables me to understand data in the same way that I understand cheese in the supermarket: through selection, description, and arrangement.

In American supermarkets, for instance, some of the cheese is often placed in the dairy case near the milk. The salient characteristics for that cheese are brand, shape, and size, like many other packaged commodities. But there may be, in addition, a totally separate cheese section, perhaps near the bakery or deli, where the cheese is arranged by location of origin, regional appellation, and the animal that produced the milk—characteristics that take specialized expertise to understand, similar to wine. The classificatory protocols implemented by the supermarket instantiate, in effect, two distinct entities: “fancy cheese” and “normal cheese.” Each entity has different characteristics—different kinds of data—associated with it.

To see how supermarket cheese sections provide a critical lens onto other data projects, consider two datasets that track episodes of organized violence across the world. One dataset, the Armed Conflict Location & Event Data Project (ACLED), includes “armed conflicts” that involve two or more “actors.” Another dataset, the Global Terrorism Database (GTD), identifies “terrorist events” that involve “perpetrators” and “victims” (also called “targets”). These two data projects include many of the same events, but an armed conflict between two actors is, I would argue, a different kind of entity from a terrorist event inflicted upon a victim by a perpetrator, just as Tillamook slices in the dairy case are a different kind of entity from freshly cut slabs of Swiss Gruyère in the “fancy” section.

My expertise as a classificationist enables me to identify the structural similarities between data enacted in supermarkets and data enacted in event-tracking datasets. As a classificationist, moreover, I know that aggregating data from ACLED and the GTD can never be a purely technical matter, because the underlying concepts are not equivalent.

As a classificationist, in short, I am a data maven of rare power and precision.