ISA Literature Screening Dataset


Every five to ten years, the United States Environmental Protection Agency (EPA) synthesizes the most recent scientific research on six air pollutants (nitrogen oxides, sulfur oxides, particulate matter, carbon monoxide, ozone, and lead). The resulting documents, officially called Integrated Science Assessments (ISAs), aim to provide an updated, comprehensive understanding of the health and welfare effects of these pollutants and lay the scientific foundation for related environmental policies.

This page provides the literature screening datasets for the 2013 and 2020 ISAs for Ozone and Related Photochemical Oxidants. The goal is to promote research and development of computational techniques that help EPA scientists screen scientific publications more efficiently and find the policy-relevant ones to reference in future ISAs. This resource is intended for researchers in the applied machine learning, natural language processing, information retrieval, and digital library communities.

Data Download

07/01/2022: The dataset is now published on!

2013 Ozone ISA Data

2020 Ozone ISA Data

Data Dictionary

Contact Us

This work is a collaboration between the University of North Carolina at Chapel Hill and the United States Environmental Protection Agency. To share feedback or report issues, please email wangyue [AT] unc [DOT] edu.