Data Rescue Efforts

Astronomical Plate Collection and Preservation in China
Botanic Garden and Botanical Museum Berlin-Dahlem
Dominion Astrophysical Observatory
International Environmental Data Rescue Organization
Royal Observatory of Belgium
U.S. Geological Survey
U.S. National Archives and Records Administration

Astronomical Plate Collection and Preservation in China

The astronomical plate collection and preservation project started in July 2008. During its first three years, several important milestones have been reached:

  1. A warehouse specially constructed for storing and preserving astronomical plates was first refurbished to maintain an environment of nearly constant temperature and humidity, free from dust and moths.
  2. 28994 astronomical plates were then moved into the warehouse. They include 957 plates from Tsingtao Astronomical Observatory, 975 plates from Yunnan Astronomical Observatory, 10624 plates from the National Astronomical Observatories of China (previously called Beijing Astronomical Observatory), 6338 plates from Shanghai Astronomical Observatory, and 10100 plates from Purple Mountain Astronomical Observatory (Nanjing).
  3. An online metadata catalogue for those plates is nearly completed. We have adopted the metadata format of the Wide-Field Plate Database, which was developed by a group at the Bulgarian National Academy of Sciences in Sofia (http://www.skyarchive.org/). When this aspect of the project is completed, the metadata database will be released to the public via the Chinese Virtual Observatory.
  4. Schemes for digitizing the plates are still under discussion. Suggestions and comments are very welcome.
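
The per-observatory figures in item 2 can be checked against the stated total with a short tally. This is only a minimal sketch in Python: the dictionary and variable names are illustrative, and the numbers are simply those quoted above.

```python
# Plate counts per contributing observatory, as quoted in item 2 above.
plate_counts = {
    "Tsingtao Astronomical Observatory": 957,
    "Yunnan Astronomical Observatory": 975,
    "National Astronomical Observatories of China": 10624,
    "Shanghai Astronomical Observatory": 6338,
    "Purple Mountain Astronomical Observatory": 10100,
}

stated_total = 28994  # total number of plates moved into the warehouse
computed_total = sum(plate_counts.values())

# The individual counts should add up to the stated total.
print(computed_total, computed_total == stated_total)
```

With the figures as given, the sum agrees with the stated total of 28994 plates.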

Contributors to the project include: Chen Li, Cui Chenzhou, Fu Guohong, Gao Shuling, Hao Jinxin, Hou Jinliang, Jiang Shiyang, Jin Wenjing, Lan Songzhu, Li Jing, Li Jingqian, Li Yan, Mao Yaqing, Wang Qi, Wang Shuhe, Wang Yi, Su Hongjun, Sun Linan, Tang Zhenghong, Yao Baoan, Yin Jisheng, Zhang Jianwei, Zhang Chunsheng, Zhao Jianhai and Zhao Yongheng.


Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM)

The Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM, http://www.bgbm.org), based at Freie Universität Berlin (FUB), combines international collaborative efforts with scientific production in both systematic research and biodiversity informatics. The BGBM is one of the leading natural history research institutions in Europe and maintains extensive scientific collections of herbarium specimens (about 3.5 million), one of the world's largest living plant collections, and the most complete botanical library in Germany. It specialised early in the development and implementation of protocols, data standards, and software for international networking of distributed and heterogeneous biodiversity information. Its main focus in biodiversity informatics is on digitization, networking, and interoperability of collection data, as well as on modelling taxonomic workflows and taxonomic data processing (http://www.bgbm.org/biodivinf/).

The BGBM hosts numerous databases and information systems and is connected with a Gigabit backbone and Gigabit connection to the GÉANT network via GWIN. In July 2011, the BGBM started a three-year project (reBiND, http://reBiND.bgbm.org) funded by the Deutsche Forschungsgemeinschaft (German Research Foundation) with the aim of developing cost-efficient workflows for rescuing legacy biodiversity databases. reBiND workflows will combine software tools for transforming data stored in outdated database systems into well-documented, standardized, and commonly understood XML formats with a system for storing, documenting, and publishing the information as a web service. reBiND cooperates closely with the CODATA Data at Risk Task Group and will help to rescue threatened biodiversity databases urgently needed to address pressing scientific questions in a rapidly changing environment.
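
The transformation step described above, from a legacy database table to an XML document, can be sketched roughly as follows. This is not the reBiND software: it is a minimal illustration in Python in which an in-memory SQLite database stands in for the outdated system, and the table name `specimen`, its columns, and the element names `dataset` and `record` are all hypothetical. It also assumes that the column names are valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table, root_tag="dataset", row_tag="record"):
    """Dump every row of one table into a simple XML tree."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    root = ET.Element(root_tag, {"source_table": table})
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            field = ET.SubElement(record, name)
            # Keep NULLs as empty elements so no column silently disappears.
            field.text = "" if value is None else str(value)
    return root


# Stand-in "legacy" database with a hypothetical specimen table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (id INTEGER, taxon TEXT, collected TEXT)")
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?)",
    [(1, "Bellis perennis", "1987-05-12"), (2, "Quercus robur", None)],
)

root = table_to_xml(conn, "specimen")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

A real rescue workflow would additionally map the legacy columns onto an agreed community schema and record documentation about the source; the sketch only shows the basic row-to-element conversion.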


Dominion Astrophysical Observatory

The Dominion Astrophysical Observatory (DAO), now also known by the title of the Herzberg Institute of Astrophysics (HIA), of which it is part, is Canada's largest (though not oldest) optical astronomical observatory. Dating from 1918, it grew from Ottawa astronomer J.S. Plaskett's desire for a site which experienced better observing conditions than were routine at the Dominion Observatory in Ottawa, and which would be equipped with the largest telescope then known - a reflector of diameter 72 inches, or 1.8 metres - where its users could better carry out forefront research. While the environs of Victoria in BC were not of the quality of those of the coastal mountains of California, Plaskett's compromise of an adequately good site, reasonable proximity to a city and university, plus the fact that a railway ran past the foot of the hill that he had earmarked, brought about the founding of a research institution that was, and always has been, heavily focussed on observational astronomy.

Today the DAO pursues a range of research projects that literally span the universe, from solar-system objects via stars to external galaxies and cosmology, and most researchers base their work on data acquired by major telescopes (necessarily overseas) or from space missions. A significant portion of the staff effort is also nowadays directed towards the development and construction of instrumentation for Canada's telescopes in Hawaii and Chile.

From the inauguration of the big telescope in 1918 until the 1980s, all observations made with it were on photographic plates. Observational astronomers never discard data, with good reason - celestial objects change in a multitude of ways, and the ability to return to some former observations can be crucial in clinching the period of a binary system or the magnitude (as well as the period) of brightness changes. A second telescope, smaller in aperture (48 inches, or 1.2 metres) but capable of much higher spectral resolution, was added in 1961. Even though observing had become almost completely digital by the 1990s, the plates remain, catalogued and stored in fair condition in the DAO basement. The photographic archive of the 72-inch telescope totals over 93,000 spectra; that of the 48-inch adds another 16,500.

Discussions on how to preserve a heritage of data of that magnitude, richness and value have rumbled on for years. The DAO was one of the very few observatories in the world to maintain its PDS microdensitometer (PDS being the manufacturer's initials) in some semblance of working order, so that scanning of archival material could be carried out if needed - as was indeed done during the 1990s. Although it has since been a struggle to ensure that the observatory's heritage has the attention and care that it merits, we are at last making progress; an upgrade of the PDS data acquisition hardware and software became necessary and has now been completed, a part-time technician has been trained to operate the instrument, and processes have been developed which partially automate the procedures. A prime concern now is to anchor the project with enough resources to ensure that a major part of the digitising work can be completed; demands by external users will be one proof of the value of recovering this historic scientific information.

The value of historic stellar spectra for modern research was amply demonstrated through the Ozone Project*, in which concentrations of the Earth's ozone were measured from photographic stellar spectra that had been observed in the 1920s and 1930s (almost before ozone was heard of!), thus offering a completely novel source of such information. It illustrated well the true transdisciplinary nature of so many data in science.

*See, e.g., The Physics Teacher, 47, p. 22, 2009.

International Environmental Data Rescue Organization (IEDRO)

The data rescued by the International Environmental Data Rescue Organization (IEDRO) enable the meteorological and scientific communities to provide more accurate severe-weather forecasts and to understand climate change better. This knowledge gives the world community a greater ability to predict long-range weather patterns accurately.

In 2000, Dr. Sharon Nicholson, representing Florida State University, approached the International Activities Office (IAO) of the National Weather Service, part of the National Oceanic and Atmospheric Administration (NOAA), to set up a project for locating and rescuing historic weather observations in Africa. IAO Chief Dr. Martin Yerg provided the initial funding for the effort through U.S. donations to the World Meteorological Organization's Voluntary Cooperation Program (VCP). Shortly thereafter, Richard Crouthamel, D.Sc., assumed responsibility for managing the project. He focused on six African countries: Kenya, Malawi, Mozambique, Niger, Senegal, and Zambia. Computers, digital cameras, copy stands, and software were purchased and shipped to each of the six countries. NOAA employees Larry Nicodemus and Mark Seiderman from NCDC, Wasilla Thiaw, Ph.D., from NCEP, and Ken Clark from IAO traveled to each of the six countries. The group installed computers and instructed the staff of each National Meteorological Service in data rescue and imaging. Since that time these six countries have been imaging their historic weather observations and sending them to NOAA for digitization.

In 2004, NOAA reduced the funds available to data rescue and digitization activities. In light of this, Dr. Richard Crouthamel retired from the IAO and formed IEDRO to carry on this important work using private donations as well as some federal funds.

Since that time, IEDRO has gained U.S. tax-exempt 501(c)(3) status and has become a major player in the international field of environmental data rescue and digitization, working closely with the World Meteorological Organization, NOAA, and the weather services of many other countries.


Preserving Astronomical Images at the Royal Observatory of Belgium

At the Royal Observatory of Belgium we have a collection of ~25,000 wide-field photographic plates (images), taken in the course of the 20th century. They constitute a unique record of the past appearances of the sky, and enable the study of transient phenomena as well as the study of phenomena for which a long time-base is needed.

These photographic plates are in danger because of the ageing of the emulsions. Most are stored in places that are difficult to access, and are thus not easily exploitable scientifically. Some have been stored under incorrect conditions and have lost their emulsions, and therewith their scientific content, through attacks by fungi.

The way to safeguard the scientific information on these plates and to make it widely accessible is to digitise them. At the Royal Observatory we therefore designed and built a high-precision scanner intended to digitise photographic plates with the best possible astrometric and photometric accuracy. The scanner was built as a pilot project initiated by the Belgian Federal Space Policy and was a collaboration between several Belgian Federal Scientific Institutes, each of them having collections to be digitised - photographs of artistic objects and aerial photographs as well as astronomical images. The scanner is now operational (though not yet fully optimised), and there is good hope of obtaining funding to digitise our own collection. A picture of the scanner, DAMIAN, is shown in Fig. 1.

Other astronomical observatories in Europe also acquired important collections of wide-field photographic images in the course of the 20th century. Some of those collections are in good hands, but others are in danger, for different reasons. Some institutes stopped using photographic techniques several decades ago, and have since lost the knowledge and skills for handling such plates. Others have set up other priorities that favour different observing technologies, and in so doing they have lost interest in their older collections. In some cases the funds needed to preserve and maintain the collections in acceptable condition are just not there.

Ten or more years ago the Royal Observatory of Belgium initiated a project, called UDAPAC, in which the Royal Observatory would act as a host for these collections of endangered photographic plates. An air-conditioned archive room was set up adjacent to DAMIAN (see Fig. 2) so that plates stored at the Royal Observatory could easily be digitised.

UDAPAC is progressing, but slowly. There is as yet no funding for it, and only one institute has so far started negotiations to transfer its plate collection. At present it is not possible to carry out mass digitisation of plates other than the ROB's own collections.


Data and Data Rescue at the U.S. Geological Survey

The U.S. Geological Survey (USGS) is a science organization that provides impartial information on the health of our ecosystems and environment, the natural hazards that threaten us, the natural resources we rely on, the impacts of climate and land-use change, and the core science systems that help us provide timely, relevant, and useable information. As the United States' largest water, earth, and biological science and civilian mapping agency, the USGS collects, monitors, analyzes, and provides scientific understanding about natural resource conditions, issues, and problems. The diversity of our scientific expertise enables us to carry out large-scale, multi-disciplinary investigations and provide impartial scientific information to resource managers, planners, and other customers.

Since 2006 the USGS has sponsored a Data Rescue Project to preserve and make accessible legacy USGS science records. The project has continued to expand, seeking out USGS science data sets that are at risk of loss through obsolescence of media or format or through severe problems with the storage of the records or data, or that could be made more accessible to our customers or for the furtherance of science. Data on tapes that can no longer be read, data stored in a basement that becomes flooded, and data on paper that is not accessible to those who could use it are all problems that the USGS must continue to address. Digital preservation, inventorying, boxing and sending records to proper Federal records storage at a National Archives and Records Administration facility, purchasing hardware or software to move data from outdated media, and adding metadata to records are all eligible for help through the Data Rescue Program.

USGS Data Rescue Example: Historical Files from Federal Government Mineral Exploration-Assistance Programs, 1950-1974.

More than 5000 original historical minerals exploration-assistance dockets from the 1950-70 era (paper, carbon paper copies, copies of maps, graph copies, blueline copies of maps, linen or mylar maps, and other fragile and sometimes poor-quality media) were electronically scanned and are now available to the public. These dockets provide access to data that are unique or difficult to recreate but which are invaluable to land and resource management organizations and the minerals industry.

More information can be found at http://minerals.usgs.gov/dockets/


Preserving United States Government Records of the Past, Present and Future

The U.S. National Archives and Records Administration (NARA) is the Federal government agency responsible for preserving and making available to the public the records created in the course of business conducted by the United States Federal government. The National Archives was established in 1934, but its major holdings date back to 1775. They capture the sweep of the past: slave ship manifests and the Emancipation Proclamation; captured German records and the Japanese surrender documents from World War II; journals of polar expeditions and photographs of Dust Bowl farmers; Indian treaties making transitory promises; and a richly bound document bearing the bold signature "Bonaparte": the Louisiana Purchase Treaty that doubled the territory of the young republic.

NARA keeps only those Federal records that are judged to have continuing value, about 2 to 5 percent of those generated in any given year. They now add up to a formidable number, diverse in form as well as in content. There are approximately 9 billion pages of textual records; 7.2 million maps, charts, and architectural drawings; more than 20 million still photographs; billions of electronic files; and more than 365,000 reels of film and 110,000 videotapes. All of these materials are preserved because they are important to the workings of Government, have long-term research worth, or provide information of value to citizens. NARA not only preserves these records, but also makes them available to anyone who is conducting research or who is simply interested in viewing them. Records help us claim our rights and entitlements, hold our elected officials accountable for their actions, and document our history as a nation. In short, NARA ensures continuing access to the essential documentation of the rights of American citizens and the actions of their Government.

In addition, NARA must manage the rapidly growing number of electronic Government records. Now being developed, the Electronic Records Archives (ERA) is the strategic response to the challenge of preserving, managing, and providing access to electronic records. ERA will keep essential electronic Federal records retrievable, readable, and authentic for as long as they remain valuable, whether that is a few years or a few hundred years. NARA currently holds 97.4 terabytes of data in billions of files from over 200 departments and agencies, containing data from 1819 to 2010. There are a variety of record formats, including data files and databases, electronic documents, e-mail messages with attachments, scanned images of textual records, Portable Document Format (PDF) records, digital photographic records, geospatial data records and web content records. The records are received from agencies on a variety of media types, including CDs, DVDs, DLTs, 3480-class cartridges, open-reel tapes, external and internal hard drives, diskettes, mini-cartridges and even keypunch cards, as well as by electronic transfer and web downloads.

NARA actively collaborates with several federal agency partners, including the United States Geological Survey (USGS), the National Oceanic and Atmospheric Administration (NOAA) and the National Aeronautics and Space Administration (NASA), on the development and implementation of long-term preservation and access strategies for scientific electronic records. Recently, in collaboration with various federal partners, NARA developed a discussion guide based on the ISO Open Archival Information System (OAIS) Standard Reference Model that can help Federal agencies identify and determine the high-level data management policies, procedures, and processes needed to ensure long-term preservation of, and access to, digital assets for all stakeholders. A complete discussion is available in NARA's Toolkit for Managing Electronic Records at http://www.codata.org/taskgroups/TGdataatrisk/index.html