
Where Do We Put our Files?

Harrison Eiteljorg II

Archaeologists generate enormous quantities of machine-readable data today. Whether we are doing fieldwork, laboratory research projects, or desk-bound research, we have larger and larger quantities of data and more and more disparate forms for those data. Many times, the data files are not only large but impossible to reduce to paper, as is certainly the case with CAD, GIS, or large database files.

The technological revolution that has made it possible for us to store so much information in such complex forms has not, however, made it possible for us to preserve that information very well. Computer files can now be kept for a reasonable length of time without fear of decay, but that is not particularly helpful if the files become obsolete, as they surely will. Hardware, operating system software, and application software will continue to evolve at a blistering pace, and, as a result, data files will become inaccessible as the physical forms and computer formats of those files become obsolete. Even if the files are in pristine condition, they will be useless. If new operating systems do not render them useless, new applications will. If neither of those accomplishes the result, hardware changes will.

That assessment may seem harsh, but in the relatively few years since personal computers appeared on the scene, we have seen a succession of operating systems: CP/M, the original Apple OS, the Mac OS, DOS, OS/2, Windows, Windows 95, Windows NT. We know that more are in the works. During that time we have also seen a variety of programs come and go, with file formats becoming common and going out of use along with the programs. Of the widely used database systems, for example, only dBase used a file format that became a kind of standard, and, ironically, dBase is no longer widely used itself. Microsoft changed the file formats for Excel, Word, and Access in its latest release, Office 97. These kinds of revisions of file formats seem both common and destined to continue. Hardware has also changed and will surely continue to evolve. The original standard disk, the 5 1/4-inch floppy, is now rare, its earlier 8-inch ancestor virtually unknown. The CD, only a few years old, is already being superseded.

Thus, while computers have brought us more powerful ways to deal with increasingly voluminous and complex data, they have failed to provide secure archival systems for the data we gather. Yet our ability to analyze the data depends on our ability to store them safely. Fortunately, the process of keeping data in useful forms is not terribly difficult--that is, not technically demanding. Files can be translated from one format to another as necessary to keep them current, a process called data migration, with relatively little difficulty. Unfortunately, however, there are times when the difficulties are more significant, especially if the files are to be useful to people with different software. Generally, the difficulties arise as much from the complexity of the data as from the changes in technology, and archaeological expertise is as important to the process of migrating such files as computer knowledge. Rarely is the process so automatic that one may simply push a button to have a file migrated correctly.

Because computer files must be migrated, and because the migration requires both archaeological skills and computer skills, an archive specifically for archaeological data files is required. Such an archive exists--the Archaeological Data Archive--operated by the Center for the Study of Architecture at Bryn Mawr College. A similar archive exists in Britain: the Archaeology Data Service, a consortium of British institutions based at the University of York. Both archives are prepared to handle the problems of data migration. In addition, both are prepared to assist scholars in organizing their data, preparing the data for archival storage, and documenting the materials being archived.

The two archives are cooperating on multiple levels. We are working to avoid duplication of archival holdings and to share expertise in data migration. More important to users, we will provide indexes that will enable users to find files in any cooperating archive. The physical location of any specific file should be completely irrelevant. Speaking for the Archaeological Data Archive, I can say that we have hardly been inundated with data. Scholars have not realized the transience of data in digital form; have not understood the danger of leaving important files to languish on floppy disks, local hard drives, or university mainframes; and have had little incentive to archive their files. Until rather recently, of course, a scholar who saw the problems would have found no archival center prepared to deal specifically with digital archaeological data.

The understanding of the importance of archival storage is growing. The time has arrived for scholars to deposit their digital data. Personnel from the Archaeological Data Archive Project will assist with all aspects of data deposition, but a few aspects of the process should be clear to all who are concerned.

(1) Virtually all valuable files should be archived--text, images, CAD models, GIS files, database files, etc.

(2) Files must have appropriate documentation--sufficient to help a user understand the files and their contents and sufficient for archival personnel to migrate the data years from now (vocabulary used, categorizing schemes applied, software used, and so on). The documentation must also provide data that will make good indexing possible. Without good indexes, people will not be able to find the data.

(3) Like professional publications, files should be peer reviewed before being placed in an archive.

(4) Some files may require data migration at the time they are deposited in the archive. Files in nonstandard or difficult-to-use formats will need to be migrated so that they can be useful--and to assure a future migration path. (The original files will be archived as well.)

(5) Files from ongoing projects may be archived and kept private, available only to project personnel; however, no files will be accepted for the archive if they are to remain inaccessible for a lengthy period.

(6) Files may be removed from the archive by the person(s) who deposited them.

(7) Access to the files will be, at the least, by FTP downloading. Some information may be put into a form suitable for web access to the actual data, but most data in the Archaeological Data Archive will be files in standard formats that are available for downloading from the ADAP web site. Users will be expected to have the software and computers necessary to deal with the files. There are no plans to develop archive-wide search tools for data, although search tools to locate files containing data of interest are under development.

(8) ADAP personnel will assist with data migration, preparation of documentation, and planning for data collection. We will not, however, provide generalized database systems for excavations, since we believe that to be an unwise approach to data recording.

(9) All data will be stored in duplicate on the longest-lasting digital media available at the time (CDs currently), with one copy placed in a bank vault and the other kept at the ADAP/CSA offices.

(10) All files in the archive will be migrated according to the best available procedures and timetables. In most cases, we expect multiple versions of the files will be available at any given moment, either because more than one version of required software is in use or because there are multiple standards. For instance, we have CAD files in AutoCAD R12 and R13 formats, and we keep database files in delimited ASCII and DBF form.
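Several of the points above (documentation that accompanies the files, duplicate storage, and database files kept in delimited ASCII) can be illustrated with a short sketch in Python. It is not a description of the procedures actually used by the archive. The file names, the field names, the JSON documentation file, and the use of SHA-256 checksums to compare the two copies are all assumptions made for illustration, and the records are assumed to have been read out of the original database already.

```python
import csv
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def write_delimited(records, fieldnames, path):
    """Write records as comma-delimited ASCII text with a header row."""
    with open(path, "w", newline="", encoding="ascii") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def sha256_of(path):
    """Return the SHA-256 digest of a file, used here to compare copies."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def deposit(records, fieldnames, documentation, primary_dir, duplicate_dir):
    """Write a data file plus documentation, then store a verified duplicate."""
    primary_dir, duplicate_dir = Path(primary_dir), Path(duplicate_dir)
    primary_dir.mkdir(parents=True, exist_ok=True)
    duplicate_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical file name; a real deposit would follow the archive's rules.
    data_path = primary_dir / "finds.txt"
    write_delimited(records, fieldnames, data_path)

    # The documentation travels with the data and records the file's checksum.
    doc = dict(documentation, fields=list(fieldnames), sha256=sha256_of(data_path))
    doc_path = primary_dir / "finds.doc.json"
    doc_path.write_text(json.dumps(doc, indent=2), encoding="ascii")

    # Keep a second copy elsewhere and confirm that it matches the first.
    for source in (data_path, doc_path):
        copy = duplicate_dir / source.name
        shutil.copy2(source, copy)
        if sha256_of(copy) != sha256_of(source):
            raise IOError(f"duplicate of {source.name} does not match")
    return data_path, doc_path


if __name__ == "__main__":
    # Invented sample records and documentation, for demonstration only.
    sample = [
        {"find_id": "F001", "trench": "A", "material": "ceramic"},
        {"find_id": "F002", "trench": "B", "material": "bronze"},
    ]
    notes = {
        "software": "hypothetical database program",
        "vocabulary": "material terms follow the project's own list",
    }
    with tempfile.TemporaryDirectory() as tmp:
        data, doc = deposit(
            sample, ["find_id", "trench", "material"], notes,
            Path(tmp) / "office", Path(tmp) / "vault",
        )
        print(data.read_text(encoding="ascii"))
        print(doc.read_text(encoding="ascii"))
```

A plain delimited file of this kind can be opened by almost any database or spreadsheet program, which is the reason for keeping such a version alongside the original format. The checksum does nothing to prevent obsolescence; it only confirms that the two stored copies are identical.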

Finally, the aim of the Archaeological Data Archive Project is to archive digital resources. We will not attempt to transfer nondigital materials into digital form; however, we will provide help and guidance to scholars who may wish to do that in order to preserve information and add it to the archive. The processes and procedures will surely change as the archives grow and mature. The necessary procedures, though, are already in place. It is now up to the archaeological community to acknowledge the importance of caring properly for archaeological data and to deposit the data files that will ultimately make digital archives not simply useful, but invaluable.

Harrison Eiteljorg, II, is director of the Center for the Study of Architecture and the Archaeological Data Archive Project.
