Scholarship, librarianship and the digital future

A humanities scholar’s archive is a significant personal possession and a thread of continuity in the lifelong pursuit of an academic vocation. It is also, when donated to a research archive, a significant source of further research on the scholar’s area of inquiry, and an important element in intellectual history. Yet there is a remarkable disconnect between these two uses of the same archive. Creators tend to organize their papers in idiosyncratic ways because these are, overall, more efficient for their own work (Whittaker and Hirschberg 2001). In preparing an archive for storage and use, archivists and curators must convert papers and files into an arrangement that adheres to the standards and practices of information management. Yet archivists and curators are rarely the intended recipients of these papers, let alone their intended audience, and so must reconstruct archives from papers in file cabinets or worse. The conversion from living record to archive is thus lossy: the process of handing over a life’s work can destroy valuable context. This loss is not merely a matter of poor planning, but a failure of two cultures to communicate effectively.

In this essay, I sketch a proposal for a system in which researchers and librarians find a meeting ground, learn from each other on an ongoing basis, share digital tools for working with research materials and data, and prepare those materials for future use. In essence, it involves training scholars to think of themselves as co-curators of their own work product; for this practice I use the term prearchiving.

An avalanche of data, an infinitely intricate context

Prearchiving not only facilitates the proper and effective storage and use of scholars’ materials, but also protects against threats to the archive that are inherent in an increasingly digital ecology for academic work. Increasingly and with great speed, the main elements of a scholarly archive (one’s typescript draft writings, research notes, annotated writings of others, and correspondence) are first created in a digital medium. Indeed, in the near future, there will be working scholars whose whole careers may be paperless. Not only is their work product first created in digital form, but these materials, including books and journal articles, exist only or primarily as digital objects. As technology changes rapidly and academia becomes more dependent on digital communication, scholars find it increasingly difficult to maintain purely digital archives. Consider, for instance, professional correspondence, a rich source of information about intellectual history. Over the course of a 40-year career as student and professor, a typical scholar will receive on the order of 1000 emails per year, resulting in an archive of 40000 emails. Most of these emails are memos from administrators or scheduling messages exchanged with students. Even so, this seemingly massive archive, including metadata, all attachments, images and duplicate copies, is likely to amount to only about 100 gigabytes, which is very easy to store. Hence the problem with using an email archive as the correspondence record of a scholar is neither its digital form (most email is stored in one of two well-documented and easily convertible standards) nor its size. Rather, it is the difficulty of searching and indexing this mail as individual units of information, and of performing statistical analysis on email metadata, which may itself reveal interesting details about a scholar’s work habits and biography.
However, librarians routinely make use of freely available software for curating digital objects, including email. Under the right conditions, each scholar could use these tools to create a personal correspondence file, in essence prearchiving their own email, while also capitalizing on this software’s capacity for search and analysis. The same could be done for many elements of the scholar’s digital workbench: bibliographic databases and PDF archives, research files, typescript files, scanned copies of paper notes, ebooks, and scanned copies of books.
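
To make the idea of statistical analysis on email metadata concrete, the following is a minimal sketch, not a description of any particular curatorial tool. It assumes, purely for illustration, that a scholar's mail has been exported as a single file in the mbox format, and it uses only Python's standard library to count messages per year and per sender. The function name summarize_mbox and the file name used afterwards are hypothetical.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr, parsedate_to_datetime


def summarize_mbox(path):
    """Count messages per year and per sender address in an mbox file.

    Assumption: the correspondence has been exported to one mbox file.
    Messages with a missing or unparseable Date header are still counted
    by sender, but are left out of the per-year totals.
    """
    per_year = Counter()
    per_sender = Counter()
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            # parseaddr splits "Name <address>" into (name, address).
            _, address = parseaddr(str(message.get("From", "")))
            if address:
                per_sender[address.lower()] += 1
            try:
                sent = parsedate_to_datetime(str(message.get("Date", "")))
            except (TypeError, ValueError):
                continue  # no usable date; skip the per-year count
            per_year[sent.year] += 1
    finally:
        box.close()
    return per_year, per_sender
```

A scholar might call summarize_mbox("correspondence.mbox") and then inspect per_sender.most_common(10) to see the ten most frequent correspondents, or sort per_year to trace how the volume of mail changed over a career. Counts of this kind are only a starting point; full-text search and indexing would require additional tools.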

The components of a prearchiving system

A prearchiving strategy would involve several components:


