Prearchiving: Helping Librarians and Researchers Collaborate on Archiving and Public Dissemination of Research Materials and Correspondence
Scholarship, librarianship and the digital future
A humanities scholar’s archive is a significant personal possession and a thread of continuity in the lifelong pursuit of an academic vocation. When donated to a research archive, it is also a significant source of further research on the scholar’s area of inquiry, and an important element in intellectual history. Yet there is a remarkable disconnect between these two uses of the same archive. Creators tend to organize papers in idiosyncratic ways because these are, overall, more efficient (Whittaker and Hirschberg 2001). In preparing an archive for storage and use, archivists and curators must in turn convert papers and files into an arrangement that adheres to standards and practices of information management. Creators seldom have archivists and curators in mind as recipients, let alone audiences, so archivists must reconstruct archives from papers in file cabinets or worse. The conversion from living record to archive is thus lossy; the process of handing over a life’s work can destroy valuable context. This loss stems not merely from poor planning, but from a failure of two cultures to communicate effectively.
In this essay, I sketch a proposal for a system in which researchers and librarians find a meeting ground, learn from each other on an ongoing basis, share digital tools for working with research materials and data, and prepare those materials for future use. In essence, it involves training scholars to think of themselves as co-curators of their own work product; hence the term prearchiving.
An avalanche of data, an infinitely intricate context
Prearchiving not only provides a way to facilitate the proper and effective storage and use of scholars’ materials, but also protects against threats to the archive that are inherent in an increasingly digital ecology for academic work. Increasingly and with great speed, the main elements of a scholarly archive – one’s typescript drafts, research notes, annotated writings of others, and correspondence – are first created in a digital medium. Indeed, in the near future, there will be working scholars whose whole careers may be paperless. Not only is their work product first created in digital form, but these materials, including books and journal articles, exist only or primarily as digital objects. As technology changes rapidly and academia becomes more dependent on digital communication, scholars increasingly find it difficult to maintain purely digital archives. For instance, consider professional correspondence, a rich source of information about intellectual history. Over the course of a 40-year career as student and professor, a typical scholar will receive on the order of 1,000 emails per year, resulting in an archive of 40,000 emails. While most of these emails are memos from administrators or scheduling emails with students, this massive archive, including metadata, all attachments, images and duplicate copies, is likely to be only about 100 gigabytes, which is very easy to store. Hence the problem with using an email archive as the correspondence record of a scholar is not its digital form – most email is stored in one of two well-documented and easily convertible standards – or its size. Rather it is the difficulty of searching and indexing this mail as individual units of information, and of performing statistical analysis on email metadata, which itself may reveal interesting details about a scholar’s work habits and biography.
However, librarians routinely make use of freely available software for curating digital objects, including email. Under the right conditions, each scholar could use these tools to create a personal correspondence file, in essence prearchiving their own email, while also capitalizing on this software’s capacity for search and analysis. The same could be done for many elements of the scholar’s digital workbench: bibliographic databases and PDF archives, research files, typescript files, scanned copies of paper notes, scanned copies of books and ebooks.
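As a rough illustration of the kind of metadata analysis meant here, the following minimal sketch counts messages per year and per sender. It assumes the correspondence has been exported to a single mbox file and uses only Python’s standard library; the function name summarize_mbox and the choice of statistics are hypothetical, and this is not a description of how any particular curation tool works.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr, parsedate_to_datetime


def summarize_mbox(path):
    """Count messages per year and per sender address in an mbox file.

    Hypothetical helper for illustration; `path` is assumed to point to
    an existing mbox export of a scholar's correspondence.
    """
    per_year = Counter()
    per_sender = Counter()
    box = mailbox.mbox(path)
    try:
        for message in box:
            # parseaddr splits "Name <addr>" into (name, addr).
            _, address = parseaddr(str(message.get("From", "")))
            if address:
                per_sender[address.lower()] += 1
            try:
                sent = parsedate_to_datetime(str(message.get("Date", "")))
            except (TypeError, ValueError):
                # Missing or malformed Date header: count the sender only.
                continue
            per_year[sent.year] += 1
    finally:
        box.close()
    return per_year, per_sender
```

Calling per_year.most_common() on the result would show a scholar’s busiest years of correspondence, and per_sender.most_common(10) the ten most frequent correspondents.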
The components of a prearchiving system
A prearchiving strategy would involve several components:
- Scholars and librarians would come together to learn about the management of digital objects, specifically the licensed, noncommercial software tools such as CONTENTdm (OCLC 2012) which librarians use for managing online digital collections, or free, open-source, widely adopted alternatives such as Drupal (Ruest 2009).
- Universities would provide prearchiving servers, with proper backup and recovery built in. Scholars would then be given personal curating privileges for their own material on the server. Cloud prearchives could be backups of a scholar’s existing material, or the main site where they browse, search, annotate and add to this material. As universities move from cloud as storage to cloud as computing resource, I foresee an increasing reliance on networked prearchiving computers as the main site for research work.
- The scholar’s personal archive would be restricted during the prearchiving phase, but need not be limited to one person. Using access-control list (ACL) rules, a scholar could create circles of access for different groups, including the public, much as social media allows for degrees of sharing at the level of a single post. Each item could be included in a specific circle: private until donation, a group of users, a specific user, or non-logged-in users (the public). Using OpenID, universities can share authentication methods and provide logins to each other’s users.
- When the scholar chooses, the prearchive can be handed over to any university as a permanent home. Since the prearchive is managed using existing and widely adopted database tools, most universities with digital collections will also be able to take a prearchive and convert it into an archive with little effort. The existing ACL rules, or new and more liberal ones, can also be the framework for negotiating access restrictions when the prearchive is donated.
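The circles of access described in the list above can be sketched as a small per-item visibility check. This is only an illustration under stated assumptions: the names Circle, User, Item and can_view are hypothetical, the owner is assumed always to see their own material, and real collection-management software will have its own permission model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Circle(Enum):
    """Hypothetical visibility levels, one per circle named in the text."""
    PRIVATE = "private"  # owner only, until donation
    USERS = "users"      # specific named users
    GROUP = "group"      # members of named groups
    PUBLIC = "public"    # anyone, including non-logged-in visitors


@dataclass(frozen=True)
class User:
    username: str
    groups: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Item:
    title: str
    owner: str
    circle: Circle = Circle.PRIVATE
    allowed_users: FrozenSet[str] = frozenset()
    allowed_groups: FrozenSet[str] = frozenset()


def can_view(item: Item, user: Optional[User]) -> bool:
    """Return True if `user` may view `item`; None means not logged in."""
    if item.circle is Circle.PUBLIC:
        return True
    if user is None:
        return False
    if user.username == item.owner:
        # Assumption: owners always see their own material.
        return True
    if item.circle is Circle.USERS:
        return user.username in item.allowed_users
    if item.circle is Circle.GROUP:
        return bool(user.groups & item.allowed_groups)
    return False  # Circle.PRIVATE and not the owner
```

Because each item carries its own circle, loosening access at donation time would amount to changing that one field per item rather than restructuring the archive.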
Ruest, Nick. 2009. “OMG! You Don’t Need CONTENTdm!!!” Nick Ruest’s Blog. June 5. http://ruebot.net/content/omg-you-dont-need-contentdm.
Online Computer Library Center, Inc. 2012. “CONTENTdm: Help Searchers Discover the World’s Greatest Digital Collections – Yours.” http://www.oclc.org/content/dam/oclc/services/brochures/211472usb_contentdm.pdf.
Whittaker, Steve, and Julia Hirschberg. 2001. “The Character, Value, and Management of Personal Paper Archives.” ACM Transactions on Computer-Human Interaction 8 (2): 150–70. doi:10.1145/376929.376932.