Proposal Title: iKive—A Trusted Personal Archives Service
Principal Investigator: Christopher J. Prom, Ph.D., Assistant University Archivist and Associate
Professor of Library Administration, University of Illinois at Urbana-Champaign
Abstract: The iKive project will develop open source software that makes it easy for people to save their digital lives in a trusted location, then donate their iKive to a public archives or manuscript repository later in life. The software will be made openly available to the library, archives, and computer science community and may serve as the basis for a production service.
Goals/Problem Statement: Personal papers, which have traditionally been preserved (in analog form) by academic archives and manuscript libraries, provide uniquely valuable historical resources. These one-of-a-kind materials, including private correspondence, diaries, unpublished manuscripts, photographs, and many other records, constitute the organic output of an individual’s daily actions. They are evidence: an intertwining and ever-growing stream of communicated information that documents activities, interests, and ideas.1
Most of the personal papers that are found in archives and libraries were not publicly available at the time they were created, but once they have been donated to an archival repository, the previously hidden information that they contain becomes accessible to scholars, students, and members of the public, facilitating the generation of new knowledge. Historians, scholars, and students construct better histories when they have access to diaries, correspondence, and other private sources. A scientist’s correspondence complements her published works because it includes evidence documenting the process of innovation. A writer’s drafts and sketches illuminate the evolution of an artistic idea over months or years. A diplomat’s letters to his wife reveal contextual detail absent from news reports or diplomatic cables.
Today, people do not generate the letters, diaries, and paper reports that comprise the previous generation’s archives: they generate email, working files, Twitter feeds, and blog posts. Unsurprisingly, the private messages of prominent figures are the object of intense contemporary and long-term historical interest (Crook 2010; Leigh 2011; The Sunlight Foundation n.d.; Schroeder 2010). Unfortunately, current approaches to saving dispersed electronic records will not result in the preservation of a record that is very useful for future research.2 Unless archivists, information technologists, and digital curators help people keep a useful and trusted record of materials that are important to them, it is likely that many historically valuable materials will be lost before they have even had a chance to be saved (Bearman 1994; Cox 2008).
Project Description, Outcomes, and Results: The iKive project seeks to provide a proof-of-concept for open source software and services people can use to collect email messages, social media, blog postings, reports, desktop files, and other fugitive materials. These records will be saved to a redundant server in an encrypted, standardized, and preservation-ready format. Regular integrity checks will be run to ensure materials are maintained in a trustworthy manner. Content will be stored with sufficient technical and structural metadata to permit its long-term preservation. Specifically, the project will develop three pieces of software:
A social media archiving tool customized for use by libraries and archives to allow the preservation of account data for multiple users. I propose that this work be undertaken as an extension to the ThinkUp project (Expert Labs 2011).
A plug-in for Microsoft Outlook/Exchange and possibly for other client/server environments.3 The tools will use SMTP to transfer all sent/received mail (or, optionally, a filtered set) to a designated archival store. We will select a suitable mail transfer agent and target storage format in conjunction with advice from Google Research and others, using an open storage format that facilitates data reuse and transformation.4 The Muse project's software is suggested as a candidate technology.
A desktop imaging tool, which will mirror files from a local computer, using standard encryption and
Each of these tools will use, borrow from, or extend existing open-source projects and tools and will themselves be made freely available via open source code repositories, with appropriate tools to encourage collaborative development and contributions from the community.
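The regular integrity checks mentioned in the project description could be sketched as follows. This is a minimal illustration only, assuming SHA-256 checksums held in a simple in-memory manifest; the function names (`sha256_of`, `build_manifest`, `verify_manifest`) and the manifest layout are hypothetical and do not describe any existing iKive code.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (hypothetical layout)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored files against the manifest and list any problems."""
    report: dict[str, list[str]] = {"missing": [], "changed": []}
    for relative_name, expected in manifest.items():
        target = root / relative_name
        if not target.is_file():
            report["missing"].append(relative_name)
        elif sha256_of(target) != expected:
            report["changed"].append(relative_name)
    return report
```

In this sketch, a manifest would be built when materials are first deposited and `verify_manifest` would be run on a schedule; any entry reported as missing or changed would be a candidate for repair from a redundant copy.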
Other project deliverables include a project website, documentation, blog, and forum; a licensing issues report; a peer-reviewed article (targeted to JASIST or D-Lib); a Google Research project briefing; and a final report to Google. Any associated travel will be self-funded by Illinois.
Staff will include: 1) Christopher J. Prom, Ph.D., Project Director, who will be responsible for overall project design and management, consulting with the advisory committee, and researching legal issues that may be pertinent to future plans (see below); 2) one full-time Research Programmer, working under the direction of the Project Director for two years to program and implement the open-source tools described above; and 3) an advisory committee. The committee will meet once at the beginning of the project to scope out issues, make software recommendations, and advise on legal matters. Confirmed members include Michelle Kimpton, CEO of DuraSpace; Peter Hirtle, Senior Policy Advisor, Cornell University Library; Cathy Marshall, Principal Researcher, Microsoft Research; and Sarah Shreeves, Coordinator for IDEALS and Scholarly Commons, University of Illinois. Prospective members include Gina Trapani, Expert Labs; Jessica Litman, Professor of Law, University of Michigan; Roy Campbell, Professor of Computer Science, University of Illinois; Jeff Ubois, Independent Consultant; and Mark Matienzo, Digital Archivist, Yale University.5
Future Plans: Based on the results from this project, we will consider launching a not-for-profit pilot personal digital archiving service via the www.iKive.com website, in conjunction with DuraCloud/DuraSpace and/or the Internet Archive. Provisionally, the service would include the following elements:
• competitive pricing model vis-à-vis commercial backup services,
• encrypted data transfer and storage, and
• cloud storage and preservation management under a partnership with DuraSpace/DuraCloud.
Initially, subscribers would self-register and pay a monthly fee, or their parent institution would pay on their behalf. The service aims to be self-funding once initial research and development have been completed.
In addition, all software developed during the project will be released under an open source, non-copyleft license.
Relation to Prior Work: The project will complement three strands of active research, development, and service provision:
Archival arrangement, description, storage, and access tools: Over the past 15 years, the archival community has developed appropriate content standards, XML data models, and open source tools that facilitate the arrangement and description of paper-based and born-digital personal papers and organizational records.6 I have personally participated in and led much of this work.7 While this work provides an essential foundation for making information about digital materials accessible once they have been received by an archives or manuscript library, it does nothing to ensure that an adequate record survives until that moment in time.
Digital Preservation and Data Curation Initiatives: Concurrently, national and international projects, supported with funding from United States, Canadian, European, and Australasian governments, have developed a collaborative infrastructure dedicated to storing and preserving digital materials in a trustworthy fashion.8 These initiatives provide a reliable method for storing materials once they have been received, arranged, and described by an archives or manuscript library, but they also do nothing to ensure that an adequate record survives until that moment in time.
Low-cost backup services: Commercial vendors such as Carbonite, Mozy, and Crashplan incrementally mirror defined files to an external disk or off-site server. While superficially attractive, these services leave the user extremely vulnerable to data loss or corruption. The terms of service provide no protection to the user,9 and the services do not provide the types of digital preservation or migration services that the academic community is ideally placed to provide. Similar problems afflict emergent services that individuals can use to back up data in web applications or social media, such as Backupify and Nuffly.
1 In a different context, this has been termed a lifestream (Freeman & Gelernter 1996), and some researchers have attempted to develop applications to store and manage all information related to a person’s daily experiences (Gemmell et al. 2006). Such projects can be seen as experimental implementations of Vannevar Bush’s Memex (Bush 1945).
2 Five factors make the survival of a useful set of ‘personal archives’ unlikely for all but a handful of people: 1) The digital materials that people produce are spread among multiple locations, including desktop computers, email servers, and Internet utilities; 2) The ‘backup’ tactics that people use for their content are risky and show little understanding of the principles necessary to ensure long-term digital preservation (Jones 2007; Marshall 2007; Whittaker et al. 2006; Marshall 2008); 3) Few individuals have the wherewithal to manage records for an entire lifetime, before donating them to an archives; 4) Most institutional repository efforts focus on gathering a tightly constrained list of formats, typically publications (Markey et al. 2007); and 5) Systemic archiving efforts, such as the Internet Archive’s Wayback Machine and the Library of Congress’s Twitter archive, focus only on ‘public’ records such as publications, tweets, and blogs, not on the full range of an individual’s output.
3 Outlook/Exchange was chosen for the proof-of-concept stage since it is a widely adopted mail transfer/user agent environment and because Microsoft provides no end user tools to allow for one-step export. Depending on time constraints, the project team will also explore how messages could be self-archived from Gmail or from generic IMAP servers. In subsequent phases of the project, other tools will be developed for different environments.
4 In the long term (e.g. once an account is inactive), it may be desirable to save email to a self-describing XML format, such as the Email Account Schema developed by the North Carolina State Archives (Minor 2008).
6 These standards and tools are described in (International Council on Archives 2011; Society of American Archivists, EAD Roundtable 2011; Staatsbibliothek zu Berlin 2011; University of California at San Diego 2011; University of Illinois at Urbana-Champaign 2011; New York University 2011).
7 As noted in my CV, I participated in several of the groups that developed these standards and served as co-director of the Archon Project (), which developed an open source tool that received a Mellon Award for Technology Collaboration. As Archon co-director, I developed the product profile, designed the data model and interface for the tool in conjunction with developers, and assisted with programming.
8 Several projects (Library of Congress n.d.; Planets Project 2010; InterPARES Project n.d.; Australasian Digital Recordkeeping Initiative n.d.; Digital Curation Centre n.d.; International Internet Preservation Coalition n.d.) have been particularly influential in facilitating the development of standards, procedures, and tools that support the long-term preservation of digital materials having permanent cultural value. Relevant standards are discussed in (Consultative Committee for Space Data Systems 2002; PREMIS Editorial Committee 2008; RLG/OCLC Working Group on Digital Archive Attributes 2002). (Dale & Ambacher 2007; DRAMBORA Project n.d.) provide audit requirements to ensure trustworthiness. (Cornell University n.d.; Moore 2006; Bradley et al. 2007; DuraSpace.org 2011a) describe four of the many projects implementing this work. Two innovative projects that can serve as trustworthy storage environments for personal digital archives are described by (Artefactual Systems, Inc. n.d.; DuraSpace.org 2011b).
9 For example, under the Crashplan end user license, the user waives the right to anything but minor remedies, and the agreement may be terminated at will by the service provider. Furthermore, the license mandates no positive obligation to provide users access to their own data, not only in the case of business failure but even during the course of daily business (Code 42 Software 2011).
Artefactual Systems, Inc., Archivematica Project. Available at: http://archivematica.org/wiki/index.php?title=Main_Page [Accessed July 30, 2011].
Australasian Digital Recordkeeping Initiative, Project Website. Available at: http://www.adri.gov.au/products.aspx# [Accessed July 30, 2011].
Bearman, D., 1994. Managing Electronic Mail. Archives and Manuscripts, 22(1), pp.28-50.
Bradley, K., Lei, J. & Blackall, C., 2007. Towards an Open Source Archival Repository and Preservation System: Recommendations of the Implementation of an Open Source Digital Archival and Preservation System and on Related Software Development, Paris: UNESCO. Available at: http://portal.unesco.org/ci/en/ev.php-URL_ID=24700.
Bush, V., 1945. As We May Think. The Atlantic Monthly. Available at: http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/3881/ [Accessed July 28, 2011].
Code 42 Software, 2011. End User License for Crashplan. Available at: http://support.crashplan.com/doku.php/eula [Accessed July 30, 2011].
Consultative Committee for Space Data Systems, 2002. Reference Model for an Open Archival Information System (OAIS), Available at: http://public.ccsds.org/publications/archive/650x0b1.pdf.
Cornell University, Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems. Available at: http://www.icpsr.umich.edu/dpm/dpm-eng/eng_index.html [Accessed July 22, 2010].
Cox, R.J., 2008. Chapter Seven: Electronic Mail and Personal Recordkeeping. In Personal archives and a new archival calling: readings, reflections and ruminations. Duluth, Minnesota: Litwin Books, pp. 201-42.
Crook, C., 2010. Climategate and the Big Green Lie. The Atlantic. Available at: http://www.theatlantic.com/politics/archive/2010/07/climategate-and-the-big-green-lie/59709/ [Accessed July 22, 2010].
Dale, R. & Ambacher, B. eds., 2007. Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist, Chicago, Illinois: CRL.
Digital Curation Centre, Project Website. Available at: http://www.dcc.ac.uk/ [Accessed July 30, 2011].
DRAMBORA Project, Digital Repository Audit Method and Risk Assessment. Available at: http://www.repositoryaudit.eu/ [Accessed July 22, 2010].
DuraSpace.org, 2011a. DuraSpace to Bring Cloud-Based Platform “Direct-to-Researchers” (News Release). Available at: http://duracloud.org/ [Accessed July 28, 2011].
DuraSpace.org, 2011b. The Hydra Project – Hydra – DuraSpace Wiki. Available at: [Accessed July 30, 2011].
Expert Labs, 2011. ThinkUp: Social Media Insights Platform. Available at: http://thinkupapp.com/ [Accessed July 30, 2011].
Freeman, E. & Gelernter, D., 1996. Lifestreams: a Storage Model For Personal Data. ACM SIGMOD Record, 25(1), pp.80-86.
Gemmell, J., Bell, G. & Lueder, R., 2006. MyLifeBits. Communications of the ACM, 49(1), pp.88-95.
International Council on Archives, 2011. Website of the Committee on Descriptive Standards. Available at: http://www.icacds.org.uk/eng/standards.htm [Accessed July 30, 2011].
International Internet Preservation Coalition, netpreserve.org website. Available at: http://netpreserve.org/about/index.php [Accessed July 30, 2011].
InterPARES Project, Project Website. Available at: http://www.interpares.org/ [Accessed July 30, 2011].
Jones, W., 2007. How People Keep and Organize Personal Information. In Personal Information Management. Seattle, WA: University of Washington Press, pp. 35-56.
Leigh, D., 2011. Wikileaks: inside Julian Assange’s war on secrecy, 1st ed., New York: Public Affairs.
Library of Congress, National Digital Information Infrastructure and Preservation Program Website. Available at: http://www.digitalpreservation.gov/ [Accessed July 30, 2011].
Markey, K. et al., 2007. Census of institutional repositories in the United States: MIRACLE Project research findings, Washington D.C.: Council on Library and Information Resources.
Marshall, C.C., 2007. How People Manage Personal Information Over a Lifetime. In Personal Information Management. Seattle, WA: University of Washington Press, pp. 57-75.
Marshall, C.C., 2008. Rethinking Personal Digital Archiving Part 1: Four Challenges from the Field. D-Lib Magazine, 14(4). Available at: http://www.dlib.org/dlib/march08/marshall/03marshall-pt1.html [Accessed July 17, 2011].
Minor, D., 2008. Mail Account XML Schema: How Internet Messages Can Be Stored as XML. Available at: http://siarchives.si.edu/cerp/David_Minor_CERP_symp.pdf [Accessed May 25, 2011].
Moore, R., 2006. Building Preservation Environments with Data Grid Technology. American Archivist, 69(1), pp.139-158.
New York University, 2011. ArchivesSpace Project Website. Available at: http://archivesspace.org/ [Accessed July 30, 2011].
Planets Project, 2010. Planets Project Website. Available at: http://www.planets-project.eu/ [Accessed July 30, 2011].
PREMIS Editorial Committee, 2008. PREMIS Data Dictionary for Preservation Metadata version 2.0, Library of Congress.
RLG/OCLC Working Group on Digital Archive Attributes, 2002. Trusted Digital Repositories: Attributes and Responsibilities (pdf), Mountain View, CA: Research Libraries Group. Available at: http://www.oclc.org/research/activities/past/rlg/trustedrep/default.htm.
Schroeder, P.W., 2010. Why Wiki-Diplomacy Fails. New York Times. Available at: http://www.nytimes.com/2010/12/03/opinion/03Schroeder.html [Accessed July 18, 2011].
Society of American Archivists, EAD Roundtable, 2011. Encoded Archival Description Help Pages. Available at: http://www.archivists.org/saagroups/ead/ [Accessed July 30, 2011].
Staatsbibliothek zu Berlin, 2011. Encoded Archival Context Home Page. Available at: http://eac.staatsbibliothek-berlin.de/ [Accessed July 30, 2011].
The Sunlight Foundation, Sarah’s Inbox | A Project of the Sunlight Foundation. Available at: http://sarahsinbox.com/ [Accessed July 11, 2011].
University of California at San Diego, 2011. Archivists’ Toolkit. Available at: http://www.archiviststoolkit.org/ [Accessed July 30, 2011].
University of Illinois at Urbana-Champaign, 2011. Archon Project Website. Available at: http://www.archon.org/ [Accessed July 30, 2011].
Whittaker, S., Bellotti, V. & Gwizdka, J., 2006. Email in personal information management. Communications of the ACM, 49(1), p.68.