n. (also web archive, Web archive, and Web archives)

preserved copies of live web content collected for permanent retention and access

Antracoli et al. 2014, 157
Far more than a simple collection of static snapshots, then, proper Web archives capture not only content but also aspects of the user environment. As such, practices for capturing and displaying archived websites must continually evolve, placing archivists in a perpetual struggle to keep pace with changing technology and practices on the live Web.

Arnold and Sampson 2014, 520
Broadly, we find it accurate to characterize a tweet collection as a Web archives as described in the “Web-Archiving” report to the Digital Preservation Coalition: content captured from the live Web for future use or retention in an archival setting.

Kelly 2017, 10
Web scraping and web archiving tools that were invented for creating social media and web archives may be an option for archives attempting to harvest and preserve their own institution’s social media and web content (as is the case of ArchiveSocial and Archive-It), or for collecting social media content not necessarily generated by their institution using hashtags, usernames, or other search entries (example services include Lentil, ScraperWiki, Social Feed Manager, Twarc, and TAGS).

FDLP 2018
The Federal Depository Library Program (FDLP) Web Archive is comprised of selected U.S. Government Web sites, harvested and archived in their entirety by the U.S. Government Publishing Office (GPO) in order to create working “snapshots” of the Web sites at various points in time. The aim is to provide permanent public access to Federal Agency Web content.

Fernando, Marenzi, and Nejdl 2018, 39
Web archives collect, preserve, and provide ongoing access to ephemeral web pages and hence encode important traces of human thought, activity, and history. Curated web archive collections contain focused digital content from specific organizations, related to specific topics or covering specific events, which are collected to provide representative samples and preserve them for future exploration and analysis.

an organization devoted principally to the collection and preservation of web content

Brett 2002, 108
So, in the end, after multiple meetings of the working group, which debated technical implications of archiving Web sites, brainstormed about the development and future of Web archives, and studied a number of articles and other reference sources concerning Web sites and electronic records, what was the result?

Galloway 2011, 173, fn. 7
The September 11 Web Archive has been harvested by the Internet Archive as commissioned by the Library of Congress; it currently consists of more than five terabytes, delivered to the viewer through the Wayback Machine using automated Javascript recoding; see http://september11.archive.org/welcome.html (accessed on 7 July 2011). Currently the International Internet Preservation Consortium is supporting the development of the Web Archive (WARC) format, based on the Internet Archive’s crawler output format, for conversion of websites. Two current projects, the Living Web Archives and the World Wide Web of Humanities, are being carried out in Europe to develop a capture format that retains the visual and interactive features of websites for digital discovery and archiving.

Upward, McKemmish, and Reed 2011, 216, fn. 45
In a parting interview, the retiring Governor-General of Victoria lamented the lack of a usable web archive on climate change that all citizens could consult (see The Melbourne Age [4 April 2011], p. 1); the creation of such an archive, however, is not likely to be formed by traditional creator-centric methods.

Pennock 2013, 9
Legality is often the biggest non-technical issue faced by web archives. Do they have the legal right to take copies of content and provide access independently of the original site and without explicit permission of the owner, or is that a breach of the owner’s copyright?

Lepore 2015
The Wayback Machine is a Web archive, a collection of old Web pages; it is, in fact, the Web archive.

Huurdeman et al. 2015, 247
Web archives attempt to preserve the fast changing web, yet they will always be incomplete.

Belovari 2017, 60
Web archives attempting to preserve (portions of) the public Web have existed since 1991.

Belovari 2017, 66
As explained, we start our thought experiment at the point where each historian begins: with a research interest and a search for source materials. Working in 2050, the historian will look for a global web repository that preserved the former global Web. It is no trivial matter to ask how future historians would locate such institutions, especially if you consider endemic link rot, reference and content rot, and the online-only access points to most web archives. Furthermore, obstacles encountered in locating global web archives mirror complications encountered in locating and accessing archived sites within these archives.

Dooley et al. 2017, 5
The formatting and organization of data within individual Web archives is an issue for academics in the humanities and social sciences who are beginning to make forays into the world of Web archives; they may need specialized training to access Web archives that do not have a user-friendly access layer.

Hight et al. 2017, 2
Although Web archives follow the same general lifecycle as traditional archives, which includes selection, acquisition, arrangement, description, preservation, and access, Web archives are unique enough to require their own standards, best practices, and graduate courses.

Taylor 2017, 5
Also vital is their observation of the inadequacy of strictly browse-based access to the Library of Congress Web archives—an archived RSS feed is a sub-optimal entry point for this kind of exploration, and users of Web archives would in any case benefit from more sophisticated exploration tools such as full-text search and ngram visualizations.

IIPC 2020b
The intent of web archiving is to preserve the original form of the harvested content without modification. To achieve this goal the tools, standards, policies and best practices need to be in place that will ensure the management of web archives over time.

Notes
Web archives frequently consist of multiple, time-stamped, collated copies of the same web page or pages taken at different times. Ideally, web archives capture and preserve not only the text, images, and informational content, but also the functionality, look, and feel of the web.