n. the process of identifying and removing redundant electronic files

Toney 1992, 24
The database was divided into twenty-three pools of about five thousand records each. Records were assigned to pools based on the date of publication, since that field was found to be the most accurate and uniform among those useful for deduplication. However, choosing that pooling method meant that if a record had a missing or incorrect date field, the record would not be included in the comparisons with other records for that bibliographic item. To compensate for this, after the entire database is deduplicated, new pools will be based on the author fields, and the database will be deduplicated again.

Wong 2006, 60
Deduplication is the process of identifying two files as being the same file. For example, an e-mail with an attachment might have been sent twice, or sent to different people. Selecting the deduping option prevents multiple copies of the same attachment from being converted to images.

Richardson 2012, 268
Deduplication software tools can help organizations identify and delete duplicate and potentially duplicate content. Deduplication works by comparing chunks of data and looking for duplication. Deduplication uses two primary approaches: ¶ Postprocess: Postprocess deduplication allows new data to be entered in the application or repository throughout the workday without interruption. Then at a later time, the deduplication process will run to determine whether any duplicate information exists. ¶ In-line: The in-line deduplication process is working at all times in an effort to detect duplication information. When duplicate or potentially duplicate information is entered in the application, the employee is notified of the issue.

Huth 2016a, 66–67
Deduplication tools enable a user to find and remove duplicate files and temporary files and evaluate files with different content but identical names. Some of these tools focus exclusively on deduplication of email or Web crawls.
The appraisal archivist does not need any of these tools to write an appraisal report, but may need to recommend the use of one of these tools during the processing stage.

Pendergrass et al. 2019, 183
The ease of retaining all copies should not deter the practice of selective appraisal: just because a document exists in multiple locations within the collection—or across collections—does not mean that every copy is necessary. As has long been the case with analog materials, cultural heritage professionals should be comfortable determining which digital content should be retained and which should be destroyed; metadata or descriptive pointers may be sufficient placeholders for duplicate files. Deduplication of those files deemed to be unnecessary should be part of the appraisal process, further reducing the storage footprint and environmental costs of the acquisition.

Thompson 2022, 103, fn. 13
Museums with photography departments that distribute digital images may find that multiple staff members have saved the same images from the photography department. If the museum archives is collecting photographs directly from the photography department, it may decide not to maintain all the duplicates that have been distributed to other staff and saved among their files, and some method of deduplication will be necessary.

Decker et al. 2022, 863
Second, by treating each email message as a single artifact, existing approaches to email archiving, such as ePADD and DarcMail, tend to decontextualise the email corpus. Even where the “thread” structure of messages is acknowledged and navigation within the thread is possible, further technical issues such as the need for deduplication or named entity resolution may compound the challenge facing the human user seeking to “read” an email exchange in the same way it would have been read by the original participants.
For example, who are the email correspondents and the individuals’ names in an email or thread?

Kopin, Hutchinson, and Truitt 2025, 44
Many of the donors and depositors to FHL (Friends Historical Library) are not fluent in digital technologies that lie outside of very specific use cases. Although the digital archivist had been trained in previous professional archival contexts to request precustodial interventions from the donor such as deduplication, file renaming, and file format migration/normalization, it was not practical to request this from donors struggling with simply transferring files from one computer to another. Similarly, transfer methods developed to ensure fixity (a best practice for digital preservation) that required donor-side use of specialized software were not feasible.

v. deduplicate or dedupe
to identify and remove redundant electronic files

Skinner and Hulbert 2022, 6
First, the Society of American Archivists collaboratively generated an invitation list through the help of a number of partner organizations. This list was cleaned and deduped and ultimately resulted in 44,884 distinct emails, which received the initial invitation.

Sweetser et al. 2023, 145
We asked that colleagues coordinate with one another within an institution so that only one response was captured per institution, but we captured no identifying data to allow us to verify or dedupe.

Zamon 2024, 93
Once you have set up the machine, you will need to install disk imaging software, such as FTK® Imager, IsoBuster®, or Guymager.
Other tools that are helpful include the following: ¶ Exiftool (technical metadata extractor) ¶ BitCurator suite (on an Oracle VM Virtual Box installation or installed as the primary operating system on your dedicated computer) ¶ DROID (file format identification and checksum creation) ¶ Teracopy (file copying and moving while maintaining metadata) ¶ Quick View Plus (to open files in many obsolete formats) ¶ VLC Media Player ¶ Bulk Rename Utility ¶ FastStone Image Viewer (deduping image files).
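
Illustration (not part of the cited sources)
The definition given for this entry, identifying redundant electronic files, is often carried out by comparing checksums: files whose contents produce the same digest are treated as copies of one another. The following Python sketch shows that idea under stated assumptions. The function names `file_checksum` and `find_duplicates` are hypothetical, SHA-256 is an arbitrary choice of digest, and nothing here is claimed to be the method used by any tool or author cited in this entry. The sketch only reports groups of identical files; deciding which copy to retain is left to appraisal.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by checksum and keep only groups of two or more.

    Hypothetical helper for illustration: it identifies byte-identical files
    and deletes nothing.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_checksum(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates` on a directory returns a dictionary that maps each shared checksum to the list of paths holding that content. Files with identical names but different content, which one of the quotations above mentions as a separate concern, would not be grouped by this approach because their checksums differ.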