Learning object summary
Students will learn why and how to identify personally identifiable information (PII) and other sensitive information within digital collections, as well as methods for redacting the information. The unit includes two hands-on exercises: identifying potentially sensitive information within a disk image and redacting information that meets specified criteria within a PDF file.
Christopher (Cal) Lee, Kam Woods, Simson Garfinkel
- Rights to Control Information
- Examples of Potentially Private and Sensitive Information
- Identifying Information in Born-Digital Materials, Complicating factors
- Digital forensics tools can help
- Bulk Extractor Exercise
- Redaction Options – two ways to redact data
- BitCurator Access Disk Image Redaction Tools
- File-Level Redaction – PDF
- Exercise – BitCurator PDF Redaction Tool
Learning object type
This learning object might be used in a lesson to satisfy the following learning objectives:
- Convey several different rights to control information
- Identify various types of personally identifying information and other potentially sensitive information
- Express major challenges and strategies for locating PII and other potentially sensitive information
- Generate reports of potentially sensitive information using Bulk Extractor
- Redact designated patterns in a PDF file using the BitCurator PDF Redaction Tool
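Instructors who want to preview the idea of pattern-based identification and redaction before students open the tools could use a minimal sketch like the one below. It is an illustration only and does not show how Bulk Extractor or the BitCurator PDF Redaction Tool work internally. The two regular expressions, the function names `find_sensitive` and `redact`, and the sample string are all assumptions made up for this example; real PII patterns are broader and produce false positives that need human review.

```python
import re

# Hypothetical, deliberately simple patterns chosen for this sketch.
# They are NOT the rules used by Bulk Extractor or the BitCurator tools.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(text):
    """Return (offset, label, matched_text) tuples, sorted by offset."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    return sorted(hits)


def redact(text, mask="X"):
    """Overwrite every match with mask characters, preserving text length."""
    for pattern in PATTERNS.values():
        text = pattern.sub(lambda m: mask * len(m.group()), text)
    return text


# Made-up sample text for demonstration.
sample = "Contact jean@example.org; SSN on file: 123-45-6789."

if __name__ == "__main__":
    for offset, label, value in find_sensitive(sample):
        print(offset, label, value)
    print(redact(sample))
```

Reporting the offset alongside each match mirrors the general idea of a findings report that a reviewer can check, and masking in place keeps the redacted text the same length as the original.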
Prerequisite knowledge required: No specific prerequisite knowledge is assumed.
Suggested duration: 60 minutes (variable, depending on format and amount of discussion)
- Virtual, synchronous
- Virtual, asynchronous
Accessibility information: Students must be able to interact with both the host environment (the OS running on their designated computers) and the BitCurator environment (Ubuntu Linux 18.04 LTS) running within VirtualBox.
Estimated set-up time: 0-2 hours (depending on whether the instructor will set up virtual machines for the students in advance and on the instructor’s level of familiarity with the tools)
Hardware requirements: See “Getting Started with the Virtual Machine” in the BitCurator QuickStart Guide
Software installations: The Bulk Extractor exercise requires VirtualBox and the BitCurator virtual machine (VM) (see pp. 5-16 in the QuickStart Guide – http://distro.ibiblio.org/bitcurator/docs/BitCurator-Quickstart-v2.pdf). The PDF redaction exercise requires additional software to be installed in the BitCurator VM (see “Building the Software in BitCurator” at https://github.com/BitCurator/bitcurator-redact-pdf).
Sample data: The Bulk Extractor exercise uses a USB flash drive image from the M57 Patents Scenario at digitalcorpora.org. The PDF redaction exercise can use the PDF files that are included automatically with the installation of the redaction tool, or instructors could elect to use other PDF files.
Exercises can be assigned in advance of a class session, carried out during the class session, or assigned for completion after the class session.
Potential readings to assign:
Garfinkel, Simson, and Abhi Shelat. “Remembrance of Data Passed: A Study of Disk Sanitization Practices.” IEEE Security and Privacy (January/February 2003): 17-27, http://cdn.computerscience1.net/2005/fall/lectures/8/articles8.pdf.
Lee, Christopher A., and Kam Woods. “Automated Redaction of Private and Personal Data in Collections: Toward Responsible Stewardship of Digital Heritage.” In Proceedings of Memory of the World in the Digital Age: Digitization and Preservation: An International Conference on Permanent Access to Digital Documentary Heritage, 26-28 September 2012, Vancouver, British Columbia, Canada, edited by Luciana Duranti and Elizabeth Shaffer, 298-313. United Nations Educational, Scientific and Cultural Organization, 2013. https://ils.unc.edu/callee/p298-lee.pdf.
Students can be tasked with applying one or more of the following tools to other test data supplied by the instructor, either as discrete tasks or as part of a larger multi-week assignment (possibly in groups): Bulk Extractor, BitCurator Disk Image Redaction Tool, BitCurator PDF Redaction Tool.
Student understanding of course concepts can be evaluated within the context of class discussions and exercises. Students can also be asked to submit the products of the exercises (along with their own explanations and reflections on the tasks performed) for instructor evaluation.
To avoid updating and versioning issues, the exercises in this module reference instructions from the individual tool pages on GitHub and from the BitCurator QuickStart Guide.
This work is licensed under a Creative Commons Attribution 4.0 International License.