Mass redact a wide range of documents
Mass redactions let you efficiently redact search sets in Ediscovery. They support multiple terms and common RegEx patterns, using semicolons for batch processing. Redaction reasons and colors are customizable, enabling multiple operations at once. The user experience aligns with existing mass action workflows, including user-level permissions and real-time status updates. Notifications inform users of completion or issues, and redactions are reversible via the mass action log.
As with all Ediscovery automation, exercise due diligence with mass redactions. Perform spot QC checks and run test redactions on limited document sets to ensure correct application. You can undo test redactions as described later in this article.
This feature is in Walk. Databases created on or after Aug 19, 2024, will have mass redactions enabled automatically. For older databases, please contact DISCO Desk for options to enable mass redactions.
How to mass redact documents: terms and phrases
- In Search & Review, search for the documents you want to redact in bulk:
- Above the document list, click the 'redact' button in the mass actions menu:
- The mass redactions menu slides out from right to left:
- Enter your redaction terms in the Terms/Patterns section below:
- Configure your redaction with optional settings such as case sensitivity, redaction reason, and color:
- Click 'Apply redactions' to start the mass redactions request; the mass actions modal will close. Notifications will confirm that your request has started:
- Track it in the mass action log:
- After completion, you will receive another notification:
- Your mass action log will show this completed state:
The mass actions log shows the total number of redactions and the number of documents redacted at least once.
- Click the blue, bolded '8 documents' above to access the documents redacted during your mass redaction job:
- Open the first document to inspect its mass redactions:
The term 'bankruptcy' is redacted in white, with the reason displayed in the top left corner.
- Return to the mass actions log and click 'rollback' to undo the mass redactions:
- Confirm the rollback:
- The rollback operation will begin, and its status will change from red to blue upon completion:
- Clicking the '8 documents' link again reveals that the redactions are removed:
How to mass redact documents with multiple terms for one redaction reason
Now we will explore more complex use cases for mass redactions.
- In Search & Review, select all documents in your review database and open the mass redactions modal:
- Open up mass redactions:
To redact multiple terms for the same reason, separate them with semicolons. This is commonly used to capture multiple names or variations of a name.
- Click 'Apply redactions' to start your job:
- Track the mass redactions operation progress in the mass actions log:
- After completion, inspect your documents to verify the redactions.
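The semicolon convention described above can be illustrated with a minimal Python sketch. It is not DISCO's implementation: the function names are hypothetical, and the sketch assumes plain substring matching, with case-insensitive matching as the default.

```python
import re

def split_terms(raw: str) -> list[str]:
    """Split a semicolon-separated entry into individual terms, dropping blanks."""
    return [term.strip() for term in raw.split(";") if term.strip()]

def find_term_matches(text: str, raw_terms: str, case_sensitive: bool = False) -> list[str]:
    """Return every matched occurrence of any term, in document order.

    Assumption: terms match as plain substrings, not whole words.
    """
    terms = split_terms(raw_terms)
    if not terms:
        return []
    # Try longer terms first so a full name wins over a shorter variation.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.group() for m in re.finditer(alternation, text, flags)]

sample = "Bob Smith emailed Robert Smith; bob replied."
print(find_term_matches(sample, "Robert Smith; Bob"))
print(find_term_matches(sample, "Robert Smith; Bob", case_sensitive=True))
```

Running a check like this against a few sample passages before submitting a large job is one way to confirm that a term list captures the variations you expect.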
How to mass redact documents with patterns instead of terms and phrases
- DISCO provides built-in regular expression (RegEx) patterns, so you don't need to write complex RegEx yourself. For example, you can match all email addresses without creating a custom regular expression:
- Open mass redactions and click 'Insert patterns':
- The terms and phrases box switches to a patterns view:
- Select 'Email addresses' as your pattern, then apply:
- You can see your pattern-based mass redaction start in the mass action log:
- Note that all email addresses are redacted:
What if your custom regular expression is unsupported by our pre-built patterns?
DISCO professional services perform custom mass redactions using nearly any regular expression for our customers.
- To get started with custom mass redactions, contact us at discodesk@csdisco.com
A member of our data operations team will contact you to assist.
How to add multiple mass redactions in one mass redactions request:
- You can add multiple redactions (up to 10 rows) in one mass redactions operation:
- Select '+ Add another set of terms or patterns to redact' to add another row item.
- You can submit multiple mass redactions requests with different reasons, colors, and case sensitivities.
- Before you submit your mass redactions request, you can also broaden your search set:
- +Similar documents will extend an operation to any documents determined to be similar, based on the similarity threshold you select.
- +Attachments will extend an operation to any attachments below a document in its family (e.g., children, grandchildren).
- +Family members will extend an operation to any documents in its family (e.g., parents, children, grandchildren).
- +Conversations will extend an operation to any documents in its conversation.
- Note: These options stack with each other; watch how the text next to the checkboxes updates as you select different combinations.
- After applying your redactions, go to the mass action log to confirm that you have created three separate and distinct mass redactions requests:
Mass Redactions FAQ
Q: Can mass redactions redact on the native spreadsheet view of Excel files?
A: Yes.
Q: If my matter is older and does not have mass redactions can I have it added?
A: We can add mass redactions to databases that first added documents on or after August 19, 2024.
Q: What if my review database is older than that?
A: Please contact discodesk@csdisco.com for options here.
Q: Do mass redactions involve extra costs?
A: DISCO Ediscovery customers receive mass redactions at no additional cost.
Q: What data spaces support mass redactions?
A: Mass redactions are available in active review.
Q: Can access to mass redactions be controlled with permissions?
A: Review managers can change permissions for custom roles.
Q: Can I initiate multiple mass redactions or must I wait for each to finish?
A: You can initiate multiple mass redactions.
Q: What documents support mass redactions?
A: All text-based documents support mass redactions.
Q: Can I mass redact non-English languages? Any restrictions?
A: Any character set is redactable, including multiple languages.
Q: What is the minimum size for a mass redaction?
A: Three characters, excluding spaces or whitespace.
Q: How long can redaction terms be?
A: Each redaction phrase can be up to 512 characters, about 85 words.
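The two length limits above (at least three non-whitespace characters, at most 512 characters per phrase) can be checked before you submit a request. The sketch below is illustrative only; the function name and messages are hypothetical, and it assumes the 512-character limit counts every character in the phrase, spaces included.

```python
MIN_NON_SPACE_CHARS = 3   # FAQ: three characters, excluding whitespace
MAX_PHRASE_LENGTH = 512   # FAQ: up to 512 characters per phrase

def validate_phrase(phrase: str) -> list[str]:
    """Return a list of problems with a redaction phrase (empty if it looks valid)."""
    problems = []
    non_space = sum(1 for ch in phrase if not ch.isspace())
    if non_space < MIN_NON_SPACE_CHARS:
        problems.append(f"too short: {non_space} non-whitespace characters")
    # Assumption: the upper limit counts all characters, including spaces.
    if len(phrase) > MAX_PHRASE_LENGTH:
        problems.append(f"too long: {len(phrase)} characters")
    return problems

for candidate in ["tax", "a b", "x" * 513]:
    print(repr(candidate[:20]), validate_phrase(candidate))
```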
Q: What if a spreadsheet has a redaction in spreadsheet view?
A: Mass redactions apply only to PDF views of spreadsheets. We prompt users to skip or overwrite spreadsheet views.
Q: What if a spreadsheet has an annotation on its spreadsheet view?
A: We prompt users to skip or overwrite these documents, as annotations will be removed.
Q: How do redaction overlaps work? For example, what if a user applies a pattern for all email addresses and also a custom term such as “freedman@csdisco.com”?
A: Words and phrases take precedence over general RegEx patterns. If the same box is drawn for different reasons, we prioritize words and phrases.
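The precedence rule in this answer can be sketched in Python. This is an assumption-laden illustration, not the product's logic: the email regex is a simplified stand-in for the built-in pattern, and the function name and reason labels are hypothetical.

```python
import re

# Simplified stand-in for the built-in email pattern (assumption).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def resolve_overlaps(text: str, term: str, term_reason: str, pattern_reason: str):
    """Map each redaction box (start, end) to a single reason.

    Pattern boxes are recorded first; a term that produces the same box
    then overwrites the reason, so words and phrases take precedence.
    """
    boxes = {}
    for m in EMAIL.finditer(text):
        boxes[m.span()] = pattern_reason
    for m in re.finditer(re.escape(term), text, re.IGNORECASE):
        boxes[m.span()] = term_reason
    return [(text[start:end], reason) for (start, end), reason in sorted(boxes.items())]

text = "Contact freedman@csdisco.com or help@example.com."
print(resolve_overlaps(text, "freedman@csdisco.com", "Privileged", "PII"))
```

In this sketch the address entered as a term keeps the term's reason, while the other address keeps the pattern's reason.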
Q: How long can redaction reasons be?
A: Redaction reasons have a 260-character limit, matching regular redactions.
Q: Can a user cancel a mass redactions job in progress?
A: No.
Q: What happens when we roll back a completed mass redactions job?
A: We target the original redacted documents and remove redactions applied during that job. We do not affect manual redactions. Deleted documents from the original set are skipped. New documents similar to the original set are also skipped, as they lack the job ID.
Q: Can I distinguish between hand-drawn and system redactions?
A: Yes. Search for hasRedaction(mass) for mass redactions and hasRedaction(manual) for hand-drawn redactions via the redactions tool.
Q: What happens if a user edits a system-drawn mass redaction?
A: Any edit converts it to a manual redaction. We never modify or delete manual redactions, as we consider them important.
Q: Will the system mass redact documents the user cannot view?
A: No. We skip documents without user edit access via doc set permissions, consistent with other mass actions.
Q: What review databases support mass redactions?
A: All review databases created on or after August 19, 2024.
Q: How many rows can a user add in a single redaction request?
A: Twenty (20).
Q: What is the maximum number of redactions that can be added per document?
A: 500.
Q: Does this limit include manual and mass redactions?
A: Yes, both count cumulatively towards the 500-redaction limit.
Q: Will I be warned if a mass redactions job exceeds this limit?
A: Yes, we will create a downloadable exceptions report that links you to every impacted document.
Mass redactions pattern regular expression FAQ
This section explains how our regular expression patterns work when they search documents during mass redactions. Our pattern-matching process scans text sequentially from a starting point, advancing to find matches. If a pattern fails to match, the engine backtracks to previous positions and tries different combinations. A match is valid only if the entire pattern exactly corresponds to a text segment.
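The scan-and-backtrack behavior described above can be observed with Python's built-in re module, which is also a backtracking engine. The pattern here is a hypothetical example, not one of the built-in patterns, and the product's engine may differ in detail.

```python
import re

# Hypothetical pattern: one or more digits, a dash, then exactly four digits.
pattern = re.compile(r"\d+-\d{4}")

text = "12345-678 99-1234"

# At "12345-678" the digits and the dash match, but only three digits follow.
# The engine backtracks, finds no alternative that fits, and advances through
# the text until the entire pattern corresponds to the segment "99-1234".
match = pattern.search(text)
first_match = match.group() if match else None
print(first_match)
```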
Credit card numbers:
Identifies credit card numbers, typically 16-digit sequences that may be separated by spaces, dashes, or dots (e.g., 1234 5678 9012 3456 or 1234-5678-9012-3456).
How the credit card pattern works:
Start of the Number: The pattern begins at a line start or after characters indicating a new segment, such as new lines, tabs, or punctuation.
Number Format: It checks for numbers arranged:
- In sets of four, possibly separated by space, dash, or dot, handling line breaks like new paragraphs.
- In other standard credit card configurations with more digits.
End of the Number: The pattern ensures the sequence ends before characters that typically mark a text segment's end or at line end.
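The three parts described above (start boundary, number format, end boundary) can be approximated with a short regular expression. This is a simplified sketch, not the built-in pattern: it covers only the 16-digit, four-by-four layout and does not handle line breaks or other card configurations.

```python
import re

CARD = re.compile(
    r"(?<![0-9A-Za-z])"          # start: not glued to a preceding letter or digit
    r"\d{4}(?:[ .-]?\d{4}){3}"   # four groups of four digits, optional space, dot, or dash
    r"(?![0-9A-Za-z])"           # end: not glued to a following letter or digit
)

sample = "Card 1234 5678 9012 3456, alt 1234-5678-9012-3456."
print(CARD.findall(sample))
```

The boundary checks keep the sketch from matching a 16-digit run buried inside a longer number.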
Social Security Numbers (SSN):
U.S. Social Security numbers usually follow this format: three digits, a separator (hyphen, space, or dot), two digits, another separator, and four digits (e.g., 123-45-6789).
How the SSN pattern works:
Scans documents for number sequences matching the standard SSN format.
Start of the SSN: Identifies numbers starting at a new line or after punctuation or spaces, signaling new information.
SSN Format: Checks for three digits, a separator, two digits, the same separator, and four digits.
End of the SSN: Ensures the sequence ends before punctuation or line end, marking the data's likely end.
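The "same separator" requirement above maps naturally onto a regex backreference. The following is a minimal sketch under that assumption; the actual built-in pattern is not published here and may differ.

```python
import re

SSN = re.compile(
    r"(?<![0-9A-Za-z])"           # start: not glued to a preceding letter or digit
    r"\d{3}([-. ])\d{2}\1\d{4}"   # 3 digits, separator, 2 digits, SAME separator, 4 digits
    r"(?![0-9A-Za-z])"            # end: not glued to a following letter or digit
)

sample = "SSN 123-45-6789 and 123.45.6789 but not 123-45.6789"
ssn_matches = [m.group() for m in SSN.finditer(sample)]
print(ssn_matches)
```

The last candidate in the sample is skipped because its two separators differ.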
IP (Internet Protocol) Addresses:
An IP address consists of four number groups, each from 0 to 255, separated by dots (e.g., 192.168.1.1). These identify computers on a network.
How the IP address pattern works:
This pattern scans documents for numerical patterns matching IP address structure:
Start of the IP Address: Identifies numbers starting at a new line or after spaces, tabs, or punctuation indicating a data break.
Number Groups: Verifies four groups:
- Each group ranges from 0 to 255, covering all valid IP numbers.
- Groups are separated by dots. The regex handles dots followed by line breaks, recognizing IPs split across lines.
End of the IP Address: The pattern ensures the sequence ends at a line end or before a space, tab, or other character that marks the end of the data.
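The 0 to 255 range check for each group is the subtle part of this pattern, so a sketch may help. It is an illustration only: it does not handle addresses split across lines, and the variable names are hypothetical.

```python
import re

# One group: 250-255, 200-249, 100-199, or 0-99.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

IPV4 = re.compile(
    r"(?<![0-9A-Za-z.])"          # start: not glued to a letter, digit, or dot
    rf"{OCTET}(?:\.{OCTET}){{3}}"  # four groups separated by dots
    r"(?![0-9A-Za-z]|\.\d)"       # end: no trailing letter, digit, or further ".digit"
)

sample = "Host 192.168.1.1 ok, 256.1.1.1 bad, 10.0.0.255."
print(IPV4.findall(sample))
```

The out-of-range address is skipped, and the sentence-ending period after the last address is left out of the match.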
Phone Numbers:
This pattern identifies typical phone number formats, supporting formats from various international regions.
How the phone number pattern works:
Starting Point: Identifies a clear start before a phone number, such as a line start or preceding punctuation or spaces, so that digits embedded in other text are not mistaken for phone numbers.
Country Code (Optional): Checks for an optional '+' followed by a country code of up to three digits, supporting international formats.
Area Code and Main Number: Searches for:
- Area code, possibly enclosed in parentheses (e.g., (123)) or without (e.g., 123).
- Main number body, possibly divided by spaces, dashes, or dots, typically with 2 to 8 digits per block.
Formatting Flexibility: Accommodates line breaks or spaces splitting phone numbers across lines.
Ending Point: Ensures the phone number ends before punctuation, line break, or space, distinguishing it clearly.
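A rough Python approximation of the parts above is shown below. The block counts and area-code length are assumptions chosen for illustration, it does not handle numbers split across lines, and like any broad phone pattern it can match other digit sequences, which is one reason to spot-check results.

```python
import re

PHONE = re.compile(
    r"(?<![0-9A-Za-z])"          # starting point: not glued to a letter or digit
    r"(?:\+\d{1,3}[ .-]?)?"      # optional '+' and country code of up to three digits
    r"(?:\(\d{2,4}\)|\d{2,4})"   # area code, with or without parentheses (length assumed)
    r"(?:[ .-]?\d{2,8}){2,3}"    # main number: 2-3 blocks of 2-8 digits (count assumed)
    r"(?![0-9A-Za-z])"           # ending point: not glued to a letter or digit
)

sample = "Call +1 (555) 123-4567 or 555.123.4567 today."
print(PHONE.findall(sample))
```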
Email Addresses:
This pattern detects email addresses by recognizing their structure.
How the email address pattern works:
The pattern scans documents for strings matching email format, identifying each part:
Start of Email: Ensures the email does not start unexpectedly within unrelated text by checking for spaces, punctuation, or new lines before it.
Username: Examines the section before "@" for letters, numbers, and optional symbols like dots (.), underscores (_), percent signs (%), plus signs (+), and hyphens (-).
Domain: After "@", searches for domain names with letters, numbers, dots (.), or hyphens (-), matching typical patterns like "example" in "example.com".
Top-Level Domain (TLD): Looks for a TLD (e.g., ".com", ".org", ".net") with at least two letters to validate common endings.
End of Email: Ensures the email ends properly, followed by spaces, punctuation, or line end to separate it from adjacent text.
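The parts listed above (start, username, domain, top-level domain, end) can be assembled into a regular expression roughly like the one below. It is a simplified sketch for illustration and should not be read as the exact built-in pattern.

```python
import re

EMAIL = re.compile(
    r"(?<![A-Za-z0-9._%+-])"                # start: not in the middle of other text
    r"[A-Za-z0-9._%+-]+"                    # username: letters, digits, . _ % + -
    r"@"
    r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*"    # domain labels separated by dots
    r"\.[A-Za-z]{2,}"                       # top-level domain: at least two letters
    r"(?![A-Za-z0-9-])"                     # end: followed by space, punctuation, or line end
)

sample = "Send to jane.doe+legal@example.co.uk, or bob@localhost."
print(EMAIL.findall(sample))
```

The second address in the sample is not matched because it has no top-level domain.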