Mass redact a wide range of documents
Mass redactions enable you to efficiently redact search sets inside Ediscovery. The feature supports redacting multiple terms and common RegEx patterns, with semicolon delimiters for batch processing. Redaction reasons and colors are customizable, and you can run multiple operations in a single request. The user experience is consistent with existing mass action workflows, complete with user-level permissions and real-time status updates. Notifications alert users to completion or issues, and redactions are reversible through the mass action log.
As with all technologies that automate Ediscovery, we recommend exercising due diligence with mass redactions. It is generally advisable to perform spot QC checks of your mass redaction results and to run test mass redactions on limited sets of documents to ensure they are being applied as expected. You can always undo your test redactions, as noted further down in this article.
This feature is in Walk. Please note: review databases that were created on or after Aug 19, 2024 will automatically have mass redactions enabled. For older databases, please contact DISCO Desk for options to enable mass redactions.
How to mass redact documents: terms and phrases
- In Search & Review perform a search to locate a set of documents that you want to be mass redacted:
- Above the document list, click the 'redact' button from the mass actions options menu:
- The mass redactions menu will slide out from right to left:
- Enter your redaction terms in the Terms/Patterns area below:
- You can configure your redaction with optional values such as case sensitivity, redaction reason and color:
- Click 'Apply redactions' to initiate the mass redactions request and the mass actions modal will close. You'll see in notifications that your mass redactions request has started:
- You can keep track of it in the mass action log:
- Once complete you'll get a notification again:
- Your job in the mass action log will reflect this completed state:
- You'll see the total number of redactions as well as the total number of documents that were redacted at least once in the mass actions log
- Click on the '8 documents' bolded in blue above to be taken to the set of documents redacted during your mass redactions job:
- Open up the first document to inspect the mass redactions for this document:
- You can see that 'bankruptcy' is redacted in white with the 'bankruptcy' reason in the top left hand corner.
- Go back to the mass actions log and click 'rollback' to undo the mass redactions:
- Confirm the rollback:
- The rollback operation will now be underway and its status will change from red to blue once completed:
- When you click the '8 documents' link above again you'll note that the redactions have been removed:
How to mass redact documents with multiple terms for one redaction reason
Now we'll get into more involved use cases for mass redactions.
- In Search & Review, select all documents in your review database and open up the mass redactions modal:
- Open up mass redactions:
- To redact multiple terms with the same reason, you can add semicolon delimiters between terms and phrases. A common use case would be to capture multiple names or variations.
- Click Apply redactions and observe that your job has kicked off:
- View progress of the mass redactions operation in the mass actions log:
- Once complete inspect your documents and verify that the documents were redacted as expected:
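As a rough illustration of how a semicolon-delimited entry breaks down into individual terms, here is a minimal Python sketch. The function name `split_terms` is hypothetical, and trimming surrounding whitespace and ignoring empty pieces are assumptions made for the example, not a description of how DISCO parses the field.

```python
from typing import List


def split_terms(raw: str) -> List[str]:
    """Split a semicolon-delimited entry into individual terms or phrases.

    Assumption: surrounding whitespace is trimmed and empty pieces
    (for example from a trailing semicolon) are ignored.
    """
    terms = []
    for piece in raw.split(";"):
        term = piece.strip()
        if term:
            terms.append(term)
    return terms


# Several variations of one name entered for a single redaction reason.
for term in split_terms("John Smith; J. Smith;Smith, John ;"):
    print(term)
```

Each resulting term or phrase would then be redacted with the same reason and color.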
How to mass redact documents with patterns instead of terms and phrases
- You can use DISCO's built-in Regular Expression pattern matches without having to write or specify the complex RegEx yourself. For example, you can match all email addresses without having to write a custom regular expression to do so:
- Open up mass redactions and click 'Insert patterns':
- The terms and phrases box switches to a patterns view:
- Select 'Email addresses' as your pattern and then apply:
- You can see your pattern mass redaction commence in the mass action log:
- Observe that all email addresses are redacted:
What if you have a custom regular expression not supported by our pre-built patterns?
- DISCO professional services can perform custom mass redactions with almost any regular expression on behalf of our customers.
- To get started with custom mass redactions, contact us at discodesk@csdisco.com
- From there a member of our data operations team will get in touch to assist you.
How to add multiple mass redactions in one mass redactions request:
- You can add multiple redactions (up to 20) in one mass redactions operation:
- Select '+ Add another set of terms or patterns to redact' to add another row item.
- You can add multiple mass redactions requests with different reasons, colors and case sensitivities.
- Before you submit your mass redactions request you can also broaden your search set:
- +Similar documents will extend an operation to any documents determined to be similar based off the similarity threshold you select.
- +Attachments will extend an operation to any attachments below a document in its family (e.g., children, grandchildren).
- +Family members will extend an operation to any documents in its family (e.g., parents, children, grandchildren).
- +Conversations will extend an operation to any documents in its conversation.
- Note: These operations stack with each other; take note of how the text next to checkboxes updates as you check certain combinations of checkboxes.
- After applying your redactions go to the mass action log area to observe you have created three separate and distinct mass redactions requests:
Mass Redactions FAQ
Q: Do mass redactions involve extra costs?
A: Mass redactions are offered to DISCO Ediscovery customers without additional cost.
Q: What data spaces are mass redactions available in?
A: Mass redactions are available in active review.
Q: Can access to mass redactions be controlled with permissions?
A: Yes, review managers can change permissions for custom roles.
Q: Can I initiate multiple mass redactions or do I need to wait for each one to finish?
A: Yes, you can initiate multiple mass redactions; you do not need to wait for each one to finish.
Q: What documents are mass redactable?
A: All text-based documents can have mass redactions applied to them.
Q: Can I mass redact non-English languages? Any restrictions to be aware of?
A: Any set of characters is redactable and you can redact multiple languages.
Q: What is the minimum size of a mass redaction?
A: A minimum of three (3) characters excluding spaces or whitespace.
Q: How long can redaction terms be?
A: There is a limit of 512 characters for each redaction phrase or roughly 85 words.
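If you prepare long term lists outside the application, the two length limits above can be checked before you paste the terms in. The following Python sketch is only an illustration: the function name `check_term` is hypothetical, and treating the 512-character limit as counting every character (including spaces) is an assumption.

```python
import re
from typing import Optional

MIN_CHARS = 3    # minimum length, not counting whitespace (from the FAQ)
MAX_CHARS = 512  # maximum length of one term or phrase (from the FAQ)


def check_term(term: str) -> Optional[str]:
    """Return a short problem description, or None if the term looks valid.

    Assumption: the 512-character limit counts every character,
    including spaces.
    """
    without_whitespace = re.sub(r"\s", "", term)
    if len(without_whitespace) < MIN_CHARS:
        return "too short"
    if len(term) > MAX_CHARS:
        return "too long"
    return None


for candidate in ["ab", "a b", "bankruptcy", "x" * 513]:
    print(repr(candidate[:12]), "->", check_term(candidate) or "ok")
```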
Q: What happens if a spreadsheet has a redaction on it in the spreadsheet view?
A: Mass redactions can be applied to the PDF view of spreadsheets only, and we prompt the
user to either skip or overwrite the spreadsheet views.
Q: What happens if a spreadsheet has an annotation on it on its spreadsheet view?
A: We prompt the user to either skip or overwrite these docs, as the annotations will
be removed.
Q: How do redaction overlaps work? If a user does a pattern for all email addresses but also does a custom term “freedman@csdisco.com”...what happens?
A: Words and phrases take primacy over general RegEx patterns. That is, if we would draw the
same box for different reasons, we use the word or phrase redaction rather than the pattern redaction.
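The precedence rule in the answer above can be pictured with a short Python sketch. This is not DISCO's implementation: the `Hit` class, the `resolve` function and the reason names are hypothetical, and the sketch only covers the case described here, where a term and a pattern produce exactly the same box.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hit:
    start: int   # offset of the first redacted character
    end: int     # offset just past the last redacted character
    reason: str  # redaction reason shown on the box
    source: str  # "term" or "pattern"


def resolve(hits: List[Hit]) -> List[Hit]:
    """Keep one hit per identical span, preferring terms over patterns."""
    chosen: Dict[Tuple[int, int], Hit] = {}
    for hit in hits:
        key = (hit.start, hit.end)
        current = chosen.get(key)
        if current is None or (current.source == "pattern" and hit.source == "term"):
            chosen[key] = hit
    return sorted(chosen.values(), key=lambda h: h.start)


text = "Contact freedman@csdisco.com for details"
address = "freedman@csdisco.com"
start = text.find(address)
end = start + len(address)
hits = [
    Hit(start, end, "Email address", "pattern"),  # hypothetical reason name
    Hit(start, end, "Named custodian", "term"),   # hypothetical reason name
]
for hit in resolve(hits):
    print(hit.reason, repr(text[hit.start:hit.end]))
```

In this example the address is redacted once, with the reason attached to the custom term.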
Q: How long can redaction reasons be?
A: They share the limit used for regular redactions: 260 characters.
Q: Can a user cancel a mass redactions job in flight?
A: No.
Q: What happens when we roll back a completed mass actions job?
A: We target the original document set that was redacted and remove the redactions
we applied to those documents during that job. We do not touch manual redactions. If
a subset of the original set of documents redacted has been deleted, those documents will be
skipped. If a new but similar subset of documents has been added since the original job
and would have been picked up by the original search scope, those documents will also be skipped, as
they were not included as part of the original mass redaction and will not have a job
ID associated with them.
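The rollback rules above can be summarized in a small Python sketch. It is a simplified model for illustration only: the `Redaction` class, the `rollback` function and the job ID strings are hypothetical, and a manual redaction is modeled as one with no job ID.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Redaction:
    text: str
    job_id: Optional[str]  # None models a manual (hand drawn or edited) redaction


def rollback(job_id: str, original_doc_ids: List[int],
             documents: Dict[int, List[Redaction]]) -> List[int]:
    """Remove the redactions one job applied; return the skipped document IDs.

    Only documents in the job's original set are visited, so documents
    added later are never touched. Deleted documents are skipped.
    """
    skipped = []
    for doc_id in original_doc_ids:
        if doc_id not in documents:  # deleted since the job ran
            skipped.append(doc_id)
            continue
        documents[doc_id] = [r for r in documents[doc_id] if r.job_id != job_id]
    return skipped


documents = {
    1: [Redaction("bankruptcy", "job-1"), Redaction("hand drawn box", None)],
    3: [Redaction("bankruptcy", "job-2")],  # added after job-1 ran
}
skipped = rollback("job-1", original_doc_ids=[1, 2], documents=documents)
print("skipped:", skipped)
print("doc 1 keeps:", [r.text for r in documents[1]])
print("doc 3 keeps:", [r.text for r in documents[3]])
```

Document 2 is skipped because it no longer exists, the manual redaction on document 1 is preserved, and document 3 is left alone because it was never part of the original job.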
Q: Can I distinguish between a hand drawn and system redaction?
A: Yes. You can search hasRedaction(mass) for mass redactions vs. hasRedaction(manual)
for hand drawn redactions via our redactions tool.
Q: What happens if a user edits a mass redaction drawn by our system?
A: It is considered to be a manual redaction. Even the smallest edit, once saved, will
do this. We never touch manual redactions, as we consider them to be important:
mass redactions will never edit, modify or delete manual redactions.
Q: Will the system mass redact docs that the user does not have view access to?
A: No. We will skip any docs that the user doesn’t have edit access to via doc set
permissions like we do with other mass actions.
Q: What review databases can have mass redactions added to them?
A: All review databases created on or after February 26, 2024 can have mass redactions.
Q: How many rows can a user add in a single redactions request?
A: Twenty (20).
Mass redactions pattern regular expression FAQ
This section explains how our regular expression 'patterns' work, so that you have a better understanding of how they behave when documents are searched during mass redactions. Pattern matching works by scanning the text sequentially: the engine begins at a starting point and repeatedly moves forward looking for a match. If the pattern fails to match at any stage, the engine backtracks to earlier positions and tries other combinations. A match is only considered valid if the entire pattern corresponds to a segment of the text.
Credit card numbers:
Identifies credit card numbers in documents. Typically, credit card numbers are sequences of digits, usually 16 digits long and can appear in various formats:
- Separated by spaces, dashes, or dots (e.g., 1234 5678 9012 3456 or 1234-5678-9012-3456).
How the credit card pattern works:
- Start of the Number: The pattern begins its search at the start of a line or follows characters that indicate a new text segment, such as new lines, tabs, or punctuation.
- Number Format: It checks for numbers arranged:
- In sets of four, possibly separated by a space, dash, or dot, and can handle line breaks like new paragraphs.
- In other standard credit card configurations with more digits.
- End of the Number: The pattern ensures the sequence concludes before characters that typically signify the end of a text segment or at the end of a line.
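To make the three stages above concrete, here is a simplified Python sketch of a credit card pattern. It is not DISCO's actual expression, which is not shown in this article. The name `CARD_SKETCH` is hypothetical, and the sketch only covers the 16-digit, groups-of-four layout; it leaves out line breaks and other card configurations.

```python
import re

CARD_SKETCH = re.compile(
    r"(?<![0-9A-Za-z])"          # start: not glued to a preceding letter or digit
    r"\d{4}(?:[ .\-]?\d{4}){3}"  # four groups of four digits, optional separator
    r"(?![0-9A-Za-z])"           # end: not followed by a letter or digit
)

text = "Card: 1234 5678 9012 3456, alt 1234-5678-9012-3456. ID 12345678901234567."
for match in CARD_SKETCH.finditer(text):
    print(match.group(0))
```

The 17-digit ID at the end is not matched, because the end check rejects a sequence that runs straight into another digit.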
Social Security Numbers (SSN):
Social Security numbers (SSNs) in the United States of America are usually formatted as three digits, followed by a separator (hyphen, space, or dot), then two digits, another separator, and four final digits (e.g., 123-45-6789).
How the SSN pattern works:
The pattern acts as a filter that scans documents for number sequences that match the standard SSN format.
- Start of the SSN: Identifies numbers that start at a new line or follow characters such as punctuation or spaces, signalling the beginning of new information.
- SSN Format: Checks the sequence to ensure it consists of three digits, a separator, two digits, the same separator and four final digits.
- End of the SSN: Ensures that the sequence terminates before certain punctuation marks or at the end of a line, indicating a likely end to that piece of data.
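The SSN checks above, including the requirement that both separators are the same character, can be sketched in Python as follows. This is a simplified illustration rather than DISCO's actual expression, and the name `SSN_SKETCH` is hypothetical.

```python
import re

SSN_SKETCH = re.compile(
    r"(?<![0-9A-Za-z])"         # start: not glued to a preceding letter or digit
    r"\d{3}([-. ])\d{2}\1\d{4}" # 3 digits, separator, 2 digits, same separator, 4 digits
    r"(?![0-9A-Za-z])"          # end: not followed by a letter or digit
)

text = "SSN 123-45-6789; mixed 123-45.6789; spaced 987 65 4321"
for match in SSN_SKETCH.finditer(text):
    print(match.group(0))
```

The backreference `\1` is what enforces "the same separator", so the mixed form `123-45.6789` is not matched.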
IP (Internet Protocol) Addresses:
An IP address is made up of four groups of numbers, each between 0 and 255, separated by dots (e.g., 192.168.1.1). These numbers identify computers on a network.
How the IP address pattern works:
This pattern serves as a scanner that searches documents for numerical patterns matching the structure of an IP address. Here’s how it operates:
- Start of the IP Address: It identifies numbers that start at a new line or follow characters indicating a break or the start of new data, such as spaces, tabs, or punctuation.
- Number Groups: It verifies four groups of numbers:
- Each group must range from 0 to 255, encompassing all possible valid IP address numbers.
- These groups are typically separated by dots. The regex can also manage instances where a dot is followed by a line break, helping recognize IP addresses that may be incorrectly split across multiple lines.
- End of the IP Address: The pattern ensures the sequence terminates before characters that likely indicate the end of the data, or it ends at the line's end or before a space or tab.
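The number-group check is the interesting part of this pattern, so here is a simplified Python sketch of it. The names `OCTET` and `IP_SKETCH` are hypothetical and the expression is an illustration, not DISCO's actual pattern. It allows a single line break after a dot to model addresses split across lines.

```python
import re

# One group from 0 to 255: 250-255, 200-249, 100-199, or 0-99.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

IP_SKETCH = re.compile(
    r"(?<![0-9A-Za-z])"                   # start: not glued to a letter or digit
    + OCTET + r"(?:\.\n?" + OCTET + r"){3}"  # four groups, dot may precede a line break
    + r"(?![0-9A-Za-z])"                  # end: not followed by a letter or digit
)

text = "Server 192.168.1.1 replied; 256.1.1.1 is invalid; split 10.0.\n0.5 ok"
for match in IP_SKETCH.finditer(text):
    print(repr(match.group(0)))
```

The sequence starting with 256 is rejected because its first group falls outside the 0 to 255 range.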
Phone Numbers:
This pattern identifies text that resembles typical phone number formats and we support phone number formats from various international regions.
How the phone number pattern works:
- Starting Point: It identifies a clear start before a phone number, such as the beginning of a line or following certain punctuation or spaces. This ensures the pattern detected is likely a phone number and not part of other text.
- Country Code (Optional): It checks for an optional country code at the beginning of the phone number, which may start with a '+' and can be up to three digits long. This supports variations in international phone number formats.
- Area Code and Main Number: The pattern searches for:
- Area code, which may be enclosed in parentheses (e.g., (123)) or presented without them (e.g., 123).
- The main body of the phone number, possibly divided into blocks by spaces, dashes, or dots. These blocks typically contain 2 to 8 digits.
- Flexibility with Formatting: The pattern accommodates formatting inconsistencies, such as interruptions by line breaks or spaces, which might occur if a phone number is split across lines in a document.
- Ending Point: It ensures the phone number sequence ends before a punctuation mark, line break, or space, clearly distinguishing it as a phone number.
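The stages above can be approximated with the following Python sketch. It is deliberately simplified and is not DISCO's actual expression: the name `PHONE_SKETCH` is hypothetical, the area code length of 2 to 4 digits is an assumption, and line-break handling is left out. Because phone formats vary so widely, a pattern this permissive can also match other digit sequences, which is one reason spot QC checks are worthwhile.

```python
import re

PHONE_SKETCH = re.compile(
    r"(?<![0-9A-Za-z+])"        # starting point: not glued to preceding text
    r"(?:\+\d{1,3}[ .\-]?)?"    # optional country code, '+' and up to three digits
    r"(?:\(\d{2,4}\)|\d{2,4})"  # area code, with or without parentheses (assumed 2-4 digits)
    r"(?:[ .\-]?\d{2,8}){1,3}"  # main number in blocks of 2 to 8 digits
    r"(?![0-9A-Za-z])"          # ending point: not followed by a letter or digit
)

text = "Call +1 (512) 555-0142 or 512.555.0199 today."
for match in PHONE_SKETCH.finditer(text):
    print(match.group(0))
```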
Email Addresses:
This pattern is designed to find email addresses. It knows exactly what an email address looks like and searches the text for any matches.
How the email address pattern works:
The pattern combs through a document to locate text strings that match the format of an email address. Here’s how it identifies each part:
- Start of Email: The pattern ensures that an email address does not begin unexpectedly within unrelated text. It searches for spaces, punctuation, or new lines right before the email starts.
- Username: This section, located before the "@" symbol, is examined for a combination of letters and numbers. It may also include symbols such as dots (.), underscores (_), percent signs (%), plus signs (+), and hyphens (-), although these are not mandatory.
- Domain: Following the "@", the regex searches for the domain name, which includes letters, numbers, and sometimes dots (.) or hyphens (-). This ensures the domain follows typical patterns, such as "example" in "example.com".
- Top-Level Domain (TLD): The pattern then looks for a top-level domain (e.g., ".com", ".org", ".net"), which must be at least two letters long to validate common domain endings.
- End of Email: The pattern concludes its checks by ensuring the email address properly terminates. It looks for spaces, punctuation, or the end of a line to distinctly separate the email from adjacent text.
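Putting the five parts together, a simplified Python sketch of an email pattern might look like the following. The name `EMAIL_SKETCH` is hypothetical and this is an illustration of the steps described above, not DISCO's actual expression.

```python
import re

EMAIL_SKETCH = re.compile(
    r"(?<![A-Za-z0-9._%+\-])"  # start of email: not in the middle of other text
    r"[A-Za-z0-9._%+\-]+"      # username: letters, digits and . _ % + -
    r"@"
    r"[A-Za-z0-9.\-]+"         # domain: letters, digits, dots, hyphens
    r"\.[A-Za-z]{2,}"          # top-level domain: at least two letters
    r"(?![A-Za-z0-9])"         # end of email: not followed by a letter or digit
)

text = ("Email freedman@csdisco.com, or jane.doe+legal@example.org. "
        "Not an address: user@localhost")
for match in EMAIL_SKETCH.finditer(text):
    print(match.group(0))
```

The trailing comma and period are left outside the matches, and `user@localhost` is not matched because it has no top-level domain.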