To understand DISCO’s production deduplication options, you must first understand how DISCO deduplicates documents upon ingest. As part of our standard processing, DISCO deduplicates data across all custodians and data sources while maintaining complete families. This process results in a single record for each member of a family or stand-alone document. All instances of duplicate data are maintained in DISCO’s data storage, but only one record of each document is displayed for review. Upon production, you can choose to produce one instance of each duplicate record, one instance of each duplicate per custodian, or all instances of each duplicate.
Once your documents have been ingested into DISCO, you will be able to review each document (or record) and categorize it by applying tags or placing it into folders. Once you have completed the review and categorization of your documents, you will be ready to create a production. DISCO’s Production feature can be accessed by navigating to MENU > PRODUCE > Productions.
When you open the Production feature, you will be presented with various options such as naming your production, selecting your criteria (such as all documents tagged responsive but not Attorney-Client or Work-Product), choosing your Bates prefix along with starting Bates number, and adding a Confidentiality stamp. Once these selections have been made, you can either choose to run the production using DISCO defaults or open the Advanced Options section to reveal more choices.
Click on the Show advanced options button to see your additional production options.
Under Advanced Options, you will be able to choose document and load file formats, volume labels and breaks, and how your documents will be sorted within your production. Furthermore, you can choose to produce as native by file type or tag, create custom slipsheets along with slipsheet rules, and include a native file for each document (unless redacted). While some options, such as producing natively with slipsheets, will impact the overall page count of your production, it is the production Deduplication Level that will impact the number of documents that are produced.
DISCO offers three (3) levels of deduplication, as follows:
- Global deduplication by family (Default): Produces each duplicated family in the production one time.
- Custodian-level deduplication by family: Produces a separate copy for each custodian associated with a duplicate family.
- Full reduplication: Produces documents as they were ingested into DISCO, prior to deduplication.
To further understand how Production Deduplication Levels work, we will walk through an example scenario. You are working on a case in which you collect the mailboxes of four (4) custodians (Ann, Bob, Charles, and Danielle) from your client’s email server. In addition to the emails found on the server, Charles has a local copy of his email, which is also collected. All five (5) mailboxes are ingested into DISCO and globally deduplicated so that your team reviews only one record of each document.
It is important to note that when families are processed by DISCO, each family member (or document) becomes its own individual record in DISCO. For example, an email with two (2) attachments results in a total of three (3) unique records: one for the email and one for each attachment. This gives you the flexibility to produce, slipsheet, or withhold any individual family member as needed.
Now, during your review you find that each of the four (4) custodians is a recipient of an email that has two (2) attachments. Again, while this email was contained in each of the mailboxes collected and therefore ingested five (5) times, your team will only review it once. Let’s say you have decided to produce the email and both attachments. Here are the production results based on Deduplication Level:
- Global deduplication by family: DISCO will produce the email and two (2) attachments, for a total of three (3) documents, exactly as the family is displayed for review.
- Custodian-level deduplication by family: DISCO will produce each family four (4) times, once for each custodian, for a total of twelve (12) documents.
- Full reduplication: DISCO will produce each family five (5) times, replicating what was ingested, for a total of fifteen (15) documents.
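The arithmetic behind these three results can be sketched in a few lines of Python. This is only an illustration of the counting in the scenario above, not DISCO's implementation: the `Instance` class, the `produced_documents` function, the level names, and the "server"/"local" labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One ingested copy of the family (hypothetical model, not a DISCO type)."""
    custodian: str
    source: str  # assumed label for where the copy was collected


FAMILY_SIZE = 3  # one email plus two attachments

# Five ingested copies: four server mailboxes plus Charles's local copy.
instances = [
    Instance("Ann", "server"),
    Instance("Bob", "server"),
    Instance("Charles", "server"),
    Instance("Charles", "local"),
    Instance("Danielle", "server"),
]


def produced_documents(level: str, instances: list, family_size: int) -> int:
    """Count the documents produced for one duplicated family."""
    if level == "global":
        copies = 1  # one copy of the family in total
    elif level == "custodian":
        copies = len({i.custodian for i in instances})  # one copy per custodian
    elif level == "full":
        copies = len(instances)  # one copy per ingested instance
    else:
        raise ValueError(f"unknown deduplication level: {level}")
    return copies * family_size


for level in ("global", "custodian", "full"):
    print(level, produced_documents(level, instances, FAMILY_SIZE))
# global 3
# custodian 12
# full 15
```

Charles's two copies count once at the custodian level and twice under full reduplication, which accounts for the difference between twelve (12) and fifteen (15) documents.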
Finally, it is important to note that regardless of how many instances of a document you decide to produce, you can choose to include all the metadata associated with the various instances of that document. When creating a custom load file, you can choose to include the following fields of instance metadata:
- Custodian - Name of the custodian of the first instance of a document or selected custodian
- DupCustodian - Name(s) of the custodians of all of the duplicate instances
- AllCustodians - Names of all the custodians for all the instances
- FileName - File name of the first instance of a document or selected custodian
- DupFileName - File name(s) for all of the duplicate instances
- AllFileNames - File names for all instances
- Path - File path for the first instance of a document or selected custodian
- DupPath - File path(s) for all of the duplicate instances
- AllFilePaths - File paths for all instances
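To show how the three variants of each field relate to one another, here is a minimal Python sketch that derives them from a list of instances. It rests on several assumptions that DISCO may not share: the first item in the list is treated as the "first instance", repeated values are listed once, and multiple values are joined with "; ". The `Instance` class, the `instance_fields` function, and the sample file names and paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One ingested copy of a document (hypothetical model, not a DISCO type)."""
    custodian: str
    file_name: str
    path: str


def unique_in_order(values):
    """Drop repeated values while keeping first-seen order."""
    return list(dict.fromkeys(values))


def instance_fields(instances, sep="; "):
    """Build the nine instance-level fields; the separator is an assumption."""
    first, duplicates = instances[0], instances[1:]

    def join(values):
        return sep.join(unique_in_order(values))

    return {
        "Custodian": first.custodian,
        "DupCustodian": join(i.custodian for i in duplicates),
        "AllCustodians": join(i.custodian for i in instances),
        "FileName": first.file_name,
        "DupFileName": join(i.file_name for i in duplicates),
        "AllFileNames": join(i.file_name for i in instances),
        "Path": first.path,
        "DupPath": join(i.path for i in duplicates),
        "AllFilePaths": join(i.path for i in instances),
    }


# Illustrative data only: file names and paths are made up.
instances = [
    Instance("Ann", "Update.msg", "ann.pst/Inbox"),
    Instance("Bob", "Update.msg", "bob.pst/Inbox"),
    Instance("Charles", "Update.msg", "charles.pst/Inbox"),
    Instance("Charles", "Update.msg", "charles_local.pst/Inbox"),
    Instance("Danielle", "Update.msg", "danielle.pst/Inbox"),
]

fields = instance_fields(instances)
print(fields["Custodian"])      # Ann
print(fields["DupCustodian"])   # Bob; Charles; Danielle
print(fields["AllCustodians"])  # Ann; Bob; Charles; Danielle
```

In this model the "All" fields are simply the first-instance value followed by the "Dup" values, which is why either the first-instance and Dup pair or the All field alone is enough to describe every instance.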
To create a custom load file, select Custom DAT… from the Load file format drop-down menu. Here you will be able to view all of the fields that are included in DISCO’s standard load file. By default, the following instance-level fields will be included: Custodian, DupCustodian, FileName, DupFileName, Path, and DupPath.
To modify the selections or to add AllCustodians, AllFileNames, and AllFilePaths, click on the Select metadata to include button. Use the checkboxes to include or exclude fields of information from your production load file.