During the ingest process, DISCO deduplicates documents strategically based on the document’s family structure. Duplicates are handled as follows:
- Emails with an attachment: if identical child documents have two or more unique parents, DISCO ingests each attachment as a unique document. Previously, email attachments that had different parents were deduplicated; now the duplicate attachments appear in DISCO as separate documents. For example, suppose an email with an attached contract is sent to a lawyer, and the same contract is then sent to a vendor for signature. In DISCO, you will see contract A as sent to the lawyer and may tag it as “Attorney-client”. Separately, you will see contract B as sent to the vendor and may tag it as “responsive”.
Note: this feature is only available for databases created on or after August 8, 2015. In databases created before this date, email attachments follow the rule for all other documents described below, and all identical documents are deduplicated.
- Duplicate instances of all other documents: if identical documents in the database do not have unique parents, they are deduplicated, per historical DISCO processing procedures. For example, two custodians have the same Word file on their computers. During collection, both identical “.doc” files are ingested into DISCO and deduplicated.
- Limiting documents with 200+ instances: if there are 200 or more identical documents during ingest, DISCO cleans up the duplicates at the very end of ingest and leaves only one copy of the document in the database. The copy that remains is the first instance ingested. The ingest report includes a message when the Doc Count Limit script is used.
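The three rules above can be sketched in a few lines of Python. This is only an illustration of the logic as described here, not DISCO's actual implementation. The `Doc` record, the `content_hash` fingerprint, the `parent_id` field (standing in for a distinct parent email), and the function names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

FAMILY_AWARE_CUTOFF = date(2015, 8, 8)  # cutoff date from the note above
DOC_COUNT_LIMIT = 200                   # the "200+ instances" threshold


@dataclass(frozen=True)
class Doc:
    doc_id: str
    content_hash: str                 # hypothetical fingerprint of the content
    parent_id: Optional[str] = None   # set for email attachments, else None


def dedup_key(doc: Doc, db_created: date):
    """Attachments in newer databases are unique per parent email;
    everything else is identified by content alone."""
    if doc.parent_id is not None and db_created >= FAMILY_AWARE_CUTOFF:
        return (doc.content_hash, doc.parent_id)
    return (doc.content_hash, None)


def deduplicate(docs: List[Doc], db_created: date) -> List[Doc]:
    """Return the documents that remain after ingest, in ingest order."""
    # Rules 1 and 2: keep the first document seen for each key.
    kept = {}
    for doc in docs:
        kept.setdefault(dedup_key(doc, db_created), doc)

    # Rule 3 (assumed reading): if 200 or more identical documents were
    # ingested, keep only the first instance, even across different parents.
    counts = Counter(d.content_hash for d in docs)
    limited = set()
    result = []
    for doc in kept.values():
        if counts[doc.content_hash] >= DOC_COUNT_LIMIT:
            if doc.content_hash in limited:
                continue
            limited.add(doc.content_hash)
        result.append(doc)
    return result
```

For example, a contract attached to two different emails yields two documents in a database created after the cutoff and one document in an older database, while two loose copies of the same Word file always collapse to one.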
In DISCO, a deduplicated document exists once, and its metadata shows that it has two or more unique instances. For example, the document in Figure 1 has several custodians. The file paths in the metadata also show the deduplicated instances.
Figure 1: document metadata (2015)
Navigate to the metadata pane in the lower-left corner of the document viewer to view custodians (Figure 1; DISCO 2015).
At production, you can choose how duplicates are produced:
- Global deduplication by family (Default): Produces each duplicated family in the production one time.
- Custodian-level deduplication by family: Produces a separate copy for each custodian associated with a duplicate family.
- Full reduplication: Produces documents as they were ingested into DISCO, prior to deduplication.
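To make the difference between the three options concrete, here is a minimal sketch of how each one might select instances for a production. It assumes each ingested instance is recorded as a `(family_hash, custodian, path)` tuple. The tuple layout, the `DedupMode` enum, and `select_for_production` are hypothetical names used only for illustration, not part of DISCO.

```python
from enum import Enum
from typing import List, Tuple

Instance = Tuple[str, str, str]  # (family_hash, custodian, path), hypothetical


class DedupMode(Enum):
    GLOBAL_BY_FAMILY = "global"        # the default option
    CUSTODIAN_BY_FAMILY = "custodian"
    FULL_REDUPLICATION = "full"


def select_for_production(instances: List[Instance], mode: DedupMode) -> List[Instance]:
    """Pick which ingested instances are produced under the given option."""
    if mode is DedupMode.FULL_REDUPLICATION:
        # Produce everything as it was ingested, before deduplication.
        return list(instances)

    seen = set()
    produced = []
    for family_hash, custodian, path in instances:
        if mode is DedupMode.GLOBAL_BY_FAMILY:
            key = family_hash                  # one copy per duplicated family
        else:
            key = (family_hash, custodian)     # one copy per custodian
        if key not in seen:
            seen.add(key)
            produced.append((family_hash, custodian, path))
    return produced
```

With one family held twice by one custodian and once by another, the global option produces one copy, the custodian-level option produces two, and full reduplication produces all three.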
If you have questions, DISCO is here to support you. Email us at firstname.lastname@example.org or call 877-941-0583.