Preferred composition of digital data for DISCO
This document provides guidelines and best practices for sending digital data produced by other parties for ingestion into DISCO.
Documents produced as images require a separate, image-only load file, in addition to the comprehensive load file described below. DISCO prefers an Opticon (.opt or .log) or other similar file format that shows the document boundaries and the file path to the location of the image on the delivery media.¹
Bates ranges, metadata, and file paths to native files and extracted text should be included in a comprehensive load file. For this, DISCO prefers a DAT file with standard Concordance delimiters. An Opticon or similar cross-reference file is still required for images. It is highly recommended that the load file contain fields for the Bates number range of each document so that both the BeginBates and EndBates fields in DISCO can be populated. To ensure the most efficient ingest, we strongly encourage adherence to these guidelines.
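As an illustration of what a DAT file with Concordance delimiters looks like to a parser, here is a minimal Python sketch. This document does not spell out the delimiter values; the three characters below (ASCII 20 as the column separator, ASCII 254 "þ" as the text qualifier, ASCII 174 "®" as the in-field newline substitute) are the commonly cited Concordance defaults and are an assumption here. The function names `parse_dat_line` and `parse_dat` and the sample Bates numbers are hypothetical.

```python
# Assumed Concordance default delimiters (not stated in this document).
FIELD_SEP = "\x14"    # ASCII 20: separates columns
QUOTE = "\xfe"        # ASCII 254 (þ): wraps each value
NEWLINE_SUB = "\xae"  # ASCII 174 (®): stands in for line breaks inside a value


def parse_dat_line(line):
    """Split one already-decoded DAT record into a list of field values."""
    values = []
    for field in line.rstrip("\r\n").split(FIELD_SEP):
        # Remove one text qualifier from each end, if both are present.
        if len(field) >= 2 and field.startswith(QUOTE) and field.endswith(QUOTE):
            field = field[1:-1]
        values.append(field.replace(NEWLINE_SUB, "\n"))
    return values


def parse_dat(lines):
    """Treat the first non-empty line as the header; return one dict per record."""
    rows = [parse_dat_line(l) for l in lines if l.strip("\r\n")]
    header, records = rows[0], rows[1:]
    return [dict(zip(header, rec)) for rec in records]


# Hypothetical two-line load file: a header row and one record.
sample = [
    "þBeginBatesþ\x14þEndBatesþ\x14þCustodianþ",
    "þABC0000001þ\x14þABC0000003þ\x14þDoe, Janeþ",
]
records = parse_dat(sample)
print(records[0]["BeginBates"], records[0]["EndBates"])
```

The sketch works on text that has already been decoded; when reading a real file, the encoding (for example Latin-1 versus UTF-8) determines how the delimiter characters appear as bytes and should be confirmed with the producing party.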
Image file production specifications²
Requirement | Description |
---|---|
File Image Format / TIFF Production | Document images will be provided as whole-document PDF or single-page TIFF format, using Group 4 compression with at least 300 dots per inch (dpi) resolution. Images may be reduced by up to 10% to allow for a dedicated space for page numbering and other endorsements of documents. Images will be in black and white, unless color is necessary to understand the meaning of the document. |
Load File | A cross-reference load file in an Opticon (.opt or .log) or other similar file format will accompany the images, showing the document boundaries and the correlation between the unique page identifier of the document (such as Bates number) and the location of the file on the delivery media. |
Unitization | Each page of a document will be electronically saved into an image file. If a document is more than one page, the unitization of the document and any attachments will be maintained as it existed in the original form and reflected in the load file. The parties will make their best efforts to unitize documents correctly. |
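
To make the load file and unitization requirements concrete, the following Python sketch groups the pages listed in a cross-reference file into documents. It assumes the seven-column, comma-separated layout commonly used by Opticon files (page identifier, volume label, image path, document-break flag "Y", folder break, box break, page count); this document does not define that layout, so treat it as an assumption. The names `Document` and `group_opticon` and the sample paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    begin_bates: str
    pages: list = field(default_factory=list)  # (page_id, image_path) tuples

    @property
    def end_bates(self):
        # The last page listed for the document carries its ending Bates number.
        return self.pages[-1][0]


def group_opticon(lines):
    """Group single-page image entries into documents using the break flag."""
    docs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Pad so that short lines still unpack; only the first four columns are used.
        page_id, _volume, image_path, doc_break = (line.split(",") + [""] * 4)[:4]
        if doc_break.upper() == "Y" or not docs:
            docs.append(Document(begin_bates=page_id))
        docs[-1].pages.append((page_id, image_path))
    return docs


# Hypothetical cross-reference file: a two-page document followed by a one-page document.
sample = [
    "ABC0000001,VOL001,IMAGES\\001\\ABC0000001.tif,Y,,,2",
    "ABC0000002,VOL001,IMAGES\\001\\ABC0000002.tif,,,,",
    "ABC0000003,VOL001,IMAGES\\001\\ABC0000003.tif,Y,,,1",
]
docs = group_opticon(sample)
for doc in docs:
    print(doc.begin_bates, doc.end_bates, len(doc.pages))
```

A plain comma split is enough for this sketch but would mis-handle image paths that themselves contain commas.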
¹,² Legal Technology Professionals Institute (2016). "Model Stipulated Production Specifications." https://themastersconference.com/sponsors/legal-technology-professionals-institute
Comprehensive load file column specifications
The metadata of electronic document collections should be extracted and provided in a DAT file using the field names and formatting described below. Fields not listed here may be mapped as custom fields in the database, in consultation with DISCO technical services.
Field name | Content specifications |
---|---|
Author | Author field extracted from the metadata of a non-email document. |
BCC | BCC or blind carbon copy field extracted from an email message. |
BeginAttachmentBates | Unique number identifying the first page or first document of a document attachment. |
BeginBates | Beginning Bates number of the document. |
CC | CC or carbon copy field extracted from an email message. |
CreateDate | Date the file was created (mm/dd/yyyy format). |
CreateTime | Time the file was created. |
Custodian | Name of the custodian of the files produced (last name, first name). |
DuplicateCustodians | Identifies duplicate custodian sources for files excluded from production based on MD5 or SHA-1 hash deduplication. |
DuplicateFilenames | If collected from multiple sources, the name of each additional file. |
DuplicateOriginalFilepath | If collected from multiple sources, the filepath of each additional file. |
EndAttachmentBates | Unique number identifying the last page or last document of a document attachment. |
EndBates | Ending Bates number of the document. |
Filename | Name of the original digital file. |
From | From field extracted from an email message. |
Hash | MD5 or SHA-1 hash value (32 or 40 hexadecimal characters, respectively). |
ImageFilename | Filename of the produced PDF image. |
ImagePath | Path to the produced PDF image. |
LastModifiedDate | Modification date of a non-email document. |
LastModifiedTime | Modification time of a non-email document. |
NativeFilename | Filename of the produced native file. |
NativePath | Path to the produced native file. |
OCRPath | Path to the OCR text file. |
OCRTextFilename | Filename of the OCR text file. |
OriginalFilepath | Original filepath of the document. |
PageCount | Number of pages in the document. |
ParentID | ID of the parent of the document. |
ReceivedDate | Received date of an email message (mm/dd/yyyy format). |
ReceivedTime | Received time of an email message. |
ReferenceID | Cross-reference identifier (if needed). |
ReviewID | Additional identifier (if needed). |
SendDate | Sent date of an email message (mm/dd/yyyy format). |
SendTime | Sent time of an email message. |
Subject | Subject (or "re" line) of an email. |
Tags | Tags or codes added by users. |
To | To or Recipient field extracted from an email message. |
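
Before delivery, a producing party may want to check that a record matches the formats listed above. The Python sketch below checks only the rules this document states explicitly: the mm/dd/yyyy format for CreateDate, ReceivedDate, and SendDate, and the 32- or 40-character hexadecimal Hash value. The function name `validate_record` and the sample records are hypothetical, and the sketch is not a description of how DISCO itself validates data.

```python
import re
from datetime import datetime

# Only the date fields whose format this document specifies.
DATE_FIELDS = ("CreateDate", "ReceivedDate", "SendDate")
DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")
# MD5 is 32 hexadecimal characters; SHA-1 is 40.
HASH_RE = re.compile(r"(?:[0-9a-fA-F]{32}|[0-9a-fA-F]{40})")


def validate_record(record):
    """Return a list of human-readable problems found in one load-file record."""
    problems = []
    for name in DATE_FIELDS:
        value = record.get(name, "")
        if not value:
            continue  # empty fields are not treated as errors in this sketch
        if not DATE_RE.fullmatch(value):
            problems.append(f"{name}: expected mm/dd/yyyy, got {value!r}")
            continue
        try:
            # Rejects impossible calendar dates such as 02/30/2020.
            datetime.strptime(value, "%m/%d/%Y")
        except ValueError:
            problems.append(f"{name}: not a valid calendar date: {value!r}")
    hash_value = record.get("Hash", "")
    if hash_value and not HASH_RE.fullmatch(hash_value):
        problems.append("Hash: expected 32 (MD5) or 40 (SHA-1) hexadecimal characters")
    return problems


# Hypothetical records for illustration.
good = {"CreateDate": "01/05/2020", "Hash": "d41d8cd98f00b204e9800998ecf8427e"}
bad = {"CreateDate": "2020-01-05", "SendDate": "02/30/2020", "Hash": "xyz"}
print(validate_record(good))
print(validate_record(bad))
```

Checking the pattern before calling `strptime` is deliberate: `strptime` on its own accepts unpadded values such as 1/5/2020, which would not match the two-digit month and day implied by mm/dd/yyyy.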