Preferred composition of digital data for DISCO
This document provides guidelines and best practices for sending digital data produced by other parties for ingestion into DISCO.
Documents produced as images require a separate, image-only load file, in addition to the comprehensive load file described below. DISCO prefers an Opticon (.opt or .log) or other similar file format that shows the document boundaries and the file path to the location of the image on the delivery media.¹
Bates ranges, metadata, and file paths to natives and text should be included in a comprehensive load file. For this, DISCO prefers a DAT file with standard Concordance delimiters. Opticon or similar cross-reference files are still required for images. We highly recommend that the load file contain fields for the Bates number range so that both the BeginBates and EndBates fields in DISCO can be populated. To ensure the most efficient ingest, we strongly encourage adherence to these guidelines.
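As a rough illustration of what "standard Concordance delimiters" means in practice, the sketch below parses DAT text in Python. It assumes the commonly used defaults, which this document does not spell out: ASCII 20 as the field separator, þ (ASCII 254) as the text qualifier, and ® (ASCII 174) as the in-field newline. The function names and sample values are hypothetical, and the text is assumed to be already decoded to a string.

```python
# Commonly used Concordance defaults (an assumption; confirm with the producing party).
FIELD_SEP = "\x14"  # ASCII 20: separates fields
QUOTE = "\xfe"      # ASCII 254 (þ): wraps every field value
NEWLINE = "\xae"    # ASCII 174 (®): stands in for a line break inside a value


def parse_dat_line(line: str) -> list[str]:
    """Split one DAT record into its unquoted field values."""
    values = []
    for raw in line.rstrip("\r\n").split(FIELD_SEP):
        # Strip the surrounding qualifiers when both are present.
        if len(raw) >= 2 and raw.startswith(QUOTE) and raw.endswith(QUOTE):
            raw = raw[1:-1]
        values.append(raw.replace(NEWLINE, "\n"))
    return values


def parse_dat(text: str) -> list[dict[str, str]]:
    """Parse decoded DAT text whose first non-empty line is the header row."""
    lines = [ln for ln in text.split("\n") if ln.strip("\r")]
    if not lines:
        return []
    header = parse_dat_line(lines[0])
    return [dict(zip(header, parse_dat_line(ln))) for ln in lines[1:]]


def make_line(values: list[str]) -> str:
    """Build one DAT record (used here only to create sample data)."""
    return FIELD_SEP.join(QUOTE + v + QUOTE for v in values) + "\n"


sample = make_line(["BeginBates", "EndBates", "Custodian"]) + make_line(
    ["ABC0000001", "ABC0000003", "Doe, Jane"]
)
records = parse_dat(sample)
print(records[0]["BeginBates"], records[0]["EndBates"])
```

Keying each record by the header row means the column order in the DAT file does not matter, as long as the field names match the ones listed in this document.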
Image file production specifications²
Requirement | Description |
---|---|
File Image Format / TIFF Production | Document images will be provided as whole-document PDF, single-page TIFF, or JPG format, using Group 4 compression with at least 300 dots per inch (dpi) resolution. Images may be reduced by up to 10% to allow for a dedicated space for page numbering and other endorsements of documents. Images will be in black and white, unless color is necessary to understand the meaning of the document. |
Load File | A cross-reference load file in an Opticon (.opt or .log) or other similar file format will accompany the images, showing the document boundaries and the correlation between the unique page identifier of the document (such as Bates number) and the location of the file on the delivery media. |
Unitization | Each page of a document will be electronically saved into an image file. If a document is more than one page, the unitization of the document and any attachments will be maintained as it existed in the original form and reflected in the load file. The parties will make their best efforts to unitize documents correctly. |
¹ ² “Model Stipulated Production Specifications” (2016). Legal Technology Professionals Institute. https://themastersconference.com/sponsors/legal-technology-professionals-institute.
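To make the idea of document boundaries in a cross-reference file concrete, here is a minimal Python sketch that groups Opticon page lines into documents. It assumes the widely used seven-field Opticon layout (page identifier, volume, image path, document break, folder break, box break, page count), which this document does not define. The `ImageDocument` class, the `parse_opticon` function, and the sample Bates numbers and paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ImageDocument:
    begin_bates: str
    end_bates: str
    image_paths: list[str] = field(default_factory=list)


def parse_opticon(text: str) -> list[ImageDocument]:
    """Group Opticon page lines into documents using the document-break flag."""
    docs: list[ImageDocument] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Pad so short lines still unpack; paths containing commas are not handled.
        page_id, _volume, path, doc_break = (line.split(",") + [""] * 4)[:4]
        # A "Y" in the fourth field marks the first page of a new document.
        if doc_break.strip().upper() == "Y" or not docs:
            docs.append(ImageDocument(begin_bates=page_id, end_bates=page_id))
        docs[-1].end_bates = page_id
        docs[-1].image_paths.append(path)
    return docs


sample = "\n".join([
    r"ABC0000001,VOL001,IMAGES\001\ABC0000001.tif,Y,,,3",
    r"ABC0000002,VOL001,IMAGES\001\ABC0000002.tif,,,,",
    r"ABC0000003,VOL001,IMAGES\001\ABC0000003.tif,,,,",
    r"ABC0000004,VOL001,IMAGES\001\ABC0000004.tif,Y,,,1",
])
for doc in parse_opticon(sample):
    print(doc.begin_bates, doc.end_bates, len(doc.image_paths))
```

With the sample above, the first three single-page TIFFs are grouped into one document spanning ABC0000001 to ABC0000003, and the fourth page starts a new one-page document. The first and last page identifiers of each group correspond to the Bates range of that document.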
Comprehensive load file column specifications
The metadata of electronic document collections should be extracted and provided in a DAT file using the field names and formatting described below. Other fields not listed here may be mapped as custom fields into the database, per consultation with DISCO technical services.
Field name | Content specifications |
---|---|
BeginBates | Beginning Bates number of the document. |
EndBates | Ending Bates number of the document. |
BeginAttachmentBates | Unique number identifying the first page or first document of a document attachment. |
EndAttachmentBates | Unique number identifying the last page or last document of a document attachment. |
ReviewID | Unique Document ID (if needed). |
ReferenceID | Additional Document Identifier (if needed). |
ParentID | Document ID of the parent of the document (used to create family relationships). |
Author | Author field extracted from the metadata of a non-email document. Note: This does not include the sender of an email. See From field. |
Custodian | Name of the custodian of the files produced (last name, first name). |
DuplicateCustodians | Identifies duplicate custodian sources for files excluded from production based on MD5 or SHA-1 hash deduplication. |
Filename | Name of the original digital file. |
DuplicateFilenames | If collected from multiple sources, the name of each additional file. |
DuplicateOriginalFilepath | If collected from multiple sources, the filepath of each additional file. |
SendDate | Sent date of an email message (mm/dd/yyyy format). |
SendTime | Sent time of an email message. |
LastModifiedDate | Modification date of a non-email document. |
LastModifiedTime | Modification time of a non-email document. |
CreateDate | Date the file was created (mm/dd/yyyy format). |
CreateTime | Time the file was created. |
ReceivedDate | Received date of an email message (mm/dd/yyyy format). |
ReceivedTime | Received time of an email message. |
Subject | Subject (or "re" line) of an email. |
To | To or Recipient field extracted from an email message. |
From | From field extracted from an email message. |
CC | CC or carbon copy field extracted from an email message. |
BCC | BCC or blind carbon copy field extracted from an email message. |
Hash | MD5 or SHA-1 hash of the file: a 32- or 40-character hexadecimal value, respectively, that serves as a digital fingerprint of the file. |
PageCount | Number of pages in the document. |
ImageFilename | Filename of the produced PDF image. (Optional if Opticon file is not provided). |
ImagePath | Path to the produced PDF image. (Optional if Opticon file is not provided). |
NativeFilename | Filename of the produced native file. |
NativePath | Path to the produced native file. |
OCRPath | Path to the OCR text file. |
OCRTextFilename | Filename of the OCR text file. |
OriginalFilepath | Original filepath of the document. |
Tags | Work product fields (optional). |
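Before delivery, a producing party might spot-check a few of the formats in the table above. The Python sketch below is one possible check, not a DISCO tool: it assumes each record has already been read into a dictionary keyed by field name, verifies the mm/dd/yyyy format only for the three date fields where this document states it, and checks that Hash is a 32- or 40-character hexadecimal value. The function names and sample records are hypothetical.

```python
import re
from datetime import datetime

DATE_FIELDS = ("SendDate", "CreateDate", "ReceivedDate")  # fields documented as mm/dd/yyyy
DATE_SHAPE = re.compile(r"\d{2}/\d{2}/\d{4}")
HEX_DIGITS = re.compile(r"[0-9a-fA-F]+")


def is_mmddyyyy(value: str) -> bool:
    """True when value is a real calendar date written as mm/dd/yyyy."""
    if not DATE_SHAPE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return False
    return True


def check_record(record: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []
    if not record.get("BeginBates"):
        problems.append("BeginBates is missing")
    for name in DATE_FIELDS:
        value = record.get(name, "")
        if value and not is_mmddyyyy(value):
            problems.append(f"{name} is not mm/dd/yyyy: {value!r}")
    hash_value = record.get("Hash", "")
    # MD5 is 32 hex characters, SHA-1 is 40.
    if hash_value and not (
        len(hash_value) in (32, 40) and HEX_DIGITS.fullmatch(hash_value)
    ):
        problems.append(f"Hash is not a 32- or 40-character hex value: {hash_value!r}")
    return problems


good = {
    "BeginBates": "ABC0000001",
    "SendDate": "01/05/2020",
    "Hash": "d41d8cd98f00b204e9800998ecf8427e",
}
bad = {"BeginBates": "ABC0000002", "SendDate": "2020-01-05", "Hash": "xyz"}
print(check_record(good))
print(check_record(bad))
```

Empty values are skipped rather than flagged, since many of the fields apply only to emails or only to non-email documents.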