Ingest - How to prepare your documents for ingest

You’ve been handed an external hard drive from a lead attorney and instructed to get the data loaded as soon as possible, preferably yesterday. Stressed out with a hundred to-do items on your plate, you’ve also been told the data is being loaded into some strange new system called DISCO. What?!

Breathe. This document will help you get the job done as efficiently as possible, hopefully with time to spare, less stress, and one less angry attorney.

Read on for instructions on how to effectively prepare your data for DISCO:

Prepare Your Documents For DISCO

You were given…

Native Files: We can handle most common file types; see the breakdown of DISCO supported file types for details. In general, email containers (PSTs) process the quickest, and compressed files (ZIP, RAR) tend to take the longest. For the fastest ingest, we strongly suggest you ask:

    1. IMPORTANT: Who are the custodians?
      Organizing data by one folder per custodian is vital for the correct loading of data in DISCO. Please specify the custodian information when you submit data to DISCO.
    2. What are the (majority) file types?
      Giving DISCO a heads up concerning the majority file type(s) within the data will help us gauge the ingest ETA with the best accuracy.
    3. What is the overall size?
      As with file types, giving DISCO an accurate read on the overall size of the data will help us prioritize and load the documents in the most effective manner.
    4. Any container files (ZIPS, RAR, ARCHIVES)?
      Compressed files take longer for us to process and ingest. That said, if we have an accurate count of container files, we can more accurately quote an ETA.
    5. Any encrypted files?
      If your data has any encrypted files, you will need to decrypt them before sending the data to us for the best review experience.
    6. File path maximum
      There is a 260-character limit for a document's file path. Characters include letters, spaces, slashes, etc. While this is rare, if we detect that a file path reaches the limit, we'll send the data back to you to shorten the file paths, as we are unable to access documents whose file path is 260 characters or more.
    7. What is the data review timeline?
      Let us know the general data review timeline so we can help you strategize concerning resource-heavy activities like mass coding/tagging, culling, or producing documents.
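
Several of the questions above (custodians, majority file types, overall size, container files, and file path length) can be answered with a quick inventory before you send anything. The following is a minimal Python sketch, not a DISCO tool. It assumes your data is laid out as one top-level folder per custodian, that the extensions in CONTAINER_EXTS are what you consider container files, and that the path length measured on your machine is representative of the path DISCO will see. The function name inventory and its parameters are hypothetical.

```python
import os
from collections import Counter

# Assumption (not from DISCO): which extensions count as container files.
CONTAINER_EXTS = {".zip", ".rar", ".7z", ".tar", ".gz"}
# Paths of 260 characters or more are flagged, per the limit described above.
MAX_PATH_CHARS = 260


def inventory(root, max_path_chars=MAX_PATH_CHARS):
    """Summarize a data set laid out as root/<custodian>/... ."""
    summary = {
        # Assumption: each top-level folder is named for one custodian.
        "custodians": sorted(
            name for name in os.listdir(root)
            if os.path.isdir(os.path.join(root, name))
        ),
        "extensions": Counter(),  # file count per lowercase extension
        "total_bytes": 0,         # overall size of all files
        "containers": [],         # paths of container files
        "long_paths": [],         # paths at or over the character limit
    }
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            ext = os.path.splitext(name)[1].lower()
            summary["extensions"][ext or "(none)"] += 1
            summary["total_bytes"] += os.path.getsize(path)
            if ext in CONTAINER_EXTS:
                summary["containers"].append(path)
            # Measured on the local absolute path; the path DISCO sees may differ.
            if len(os.path.abspath(path)) >= max_path_chars:
                summary["long_paths"].append(path)
    return summary
```

Calling inventory on the drive's root folder returns a dictionary you can paste into your submission notes: summary["extensions"].most_common(3) gives the majority file types, and a non-empty summary["long_paths"] tells you which paths to shorten before sending.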


Data + Load File: If given a load file from opposing counsel, DISCO prefers a Concordance DAT load file, an OPT or similar file referencing the PDF (or TIFF) images, and text files. These text files are either extracted text or OCR, depending on the document. It is also HIGHLY RECOMMENDED that the load file contain fields for the Bates number ranges so we can populate both the "BeginBates" and "EndBates" fields in DISCO. This allows our customers to search and sort by Bates number. To ensure the most efficient ingest, we strongly encourage you to follow these guidelines. If you are exporting data from another platform into DISCO, we can accept the following load file fields:

    • Doc ID
    • Group ID
    • ParentBates
    • Page Count
    • Attachment Count
    • Has Attachments
    • Custodian
    • Duplicate Custodians
    • Date Created
    • Date Modified
    • Tags
    • Author
    • File Name
    • Original Path
    • Duplicate Paths
    • File Extension
    • File Length
    • Email "From"
    • Email Received Date
    • Email Subject
    • Extracted OCR Text

For comprehensive load file column specifications, please read 'Incoming Production Standard Formats'. 
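
Before sending a load file, it can help to confirm that its header row actually carries the Bates range fields. The sketch below is a minimal Python example, assuming the DAT file uses the default Concordance delimiters (the control character 0x14 between fields and the thorn character þ around each value) and that you have already read the first line as text in the correct encoding. The function names are hypothetical, and the header names in your file may differ from "BeginBates" and "EndBates", so adjust the required tuple as needed.

```python
# Default Concordance delimiters (assumed; some exports use custom ones).
FIELD_SEP = "\x14"   # separates fields
QUOTE = "\u00fe"     # thorn character wrapped around each value
REQUIRED = ("BeginBates", "EndBates")


def dat_header_fields(header_line):
    """Split the first line of a DAT file into plain field names."""
    # Drop a leading byte-order mark and the trailing line ending, if any.
    line = header_line.lstrip("\ufeff").rstrip("\r\n")
    return [field.strip().strip(QUOTE) for field in line.split(FIELD_SEP)]


def missing_bates_fields(header_line, required=REQUIRED):
    """Return the required field names that are absent from the header.

    The comparison ignores case and spaces, so "Begin Bates" matches
    "BeginBates". That leniency is a choice made for this sketch.
    """
    present = {
        name.replace(" ", "").lower() for name in dat_header_fields(header_line)
    }
    return [name for name in required if name.lower() not in present]
```

An empty list from missing_bates_fields means both fields were found; anything else names what to ask the producing party to add.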

Hard Copies/Paper: Any paper documents must be scanned before loading into DISCO. Here are ways to make scanned documents ingest as efficiently as possible:

    1. Documents should be scanned one document per PDF
    2. Pagination is of utmost importance to preserve unitization and document breaks
    3. Pages should be straight and not skewed from mis-feeding into a scanner
    4. Preferred scan resolution is 300 dpi or higher; do not scan below 150 dpi. The higher the dpi (600+), the better the OCR fidelity
    5. (Optional) Include field “box number” or “folder” as metadata for reference in DISCO


Send Data to DISCO

Once you are ready to send your data to DISCO for processing and ingestion, you may send us your data in one of two ways:

    1. Less than 100 GB: Secure FTP (contact support@csdisco for credentials and instructions). Read instructions on how to use an FTP client here.
    2. Greater than 100 GB: Ship your drive(s) to:

      CS Disco, Inc.
      c/o New Data
      4400 Post Oak Parkway
      Ste. 2700
      Houston, TX 77027
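
If you are unsure which side of the 100 GB threshold your data falls on, a short script can measure the folder and suggest a delivery method. This is a minimal Python sketch under two assumptions that the article does not settle: a gigabyte is counted as 10^9 bytes, and a data set of exactly 100 GB is treated as a shipment. The function names are hypothetical.

```python
import os

GB = 10**9                   # assumption: decimal gigabytes
FTP_LIMIT_BYTES = 100 * GB   # threshold from the two options above


def folder_size_bytes(root):
    """Add up the size of every file under root."""
    total = 0
    for folder, _dirs, files in os.walk(root):
        for name in files:
            total += os.path.getsize(os.path.join(folder, name))
    return total


def delivery_method(total_bytes):
    """Suggest how to send the data based on its size."""
    # Exactly 100 GB is not covered by the article; this sketch ships it.
    return "secure FTP" if total_bytes < FTP_LIMIT_BYTES else "ship drive"
```

For example, delivery_method(folder_size_bytes(root)) returns "secure FTP" for a 40 GB collection and "ship drive" for a 250 GB one.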

In the meantime, your Project Manager will be working side by side with our Data Services team to create your database. Once we have the data in our possession, we’ll give you an ETA based on the queue, data integrity, and data size. When the ingest is complete, you’ll receive up to two emails:

  1. You will be sent a confirmation email that the data is ready. Any issues will be listed in an Exceptions Report attached to that email.
  2. For new databases, you will also be sent an email with login credentials and instructions on how to begin document review in DISCO.

That’s it! 
Always feel free to visit our Help Center at http://support, email us with questions, or contact your assigned Project Manager!


Keywords: ingest file format, collections, supported file types
