How to ingest your data

Different kinds of files may require different methods of ingesting them into your review database. Choose the kind of file you need to ingest:

  • Native files: A native file is the original version of a document in the original format, usually collected directly from the custodian.
  • Load files: A load file is used to import data into a database. It can carry extracted, searchable text, metadata about the documents, and information about the relationships between documents.
  • Hard copies: A hard copy is a paper document that must be scanned before it can be ingested.

If you want to know what kinds of files DISCO supports, see our list here.

No matter what kinds of files you ingest, we will extract all compressed, container, and embedded files. There will be separate, related records for the original file and each embedded file. For example, if you have an Excel document embedded in a PowerPoint, DISCO will create one document/record for the PowerPoint and another for the Excel. In addition, the Excel file will be recorded as an attachment to the PowerPoint.

If you want to know what metadata is extracted by DISCO, see our list here.

How to ingest native files

First, prepare your files:

  1. Organize the files with one folder per custodian.
  2. Decrypt any encrypted files.
  3. If any documents' file paths are over 260 characters, shorten those file paths.
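To find the file paths that need shortening in step 3, you could run a small script over your custodian folders before sending or uploading them. The following is a minimal sketch, not a DISCO tool: the function name `find_long_paths` is hypothetical, and it assumes the 260-character limit applies to the full path exactly as you pass it in, so give it the same root path the files will have when they are ingested.

```python
import os

MAX_PATH_CHARS = 260  # limit taken from step 3 above


def find_long_paths(root, limit=MAX_PATH_CHARS):
    """Return every file path under `root` that is longer than `limit` characters.

    Assumption: the length is measured on the path as written here
    (root + subfolders + file name), not on any other representation.
    """
    long_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if len(full_path) > limit:
                long_paths.append(full_path)
    return long_paths
```

For example, `find_long_paths(r"D:\Custodians")` (a hypothetical folder) returns a list you can work through, shortening folder or file names until the list is empty.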

Then, determine if you want to ingest your files yourself OR if you want to send your data to DISCO so we can ingest them for you.

If your database has the high-speed uploader in crawl or walk, but you are unsure which method is best for your data, run this speed test and click "Show more info." Find your network’s upload speed. Then, use the table below to see how quickly you can ingest your own documents using our high-speed uploader.

Your network upload speed (Mbps) | Data you can upload in 24 hours (GB) | Data you can upload in 24 hours (TB)
25 | 251 | 0.25
50 | 503 | 0.49
100 | 1004 | 0.98
250 | 2519 | 2.46
500 | 5028 | 4.91
1000 | 10056 | 9.82

If you need your data transferred faster than your upload speed allows, we recommend overnighting your hard drive(s) to us.
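If your upload speed or data size falls between the rows of the table, you can estimate the figure yourself. The sketch below is an assumption about how such figures can be derived, not DISCO's published formula: it treats the speed-test result as megabits per second (1,000,000 bits), assumes the connection is fully used for 24 hours, and reports binary gigabytes (1024^3 bytes). Its results are close to the table but not identical, and real uploads will be slower than this theoretical ceiling. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_upload_gb_per_day(upload_mbps):
    """Theoretical ceiling on data uploaded in 24 hours, in binary GB.

    Assumes 1 Mbps = 1,000,000 bits per second and 1 GB = 1024**3 bytes.
    """
    bytes_per_second = upload_mbps * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1024**3


def days_to_upload(data_gb, upload_mbps):
    """Estimated number of days needed to upload `data_gb` at `upload_mbps`."""
    return data_gb / max_upload_gb_per_day(upload_mbps)
```

If `days_to_upload` for your data set comes out longer than your review timeline allows, that is a sign that shipping a hard drive may be the faster option.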

I want to ingest my files myself

You can do this if your database has the high-speed uploader in crawl or walk!

Read instructions on how to use the high-speed uploader here.

Unfortunately, if your database does not have this feature yet, you will need to send your files to DISCO for ingestion.

I want DISCO to ingest my files for me

If you choose this option, you can send your files to us one of two ways:

Attn: Media Management
CS DISCO
3700 N. Capital of Texas Highway
Suite 150
Austin, TX 78746

When you send us your data, also provide the following information:

  • What file type(s) the majority of the files are in. This will help us provide an accurate ETA.
  • What the overall size of your ingest is.
  • How many container files are included in your ingest session.
  • What your review timeline is, so we can help you strategize with document tagging, culling, and production.

After you have sent your data, complete a new data form. If you mailed your files, make sure to include a description of the hard drive(s) you sent, including the color and brand.

In the meantime, your project manager will be working side by side with our Data Services team to create your database, if it has not already been created. Once we have the data in our possession, we’ll give you an ETA based on the queue, data integrity, and data size. When the ingest is complete, you’ll receive up to two emails:

  1. A confirmation email that the data is ready. Any issues will be listed in an exceptions report attached to this email.
  2. For new databases, an email with login credentials and instructions on how to begin document review in DISCO.

Please note that when you send files to DISCO, we sometimes cannot ingest all of them. Click here to see more information.

How to ingest load files

First, prepare your files:

  1. See our guidelines for load files produced by other parties.
  2. We prefer a Concordance DAT load file or an OPT load file.
  3. Make sure the file contains fields for the Bates number range: BeginBates and EndBates. This way documents can be searched and sorted by Bates numbers.
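Before sending a DAT load file, you could check its header row for the two Bates fields named in step 3. The sketch below is illustrative only and rests on stated assumptions: it uses the delimiters commonly found in Concordance DAT files (ASCII 20 as the field separator and þ, ASCII 254, as the text qualifier), it matches field names without regard to case, and the function names are hypothetical. Your load file may use different delimiters or an encoding that you need to handle when reading the first line.

```python
FIELD_SEP = "\x14"  # ASCII 20, the usual Concordance field separator (assumption)
QUOTE = "\xfe"      # þ, ASCII 254, the usual Concordance text qualifier (assumption)
REQUIRED_FIELDS = ("BeginBates", "EndBates")  # field names from step 3 above


def parse_dat_header(header_line):
    """Split the first line of a DAT file into a list of field names."""
    # Drop a byte-order mark and the line ending, if present.
    header_line = header_line.lstrip("\ufeff").rstrip("\r\n")
    return [name.strip(QUOTE).strip() for name in header_line.split(FIELD_SEP)]


def missing_bates_fields(header_line, required=REQUIRED_FIELDS):
    """Return the required field names that the header does not contain.

    Matching ignores case; that is an assumption, not a documented DISCO rule.
    """
    present = {name.lower() for name in parse_dat_header(header_line)}
    return [name for name in required if name.lower() not in present]
```

An empty list from `missing_bates_fields` means both fields were found in the header; anything else names the field to add before you upload or mail the load file.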

Then, determine if you want to physically mail your data to us or upload it in-app. With both options, our Data Ops team will ingest the data for you.

I want to upload my load files in-app

  1. Navigate to "Ingest" from the main menu in DISCO Review.
  2. Click the blue "New ingest" button in the top right corner.
  3. Select "Load file" in the dropdown menu.
  4. Complete the wizard.

I want to mail DISCO my load files

Send your hard drive(s) and a return shipping label to:

Attn: Media Management
CS DISCO
3700 N. Capital of Texas Highway
Suite 150
Austin, TX 78746

When you send us your data, also provide the following information:

  • What file type(s) the majority of the files are in. This will help us provide an accurate ETA.
  • What the overall size of your ingest is.
  • How many container files are included in your ingest session.
  • What your review timeline is, so we can help you strategize with document tagging, culling, and production.

After you have sent your data, complete a new data form. If you mailed your files, make sure to include a description of the hard drive(s) you sent, including the color and brand.

In the meantime, your project manager will be working side by side with our Data Services team to create your database, if it has not already been created. Once we have the data in our possession, we’ll give you an ETA based on the queue, data integrity, and data size. When the ingest is complete, you’ll receive up to two emails:

  1. A confirmation email that the data is ready. Any issues will be listed in an exceptions report attached to this email.
  2. For new databases, an email with login credentials and instructions on how to begin document review in DISCO.

Please note that when you send files to DISCO, we sometimes cannot ingest all of them. Click here to see more information.

How to ingest hard copies

You must scan paper copies of your documents before ingesting them into DISCO. When scanning:

  1. Only scan one document per PDF file.
  2. Make sure pagination is included and clear.
  3. Make sure pages are not skewed, which can happen when a page is misfed into a scanner.
  4. Do not scan at a resolution lower than 150 dpi. We recommend 300 dpi or higher. The higher the dpi, the better the OCR (optical character recognition) fidelity.
  5. We recommend including a reference field titled "box number" or "folder" in the metadata to help you search for and sort documents in DISCO. 

After you have scanned all your documents, you will have a set of PDF files. Put those files onto a hard drive or series of hard drives. Then, follow the instructions for ingesting native files above.

When will your files be excluded from an ingest?

When you send your documents to DISCO to be ingested, there are rare instances when a document is not actually ingested into your database. These are the scenarios in which a document will be excluded from the ingest:

There is a high volume of duplicate content

We try to reduce the number of duplicate review documents that are ingested into your review database. Specifically, we try to exclude multiple copies of data that is most likely to be immaterial, such as attachments or embedded objects in emails that are likely to be company logos or signature blocks. If a company logo were ingested hundreds of times into your database as separate documents, your reviewers would have to review each of these documents separately. So, we identify the likely immaterial duplicate content using different hashes, which are numbers that uniquely identify pieces of data.

First, there are documents which are identical to each other, even across different ingests and even if they belong to different families. For example, we may find 250 versions of a blank page with only a company logo on it in your documents. This kind of duplicate is found using the object hash. Our limit for ingesting documents with the same object hash is 200. After 200 documents with the same object hash have been ingested, we will ignore any additional documents with the same hash. At the end of the ingest, we will eliminate all but one copy of the duplicate documents, so that you do not need to review more than one.

The second kind of duplicate is documents that are exactly the same and in the same ingest. In this case, exactly the same means that the documents belong to the same custodian and have the same file path. These documents are found using the instance hash. Once a document has been ingested, no additional documents with the same instance hash will be ingested.

Finally, there are documents that have the same content and family memberships, but have different custodians. We call these deduplicated documents and they are found using the dedupe hash. We will ingest and retain up to 200 instances of documents with the same dedupe hash, and any additional instances will not be ingested.
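To make the limit on identical documents concrete, here is a minimal sketch of how a cap of 200 per hash plays out. It is an illustration under stated assumptions, not DISCO's implementation: the article does not say which hash function DISCO uses, so SHA-256 over the file content stands in for the object hash, and the function names are hypothetical. The sketch covers only the step where copies beyond the limit are ignored, not the later step where all but one copy is eliminated.

```python
import hashlib
from collections import Counter

OBJECT_HASH_LIMIT = 200  # limit stated above


def content_hash(data: bytes) -> str:
    """Stand-in for the object hash; SHA-256 is an assumption."""
    return hashlib.sha256(data).hexdigest()


def apply_object_hash_limit(documents, limit=OBJECT_HASH_LIMIT):
    """Split (name, content) pairs into ingested and ignored names.

    The first `limit` documents with a given hash are ingested;
    any further documents with that hash are ignored.
    """
    seen = Counter()
    ingested, ignored = [], []
    for name, data in documents:
        key = content_hash(data)
        if seen[key] < limit:
            seen[key] += 1
            ingested.append(name)
        else:
            ignored.append(name)
    return ingested, ignored
```

With the example above of 250 copies of the same logo page plus one unrelated document, this sketch ingests 201 documents and ignores 50.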

There are too many child documents in a family

We can only ingest up to 2,000 documents in a single family. When a family exceeds this limit, the members (children) are not ingested and the parent is processed as a native, which means that limited functionality is available during review.
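If you already have a listing of your documents with a family identifier for each one, you could check for oversized families before ingesting. The sketch below is hypothetical: the function name and the idea of a per-document family ID are illustrative, and it assumes the 2,000 limit counts every document in the family, including the parent.

```python
from collections import Counter

FAMILY_LIMIT = 2000  # limit stated above


def oversized_families(family_ids, limit=FAMILY_LIMIT):
    """Return {family_id: document_count} for families larger than `limit`.

    `family_ids` holds one family identifier per document.
    Assumption: the parent counts toward the limit.
    """
    counts = Counter(family_ids)
    return {fid: n for fid, n in counts.items() if n > limit}
```

Any family this returns is a candidate for the behavior described above, where the children are not ingested and the parent is processed as a native.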

Exclusions are logged in the ingest report

If any of these kinds of documents are excluded from your ingest session, they will appear in your ingest report. You can download the ingest report for each ingest session from your database’s Ingest page, under Reports. See how to download and understand your ingest report here.
