An ingest session card is automatically generated for every ingest or HSU collection session. To find a list of all ingests and their cards, click Ingest from the DISCO menu.
Ingest Card Information
Each ingest session card includes a snapshot of the following information about your session, taken at the time of ingest.
- Ingested by – The name of the person who performed the ingest
- Type – Whether the files ingested were native or processed (load file) data
- Billing size – The billing size of the data
- Data Space – The target data space for the ingest. Only shows if the review database is enabled for ECA.
- Documents processed – The number of documents added or updated with new instances in the review database as part of this ingest session.
  - Hovering over Documents processed shows a tooltip with how many documents are net new versus duplicates of documents already in the database.
  - Documents processed will always be the sum of new documents and duplicates. The duplicates number is the sum of the duplicates within the ingest session and the duplicates across the database at time of ingest.
    - For example, an ingest with 20 documents processed may have 15 new documents and 5 duplicates. Of those 5 duplicates, 3 were duplicates within the ingest session and 2 were duplicates of documents already in the database. In this scenario, clicking Show documents will show 17 documents even though 20 documents were processed, as the 3 duplicate instances within the ingest session are each included under a single document in the search results.
  - The ingest details page for any ingest session also includes these numbers and a link to identify the documents from that ingest session which have other duplicates in the database.
- Exceptions – The number of processing exceptions from this ingest session. The number itself will link to the search view and search results to identify the specific documents that were exceptions.
The Show documents button will take users to the Search view with a list of documents that are either new or had an additional deduplicated instance added. Because the system will deduplicate both within the ingest session and against anything already in the database, the number of documents processed may not be the same as the number of documents shown in the search results.
Note: the numbers shown on the card are not dynamic and will not update based on further ingests or culls.
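The relationship between the card numbers and the Show documents results can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not a DISCO API: the `IngestCounts` class and its field names are hypothetical, and the sketch assumes that each duplicate against the database matches a different existing document and that every in-session duplicate collapses under a new document from the same session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestCounts:
    """Hypothetical snapshot of the counts on an ingest card (illustration only)."""

    new_documents: int           # documents not previously in the review database
    duplicates_in_session: int   # duplicates of other files in the same ingest
    duplicates_in_database: int  # duplicates of documents already in the database

    @property
    def duplicates(self) -> int:
        # Duplicates = within the ingest session + across the database.
        return self.duplicates_in_session + self.duplicates_in_database

    @property
    def documents_processed(self) -> int:
        # Documents processed = new documents + duplicates.
        return self.new_documents + self.duplicates

    @property
    def documents_shown(self) -> int:
        # Assumption: in-session duplicates are listed under a single document,
        # so only new documents and existing documents that gained an instance
        # appear in the search results.
        return self.new_documents + self.duplicates_in_database


example = IngestCounts(new_documents=15, duplicates_in_session=3, duplicates_in_database=2)
print(example.documents_processed)  # 20
print(example.documents_shown)      # 17
```

With the figures from the example above, the sketch gives 20 documents processed and 17 documents in the search results.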
Ingest Card Statuses
Ingest cards also provide status and progress metrics on the tasks that DISCO performs when data is processed. These tasks are broken into 4 major stages:
Uploading → Processing → Enriching → Data Ops QC [Optional]
Select the dropdown caret on each ingest card to show all ingest stages.
Uploading/Uploaded
This stage handles the inventorying and transfer of data into the DISCO environment.
- Locating files
  - Scanning and collecting the files selected during the ingest wizard to prepare them for transfer to DISCO’s cloud platform.
- Transferring files
  - Uploading selected files to DISCO over your internet connection. The performance during transfer will depend on your available bandwidth, network configuration, and internet service provider. The system will attempt to warn you if it appears to be slowed or impacted by some external factor.
- Preparing files
  - Verifying that all files selected to ingest transferred correctly, then preparing them for upcoming processing steps.
Processing/Documents ingested
This stage handles the extraction and processing of files to generate documents for review. DISCO processing runs many of these steps in parallel, so their progress will be identical.
- Expanding containers
  - Expanding compressed content from zip files, pst files, and other containers, as well as extracting embedded files or attachments from emails or other files.
- Filtering files
  - Identifying and excluding NIST and system files from further processing, limiting the junk and noise in your database.
- Extracting metadata & text
  - Processing files to identify type, extract metadata and text, generate near natives, run OCR, and aggregate all information from a file.
- Deduplicating files
  - Calculating hash values and identifying duplicate documents, whether they duplicate documents within this ingest session or documents already in the review database.
- Adding new documents
  - Saving and indexing documents, text, and metadata for newly updated documents so they are available for search and review in the review database.
- Assigning folders & tags
  - Adding the tags and assigning the folders to new documents as configured in the ingest wizard. Note that this step only occurs if tags or folders are assigned as part of the ingest settings.
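As a rough illustration of the Deduplicating files step, the sketch below labels files by comparing content hashes. It is a simplified sketch under stated assumptions, not DISCO's implementation: this article does not say which hash algorithm is used or which parts of a file are hashed, so SHA-256 over the raw file bytes is an assumption, and the function and label names are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex digest of the file content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def classify(files: dict[str, bytes], existing_hashes: set[str]) -> dict[str, str]:
    """Label each file as new, a duplicate within the session, or a duplicate in the database."""
    seen_in_session: set[str] = set()
    labels: dict[str, str] = {}
    for name, data in files.items():
        digest = content_hash(data)
        if digest in existing_hashes:
            # Same content already exists in the (hypothetical) review database.
            labels[name] = "duplicate_in_database"
        elif digest in seen_in_session:
            # Same content was already seen earlier in this ingest session.
            labels[name] = "duplicate_in_session"
        else:
            labels[name] = "new"
            seen_in_session.add(digest)
    return labels


labels = classify(
    {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world", "d.txt": b"report"},
    existing_hashes={content_hash(b"world")},
)
print(labels)
```

Here `a.txt` and `d.txt` are labeled new, `b.txt` is a duplicate within the session, and `c.txt` is a duplicate of content already in the database.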
Enriching/Documents enriched
This stage consists of enhancing documents with additional information based on their content and the content already in the review database. These processes are not necessary to search and view documents, but they do enrich the new and existing documents in the review database with more information.
- Finding similar documents
  - Comparing new documents from this ingest session to existing documents in the review database and assigning a similarity score to any documents, both new and old, that are identified as over 80% similar in content.
- Building conversations
  - Analyzing new emails and building them into new or existing conversation threads in the review database.
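The 80% threshold in the Finding similar documents step can be pictured with a small sketch. This article does not describe how DISCO measures similarity, so the example swaps in Python's standard `difflib.SequenceMatcher` ratio purely as a stand-in. The function names and sample documents are hypothetical.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.80  # "over 80% similar", per the description above


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio (a stand-in metric, not DISCO's)."""
    return SequenceMatcher(None, a, b).ratio()


def similar_pairs(new_docs, existing_docs, threshold=SIMILARITY_THRESHOLD):
    """Compare each new document to each existing one and keep pairs above the threshold."""
    pairs = []
    for new_id, new_text in new_docs.items():
        for old_id, old_text in existing_docs.items():
            score = similarity(new_text, old_text)
            if score > threshold:
                pairs.append((new_id, old_id, score))
    return pairs


new_docs = {"n1": "The quarterly report is attached for your review."}
existing_docs = {
    "e1": "The quarterly report is attached for your review!",
    "e2": "Lunch on Friday?",
}
pairs = similar_pairs(new_docs, existing_docs)
for new_id, old_id, score in pairs:
    print(new_id, old_id, round(score, 2))
```

Only the pair `n1` and `e1` passes the threshold, since the two texts differ by a single character. A pairwise comparison like this is shown only for clarity and would not scale to a large review database.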
DISCO QC
This stage is created only for ingests that have exceptions and that also have the exception handling setting "Send exceptions to DISCO Professional Services for analysis and remediation" selected.
The stage will contain the DISCO ticket number for this task, as well as the contact information for support for your region.