The hash value of a document is a fixed-length numeric value that, for practical purposes, uniquely identifies its data. During ingestion, DISCO uses the SHA-1 hash to identify documents that are identical at the byte level. For a document to be deduplicated, all metadata has to be the same. If two or more instances have an identical hash value, they are deduplicated.
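To illustrate the general idea of byte-level deduplication by hash, here is a minimal Python sketch. It is not DISCO's implementation: the function names `sha1_hex` and `deduplicate` and the sample file names are hypothetical, and the sketch compares file contents only, without modelling any metadata comparison.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of the raw bytes as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


def deduplicate(
    docs: Iterable[Tuple[str, bytes]]
) -> Tuple[Dict[str, str], List[str]]:
    """Keep the first document seen for each hash value (hypothetical helper).

    `docs` yields (name, content) pairs. Returns (kept, duplicates), where
    `kept` maps each hash value to the name of the first document with that
    hash, and `duplicates` lists the names of later documents whose hash
    matched one already seen.
    """
    kept: Dict[str, str] = {}
    duplicates: List[str] = []
    for name, content in docs:
        digest = sha1_hex(content)
        if digest in kept:
            duplicates.append(name)  # same bytes as an earlier document
        else:
            kept[digest] = name
    return kept, duplicates


if __name__ == "__main__":
    sample = [
        ("a.txt", b"hello"),
        ("b.txt", b"hello"),   # byte-for-byte identical to a.txt
        ("c.txt", b"hello "),  # trailing space, so a different hash
    ]
    kept, duplicates = deduplicate(sample)
    print("kept:", sorted(kept.values()))
    print("duplicates:", duplicates)
```

Because the hash is computed over the raw bytes, even a single differing byte (such as the trailing space in the hypothetical `c.txt`) produces a different hash value, so that document is not treated as a duplicate.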