What are the settings for deduplicating an email? May I customize them?

The hash value of an instance is a fixed-length numeric value, computed from the instance's data, that for practical purposes uniquely identifies that data. During ingestion, we look for instances that are identical at the byte level. For an instance to be deduplicated, all of its metadata has to match. If two or more instances have an identical hash value, they are deduplicated and only one document is loaded into DISCO. Keep in mind that the metadata about each instance is still stored on that document and remains searchable within DISCO.
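The idea can be illustrated with a minimal sketch. This is not DISCO's actual implementation: the SHA-256 algorithm, the field names in `HASHED_FIELDS`, and the `custodian` and `path` keys are all assumptions chosen for illustration, since this article does not specify them. The sketch hashes a fixed set of fields for each instance, groups instances with identical hash values into one document, and keeps the per-instance details on that document.

```python
import hashlib
from collections import defaultdict

# Hypothetical list of fields that feed the hash. The real list is not
# given in this article.
HASHED_FIELDS = ("sender", "recipients", "subject", "sent", "body")


def instance_hash(instance: dict) -> str:
    """Return a fixed-length hex digest of the instance's hashed fields."""
    h = hashlib.sha256()  # assumed algorithm, for illustration only
    for field in HASHED_FIELDS:
        data = instance.get(field, "").encode("utf-8")
        # Length-prefix each field so that field boundaries affect the hash.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def deduplicate(instances: list) -> list:
    """Collapse instances with identical hashes into one document each."""
    groups = defaultdict(list)
    for inst in instances:
        groups[instance_hash(inst)].append(inst)

    documents = []
    for digest, group in groups.items():
        documents.append({
            "hash": digest,
            # One copy of the content is kept per hash value.
            "content": {f: group[0].get(f, "") for f in HASHED_FIELDS},
            # Per-instance details are retained on the document.
            "instances": [
                {"custodian": i.get("custodian"), "path": i.get("path")}
                for i in group
            ],
        })
    return documents
```

In this sketch, two copies of the same email collected from different custodians produce a single document that lists both custodians, while an email that differs in any hashed field (for example, a missing sent date) produces a separate document.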

You cannot control the email deduplication settings unless you give us a load file, which contains all of the metadata for each instance. If no load file exists, we compare the available data, but some metadata will likely be missing, which prevents DISCO from deduplicating some instances.
