Question: I have noticed that there are a number of duplicate documents in my database. Why didn't my documents deduplicate as expected?
Answer: After reviewing these items, we found that although the documents look the same, they are not identical.
They have different file paths, different modified dates, and different sizes. That is enough for the system to treat them as non-duplicates. The size difference (242.83 KB vs 242.39 KB) is a direct indicator that content was added to or removed from one of the files.
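To illustrate why a size difference rules out an exact match, here is a minimal sketch in Python. It assumes exact deduplication compares a content fingerprint (SHA-256 here); that is a common approach, but it is an assumption and not a description of how this system works internally. The function names and sample data are hypothetical.

```python
import hashlib


def content_fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the file content (assumed fingerprint)."""
    return hashlib.sha256(data).hexdigest()


def is_exact_duplicate(a: bytes, b: bytes) -> bool:
    """Two files are exact duplicates only if their content is byte-identical."""
    # Files of different sizes cannot be byte-identical, so stop early.
    if len(a) != len(b):
        return False
    return content_fingerprint(a) == content_fingerprint(b)


# Hypothetical sample data: the second file has one extra line appended.
original = b"Quarterly report\n" * 100
edited = original + b"Reviewed by J. Doe\n"

same_file = is_exact_duplicate(original, original)
edited_file = is_exact_duplicate(original, edited)
```

Under this assumption, even a small change in size means the two files can never be treated as exact duplicates, no matter how similar they look on screen.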
We can run the Near Duplicate (Similar Document) job on these items, and it would almost certainly flag the two as 'Similar' documents. However, the system does not consider them duplicates because of the discrepancies it found between the two files.
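The distinction between 'duplicate' and 'Similar' can be sketched as follows. This is only an illustration using Python's standard difflib module; the actual Near Duplicate job may use a different algorithm, and the 0.9 threshold, function names, and sample text are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two texts."""
    # autojunk=False keeps the heuristic from ignoring frequent characters.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def classify(a: str, b: str, threshold: float = 0.9) -> str:
    """Label a pair as 'duplicate', 'similar', or 'different'.

    The threshold is a hypothetical value chosen for this sketch.
    """
    if a == b:
        return "duplicate"
    if similarity(a, b) >= threshold:
        return "similar"
    return "different"


# Hypothetical sample data: doc_b is doc_a with one extra line appended.
doc_a = "The contract was signed on 1 March by both parties.\n" * 20
doc_b = doc_a + "Footer added by scanner.\n"

label = classify(doc_a, doc_b)
```

In this sketch the two texts are not identical, so they are not labeled duplicates, but they overlap enough to be labeled similar.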
This is working as intended. The system did not miss anything, and this is not something we could have prevented.