Similar documents (also known as near duplicates) are documents that share nearly all of the same text. By default, DISCO treats documents as similar when 95% of their content is the same but they are not exact duplicates. The similar documents feature lets you adjust this similarity percentage, giving you greater control over the documents you are working with.
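To make the idea of a similarity percentage concrete, here is a minimal Python sketch. It is an illustration only: this article does not describe how DISCO computes similarity, so the character-level comparison from Python's standard `difflib` module is an assumption, and the function names `similarity` and `is_near_duplicate` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity score for two texts.

    Assumption: character-level matching via difflib stands in for
    whatever comparison DISCO actually performs.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def is_near_duplicate(a: str, b: str, threshold: float = 0.95) -> bool:
    """True when the texts meet the threshold but are not identical."""
    if a == b:
        # Identical texts are exact duplicates, not near duplicates.
        return False
    return similarity(a, b) >= threshold
```

For example, a long paragraph and a copy of it with one extra character would count as near duplicates at the default 0.95 threshold, while two identical texts would not.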
Searching similar documents
To find similar documents in your database, add the search syntax similarcount(X to XXX) to your search string, replacing X and XXX with a number or a range of numbers. For example, search similarcount(2) to find all documents that have exactly one other similar document (that is, pairs of similar documents).
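If you generate search strings programmatically, the syntax above can be assembled with a small helper. The following Python sketch is hypothetical (the function name `similarcount_query` is not part of DISCO); it only produces the two forms described here, a single number and an "X to XXX" range, and assumes the values are positive whole numbers.

```python
from typing import Optional


def similarcount_query(low: int, high: Optional[int] = None) -> str:
    """Build a similarcount(...) search term as a string.

    Hypothetical helper: it formats text only and does not call DISCO.
    """
    if low < 1:
        raise ValueError("low must be a positive whole number")
    if high is None:
        # Single-number form, for example similarcount(2)
        return f"similarcount({low})"
    if high < low:
        raise ValueError("high must be greater than or equal to low")
    # Range form, following the "X to XXX" pattern
    return f"similarcount({low} to {high})"
```

Calling `similarcount_query(2)` produces `similarcount(2)`, and `similarcount_query(2, 10)` produces `similarcount(2 to 10)`, which you can then paste into your search string.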
Viewing similar documents
There are two ways to view similar documents within DISCO. First, from your search results screen you will see an icon on the left side of the result row that indicates there are one or more similar documents associated with this document.
The second way is in the document viewer. Click a document in your search results that has similar documents. In the viewer, you can navigate between the similar documents and view their details in the RELATED DOCUMENTS section.
Adjusting similarity
You can also adjust the percentage similarity of the documents you are viewing.
The similarity percentage dropdown shows the number of documents that are included within the specified percentage of similarity. In addition, the Exact hash match number displays the number of documents that are an exact match (that is, they have the same hash). By default, these exact duplicate documents and/or overflow documents are included in the similar documents listed. If you do not want to include these documents, clear the Include exact duplicates checkbox.
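The interaction between the similarity percentage and the Include exact duplicates checkbox can be sketched in Python as follows. This is a rough model, not DISCO's implementation: the SHA-256 hash, the `difflib` comparison, and the names `content_hash` and `similar_documents` are all assumptions made for illustration.

```python
import hashlib
from difflib import SequenceMatcher


def content_hash(text: str) -> str:
    """Hash the text; SHA-256 is an assumed stand-in for DISCO's hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def similar_documents(reference, candidates, threshold=0.95, include_exact=True):
    """Return names of candidates that count as similar to the reference.

    candidates maps a document name to its text. Exact hash matches are
    kept only when include_exact is True, mirroring the checkbox.
    """
    ref_hash = content_hash(reference)
    matches = []
    for name, text in candidates.items():
        if content_hash(text) == ref_hash:
            # Exact duplicate: include or skip based on the checkbox.
            if include_exact:
                matches.append(name)
            continue
        score = SequenceMatcher(None, reference, text, autojunk=False).ratio()
        if score >= threshold:
            matches.append(name)
    return matches
```

In this model, lowering `threshold` widens the set of documents returned, and setting `include_exact=False` drops the documents whose hash matches the reference exactly.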
Tagging similar documents
To apply a tag to similar documents:
- On the DISCO home page, search for and click a document that has similar documents associated with it.
- In the document viewer, in the right panel, click the dropdown arrow to open the RELATED DOCUMENTS section.
- Click the dropdown arrow to expand the Similar section.
- Click the tag icon to open the Select tags to apply or remove panel.
- In the Apply changes to dropdown, select the scope of documents to apply the tag to.
- Click in the text box and either type the name of a tag or select an existing tag from the dropdown.
- Select the check boxes for the similar documents to which to apply the tag.
Mass-tagging similar documents
To mass tag similar documents:
- On the DISCO home page, select the documents to which you want to apply a mass tag.
- Click the mass tag icon.
- Select the Documents similar to selected documents checkbox.
- From the similarity dropdown, select the percentage similarity for the documents you want to tag. By default, exact duplicate documents are included in the similar documents tagging. If you do not want to include these documents, deselect the Include exact duplicates checkbox.
When you select a percentage from the dropdown, the tag change options for attachments, family members, and email conversations will change accordingly.
- Click Update to apply the tag.
Mass-foldering similar documents
To mass folder similar documents:
- On the DISCO home page, select the documents you want to mass folder.
- Click the mass folder button.
- Select the Documents similar to selected documents checkbox.
- From the similarity dropdown, select the percentage similarity for the documents you want to folder. By default, exact duplicate documents are included in the similar documents foldering. If you do not want to include these documents, deselect the Include exact duplicates checkbox.
When you select a percentage from the dropdown, the foldering options for attachments, family members, and email conversations will change accordingly.
- Click Update to folder the documents.
For a training video about DISCO's similar documents feature, see Similar documents demo.