Remove duplicate documents
You can improve coding consistency and reduce the volume of documents to code by removing duplicate documents from the Documents page.
A document that is an exact copy of another document is called a duplicate document. The master document among a set of duplicate documents is generally the copy that is loaded into the application first.
If enabled by your administrator, the coding that you apply to a master document automatically applies to any duplicate documents. Duplicate documents inherit coding when you code fields or issues that have the suffix [AC], such as Confidentiality [AC]. The suffix [AC] stands for autocoding.
For a specific document, its master document appears in the Master field and its duplicate documents appear in the Duplicates field in the Code pane.
Documents can be individual duplicates or family duplicates, as follows:
Individual duplicates: If two documents are identical, the application identifies these documents as individual duplicates. Identical documents have the same value in the [RT] MD5 Hash field. The application designates the copy that was loaded into the application first as the master document.
Family duplicates: A document family is a group of related source and attachment documents, such as a group of email messages. For two documents to be family duplicates, the documents must be individual duplicates of each other, and the documents must have identical document families. In identical document families, each family must have identical source documents, and every attachment in one family must have an individual duplicate in the other family. The position of the attachments within a document family does not affect whether documents are considered family duplicates.
Family comparison is more rigorous than individual comparison and typically identifies fewer duplicates.
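The two comparison levels can be illustrated with a minimal sketch. It is not the application's implementation: the Doc class, its field names, and the sample hash values are hypothetical, and the sketch assumes that a family can be summarized by the hash of its source document plus the hashes of its attachments.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    md5: str            # stands in for the [RT] MD5 Hash field
    load_order: int     # lower value means the document was loaded earlier
    attachment_md5s: list = field(default_factory=list)


def individual_groups(docs):
    """Group documents that share a hash; the earliest-loaded copy is the master."""
    by_hash = defaultdict(list)
    for doc in docs:
        by_hash[doc.md5].append(doc)
    groups = {}
    for members in by_hash.values():
        members.sort(key=lambda d: d.load_order)
        master, duplicates = members[0], members[1:]
        groups[master.doc_id] = [d.doc_id for d in duplicates]
    return groups


def family_signature(doc):
    """Source hash plus attachment hashes, sorted so attachment position is ignored."""
    return (doc.md5, tuple(sorted(doc.attachment_md5s)))


def are_family_duplicates(a, b):
    """Assumed rule: identical source hash and identical set of attachment hashes."""
    return family_signature(a) == family_signature(b)


docs = [
    Doc("A", "h1", 1, ["x", "y"]),
    Doc("B", "h1", 2, ["y", "x"]),  # same family as A, attachments reordered
    Doc("C", "h1", 3, ["x"]),       # same source as A, but a different family
]
print(individual_groups(docs))                  # {'A': ['B', 'C']}
print(are_family_duplicates(docs[0], docs[1]))  # True
print(are_family_duplicates(docs[0], docs[2]))  # False
```

In this example, all three documents are individual duplicates, but only A and B are family duplicates, which shows why family comparison typically identifies fewer duplicates.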
To identify the master and duplicate documents in a case, your administrator runs the populate hashes job. Hashes are numerical values that identify unique documents and document families. For more information about the populate hashes job, see Populate hashes.
Find master documents
If enabled by your administrator, duplicate documents inherit coding from their master document. You can identify the master documents to review and code.
Note: You can search for master documents only at the family level.
To find master documents:
Select the check box next to the documents whose master documents you want to find.
On the Tools menu, select Find masters family.
All duplicate documents are removed from the document set and replaced by their master documents. Only master documents and unique documents appear on the Documents page.
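The substitution described above can be sketched as follows. This is an illustration only: the find_masters function and the master_of mapping are hypothetical, and the sketch assumes that the family-level master of each duplicate is already known, for example from the populate hashes job.

```python
def find_masters(selected_ids, master_of):
    """Replace each duplicate with its master, keeping first-seen order without repeats."""
    result = []
    seen = set()
    for doc_id in selected_ids:
        # Masters and unique documents are not keys in master_of, so they map to themselves.
        target = master_of.get(doc_id, doc_id)
        if target not in seen:
            seen.add(target)
            result.append(target)
    return result


# Hypothetical mapping: D2 and D3 are duplicates of master D1; D4 is unique.
master_of = {"D2": "D1", "D3": "D1"}
print(find_masters(["D2", "D3", "D4"], master_of))  # ['D1', 'D4']
```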
Find individual duplicates of documents
To find individual duplicates of documents:
Select the check box next to the documents whose duplicates you want to find.
On the Tools menu, select Find individual duplicates.
The selected documents and any individual duplicates of these documents, including master documents, appear on the Documents page. The document families are ignored.
Find family duplicates of documents
To find family duplicates of documents:
Select the check box next to the documents whose duplicates you want to find.
On the Tools menu, select Find family duplicates.
The selected documents and all duplicates that also belong to duplicate document families, including master documents, appear on the Documents page. This is a more rigorous search for duplicates, and typically returns fewer results than a search for duplicates at the individual level.
Remove individual duplicates of documents
To remove individual duplicates of documents:
Select the check box next to the documents whose individual duplicates you want to remove.
On the Tools menu, select Remove individual duplicates.
When the application removes individual duplicates, it retains one unique copy of each of the selected documents. If a master document is in the original selection, the master document remains. Otherwise, the application retains the selected document that was loaded into the application first and treats it as a representative copy of the master document.
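The retention rule can be sketched as follows. This is a hedged illustration, not the application's code: the function name, the tuple layout, and the master_ids set are assumptions made for the example.

```python
from collections import defaultdict


def remove_individual_duplicates(selected, master_ids):
    """Keep one document per hash from selected, a list of (doc_id, md5, load_order)."""
    by_hash = defaultdict(list)
    for doc in selected:
        by_hash[doc[1]].append(doc)
    kept = []
    for members in by_hash.values():
        masters = [d for d in members if d[0] in master_ids]
        if masters:
            # A master document in the selection always remains.
            keeper = masters[0]
        else:
            # Otherwise keep the earliest-loaded copy as a representative.
            keeper = min(members, key=lambda d: d[2])
        kept.append(keeper[0])
    return kept


# Hypothetical case: D0 is the master for hash h1 but is not selected;
# D1 is the master for hash h2 and is selected.
selected = [("D2", "h1", 2), ("D3", "h1", 3), ("D1", "h2", 1), ("D5", "h2", 5)]
print(remove_individual_duplicates(selected, master_ids={"D0", "D1"}))  # ['D2', 'D1']
```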
Remove family duplicates of documents
To remove family duplicates of documents:
Select the check box next to the documents whose family duplicates you want to remove.
On the Tools menu, select Remove family duplicates.
The application removes duplicates only if the duplicates belong to duplicate document families. If documents are individual duplicates but are not part of duplicate families, the application retains the documents. This option is more rigorous and typically discards fewer duplicates than removing duplicates at the individual level.
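The difference from individual removal can be sketched as follows. The function name and tuple layout are hypothetical, and the sketch assumes that the earliest-loaded copy in each duplicate family is the one retained, which this page does not state for family-level removal.

```python
def remove_family_duplicates(selected):
    """Keep one document per family from (doc_id, source_md5, attachment_md5s, load_order)."""
    kept = {}
    for doc in sorted(selected, key=lambda d: d[3]):
        # Sorting the attachment hashes makes attachment position irrelevant.
        signature = (doc[1], tuple(sorted(doc[2])))
        # Assumption: the first (earliest-loaded) document seen for a family is retained.
        kept.setdefault(signature, doc[0])
    return list(kept.values())


family_docs = [
    ("E1", "h1", ["a", "b"], 1),
    ("E2", "h1", ["b", "a"], 2),  # family duplicate of E1, so it is removed
    ("E3", "h1", ["a"], 3),       # individual duplicate only, so it is retained
]
print(remove_family_duplicates(family_docs))  # ['E1', 'E3']
```

E3 has the same source hash as E1 and would be discarded by individual removal, but it survives here because its family differs.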