Extract text

You can extract text from native files. Text extraction is available only for search results of base documents.

Note: As of Nuix Discover version 10.6.005, for the Predictive Coding and Production features, Nuix Discover is configured by default to skip email headers for .msg files during text extraction.

To extract text:

To enable the Extract option, in the List pane, select at least one document in the search results.

On the Tools menu, select Extract text.

The Extract text dialog box indicates the number of documents selected for text extraction and the number of documents that are not eligible for text extraction.

To view a list of the ineligible documents, click the blue link. A document is not eligible for text extraction if either of the following is true:

They have no content files.

They have a status of Submitted or In progress.
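The eligibility rule above can be illustrated with a minimal sketch. This is not Nuix Discover code or an API of the product. The Document class, its attribute names, and the partition helper are hypothetical, and the status labels are taken only from the wording of this topic.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Status labels as worded in this topic; how the application stores them
# internally is an assumption.
BLOCKING_STATUSES = {"Submitted", "In progress"}


@dataclass
class Document:
    """Hypothetical stand-in for a document in the search results."""
    doc_id: str
    content_files: List[str] = field(default_factory=list)
    extraction_status: Optional[str] = None


def is_eligible(doc: Document) -> bool:
    """Return True if the document can be submitted for text extraction."""
    if not doc.content_files:
        # No content files: nothing to extract text from.
        return False
    if doc.extraction_status in BLOCKING_STATUSES:
        # A text extraction job is already queued or running.
        return False
    return True


def partition(docs: List[Document]) -> Tuple[List[Document], List[Document]]:
    """Split the selection into (eligible, ineligible), preserving order."""
    eligible = [d for d in docs if is_eligible(d)]
    ineligible = [d for d in docs if not is_eligible(d)]
    return eligible, ineligible
```

The two lists correspond to the two counts that the Extract text dialog box reports for the selection.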

Optionally, to specify a field for the application to use to identify which files to extract text from, select the field in the Field to identify file for text extraction list.

Note: The default field is the value that exists in the Production default native field case option. If no field is selected, or a field is selected but does not contain a value, the application extracts text from the highest ranked content file. For more information about case options, see Case options.
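The fallback described in this note can be sketched as follows. This is an illustrative model only, not the application's implementation. The function name pick_file, its parameters, and the assumption that the field value directly names a file are all hypothetical. The sketch also assumes that ranked_files is ordered from highest rank to lowest.

```python
from typing import Dict, List, Optional


def pick_file(
    field_name: Optional[str],
    field_values: Dict[str, str],
    ranked_files: List[str],
) -> Optional[str]:
    """Choose the file to extract text from for one document.

    field_name   -- the field selected in the dialog box, or None
    field_values -- the document's field values, keyed by field name
    ranked_files -- the document's content files, highest ranked first
    """
    if field_name:
        value = field_values.get(field_name)
        if value:
            # The selected field contains a value: use the file it identifies.
            return value
    # No field selected, or the field is empty: fall back to the
    # highest ranked content file, if the document has one.
    return ranked_files[0] if ranked_files else None
```

For example, with a hypothetical field named "Native path", a document whose field is empty falls through to the first entry in ranked_files.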

To avoid extracting text again from documents that already have text, select either or both of the options under Skip text extraction for documents if:

Text has already been extracted from the document successfully: Select this check box to skip documents from which text has already been extracted successfully, regardless of the extension of the file from which the text was previously extracted. Click the blue number to view a list of affected documents. To re-run text extraction, clear the check box.

A separate .txt file already exists for the document: Select this check box to avoid submitting documents for text extraction if a .txt file is already associated with them. Click the blue number to view a list of affected documents. To run text extraction even if a .txt file exists, clear the check box.
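Because either or both check boxes can be selected, the two skip options combine as a logical OR. The following minimal sketch shows that combination. The function and parameter names are hypothetical and do not correspond to any documented setting or API.

```python
def should_skip(
    already_extracted: bool,
    has_txt_file: bool,
    skip_if_extracted: bool,
    skip_if_txt_exists: bool,
) -> bool:
    """Return True if a document is left out of the text extraction job.

    already_extracted  -- text was previously extracted successfully
    has_txt_file       -- a separate .txt file is associated with the document
    skip_if_extracted  -- state of the first check box
    skip_if_txt_exists -- state of the second check box
    """
    # A document is skipped when a selected option matches its state.
    # With both check boxes cleared, no document is skipped.
    return (skip_if_extracted and already_extracted) or (
        skip_if_txt_exists and has_txt_file
    )
```

Clearing a check box sets the corresponding flag to False, so documents in that state are submitted for text extraction again.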

To schedule the text extraction job, select a date and time under Start time. To submit the job for processing immediately, leave the default setting, Now.

Click OK.

If text extraction is successful, extracted text is stored in the database. You cannot view extracted text in the application interface.