View and troubleshoot ingestions jobs

You can view ingestions job details and troubleshoot jobs that encounter errors.

View ingestions jobs

You can view the following information about ingestions jobs:

View properties of an ingestions job

View ingestions job progress

View ingestions reports

View properties of an ingestions job

You can view detailed properties of an ingestions job after it has been processed. The properties appear for all job statuses, regardless of whether the job finished successfully.

To access information about an ingestions job, including file counts, filters, and levels:

On the Case Home page, under Manage Documents, click Ingestions.

Click the name of an ingestions job.

The following table describes the information that appears on an ingestions job's Properties page.

Section

Description

Description

Information entered by the creator of the job to describe or identify it.

Job ID

Unique identification number for each job. You can use this number to track and report issues with a job in a case.

Started

Date and time that the ingestions job began.

Duration

The amount of time it took to process the ingestions job.

Status

The status of the ingestions job. Status messages include the following: In progress, Completed, Error, Completed with Warnings, and Completed with Exceptions.

Processed Files

Total expanded documents: Total number of files after expanding compressed archive files, such as .zip files.

Suppressed documents: Number and percentage of total documents that the application excluded after processing.

Duplicates: Number and percentage of total documents that were excluded through deduplication. Click the Master Document link to run a search for the master documents of all documents that were suppressed by deduplication. The search results appear on the Documents page.

Outside date range: Number and percentage of total documents with a Document Date that is outside of the specified range.

Excluded NIST files: Number and percentage of total files that the application excluded because the files appear on the NIST list.

Note: For information about hash codes, see the National Software Reference Library (NSRL), maintained by the National Institute of Standards and Technology (NIST).

Excluded by category or extension: Number and percentage of total files that the application excluded because the file types were selected for exclusion in Default Settings.

Does not match search term family: Number and percentage of total files that the application excluded because the files do not match any terms in the selected search term family.

Unsuppressed documents

Number and percentage of total documents available in the application after processing. Click the number to view the search results for all documents in the job. The search results are equivalent to what you would expect to see if you searched using the Evidence Job ID field.

Suppressed Document Ingestion Exceptions

and

Unsuppressed Document Ingestion Exceptions

Breakdown of the exceptions recorded in the [Meta] Processing Exceptions field, shown separately for suppressed and unsuppressed documents.

Click the number under the Unsuppressed Document Ingestion Exceptions heading to view the search results for exceptions in the job. The search results are equivalent to what you would expect to see if you searched using the Evidence Job ID field AND [Meta] Processing Exceptions / has a value. Separate links to search results for the Evidence Job ID and the specific [Meta] Processing Exceptions list items appear underneath this heading.

Folders

Folder structure: The Folder depth settings selected in the Default Settings.

Source files: Indicates the folder structure of the source data that was ingested.

Filters

Duplication: Indicates whether the documents are deduplicated at the case level, the custodian level, or not at all.

Deduplication by top parent: If this option is selected, the application compares only the hash values of top parent documents. Attachments are ignored. This means that families whose top parent documents are identical may be identified as duplicates even if their attachments differ.

Date range: The earliest start date and latest end date in the Document Date fields of the documents.

Exclude NIST files: Indicates whether NIST files are excluded after processing.

Exclude by category or extension: Indicates whether files with selected file types are excluded after processing.

Keywords

The search term family the application uses to filter data.

Levels

User-defined levels in the application under which processed data is organized.

Document ID

Document prefix: The user-defined prefix for each document ID.

Ingestion Details

For more information about these settings, see Configure default ingestion settings.

Time zone: The time zone setting for the job.

Suppressed documents: Indicates whether native files of suppressed documents are retained.

Indexing: Status messages include Indexing only or Indexing and Enrichment.

Personal information: Indicates whether credit card or personal identification numbers were identified and tagged as part of the ingestions job.

Language: Indicates whether the primary language of the documents was identified as part of the ingestions job.

Duplicate coding: If this option is selected, documents in the ingestion take the All Custodians field values from other duplicates in the case and pass those All Custodians values to duplicates. If this option is not selected, documents in the ingestion retain their own Custodian value in the All Custodians field.

Source encoding: The source encoding value selected in the Advanced settings window.

Password bank: If the administrator did not select a password bank for ingestions in the Advanced settings window, the value displayed in this row is No. If the administrator selected a password bank, the value is Yes.

Chat Settings

The following information is available in this section:

Idle time: Threads are broken into separate documents if the difference in sent times between two consecutive messages is equal to or greater than this number.

Minimum messages: Threads containing fewer messages than this number are not broken out into separate documents.

Maximum messages: Threads containing more messages than this number are broken out into separate documents.

Email files

The type of file that the user selected to be available in the viewer for imported email files.
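The Chat Settings rules in the table above can be read as a simple thread-splitting procedure. The following Python sketch is a hypothetical illustration, not the application's implementation. It assumes that idle time is measured between consecutive messages, that a thread with fewer than Minimum messages is kept whole, and that Maximum messages caps the number of messages in each resulting document. The function name and parameters are invented for illustration.

```python
from datetime import datetime, timedelta


def split_thread(sent_times, idle_time, min_messages, max_messages):
    """Hypothetical sketch: group a chat thread's messages into documents.

    sent_times: message sent times, sorted in ascending order.
    Returns a list of documents, each a list of message indexes.
    """
    if not sent_times:
        return []
    # Assumption: threads with fewer messages than the minimum are never split.
    if len(sent_times) < min_messages:
        return [list(range(len(sent_times)))]

    documents, current = [], [0]
    for i in range(1, len(sent_times)):
        gap = sent_times[i] - sent_times[i - 1]
        # Start a new document after a long enough pause, or (assumed rule)
        # when the current document already holds the maximum number of messages.
        if gap >= idle_time or len(current) >= max_messages:
            documents.append(current)
            current = []
        current.append(i)
    documents.append(current)
    return documents


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    times = [start + timedelta(minutes=m) for m in (0, 1, 2, 60, 61)]
    print(split_thread(times, timedelta(minutes=30), 2, 10))  # [[0, 1, 2], [3, 4]]
```

With a 30-minute idle time, the five sample messages split into two documents: the first three messages and the last two.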

To download a report of the Properties page, click Download report.
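The Deduplication by top parent option in the Filters section of the Properties page can be illustrated with a small hash comparison. This Python sketch is hypothetical: the Family class, the way a full-family key is built, and the rule that the first family seen becomes the master are assumptions made for illustration, not a description of how the application computes or compares its hashes.

```python
import hashlib
from dataclasses import dataclass, field


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


@dataclass
class Family:
    """Hypothetical document family: a top parent and its attachments."""
    doc_id: str
    parent: bytes
    attachments: list = field(default_factory=list)


def family_key(family: Family, top_parent_only: bool) -> str:
    if top_parent_only:
        # Only the top parent's hash is compared; attachments are ignored.
        return md5_hex(family.parent)
    # Assumed full-family key: combine parent and attachment hashes in order.
    parts = [md5_hex(family.parent)] + [md5_hex(a) for a in family.attachments]
    return md5_hex("|".join(parts).encode())


def find_duplicates(families, top_parent_only):
    """Map each duplicate's doc_id to the doc_id of the first (master) family."""
    masters, duplicates = {}, {}
    for family in families:
        key = family_key(family, top_parent_only)
        if key in masters:
            duplicates[family.doc_id] = masters[key]
        else:
            masters[key] = family.doc_id
    return duplicates


if __name__ == "__main__":
    a = Family("DOC-1", b"quarterly report email", [b"report.xlsx"])
    b = Family("DOC-2", b"quarterly report email", [b"report.xlsx", b"notes.docx"])
    print(find_duplicates([a, b], top_parent_only=True))   # {'DOC-2': 'DOC-1'}
    print(find_duplicates([a, b], top_parent_only=False))  # {}
```

With top-parent comparison, DOC-2 is treated as a duplicate of DOC-1 even though it has an extra attachment; with a full-family comparison, the two families are kept separate.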

View ingestions job progress

To view the details of the ingestions job's progress:

On the Case Home page, under Manage Documents, click Ingestions.

Click the name of a job.

In the navigation pane, click Progress.

The Progress Summary page appears. It shows a progress bar and details for each submission of the ingestions job, with the most recent submission at the top of the page.

The following table describes the information that appears on an ingestions job's Progress Summary page.

Column

Description

RPF Job ID

Unique identification number of each job in the Processing Framework (RPF).

Start

Date and time that the ingestions job began.

Progress bar

If the progress bar is green, the steps performed so far have completed successfully. Red indicates that a step failed. The entire bar is yellow when a job completes with warnings.

Percentages appear underneath the progress bar to indicate the progress of the current job.

Status

The status of the ingestions job. Status messages include the following: In progress, Completed, Error, Completed with Exceptions, and Completed with Warnings.
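The progress bar color rules described above can be summarized as a small decision function. In this hypothetical Python sketch, the step result values ("succeeded", "failed", "pending") and the percentage calculated from the share of finished steps are assumptions; the application may compute its percentages differently.

```python
def progress_summary(job_status, steps):
    """Hypothetical sketch of the Progress Summary bar.

    steps: list of (step_name, result) tuples, where result is
    "succeeded", "failed", or "pending" (assumed values).
    Returns (bar_color, percent_complete).
    """
    if job_status == "Completed with Warnings":
        color = "yellow"  # the entire bar is yellow
    elif any(result == "failed" for _, result in steps):
        color = "red"  # a step failed
    else:
        color = "green"  # all steps so far succeeded

    # Assumed percentage: share of steps that are no longer pending.
    finished = sum(1 for _, result in steps if result != "pending")
    percent = round(100 * finished / len(steps)) if steps else 0
    return color, percent


if __name__ == "__main__":
    steps = [("Processing", "succeeded"), ("Data Filtering", "succeeded"),
             ("Transfer Unsuppressed Files", "pending"), ("Indexing", "pending")]
    print(progress_summary("In progress", steps))  # ('green', 50)
```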

For more information about the job, click View Progress Details.

The following table describes the information that appears on an ingestions job's Progress Details page.

Column

Description

Name

This column contains the following task status names.

Processing: Creates tables, begins processing, standardizes data.

File Inventory: Identifies and catalogs files.

Pre Processing: Verifies the Nuix engine.

Creating RDX Guid Tables: Prepares data tables.

Batching: Breaks data into batches for processing.

Processing: Expands files, gathers metadata, exports data and files.

Standardizing Data: Assigns document IDs, levels, and export_extra types.

Import Files and Metadata: Creates case database entries and copies files to the agent server.

Load Data: Loads the data into the case database.

Cleanup: Deletes the data staging tables.

Hashes: Runs a Hashes job.

Update Field Counts: Updates document counts for fields populated during the job.

Group Coding: Runs the All Custodians stage.

All Custodians: Populates All Custodians values.

Data Filtering: Filters data by date range, removes files on the NIST list, and deduplicates files.

Filter by Date Range: Filters data by the user-defined date range.

Filter by excluded Files: Suppresses files of the types selected in the settings.

Filter by NIST: Removes files on the NIST list, if indicated by the user.

Filter by De-duplication: Removes duplicate files, if indicated by the user.

Transfer Unsuppressed Files: Copies the files that were not filtered out of processing.

File Copy Batching: Divides the files into batches for transfer.

File Transfer: Copies the files from the ingest folder to the images folder.

File Transfer Confirmation: Verifies that files have been successfully copied.

File Copy Rebatching: If any files failed to copy, divides remaining files into batches for transfer.

Retry File Transfer: Copies the remaining files from the ingest folder to the images folder.

Finalize File Copy: Verifies that files have been successfully copied.

Indexing: Creates and updates indexes.

Search Term Family: If the user selected a search term family, this stage identifies the files that correspond to the selected search term family.

Transfer Suppressed Files: Copies the files that were filtered out of processing.

File Copy Batching: Divides the files into batches for transfer.

File Transfer: Copies the files from the ingest folder to the suppressed folder.

File Transfer Confirmation: Verifies that files have been successfully copied.

File Copy Rebatching: If any files failed to copy, divides remaining files into batches for transfer.

Retry File Transfer: Copies the remaining files from the ingest folder to the suppressed folder.

Finalize File Copy: Verifies that files have been successfully copied.

Gathering Report Data: Generates data for job-specific reporting.

Finalize Job: Deletes temporary files and tables.

Cleanup: Removes temporary tables and sets final job status.

Tasks Completed

Number of subtasks completed out of the total number of subtasks.

Duration

Amount of time taken to complete each task.

Start

Date and time that the task began.

Progress

Task's percentage of completion.
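The file copy stages listed in the Name column (batching, transfer, confirmation, rebatching, and retry) follow a common copy, verify, and retry pattern. The Python sketch below illustrates that pattern under stated assumptions; the function names, the size-based verification check, and the three-attempt limit are hypothetical and are not taken from the application.

```python
import os
import shutil
import tempfile


def make_batches(names, batch_size):
    """File Copy Batching / Rebatching: split names into fixed-size batches."""
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]


def is_verified(src, dst):
    """File Transfer Confirmation: assumed check that the copy exists and sizes match."""
    return os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src)


def copy_with_retry(names, src_dir, dst_dir, batch_size=100, max_attempts=3):
    """Copy files in batches, retrying failures; return names that never copied."""
    remaining = list(names)
    for _ in range(max_attempts):
        if not remaining:
            break
        for batch in make_batches(remaining, batch_size):
            for name in batch:
                try:
                    shutil.copy2(os.path.join(src_dir, name), os.path.join(dst_dir, name))
                except OSError:
                    pass  # left for the confirmation step to catch
        # Keep only the files that did not pass confirmation for the next attempt.
        remaining = [n for n in remaining
                     if not is_verified(os.path.join(src_dir, n), os.path.join(dst_dir, n))]
    # Anything still listed here would leave the job Completed with Warnings.
    return remaining


if __name__ == "__main__":
    src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
    for name in ("a.msg", "b.pdf"):
        with open(os.path.join(src, name), "w") as f:
            f.write("sample")
    print(copy_with_retry(["a.msg", "b.pdf", "missing.doc"], src, dst, batch_size=2))
    # ['missing.doc']
```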

To view the Tasks page for a stage, click the stage in the Name column.

The following information appears on the Tasks page.

Column

Description

(Task status icon)

Hover over the icon to view information about the status.

Task ID

The unique number for the task.

Start

The date and time that the task started processing.

Duration

How long it took for the task to complete.

Supervisor

The name of the supervisor executing the task.

Status

The status of the task.

Progress

The task's percentage of completion.

To view a task's input, output, and error detail, click a Task ID.

The XML page appears, displaying the input, output, and error detail.

Note: A task's XML input is the XML set that provides instructions for a task to do its work. The task output is the XML output that the application creates if the task succeeds with warnings. You can view this information and error data if the task encounters an error during processing.

View ingestions reports

The following reports are available for ingestions jobs: File type by custodian and Files processed. The reports list details of each processed file in an ingestion, including a link to the documents in the case that were generated from each file.

To view and download the reports for an ingestions job:

On the Case Home page, under Manage Documents, click Ingestions.

Click a job name.

In the navigation pane, click Report.

The following basic job information appears at the top of the page.

Job ID: Unique identification number of each job.

Total exceptions: The total number of documents with exceptions.

Date range of documents: The minimum to maximum Document Date.

Total files: The number and size of files included in the ingestion.

Document ID range: The minimum to maximum Document ID.

Click the File type by custodian or Files processed tab. To save the report as a spreadsheet (.csv file), click Download report.

The following table describes the information that appears in the File type by custodian report.

Column

Description

File type by custodians

Lists each custodian and the types and number of files belonging to each custodian.

Expanded

Total number of files extracted from compressed archive files, such as .zip files.

Duplicates

Number of duplicate files.

Suppressed

Number of files that were excluded after processing.

Unsuppressed

Number of files available in the application after processing.

The following table describes the information that appears in the Files processed report.

Column

Description

File ID

Number assigned to each file that is processed. This value is stored in a field called [RT] DPM File Id.

Path

Path to the processed file.

Name

Name of the processed file.

Extension

File extension of the processed file.

Related Files

Subsequent parts of a multi-part file, such as a .rar file.

Size (bytes)

Size of the processed file.

Suppressed

Number of files that were excluded during processing.

Unsuppressed

Number of files available in the application after processing.

View link

Opens the document on the Documents page.
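If you download the Files processed report as a .csv file, you can total its columns outside the application. This Python sketch assumes that the downloaded file uses the column names shown in the Files processed table as its header row and that the Size (bytes), Suppressed, and Unsuppressed values are plain integers; check your own export before relying on it.

```python
import csv
import io


def summarize_files_processed(csv_file):
    """Total the rows of a Files processed export (assumed column names)."""
    totals = {"files": 0, "bytes": 0, "suppressed": 0, "unsuppressed": 0}
    for row in csv.DictReader(csv_file):
        totals["files"] += 1
        totals["bytes"] += int(row["Size (bytes)"] or 0)
        totals["suppressed"] += int(row["Suppressed"] or 0)
        totals["unsuppressed"] += int(row["Unsuppressed"] or 0)
    return totals


if __name__ == "__main__":
    sample = io.StringIO(
        "File ID,Path,Name,Extension,Related Files,Size (bytes),Suppressed,Unsuppressed\n"
        "1,/custodian1,mail.pst,pst,,2048,3,40\n"
        "2,/custodian1,archive.zip,zip,,1024,1,12\n"
    )
    print(summarize_files_processed(sample))
    # {'files': 2, 'bytes': 3072, 'suppressed': 4, 'unsuppressed': 52}
```

To read a real export, open it with `open(path, newline="", encoding="utf-8-sig")`; the `utf-8-sig` encoding tolerates a leading byte-order mark if the file includes one.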

Troubleshoot ingestions jobs

You can perform the following troubleshooting tasks for ingestions jobs:

Unsuppress documents in an ingestions job

Troubleshoot unprocessed files

Resolve file copy errors

View a table with common [Meta] processing exceptions and possible resolutions

Unsuppress documents in an ingestions job

You can unsuppress all suppressed documents from a completed ingestions job.

Note: If no documents in an ingestions job are suppressed, the Unsuppress documents button is unavailable. If the Retain suppressed files option was not selected for the ingestions job, the files for these documents are not available in the application.

To unsuppress documents:

On the Case Home page, under Manage Documents, click Ingestions.

Click the name of a completed job.

Click the Unsuppress documents button.

A message appears with the number of documents that the application will unsuppress in the job.

Click OK.

Troubleshoot unprocessed files

You can resubmit unprocessed files for processing or export the entire list as a .csv file for further troubleshooting.

To view unprocessed files:

On the Case Home page, under Manage Documents, click Ingestions.

Click the name of a job with a status of Completed with Warnings.

In the navigation pane, click Unprocessed files.

The following information appears on the Unprocessed files page.

Column

Description

Batch ID

Identification number of the batch containing the unprocessed file.

File ID

Identification number of the unprocessed file.

File name

File name of the unprocessed file.

File size

Size of the unprocessed file.

File path

File location of the unprocessed file.

To view the XML output, click the number in the Batch ID column.

Resubmit unprocessed files

To resubmit unprocessed files:

To access the Unprocessed files page, do the following:

On the Case Home page, under Manage Documents, click Ingestions.

Click a job name.

In the navigation pane, click Unprocessed files.

On the Unprocessed files page, select the check box next to the Batch ID for the file or files to resubmit. To resubmit all files, skip this step.

Click Resubmit.

Select All files to resubmit all unprocessed files, or Selected files to resubmit the files that you selected.

Change the value in the Max files per batch box, if needed.

Click Save.

Resubmitted files appear on the Progress Details page under the original batch ID.
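The Max files per batch setting determines how the resubmitted files are grouped. This hypothetical Python sketch shows one way the All files and Selected files choices could translate into batches; the UnprocessedFile class and the plan_resubmission function are invented for illustration and do not describe the application's internal behavior.

```python
from dataclasses import dataclass


@dataclass
class UnprocessedFile:
    """Hypothetical row from the Unprocessed files page."""
    batch_id: int
    file_id: int
    name: str


def plan_resubmission(files, max_files_per_batch, selected_file_ids=None):
    """Group files for resubmission; selected_file_ids=None means 'All files'."""
    if max_files_per_batch < 1:
        raise ValueError("max_files_per_batch must be at least 1")
    chosen = [f for f in files
              if selected_file_ids is None or f.file_id in selected_file_ids]
    # Split the chosen files into batches of at most max_files_per_batch.
    return [chosen[i:i + max_files_per_batch]
            for i in range(0, len(chosen), max_files_per_batch)]


if __name__ == "__main__":
    files = [UnprocessedFile(7, i, f"file{i}.msg") for i in range(5)]
    for batch in plan_resubmission(files, max_files_per_batch=2):
        print([f.name for f in batch])
```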

Download the list of unprocessed files

To download the list of unprocessed files as a comma-separated values (.csv) file:

To access the Unprocessed files page, do the following:

On the Case Home page, under Manage Documents, click Ingestions.

Click a job name.

In the navigation pane, click Unprocessed files.

On the Unprocessed files page, select the check box next to the Batch ID for the file or files.

Click Download report, and then click OK.

Open or save the report.
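After you download the list of unprocessed files, grouping it by batch can help you spot batches that failed as a whole. This Python sketch assumes that the .csv file's header row uses the column names shown on the Unprocessed files page (Batch ID, File ID, File name, File size, File path); the actual export format may differ.

```python
import csv
import io
from collections import defaultdict


def files_by_batch(csv_file):
    """Map each Batch ID to the file names listed under it (assumed columns)."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["Batch ID"]].append(row["File name"])
    return dict(groups)


if __name__ == "__main__":
    sample = io.StringIO(
        "Batch ID,File ID,File name,File size,File path\n"
        "12,101,a.msg,10 KB,/ingest/a.msg\n"
        "12,102,b.pst,1 MB,/ingest/b.pst\n"
        "13,103,c.zip,5 KB,/ingest/c.zip\n"
    )
    for batch_id, names in files_by_batch(sample).items():
        print(batch_id, len(names), names)
```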

Resolve file copy errors

Ingestions automatically attempts to copy any files that failed to copy during the initial ingestions job. If files still fail to copy after multiple attempts, the job's status is Completed with Warnings.

You can attempt to resolve any remaining copy errors by manually rerunning the file copy steps.

To manually rerun the file copy steps:

On the Case Home page, under Manage Documents, click Ingestions.

Click the name of a job with a status of Completed with Warnings. A message indicating that some files failed to copy also appears for the job.

Click Retry job.

Common [Meta] processing exceptions and possible resolutions

The following table provides common [Meta] processing exceptions, descriptions of the exceptions, and possible resolutions.

List item

Description

Possible resolution

Corrupted

The application was unable to open the file during ingestion, either because opening the file failed or because the application was otherwise unable to process it.

Obtain a new copy of the file, if possible, and reprocess the file.

Data Type Conversion Failed

An invalid date was extracted for a date value during processing. If this exception is coded, additional detail is available in the [RT] Ingestion Exception Detail field.

No resolution. Refer to the [RT] Ingestion Exception Detail field.

Databases

Items where the Kind = "Databases" from the supported file types list.

No further action. (This is just an informational flag.)

Deleted Item

Permanently deleted items that were recovered from slack space.

No further action. (This is just an informational flag.)

Empty File

Items that are 0 KB in size.

No further action. (This is just an informational flag.)

Encrypted

Items that the application has determined to contain encrypted content.

Obtain the password for the file, apply the password to the file, and reprocess or replace the file in Nuix Discover.

Or use a password-cracking software or consulting solution to obtain a decrypted copy of the file, and then reprocess or replace the file in Nuix Discover.

Note: Reprocessing applies only to container documents.

Export Failed

Items flagged by the application as "poison" files,

OR items of MIME types application/vnd.ms-outlook-activity, application/vnd.ms-outlook-journal or application/vnd.ms-outlook-task that the application was unable to write to a native file,

OR items where no binary data is available to create a native file.

No further action.

Note: This is just an informational flag.

Export Slipsheet

No longer used.

No further action.

Extracted Text Only

If Ingestions is unable to obtain a native file for a document, and extracted text is available, the text is used in lieu of the native file. This exception is coded if the text is used.

No further action.

Field Data Extraction Error

This indicates an issue with extracting data from the application for a specific value. If this is coded, additional detail is in the [RT] Ingestion Exception Detail field.

No resolution. Refer to the [RT] Ingestion Exception Detail field.

Field Data Truncated

This indicates data that was too long for the target field. The field and the entire value are found in the [RT] Ingestion Exception Detail field.

No resolution. Refer to the [RT] Ingestion Exception Detail field.

File copy failed

The file for this document could not be copied to Nuix Discover.

Note: Nuix Discover no longer codes “File Copy Failed”. If the job fails, that is reported in the job itself. The user can click the Retry Job button on the Properties page for the ingestion and try to copy the file again.

Inaccessible Content

Items with the common name of "Inaccessible Content" from the supported file types list.

No further action.

Note: This is just an informational flag.

License Restricted

Nuix license restricted flag. This applies if the Nuix license does not cover processing of a certain file type.

Contact Nuix for licensing options.

Missing Hash Value

Items that do not have an MD5 Hash value.

No further action.

Note: This indicates that the application was unable to obtain a hash value for a file, for example, for a file that was corrupt or inaccessible, or for a file with a 0 KB file size.

Multimedia

Items where the Kind = "Multimedia" from the supported file types list.

No further action.

Note: This is just an informational flag.

NIST Item

Items with an MD5 Hash value that matches an MD5 Hash value of a known file from the NSRL Reference Data Set, also referred to as the NIST list.

No further action.

Note: This is just an informational flag.

Non-Business Document

No longer used.

No further action.

Non-Searchable PDF

Items determined to be a PDF but that do not contain any indexable text.

N/A. If the case option OCR documents without content files is selected, Optical Character Recognition (OCR) is run on non-searchable PDF files.

Note: If you do not have this case option selected, you may want to OCR these documents.

Renamed Extension

Items where the true file extension determined by header analysis is different from the original file extension.

No further action.

Note: This is just an informational flag.

System File

Items where the Kind = "System File" from the supported file types list.

No further action.

Note: This is just an informational flag.

Text Stripped

Items where the application recognized the file type, but the text and metadata cannot be cleanly extracted. The result is an item that is searchable, but the text may be distorted or not properly formatted.

No further action.

Note: This is just an informational flag.

Unknown Binary

Items where Document Kind = "Unrecognized" or items where the MIME type is application/octet-stream.

No further action.

Note: This is just an informational flag.
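If you export documents together with their [Meta] Processing Exceptions values, the table above can be turned into a simple triage lookup that separates informational flags from exceptions that call for action. This Python sketch is a hypothetical aid: the input shape (a mapping of document IDs to lists of exception list items) and the short resolution strings, which paraphrase the table, are assumptions.

```python
# Exceptions that call for action, with paraphrased resolutions from the table.
RESOLUTIONS = {
    "Corrupted": "Obtain a new copy of the file and reprocess it",
    "Data Type Conversion Failed": "Check the [RT] Ingestion Exception Detail field",
    "Field Data Extraction Error": "Check the [RT] Ingestion Exception Detail field",
    "Field Data Truncated": "Check the [RT] Ingestion Exception Detail field",
    "Encrypted": "Obtain the password or a decrypted copy, then reprocess or replace",
    "File copy failed": "Click Retry job for the ingestion",
    "License Restricted": "Contact Nuix for licensing options",
    "Non-Searchable PDF": "Consider OCR if the OCR case option is not selected",
}

# Informational flags and retired values that need no further action.
NO_ACTION = {
    "Databases", "Deleted Item", "Empty File", "Export Failed", "Export Slipsheet",
    "Extracted Text Only", "Inaccessible Content", "Missing Hash Value", "Multimedia",
    "NIST Item", "Non-Business Document", "Renamed Extension", "System File",
    "Text Stripped", "Unknown Binary",
}


def triage(documents):
    """Return {doc_id: [actions]} for documents that need follow-up."""
    actions = {}
    for doc_id, exceptions in documents.items():
        todo = []
        for item in exceptions:
            if item in RESOLUTIONS:
                todo.append(RESOLUTIONS[item])
            elif item not in NO_ACTION:
                todo.append(f"Review unrecognized exception: {item}")
        if todo:
            actions[doc_id] = todo
    return actions


if __name__ == "__main__":
    docs = {"DOC-1": ["Empty File"], "DOC-2": ["Encrypted", "Renamed Extension"]}
    print(triage(docs))
```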