Configure ingestion settings

The more commonly used settings appear in the Default settings window. Less commonly used settings appear in the Advanced settings window.

Configure default ingestion settings

You can define the default folder structure, filters, and other settings for imported data. You can also create levels and other settings for ingested data that is loaded into the application.

To configure default ingestion settings:

On the Case Home page, under Manage Documents, click Ingestions.

Click Default settings.

On the Folder structure page, define the folder structure of the data that you are loading. If possible, organize your data to mirror the default selections on this page. When you load data into the application after processing, the application imports these parameters as fields.

Important: You should use a consistent folder structure for the life of a case.

For each Folder depth, select one of the following options. The value in the corresponding folder will be captured in the selected field.

Custodian

Note: You must select Custodian for one folder level. Folder depths 1 and 2 are required.

[Meta] Evidence ID

Collection ID

Media ID

Undefined Folder: Select this option if you do not want the application to capture the name of the corresponding folder in a field.
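The following Python sketch illustrates how a folder depth selection could translate into captured field values. It is an illustration only, not the application's implementation. The mapping in FOLDER_FIELDS, the function name fields_from_path, and the sample path are hypothetical, and Undefined Folder is modeled as None.

```python
from pathlib import PurePosixPath

# Hypothetical selection: folder depth -> field. None stands for "Undefined Folder".
FOLDER_FIELDS = {1: "Custodian", 2: "Media ID", 3: None}


def fields_from_path(relative_path, folder_fields=FOLDER_FIELDS):
    """Return the field values captured from the folders in a relative file path."""
    folders = PurePosixPath(relative_path).parts[:-1]  # drop the file name
    captured = {}
    for depth, folder_name in enumerate(folders, start=1):
        field = folder_fields.get(depth)
        if field is not None:  # skip undefined or unmapped depths
            captured[field] = folder_name
    return captured


print(fields_from_path("Smith/MEDIA001/Inbox/mail.pst"))
```

In this sketch, the folder at depth 3 is ignored because it is mapped to Undefined Folder.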

Click Next.

On the Filters page, do the following:

In the Family deduplication list, select how you want to deduplicate your data:

Case: Data is deduplicated at the case level.

Custodian: Data is deduplicated at the custodian level.

None: Data is not deduplicated.

Note: If previous jobs in the case ran using only the Case or Custodian option, the Family deduplication setting defaults to either Case or Custodian. If previous jobs ran using the None deduplication option or a mix of Case and Custodian, the Family deduplication setting does not default to any option.

If you want the application to evaluate only the top parent document in a document family when identifying duplicates for suppression, select the Only use the top parent documents to identify duplicates check box.

Note: If you select the option to use only the top parent documents to identify duplicates, families with different numbers of attachments, and families with attachments that have different hash values, could still be identified as duplicates.
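To make the difference between the Case and Custodian options concrete, the following Python sketch suppresses duplicates within each scope. It is a simplified illustration, not the application's algorithm. The record keys id, custodian, and family_hash are hypothetical, and how the application computes hash values is not described here. If only the top parent documents are used, family_hash would stand for the hash of the top parent alone.

```python
def suppress_duplicates(documents, scope):
    """Return the IDs of documents that would be suppressed as duplicates.

    scope is "Case", "Custodian", or "None". The first document seen for a
    given key is kept; later documents with the same key are suppressed.
    """
    if scope == "None":
        return []
    seen = set()
    suppressed = []
    for doc in documents:
        if scope == "Case":
            key = doc["family_hash"]
        else:  # "Custodian": the same hash may exist once per custodian
            key = (doc["custodian"], doc["family_hash"])
        if key in seen:
            suppressed.append(doc["id"])
        else:
            seen.add(key)
    return suppressed


docs = [
    {"id": 1, "custodian": "A", "family_hash": "h1"},
    {"id": 2, "custodian": "B", "family_hash": "h1"},
    {"id": 3, "custodian": "A", "family_hash": "h1"},
]
print(suppress_duplicates(docs, "Case"))       # documents 2 and 3 are suppressed
print(suppress_duplicates(docs, "Custodian"))  # only document 3 is suppressed
```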

If you want to ingest documents only for a specific range of dates, under Date range, clear the None check box. Then, select a start and end date for the data that you want to process. The date of a document is determined by its Document Date field, which is determined in the following ways, in order:

For Emails: 1. Mapi-Client-Submit-Time 2. Sent Date 3. Mapi-Message-Delivery-Time 4. For items that do not have a date, the items receive the date of extraction.

For Attachments: 1. File modified date (file system) 2. File created date 3. For items that do not have a date, the items receive the date of the parent.

For Efiles: 1. File modified date 2. File created date 3. For items that do not have these dates, the items receive the date of extraction.

Parent document dates are not inherited by family members unless a date is not available. For example, an attachment does not typically have the parent email’s Document Date as its own Document Date.
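The order of precedence described above can be sketched in Python as a simple fallback chain. This is an illustration under stated assumptions, not the application's code. The dictionary keys reuse the source names listed above, and the function name document_date and its parameters are hypothetical.

```python
from datetime import datetime

# Date sources per document kind, in the order listed above.
DATE_PRIORITY = {
    "email": ["Mapi-Client-Submit-Time", "Sent Date", "Mapi-Message-Delivery-Time"],
    "attachment": ["File modified date", "File created date"],
    "efile": ["File modified date", "File created date"],
}


def document_date(kind, metadata, extraction_date, parent_date=None):
    """Return the first available date for a document, applying the fallback rules."""
    for source in DATE_PRIORITY[kind]:
        value = metadata.get(source)
        if value is not None:
            return value
    # No date found: attachments take the parent's date, all others the extraction date.
    return parent_date if kind == "attachment" else extraction_date


extracted = datetime(2024, 3, 1)
sent = datetime(2023, 12, 24, 8, 30)
print(document_date("email", {"Sent Date": sent}, extracted))
print(document_date("attachment", {}, extracted, parent_date=sent))
```

The second call shows the only case in which an attachment inherits the parent's date: none of its own dates is available.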

By default, standalone NIST files are excluded. To include standalone NIST files, clear the Exclude NIST files check box.

Note: Only standalone NIST files are suppressed if you select the option to exclude NIST files. The Properties page of a completed ingestions job displays the number of suppressed NIST files.

If the Exclude by category or extension check box is selected, files with the selected file types are suppressed. To change the setting for specific file types, select the applicable check boxes in the list.

Note: Files of the selected file types that are part of a document family are not suppressed unless all documents in the family meet the criteria for suppression. File types that are considered system files are selected in the dialog box by default.
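The family rule in the note above can be illustrated with a short Python sketch: a family is suppressed only if every file in it has an excluded type. This is a simplified illustration that models extensions only, not categories. The function name and the sample file names are hypothetical.

```python
import os


def suppressed_by_extension(families, excluded_extensions):
    """Return the files suppressed when whole families match the excluded extensions.

    families is a list of lists of file names; each inner list is one document family.
    """
    excluded = {ext.lower().lstrip(".") for ext in excluded_extensions}
    suppressed = []
    for family in families:
        extensions = [os.path.splitext(name)[1].lower().lstrip(".") for name in family]
        if all(ext in excluded for ext in extensions):
            suppressed.extend(family)
    return suppressed


families = [
    ["mail.msg", "setup.exe"],  # kept: the parent email is not an excluded type
    ["tool.exe", "lib.dll"],    # suppressed: every member is an excluded type
]
print(suppressed_by_extension(families, ["exe", "dll"]))
```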

Click Next.

On the Search page, do the following:

If you want to ingest only the files with content that matches a search term family, and exclude all other files, select a search term family in the Search term family list. For information about search term families, see Work with search term families.

Click Next.

On the Levels page, specify how to organize the data in the application.

For Levels 1 through 10, select one of the following options:

Note: Only Level 1 is required. The options that are available on the Levels page depend on your selections on the Folder structure page.

Collection ID: A unique ID assigned to each media collection that consists of the date the media is received and the order in which it is received. Think of the collection ID as an electronic folder or container into which the electronic media is placed.

Custodian: A document’s originating person or organization, such as a department or company.

Constant: A user-defined field that is used in the file name of processed documents. To define the constant, type text in the Value box.

Custodian ID: A numerical value assigned to each custodian that corresponds to the order in which the custodian's data is ingested.

Media ID: A unique ID assigned to a set of processed documents that has been loaded or sent for staging. The media ID, which is created during processing, identifies the processed data and correlates it from the application hosting environment to the tracking database.

[Meta] Evidence ID: A unique number assigned to data when media is staged.

To organize your documents according to batch count, select the Include batch count as last level check box.

Note: If you select this option, the last level is an incremental number that changes for every 1000 documents. This option prevents a single folder from containing a large number of files. We recommend that you do not clear this check box, as doing so can result in performance degradation.
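As a rough illustration of the batch count level, the following Python sketch computes a batch number that changes every 1000 documents. The starting value of 1 and the function name batch_level are assumptions. The exact numbering that the application uses is not specified here.

```python
BATCH_SIZE = 1000  # the batch size stated in the note above


def batch_level(document_index, batch_size=BATCH_SIZE):
    """Return the batch number for a zero-based document index (assumed to start at 1)."""
    return document_index // batch_size + 1


for index in (0, 999, 1000, 2500):
    print(index, batch_level(index))
```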

Click Next.

On the Document ID page, do the following:

In the Document ID prefix box, type the text that will appear at the beginning of the processed document IDs. An underscore and an incremental number will appear after this text in the file names of processed documents. Nine digits of padding appear after the underscore character. To view the next available document ID, click Preview.
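The resulting ID format can be sketched in Python as follows. The sketch assumes that "nine digits of padding" means the incremental number is zero-padded to nine digits. The prefix ABC and the function name document_id are hypothetical. Use the Preview option to confirm the actual next ID.

```python
def document_id(prefix, number):
    """Build a document ID: prefix, underscore, and a nine-digit zero-padded number."""
    return f"{prefix}_{number:09d}"


print(document_id("ABC", 1))
print(document_id("ABC", 1234))
```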

Click Next.

On the Ingestion Details page, do the following:

In the Time zone list, select the time zone for the processed documents.

The Retain suppressed files check box is cleared by default to save storage costs. To store suppressed files in a separate folder on the search service server, select the check box.

Note: Suppressed files include duplicates, NIST files, and files that are filtered based on a date range or keywords. Suppressed files are tracked in the database, but cannot be loaded into the application without reprocessing the data. Selecting the Retain suppressed files option will increase the processing time of the ingestions job and consume more space on the agent server.

The Run indexing during ingestions jobs check box is selected by default. By default, new documents are indexed during an ingestions job, but the documents are not enriched. If this option is not selected, documents are indexed only if you select a search term family on the Search page in the Default settings window.

The Run indexing and enrichment check box is cleared by default. If you select this check box, a separate indexing and enrichment job runs immediately after the ingestions job is complete. This job performs document enrichment in addition to indexing.

Under Language and Personally Identifiable Information, select either or both of the following check boxes.

Identify personal information: Locates credit card or personal identification numbers in documents and tags the documents.

Identify language: Identifies the primary language of the documents and tags the documents.

To update the All Custodians or related fields for new documents in the ingestions job, select Update group coding fields. Selecting this option also updates the fields for existing and future family duplicate documents in the application.

Click Finish.

Configure advanced ingestion settings

You can customize fields, upload a password bank, and define settings for chat data and encoding for imported data.

To configure advanced ingestion settings:

On the Case Home page, under Manage Documents, click Ingestions.

Click Advanced settings.

On the Customize fields page, the selected fields are included in ingestions jobs. By default, most fields are selected. When cloning a case, these selections are also cloned. Select or clear the check boxes as needed to add or remove fields. You can also hover over the field name to see a description of the purpose of the field.

The following fields cannot be unselected:

Custodian

Document Date

Document Type

Evidence Job ID

[Meta] File Extension - Loaded

[Meta] Processing Exceptions

[RT] DPM File ID

[RT] Ingestion Exception Detail

[RT] MD5 Hash

On the Chat data page, do the following:

In the Minimum messages per thread box, enter the minimum number of messages that should be used to break a thread into documents. Threads containing fewer messages than the number specified in the Minimum messages per thread box are ingested as a single document.

In the Maximum messages per thread box, enter the maximum number of messages that should be used to break a thread into documents. Threads containing more messages than the number specified in the Maximum messages per thread box are broken into separate documents.

Note: The application breaks a thread into separate documents at the largest idle period between messages that is closest to the maximum message limit. The application breaks the thread at exactly the maximum message limit if the thread does not contain a 15-minute idle period.

In the Idle time (hours) box, enter the time difference (in hours) between when messages in the thread were sent that should be used to break a thread into separate documents.
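The interaction of the three chat settings can be approximated with the following Python sketch. It is a deliberately simplified illustration, not the application's algorithm: it breaks a thread at any gap that reaches the idle time, or when a document reaches the maximum message count, and it does not search for the largest idle period near the maximum limit. The function name split_thread and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def split_thread(timestamps, min_messages, max_messages, idle_hours):
    """Split a thread's sorted message timestamps into documents (lists of timestamps)."""
    if not timestamps:
        return []
    if len(timestamps) < min_messages:
        return [list(timestamps)]  # short threads stay a single document
    idle = timedelta(hours=idle_hours)
    documents, current = [], []
    for ts in timestamps:
        gap_reached = bool(current) and ts - current[-1] >= idle
        if current and (gap_reached or len(current) >= max_messages):
            documents.append(current)  # close the current document
            current = []
        current.append(ts)
    documents.append(current)
    return documents


start = datetime(2024, 1, 1, 9, 0)
thread = [start + timedelta(minutes=m) for m in (0, 5, 10)] + [start + timedelta(hours=5)]
print([len(doc) for doc in split_thread(thread, 2, 10, 2)])
```

In this example, the five-hour gap before the last message exceeds the two-hour idle time, so the thread is split into a three-message document and a one-message document.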

Click Next.

On the Encoding page, select the correct encoding type in the Source encoding list. The default type for new and existing cases is windows-1252. Cloned cases will retain the setting from the clone source.

Note: The application attempts to detect the encoding for all processed files. If the application cannot determine the correct encoding, the encoding selected in this setting is used.
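The fallback behavior described in the note can be sketched in Python as follows. The application's detection method is not described here, so this sketch substitutes a strict UTF-8 decode as a stand-in for detection and falls back to the selected source encoding when that fails. The function name decode_text is hypothetical.

```python
def decode_text(raw, fallback_encoding="windows-1252"):
    """Decode bytes, using the fallback encoding when detection fails.

    Detection is approximated here by a strict UTF-8 decode (an assumption).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Detection failed: use the encoding selected in Source encoding.
        return raw.decode(fallback_encoding, errors="replace")


print(decode_text("café".encode("utf-8")))  # valid UTF-8 is decoded directly
print(decode_text(b"caf\xe9"))              # invalid UTF-8 falls back to windows-1252
```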

On the Password Bank page, to submit a list of known passwords for a case, select the Use the password bank to decrypt the files check box. In the File source box, click Browse and navigate to the plain text file (in .txt format) that contains your passwords. Then, click Upload files.

Note: The .txt file must contain one password per line. By default, if you upload a .txt file when existing passwords are already present, the application adds new, unique passwords to the bank. If you select the Overwrite all previous passwords option, the application overwrites all existing passwords.

To download a .txt file of the existing password bank, click Download password bank file. The file name for the .txt file is in the following format: "PasswordBank_{date/time}.txt".
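The default merge behavior and the Overwrite all previous passwords option can be illustrated with the following Python sketch. It is an illustration only. The function name merge_password_bank is hypothetical, and skipping blank lines is an assumption.

```python
def merge_password_bank(existing, uploaded_text, overwrite=False):
    """Return the password bank after uploading a .txt file with one password per line."""
    uploaded = [line for line in uploaded_text.splitlines() if line]  # skip blank lines
    bank = [] if overwrite else list(existing)
    for password in uploaded:
        if password not in bank:  # add only new, unique passwords
            bank.append(password)
    return bank


current_bank = ["alpha", "bravo"]
upload = "bravo\ncharlie\n"
print(merge_password_bank(current_bank, upload))
print(merge_password_bank(current_bank, upload, overwrite=True))
```

The first call adds only the password that is not already in the bank. The second call discards the existing passwords and keeps only the uploaded ones.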

On the Email files page, you can select the type of file that will be available in the viewer for imported email files. Select the correct file type in the Files to include for email data list.

The default type for new and existing cases is MHT. Select MSG/EML with attachments to include embedded attachments as part of the email document.

Select MSG/EML without attachments to remove attachments from the email file. The attachments are available as separate documents and are imported as attachments to the email.

Note: Processes such as indexing, imaging, and export include any embedded attachments when acting on the email document.

Click Finish.