Preliminary steps for predictive coding

Before using predictive coding with either the standard workflow or Continuous Active Learning (CAL), you must perform the initial steps described in this topic.

Create a binder

For information about how to create a binder, see Work with binders.

Create a population

A target population is a static set of documents that you want to code with a predictive coding model.

To create a population:

On the Case Home page, under Analysis, click Populations.

Click Add population.

In the Name box, type a name for the population.

In the Source list, select a source for the population, either Binder or Saved Search.

In the list that appears, select the binder or saved search name.

Click Save.

The new population appears at the top of the list on the Populations and Samples page. To view the documents in the population, click the number in the Documents column.

Create a sample

A sample is a representative subset of documents from a population. You create a sample from a population to validate the population's predicted codes against human reviewers' actual marks.

To create a sample:

On the Case Home page, under Analysis, click Samples.

Click the Add sample button in the row for a population, or click Add sample at the top of the page.

In the Name box, type a name for the sample.

In the Population list, select the population from which you want to create the sample. This list is unavailable if you are creating the sample from the Add sample button in a population's row.

In the Size box, type the number of documents to include in the sample. The number cannot exceed the number of documents in the population minus the number of documents in all other samples for this population.

Click Save.

The new sample appears in the list under the selected population. To view the documents in the sample, click the number in the Documents column.
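The size limit in the Size step is simple arithmetic, sketched below as an illustration only. The function names are hypothetical and are not part of the product. The lower bound of one document is an assumption, because this topic states only the upper limit.

```python
def max_sample_size(population_size: int, other_sample_sizes: list[int]) -> int:
    """Largest size allowed for a new sample: the population size minus
    the documents already in all other samples for this population."""
    remaining = population_size - sum(other_sample_sizes)
    # Never report a negative limit if existing samples already cover the population.
    return max(remaining, 0)


def is_valid_sample_size(requested: int, population_size: int,
                         other_sample_sizes: list[int]) -> bool:
    """Check a requested size against the limit.

    Assumption: a sample must contain at least one document.
    """
    return 1 <= requested <= max_sample_size(population_size, other_sample_sizes)
```

For example, in a population of 1,000 documents that already has samples of 200 and 300 documents, the largest new sample this check allows is 500 documents.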

Create a custom predictive coding template (optional)

A predictive coding template contains a predefined set of parameters that govern the behavior of a predictive model. You use a predictive coding template when adding or configuring training for a model. 

Note: You cannot edit a template while a CAL or predictive coding model is using it. To view the names of the models that are using a template, click the template name on the Case Home > Analysis > Predictive Coding Templates page. The template's Properties page lists the models that are using the template.

To create a predictive coding template:

Go to the Case Home > Analysis > Predictive Coding Templates page and click Add.  

The Properties page appears.

Add a name and description for the template, and then click Save.

The Fields page appears.

To add fields to the template, select a field in the Add field list and click the + (plus sign) button. 

Note: The values of date fields included in a template appear as text strings.  

The field appears on the Fields page.

A value appears in the Weight column for the field. The weight for each field is 1 by default, but you can change it to any value between 1 and 10. Weight reflects how much influence a field has on the model relative to the other fields in the template. For example, if you want the model to weigh the People - Between field more heavily than the Date Added to Case field, set the Weight value for People - Between to 2 and leave the value for Date Added to Case at 1.

Click Finish.
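One way to read the Weight values is as shares of the total weight in the template. The sketch below is only an illustration of that reading. The function name is hypothetical, and this topic does not describe how the model actually applies weights, so the proportional calculation is an assumption.

```python
def relative_influence(weights: dict[str, float]) -> dict[str, float]:
    """Express each field's weight as a share of the total weight.

    Assumption: influence is proportional to weight. The product's actual
    calculation is not documented in this topic.
    """
    for field, weight in weights.items():
        # The Weight column accepts values between 1 and 10.
        if not 1 <= weight <= 10:
            raise ValueError(f"Weight for {field!r} must be between 1 and 10")
    total = sum(weights.values())
    return {field: weight / total for field, weight in weights.items()}


shares = relative_influence({"People - Between": 2, "Date Added to Case": 1})
```

Under this reading, People - Between carries two thirds of the total weight and Date Added to Case carries one third.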

To select how the model treats data in date fields, go to the Case Home > Analysis > Predictive Coding Templates page and select a template. On the Fields page, click in the Date Value column, and then select how the template should treat date information. The options are as follows:

Text: Date information is treated as a text string.

Day, month, and year: Date information is modeled without time. This option is the default.

Month: Date information is treated as a number with January = 1, February = 2, and so on.

Day: Date information is treated as a number.

Day of the week: Date information is treated as a number with Sunday = 1, Monday = 2, and so on.

Year: Date information is treated as a number.
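The options above can be sketched as follows to show what each one keeps from a date. This is an illustration, not the product's implementation. The function name and option keys are hypothetical. The text format (ISO 8601) and the reading of Day as day of the month are assumptions, because this topic does not specify them.

```python
from datetime import date, datetime


def encode_date(value: datetime, option: str):
    """Return the representation of a date for a given Date Value option."""
    if option == "text":
        # Assumption: ISO 8601 text; the actual string format is not documented here.
        return value.isoformat()
    if option == "day_month_year":
        # The date without the time component.
        return value.date()
    if option == "month":
        # January = 1, February = 2, and so on.
        return value.month
    if option == "day":
        # Assumption: the day of the month.
        return value.day
    if option == "day_of_week":
        # isoweekday() returns Monday = 1 through Sunday = 7;
        # shift so that Sunday = 1, Monday = 2, and so on.
        return value.isoweekday() % 7 + 1
    if option == "year":
        return value.year
    raise ValueError(f"Unknown option: {option}")


stamp = datetime(2024, 3, 10, 14, 30)  # a Sunday
```

With this sketch, the Sunday timestamp above yields 3 for month, 10 for day, 1 for day of the week, and 2024 for year.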