Glossary

A

accuracy: In predictive coding, the proportion of all scored documents for which the predicted code and the human reviewer’s mark agree (including true negatives and true positives). Accuracy is different from recall, in that a high level of accuracy may still be achieved without finding a high percentage of the truly positive documents.
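
To illustrate how accuracy can be high while recall stays low, the following minimal sketch computes both from the four outcome counts. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the application.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all scored documents where prediction and mark agree."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp: int, fn: int) -> float:
    """Share of truly positive documents that the model found."""
    return tp / (tp + fn)


# Hypothetical counts: 1,000 scored documents, only 50 truly positive.
tp, fn = 10, 40      # the model finds 10 of the 50 positives
tn, fp = 940, 10     # and agrees with reviewers on most negatives

acc = accuracy(tp, tn, fp, fn)
rec = recall(tp, fn)
print(f"accuracy = {acc:.0%}, recall = {rec:.0%}")
```

With these assumed counts, the prediction agrees with the reviewers on 950 of 1,000 documents, yet it finds only 10 of the 50 truly positive documents.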

active learning: In predictive coding, a method to more rapidly refine and improve a predictive model by having the software actively select which training documents to add to a model’s training set. Thus, human reviewers review and mark only those training documents that are likely to significantly improve the predictive model’s training.

advanced search: The search functionality in the application that allows you to search all aspects of a database, including fields, redactions, highlights, notes, and productions. You can save the search parameters to use as validation criteria; see also validation criteria.

analysis: An application process that identifies the concepts in a document.

annotation: A note, redaction, or highlight made to a document.

annotation label: Text placed over an annotation.

applied code: In predictive coding, a Yes/No code applied to a document. A user can bulk apply a positive or negative code to all predicted documents in a population, based on the document’s score and the user-defined threshold. Any document with a score greater than or equal to the threshold is coded as positive.

ASCII: American Standard Code for Information Interchange. ASCII is a character encoding that assigns a number to each letter, digit, punctuation mark, and control character.

assignment: A logically related set of documents designed to be reviewed together. An assignment is a subset of a phase.

assignment ID: A unique number used to identify an assignment.

assignment name: The title given to an assignment. An assignment name consists of the assignment prefix plus a unique number; this unique number is distinct from the assignment ID number.

assignment status: The state of an assignment within a phase.

attachment: A document that is appended to another document, typically an email message. See also source document.

autocoding: In the hashes feature, autocoding applies the master document codes for designated fields and issues to all of the duplicate documents.

B

base document: In productions, the original document from which a rendition is made. See also rendition.

batch count: An incremental number that can be included with a document to indicate the batch in which the document was processed.

binder: A group of documents created by a user as a way to organize documents.

Boolean operators: Words such as AND, NOT, and OR that define the logical relationship among words in a search term.
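
The logical relationships that these operators define can be sketched with set operations. This is only an illustration: the term-to-document mapping below is hypothetical, NOT is treated as "AND NOT", and the application's actual search syntax is not shown here.

```python
# Hypothetical mini-index: each term maps to the IDs of documents containing it.
docs_with = {
    "merger": {1, 2, 3, 5},
    "contract": {2, 3, 4},
    "draft": {3, 5},
}

both = docs_with["merger"] & docs_with["contract"]     # merger AND contract
either = docs_with["merger"] | docs_with["contract"]   # merger OR contract
without = docs_with["merger"] - docs_with["draft"]     # merger NOT draft

print(sorted(both), sorted(either), sorted(without))
```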

branding: Making a redaction to a document permanent so that it cannot be removed.

C

case database: The database that contains the files and information related to the case.

clear: Documents that meet the validation criteria and that can advance to the next phase in a workflow.

cluster: In the Map pane, a grouping of documents that appear within a circle based on each document’s concepts.

coding: Examining and evaluating documents to determine relevance and identify important terms or phrases, and applying a tag or otherwise marking or flagging the document.

coding template: A saved list of the relevant items that you want reviewers to code for each document or entity item.

column template: A saved list of fields, binders, and other items that reviewers can display as columns in the List pane.

comparison sample: A human-reviewed sample that is created at the beginning of the predictive coding process and that is used to evaluate the model as the model is iteratively improved. The comparison sample is human reviewed and then used to estimate performance on the population from which it was drawn.

completed: A status that indicates that all documents in the assignment meet the validation criteria.

concept: A noun or noun phrase that describes a document and that is identified during analysis. A concept is based on the contents of a document, but it may not be identical to actual words in the document. See also keyword.

concept compass: In the Map pane, the main area of the map where document dots and clusters appear.

confidence level: The probability that the actual value of a parameter falls within a desired range of values. In other words, the chance that a predicted result actually occurs. In predictive coding, confidence levels are used to measure recall. For example, a 95% confidence level means that if you drew 100 independent, random samples from a population, and then calculated the expected range of recall for each sample, the expected range would contain the true value of recall for about 95 of the 100 samples.
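
The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under stated assumptions: the true recall, sample size, and trial count are hypothetical, and the range is computed with a simple normal approximation, which may differ from how the application calculates its expected range of recall.

```python
import math
import random

rng = random.Random(0)          # fixed seed so the run is repeatable
TRUE_RECALL = 0.80              # hypothetical true recall of the population
SAMPLE_SIZE = 400               # positive documents drawn per sample
TRIALS = 1000
Z_95 = 1.96                     # normal quantile for a 95% confidence level

covered = 0
for _ in range(TRIALS):
    # Count how many sampled positive documents the model also found.
    found = sum(rng.random() < TRUE_RECALL for _ in range(SAMPLE_SIZE))
    estimate = found / SAMPLE_SIZE
    # Normal-approximation margin of error (an assumption of this sketch).
    margin = Z_95 * math.sqrt(estimate * (1 - estimate) / SAMPLE_SIZE)
    if estimate - margin <= TRUE_RECALL <= estimate + margin:
        covered += 1

coverage = covered / TRIALS
print(f"{coverage:.1%} of the ranges contain the true recall")
```

The printed share should land close to 95%, which is what the confidence level describes.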

conflict: In predictive coding, a document for which the human reviewer’s mark disagrees with the applied code. For example, a document with a negative human reviewer mark that received a highly positive model score, or a document with a positive human reviewer mark that received a highly negative model score.

connection: In the data models feature, a link between two related entities in a data model.

container: A file that contains other files. Examples include .zip files, Microsoft Outlook personal folder (.pst) files, and Microsoft Office documents, which can contain embedded objects.

coordinator: A service that provides, creates, and monitors jobs and supervises work assignments.

cube: A set of documents created for early case assessment using a multidimensional search technique, rather than the two dimensions of a spreadsheet. Cubes allow you to select and explore the relationships among multiple data types, such as dates, people, search terms, and pick lists.

custodian: A document’s originating person or organization, such as a department or company.

custodian ID: A unique numerical value assigned to each custodian that corresponds to the order in which the custodian's data was loaded into the application.

D

data model: A representational model of interconnected information that you want to track or analyze. See also entity.

deduplication: A process that suppresses files with content identical to another file (even if the files have different file names) or content wholly contained in another file.
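
The identical-content case can be sketched by comparing content digests and ignoring file names. The file names and contents are hypothetical, SHA-256 is used only as a convenient digest for this sketch, and the case of content wholly contained in another file is not covered.

```python
import hashlib

# Hypothetical files: name -> content bytes.
files = {
    "report.docx": b"Quarterly results",
    "report_copy.docx": b"Quarterly results",   # same content, different name
    "notes.txt": b"Meeting notes",
}

seen = {}          # content digest -> first file name with that content
suppressed = []    # names of files whose content was already seen

for name, content in files.items():
    digest = hashlib.sha256(content).hexdigest()
    if digest in seen:
        suppressed.append(name)   # duplicate content: suppress, do not delete
    else:
        seen[digest] = name

print(suppressed)
```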

delimiter: A special character that is used to separate data values, such as a comma or semicolon. See also load file.
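
As a brief illustration, the sketch below splits a semicolon-delimited record with Python's standard `csv` module. The field names and values are hypothetical and do not represent a required load file layout.

```python
import csv
import io

# Hypothetical load-file fragment that uses a semicolon as the delimiter.
raw = "DocID;Custodian;Title\n1001;Smith, J.;Budget memo\n"

rows = list(csv.reader(io.StringIO(raw), delimiter=";"))
header, first = rows[0], rows[1]
record = dict(zip(header, first))   # pair each field name with its value

print(record)
```

Because the delimiter is a semicolon, the comma inside the custodian value stays part of that value.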

document: An individual file or mail item (email message, appointment, note, or journal entry) that is processed or loaded into the application. A document can be a native file, a collection of single page image files, or a multi-page image file such as a PDF. A document may also possess a .txt content file.

document date: Core date field. Usually contains the last modified date for files such as Microsoft Word and Microsoft Excel, and the Sent date for email messages.

document family: A document that contains multiple components, such as an email message that includes attachments. A document family can also be a group of documents linked by source and attachment relationships.

document ID: A unique number that is associated with each document in the database.

duplicate (threading): In email threading, any document in a thread whose content, including attachments, is contained in other documents in the thread.

duplicate previous pivot (threading): In email threading, a document that used to be a pivot, but is now a duplicate. This can happen when a new document is submitted for analysis after the thread that it belongs to has been built. If the new document is identified as a pivot, any documents that were previously identified as pivots but that are now wholly contained within the new pivot are now considered duplicate previous pivots.

E

email threading: A feature that analyzes the documents in email conversations, including messages, replies, forwards, and attachments, and then organizes related documents together into threads.

entity: In the data models feature, a category of items that is tracked or analyzed using a data model.

entity item: In the data models feature, a record of data for an entity, similar to a new row in a spreadsheet or a database.

evidence ID: A unique number assigned to data when media is staged. Data can be provided for staging in a number of media formats, including hard drives, DVDs, and CDs.

export: A process that downloads documents from the application to a file repository.

extract: A load process operation that removes files from their containers.

F

false negative: In predictive coding, a document that the model predicted with a negative code, but that the reviewer marked with a positive code. See also mark and score.

false positive: In predictive coding, a document that the model predicted with a positive code, but that the reviewer marked with a negative code. See also mark and score.

family code: The highest-ranking quick code that is applied to any document in a document family. See also quick code.

family ranking: The way that documents are ordered when coding. The highest code (according to the defined ranking order) of any document in a document family is the code for the entire family. See also document family and family code.
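
The rule that the highest-ranked code in a family becomes the family code can be sketched as follows. The code names, their ranking order, and the file names are hypothetical.

```python
# Hypothetical ranking order, highest rank first.
RANKING = ["Privileged", "Responsive", "Not Responsive"]


def family_code(codes: list[str]) -> str:
    """Return the highest-ranking code among a family's documents."""
    return min(codes, key=RANKING.index)


# Hypothetical document family: an email message and two attachments.
family = {
    "email.msg": "Responsive",
    "attach1.pdf": "Not Responsive",
    "attach2.docx": "Privileged",
}

result = family_code(list(family.values()))
print(result)
```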

field: A document property that is used to associate metadata with a document or an entity item. Collectively, fields define a document or an entity item's attributes.

file repository: A file storage system for storing documents and images. An application file repository contains the path and permissions for connecting to the stored files and folders.

footer: In productions, the text that appears at the bottom of an imaged page in a production. The page can include left, middle, and right footers.

fuzzy search: A type of search query that allows you to search for terms that closely match, even if a word is misspelled. For example, a fuzzy search for "apple" finds "appple."
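
The idea of close matching can be sketched with the similarity ratio from Python's standard `difflib` module. This is a stand-in technique, not the application's matching algorithm, and the term list and cutoff value are assumptions.

```python
from difflib import SequenceMatcher

CUTOFF = 0.85   # hypothetical similarity cutoff between 0 and 1


def fuzzy_find(query: str, terms: list[str]) -> list[str]:
    """Return the terms whose similarity to the query meets the cutoff."""
    return [
        term for term in terms
        if SequenceMatcher(None, query, term).ratio() >= CUTOFF
    ]


matches = fuzzy_find("apple", ["apple", "appple", "maple", "orange"])
print(matches)
```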

G

group leader: A review lead who can view cases they have access to, manage document reviews, create and distribute assignments, produce review guidelines, train reviewers, and perform quality control and other functions as delegated.

group member: A reviewer who can view cases they have access to in order to review, categorize, and redact documents.

H

hash: In the hashes feature, a hexadecimal value that uniquely identifies a document.

highlight: A way to annotate a document by applying different colors to the content.

I

image: An image file of a document, such as a TIFF, PDF, or JPEG file.

import: A way to add documents and metadata to the application.

indexing: A process that generates a database of the locations of all of the words in an assignment or file set, except for noise words. Documents must be indexed before the text can be searchable. See also noise word.
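
A word-location index of this kind can be sketched as a mapping from each word to the places where it occurs. The documents and the noise-word list are hypothetical, and the application's own index format is not described here.

```python
from collections import defaultdict

NOISE_WORDS = {"but", "if", "the", "a"}   # hypothetical noise-word list

documents = {
    1: "the contract but not the invoice",
    2: "invoice if approved",
}

index = defaultdict(list)   # word -> list of (document ID, word position)
for doc_id, text in documents.items():
    for position, word in enumerate(text.lower().split()):
        if word not in NOISE_WORDS:
            index[word].append((doc_id, position))

print(index["invoice"])
```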

indexing files: Files used for content searching.

ingestion: A way to process and add unstructured data sets to the application, such as native files and email messages.

issue: A way to organize documents by associating them to facts, events, matters, topics, or subjects relevant to a case, as defined by the review lead or the case administrator. Issues can have one or more subissues, viewed in a tree structure.

J

job: The highest level of work that can be submitted to the Processing Framework (RPF).

judgmental sample: In predictive coding, a manually selected sample of documents. A judgmental sample can be used as a training set of documents to train a predictive model.

K

key document: In the Map pane, the document at the center of a cluster that has the highest incidence of the associated concept.

keyword: A significant word or phrase in a document. See also concept.

L

level: A way of grouping or organizing documents in folders in the application.

load: An operation that brings files into a case, and catalogs, extracts, and suppresses them.

load file: A file associated with a set of scanned images or electronic files. A load file is used to transfer data from one database to another database. A load file indicates which pages or files belong together as a document, which attachments belong to it, and where each document begins and ends. It may also contain data relevant to the individual documents, such as metadata, coded data, and text. Load files must be in specific formats to ensure that the images and data transfer accurately.

locked production: In productions, the final production that contains all production rendition records and settings, and that has been locked. A locked production cannot be changed. See also unlocked production.

lot: A group of documents that is created when you add documents to a review workflow.

M

mark: A code applied by a human reviewer.

master document: The main document used to autocode duplicate documents. When a user codes a master document's autocoding fields, the coding values are also applied to its duplicate documents.

master/duplicates group: In the hashes feature, documents with the same hash values and source/attachment relationships.

MD5 hash: A 128-bit value created from binary input data. MD5 was originally used in cryptography, for example to condense a large message into a short digest before the digest is signed with a private key. It is now more often used for file identification and validation.
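
As a short illustration of file identification, the sketch below computes MD5 digests with Python's standard `hashlib` module. The content bytes are hypothetical.

```python
import hashlib

content = b"Quarterly results"              # hypothetical file content
digest = hashlib.md5(content).hexdigest()   # 32 hex characters = 128 bits

# Identical content always yields the same digest.
same = hashlib.md5(b"Quarterly results").hexdigest() == digest
# A one-character change yields a different digest.
different = hashlib.md5(b"Quarterly results.").hexdigest() == digest

print(len(digest), same, different)
```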

media ID: A unique ID assigned to a set of processed documents that have been loaded or sent for staging.

memo: An alphanumeric field type in the application.

metadata: Information about a file, such as its name, size, type, creation date, or last modified date.

mine: A visual representation of important concepts in documents. Similar documents are clustered together based on the similarity of the concepts in the documents.

N

native file: A file in the original format of the application that created it.

negative document: In predictive coding, a document that has been marked by a human reviewer with a defined negative code. Also refers to a predicted document that a model has scored below the user-defined threshold for an applied code.

noise word: An insignificant word that occurs with such frequency that it is not useful for searching. Examples include "but" and "if."

non-native data: An image, document, or other data that does not need to go through native file processing.

note: Information a reviewer can associate with a document, transcript, person, organization, issue, level, list, or chronology, either for their own reference or to share with another reviewer.

O

OCR: Optical character recognition. A method for converting text contained in image files into a searchable format.

OCR text: The text file created after running OCR software on an image.

one-to-many field: A field that has more than one value.

one-to-one field: A field that has only one value.

organization (case): In a case, the company or institution that is associated with a person. A person's organization is typically extracted from the domain in a person's email address.

organization (portal): In a portal, a set of administrators, users, and cases that share a portal but remain separate. Multiple organizations can securely use the same portal while keeping all of their data and resources private.

output path: In productions, the location in the repository where the produced files are created.

P

page: In productions, a single image equivalent to one sheet of paper. A document can have one or more pages.

page annotations: In productions, the changes, additions, or editorial comments made or applied to a document (usually an electronic image file) using redactions and highlights.

phase: A sublevel of a workflow that has a specific purpose, includes specific documents, and can be associated with validation criteria. Phases can have multiple levels to facilitate multilevel reviews by multiple review teams. Phases are assigned to teams and include assignments that are intended for individual reviewers.

pivot (threading): In email threading, a document that contains any unique content not contained in any other document in the thread. Examples of unique content include the body text, attachments, and recipients. Documents that cannot be thread analyzed are also marked as pivots.

populate hashes: In the hashes feature, the process that writes each document's hash value to a field in the database and applies the master document's coding to all the duplicate documents.

population: A static set of documents from which a representative sample is taken.

portal: The administrative interface in the application that is used to manage resources like users, cases, organizations, physical servers, and processing jobs.

positive document: In predictive coding, a document that has been marked by a human reviewer with a defined positive code. Also refers to a predicted document that a model has scored at or above the user-defined threshold for an applied code.

precision: In predictive coding, the percentage of documents with positive predicted scores that received positive marks from the reviewers. The higher the precision, the fewer documents are incorrectly identified as positive. For example, if the model’s prediction identifies 800 true positive documents, but also identifies an additional 800 false positive documents, the prediction's precision is 800 out of 1,600, or 50%.
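
The arithmetic in this example can be reproduced with a short sketch. The function name is hypothetical; the counts are the ones given in the definition.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of positively predicted documents that reviewers marked positive."""
    return true_positives / (true_positives + false_positives)


# 800 true positives plus 800 false positives = 1,600 positive predictions.
p = precision(true_positives=800, false_positives=800)
print(f"precision = {p:.0%}")
```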

predictive model: In predictive coding, a model that is trained using the human reviewers’ marks on a set of training documents, and that maps the marks to the weighted characteristics of those documents. You can then use this model to predict codes for unmarked documents in a target population.

privilege: A legal principle that protects certain types of communications.

Processing Framework (RPF): The application system used for processing large volumes of data.

produce: The process of delivering to another party the documents that are deemed responsive to a discovery request, or making them available for that party’s review.

produced document label: In productions, information that appears on each page of a produced document. This information is required and is unique for each document in a production. This label is used as the document ID in the produced document load file and can be used to name a folder for each document in the output structure. See also produced page label.

produced page label: In productions, the label that appears on each page in a production. This label increments by page and must be unique for each page contained in a document. The produced page label is used as the image file name for imaged documents and native files. See also produced document label.

production: An operation that creates PDF or TIFF files from reviewed documents in response to a request for production. Production numbers and branding text are applied during a production.

production name: In productions, the name used to identify a production within the application. The name is mandatory and must be unique within a case.

production number: Historically called a Bates number, the production number is unique for each page that is produced. All parties in a matter can reference the production number to identify a page.

Q

quick code: A color-coded value that is associated with a coding field. Quick codes are a type of pick list. See also family code.

R

recall: In predictive coding, the percentage of documents that reviewers marked as positive that also received predicted positive scores from the model. The higher the recall, the lower the proportion of positive documents that the model’s prediction missed. For example, if 1,000 documents out of a population of 5,000 are positive, and predictive coding identifies as positive 850 of those 1,000 documents, the model’s prediction has a recall of 85%.

redaction: An opaque mask that conceals a portion of an image or document to prevent disclosure of information. Redactions are often applied to protect privileged content or to avoid the production of irrelevant content that may contain confidential, sensitive, or proprietary information.

redaction label: Text placed over a redaction.

rendition: Copies, or alternate versions, of a document. In the context of productions, a rendition is a produced document (also referred to as a production rendition) with all of the associated metadata and annotations. Users can perform standard application functions with production renditions, such as searching, viewing (in read-only mode), printing, and exporting. See also base document.

repository: See file repository.

review lead: A user who manages the review process and who works with litigation support to create and allocate assignments.

reviewer: A user who analyzes documents for facts relating to a case, applies highlights and redactions, and codes a document, such as marking a document as privileged.

RPF: See Processing Framework.

S

sample: A representative subset of documents from a population. You can use known information about a sample to infer information about the population that the sample was drawn from.

saved search: Frequently used search criteria that have been saved for reuse. Saved searches can be used as validation criteria.

score: In predictive coding, a number between -1 and +1 that the model gives each document during a coding prediction. Scores near -1 or +1 are stronger predictions. Scores near 0 are weaker, less certain predictions.

search file: A text file that can be loaded into the application to enable large-scale document searches.

search term family: A group of search terms that are related within a query.

security override binder: A binder with security settings that override the security that is set on the documents elsewhere.

set-aside clusters: In the Map pane, containers for storing coded documents. The color of the set-aside cluster indicates the coding value.

slip sheet: A blank sheet that is generated when all or a portion of a document is not produced. See also placeholder.

sort order settings: In productions, the sort order that you must specify for a production before you can lock it.

source document: A document, such as an email message, to which other documents are attached. See also attachment.

spine: In the Map pane, a line that connects clusters that have one or more significant concepts in common.

spine label: In the Map pane, a word or phrase that appears around the concept compass. It indicates the name of the main concept that ties clusters together along a spine.

stage: In the Processing Framework (RPF), a unit of work composed of zero or more tasks. The same type of worker performs all the tasks in a stage. If a stage has multiple tasks, you can distribute those tasks to multiple supervisors at the same time. See also worker.

starting number: In productions, the first in a series of production numbers assigned to a production set. The starting number is used in conjunction with the produced document label and the produced page label. See also produced document label and produced page label.

stem words: Words that share the same stem, or root, as a search term. For example, if "apply" is a search term, stemmed words include "applied" and "applies."

stopped concept: A concept that is prevented from appearing in mines and in the Map pane.

supervisor: A service that produces workers to perform tasks. The supervisor communicates with the coordinator to provide status updates and to retrieve new tasks. See also worker.

suppress: Withdrawing a file from further processing or review. A file may be suppressed because it is a known file, a container, or an exact-duplicate or near-duplicate. Suppressed files are not physically removed or deleted.

sweeping: In the Map pane, moving documents from the concept clusters to the set-aside clusters. See also set-aside clusters.

system field: A default field that is included in the application installation. The application uses system fields for content extraction, case processing, and other processing functions.

T

target population: In predictive coding, a static set of documents for which you want to generate predicted codes.

task: The smallest unit of work that the Processing Framework (RPF) can process. Each task belongs to a single stage and is processed by a single worker. See also worker and stage.

team: A group of users who can be assigned to phases in a workflow.

thread: A group of documents with the same normalized title and contextually similar body. A thread includes the original document and all subsequent replies pertaining to the original.

thread analysis: The process of comparing and classifying documents into threads.

threshold: In predictive coding, the user-defined dividing line that separates positive documents from negative documents. Positive documents have a score greater than or equal to the threshold score. Negative documents have a score less than the threshold score.
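
The dividing-line rule can be sketched as a comparison of each score against the threshold. The threshold value, document IDs, and scores are hypothetical.

```python
THRESHOLD = 0.2   # hypothetical user-defined threshold

# Hypothetical model scores between -1 and +1.
scores = {"DOC-1": 0.85, "DOC-2": 0.2, "DOC-3": 0.19, "DOC-4": -0.6}

# A score greater than or equal to the threshold is positive ("Yes").
applied = {
    doc: ("Yes" if score >= THRESHOLD else "No")
    for doc, score in scores.items()
}

print(applied)
```

Note that DOC-2, whose score equals the threshold exactly, is treated as positive.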

training set: In predictive coding, a set of documents that is used to train a predictive model. The set can be created from a random sample of a population, from source documents selected by active learning, or from a manually assembled judgmental sample of documents. The training set must be reviewed by a human before you train or re-train the model.

transcript: A written record of testimony in a court, hearing, deposition, or other legal proceeding.

transcript annotation: A highlight or note made to a transcript. A transcript highlight is also called a transcript issue.

true negatives: In predictive coding, documents that the model predicted with a negative code and that the reviewer marked negative. See also mark and score.

true positives: In predictive coding, documents that the model predicted with a positive code and that the reviewer marked positive. See also mark and score.

U

unlocked production: A production that has not been finalized (locked) or that was locked and then unlocked. See also locked production.

V

validation criteria: Coding rules that a document in an assignment must meet before it can clear. Validation criteria are created from saved searches.

validation report: In predictive coding, a report that records the final results of a prediction and its applied code. A validation report can be used to document and defend the predictive coding process.

validation sample: A sample that is created and reviewed at the end of the predictive coding process to make a defensible evaluation of the performance of the model against the population.

Variable Builder: A tool in the application that creates labels for a production or load file template.

W

worker: A component that performs work on a single task. The coordinator assigns each task to a single supervisor. See also supervisor.

workflow: A collection of phases that facilitates the review by routing assignments to reviewers on teams.

workspaces: The arrangement of the panes and features on the Documents page. You can customize the application by using the default workspaces or by creating new workspaces.

Y

yield: In predictive coding, the ratio of eliminated false positive documents to additional training documents that were added in different versions of a predictive model.