Search technology and query syntax

Nuix Discover employs the following two search technologies:

Content searches: Nuix Discover uses dtSearch to index and search the content of files loaded to Nuix Discover (that is, document content).

SQL searches: Nuix Discover uses SQL to search data loaded to fields (hereinafter referred to as fielded data).

Advanced searches constructed on the Search page can simultaneously search both document content and fielded data by using multiple search expressions connected by Boolean operators (that is, AND, OR, NOT).

The following sections explain each search technology in more detail.

Content searches

When you load documents to Nuix Discover through the Ingestions feature, the Imports feature, or a single file upload, the application stores the files associated with those document records on file systems that it manages, scans, and monitors.

For example, a document record with Document ID ABC-000001 can point to a Microsoft Word file named ABC-000001.docx. When an indexing and enrichment job is executed, Nuix Discover scans the file system and records any file whose name matches the pattern [Document ID].[extension] in the database. These files are referred to as content files. When Nuix Discover creates the document content index, dtSearch extracts text from the content files and indexes that content for searching.

Note: dtSearch does not save extracted text as a separate file. In some instances, users load documents to Nuix Discover and include both a native file and an extracted text file. For instance, if both ABC-000001.DOCX and ABC-000001.TXT are present, Nuix Discover indexes both files for content searching.
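The file-matching rule above can be illustrated with a short Python sketch. This is not Nuix Discover code: the function name find_content_files is hypothetical, and the case-insensitive comparison of Document IDs is an assumption made for the example.

```python
from pathlib import PurePath

def find_content_files(document_ids, file_names):
    """Map each Document ID to the file names shaped like [Document ID].[extension]."""
    # Assumption for this sketch: Document IDs are compared case-insensitively.
    wanted = {doc_id.upper(): doc_id for doc_id in document_ids}
    matches = {doc_id: [] for doc_id in document_ids}
    for name in file_names:
        path = PurePath(name)
        # path.stem is the file name without its final extension.
        doc_id = wanted.get(path.stem.upper())
        if doc_id is not None and path.suffix:
            matches[doc_id].append(name)
    return matches

files = ["ABC-000001.DOCX", "ABC-000001.TXT", "ABC-000002.pdf", "notes.txt"]
result = find_content_files(["ABC-000001", "ABC-000002"], files)
print(result["ABC-000001"])  # ['ABC-000001.DOCX', 'ABC-000001.TXT']
```

In this sketch both the native file and the extracted text file are returned for ABC-000001, which mirrors the note that both files are indexed for content searching.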

SQL searches

Nuix Discover enables searching of fielded data (for example, email subject, file path) with SQL. Nuix Discover field types include yes/no (that is, Boolean), dates, numbers, pick lists, text (up to 256 characters) and memos. Of these field types, pick list, text, and memo fields contain words that users search.

Behind the scenes, Nuix Discover uses the SQL LIKE and SQL CONTAINS methods to search these field types. SQL LIKE searches for a string of characters anywhere within a field value. When a SQL LIKE search runs, Nuix Discover scans field values in real time to find matches, so SQL LIKE searches can take longer than indexed searches.

In contrast, SQL CONTAINS searches use a SQL full-text index to find matches. Because they use an index, SQL CONTAINS searches are often faster and permit limited wildcards, as described below.
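The difference between a scan and an indexed lookup can be sketched in plain Python. This is only an analogy: it does not call SQL, and the function names (like_search, build_word_index, contains_search) and sample titles are hypothetical.

```python
import re
from collections import defaultdict

titles = {1: "Apple pie recipe", 2: "Locally wined and dined", 3: "Pear tart"}

def like_search(values, term):
    """Scan every value for the substring, in the spirit of a LIKE search."""
    term = term.lower()
    return sorted(doc for doc, value in values.items() if term in value.lower())

def build_word_index(values):
    """Build a word -> document IDs index once, before any search runs."""
    index = defaultdict(set)
    for doc, value in values.items():
        for word in re.findall(r"\w+", value.lower()):
            index[word].add(doc)
    return index

def contains_search(index, term):
    """Look up a whole word, or a word prefix when the term ends with *."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for word, docs in index.items():
            if word.startswith(prefix):
                hits |= docs
        return sorted(hits)
    return sorted(index.get(term, set()))

index = build_word_index(titles)
print(like_search(titles, "pl"))        # [1]  substring found inside "Apple"
print(contains_search(index, "pl"))     # []   "pl" is not a whole word
print(contains_search(index, "wine*"))  # [2]  prefix match on "wined"
```

The scan finds fragments anywhere in a value but touches every value on each search. The index answers word and prefix lookups without rescanning, at the cost of not matching arbitrary fragments.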

Advanced searches constructed on the Search page consist of search expressions.

For example, Document Title is My Document Title is an advanced search expression.

Document Title is the field (data).

is is the search expression operator.

My Document Title is the search term.


You can use multiple expressions along with Boolean operators such as AND, OR, and NOT. You can also group or nest search expressions to create more sophisticated filters.
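As a rough model of the structure described above, the following Python sketch represents an expression as a field, an operator, and a search term, and lets groups nest under Boolean operators. The class names, the evaluate function, and the Author field are hypothetical and exist only for this illustration. Only the is and is like operators are modeled.

```python
from dataclasses import dataclass

@dataclass
class Expression:
    field: str      # the field to search, such as "Document Title"
    operator: str   # the expression operator, such as "is"
    term: str       # the search term

@dataclass
class Group:
    boolean: str    # "AND", "OR", or "NOT"
    items: list     # nested Expression or Group objects

def evaluate(node, document):
    """Return True if the document satisfies the expression or group."""
    if isinstance(node, Expression):
        value = document.get(node.field)
        if node.operator == "is":
            return value == node.term
        if node.operator == "is like":
            return value is not None and node.term.lower() in value.lower()
        raise ValueError("unsupported operator: " + node.operator)
    results = [evaluate(item, document) for item in node.items]
    if node.boolean == "AND":
        return all(results)
    if node.boolean == "OR":
        return any(results)
    if node.boolean == "NOT":
        return not any(results)
    raise ValueError("unsupported Boolean operator: " + node.boolean)

# Document Title is My Document Title AND NOT (Author is like smith)
query = Group("AND", [
    Expression("Document Title", "is", "My Document Title"),
    Group("NOT", [Expression("Author", "is like", "smith")]),
])
document = {"Document Title": "My Document Title", "Author": "Jane Doe"}
print(evaluate(query, document))  # True
```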


For information about how to construct an advanced search, see Perform an advanced search.

The following sections provide more detail regarding searching in Nuix Discover.

Search operators on the advanced Search page

Search query syntax for Document Content

Regular expressions

Search query error messages and warnings

Search operators on the advanced Search page

The following table describes the search operators used to construct search expressions in an advanced search on the Search page.

Note: The information in the following table does not apply to Document Content contains and Document Content does not contain, which use dtSearch syntax detailed in later sections. The People fields designation applies to People, Identities, and Organizations. Most search expressions apply to a single field (for example, Document Title), although some expressions apply to specific features in Nuix Discover (for example, History or Levels).

Operator

Applies To

Results

has a value

All Fields

Returns documents in which the field is coded with any value.

does not have a value

All Fields

Returns documents in which the field is not coded with any value.

is

Pick Lists

Text

Number

Yes/No

People

Returns documents coded with the search term. One-to-many fields might be coded with additional values.

is not

Pick Lists

Text

Number

Yes/No

People

Returns documents not coded with the search term. The result set includes documents that are not coded to any value in addition to documents coded with other values.

is only

Pick Lists

People

Returns documents that are only coded with the search term.

is any of

Pick Lists

People

Returns documents that are coded with any of the selected search terms. This operator creates multiple search expressions that are grouped together and separated by the OR operator.

greater than

Pick Lists

Text

Number

People

Returns documents coded with a value that is alphabetically or numerically greater than the search term.

greater than or equal to

Pick Lists

Text

Number

People

Returns documents coded with a value that is alphabetically or numerically greater than or equal to the search term.

less than

Pick Lists

Text

Number

People

Returns documents coded with a value that is alphabetically or numerically less than the search term.

less than or equal to

Pick Lists

Text

Number

People

Returns documents coded with a value that is alphabetically or numerically less than or equal to the search term.

is like

Pick Lists

Text

Memo

People

Returns documents where the search term is found anywhere in the field value.

A search for Document Title is like pl returns documents with apple in the title.

is not like

Pick Lists

Text

Memo

People

Returns documents where the search term is not found anywhere in the field values. Also returns documents not coded with any value and documents coded with other values.

contains

Pick Lists

Text

Memo

People

Returns documents that satisfy the search expression.

Permissible syntax includes:

Single terms

Single terms with *. This will find values with the word stem.

Phrases (use double quotes around phrases)

Phrases with *. Use double quotes around phrases. Each term in the phrase is considered a stem. For example, "local wine*" will find "locally wined and dined".

AND, AND NOT, and OR are supported. The AND operator is evaluated before the OR operator. For example, Term A AND NOT Term B OR Term C is logically evaluated as (Term A AND NOT Term B) OR Term C.

AND NOT is not supported as the first part of a CONTAINS search.

does not contain

Pick Lists

Text

Memo

People

Returns documents that do not satisfy the search expression including documents that are not coded with any value. See permissible syntax in the previous row.

begins with

Pick Lists

Text

People

Returns documents where the search term is found at the beginning of the field value.

A search for Document Title begins with apple returns documents with Apple Pie in the title but does not return documents with Red Delicious Apple in the title.

does not begin with

Pick Lists

Text

People

Returns documents where the search term is not found at the beginning of the field value, including documents that are not coded with any value and documents coded with other values.

is between

Pick Lists

Text

Number

Date

Returns documents coded with a value that is between the search terms inclusive.

For text fields and pick list fields, returns documents with a value alphabetically between the two search terms, inclusive of the search terms.

For number fields, returns documents with a value numerically between the two search numbers inclusive.

For date fields, returns documents with a value chronologically between the two search dates inclusive.

not between

Pick Lists

Text

Number

Date

Returns documents coded with a value that is not between the search terms inclusive.

This includes documents that are coded with values outside the range, and documents that are not coded with any value.

on

Date

Returns documents coded with the exact search date. One-to-many fields might be coded with other values.

not on

Date

Returns documents not coded with the exact search date. This includes documents that are coded with no value and documents that are coded with other values.

after

Date

Returns documents coded with a date that is later than the search date.

on or after

Date

Returns documents coded with a date that is later than or equal to the search date.

before

Date

Returns documents coded with a date that is earlier than the search date.

on or before

Date

Returns documents coded with a date that is earlier than or equal to the search date.

find similar to

Document ID

Returns documents that are conceptually similar.

This type of search returns the same results that you receive if you select a document in the List pane, and then select Find similar on the Options menu.

Find similar option in Document ID column

is correspondence between

People

Requires the selection of two people values. Returns documents where one of the people values is an author (FROM) and one of the people is a recipient (TO, CC, BCC). The order of selected names does not impact the result.

is like correspondence between

People

Requires the input of two people search terms. Returns documents where one of the search terms matches an author (FROM) and the other search term matches a recipient (TO, CC, BCC). The order of search terms does not impact the result.

The LIKE search syntax applies such that the search term can appear anywhere in the people value.

contains correspondence between

People

Requires the input of two people search terms. Returns documents where one of the search terms matches an author (FROM) and the other search term matches a recipient (TO, CC, BCC). The order of search terms does not impact the result.

 

The CONTAINS search syntax applies. See the contains entry earlier in this table.

Previous Value Ever Was

Pick Lists

Text

Number

Yes/No

Date Fields

Returns documents where the search term matches how a document was previously coded in the searched field.
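Several rows in the table differ only in how they treat one-to-many fields and documents with no value. The following Python sketch models those differences with sets of coded values. The function names, Document IDs, and pick list values are hypothetical, and the logic is a reading of the table, not the application's implementation.

```python
def op_is(values, term):
    return term in values                 # other values may also be present

def op_is_not(values, term):
    return term not in values             # includes documents with no value

def op_is_only(values, term):
    return values == {term}               # exactly this value and nothing else

def op_is_any_of(values, terms):
    return any(term in values for term in terms)   # OR across the terms

def op_is_between(value, low, high):
    return value is not None and low <= value <= high   # inclusive range

def op_not_between(value, low, high):
    return not op_is_between(value, low, high)   # includes uncoded documents

# Hypothetical coding of a one-to-many pick list field.
coding = {"DOC-1": {"Hot"}, "DOC-2": {"Hot", "Privileged"}, "DOC-3": set()}

def select(predicate):
    return sorted(doc for doc, values in coding.items() if predicate(values))

print(select(lambda v: op_is(v, "Hot")))       # ['DOC-1', 'DOC-2']
print(select(lambda v: op_is_not(v, "Hot")))   # ['DOC-3']
print(select(lambda v: op_is_only(v, "Hot")))  # ['DOC-1']
print(select(lambda v: op_is_any_of(v, ["Privileged", "Cold"])))  # ['DOC-2']
```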

 

Search query syntax for Document Content

Document Content is text that is extracted and indexed by dtSearch.

One method to search Document Content is to build an advanced search on the Search page. The available expression operators for Document Content are contains, does not contain, and has a value. This search method uses dtSearch syntax, which provides significant additional options, as described in the following table.


A second method to search document content is to use the quick search box on the navigation bar. The quick search box includes the option to search Document Content.

Search for Document Content in the quick search box.

The syntax in the following table applies to Document Content searches. For a full explanation of the functionality in the quick search box, see Perform a quick search.

Document Content search operators

The following table describes the syntax and operators that you can use to query Document Content. Nuix Discover supports Unicode characters.

Note: Because dtSearch does not index punctuation, punctuation is not searchable unless your administrator changes the alphabet file. The same applies to the following characters because they are reserved as operators: %, #, &, ?, =, and ~.

Operator

Description

Example query and results

AND

Requires all terms connected by AND to exist.

A search for apple pie AND poached pear finds documents that contain both apple pie and poached pear.

OR

Requires any term connected by OR to exist.

A search for apple OR pear finds documents that contain the word apple, documents that contain the word pear, and documents that contain both words.

NOT

Requires certain terms to not exist.

A search for NOT banana finds documents that do not contain the word banana.

A search for apple AND NOT pear finds documents that contain the word apple and do not contain the word pear.

not w/0

Searches for a word or phrase not in association with another word or phrase. 

A search for Word?? not w/0 Word04 finds documents that include Word01, Word02, or Word03, and does not match on Word04.

Nuix Discover finds all words in the index that meet the criteria for Word?? and excludes words after the not w/0 proximity operator (in this case, Word04).

NEAR

The word near is treated as a search term, not an operator. Use a proximity search to locate terms that are near each other.

Not applicable.

Words and phrases

Quotation marks are not required when searching a phrase.

Noise words, such as if and the, are treated as any word.

To search for an exact phrase that includes the words and, or, or not, enclose the phrase in quotation marks.

Punctuation inside a search word is treated as a space.

A search for tart apple pie finds tart apple pie but not apple pie.

A search for bill of sale finds documents containing bill, any intervening word, and sale.

A search for "apple and pie" in quotation marks finds documents containing apple and pie but not apple pie.

( )

Use parentheses with searches that have two or more connectors. If you do not use parentheses, dtSearch evaluates OR operators before AND operators.

A search for apple AND (pear OR orange) finds the word apple with either pear or orange. Because dtSearch evaluates OR before AND, a search for apple AND pear OR orange without parentheses returns the same result.

?

Wildcard that matches any single character.

A search for appl? finds apple and apply.

*

Wildcard that matches any number of characters. Use at the beginning or end of a search term.

A search for *ppl* finds application and supply.

=

Wildcard that matches a single digit.

Use multiple equals signs (=) to find multiple digits.

A search for ==== finds 1234.

~

Stemming search: Finds grammatical variations of a word.

A search for click~ finds clicked and clicking.

%

Fuzzy search is useful to find misspelled words or to search faulty text generated by optical character recognition (OCR).

Each percent sign (%) in a search term represents one incorrect character.

Characters prior to percent signs must match exactly.

A search for capit%al finds capital, capitol, and capita.

A search for int%%ernet finds internet and intranet.

 

x w/n y

Proximity search:

In a content search, x w/n y finds the term (x) within a specified number of words (n) of another term (y).

In a coding search (database search), w/n is treated as a proximity search when the value for n is 50 words or fewer. When the value for n is greater than 50 words, the proximity search is treated as an AND operator (both words exist in the text).

At least one of the two expressions connected by w/n must be a word, a phrase, or a group of words and phrases connected by OR.

The x NOT w/n y operator allows you to search for a term that is not associated with another term.

A search for apple w/5 pear finds apple and pear where apple appears within five words of pear.

A search for (apple and banana) w/5 (pear or cherry) finds documents that contain both apple and banana within five words of either pear or cherry.

A search for apple NOT w/5 pear finds apple, except where apple is within five words of pear.

x pre/n y

Proximity search:

Finds the term (x) within a specified number of words (n) of another term (y). The first term must occur before the second term.

At least one of the two expressions connected by pre/n must be a word, a phrase, or a group of words and phrases connected by OR.

A search for apple pre/5 pear finds documents that contain apple within five words before pear.

A search for (apple and banana) pre/5 (pear or cherry) finds documents that contain both apple and banana within five words before either pear or cherry.

xfirstword

xlastword

Built-in search words that indicate the beginning or end of a document, as follows:

xfirstword: Marks the beginning of a document.

xlastword: Marks the end of a document.

Combine xfirstword and xlastword with proximity operators to limit a search to the beginning or end of a document.

A search for apple w/5 xfirstword finds apple when it appears within five words of the beginning of a document.
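To make the wildcard and proximity rows more concrete, the following Python sketch translates the ?, *, and = wildcards into a Python regular expression and checks w/n and pre/n style proximity over a list of words. It does not use dtSearch. The function names are hypothetical, and the way distance is counted here (difference in word position) is an assumption that may not match dtSearch exactly.

```python
import re

def wildcard_to_regex(term):
    """Translate the ?, *, and = wildcards into a compiled Python regex."""
    parts = []
    for char in term.lower():
        if char == "?":
            parts.append(".")       # any single character
        elif char == "*":
            parts.append(".*")      # any number of characters
        elif char == "=":
            parts.append(r"\d")     # any single digit
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))

def word_matches(term, word):
    return wildcard_to_regex(term).fullmatch(word.lower()) is not None

def positions(words, term):
    return [i for i, word in enumerate(words) if word_matches(term, word)]

def within(text, x, y, n, ordered=False):
    """Model x w/n y, or x pre/n y when ordered is True."""
    words = re.findall(r"\w+", text)
    for i in positions(words, x):
        for j in positions(words, y):
            distance = j - i if ordered else abs(j - i)
            if 0 < distance <= n:
                return True
    return False

text = "the apple fell near a ripe pear"
print(word_matches("appl?", "apply"))                      # True
print(within(text, "apple", "pear", 5))                    # True
print(within(text, "pear", "apple", 5, ordered=True))      # False
```

The last call returns False because pre/n requires the first term to occur before the second term.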

 

Regular expressions

Regular expressions are an advanced syntax that allow you to search for a pattern of characters. For example, you can use regular expressions to locate Social Security numbers, credit card numbers, or other text that has a consistent pattern of characters.

To use a regular expression, you must enclose the regular expression in quotation marks and start the expression with ##. For example: "##[abc]"

The following table describes the regular expressions that you can use in search boxes.

Regular expression

Description

Example query and results

\d

Represents one digit (0 through 9) in the pattern.

A search for "##\d" finds 1 and 2.

\w

Represents one alphanumeric character or underscore in the pattern.

A search for "##\w" finds a and 1.

*

Indicates that the preceding expression (for example, \d or \w) occurs zero or more times in the pattern.

A search for "##click\w*" finds click, clicked, and clicking.

+

Indicates that the preceding expression (for example, \d or \w) occurs one or more times in the pattern.

A search for "##click\w+" finds clicked and clicking, but not click.

. (period)

Represents any alphanumeric character or symbol.

A search for "##appl." finds apple and appl2.

[abc]

[123]

Indicates a set of characters, one of which must be present. The order of the characters does not matter.

A search for "##d[uo]g" finds dug and dog, but not dig.

[a-z]

[0-9]

Indicates a range of characters.

A search for "##[a-z]" matches any single lowercase letter.

A search for "##ma[a-e]e" finds made, but not make.

[^abc]

[^123]

[^a-z]

[^0-9]

Indicates any character except the characters in the set or range.

A search for "##c[^u]t" finds cat, but not cut.

(abc|xyz)

Indicates that either group of characters appears.

A search for "##require(d|ment)" finds required and requirement.

?

Indicates that the preceding character appears zero or one times. Enclose the preceding character in parentheses.

A search for "##colo(u)?r" finds color and colour.

{n}

Indicates that the preceding expression occurs exactly n times.

To search for a pattern such as ABC1234, which consists of three letters followed by four digits, search for "##[a-z]{3}[0-9]{4}".

To search for a pattern such as 123-45-6789, which consists of three digits, a hyphen, two digits, a hyphen, and four digits, search for "##\d{3}-\d{2}-\d{4}".

To search for multiple characters in a character set, use the [abc] expression followed by the {n} expression. For example, a search for "##the[resm]{2}" finds words that start with the and contain two of the characters in the set [resm], such as there and these, but not words with one character in the set, such as them, or words with three characters in the set, such as themes.

{n,}

Indicates that the preceding expression occurs a minimum of n times.

To search for a pattern that consists of the letters ABC followed by at least three digits, search for "##abc[0-9]{3,}". This search finds ABC123, ABC1234, ABC12345, and so on.

{n,m}

Indicates that the preceding expression occurs a minimum of n times, and a maximum of m times.

To search for multiple characters in a character set, use the [abc] expression followed by the {n,m} expression.

To search for numbers that are at least one digit and at most three digits, search for "##\d{1,3}". This search finds 8, 12, and 374, but not 2001.

A search for "##the[mres]{1,2}" finds words that start with the and contain one or two characters in the set [mres], such as them and there, but not words with three characters in the set, such as themes.
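Several of the examples in the preceding table can be checked with Python's re module. This is a sketch only: Python's regular expression engine is not the one Nuix Discover uses, so some syntax may behave differently there. The helper name finds is hypothetical, and matching each pattern against a whole word is an assumption about how indexed words are matched. The leading ## and the quotation marks are omitted because they are part of the search box syntax, not the pattern.

```python
import re

def finds(pattern, words):
    """Return the words that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [word for word in words if compiled.fullmatch(word)]

clicks = ["click", "clicked", "clicking"]
print(finds(r"click\w*", clicks))                    # ['click', 'clicked', 'clicking']
print(finds(r"click\w+", clicks))                    # ['clicked', 'clicking']
print(finds(r"d[uo]g", ["dug", "dog", "dig"]))       # ['dug', 'dog']
print(finds(r"colo(u)?r", ["color", "colour"]))      # ['color', 'colour']
print(finds(r"the[resm]{2}", ["there", "these", "them", "themes"]))  # ['there', 'these']
print(finds(r"\d{1,3}", ["8", "12", "374", "2001"]))  # ['8', '12', '374']
print(finds(r"\d{3}-\d{2}-\d{4}", ["123-45-6789"]))   # ['123-45-6789']
```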

Regular expression examples

When creating the syntax for regular expressions, keep the following in mind:

To use a regular expression, you must enclose the regular expression in straight double quotation marks and start the expression with ##. For example: "##[abc]".

Do not use curly quotation marks.

To ensure that the syntax works as intended, your administrator must configure and index all characters and symbols in the syntax as letters, including the symbols in the Index as letter column in the following table.

If you are an administrator, see the following section after the table in this topic: "For administrators: Configure characters and symbols in the index as letters."

Use all lowercase letters in your syntax. By default, the application configures the index to be case insensitive unless an administrator changes the indexing options to case sensitive.

Spaces are not searchable characters.
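The rules in this list can be collected into a small helper that wraps a pattern in the required form. The function to_regex_query is hypothetical and is not part of Nuix Discover. It only checks the rules listed above that can be verified from the text of the pattern.

```python
def to_regex_query(pattern):
    """Wrap a pattern as "##pattern" after checking the rules listed above."""
    # Curly quotation marks (double and single) are rejected.
    if any(quote in pattern for quote in "\u201c\u201d\u2018\u2019"):
        raise ValueError("do not use curly quotation marks")
    # Spaces are not searchable characters.
    if " " in pattern:
        raise ValueError("spaces are not searchable characters")
    # The syntax should use all lowercase letters.
    if pattern != pattern.lower():
        raise ValueError("use all lowercase letters in the syntax")
    # Straight double quotation marks, with ## at the start of the expression.
    return '"##' + pattern + '"'

print(to_regex_query(r"doc\d{8}"))  # "##doc\d{8}"
```

A pattern such as Doc\d{8} raises an error in this sketch because it contains an uppercase letter.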

The following table provides examples of search types, the syntax to use, and sample results.

Search type

Syntax

Index as letter (administrator)

Sample results

Social Security Number (SSN) with hyphens

"##((?!000)(?!666)([0-8]\d{2}))\-((?!00)\d{2})\-((?!0000)\d{4})"

- (Hyphen; ASCII Code 45)

777-77-7777

SSN without spaces

"##\d{9}"

777777777

SSN with spaces

"##\d{3}" "##\d{2}" "##\d{4}"

777 77 7777

Document ID

"##doc\d{8}"

Doc00000001 or DOC00000001

Email

"##([\w_\.]+)@([\w_\.]+)\.([\w_\.]{2,6})"

@ (At symbol; ASCII Code 64)

. (Period symbol; ASCII Code 46)

jdoe@domain.com

jane.doe@domain.com

johnsmith@document.gov.edu

Date

"##[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}"

/ (Forward slash; ASCII Code 47)

 

08/28/1963

8/28/1963

08/28/63

28/08/1963

Phone number

(US format)

"##([0-9]{3}-)?[0-9]{3}-[0-9]{4}"

- (Hyphen; ASCII Code 45)

867-5309

800-649-2568

Visa

"##4\d{12,15}"  

4321069823745

4321069823745123

Mastercard

"##5[1-5]\d{14}"

 

5490876543456782

5290876543456781

Discover

"##6011\d{12}" OR "##65\d{14}"

 

6011987623543109

6589043928435612

American Express

"##3[47]\d{13}"

 

340928374625172

371928172040238
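Before running patterns like these against a case, it can help to test them against the sample results. The following Python sketch does that for several rows of the table, using Python's re module as a stand-in, which is an assumption because the application's regular expression engine may differ. The dictionary keys are hypothetical labels. Note that these patterns check only the shape of a value: the date pattern, for example, also matches 99/99/9999.

```python
import re

# Patterns copied from the table, without the surrounding "##...".
PATTERNS = {
    "ssn_hyphens": r"((?!000)(?!666)([0-8]\d{2}))\-((?!00)\d{2})\-((?!0000)\d{4})",
    "email": r"([\w_\.]+)@([\w_\.]+)\.([\w_\.]{2,6})",
    "date": r"[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}",
    "us_phone": r"([0-9]{3}-)?[0-9]{3}-[0-9]{4}",
    "visa": r"4\d{12,15}",
    "mastercard": r"5[1-5]\d{14}",
}

def is_match(kind, value):
    """Return True if the whole value matches the named pattern."""
    return re.fullmatch(PATTERNS[kind], value) is not None

print(is_match("ssn_hyphens", "777-77-7777"))  # True
print(is_match("ssn_hyphens", "000-12-3456"))  # False, area 000 is excluded
print(is_match("email", "jane.doe@domain.com"))  # True
print(is_match("us_phone", "867-5309"))        # True
print(is_match("visa", "4321069823745"))       # True
```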

For administrators: Configure characters and symbols in the index as letters

Only system administrators can configure the indexing options for a portal on the Portal Management > Settings > Indexing pages (Indexing: Options, Indexing: Alpha Standard, Indexing: Alpha Extended).

Portal administrators can edit the indexing options for a specific case on the Portal Management > Cases and Servers > Cases > [Case name] > Indexing pages (Indexing: Options, Indexing: Alpha Standard, Indexing: Alpha Extended).

Note: Case indexing options override portal indexing options.

To allow users to use the syntax described in the previous table, administrators must index all characters and symbols in the syntax as letters, including those described in the following table.

Symbol    Description      ASCII Code
-         Hyphen           45
@         At symbol        64
.         Period           46
/         Forward slash    47
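The reason these symbols must be indexed as letters can be shown with a simplified tokenizer. In the following Python sketch, the tokenize function is hypothetical and only approximates how an index might split text into words. It is not the application's indexing logic.

```python
import re

def tokenize(text, extra_letters=""):
    """Split text into words, treating extra_letters as part of a word."""
    letters = r"\w" + re.escape(extra_letters)
    return re.findall("[" + letters + "]+", text.lower())

ssn = re.compile(r"\d{3}-\d{2}-\d{4}")
text = "SSN 123-45-6789"

default_words = tokenize(text)                     # hyphen splits the number
hyphen_words = tokenize(text, extra_letters="-")   # hyphen kept inside the word

print(default_words)  # ['ssn', '123', '45', '6789']
print(hyphen_words)   # ['ssn', '123-45-6789']
print([w for w in default_words if ssn.fullmatch(w)])  # []
print([w for w in hyphen_words if ssn.fullmatch(w)])   # ['123-45-6789']
```

When the hyphen is not treated as a letter, no single indexed word contains the full pattern, so a pattern that includes a hyphen has nothing to match.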

Important: Any changes to the indexing options require you to delete and rebuild the index. After that, you must submit a new Indexing and enrichment job on the Portal Management > Processing > Jobs page. Depending on the size of the index, this can take time.

For more information about portal indexing options, see Work with portal indexing options.

For more information about case indexing options, see Working with case-level indexing options.

Search query error messages and warnings

If a search query contains errors, the application displays a warning message. The following table describes search query error and warning messages, and provides examples of search queries that trigger the messages.

Message text

Example search query that triggers this message

Warning: term contains a leading wildcard. This term may be too broad and may not run successfully.

Note: It is best to not start a search with a wildcard.

*apple

Error: one side of the proximity operator must be a single word, phrase, or series of words and/or phrases separated by or.

(apple AND banana) w/5 (pear AND apricot)

Error: parentheses are not properly balanced.

(apple AND (banana OR pear)

Error: phonic operator # cannot be mixed with a stemming or synonym operator.

#apple~

Error: phonic operator # must be the first character of the word.

apple#

Error: quotes are not properly balanced.

"banana AND "apple pear"

Error: stemming operator ~ can only be placed at the end of the word.

abc~d

Error: synonym operator & must be placed at the end of the word.

abc&d

Error: term contains two or more consecutive boolean operators.

apple AND OR banana

Error: term or clause begins with an operator.

AND apple

Error: term or clause ends with an operator.

apple OR banana OR

Error: the negation operator must be in either the form and not or not w/#.

apple NOT banana

Error: the proximity operator cannot be preceded by and or or.

apple AND w/2 banana

Error: wildcard operators cannot be used with the # (phonic), ~ (stemming), or & (synonym) operators.

#apple*

Warning: the # character is the phonic search operator. # is not searchable as a character.

apple#

Warning: the % character is the fuzzy search operator. % is not searchable as a character.

100%

Warning: the & character is the synonym search operator. & is not searchable as a character.

abc&d

Warning: term contains !. The wildcard operator character is *.

apple!

Warning: term contains ambiguous boolean operators.

apple AND banana OR pear

Warning: term contains ambiguous proximity operators.

apple w/2 banana w/5 pear

Warning: term contains noise word(s) [comma delimited list of noise words]. Search results will include documents that contain any word in this position.

apple on the banana

Warning: the proximity operator must be in the form w/# or pre/#.

apple /5 banana

Warning: the term is missing an operator.

(apple) (banana)

Warning: term contains a regular expression. This term may be too broad and may not run successfully.

"##[a-z][a-z][a-z]"
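A few of these checks are simple enough to sketch in Python, which can be useful for screening a list of search terms before running them. The lint_query function is hypothetical and covers only a handful of the messages in the table. It does not reproduce the application's validation logic.

```python
import re

def parentheses_balanced(query):
    depth = 0
    for char in query:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def lint_query(query):
    """Return the messages from the table that this sketch can detect."""
    messages = []
    if not parentheses_balanced(query):
        messages.append("Error: parentheses are not properly balanced.")
    if query.count('"') % 2 != 0:
        messages.append("Error: quotes are not properly balanced.")
    tokens = re.findall(r'[^\s()"]+', query)
    upper = [token.upper() for token in tokens]
    if upper and upper[0] in {"AND", "OR"}:
        messages.append("Error: term or clause begins with an operator.")
    if upper and upper[-1] in {"AND", "OR", "NOT"}:
        messages.append("Error: term or clause ends with an operator.")
    if any(a in {"AND", "OR"} and b in {"AND", "OR"} for a, b in zip(upper, upper[1:])):
        messages.append("Error: term contains two or more consecutive boolean operators.")
    if any(token.startswith("*") for token in tokens):
        messages.append("Warning: term contains a leading wildcard. "
                        "This term may be too broad and may not run successfully.")
    return messages

print(lint_query("(apple AND (banana OR pear)"))
# ['Error: parentheses are not properly balanced.']
print(lint_query("apple AND banana"))
# []
```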