RediSearch Demystified

Search Engines

Y U NO

SEARCH ENGINE

Search Engine

software that builds indexes on documents and answers queries using those indexes

Indexing

Documents

A document is a collection of fields

full text
tag
numeric
geo

Inverted Index

maps keywords to docs
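
To make "maps keywords to docs" concrete, here is a minimal Python sketch of an inverted index. The names `build_inverted_index` and `docs` are made up for illustration, and splitting on whitespace is a simplifying assumption; this is not how RediSearch stores its index internally.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each keyword to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

# Hypothetical sample documents: doc ID -> text
docs = {1: "hello world", 2: "hello redis", 3: "search world"}
index = build_inverted_index(docs)
print(index["hello"])  # [1, 2]
print(index["world"])  # [1, 3]
```

Answering a one-word query is then a single dictionary lookup instead of a scan over every document.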

Tokenization

Whitespace: list of words → [list, of, words]
Punctuation: foo-bar.baz…bag → [foo, bar, baz, bag]
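
Both rules can be sketched with one regular expression. The function name `tokenize` is hypothetical, and treating every non-word character as a separator is an assumption made for this example; a real tokenizer has more rules (escaping, case folding, and so on).

```python
import re

def tokenize(text):
    """Split on any run of whitespace or punctuation and drop empty tokens."""
    return [token for token in re.split(r"[\W_]+", text) if token]

print(tokenize("list of words"))    # ['list', 'of', 'words']
print(tokenize("foo-bar.baz…bag"))  # ['foo', 'bar', 'baz', 'bag']
```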

Stop words

Extremely common words

the, and, are, but, for, into, not, such, that, their, then, there, these, they, this, was, will, with

Stop Words

This is a list of words

↓

list words
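
A small sketch of stop-word filtering. The set below holds the words listed on the earlier slide, plus the short words "a", "is" and "of", which are added here as an assumption so that the example sentence reduces to "list words". The name `remove_stop_words` is made up for illustration.

```python
# Words from the slide, plus "a", "is", "of" (assumed, needed for the example)
STOP_WORDS = {
    "the", "and", "are", "but", "for", "into", "not", "such", "that",
    "their", "then", "there", "these", "they", "this", "was", "will", "with",
    "a", "is", "of",
}

def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

print(remove_stop_words("This is a list of words"))  # ['list', 'words']
```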

Stemming

Reduce a word to its simplest form

running

↓

run

English Stemmer

am, are, is → be
abode, abided, abidden → abide
cat, cats, cat’s, cats' → cat
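
The examples above can be reproduced with a deliberately tiny toy stemmer. This is only a sketch: the rules and the name `toy_stem` are invented for this slide, they cover just a few of the cases shown, and production stemmers are far more elaborate.

```python
# Irregular forms handled by lookup (only the "be" example from the slide)
IRREGULAR = {"am": "be", "are": "be", "is": "be"}

def toy_stem(word):
    """Reduce a word using a lookup table and a few suffix rules."""
    word = word.lower().replace("’", "'")  # normalize curly apostrophes
    if word in IRREGULAR:
        return IRREGULAR[word]
    # Possessives: cat's -> cat, cats' -> cats
    if word.endswith("'s"):
        word = word[:-2]
    elif word.endswith("'"):
        word = word[:-1]
    # Plural "s", but leave short words and "ss" endings alone
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    # "-ing", undoing a doubled consonant: running -> runn -> run
    if len(word) > 5 and word.endswith("ing"):
        word = word[:-3]
        if len(word) > 2 and word[-1] == word[-2]:
            word = word[:-1]
    return word

print([toy_stem(w) for w in ["cat", "cats", "cat’s", "cats'"]])  # ['cat', 'cat', 'cat', 'cat']
print(toy_stem("running"))  # run
```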

Romance Stemmer

type	Eng	Fr	Spa	Por	Ita
noun	ANCE	ance	anza	eza	anza
adjective	IC	ique	ico	ico	ico
noun	ATION	ation	ación	ação	azione
adjective	ABLE	able	able	ável	abile

Synonyms

{boy, child, baby}

{girl, child, baby}

{man, person, adult}
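
One way to use such groups is query-time expansion: a term is replaced by the union of every group it belongs to. The sketch below assumes that approach; the names `build_synonym_map` and `expand` are hypothetical, and the slides do not say how RediSearch stores synonym groups internally.

```python
from collections import defaultdict

# The three groups from the slide; "child" and "baby" belong to two groups
GROUPS = [
    {"boy", "child", "baby"},
    {"girl", "child", "baby"},
    {"man", "person", "adult"},
]

def build_synonym_map(groups):
    """Map each term to the IDs of the groups that contain it."""
    term_to_groups = defaultdict(set)
    for group_id, group in enumerate(groups):
        for term in group:
            term_to_groups[term].add(group_id)
    return term_to_groups

def expand(term, groups, term_to_groups):
    """Return the term plus every synonym from the groups it belongs to."""
    expanded = {term}
    for group_id in term_to_groups.get(term, ()):
        expanded |= groups[group_id]
    return expanded

synonym_map = build_synonym_map(GROUPS)
print(sorted(expand("boy", GROUPS, synonym_map)))    # ['baby', 'boy', 'child']
print(sorted(expand("child", GROUPS, synonym_map)))  # ['baby', 'boy', 'child', 'girl']
```

Note that "boy" does not expand to "girl": the two only share a group with "child" and "baby", not with each other.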

Tag Fields

Similar to full-text fields but more compact

Query Language

Multi-word queries (implicit AND): foo bar baz
Exact phrases: "hello world"
Prefix: hel*
Or (union): hello|hallo|shalom|hola
Negation: hello -world

Query Language

Specific fields: @field:hello world
Numeric range: @field:[1 10]
Geo-radius: @field:[-77 39 5 km]
Tags: @field:{tag1 | tag2}
Optional: ~bar

Query Execution

based on chained iterators

hello

↓

read("hello")

hello world

↓

intersect(
	read("hello"),
	read("world")
)

"hello world"

↓

exact_intersect(
	read("hello"),
	read("world")
)

"hello world" foo

↓

intersect(
	exact_intersect(
		read("hello"),
		read("world")
	),
	read("foo")
)
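
The iterator trees above can be sketched with Python generators. `read`, `intersect` and `exact_intersect` mirror the names on the slides, but their bodies, the `build_index` helper and the positional index layout are assumptions made for this example, not RediSearch's implementation.

```python
def build_index(docs):
    """term -> {doc_id: [word positions]}"""
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index

def read(index, term):
    """Yield (doc_id, positions) for a term, in doc_id order."""
    yield from sorted(index.get(term, {}).items())

def intersect(left, right):
    """Yield docs found in both iterators, advancing whichever lags behind."""
    l, r = next(left, None), next(right, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left, None)
        elif l[0] > r[0]:
            r = next(right, None)
        else:
            yield l[0], (l[1], r[1])
            l, r = next(left, None), next(right, None)

def exact_intersect(left, right):
    """Like intersect, but the right term must directly follow the left one."""
    for doc_id, (left_pos, right_pos) in intersect(left, right):
        starts = [p for p in left_pos if p + 1 in right_pos]
        if starts:
            yield doc_id, starts

def doc_ids(iterator):
    return [doc_id for doc_id, _ in iterator]

docs = {1: "hello world foo", 2: "world hello foo", 3: "hello world", 4: "hello there"}
idx = build_index(docs)

print(doc_ids(read(idx, "hello")))                                     # [1, 2, 3, 4]
print(doc_ids(intersect(read(idx, "hello"), read(idx, "world"))))      # [1, 2, 3]
print(doc_ids(exact_intersect(read(idx, "hello"), read(idx, "world"))))  # [1, 3]
print(doc_ids(intersect(
    exact_intersect(read(idx, "hello"), read(idx, "world")),
    read(idx, "foo"),
)))                                                                    # [1]
```

Because every node is an iterator, nothing is materialized until the caller pulls results, and each node can stop as soon as one of its children runs dry.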

Fuzzy Matching

%%Hamberders%%

↓

Hamburgers

Covfefe?
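
Fuzzy matching of this kind is usually defined by Levenshtein (edit) distance. The sketch below assumes that each pair of % signs allows one more edit, so %%term%% tolerates a distance of 2; the function names and the sample dictionary are made up for illustration.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]

def fuzzy_matches(term, dictionary, max_distance):
    """Return dictionary words within max_distance edits of term."""
    term = term.lower()
    return [w for w in dictionary if levenshtein(term, w.lower()) <= max_distance]

print(levenshtein("hamberders", "hamburgers"))  # 2
print(fuzzy_matches("Hamberders", ["hamburgers", "hamsters", "covfefe"], 2))  # ['hamburgers']
```

Scanning the whole dictionary like this is fine for a demo; a real engine would avoid computing the distance against every indexed term.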

Phonetic Matching

AIHEOPDERF

AI → I

HEOP → help

D → the

ERF → earth

Double Metaphone

primarily designed for American English names
also encodes most English words well
double encoding for a given word
- likely pronunciation
- optional alternative pronunciation

Double Metaphone

John → JN
Jon → JN
Jawn → JN

Index Partitioning

index split across many partitions by document ID
a partition has complete index of all its documents
query partitions concurrently and merge results
… need search coordinator
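
A minimal sketch of the scatter-gather idea, in Python. `Partition` and `Coordinator` are hypothetical classes; routing by `doc_id % partitions` and scoring by term frequency are assumptions chosen to keep the example short.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

class Partition:
    """Holds a complete index for the documents routed to it."""
    def __init__(self):
        self.index = {}  # term -> {doc_id: term frequency}

    def add(self, doc_id, text):
        for word in text.lower().split():
            postings = self.index.setdefault(word, {})
            postings[doc_id] = postings.get(doc_id, 0) + 1

    def search(self, term, limit):
        postings = self.index.get(term, {})
        return heapq.nlargest(limit, ((score, doc_id) for doc_id, score in postings.items()))

class Coordinator:
    """Routes documents by ID and merges per-partition results."""
    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def add(self, doc_id, text):
        self.partitions[doc_id % len(self.partitions)].add(doc_id, text)

    def search(self, term, limit=10):
        # Query every partition concurrently, then merge the partial top-N lists
        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            partials = list(pool.map(lambda p: p.search(term, limit), self.partitions))
        merged = heapq.nlargest(limit, (hit for part in partials for hit in part))
        return [doc_id for _, doc_id in merged]

coordinator = Coordinator(num_partitions=3)
coordinator.add(1, "hello hello hello")
coordinator.add(2, "hello world")
coordinator.add(3, "world")
coordinator.add(4, "hello hello")
print(coordinator.search("hello", limit=2))  # [1, 4]
```

Each partition only needs to return its own top `limit` hits, because the global top `limit` is always contained in the union of the per-partition top lists.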

Concurrency

OS scheduler ensures all queries get CPU time
While a query is running the rest wait idly
Execution is yielded 5,000 times/sec
Fast queries finish in one go
Slow ones will take many iterations
Allows queries to run concurrently
Same approach for indexing big documents
RediSearch has a thread pool for running concurrent search queries.
When a search request arrives, the handler parses it on the main thread and passes a request object to the thread pool via a queue.
The thread pool runs a query processing function in its own thread.
The function acquires the Redis global lock and starts executing the query.
Since search execution is basically an iterator running in a loop, we simply sample the elapsed time every several iterations (sampling on each iteration would slow things down, as it has a cost of its own).
If enough time has elapsed, the query processor releases the global lock and immediately tries to acquire it again. When the lock is released, the kernel will schedule another thread to run, be it Redis' main thread or another query thread.
When the lock is acquired again, we reopen all Redis resources we were holding before releasing the lock (keys might have been deleted while the thread was "sleeping") and continue work from the previous state.
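
The release-and-reacquire pattern can be sketched in Python with a plain `threading.Lock` standing in for the Redis global lock. The constants and the `run_query` function are assumptions made for this example. Python's lock makes no fairness promise, so the sketch shows the shape of the loop rather than the real scheduling behavior.

```python
import threading
import time

GLOBAL_LOCK = threading.Lock()  # stand-in for the Redis global lock
TIME_SLICE = 1 / 5000           # aim to yield about 5,000 times per second
SAMPLE_EVERY = 100              # read the clock only every N iterations (assumed value)

def run_query(iterator, results, stats):
    """Consume a query iterator, yielding the global lock when the time slice is used up."""
    GLOBAL_LOCK.acquire()
    started = time.monotonic()
    try:
        for i, item in enumerate(iterator, 1):
            results.append(item)
            if i % SAMPLE_EVERY == 0 and time.monotonic() - started >= TIME_SLICE:
                GLOBAL_LOCK.release()  # let the main thread or another query run
                GLOBAL_LOCK.acquire()  # then wait for our next turn
                # A real implementation would reopen its keys here,
                # since they may have been deleted while it was waiting.
                stats["yields"] += 1
                started = time.monotonic()
    finally:
        GLOBAL_LOCK.release()

slow_results, fast_results = [], []
slow_stats, fast_stats = {"yields": 0}, {"yields": 0}
threads = [
    threading.Thread(target=run_query, args=(iter(range(200_000)), slow_results, slow_stats)),
    threading.Thread(target=run_query, args=(iter(range(10)), fast_results, fast_stats)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(len(slow_results), len(fast_results), slow_stats, fast_stats)
```

The fast query has only 10 iterations, so it never reaches a sampling point and finishes in one go; the slow one gives up the lock whenever a sample finds its time slice has expired.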

May the search be with you

…ALWAYS