RediSearch

DEMYSTIFIED

RediSearch Demystified

redisearch logo

Search Engines

search engines

Y U NO

SEARCH ENGINE

Search Engine

software that builds indexes on documents

and answers queries using those indexes

Indexing

Documents

A document is a collection of fields

  • full text

  • tag

  • numeric

  • geo

Inverted Index

maps keywords to docs

Tokenization

  • Whitespace
    list of wordslist | of | words

  • Punctuation
    foo-bar.baz…​bag[foo, bar, baz, bag]

Stop words

Extremely common words

a

is

the

an

and

are

as

at

be

but

by

for

if

in

into

it

no

not

of

on

or

such

that

their

then

there

these

they

this

to

was

will

with

Stop Words

This is a list of words

list words

Stemming

Reduce a word to its simplest form

running

run

English Stemmer

  • am, are, isbe

  • abode, abided, abiddenabide

  • cat, cats, cat’s, cats'cat

Romance Stemmer

FrSpaPorIta

noun

ANCE

ance

anza

eza

anza

adjective

IC

ique

ico

ico

ico

noun

ATION

ation

ación

ação

azione

adjective

ABLE

able

able

ável

abile

Synonyms

{boy, child, baby}

{girl, child, baby}

{man, person, adult}

Tag Fields

Similar to full-text fields but more compact

Query Language

  • Multi-word phrases: foo bar baz

  • Exact phrases: "hello world"

  • Prefix: hel*

  • Or (union): hello|hallo|shalom|hola

  • Negation: hello -world

Query Language

  • Specific fields: @field:hello world

  • Numeric range: @field:[1 10]

  • Geo-radius: @field:[-77 39 5 km]

  • Tags: @field:{tag1 | tag2}

  • Optional: ~bar

Query Execution

based on chained iterators

hello

read("hello")
hello world

intersect(
	read("hello"),
	read("world")
)
"hello world"

exact_intersect(
	read("hello"),
	read("world")
)
"hello word" foo

intersect(
	exact_intersect(
		read("hello"),
		read("world")
	),
	read("foo")
)

Fuzzy Matching

%%Hamberders%%

Hamburgers

Covfefe?

Phonetic Matching

IHEOPDERF

AIHEOPDERF

AII

HEOPhelp

Dthe

ERFearth

Double Metaphone

  • primarily designed for American English names

  • also encodes most English words well

  • double encoding for a given word

    • likely pronunciation

    • optional alternative pronunciation

Double Metaphone

  • John → JN

  • Jon → JN

  • Jawn → JN

Index Partitioning

  • index split across many partitions by document ID

  • a partition has complete index of all its documents

  • query partitions concurrently and merge results

  • …​ need search coordinator

redisearch coordinator

Concurrency

redisearch concurrency

May the search be with you

…ALWAYS