KatPadi's Point

Autocomplete Using Redis

Usually, when we implement suggestions for autocomplete input fields, we query the database directly. LIKE queries in MySQL are relatively slow, so why not build the autocomplete on Redis instead?

Disclaimer: This is a basic Redis solution for “just-trying-it-out” purposes only.

Motivation

As I’ve mentioned, I got annoyed by the slow autocomplete results of an app I’ve been playing with, so I browsed the internet for NoSQL solutions, specifically Redis. Most apps have Redis set up already, so it would be nice to just leverage it for a basic autocomplete solution.

Implementation

One of the strategies I tried is based on the idea of storing every prefix of each searchable string as a sorted set key in Redis.

    def populate(query_scope)
      query_scope.find_each do |item|
        name = item.label
        json_data = format_data(item)
        # Index every prefix of the name: "t", "to", "tom", ...
        name.length.times do |n|
          key = name[0, n + 1]
          # Add the JSON payload to this prefix's sorted set with a score of 1
          $redis.zadd "kupsearch:#{key.downcase}", 1, json_data
        end
      end
    end

Assuming that I already have data in the database, the populate method above will insert the Redis keys needed for autocomplete.
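
Calling it could look something like this (assuming populate lives on the same AutocompleteSearch module that results_for is called on later in this post):

    # Index every Product row for autocomplete
    AutocompleteSearch.populate(Product.all)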

For example I have the following data in my database:

  • tomato
  • toaster
  • toothpick

Generally, the number of Redis keys created for a string equals the number of characters it has. So in tomato’s case, it’s 6.
The populate method will insert tomato’s 🍅 keys as:

  • t
  • to
  • tom
  • toma
  • tomat
  • tomato

Uhm, get it?

On tomato’s first iteration, the key is set to the string “t”, since we’re extracting the first n+1 characters (n starts from 0, hence the +1). The data to be stored is formatted as JSON so we can keep more than one value per entry. This is necessary if you need access to more than one attribute of your ActiveRecord model in the JavaScript that renders the autocomplete.
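
The format_data helper isn’t shown here; a minimal sketch of what it might look like, assuming the attributes seen in the sample payload below (id, label, type, with type taken from the model class name):

    # Hypothetical format_data: serialize the attributes the autocomplete JS needs
    # into a single JSON string, e.g. {"id":3,"label":"Tomato","type":"Goods"}
    def format_data(item)
      { id: item.id, label: item.label, type: item.class.name }.to_json
    end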

Then, the Redis ZADD command is used to add the JSON-formatted data as a member of a sorted set called kupsearch:t, since the variable key is set to “t”.

SUBJECT: Tomato
FORMATTED DATA: {\"id\":3,\"label\":\"Tomato\",\"type\":\"Goods\"}

======= 1st ITERATION ====
Key per character iteration: t
The actual ZADD: zadd kupsearch:t 1 {\"id\":3,\"label\":\"Tomato\",\"type\":\"Goods\"}

======= 2nd ITERATION ====
Key per character iteration: to
The actual ZADD: zadd kupsearch:to 1 {\"id\":3,\"label\":\"Tomato\",\"type\":\"Goods\"}

======= 3rd ITERATION ====
Key per character iteration: tom
The actual ZADD: zadd kupsearch:tom 1 {\"id\":3,\"label\":\"Tomato\",\"type\":\"Goods\"}

======= 4th ITERATION ====
Key per character iteration: toma
The actual ZADD: zadd kupsearch:toma 1 {\"id\":3,\"label\":\"Tomato\",\"type\":\"Goods\"}

======= 5th ITERATION ====
Key per character iteration: tomat
The actual ZADD: zadd kupsearch:tomat 1 {\"id\":3,\"label\":\"Tomato\",\"type\":\"Goods\"}

======= 6th ITERATION ====
Key per character iteration: tomato
The actual ZADD: zadd kupsearch:tomato 1 {\"id\":3,\"label\":\"Tomato\",\"type\":\"Goods\"}

This process continues until the whole word is inserted as a key, kupsearch:tomato. Then it moves on to the next record, “toaster”.

Note: Ideally, namespaces should be meaningful, so don’t use kupsearch in real life.

Retrieving

Data retrieval is easy. The sample method below shows a rough implementation. Note that I parse the results because the data was saved in JSON format.

    def results_for(term, max = 5)
      return [] if term.blank?
      # Fetch up to `max` members with the highest scores for this prefix
      results = $redis.zrevrange "kupsearch:#{term.downcase}", 0, max - 1
      results.map { |result| JSON.parse(result, symbolize_names: true) }
    end

So, the basic idea is to use Redis’ ZREVRANGE command. This returns the members of the set in descending score order, meaning values with higher scores are returned first.
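
For example, with the sample data above indexed, a lookup for the prefix “to” hits the kupsearch:to set and returns all three items (the hashes below are illustrative; only Tomato’s id actually appears in this post):

    results_for("to")
    # => [{ id: 3, label: "Tomato", type: "Goods" },
    #     { id: ..., label: "Toaster", type: "Goods" },
    #     { id: ..., label: "Toothpick", type: "Goods" }]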

As you may have noticed in the populate method, I was just inserting 1 as the score. Ideally, if you want to keep track of users’ hits as they search, you would increment the score of the matching member. When you do this, the most commonly searched results will be returned before the unpopular ones. To increment, you can use the ZINCRBY command.
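
A minimal sketch of what that could look like, reusing the same prefix loop (record_hit is a hypothetical helper, not part of the original code):

    # Hypothetical helper: bump the chosen item's score for every prefix of its
    # label so that popular results come back first from ZREVRANGE
    def record_hit(item)
      json_data = format_data(item)
      item.label.length.times do |n|
        key = item.label[0, n + 1]
        # ZINCRBY increments the member's score by 1 (adding it if missing)
        $redis.zincrby "kupsearch:#{key.downcase}", 1, json_data
      end
    end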

Improvement

~99%

I benchmarked the LIKE query method against the Redis solution and, for obvious reasons, Redis won. It’s nearly a 100% improvement, at least on 1,000 rows of data.

#<Benchmark::Tms:0x007fc385731ce8 @label="", @real=14.029786, @cstime=0.0, @cutime=0.0, @stime=0.3000000000000007, @utime=12.299999999999983, @total=12.599999999999984>
#<Benchmark::Tms:0x007fc383e04158 @label="", @real=0.000284, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>

Percentage Increase: 99.99797573533908

The code I used was something like this:

    sql_time = Benchmark.measure do
      Product.where('name LIKE ?', 'to%').to_json
    end

    redis_time = Benchmark.measure do
      AutocompleteSearch.results_for('to')
    end

    puts "Percentage Increase: #{((sql_time.real - redis_time.real) / sql_time.real) * 100}"

It’s probably a small sample so don’t count on my performance testing ability. 😂

Caveats & Limitations

  • Searching within ActiveRecord scopes is not handled. This means that if you only want to search for and return products that start with “t” and are fresh (e.g. Product.fresh), Redis does not know about the scope, so it will still return everything that starts with “t”. There is probably a decent key-prefixing strategy for that, but I haven’t explored it yet.
  • Redis entries must be kept in sync whenever ActiveRecord objects are created, updated or deleted (a rough sketch of this follows below).
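
One way to wire that up is with ActiveRecord callbacks. This is only a sketch; add_to_index and remove_from_index are hypothetical helpers that would ZADD/ZREM the JSON payload for every prefix of the label:

    class Product < ApplicationRecord
      # Keep the Redis autocomplete index in sync with the database.
      # Note: if the label changes, the old label's prefixes would also
      # need to be removed before re-indexing.
      after_save    { AutocompleteSearch.add_to_index(self) }
      after_destroy { AutocompleteSearch.remove_from_index(self) }
    end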

P.S. There are gems that do this already, such as soulheart and soulmate.

5 comments for “Autocomplete Using Redis”

  1. Ilmari
    February 13, 2017 at 2:13 pm

    Good overview of the technical details. Would be nice if you listed the gems that already do this, as you probably found them.

    • February 13, 2017 at 4:11 pm

      Hi Ilmari,

      Thanks for dropping by. I’ve updated the post to link to some gems that do this. 🙂

  2. Oleg
    February 13, 2017 at 10:46 pm

    Or just use postgres full text search.

  3. February 14, 2017 at 7:49 am

    You would be better off using lexicographical indexes as outlined here: https://redis.io/topics/indexes

    This will net fewer keys in the global namespace, better performance, and less memory usage. However, your solution is indeed useful!
