Meilisearch v1.6 - Meilisearch Cloud

We’re announcing the release of Meilisearch 1.6. Let’s dive into some of the most important changes. You can also view the full changelog on GitHub.

Experimental feature: hybrid search

Meilisearch introduces hybrid search. It combines full-text and semantic search to make search results more accurate and more comprehensive. Picture a movie app like where2watch: your users can now find movies whose titles they can't quite recall but whose story they remember.

Furthermore, Meilisearch now streamlines the creation of vector embeddings. Choose your preferred embedders and Meilisearch will handle all interactions with external tools for you.

Configuring embedders

You can configure the embedders in your index settings. Select from three types of embedders for your needs:

openAi:

Uses the OpenAI API for computing embeddings
Requires an OpenAI API key for operation

huggingFace:

Enables local computation of embeddings by downloading models from the HuggingFace Hub
Runs on your CPU, not your GPU, which may slow down indexing

userProvided:

Functions similarly to Meilisearch v1.3, with a key difference: you must define a specific embedder
Allows you to add pre-computed embeddings into your documents. You perform searches using vectors instead of text.

To use hybrid search, define at least one embedder in the index settings:

{
  "embedders": {
    "default": {
      "source":  "openAi",
      "apiKey": "<your-OpenAI-API-key>",
      "model": "text-embedding-ada-002",
      "documentTemplate": "A movie titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
    },
    "image": {
      "source": "userProvided",
      "dimensions": 512
    },
    "translation": {
      "source": "huggingFace",
      "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
      "documentTemplate": "A movie titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
    }
  }
}

The documentTemplate field acts as a blueprint for the text Meilisearch embeds for each document. It uses the Liquid template language. The field is optional but highly recommended, because embedding models are optimized for concise texts. A good template keeps only the relevant content, leaves out non-essential fields such as id, and adds context that improves relevancy.
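To get a feel for what a template produces, here is a minimal Python sketch that imitates the two Liquid constructs used in the settings above: field substitution and the truncatewords filter. It is not a Liquid engine and not Meilisearch's own rendering code. The sample document and the helper names (`render`, `truncatewords`) are assumptions made for illustration.

```python
import re

# Same template as in the embedder settings above.
TEMPLATE = (
    "A movie titled '{{doc.title}}' whose description starts with "
    "{{doc.overview|truncatewords: 20}}"
)

# Matches {{doc.field}} and {{doc.field|truncatewords: N}} only.
PLACEHOLDER = re.compile(
    r"\{\{\s*doc\.(\w+)\s*(?:\|\s*truncatewords:\s*(\d+)\s*)?\}\}"
)


def truncatewords(text, count):
    """Keep the first `count` words and append "..." if anything was cut."""
    words = text.split()
    if len(words) <= count:
        return text
    return " ".join(words[:count]) + "..."


def render(template, doc):
    """Fill the placeholders of `template` with values from `doc`."""

    def substitute(match):
        value = str(doc.get(match.group(1), ""))
        if match.group(2) is not None:
            value = truncatewords(value, int(match.group(2)))
        return value

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    # Hypothetical document: `id` is present but never reaches the embedder.
    movie = {
        "id": 42,
        "title": "Jurassic Park",
        "overview": "A wealthy entrepreneur secretly creates a theme park "
        "featuring living dinosaurs drawn from prehistoric DNA.",
    }
    print(render(TEMPLATE, movie))
```

Only the fields named in the template end up in the rendered text, which is why a short template keeps the embedded text concise.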

Hybrid search

To perform a hybrid search, use the hybrid field within the POST /indexes/:index_uid/search route.

{
    "q": "Plumbers and dinosaurs",
    "hybrid": {
        "semanticRatio": 0.9,
        "embedder": "default"
    }
}

embedder: an embedder from the options configured in your index settings.

semanticRatio: a floating-point value between 0 and 1. A value of 1 is a fully semantic search, a value of 0 is a pure full-text search focused on exact matches, and the default of 0.5 mixes both methods.

Your control over the semantic ratio directly influences how search results are ranked. A higher semantic ratio shifts the focus towards the context and meaning behind your query, ranking results that are more semantically relevant higher.

On the other hand, a lower semantic ratio increases the weight given to keyword accuracy in the ranking process, bringing results that closely match your specific search terms to the forefront.
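As a rough sketch, the request above could be built and sent from Python with the standard library alone. The helper names, the `movies` index, and the `http://localhost:7700` address are assumptions for illustration. The range check simply mirrors the 0 to 1 range described above.

```python
import json
import urllib.request


def build_hybrid_search(query, embedder="default", semantic_ratio=0.5):
    """Build the JSON body of a hybrid search request."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semanticRatio must be between 0 and 1")
    return {
        "q": query,
        "hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder},
    }


def hybrid_search(base_url, index_uid, body, api_key=None):
    """POST the body to the index's search route and return the parsed reply."""
    request = urllib.request.Request(
        f"{base_url}/indexes/{index_uid}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Mostly semantic search, as in the example above.
    body = build_hybrid_search("Plumbers and dinosaurs", semantic_ratio=0.9)
    # Assumes a local instance with a `movies` index and a `default` embedder.
    print(hybrid_search("http://localhost:7700", "movies", body))
```

Running the same query with a few different ratios is a quick way to see how the ranking shifts between keyword matches and semantic matches on your own data.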

Breaking changes in the experimental vector search API

Meilisearch v1.6 introduces some breaking changes in the vector search API.

Previously, you could send vectors without specifying an embedder. Now, you must define one in the index settings:

"embedders": {
    "default": {
      "source": "userProvided",
      "dimensions": 512
    }
}

Because Meilisearch now supports multiple embedders, it has updated the vector submission format from arrays to JSON objects.

Previous format: "_vectors": [[0.0, 0.1]]
New format: "_vectors": {"image2text": [0.0, 0.1, …]}

For detailed information on these updates, refer to the documentation.
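If you already have documents in the previous format, a small script can rewrite them before re-indexing. The sketch below is one possible approach, not an official migration tool. The function name is hypothetical, and the embedder name you pass must match one defined in your index settings.

```python
def migrate_vectors(document, embedder):
    """Return a copy of `document` with `_vectors` keyed by embedder name."""
    vectors = document.get("_vectors")
    if vectors is None or isinstance(vectors, dict):
        # No vectors, or already in the new object format.
        return dict(document)
    if vectors and isinstance(vectors[0], (list, tuple)):
        # Previous format wrapped the vector in an outer array.
        if len(vectors) != 1:
            raise ValueError(
                "several vectors found; decide how to map them to embedders"
            )
        vectors = vectors[0]
    migrated = dict(document)
    migrated["_vectors"] = {embedder: list(vectors)}
    return migrated


if __name__ == "__main__":
    old_document = {"id": 1, "_vectors": [[0.0, 0.1]]}
    print(migrate_vectors(old_document, "image2text"))
```

The sketch refuses to guess when a document carries several vectors, because this post only shows the single-vector case.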

For in-depth technical information, explore the series of articles on Arroy, an open-source repository based on Spotify’s Annoy and developed in Rust. This library, created and maintained by the Meilisearch engine team, specializes in searching for vectors in a space that are near a specified query vector.

Performance optimization

Improved indexing speed

We're thrilled to share a major enhancement in Meilisearch's indexing performance. Our recent tests, including scenarios with frequent and partial document updates, have shown impressive results: indexing time dropped by around 50%, and in some cases by as much as 75%.

Thanks to our latest optimizations, Meilisearch now stores and pre-computes less data. Additionally, during document updates, it re-indexes or deletes only the necessary data. For instance, in an e-commerce dataset, updating the stock level of a product results in re-indexing just the 'stock' field, rather than the entire product document.

Disk space usage reduction

Meilisearch now stores less internal data, leading to a more compact database on your disk. With a dataset of approximately 15 MB, we observed a 40% to 50% reduction in database size.

This enhancement not only reduces the database size, but also improves its stability, making the space savings more evident as the number of documents increases.

New feature: customize proximity precision

To further reduce indexing time, Meilisearch now allows you to tailor the accuracy of the proximity ranking rule to your specific needs.

The proximity ranking rule is computationally demanding and may lead to longer indexing times. Reducing its accuracy can greatly enhance performance, and in most scenarios, it will not substantially affect the relevancy of the results.

To adjust its impact, configure the proximityPrecision setting:

curl \
  -X PATCH 'http://localhost:7700/indexes/books/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "proximityPrecision": "byAttribute"
  }'

The default proximityPrecision setting is byWord, which calculates proximity based on exact word distances.

The byAttribute setting considers words in the same attribute as proximate, regardless of their exact distance.

Using byAttribute can boost the indexing speed, but it might slightly change how relevant the results are. This becomes more noticeable in searches where it's important for words to be close to each other.

For example, when you're looking through song lyrics or long articles, like trying to find 'world war' in a bunch of Wikipedia pages, you might end up with results that contain these words but not necessarily close together or in the desired order. This is also true for phrase searches and for searches involving multi-word synonyms, where the specific combination of words is crucial.
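The difference between the two settings can be pictured with a toy example. The Python sketch below is only an illustration of the idea, not Meilisearch's actual ranking code, and both function names are made up. One function measures the exact distance between two words, the other only checks that both appear in the same text.

```python
def min_word_distance(text, first, second):
    """Smallest gap, in words, between `first` and `second`.

    Returns None when either word is missing from `text`.
    """
    words = text.lower().split()
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    if not first_positions or not second_positions:
        return None
    return min(abs(a - b) for a in first_positions for b in second_positions)


def in_same_attribute(text, first, second):
    """True when both words occur anywhere in `text`."""
    words = set(text.lower().split())
    return first in words and second in words


if __name__ == "__main__":
    close = "the second world war ended in 1945"
    far = "the war lasted years and changed the whole world"

    # Word-level view: the two texts are clearly different.
    print(min_word_distance(close, "world", "war"))  # 1
    print(min_word_distance(far, "world", "war"))    # 7

    # Attribute-level view: both texts look equally good.
    print(in_same_attribute(close, "world", "war"))  # True
    print(in_same_attribute(far, "world", "war"))    # True
```

With the attribute-level view, the two texts tie, which is the kind of relevancy trade-off described above.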

New feature: task queue webhook

Meilisearch now offers a webhook feature to notify a custom URL when an asynchronous task finishes (either succeeds, fails, or gets canceled).

This feature is particularly useful for streamlining workflows, saving you from polling the tasks route.

Set up your webhook at launch using these environment variables:

MEILI_TASK_WEBHOOK_URL='https://mywebsite.com/my-super-webhook?user=1234&number=8'

MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER='Bearer 12340987546wowowlolol'

You can also use the respective command-line options.

Once set up, the webhook sends out a payload in JSON Lines (ndjson) format to your specified URL, containing the list of finished tasks:

// POST HTTP request to https://mywebsite.com/my-super-webhook?user=1234&number=8

{"uid":4,"indexUid":"movie","status":"failed","type":"indexDeletion","canceledBy":null,"details.deletedDocuments":0,"error.message":"Index `movie` not found.","error.code":"index_not_found","error.type":"invalid_request","error.link":"https://docs.meilisearch.com/errors#index_not_found","duration":"PT0.001192S","enqueuedAt":"2022-08-04T12:28:15.159167Z","startedAt":"2022-08-04T12:28:15.161996Z","finishedAt":"2022-08-04T12:28:15.163188Z"}
{"uid":5,"indexUid":"movie","status":"failed","type":"indexDeletion","canceledBy":null,"details.deletedDocuments":0,"error.message":"Index `movie` not found.","error.code":"index_not_found","error.type":"invalid_request","error.link":"https://docs.meilisearch.com/errors#index_not_found","duration":"PT0.001192S","enqueuedAt":"2022-08-04T12:28:15.159167Z","startedAt":"2022-08-04T12:28:15.161996Z","finishedAt":"2022-08-04T12:28:15.163188Z"}
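On the receiving end, the payload can be read one line at a time. The following Python sketch shows one way to parse it and to check the authorization value. It assumes the value you configured arrives in the request's Authorization header and that your web framework hands you the body as text. The function names are hypothetical.

```python
import hmac
import json


def parse_finished_tasks(body):
    """Parse a JSON Lines payload: one JSON object per non-empty line."""
    return [json.loads(line) for line in body.split("\n") if line.strip()]


def is_authorized(header_value, expected):
    """Compare the received header with the configured one in constant time."""
    if header_value is None:
        return False
    return hmac.compare_digest(
        header_value.encode("utf-8"), expected.encode("utf-8")
    )


if __name__ == "__main__":
    # Shortened sample in the same shape as the payload above.
    payload = (
        '{"uid":4,"indexUid":"movie","status":"failed","type":"indexDeletion"}\n'
        '{"uid":5,"indexUid":"movie","status":"failed","type":"indexDeletion"}\n'
    )
    for task in parse_finished_tasks(payload):
        print(task["uid"], task["status"])
```

Parsing line by line means a single malformed line raises an error on its own, which you can catch and log without discarding the rest of the batch.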

Experimental feature: limit the number of batched tasks

To speed up the indexing process, Meilisearch processes similar tasks in large batches. However, an excessive number of queued tasks can occasionally cause crashes or stalls.

To control the number of batched tasks, set the limit at launch using either the command-line argument --experimental-max-number-of-batched-tasks, the MEILI_EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS environment variable, or the configuration file.

Contributors shout-out

We are really grateful to all the community members who participated in this release. We would like to thank @Karribalu and @vivek-26 for their help with Meilisearch. We also want to send a special shout-out to our SDK maintainers 🦸

And that’s it for v1.6! This release post highlights the most significant updates. For an exhaustive listing, read the changelog on GitHub.

Stay in the loop on everything Meilisearch by subscribing to the newsletter. To learn more about Meilisearch's future and help shape it, take a look at our roadmap and come participate in our Product Discussions.

For anything else, join our developer community on Discord.