Meilisearch Cloud Updates

Meilisearch 1.10.1

Tue, 03 Sep 2024 16:47:51 GMT

Meilisearch 1.10.1 improves search under heavy loads and speeds up document deletion.

Meilisearch 1.10

Wed, 28 Aug 2024 09:03:52 GMT

Meilisearch 1.10 introduces federated search and locale settings, and paves the way for AI-powered search stabilization.

Meilisearch v1.8.3

Thu, 20 Jun 2024 08:54:41 GMT

Meilisearch 1.8.3 fixes a bug that can lead to memory leaks.

Meilisearch v1.8.2

Tue, 11 Jun 2024 10:38:58 GMT

Meilisearch 1.8.2 fixes a bug causing freezes when receiving many concurrent search requests.

Meilisearch 1.8.1

Wed, 22 May 2024 15:22:31 GMT

This patch release includes a fix for geo search users.

Upgrade your Meilisearch version from your project’s settings page.

Meilisearch 1.8

Mon, 13 May 2024 13:53:09 GMT

Meilisearch 1.8 brings negative keyword search, improvements in search robustness and AI search, including new embedders.

Meilisearch 1.7

Mon, 18 Mar 2024 12:12:14 GMT

Meilisearch 1.7 stabilizes ranking score details, adds GPU support for Hugging Face embeddings, and integrates the latest OpenAI embedding models.

Meilisearch v1.6

Tue, 16 Jan 2024 09:00:00 GMT

We’re announcing the release of Meilisearch 1.6. Let’s dive into some of the most important changes. You can also view the full changelog on GitHub.

Experimental feature: hybrid search

Meilisearch introduces hybrid search. It combines full-text and semantic search to enhance the accuracy and comprehensiveness of search results. Picture a movie app like where2watch. Now, your users will be able to find movies whose title they can't quite recall but whose story they remember.

Furthermore, Meilisearch now streamlines the creation of vector embeddings. Choose your preferred embedders and Meilisearch will handle all interactions with external tools for you.

Configuring embedders

You can configure the embedders in your index settings. Select from three types of embedders for your needs:

openAi:

Uses the OpenAI API for computing embeddings
Requires an OpenAI API key for operation

huggingFace:

Enables local computation of embeddings by downloading models from the HuggingFace Hub
Operates on your CPU, not your GPU, which may impact indexing performance

userProvided:

Functions similarly to Meilisearch v1.3, with a key difference: you must define a specific embedder
Allows you to add pre-computed embeddings into your documents. You perform searches using vectors instead of text.

To use hybrid search, define at least one embedder in the index settings:

{
  "embedders": {
    "default": {
      "source":  "openAi",
      "apiKey": "",
      "model": "text-embedding-ada-002",
      "documentTemplate": "A movie titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
    },
    "image": {
      "source": "userProvided",
      "dimensions": 512
    },
    "translation": {
      "source": "huggingFace",
      "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
      "documentTemplate": "A movie titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
    }
  }
}

The documentTemplate field acts as a blueprint for creating your document's embedding. It uses the Liquid template language. While its inclusion is optional, it is highly recommended, especially since embedding models are optimized for concise texts. It keeps only the necessary content, excluding non-essential data like id, and helps in adding context to increase relevancy.
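To get a feel for the text an embedder would receive, the sketch below approximates the template from the example above in plain Python. It is not a Liquid engine: `truncate_words` and `render_movie_template` are hypothetical helpers that mimic only the `truncatewords` filter and the two fields used in that template, and the sample movie is made up.

```python
def truncate_words(text: str, limit: int) -> str:
    """Approximate Liquid's truncatewords: keep `limit` words, add an ellipsis if cut."""
    words = text.split()
    if len(words) <= limit:
        return " ".join(words)
    return " ".join(words[:limit]) + "..."


def render_movie_template(doc: dict) -> str:
    """Hand-written equivalent of the example documentTemplate (illustrative only)."""
    title = doc.get("title", "")
    overview = truncate_words(doc.get("overview", ""), 20)
    return f"A movie titled '{title}' whose description starts with {overview}"


movie = {
    "id": 42,  # ignored: the template only references title and overview
    "title": "Jurassic Park",
    "overview": "A wealthy entrepreneur secretly creates a theme park featuring living dinosaurs.",
}
print(render_movie_template(movie))
```

Fields that the template does not mention, such as `id`, never reach the embedder, which is how the template keeps the embedded text short and focused.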

Hybrid search

To perform a hybrid search, use the hybrid field within the POST /indexes/:index_uid/search route.

{
    "q": "Plumbers and dinosaurs",
    "hybrid": {
        "semanticRatio": 0.9,
        "embedder": "default"
    }
}

embedder: an embedder from the options configured in your index settings.

semanticRatio: a floating-point value ranging from 0 to 1. A value of 1 is a fully semantic search, 0 is a full-text search focused on exact matches, and the default of 0.5 mixes both methods.

Your control over the semantic ratio directly influences how search results are ranked. A higher semantic ratio shifts the focus towards the context and meaning behind your query, ranking results that are more semantically relevant higher.

On the other hand, a lower semantic ratio increases the weight given to keyword accuracy in the ranking process, bringing results that closely match your specific search terms to the forefront.
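As a rough sketch, the request body can be assembled and validated in Python before it is sent. The helper names (`build_hybrid_search`, `build_search_request`), the `movies` index, and the local host URL are assumptions made for illustration; the search route is assumed to live under `/indexes/`. The request is prepared but not sent.

```python
import json
import urllib.request


def build_hybrid_search(query: str, embedder: str = "default", semantic_ratio: float = 0.5) -> dict:
    """Build the JSON body for a hybrid search, validating semanticRatio."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semanticRatio must be between 0 and 1")
    return {"q": query, "hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder}}


def build_search_request(host: str, index_uid: str, body: dict, api_key: str = "") -> urllib.request.Request:
    """Prepare (but do not send) the POST request to the search route."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=f"{host}/indexes/{index_uid}/search",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


body = build_hybrid_search("Plumbers and dinosaurs", semantic_ratio=0.9)
request = build_search_request("http://localhost:7700", "movies", body)
print(request.get_method(), request.full_url)
print(json.dumps(body))
# To actually run the search: urllib.request.urlopen(request)
```

Rejecting out-of-range ratios on the client side is a design choice of this sketch; it simply surfaces mistakes before a round trip to the server.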

Breaking changes in the experimental vector search API

Meilisearch v1.6 introduces some breaking changes in the vector search API.

Previously, you could send vectors without specifying an embedder. Now, you must define an embedder in the settings:

"embedders": {
    "default": {
      "source": "userProvided",
      "dimensions": 512
    }
}

Because Meilisearch now supports multiple embedders, it has updated the vector submission format from arrays to JSON objects.

Previous format: "_vectors": [[0.0, 0.1]]
New format: "_vectors": {"image2text": [0.0, 0.1, …]}
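If you have documents in the previous format, a small script can rewrite them before re-indexing. The following is a minimal sketch: `migrate_vectors` is a hypothetical helper, and it assumes that a single nested vector should be flattened as in the example above, while longer lists are passed through unchanged.

```python
def migrate_vectors(document: dict, embedder: str) -> dict:
    """Return a copy of `document` with an array-style `_vectors` moved under `embedder`."""
    vectors = document.get("_vectors")
    if not isinstance(vectors, list):
        return dict(document)  # nothing to migrate (missing or already an object)
    # A single nested vector such as [[0.0, 0.1]] is flattened to [0.0, 0.1].
    if len(vectors) == 1 and isinstance(vectors[0], list):
        vectors = vectors[0]
    migrated = dict(document)
    migrated["_vectors"] = {embedder: vectors}
    return migrated


old_doc = {"id": 1, "title": "Jaws", "_vectors": [[0.0, 0.1]]}
print(migrate_vectors(old_doc, "image2text"))
```

The embedder name passed to the helper must match one of the embedders declared in your index settings.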

For detailed information on these updates, refer to the documentation.

For in-depth technical information, explore the series of articles on Arroy, an open-source repository based on Spotify’s Annoy and developed in Rust. This library, created and maintained by the Meilisearch engine team, specializes in searching for vectors in a space that are near a specified query vector.

Performance optimization

Improved indexing speed

We're thrilled to share a major enhancement in Meilisearch's indexing performance. Our recent tests, including scenarios with frequent and partial document updates, have shown impressive results: indexing time typically dropped by up to 50%, and in some cases by as much as 75%.

Thanks to our latest optimizations, Meilisearch now stores and pre-computes less data. Additionally, during document updates, it re-indexes or deletes only the necessary data. For instance, in an e-commerce dataset, updating the stock level of a product results in re-indexing just the 'stock' field, rather than the entire product document.

Disk space usage reduction

Meilisearch reduces internal data storage, leading to a more compact database size on your disk. With a dataset of approximately 15 MB, we observed a 40% to 50% reduction in database size.

This enhancement not only reduces the database size, but also improves its stability, making the space savings more evident as the number of documents increases.

New feature: customize proximity precision

To further reduce indexing time, Meilisearch now allows you to tailor the accuracy of the proximity ranking rule to your specific needs.

The proximity ranking rule is computationally demanding and may lead to longer indexing times. Reducing its accuracy can greatly enhance performance, and in most scenarios, it will not substantially affect the relevancy of the results.

To adjust its impact, configure the proximityPrecision setting:

curl \
  -X PATCH 'http://localhost:7700/indexes/books/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "proximityPrecision": "byAttribute"
  }'

The default proximityPrecision setting is byWord, which calculates proximity based on exact word distances.

The byAttribute setting considers words in the same attribute as proximate, regardless of their exact distance.

Using byAttribute can boost the indexing speed, but it might slightly change how relevant the results are. This becomes more noticeable in searches where it's important for words to be close to each other.

For example, when you're looking through song lyrics or long articles, like trying to find 'world war' in a bunch of Wikipedia pages, you might end up with results that contain these words but not necessarily close together or in the desired order. This is also true for phrase searches and for searches involving multi-word synonyms, where the specific combination of words is crucial.
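The difference between the two settings can be pictured with a toy model. The code below is not Meilisearch's ranking implementation; `by_word_distance` and `by_attribute_match` are hypothetical functions that only illustrate why byAttribute cannot tell a tight match from a loose one.

```python
from typing import Optional


def by_word_distance(text: str, first: str, second: str) -> Optional[int]:
    """Smallest number of word positions separating `first` and `second`, or None."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    if not first_positions or not second_positions:
        return None
    return min(abs(a - b) for a in first_positions for b in second_positions)


def by_attribute_match(text: str, first: str, second: str) -> bool:
    """True when both words appear anywhere in the same attribute."""
    words = set(text.lower().split())
    return first in words and second in words


close = "the second world war began in 1939"
far = "war reporters travel the world"

for text in (close, far):
    print(by_word_distance(text, "world", "war"), by_attribute_match(text, "world", "war"))
```

In this toy model the word-level distance separates the two texts, while the attribute-level check treats them as equally good matches for 'world war'.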

New feature: task queue webhook

Meilisearch now offers a webhook feature to notify a custom URL when an asynchronous task finishes (either succeeds, fails, or gets canceled).

This feature is particularly useful for streamlining workflows, saving you from polling the tasks route.

Set up your webhook at launch using these environment variables:

MEILI_TASK_WEBHOOK_URL=https://mywebsite.com/my-super-webhook?user=1234&number=8

MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER='Bearer 12340987546wowowlolol'

You can also use the respective command-line options.

Once set up, the webhook sends out a payload in JSON Lines (ndjson) format to your specified URL, containing the list of finished tasks:

//POST HTTP request to https://myproject.com/mywebhook?common=people

{"uid":4,"indexUid":"movie","status":"failed","type":"indexDeletion","canceledBy":null,"details.deletedDocuments":0,"error.message":"Index `movie` not found.","error.code":"index_not_found","error.type":"invalid_request","error.link":"https://docs.meilisearch.com/errors#index_not_found","duration":"PT0.001192S","enqueuedAt":"2022-08-04T12:28:15.159167Z","startedAt":"2022-08-04T12:28:15.161996Z","finishedAt":"2022-08-04T12:28:15.163188Z"}
{"uid":5,"indexUid":"movie","status":"failed","type":"indexDeletion","canceledBy":null,"details.deletedDocuments":0,"error.message":"Index `movie` not found.","error.code":"index_not_found","error.type":"invalid_request","error.link":"https://docs.meilisearch.com/errors#index_not_found","duration":"PT0.001192S","enqueuedAt":"2022-08-04T12:28:15.159167Z","startedAt":"2022-08-04T12:28:15.161996Z","finishedAt":"2022-08-04T12:28:15.163188Z"}
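On the receiving side, the body can be read one line at a time. Here is a minimal sketch of that parsing step, assuming the payload arrives as a UTF-8 string; `parse_webhook_payload` and `failed_tasks` are hypothetical helper names, and the sample payload is shortened for readability.

```python
import json


def parse_webhook_payload(payload: str) -> list[dict]:
    """Parse a JSON Lines body into a list of task dictionaries."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def failed_tasks(tasks: list[dict]) -> list[dict]:
    """Keep only the tasks whose status is 'failed'."""
    return [task for task in tasks if task.get("status") == "failed"]


sample = (
    '{"uid":4,"indexUid":"movie","status":"failed","type":"indexDeletion"}\n'
    '{"uid":5,"indexUid":"movie","status":"succeeded","type":"documentAdditionOrUpdate"}\n'
)
tasks = parse_webhook_payload(sample)
print([task["uid"] for task in failed_tasks(tasks)])
```

Because each line is a complete JSON object, a handler can process tasks incrementally instead of loading one large array.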

Experimental feature: limit the number of batched tasks

To speed up the indexing process, Meilisearch processes similar tasks in large batches. However, excessive queued tasks can occasionally cause crashes or stalls.

To control the number of batched tasks, set the limit at launch using either the command-line argument --experimental-max-number-of-batched-tasks, the MEILI_EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS environment variable, or the configuration file.

Contributors shout-out

We are really grateful for all the community members who participated in this release. We would like to thank @Karribalu and @vivek-26 for their help with Meilisearch. We also want to send a special shout-out to our SDK maintainers 🦸

And that’s it for v1.6! This release post highlights the most significant updates. For an exhaustive listing, read the changelog on GitHub.

Stay in the loop of everything Meilisearch by subscribing to the newsletter. To learn more about Meilisearch's future and help shape it, take a look at our roadmap and come participate in our Product Discussions.

For anything else, join our developer community on Discord.

What's new in v1.5?

Tue, 21 Nov 2023 15:31:18 GMT

We’re excited to announce the release of Meilisearch 1.5. 🚀

This release brings:

Up to 25% faster indexing
On-demand snapshots
Puffin report exports

Learn more in the release notes. 👇

What's new in v1.4?

Tue, 26 Sep 2023 08:34:05 GMT

Let's take a look at some of the most significant changes in Meilisearch's latest update. We’ll go over the main changes in this article, but you can also view the full changelog on GitHub.

New feature: custom text separators

To make string data searchable, Meilisearch relies on separators, as they serve to divide a string into tokens or words. Examples of separators include whitespaces, full-stops, or number signs (#). They play a crucial role in helping Meilisearch segment text effectively and enhance search relevancy.

Meilisearch comes with a predefined list of separators. However, these separators do not suit all use cases. For example, in a hashtag search, the number sign should not be considered a separator, but part of the word.

Starting with v1.4, Meilisearch allows you to customize the separators list to suit your specific needs.

You can configure Separator Tokens from your index’s settings. Both fields accept JSON format. To include separators, add them to the 'separators' field. To exclude certain characters as separators, list them under 'non-separators'.
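To illustrate the effect of both lists, here is a toy tokenizer. It is a simplified sketch, not Meilisearch's tokenizer: `DEFAULT_SEPARATORS` is a tiny made-up subset of the predefined list, and `tokenize` is a hypothetical function.

```python
import re

# Tiny illustrative subset; the real predefined list is much longer.
DEFAULT_SEPARATORS = {" ", ".", ",", "#"}


def tokenize(text: str, separators=(), non_separators=()) -> list[str]:
    """Split `text` on the default separators, plus `separators`, minus `non_separators`."""
    active = (DEFAULT_SEPARATORS | set(separators)) - set(non_separators)
    if not active:
        return [text] if text else []
    # Longest separators first so multi-character ones win over their prefixes.
    pattern = "|".join(re.escape(s) for s in sorted(active, key=len, reverse=True))
    return [token for token in re.split(pattern, text) if token]


print(tokenize("learn #rust today"))
print(tokenize("learn #rust today", non_separators=["#"]))
print(tokenize("a&sepb", separators=["&sep"]))
```

With the number sign listed as a non-separator, `#rust` survives as a single token, which is the behavior a hashtag search needs.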

New feature: custom dictionary

Now, you can add a custom word dictionary to improve Meilisearch's segmentation of specific words. This is particularly useful when working with domain-specific terminology like “Node.js” or proper nouns like “E. E. Cummings”.

You can configure a Custom Dictionary by providing a JSON array in your index’s settings. You can also upload a JSON file as a dictionary.

Using custom dictionary with stop words and synonyms

The introduction of a custom dictionary acts as a powerful adjunct to existing features like stopWords and synonyms. Together, they synergize to improve the relevancy of the search results.

Let’s consider a literary database where an author's name may appear in various forms or abbreviations. This leads to fragmented search results, making it challenging for users to find works by that specific author. For example, take the different ways one might search for works by E. E. Cummings. Using the custom dictionary feature alongside synonyms can standardize these name variations, thereby enhancing the relevancy of search results.

Using synonyms and custom dictionary together, here’s an example of index settings that address this scenario:

{
  "dictionary": ["E. E.", "E.E.", "E E"],
  "synonyms": {
    "E. E.": ["E.E.", "E E", "Edward Estlin"],
    "E.E.": ["E. E.", "E E", "Edward Estlin"],
    "E E": ["E. E.", "E.E.", "Edward Estlin"],
    "Edward Estlin": ["E. E.", "E.E.", "E E"]
  }
}
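Writing every pairing by hand gets tedious as the number of variants grows. One possible way to generate the same settings is sketched below; `mutual_synonyms` is a hypothetical helper, not part of any Meilisearch SDK.

```python
import json


def mutual_synonyms(variants: list[str]) -> dict[str, list[str]]:
    """Map every variant to all the other variants, preserving input order."""
    return {v: [other for other in variants if other != v] for v in variants}


abbreviations = ["E. E.", "E.E.", "E E"]
settings = {
    "dictionary": abbreviations,
    "synonyms": mutual_synonyms(abbreviations + ["Edward Estlin"]),
}
print(json.dumps(settings, indent=2))
```

Only the abbreviated forms go into the dictionary, since those are the strings that would otherwise be split on periods or spaces.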

Breaking bug fix: improved filtering with backslashes

In v1.4, we've addressed a longstanding bug that users faced when using backslashes (\) at the end of the filter search parameter expression.

Let’s consider the following documents:

[
  {
    "id": 1,
    "path": "my\\test\\path"
  },
  {
    "id": 2,
    "path": "my\\test\\path\\"
  }
]

Note: The double backslashes in the example are for JSON escaping.

Before v1.4.0, trying to filter on the second document using either of the filters path = "my\\test\\path\\" or path = "my\\test\\path\\\" would lead to errors.

Now, you can use any filter expression with backslashes. Just make sure to escape each \ character in your filter.

Using our example, to filter on the second document successfully, the filter should be written as: path = "my\\\\test\\\\path\\\\".

⚠️ Warning: If you're upgrading from v1.3.X or earlier and have previously used backslashes in filters, be aware that in v1.4.0, the correct filter for the first document should be path = "my\\\\test\\\\path".

Two layers of escaping are applied: one for the Meilisearch filter and one for JSON. When the request is processed, JSON parsing turns \\\\ back into \\, and the Meilisearch filter parser then reduces it to a single \.

💡 Consider using built-in methods of your programming language to handle backslashes:
- PHP: addslashes() function
- JavaScript: JS doesn't have a specific method for adding slashes, but you can use the replace method, as suggested on Stack Overflow
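In Python, the same idea can be sketched with `str.replace` for the filter layer and `json.dumps` for the JSON layer. `escape_filter_value` is a hypothetical helper; it handles backslashes only, which is all this example needs.

```python
import json


def escape_filter_value(value: str) -> str:
    """Filter layer: double every backslash for the Meilisearch filter syntax."""
    return value.replace("\\", "\\\\")


# Python source also escapes backslashes: the actual string is my\test\path\
stored_path = "my\\test\\path\\"

filter_expression = f'path = "{escape_filter_value(stored_path)}"'
request_body = json.dumps({"filter": filter_expression})  # JSON layer

print(filter_expression)
print(request_body)
```

Letting the JSON serializer apply the second layer avoids counting backslashes by hand, which is where most mistakes creep in.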

Contributors

We are really grateful for all the community members who participated in this release. We would like to thank @dogukanakkaya, @JannisK89, and @vivek-26 for their help with Meilisearch. We want to send a special shout-out to mmachatschek for his help and involvement with the backslash bug.

Conclusion

And that’s it for v1.4! Remember to check the changelog for the full release notes, and see you next time!

You can stay in the loop by subscribing to our newsletter. To learn more about Meilisearch's future and help shape it, take a look at our roadmap and come participate in our Product Discussions.

For anything else, join our developer community on Discord.