Semantic search with the inference API

Semantic search helps you find data based on the intent and contextual meaning of a search query, instead of a match on query terms (lexical search).

In this tutorial, learn how to use the inference API workflow with various services to perform semantic search on your data.

Amazon Bedrock <amazon-bedrock.html>
Azure AI Studio <azure-ai-studio.html>
Azure OpenAI <azure-openai.html>
Cohere <cohere.html>
ELSER <elser.html>
HuggingFace <#>
Mistral <#>
OpenAI <#>


ELSER¶

Requirements¶

ELSER is a model trained by Elastic. If you have an Elasticsearch deployment, there are no further requirements for using the inference API with the elser service.

Create an inference endpoint¶

Create an inference endpoint by using the Create inference API:

Create inference example for `ELSER` ¶
PUT _inference/sparse_embedding/elser_embeddings
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
  • The task type in the path is sparse_embedding, and the inference_id (the unique identifier of the inference endpoint) is elser_embeddings.

You don’t need to download and deploy the ELSER model upfront; the API request above downloads the model if it isn’t already downloaded, then deploys it.
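To confirm that the endpoint was created, you can retrieve it with the Get inference API (the identifier elser_embeddings matches the example above):

Get inference example for `ELSER` ¶
GET _inference/sparse_embedding/elser_embeddings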

Create the index mapping¶

You must create the mapping of the destination index, the index that will contain the embeddings the model creates from your input text. For most models, the destination index needs a field with the dense_vector field type to index the model output; sparse vector models, like the one used by the elser service, need a sparse_vector field instead.

Create index mapping for `ELSER` ¶
PUT elser-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "sparse_vector"
      },
      "content": {
        "type": "text"
      }
    }
  }
}
  • The name of the field that will contain the generated tokens. It must be referenced in the inference pipeline configuration in the next step.
  • The field that contains the tokens is a sparse_vector field for ELSER.
  • The name of the field from which to create the sparse vector representation. In this example, the name of the field is content. It must be referenced in the inference pipeline configuration in the next step.
  • The field type, which is text in this example.
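The callouts above refer to an inference pipeline that connects the two fields. As a sketch of that next step, an ingest pipeline with an inference processor can route the content field through the elser_embeddings endpoint and write the resulting tokens to content_embedding (the pipeline name elser_embeddings_pipeline is chosen here for illustration):

Create ingest pipeline example for `ELSER` ¶
PUT _ingest/pipeline/elser_embeddings_pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "elser_embeddings",
        "input_output": [
          {
            "input_field": "content",
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}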