Semantic search with the inference API
Semantic search helps you find data based on the intent and contextual meaning of a search query, instead of a match on query terms (lexical search).
In this tutorial, learn how to use the inference API workflow with various services to perform semantic search on your data.
Amazon Bedrock <amazon-bedrock.html>
Azure AI Studio <azure-ai-studio.html>
Azure OpenAI <azure-openai.html>
Cohere <cohere.html>
ELSER <elser.html>
HuggingFace
Mistral
OpenAI
Cohere
The examples in this tutorial use Cohere's embed-english-v3.0 model. You can use any Cohere model, as they are all supported by the inference API.
Requirements
A Cohere account is required to use the inference API with the Cohere service.
Create an inference endpoint
Create an inference endpoint by using the Create inference API:
PUT _inference/text_embedding/cohere_embeddings
{
  "service": "cohere",
  "service_settings": {
    "api_key": "<api_key>",
    "model_id": "embed-english-v3.0",
    "embedding_type": "byte"
  }
}
- The task type is text_embedding in the path, and the inference_id, the unique identifier of the inference endpoint, is cohere_embeddings.
- The API key of your Cohere account. You can find your API keys in your Cohere dashboard under the API keys section. You need to provide your API key only once. The Get inference API does not return your API key (see the verification example after this list).
- The name of the embedding model to use. You can find the list of Cohere embedding models here.
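To verify that the endpoint was created, you can retrieve it with the Get inference API; as noted above, the response describes the endpoint configuration but does not include your API key. A minimal example:
GET _inference/text_embedding/cohere_embeddings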
Note
When using this model, the recommended similarity measure to use in the dense_vector field mapping is dot_product. In the case of Cohere models, the embeddings are normalized to unit length, in which case the dot_product and the cosine measures are equivalent.
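If you prefer to apply that recommendation explicitly rather than rely on the default similarity, the embedding field in the index mapping can declare it. The following is a minimal sketch only; the index name cohere-embeddings-example is a placeholder, and the mapping created in the next step leaves the similarity parameter unset:
PUT cohere-embeddings-example
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "element_type": "byte",
        "similarity": "dot_product"
      },
      "content": {
        "type": "text"
      }
    }
  }
}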
Create the index mapping
The mapping of the destination index (the index that will contain the embeddings the model creates from your input text) must be created. To index the output of the model, the destination index must have a field with the dense_vector field type for most models, or the sparse_vector field type for sparse vector models such as the one used by the elser service.
PUT cohere-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "element_type": "byte"
      },
      "content": {
        "type": "text"
      }
    }
  }
}
- The name of the field to contain the generated tokens. It must be referenced in the inference pipeline configuration in the next step (a sketch follows this list).
- The field to contain the tokens is a dense_vector field.
- The output dimensions of the model. Find this value in the Cohere documentation of the model you use.
- The name of the field from which to create the dense vector representation. In this example, the name of the field is content. It must be referenced in the inference pipeline configuration in the next step.
- The field type, which is text in this example.
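The next step wires these pieces together in an ingest pipeline that uses an inference processor: it sends the content field to the cohere_embeddings endpoint and writes the result into content_embedding. The following is a minimal sketch of such a pipeline; it assumes the inference processor's input_output option for mapping the input and output fields, and the pipeline name cohere_embeddings_pipeline is a placeholder:
PUT _ingest/pipeline/cohere_embeddings_pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "cohere_embeddings",
        "input_output": {
          "input_field": "content",
          "output_field": "content_embedding"
        }
      }
    }
  ]
}
Here, model_id is the inference_id of the endpoint created earlier (cohere_embeddings).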