# Cohere

## API KEYS

```python
import os

os.environ["COHERE_API_KEY"] = ""
```
## Usage

### LiteLLM Python SDK

#### Cohere v1 API (Default)
```python
import os
from litellm import completion

## set ENV variables
os.environ["COHERE_API_KEY"] = "cohere key"

# cohere v1 call
response = completion(
    model="command-r",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)
```
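LiteLLM normalizes provider responses to the OpenAI format; assuming that shape, the reply text can be read from the first choice:

```python
# The response follows the OpenAI chat-completion shape,
# so the generated text lives on the first choice's message.
print(response.choices[0].message.content)
```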
#### Cohere v2 API

To use the Cohere v2 `/chat` API, prefix your model name with `cohere_chat/v2/`:
```python
import os
from litellm import completion

## set ENV variables
os.environ["COHERE_API_KEY"] = "cohere key"

# cohere v2 call
response = completion(
    model="cohere_chat/v2/command-r-plus",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)
```
#### Streaming

**Cohere v1 Streaming:**
```python
import os
from litellm import completion

## set ENV variables
os.environ["COHERE_API_KEY"] = "cohere key"

# cohere v1 streaming
response = completion(
    model="command-r",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)

for chunk in response:
    print(chunk)
```
**Cohere v2 Streaming:**
```python
import os
from litellm import completion

## set ENV variables
os.environ["COHERE_API_KEY"] = "cohere key"

# cohere v2 streaming
response = completion(
    model="cohere_chat/v2/command-r-plus",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)

for chunk in response:
    print(chunk)
```
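Instead of printing raw chunks, you can assemble the streamed reply. A minimal sketch, assuming each chunk follows the OpenAI-style delta format LiteLLM returns:

```python
# Accumulate the text deltas from each streamed chunk.
full_reply = ""
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        full_reply += delta

print(full_reply)
```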
## Usage with LiteLLM Proxy

Here's how to call Cohere with the LiteLLM Proxy Server.
### 1. Save key in your environment

```shell
export COHERE_API_KEY="your-api-key"
```
### 2. Start the proxy

Define the Cohere models you want to use in your `config.yaml`.
For Cohere v1 models:

```yaml
model_list:
  - model_name: command-a-03-2025
    litellm_params:
      model: command-a-03-2025
      api_key: "os.environ/COHERE_API_KEY"
```
For Cohere v2 models:

```yaml
model_list:
  - model_name: command-r-plus-v2
    litellm_params:
      model: cohere_chat/v2/command-r-plus
      api_key: "os.environ/COHERE_API_KEY"
```
```shell
litellm --config /path/to/config.yaml
```
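Once the proxy is running, you can optionally confirm that the models from `config.yaml` are registered. A minimal sketch using the OpenAI SDK's `models.list()` against the proxy's OpenAI-compatible endpoint (`sk-1234` is a placeholder for your proxy key):

```python
import openai

# Point the OpenAI client at the local LiteLLM proxy.
client = openai.OpenAI(
    api_key="sk-1234",  # placeholder: use your LiteLLM proxy key
    base_url="http://0.0.0.0:4000",
)

# Print the model names registered from config.yaml.
for model in client.models.list():
    print(model.id)
```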
### 3. Test it
**Cohere v1 - Curl Request:**

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <your-litellm-api-key>' \
--data '{
    "model": "command-a-03-2025",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
**Cohere v2 - Curl Request:**

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <your-litellm-api-key>' \
--data '{
    "model": "command-r-plus-v2",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
**Cohere v1 - OpenAI SDK:**

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to cohere v1 model
response = client.chat.completions.create(
    model="command-a-03-2025",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
)

print(response)
```
**Cohere v2 - OpenAI SDK:**

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to cohere v2 model
response = client.chat.completions.create(
    model="command-r-plus-v2",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
)

print(response)
```
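Streaming also works through the proxy. A minimal sketch, assuming the same OpenAI-compatible endpoint and the `command-r-plus-v2` model defined in `config.yaml` above:

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# stream a response from the proxied cohere v2 model
stream = client.chat.completions.create(
    model="command-r-plus-v2",
    messages=[{"role": "user", "content": "write a short poem"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```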
## Supported Models

| Model Name | Function Call |
|---|---|
| command-a-03-2025 | `litellm.completion('command-a-03-2025', messages)` |
| command-r-plus-08-2024 | `litellm.completion('command-r-plus-08-2024', messages)` |
| command-r-08-2024 | `litellm.completion('command-r-08-2024', messages)` |
| command-r-plus | `litellm.completion('command-r-plus', messages)` |
| command-r | `litellm.completion('command-r', messages)` |
| command-light | `litellm.completion('command-light', messages)` |
| command-nightly | `litellm.completion('command-nightly', messages)` |
## Embedding

```python
import os
from litellm import embedding

os.environ["COHERE_API_KEY"] = "cohere key"

# cohere call
response = embedding(
    model="embed-english-v3.0",
    input=["good morning from litellm", "this is another item"],
)
```
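The return value follows the OpenAI embedding format; assuming that shape, each vector can be read from `response.data`:

```python
# Each item in response.data carries one embedding vector
# (OpenAI embedding format; the item shape is assumed here).
for item in response.data:
    vector = item["embedding"]
    print(len(vector))  # embedding dimensionality
```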
### Setting - Input Type for v3 models

v3 models have a required parameter: `input_type`. LiteLLM defaults to `search_document`. It can be one of the following four values:

- `input_type="search_document"`: (default) Use this for texts (documents) you want to store in your vector database
- `input_type="search_query"`: Use this for search queries to find the most relevant documents in your vector database
- `input_type="classification"`: Use this if you use the embeddings as an input for a classification system
- `input_type="clustering"`: Use this if you use the embeddings for text clustering
https://txt.cohere.com/introducing-embed-v3/
```python
import os
from litellm import embedding

os.environ["COHERE_API_KEY"] = "cohere key"

# cohere call
response = embedding(
    model="embed-english-v3.0",
    input=["good morning from litellm", "this is another item"],
    input_type="search_document",
)
```
### Supported Embedding Models

| Model Name | Function Call |
|---|---|
| embed-english-v3.0 | `embedding(model="embed-english-v3.0", input=["good morning from litellm", "this is another item"])` |
| embed-english-light-v3.0 | `embedding(model="embed-english-light-v3.0", input=["good morning from litellm", "this is another item"])` |
| embed-multilingual-v3.0 | `embedding(model="embed-multilingual-v3.0", input=["good morning from litellm", "this is another item"])` |
| embed-multilingual-light-v3.0 | `embedding(model="embed-multilingual-light-v3.0", input=["good morning from litellm", "this is another item"])` |
| embed-english-v2.0 | `embedding(model="embed-english-v2.0", input=["good morning from litellm", "this is another item"])` |
| embed-english-light-v2.0 | `embedding(model="embed-english-light-v2.0", input=["good morning from litellm", "this is another item"])` |
| embed-multilingual-v2.0 | `embedding(model="embed-multilingual-v2.0", input=["good morning from litellm", "this is another item"])` |
## Rerank

### Usage

LiteLLM supports the v1 and v2 clients for Cohere rerank. By default, the rerank endpoint uses the v2 client, but you can use the v1 client by explicitly calling `v1/rerank`.
**LiteLLM SDK Usage:**
```python
import os
from litellm import rerank

os.environ["COHERE_API_KEY"] = "sk-.."

query = "What is the capital of the United States?"
documents = [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Washington, D.C. is the capital of the United States.",
    "Capital punishment has existed in the United States since before it was a country.",
]

response = rerank(
    model="cohere/rerank-english-v3.0",
    query=query,
    documents=documents,
    top_n=3,
)

print(response)
```
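The rerank response carries a ranked `results` list; assuming each entry holds an `index` into the original `documents` list and a `relevance_score`, you can map the scores back to the documents like this:

```python
# Map each ranked result back to its source document.
for result in response.results:
    idx = result["index"]
    print(f"{result['relevance_score']:.3f}  {documents[idx]}")
```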
**LiteLLM Proxy Usage:**

LiteLLM provides a Cohere-API-compatible `/rerank` endpoint for rerank calls.

**Setup**

Add this to your LiteLLM Proxy `config.yaml`:
```yaml
model_list:
  - model_name: Salesforce/Llama-Rank-V1
    litellm_params:
      model: together_ai/Salesforce/Llama-Rank-V1
      api_key: os.environ/TOGETHERAI_API_KEY
  - model_name: rerank-english-v3.0
    litellm_params:
      model: cohere/rerank-english-v3.0
      api_key: os.environ/COHERE_API_KEY
```
Start litellm:

```shell
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```
Test request:
```shell
curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rerank-english-v3.0",
    "query": "What is the capital of the United States?",
    "documents": [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. is the capital of the United States.",
        "Capital punishment has existed in the United States since before it was a country."
    ],
    "top_n": 3
  }'
```