Documentation Index
Fetch the complete documentation index at: https://docs.pezzo.ai/llms.txt
Use this file to discover all available pages before exploring further.
Pezzo provides you with out-of-the-box request caching capabilities. Caching is useful in several scenarios:
- Your LLM requests are relatively static
- Your LLM requests take a long time to execute
- Your LLM requests are expensive
Utilizing caching can sometimes reduce your development costs and execution time by over 90%!
To enable caching, simply set the X-Pezzo-Cache-Enabled: true header. Here is an example:
const response = await openai.chat.completions.create({
model: "gpt-3.5-turbo",
messages: [
{
role: "user",
message: "Hello, how are you?"
}
]
}, {
headers: {
"X-Pezzo-Cache-Enabled": true,
}
});
chat_completion = openai.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{
"role": "user",
"content": "Tell me 5 fun facts about yourself",
}
],
headers={
"X-Pezzo-Cache-Enabled": "true"
}
)
Cached Requests in the Console
Cached requests will will be marked in the Requests tab in the Pezzo Console:
When inspecting requests, you will see whether cache was enabled, and whether there was a cache hit or miss:
Limitations
Requests will be cached for 3 days by default. This is currently not configurable.