Pezzo provides you with out-of-the-box request caching capabilities. Caching is useful in several scenarios:

  • Your LLM requests are relatively static
  • Your LLM requests take a long time to execute
  • Your LLM requests are expensive

Utilizing caching can sometimes reduce your development costs and execution time by over 90%!


To enable caching, simply set the X-Pezzo-Cache-Enabled: true header. Here is an example:

const response = await openai.chat.completions.create(
  {
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Hello, how are you?" }],
  },
  {
    headers: {
      "X-Pezzo-Cache-Enabled": "true",
    },
  }
);
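If you enable caching on many calls, it can help to build the per-request options in one place. The sketch below is a hypothetical convenience helper (not part of the Pezzo SDK); only the X-Pezzo-Cache-Enabled header name and its "true" value come from this guide:

```typescript
// Hypothetical helper: merges the Pezzo cache header into an existing
// headers object, returning a per-request options object.
function withPezzoCache(
  headers: Record<string, string> = {}
): { headers: Record<string, string> } {
  return {
    headers: { ...headers, "X-Pezzo-Cache-Enabled": "true" },
  };
}

// Usage with the OpenAI client's per-request options:
// const response = await openai.chat.completions.create(params, withPezzoCache());
```

Because the header is merged last, it is applied even when you pass other headers of your own.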

Cached Requests in the Console

Cached requests will be marked in the Requests tab in the Pezzo Console:

When inspecting requests, you will see whether cache was enabled, and whether there was a cache hit or miss:


Requests will be cached for 3 days by default. This is currently not configurable.