Pezzo provides out-of-the-box request caching. Caching is useful in several scenarios:

  • Your LLM requests are relatively static
  • Your LLM requests take a long time to execute
  • Your LLM requests are expensive

Caching can sometimes reduce your development costs and execution time by over 90%!

Usage

To enable caching, simply set the X-Pezzo-Cache-Enabled: true header. Here is an example:

const response = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [
    {
      role: "user",
      content: "Hello, how are you?"
    }
  ]
}, {
  headers: {
    "X-Pezzo-Cache-Enabled": "true",
  }
});
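
If you enable caching on many calls, you can factor the header into a small request-options helper. The following is a minimal sketch; withPezzoCache is a hypothetical helper name, not part of the Pezzo SDK:

// Hypothetical helper: builds per-request options that enable Pezzo caching.
// Assumes `openai` is an OpenAI client already instrumented by Pezzo.
const withPezzoCache = () => ({
  headers: {
    "X-Pezzo-Cache-Enabled": "true",
  },
});

const cachedResponse = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "What is the capital of France?" }]
}, withPezzoCache());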

Cached Requests in the Console

Cached requests will be marked in the Requests tab in the Pezzo Console:

When inspecting a request, you will see whether caching was enabled, and whether there was a cache hit or miss:

Limitations

Requests are cached for 3 days. This duration is currently not configurable.
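
If the 3-day TTL is too long for some of your requests, note that caching is opt-in per request: omit the header on calls that must always return a fresh completion. Below is a minimal sketch of enabling caching conditionally, assuming a Node environment; the NODE_ENV check is an illustrative convention, not a Pezzo requirement:

// Cache aggressively during development, always fetch fresh in production.
const cacheEnabled = process.env.NODE_ENV !== "production";

const response = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Summarize today's news." }]
}, {
  // Omitting the header leaves caching disabled for this request.
  headers: cacheEnabled ? { "X-Pezzo-Cache-Enabled": "true" } : {},
});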