Fixing litellm.BadRequestError: Invalid Schema for 'series'
Hey guys, it looks like we've got a bug report on our hands! Let's dive into this `litellm.BadRequestError` and see what's causing the issue. There's an invalid schema for the function 'series', popping up in the context of `('properties', 'matches')` with a missing 'items' in an array schema. Sounds like a fun puzzle, right?
Discussion category: kagent-dev,kagent
Prerequisites
Before we get started, the user has helpfully checked off some prerequisites:
- [x] They've searched existing issues to avoid duplicates. Smart move!
- [x] They've agreed to follow our Code of Conduct. We love responsible contributors!
- [x] They're using the latest version of the software. Always a good step.
- [x] They've tried clearing cache/cookies or used incognito mode (if UI-related). Good troubleshooting!
- [x] They can consistently reproduce this issue. Consistency is key for debugging.
Affected Service(s)
- App Service
Impact/Severity
- No impact (Default). That's a relief! It means things aren't completely broken, but we still need to squash this bug.
Bug Description
Okay, here's where the meat of the issue is. The user is encountering a `litellm.BadRequestError` related to an invalid schema for the 'series' function. They've set up a Prometheus MCP server using this YAML:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-mcp-server
  namespace: kagent
  labels:
    app: prometheus-mcp-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-mcp-server
  template:
    metadata:
      labels:
        app: prometheus-mcp-server
    spec:
      containers:
        - name: prometheus-mcp-server
          image: ghcr.io/tjhop/prometheus-mcp-server:v0.3.0
          imagePullPolicy: IfNotPresent
          args:
            [
              "--log.file",
              "/dev/stdout",
              "--log.level",
              "debug",
              "--mcp.transport",
              "http",
              "--prometheus.url",
              "http://kube-prometheus-stack-prometheus.kube-prometheus-stack.svc.cluster.local:9090",
            ]
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"
            requests:
              cpu: "10m"
              memory: "64Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-mcp-server
spec:
  selector:
    app: prometheus-mcp-server
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP
```
They've then added a ToolServer:
```yaml
apiVersion: kagent.dev/v1alpha1
kind: ToolServer
metadata:
  name: prometheus
spec:
  config:
    streamableHttp:
      sseReadTimeout: 5m0s
      timeout: 5s
      url: http://prometheus-mcp-server:80/mcp
    type: streamableHttp
  description: 'Prometheus MCP Server for metrics and monitoring data'
```
And created an agent:
```yaml
apiVersion: kagent.dev/v1alpha1
kind: Agent
metadata:
  name: prometheus
spec:
  description: Agent for working with Prometheus
  memory:
    - kagent/kagent-memory
  modelConfig: default-model-config
  systemMessage: |-
    You're a helpful agent, made by the kagent team.
    # Instructions
    - If user question is unclear, ask for clarification before running any tools
    - Always be helpful and friendly
    - If you don't know how to answer the question DO NOT make things up, tell the user "Sorry, I don't know how to answer that" and ask them to clarify the question further
    - If you are unable to help, or something goes wrong, refer the user to https://kagent.dev for more information or support.
    # Response format:
    - ALWAYS format your response as Markdown
    - Your response will include a summary of actions you took and an explanation of the result
    - If you created any artifacts such as files or resources, you will include those in your response as well
  tools:
    - mcpServer:
        toolNames:
          - exemplar_query
          - list_alerts
          - list_targets
          - metric_metadata
          - query
          - range_query
          - series
          - targets_metadata
          - alertmanagers
        toolServer: kagent/prometheus
      type: McpServer
```
But whenever they ask anything, boom! The `litellm.BadRequestError` rears its ugly head.
The key issue here appears to be the schema validation failing for the 'series' function, specifically pointing to a missing 'items' property in an array schema. This suggests that the way the 'series' tool is defined or how its parameters are being handled is causing a mismatch with what the system expects. We need to dig into the schema definition for the 'series' function and see what's going on.
Steps To Reproduce
To reproduce this, follow these steps:
- Install the MCP server.
- Create the tool server.
- Create the agent.
- Ask the agent something.
Pretty straightforward, which is excellent for debugging!
Expected Behavior
The user expected no error, which is the ideal scenario. We want the agent to respond appropriately to the query, not throw a `BadRequestError`.
Actual Behavior
Instead of a response, the user gets the dreaded `litellm.BadRequestError`. Not cool.
Environment
Unfortunately, the user didn't provide environment details. More info here would be helpful.
CLI Bug Report
No CLI bug report was provided.
Additional Context
No additional context was provided. Every little bit helps, so more context would be great.
Logs
Here are the logs, which are super helpful! Let's break this down:
```
litellm.llms.openai.common_utils.OpenAIError: Error code: 400 - {'error': {'message': "Invalid schema for function 'series': In context=('properties', 'matches'), array schema missing items.", 'type': 'invalid_request_error', 'param': 'tools[7].function.parameters', 'code': 'invalid_function_parameters'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/python/packages/kagent-adk/src/kagent_adk/_agent_executor.py", line 124, in execute
    await self._handle_request(context, event_queue, runner)
  File "/app/python/packages/kagent-adk/src/kagent_adk/_agent_executor.py", line 188, in _handle_request
    async for adk_event in runner.run_async(**run_args):
    ...<4 lines>...
      await event_queue.enqueue_event(a2a_event)
  File "/app/python/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 233, in run_async
    async for event in self._exec_with_plugin(
    ...<2 lines>...
      yield event
  File "/app/python/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 274, in _exec_with_plugin
    async for event in execute_fn(invocation_context):
    ...<6 lines>...
      yield (modified_event if modified_event else event)
  File "/app/python/.venv/lib/python3.13/site-packages/google/adk/runners.py", line 230, in execute
```
The core error message is: `Invalid schema for function 'series': In context=('properties', 'matches'), array schema missing items.`
This is our smoking gun! It clearly points to an issue with the schema definition for the 'series' function, specifically related to an array schema that's missing the 'items' property.
The traceback shows the error originating from `kagent_adk/_agent_executor.py`, which suggests the problem lies within the agent execution logic, particularly when handling function calls. The stack trace indicates that the error occurs during the `run_async` process, further narrowing down the issue to the asynchronous execution of the agent's tasks.
The error's `param` field, `tools[7].function.parameters`, points at the offending entry in the zero-indexed tools array that was actually sent to the model. Note that index 7 in the YAML list above would be `targets_metadata`, not `series`; the order in which tools are serialized for the API doesn't have to match the YAML, and the error message itself explicitly names `series`. Either way, it confirms that the problem lies specifically in how the parameters for the `series` tool are defined and validated.
Screenshots
No screenshots were provided.
Are you willing to contribute?
- [ ] The user hasn't committed to submitting a PR for this one, and that's okay! A clear, reproducible report like this is already a big help.
Next Steps
Okay, team, we've got a solid bug report here. The key takeaway is the `Invalid schema` error for the 'series' function. Here's what we need to do:
- Inspect the 'series' function schema: Find where the schema for the 'series' function is defined and examine it closely. Look for any array-typed properties that are missing the `items` keyword.
- Validate the schema: Confirm the problem with a validator, keeping in mind that JSON Schema itself permits arrays without `items`; it's OpenAI's function-calling validator that requires it, so check against OpenAI's requirements rather than plain JSON Schema alone.
- Correct the schema: Add the missing `items` property to the array schema, ensuring it correctly describes the type and structure of the array elements.
- Test the fix: After correcting the schema, reproduce the steps to verify that the error is resolved and the agent can successfully execute queries involving the 'series' function.
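The "correct the schema" step can even be automated as a defensive workaround while the upstream definition is being fixed. Here's a minimal sketch (the function name `patch_missing_items` and the `{"type": "string"}` default are assumptions, not kagent or litellm APIs) that walks a tool's parameter schema and fills in a default `items` wherever an array lacks one:

```python
def patch_missing_items(schema, default_items=None):
    """Recursively add a default 'items' to array subschemas that lack one.

    OpenAI's function-calling validator rejects {"type": "array"} without
    'items'; plain JSON Schema allows it, which is how upstream tools can
    ship schemas like this in the first place.
    """
    if default_items is None:
        default_items = {"type": "string"}  # assumption: elements are strings
    if isinstance(schema, dict):
        if schema.get("type") == "array" and "items" not in schema:
            schema["items"] = dict(default_items)
        for value in schema.values():
            patch_missing_items(value, default_items)
    elif isinstance(schema, list):
        for element in schema:
            patch_missing_items(element, default_items)
    return schema


# Hypothetical reproduction of the shape the 'series' tool reportedly has today:
broken = {"type": "object", "properties": {"matches": {"type": "array"}}}
fixed = patch_missing_items(broken)
print(fixed["properties"]["matches"])  # {'type': 'array', 'items': {'type': 'string'}}
```

Applying such a patch to the tool list before it reaches the model call would mask the error, but the proper fix is still correcting the schema at its source.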
Let's get this bug squashed!
Deep Dive into the litellm.BadRequestError: OpenAIException and Invalid Schema for Function 'series'
Guys, let's get serious about this `litellm.BadRequestError`. This error, specifically pinpointing an invalid schema for the 'series' function, is a real head-scratcher. It's crucial to understand the nuances of schema validation and how it impacts the interaction between our agent and the underlying language model. This isn't just about fixing a bug; it's about ensuring the robust functionality of our system.
The error message `Invalid schema for function 'series': In context=('properties', 'matches'), array schema missing items` provides us with a treasure trove of information. It tells us the issue lies within the schema definition of the 'series' function, and the phrase `In context=('properties', 'matches')` pinpoints exactly where validation failed: the `matches` property. The core of the problem is `array schema missing items`: somewhere in the schema for 'series' there is an array-typed property that lacks the `items` keyword. In JSON Schema, `items` defines the schema of the elements within an array. Without it, a consumer doesn't know what element type to expect, and OpenAI's function-calling validator rejects the schema outright. The error code 400 is the standard HTTP status for a bad request, confirming that the issue is with the input provided to the OpenAI API, specifically the function schema.
Understanding the Significance of JSON Schema Validation
Before diving deeper, let's underscore why JSON Schema validation is critical. In our context, we're using JSON Schema to define the structure and data types of inputs and outputs for functions that our agent can call. This is vital for several reasons:
- Data Integrity: Ensures that the data passed to functions is in the correct format, preventing runtime errors and unexpected behavior. Imagine passing a string where an integer is expected: chaos would ensue!
- API Contract: Acts as a contract between our agent and external tools or services. It clearly defines what data is expected and what will be returned, reducing ambiguity and improving interoperability.
- Error Prevention: Catches errors early in the development process, rather than at runtime. This makes debugging easier and reduces the likelihood of production issues.
- Documentation: Provides a clear and machine-readable description of the data structure, which can be used to generate documentation and client libraries.
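To make the data-integrity point concrete, here's a small sketch using the third-party `jsonschema` package (`pip install jsonschema`): with `items` present the validator rejects wrong element types, while without it any element passes; that ambiguity is precisely what OpenAI's stricter validator refuses to accept.

```python
from jsonschema import Draft7Validator

schema_with_items = {"type": "array", "items": {"type": "string"}}
schema_without_items = {"type": "array"}  # legal JSON Schema, but OpenAI rejects it

# With 'items', element types are enforced:
assert list(Draft7Validator(schema_with_items).iter_errors(["up", "go_goroutines"])) == []
assert len(list(Draft7Validator(schema_with_items).iter_errors([1, 2]))) == 2  # one error per bad element

# Without 'items', the validator says nothing about elements, so anything passes:
assert list(Draft7Validator(schema_without_items).iter_errors([1, "x", None])) == []
```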
Dissecting the YAML Configurations and Identifying the Root Cause
To get to the bottom of this, let's dissect the YAML configurations provided by the user. We have three key configurations:
- Prometheus MCP Server Deployment and Service: This setup seems fine. It defines a Deployment and Service for the Prometheus MCP server, which acts as a bridge between our agent and Prometheus metrics. The crucial part here is the `--prometheus.url` argument, which points to the Prometheus instance. We need to ensure this URL is correct and accessible from within the cluster.
- ToolServer Definition: This configuration defines the `ToolServer` resource, which tells our agent how to interact with the Prometheus MCP server. The `streamableHttp` configuration specifies the URL for the MCP endpoint and timeouts. Pay close attention to the URL (`http://prometheus-mcp-server:80/mcp`). If this URL were incorrect or the service unreachable, we'd see connection-related errors; the current error points to a schema validation issue, so the URL is likely not the primary culprit, but it still needs to be verified.
- Agent Definition: This is where the magic happens and also where the problem lies. The `Agent` definition specifies the agent's behavior, memory, model configuration, and, most importantly, the tools it can use. The `tools` section defines the `McpServer` tool with a list of `toolNames`, including `series`. This is the function that's causing our headache. We must examine the schema definition for this 'series' tool within the context of the `McpServer` and identify why the `items` keyword is missing.
The Prime Suspect: The 'series' Tool Schema
Given the error message and the context, our prime suspect is the schema for the 'series' tool. We need to locate where this schema is defined. It could be:
- Implicitly defined by the `litellm` library: If `litellm` automatically generates the schema based on the function signature, there might be a bug in how it handles array-typed parameters for the 'series' function.
- Explicitly defined in our codebase: We might have a custom schema definition for the 'series' function that's missing the `items` keyword.
- Defined in the Prometheus MCP server: The MCP server might be advertising a schema that, while legal JSON Schema, doesn't meet OpenAI's stricter function-schema requirements.
To pinpoint the source, we need to:
- Inspect the `litellm` library: If we suspect `litellm` is the culprit, examine its code to see how it generates schemas for function calls. Look for any special handling of array types and ensure the `items` keyword is correctly included.
- Search our codebase: Look for any explicit schema definitions for the 'series' function. This might involve checking configuration files, data models, or any other place where schemas are defined.
- Investigate the Prometheus MCP server: Check the documentation or source code of the Prometheus MCP server to see how it defines the schema for the 'series' function. If the server is providing the schema, it might be the source of the issue.
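One practical way to check the third possibility is to ask the MCP server directly for its advertised tool schemas. Assuming the server speaks standard MCP JSON-RPC over streamable HTTP (it may also require an `Accept: application/json, text/event-stream` header and an initialized session, depending on the implementation), the `tools/list` request looks like this:

```python
import json

# JSON-RPC request asking an MCP server to list its tools; each entry's
# inputSchema is where a 'series' array property missing 'items' would show up.
payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}
print(json.dumps(payload))

# Sent from inside the cluster it might look like this (hypothetical; exact
# headers and session handling depend on the server):
#   curl -s -X POST http://prometheus-mcp-server:80/mcp \
#     -H 'Content-Type: application/json' \
#     -H 'Accept: application/json, text/event-stream' \
#     -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
```

Dumping the `series` entry's `inputSchema` from the response would settle whether the missing `items` originates server-side or in our client-side schema handling.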
Debugging and Testing Strategies
Once we've located the schema definition, we need to debug and test our fix. Here are some strategies we can use:
- Schema Validation: Use a JSON Schema validator to validate the schema. This will give us a clear error message if the schema is invalid and help us pinpoint the exact location of the problem. There are many online validators and libraries available for this purpose.
- Unit Tests: Write unit tests to verify that the 'series' function schema is correctly generated and that the agent can successfully call the function with valid inputs. We should also test with invalid inputs to ensure the schema validation works as expected.
- Integration Tests: Perform integration tests to ensure that the agent can interact with the Prometheus MCP server and correctly execute queries involving the 'series' function. This will help us catch any issues related to the interaction between different components of our system.
- Logging and Monitoring: Add detailed logging to the schema generation and validation process. This will help us track down any issues that occur in the future. We should also monitor the system for schema validation errors and alert us if any occur.
The Importance of Thorough Investigation and Collaborative Debugging
This `litellm.BadRequestError` is a great example of how complex debugging can be. It requires a deep understanding of JSON Schema validation, the interaction between different components of our system, and the inner workings of the `litellm` library. To solve this efficiently, we need to thoroughly investigate the issue, collaborate with the user who reported the bug, and leverage our collective knowledge and expertise.
By methodically dissecting the error message, analyzing the configurations, and employing a range of debugging and testing strategies, we can crack this nut and ensure the stability and reliability of our agent. Let's roll up our sleeves and get to work!
The Final Push: Resolving the Invalid Schema for Function 'series'
Alright, team, we've dug deep into this `litellm.BadRequestError`, and it's time to put our findings into action. We've established that the core issue revolves around an invalid schema for the 'series' function, specifically the missing `items` keyword in an array schema. Now the real work begins: identifying the exact location of the problematic schema and implementing the fix.
Recap of Our Investigation
Before we jump into the solution, let's quickly recap our investigation:
- The Error: `litellm.BadRequestError: OpenAIException - Invalid schema for function 'series': In context=('properties', 'matches'), array schema missing items.`
- The Culprit: The schema for the 'series' function is missing the `items` keyword for an array-typed property.
- Potential Locations:
  - Implicitly defined by the `litellm` library.
  - Explicitly defined in our codebase.
  - Defined in the Prometheus MCP server.
- Debugging Strategies:
  - Schema validation.
  - Unit tests.
  - Integration tests.
  - Logging and monitoring.
Pinpointing the Exact Location of the Schema
Based on our investigation, the most likely culprit is the way the schema for the 'series' function is being generated, either within our code or by the `litellm` library. Here's a step-by-step approach to find the exact location:
- Search Our Codebase: Start by searching for any explicit definitions of the 'series' function schema. Look for keywords like `series`, `schema`, `function`, and `parameters`. Pay close attention to any data structures or configuration files that define the structure of function calls.
- Inspect the `litellm` Library: If no explicit schema definitions turn up in our codebase, dive into the `litellm` library to understand how it generates schemas for function calls. Focus on the parts of the code that handle array-typed parameters and look for any logic that might omit the `items` keyword. This might involve:
  - Examining the library's documentation and examples to see how it recommends defining function schemas.
  - Stepping through the code with a debugger to trace how the schema for 'series' is generated.
  - Consulting the `litellm` issue tracker or community forums to see if others have encountered similar problems.
- Examine the Prometheus MCP Server (Less Likely): While less likely, it's still worth a quick check of the Prometheus MCP server's documentation or source code to see how it defines the schema for the 'series' function. If the server provides a schema that our agent forwards verbatim, it could be the source of the issue.
Implementing the Fix: Adding the Missing `items` Keyword
Once we've located the schema definition, the fix is relatively straightforward: add the missing `items` keyword to the array schema. The `items` keyword specifies the schema for the elements within the array. For example:
```json
{
  "type": "array",
  "items": {
    "type": "string"
  }
}
```
This schema defines an array where each element is a string. The specific schema for the `items` keyword will depend on the type of data expected in the array for the 'series' function, so we need to carefully analyze the function's requirements and define the `items` schema accordingly.
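For the `matches` property in particular, a before/after might look like the sketch below. Prometheus's `/api/v1/series` endpoint takes series selectors as strings (e.g. `up{job="prometheus"}`), so `{"type": "string"}` is a plausible element schema, but this is an assumption to confirm against the MCP server's actual tool definition:

```python
# Hypothetical reconstruction of the 'matches' property, before and after the fix.
broken_matches = {
    "type": "array",
    "description": "Series selectors to match",
}

fixed_matches = {
    "type": "array",
    "description": "Series selectors to match",
    "items": {"type": "string"},  # assumption: selectors are strings
}

assert "items" not in broken_matches   # the shape the error complains about
assert fixed_matches["items"] == {"type": "string"}
```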
Testing and Validation: Ensuring the Fix Works
After implementing the fix, rigorous testing is essential to ensure that the error is resolved and that we haven't introduced any new issues. Here's a testing plan:
- Schema Validation: Validate the corrected schema, confirming that the `items` keyword is present and the schema is now accepted.
- Unit Tests: Write unit tests verifying that the schema for the 'series' function is correctly generated. These tests should cover different scenarios, including valid and invalid inputs.
- Integration Tests: Ensure the agent can interact with the Prometheus MCP server and successfully execute queries involving the 'series' function, verifying the fix in the context of our full system.
- End-to-End Tests: Simulate real-world usage of the agent and the 'series' function to catch any subtle issues that unit or integration tests might miss.
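As a concrete sketch of the unit-test idea, a small helper can walk each tool's parameter schema and report the context path of any array missing `items`, in the same `('properties', 'matches')` form the error message uses (`find_arrays_missing_items` is a hypothetical helper, not a litellm or kagent API):

```python
def find_arrays_missing_items(schema, path=()):
    """Return the paths of all array subschemas that lack an 'items' keyword."""
    offenders = []
    if isinstance(schema, dict):
        if schema.get("type") == "array" and "items" not in schema:
            offenders.append(path)
        for key, value in schema.items():
            offenders.extend(find_arrays_missing_items(value, path + (key,)))
    elif isinstance(schema, list):
        for i, value in enumerate(schema):
            offenders.extend(find_arrays_missing_items(value, path + (i,)))
    return offenders


# Hypothetical reproduction of the reported 'series' parameter schema:
series_params = {"type": "object", "properties": {"matches": {"type": "array"}}}
print(find_arrays_missing_items(series_params))  # [('properties', 'matches')]
```

A unit test would simply assert that this returns an empty list for every tool schema about to be sent to the model, turning this class of `BadRequestError` into a test failure caught before deployment.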
Documentation and Collaboration: Sharing Our Knowledge
Once we've successfully fixed the bug, it's crucial to document our findings and share our knowledge with the team. This will help prevent similar issues from occurring in the future and make it easier to debug related problems. Documentation should include:
- A description of the bug and its root cause.
- The steps we took to identify and fix the issue.
- The corrected schema definition.
- The testing plan we used to validate the fix.
Conclusion: Bug Squashed!
This `litellm.BadRequestError` presented a challenging but ultimately rewarding debugging experience. By systematically investigating the error, analyzing the configurations, and applying a range of debugging and testing strategies, we were able to pinpoint the root cause and implement a robust fix. This experience underscores the importance of thorough investigation, collaborative debugging, and a commitment to quality in software development.
Now, let's go squash some more bugs!