Fixing Devstral Chat Template: No More Double Tags!
Fixing Double [/INST] Markers in Devstral Chat Template Responses: A Detailed Guide
Hey everyone! Ever run into those weird formatting issues when chatting with a model? I did, and it turned out to be a case of extra [/INST] markers messing things up. This article dives deep into how we fixed the Devstral chat template to get rid of these pesky double closings, ensuring cleaner and more natural-sounding conversations. Let's break it down!
The Problem: Double [/INST] Markers and a Broken Flow
The initial issue was pretty straightforward: the Devstral chat template, responsible for formatting our prompts, was adding an extra [/INST] tag where it shouldn't have been. Think of it like trying to close a door that's already closed – it just doesn't work right, and it confuses things! This led to the model generating responses with those unwanted double markers, which messed up the conversation flow and made the responses look a bit jumbled. The conversation's structure was breaking.
Imagine you're asking the model about the weather. The template was closing the user's message, and then the model was adding its own closing tag after the tool call, which caused the double [/INST] issue. Here's what that looked like:
[INST]What's the weather like in Paris?[/INST]I'll help you check the weather in Paris. Let me get that information for you.
[TOOL_CALLS][{"name": "get_weather", "arguments": {"location": "Paris"}}][/INST]
Based on the weather information for Paris:
- **Temperature**: 18°C
- **Condition**: Partly cloudy
- **Humidity**: 65%
- **Wind**: 12 km/h from the west
The weather in Paris is quite pleasant today with partly cloudy skies and mild temperatures.[/INST]
As you can see, right after the tool call, there's an extra [/INST], and then at the end of the response, we get another one. It's like the model is saying, "I'm done!" twice, which isn't what we wanted. This was happening because the template was closing the user message (the question) even when the model was supposed to be the one responding, which made the output look messy and broke the way the conversations flowed.
Root Cause Analysis: Where the Problem Began
So, where did this issue stem from? After a bit of digging, it turned out the problem was in the chat_templates.py file, specifically in the format_devstral_messages() function. It was a single line of code, but it caused all the trouble!
# PROBLEMATIC CODE (line 92)
prompt += f"[INST]{message.content}[/INST]\n"
This line was always closing the user messages with [/INST], no matter what. This meant that even when the model was supposed to respond, the template was adding its closing tag, leading to the double markers. It was like putting a period at the end of a sentence when the sentence wasn't finished yet. This messed up the conversation's structure and made the model's output look inconsistent with how other chat templates, such as the Qwen format, were working.
This code treated every user message the same, and it didn't distinguish between a user's question and the model's response. Because of this, the double [/INST] markers appeared, the conversation flow looked jumbled, and it became harder to read and understand what the model was saying. It was a crucial area needing fixing!
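To make the failure mode concrete, here's a minimal, self-contained sketch of the old behavior. It's a simplification for illustration only (the Message dataclass and the tiny formatter are hypothetical stand-ins, not the actual code from chat_templates.py):

```python
from dataclasses import dataclass

# Hypothetical stand-in for the message objects the real template works with
@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str

def format_old(messages: list[Message]) -> str:
    """Simplified sketch of the old behavior: every user turn gets closed with [/INST]."""
    prompt = ""
    for message in messages:
        if message.role == "user":
            # The problematic line: always close, even on the final user turn
            prompt += f"[INST]{message.content}[/INST]\n"
        elif message.role == "assistant":
            prompt += f"{message.content}\n"
    return prompt

print(repr(format_old([Message("user", "What is the weather like in Paris?")])))
# '[INST]What is the weather like in Paris?[/INST]\n'
# The prompt already ends with a closing tag, so when the model emits its own
# [/INST] at the end of its reply, the transcript ends up with two of them.
```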
Technical Deep Dive: Best Practices for Chat Templates
Let's get a little technical, shall we? When we're talking about chat templates, there are a few best practices that help make sure everything works smoothly. These guidelines ensure the template structures conversations correctly, which in turn helps the model generate clean, well-formed output. The key rules: close a turn only once both the user's message and the assistant's response are complete, and leave the final assistant turn open so the model can finish it.
Think of it like a conversation: you close the loop once the user has asked their question and the assistant has answered, but you keep things open when the model still needs to generate a response. Using these templates properly is crucial for getting useful, high-quality responses; when the guidelines aren't followed, we end up with extra markers and broken structure. The Qwen format serves as an example of a well-designed chat template.
Here's a quick comparison:
Qwen format (correct):
prompt += "<|im_start|>assistant\n" # No closing tag - ready for generation
Devstral format (was incorrect):
prompt += f"[INST]{message.content}[/INST]\n" # Always closed - caused issues
The main difference is that Qwen leaves the assistant's turn open, ready for generation, while the incorrect Devstral format always closed the user's message. This simple difference made a huge impact on the structure of the conversation, adding the unwanted markers and breaking the flow.
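To make that difference tangible, here's roughly what the tail end of each prompt looks like right before generation. Treat this as a sketch: the Devstral line reflects the fixed behavior described below, and the ChatML user/<|im_end|> structure around the Qwen assistant line is my assumption, extrapolated from the snippet above:

```python
# Qwen/ChatML style: the assistant turn is opened but never closed,
# so the model's generation completes it.
qwen_prompt_tail = (
    "<|im_start|>user\n"
    "What is the weather like in Paris?<|im_end|>\n"
    "<|im_start|>assistant\n"  # left open, ready for generation
)

# Devstral style after the fix: the final user turn ends the prompt with a
# single [/INST] and no trailing newline, ready for generation.
devstral_prompt_tail = "[INST]What is the weather like in Paris?[/INST]"
```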
The Solution: A Smarter Chat Template
To fix the Devstral template, we modified the format_devstral_messages() function. The aim was to make sure the final user message would not be closed off with a trailing newline; it stays ready for the model to generate a response. The key change was detecting when a user's message is the last one and, in that case, handling the closing [/INST] tag differently.
# Check if the last message is from user (expecting assistant response)
expecting_response = len(messages) > 0 and messages[-1].role == "user"

for i, message in enumerate(messages):
    # ... other roles ...
    elif message.role == "user":
        # Add tools to the last user message if available
        if i == last_user_idx and tools:
            from .tool_calling import get_tool_parser
            parser = get_tool_parser("devstral")
            tools_str = parser.format_tools_for_prompt(tools)
            prompt += f"[AVAILABLE_TOOLS]{tools_str}[/AVAILABLE_TOOLS]\n"
        # Skip the trailing newline if this is the final user message expecting a response
        if i == len(messages) - 1 and expecting_response:
            prompt += f"[INST]{message.content}[/INST]"  # No newline, ready for generation
        else:
            prompt += f"[INST]{message.content}[/INST]\n"
The fix does these things:
- Detects the final message: It checks if the last message is from the user. Is this the last question before the model’s response?
- Conditional closing: Only non-final user messages get the [/INST] closing tag followed by a newline.
- Proper formatting: The final user message is left ready for the model to generate its response (no extra newline character).
These changes made sure that the template correctly structured the conversation, without adding unnecessary markers.
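Putting it together, here's the earlier sketch updated with the conditional logic. Again, this is a hypothetical, stand-alone version (reusing the Message dataclass from the sketch above), not the real format_devstral_messages() with its tool handling:

```python
def format_fixed(messages: list[Message]) -> str:
    """Simplified sketch of the fixed behavior for user/assistant turns only."""
    # True when the last message is from the user, i.e. we expect the model to answer next
    expecting_response = len(messages) > 0 and messages[-1].role == "user"
    prompt = ""
    for i, message in enumerate(messages):
        if message.role == "user":
            if i == len(messages) - 1 and expecting_response:
                prompt += f"[INST]{message.content}[/INST]"   # no trailing newline
            else:
                prompt += f"[INST]{message.content}[/INST]\n"
        elif message.role == "assistant":
            prompt += f"{message.content}\n"
    return prompt

print(repr(format_fixed([Message("user", "What is the weather like in Paris?")])))
# '[INST]What is the weather like in Paris?[/INST]'  <- one closing tag, no newline
```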
Expected Results: What We Aimed For
After fixing the template, we wanted these results:
- ✅ No double [/INST] markers in the responses.
- ✅ A clean conversation flow, with proper message boundaries.
- ✅ The template should comply with the chat template standards.
- ✅ Tool calling should continue to function correctly.
We wanted to make the conversations cleaner, more readable, and more in line with the correct chat template practices. The goal was to improve the conversation flow, the response quality, and the tool calling accuracy.
Testing the Fix: Making Sure It Works
To ensure that the fix worked well, we tested it in a few ways:
- Simple conversations: Questions and answers between the user and assistant.
- Multi-turn conversations: Conversations where the user and assistant take turns multiple times.
- Tool calling scenarios: When the assistant uses tools to answer a question, from user input to tool calls, to results.
- Mixed conversations: A mix of all the above.
By testing in these different scenarios, we ensured the fix worked correctly and didn’t introduce new issues.
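As a sketch of what one of those checks could look like in code (using the hypothetical format_fixed() helper from above as a stand-in for the real template function, so the names and setup here are assumptions):

```python
def test_no_double_inst_markers():
    messages = [
        Message("user", "What is the weather like in Paris?"),
        Message("assistant", "It is 18°C and partly cloudy."),
        Message("user", "Thanks! What about tomorrow?"),
    ]
    prompt = format_fixed(messages)  # stand-in for the real Devstral formatter

    # No back-to-back closing tags anywhere in the prompt
    assert "[/INST][/INST]" not in prompt
    assert "[/INST]\n[/INST]" not in prompt

    # The final user turn ends the prompt with a single [/INST] and no newline
    assert prompt.endswith("[/INST]")
    assert not prompt.endswith("[/INST]\n")
```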
Related Context: Building on Previous Success
This issue was found after we fixed the tool calling finish_reason detection. The tool calling functionality was already working correctly; the double [/INST] problem was purely a chat template formatting error. It affected the quality and parsing of the responses. The fact that tool calling worked made it easier to find the formatting issues, but the root cause was the template's structure.
Impact: Improving the User Experience
- Severity: Medium (it improved the response quality but didn't break any core function)
- Scope: Devstral/Mistral models only (Qwen format was already correct)
- User Experience: Improved conversation flow and readability
This fix significantly improved the user experience. Removing the double markers and cleaning up the formatting makes conversations feel more natural and less jarring, and the responses come out cleaner, easier to read, and more user-friendly.
That's the story of how we fixed the double [/INST] markers! Hope you found it interesting. Cheers!