Streaming renders tokens as the model produces them, reducing time-to-first-token. Set stream: true on any chat completion request. The response is a sequence of server-sent events.
import os

from openai import OpenAI

client = OpenAI(base_url="https://api.abliteration.ai/v1", api_key=os.environ["ABLIT_KEY"])

stream = client.chat.completions.create(
    model="abliterated-model",
    messages=[{"role": "user", "content": "Write a haiku about streaming"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental delta; content may be None on some frames.
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
Streamed chunks arrive as server-sent-event frames of the form data: {...}\n\n, and the stream is terminated by a final data: [DONE] frame. Most SDKs parse this for you.
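If you are not using an SDK, the framing above is simple to parse yourself. The sketch below splits a raw response body into frames, decodes each data: payload as JSON, and stops at the [DONE] sentinel; the sample body and its delta shapes are illustrative, not a real API response, and a production client should read frames incrementally off the connection rather than from a complete string.

```python
import json

def parse_sse_stream(raw: str):
    """Collect the JSON payload of each data: frame, stopping at [DONE]."""
    events = []
    for frame in raw.split("\n\n"):
        frame = frame.strip()
        if not frame.startswith("data: "):
            continue  # ignore comments, blank frames, other fields
        payload = frame[len("data: "):]
        if payload == "[DONE]":
            break  # sentinel: no more chunks follow
        events.append(json.loads(payload))
    return events

# Illustrative raw body: two content chunks, then the DONE sentinel.
raw = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    'data: [DONE]\n\n'
)
chunks = parse_sse_stream(raw)
text = "".join(c["choices"][0]["delta"].get("content", "") for c in chunks)
print(text)  # Hello
```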

Tool calls in streams

When the model calls a tool, tool_calls arrives spread across multiple chunks. Accumulate the function.arguments string fragments, keyed by each tool call's index, until a chunk arrives with finish_reason: "tool_calls"; the accumulated string is only valid JSON once complete. See tool calling for a complete example.
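The accumulation loop can be sketched as follows. Chunks are represented here as plain dicts mirroring the delta shape described above; the tool name, id, and arguments are made-up values for illustration, not real API output.

```python
import json

# Simulated streamed chunks for one tool call: the first chunk carries the
# id and function name, later chunks carry fragments of the arguments JSON.
chunks = [
    {"delta": {"tool_calls": [{"index": 0, "id": "call_123",
                               "function": {"name": "get_weather", "arguments": ""}}]},
     "finish_reason": None},
    {"delta": {"tool_calls": [{"index": 0,
                               "function": {"arguments": '{"city": '}}]},
     "finish_reason": None},
    {"delta": {"tool_calls": [{"index": 0,
                               "function": {"arguments": '"Paris"}'}}]},
     "finish_reason": None},
    {"delta": {}, "finish_reason": "tool_calls"},
]

calls = {}  # index -> accumulated tool call
for chunk in chunks:
    for tc in chunk["delta"].get("tool_calls", []):
        call = calls.setdefault(tc["index"], {"id": None, "name": None, "arguments": ""})
        if tc.get("id"):
            call["id"] = tc["id"]
        fn = tc.get("function", {})
        if fn.get("name"):
            call["name"] = fn["name"]
        call["arguments"] += fn.get("arguments", "")  # concatenate fragments
    if chunk["finish_reason"] == "tool_calls":
        break  # arguments string is now complete and parseable

args = json.loads(calls[0]["arguments"])
print(calls[0]["name"], args)
```

Keying the accumulator by index matters when the model emits several tool calls in one turn: fragments for different calls can interleave, and index is the only field guaranteed on every fragment.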