Streaming delivers tokens as the model generates them, reducing time to first token.
Set stream: true on any chat completion request. The response is a sequence of server-sent events (SSE).
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.abliteration.ai/v1",
    api_key=os.environ["ABLIT_KEY"],
)

stream = client.chat.completions.create(
    model="abliterated-model",
    messages=[{"role": "user", "content": "Write a haiku about streaming"}],
    stream=True,
)

for chunk in stream:
    # content may be None on role-only or final chunks; fall back to "".
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.abliteration.ai/v1",
  apiKey: process.env.ABLIT_KEY,
});

const stream = await client.chat.completions.create({
  model: "abliterated-model",
  messages: [{ role: "user", content: "Write a haiku about streaming" }],
  stream: true,
});

for await (const chunk of stream) {
  // content is absent on role-only chunks; fall back to "".
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
curl https://api.abliteration.ai/v1/chat/completions \
  -H "Authorization: Bearer $ABLIT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "abliterated-model",
    "messages": [{"role": "user", "content": "Write a haiku about streaming"}],
    "stream": true
  }'
Streamed chunks arrive as SSE frames of the form data: {...}\n\n, and the stream is terminated by a final data: [DONE] frame. Most SDKs parse this for you.
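If you consume the raw HTTP response without an SDK, the frames can be parsed line by line. A minimal sketch in Python (the frame payloads below are illustrative, not captured output):

```python
import json

def iter_sse_content(lines):
    """Yield content deltas from raw SSE lines of a streamed completion."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines between frames
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]

# Example frames as they might appear on the wire:
frames = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(frames)))  # -> Hello world
```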
When the model calls a tool, tool_calls arrives across multiple chunks. Accumulate function.arguments string fragments until the chunk with finish_reason: "tool_calls".
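A minimal sketch of that accumulation, operating on chunks as plain dicts so it is self-contained (the fragment values and call id are illustrative):

```python
def accumulate_tool_calls(chunks):
    """Merge streamed tool_calls fragments into complete calls, keyed by index."""
    calls = {}
    for chunk in chunks:
        choice = chunk["choices"][0]
        for tc in choice.get("delta", {}).get("tool_calls") or []:
            entry = calls.setdefault(
                tc["index"], {"id": None, "name": None, "arguments": ""}
            )
            fn = tc.get("function", {})
            if tc.get("id"):
                entry["id"] = tc["id"]          # id arrives once, on the first fragment
            if fn.get("name"):
                entry["name"] = fn["name"]      # name arrives once as well
            if fn.get("arguments"):
                entry["arguments"] += fn["arguments"]  # JSON string builds up piecewise
        if choice.get("finish_reason") == "tool_calls":
            break
    return calls

# Illustrative fragments spread across chunks:
chunks = [
    {"choices": [{"delta": {"tool_calls": [{"index": 0, "id": "call_1",
        "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 0,
        "function": {"arguments": '{"city": '}}]}}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 0,
        "function": {"arguments": '"Paris"}'}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]
result = accumulate_tool_calls(chunks)
print(result[0]["name"], result[0]["arguments"])  # -> get_weather {"city": "Paris"}
```

Once finish_reason is "tool_calls", each accumulated arguments string is complete JSON and can be passed to json.loads.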
See tool calling for a complete example.