Chat Completions
POST /v1/chat/completions
100% compatible with the OpenAI Chat Completions API, with RAG, Memory, and other features supported through the vecstruct extension field.
Request Headers
POST /v1/chat/completions
Authorization: Bearer sk-your-api-key
Content-Type: application/json
Request Body
{
  "model": "openai/gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a customer service assistant" },
    { "role": "user", "content": "How many days does a refund take?" }
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false,
  "vecstruct": {
    "project_id": "proj-uuid",
    "rag": true,
    "rag_top_k": 5,
    "use_memory": true,
    "metadata": {
      "user_id": "user-123"
    }
  }
}
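As a rough illustration, the headers and body above could be assembled in Python with only the standard library. This is a minimal sketch: `BASE_URL` is a placeholder (the document does not state the host), and `build_chat_request` is a hypothetical helper, not part of any SDK. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: placeholder host; replace with the real API base URL.
BASE_URL = "https://api.example.com"


def build_chat_request(api_key, question, project_id=None):
    """Build (without sending) a POST request for /v1/chat/completions."""
    body = {
        "model": "openai/gpt-4o",
        "messages": [
            {"role": "system", "content": "You are a customer service assistant"},
            {"role": "user", "content": question},
        ],
        "temperature": 0.7,
        "max_tokens": 1024,
        "stream": False,
        # Extension field described in this document.
        "vecstruct": {"rag": True, "rag_top_k": 5, "use_memory": True},
    }
    if project_id is not None:
        # Omitted by default so the API key's default project applies.
        body["vecstruct"]["project_id"] = project_id
    return urllib.request.Request(
        url=BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("sk-your-api-key", "How many days does a refund take?")
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Because the endpoint is described as OpenAI-compatible, an existing OpenAI client pointed at the right base URL may also work, but that is not confirmed here.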
Request Fields
Standard OpenAI fields:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | ✓ | Model ID, in the format provider/model-name |
| messages | array | ✓ | List of conversation messages |
| temperature | number | | 0.0 – 2.0, default 1.0 |
| max_tokens | number | | Maximum number of output tokens |
| stream | boolean | | Whether to stream the response, default false |
| top_p | number | | Nucleus sampling |
| stop | string / string[] | | Stop sequences |
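vecstruct extension fields:
| Field | Type | Description |
|---|---|---|
| project_id | string | Target project (if omitted, the API key's default project is used) |
| rag | boolean | Enable RAG knowledge base injection |
| rag_top_k | number | Number of passages retrieved by RAG, default 5 |
| rag_source_ids | string[] | Restrict the search to specific documents |
| rag_min_similarity | number | Minimum similarity threshold for RAG, 0.0 – 1.0 |
| use_memory | boolean | Enable Agent Memory injection |
| metadata | object | Custom tags, written to the Audit Log |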
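The extension fields above can be checked on the client side before a request is sent. The sketch below uses a hypothetical helper, `build_vecstruct_options`. The 0.0 to 1.0 range for rag_min_similarity comes from the table. The requirement that rag_top_k be positive is an assumption, and the server's actual validation may differ.

```python
def build_vecstruct_options(project_id=None, rag=False, rag_top_k=None,
                            rag_source_ids=None, rag_min_similarity=None,
                            use_memory=False, metadata=None):
    """Return a dict for the "vecstruct" field, leaving out unset options."""
    # Assumption: a passage count below 1 is never meaningful.
    if rag_top_k is not None and rag_top_k < 1:
        raise ValueError("rag_top_k must be at least 1")
    # Range taken from the field table: 0.0 - 1.0.
    if rag_min_similarity is not None and not 0.0 <= rag_min_similarity <= 1.0:
        raise ValueError("rag_min_similarity must be between 0.0 and 1.0")

    options = {"rag": rag, "use_memory": use_memory}
    optional = {
        "project_id": project_id,
        "rag_top_k": rag_top_k,
        "rag_source_ids": rag_source_ids,
        "rag_min_similarity": rag_min_similarity,
        "metadata": metadata,
    }
    # Unset options are omitted so that server-side defaults apply.
    options.update({key: value for key, value in optional.items() if value is not None})
    return options


options = build_vecstruct_options(rag=True, rag_top_k=5,
                                  metadata={"user_id": "user-123"})
```

Leaving project_id out of the result relies on the documented fallback to the API key's default project.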
Response Example (Non-Streaming)
{
  "id": "chatcmpl-uuid",
  "object": "chat.completion",
  "created": 1746700000,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Refunds usually take 3–5 business days..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 120,
    "completion_tokens": 80,
    "total_tokens": 200
  },
  "vecstruct": {
    "audit_id": "audit-uuid",
    "rag_sources": [
      {
        "document_id": "doc-uuid",
        "title": "Refund Policy.pdf",
        "content": "After a refund request is accepted...",
        "similarity": 0.92
      }
    ],
    "memory_used": true,
    "credits_consumed": 0.05,
    "balance_consumed_usd": 0.000240
  }
}
vecstruct response fields:
| Field | Type | Description |
|---|---|---|
| audit_id | string | Audit Log ID for this request |
| rag_sources | array | List of passages cited by RAG |
| memory_used | boolean | Whether Memory was injected |
| credits_consumed | number | Credits consumed (RAG/Memory features) |
| balance_consumed_usd | number | USD balance consumed (LLM tokens) |
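To show how these fields might be consumed, the sketch below pulls the answer, the cited sources, and the cost figures out of a parsed response. `summarize_response` is a hypothetical helper. Treating a missing vecstruct object as "no extras" is an assumption made for robustness, not documented behavior.

```python
# A parsed response shaped like the non-streaming example in this document.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Refunds usually take 3-5 business days..."},
            "finish_reason": "stop",
        }
    ],
    "vecstruct": {
        "audit_id": "audit-uuid",
        "rag_sources": [
            {"document_id": "doc-uuid", "title": "Refund Policy.pdf", "similarity": 0.92}
        ],
        "memory_used": True,
        "credits_consumed": 0.05,
        "balance_consumed_usd": 0.000240,
    },
}


def summarize_response(response):
    """Extract the answer text plus the vecstruct-specific metadata."""
    answer = response["choices"][0]["message"]["content"]
    # Assumption: fall back to empty values if the vecstruct object is absent.
    extras = response.get("vecstruct", {})
    sources = [
        (source["title"], source["similarity"])
        for source in extras.get("rag_sources", [])
    ]
    return {
        "answer": answer,
        "audit_id": extras.get("audit_id"),
        "sources": sources,
        "memory_used": extras.get("memory_used", False),
        "credits": extras.get("credits_consumed", 0),
        "usd": extras.get("balance_consumed_usd", 0),
    }


summary = summarize_response(sample_response)
```

Note that the two cost fields are reported separately: Credits cover the RAG/Memory features, while the USD balance covers LLM tokens.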
Streaming Response
When stream: true is set, the response is returned as Server-Sent Events (SSE):
data: {"id":"chatcmpl-uuid","object":"chat.completion.chunk","choices":[{"delta":{"content":"Refunds"},"index":0}]}
data: {"id":"chatcmpl-uuid","object":"chat.completion.chunk","choices":[{"delta":{"content":" usually"},"index":0}]}
event: vecstruct
data: {"audit_id":"audit-uuid","rag_sources":[...],"credits_consumed":0.05}
data: [DONE]
Just before the stream ends, a special event: vecstruct event is sent, carrying metadata such as the RAG sources and Credits usage.
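A client therefore has to separate ordinary content chunks from the vecstruct event. The following is a simplified sketch, not a full SSE implementation: it assumes one data: line per event and that an event: line applies to the next data: line. `parse_stream` is a hypothetical helper, and the sample lines use an empty rag_sources list because the elided contents in the example are not specified.

```python
import json


def parse_stream(lines):
    """Collect text deltas and the vecstruct metadata from SSE lines."""
    text_parts = []
    metadata = None
    event = "message"  # default SSE event type
    for raw in lines:
        line = raw.strip()
        if not line:
            event = "message"  # a blank line ends the current event
            continue
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
            continue
        if not line.startswith("data:"):
            continue  # ignore comments and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        payload = json.loads(data)
        if event == "vecstruct":
            metadata = payload
        else:
            for choice in payload.get("choices", []):
                text_parts.append(choice.get("delta", {}).get("content") or "")
        event = "message"  # the event name applies to a single data line
    return "".join(text_parts), metadata


sample_lines = [
    'data: {"id":"chatcmpl-uuid","object":"chat.completion.chunk","choices":[{"delta":{"content":"Refunds"},"index":0}]}',
    'data: {"id":"chatcmpl-uuid","object":"chat.completion.chunk","choices":[{"delta":{"content":" usually"},"index":0}]}',
    "event: vecstruct",
    'data: {"audit_id":"audit-uuid","rag_sources":[],"credits_consumed":0.05}',
    "data: [DONE]",
]
text, metadata = parse_stream(sample_lines)
```

In practice the lines would come from iterating over the HTTP response body. A dedicated SSE client library would handle multi-line data fields and reconnection, which this sketch ignores.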
Model Format
Model IDs use the format provider/model-name, for example:
| Provider | Examples |
|---|---|
| openai | openai/gpt-4o, openai/gpt-4o-mini |
| anthropic | anthropic/claude-3-5-sonnet |
| google | google/gemini-2.0-flash |
| baai | baai/bge-m3 (Embedding) |
| cohere | cohere/rerank-v3.5 (Rerank) |
For the full list of available models, see GET /v1/models.
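The provider/model-name format can be checked locally before a request is made. A minimal sketch with a hypothetical helper, `split_model_id`: splitting on the first slash only is an assumption, so that any further slashes would stay in the model name. Whether a given ID is actually available still has to be confirmed against GET /v1/models.

```python
def split_model_id(model_id):
    """Split "provider/model-name" into (provider, model_name)."""
    # Assumption: only the first "/" separates provider from model name.
    provider, separator, model_name = model_id.partition("/")
    if not separator or not provider or not model_name:
        raise ValueError(
            "model ID must look like provider/model-name, got %r" % model_id
        )
    return provider, model_name


provider, model_name = split_model_id("openai/gpt-4o")
```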