Completions generic API integration
This integration was tested with https://github.com/ggml-org/llama.cpp.
During testing I ran it on an Nvidia Tesla P4 card with the following command:
```
./llama.cpp/build/bin/llama-server \
-m Qwen3.5-4B-Q4_K_M.gguf \
--mmproj mmproj-F16.gguf \
--host 0.0.0.0 \
--port 8080 \
--api-key "your-api-key" \
-ngl 99 \
-fa on \
-t 8 \
-c 16384 \
-b 512 \
-ub 512 \
--mlock \
--reasoning-budget 512 \
--jinja \
--cache-type-k q8_0 \
--cache-type-v q8_0
```
This integration uses the Chat Completions API.
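As a quick smoke test, a Chat Completions request against the server above can be sketched in Python. This is a minimal sketch, not part of the integration itself: the host, port, and API key are assumed to match the example `llama-server` command, and the `model` name is a placeholder (a single-model llama-server typically does not check it).

```python
import json
import urllib.request

# Assumed values from the llama-server command above; adjust as needed.
BASE_URL = "http://localhost:8080/v1/chat/completions"
API_KEY = "your-api-key"

def build_chat_request(user_message, system_prompt="You are a helpful assistant."):
    """Build an OpenAI-compatible Chat Completions request for llama-server."""
    body = json.dumps({
        "model": "local",  # placeholder; a single-model server typically ignores it
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 256,
    }).encode()
    return urllib.request.Request(
        BASE_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # matches --api-key
        },
    )

def send(req):
    """POST the request and return the assistant's reply text."""
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# req = build_chat_request("Hello!")
# print(send(req))  # requires the llama-server instance to be running
```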
REST API
Bot
Flow with Tool Call Support
The main difference from the legacy flow is the support for tool calls.
REST API
- Set a Bearer token.
- Modify the system prompt.
Bot
- Import a bot and configure the correct triggers and API calls as shown in the video.
Calling a Trigger Based on a Defined Function in ChatGPT
- Note the defined function in Gemini, `transfer_operator`.
- Add an event to your trigger with the Type set to Custom text matching. The "Should include any of these words" value should be `transfer_operator`.
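The two steps above fit together as in this hedged Python sketch: the function is declared via the `tools` field of a Chat Completions request, and the trigger fires when `transfer_operator` appears in the model's output. The tool definition follows the standard Chat Completions shape; the matcher is a deliberate simplification of LHC's "Custom text matching" event, not its actual implementation.

```python
import json

# Hypothetical tool declaration, mirroring the transfer_operator function above.
tools = [
    {
        "type": "function",
        "function": {
            "name": "transfer_operator",
            "description": "Transfer the visitor to a human operator.",
            "parameters": {"type": "object", "properties": {}},
        },
    }
]

def trigger_matches(text, words=("transfer_operator",)):
    """Simplified 'Custom text matching': fire if any listed word appears."""
    return any(w in text for w in words)

# Example assistant reply containing a tool call (Chat Completions shape).
reply = {
    "tool_calls": [
        {"type": "function",
         "function": {"name": "transfer_operator", "arguments": "{}"}}
    ]
}
serialized = json.dumps(reply)
# trigger_matches(serialized) is True, so the LHC trigger would fire.
```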
For example:

Limiting the Knowledge Base to Uploaded Documents
Here are my System instructions for the bot used on the documentation page:
You are a helpful Live Helper Chat Bot. You answer questions based on file search. If you don't know the answer, respond with "I can only help with Live Helper Chat related questions." Provide the most relevant answer to the visitor's question, not exceeding 100 words. Include a link for more information about your answer.