Skip to main content

Completions generic api integration

This integration was tested with https://github.com/ggml-org/llama.cpp.

During testing I ran on an NVIDIA Tesla P4 using the following command:

./llama.cpp/build/bin/llama-server \
-m Qwen3.5-4B-Q4_K_M.gguf \
--mmproj mmproj-F16.gguf \
--host 0.0.0.0 \
--port 8080 \
--api-key "your-api-key" \
-ngl 99 \
-fa on \
-t 8 \
-c 16384 \
-b 512 \
-ub 512 \
--mlock \
--reasoning-budget 512 \
--jinja \
--cache-type-k q8_0 \
--cache-type-v q8_0

This integration uses the Chat Completions API.

REST API

Bot

Flow with Tool Call Support

The main difference from the legacy flow is support for tool calls (functions).

REST API

  • Set a Bearer token.
  • Modify the system prompt to suit your knowledge-limiting needs.

Bot

  • Import a bot and configure triggers and API calls as shown in the video.

Calling a Trigger Based on a Defined Function

  1. Identify the function name defined in the model (example: transfer_operator).
  2. Add an event to your trigger with Type = Custom text matching and Should include any of these words = transfer_operator.

For example: transfer_operator

Limiting the Knowledge Base to Uploaded Documents

Example System instructions used for the documentation bot:

You are a helpful Live Helper Chat Bot. Answer questions using file search only. If you don't know the answer, respond with "I can only help with Live Helper Chat related questions." Keep answers under 100 words and include a link for more information.