Basic Flow of AI Integrations and Possible Scenarios

These observations reflect my personal opinions, so please consider them accordingly.

This article describes potential AI integration flows and their purposes.

Types of API:

  • Remote AI servers
  • Locally hosted AI servers

Remote AI Services

These solutions include:

OpenAI (Creator of ChatGPT)

Platform: https://platform.openai.com

Advantages:

  • No need to rent, maintain, or train a server.
  • Vector storage (RAG) solution available out of the box.
  • Function calling is available and relatively cheap using the gpt-4o-mini model.
  • Easy integration with Live Helper Chat.
  • Complex scenarios can be integrated with function calling.
  • OpenAI does not make API calls to the internal LHC structure hosted on your server. Your LHC installation acts as a proxy between OpenAI and your internal API servers.
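A minimal sketch of that proxy pattern: when the model decides to call a function, your LHC server receives the tool call and executes it against the internal API itself, so OpenAI never touches your internal infrastructure directly. The tool name `get_order_status`, its parameters, and the handler below are hypothetical, not part of LHC or the OpenAI API.

```python
import json

# Hypothetical tool definition in the OpenAI "tools" format.
# Name and parameters are illustrative only.
ORDER_STATUS_TOOL = {
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of an order in the internal system.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Internal order ID."},
            },
            "required": ["order_id"],
        },
    },
}

def handle_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch a tool call returned by the model to an internal handler.

    The model never talks to the internal API directly; this code runs
    on your own server and forwards the call, then the JSON result is
    sent back to the model as the tool response.
    """
    arguments = json.loads(arguments_json)
    if name == "get_order_status":
        # Placeholder for a real request to your internal API.
        return json.dumps({"order_id": arguments["order_id"], "status": "shipped"})
    return json.dumps({"error": f"unknown tool {name}"})
```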

Integration Samples:

Disadvantages:

  • Data is hosted remotely.

Gemini Integration

More information can be found here: https://doc.livehelperchat.com/docs/bot/gemini-integration

I have not personally used it for RAG or function calling, so it's difficult to assess how easy it is to integrate in real-world scenarios beyond the few samples I've created.

Advantages:

  • No need to rent, maintain, or train a server.
  • Reliable provider.

Disadvantages:

Open Question:

What is the best way to implement RAG and function calling simultaneously?

For testing purposes, we tried a system prompt instructing the AI to call a fallback function whenever the question did not match any of the defined functions. However, I suspect this isn't the optimal approach.
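The fallback-prompt workaround can be sketched as follows. The function name `search_knowledge_base` and the prompt wording are assumptions for illustration, not a recommended or official pattern.

```python
# System prompt that routes unmatched questions to a RAG lookup function.
# The function name and wording are hypothetical.
FALLBACK_SYSTEM_PROMPT = (
    "You can call the defined functions to act on user requests. "
    "If the user's question does not match any defined function, call "
    "search_knowledge_base with the question text so the answer can be "
    "retrieved from vector storage."
)

def build_messages(question: str) -> list:
    """Assemble the message list sent with every request."""
    return [
        {"role": "system", "content": FALLBACK_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
```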

https://ai.google.dev/gemini-api/docs/models/gemini

Helpful YouTube Videos:

DeepSeek

More information can be found here: https://doc.livehelperchat.com/docs/bot/deepseek-integration

groq.com

This service runs open-source LLM models, so even if the service itself goes offline, the same models remain available to run elsewhere. https://groq.com

From a first look, it seems very easy to set up and could be a replacement for OpenAI. You would need to implement the RAG aspect yourself.
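One reason the switch is cheap: Groq exposes an OpenAI-compatible HTTP API, so in many clients only the base URL and model name change. A minimal sketch, assuming the chat-completions request format; the model name below may change over time, so check Groq's documentation for currently available models.

```python
# Provider configs: same request shape, different endpoint and model.
OPENAI_CONFIG = {"base_url": "https://api.openai.com/v1", "model": "gpt-4o-mini"}
GROQ_CONFIG = {"base_url": "https://api.groq.com/openai/v1", "model": "llama-3.1-8b-instant"}

def chat_request_body(config: dict, question: str) -> dict:
    """Build the JSON body for a POST to {base_url}/chat/completions."""
    return {
        "model": config["model"],
        "messages": [{"role": "user", "content": question}],
    }
```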

Services That Can Run Your Fine-Tuned Models

These are more complex setups that can run your own fine-tuned models. Models can be obtained from https://huggingface.co/, which serves as a hub for AI models.

Completely Self-Hosted Solutions

RAG Flow

RAG involves vector storage and text chunking. I won't go into much detail but want to include it as a reference.

Preparation of Content for RAG Flow:

  • Prepare content as a text file.
  • Chunk content into smaller parts.
  • Upload content to vector storage.
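The chunking step above can be sketched with a simple character-based splitter. The sizes are illustrative assumptions; real pipelines often chunk by tokens or sentences instead of characters.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into overlapping fixed-size chunks.

    The overlap keeps sentences that straddle a chunk boundary
    retrievable from both neighbouring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```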

Search Flow:

  • A user asks a question.
  • The question is converted into an embedding (a question is usually short enough that it does not need chunking).
  • The embedding is sent to vector storage.
  • Vector storage returns the most similar content chunks.
  • The chunks are combined with the question and sent to the LLM server.
  • The LLM server returns a generated response to the user.
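The retrieval and prompt-assembly steps can be sketched with a toy in-memory store. In a real setup the vectors come from an embedding model and live in a vector database; cosine similarity is one common, but not the only, distance measure.

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(question_vec: list, store: list, top_k: int = 2) -> list:
    """Return the top_k stored chunks most similar to the question vector.

    `store` is a list of (vector, text) pairs standing in for a
    vector database.
    """
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(question_vec, item[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:top_k]]

def build_prompt(question: str, chunks: list) -> str:
    """Combine retrieved chunks and the question into one LLM prompt."""
    context = "\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```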

Text Preparation Tools

Vector Storage Engines