Local RAG Knowledge Base with Llama 3.2, Chroma, and Ruby
Project Overview
This mini project demonstrates building a local Retrieval-Augmented Generation (RAG) system with the Llama 3.2 language model (running locally via Ollama), the Chroma vector database, and Ruby. The project includes:
- Reading and processing a knowledge base consisting of markdown files organized in folders.
- Splitting documents into context-preserving chunks using the `pragmatic_segmenter` gem for sentence-aware chunking.
- Generating semantic embeddings of chunks with the local Llama 3.2 model via Ollama's OpenAI-compatible API.
- Storing embeddings along with metadata and documents into a Chroma vector database collection.
- Performing similarity search in Chroma on user queries to retrieve relevant context.
- Constructing a conversational prompt by combining chat history, retrieved context, and user input.
- Streaming responses from the Llama 3.2 model back to the console for real-time interaction.
What Has Been Done
- Folder crawler and document loader: Recursively reads all markdown files in the `knowledge_base` directory, assigning document-type metadata from folder names (see the first sketch after this list).
- Smart text chunking: Integrated the `pragmatic_segmenter` gem to split texts into sentence-safe chunks (~1000 characters) with overlaps, preserving context and avoiding unnatural sentence cuts.
- Embeddings generation: Leveraged the local Llama 3.2 model via Ollama's OpenAI-compatible API to generate embeddings for all text chunks, enabling semantic search (see the second sketch below).
- Chroma vector store integration: Used the `chroma-db` Ruby gem with a locally hosted Chroma server (via Docker Compose) to store embeddings and metadata and to perform similarity-search queries.
- Interactive conversational loop: Maintained chat history manually as an array of message hashes, combined with relevant retrieved chunks to form prompts fed into the Llama 3.2 model (see the third sketch below).
- Streaming chat responses: Implemented real-time streaming of LLM output to the console using the Ruby OpenAI gem's streaming feature and Ollama's compatible API.
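A minimal sketch of the crawler and chunking steps, under a few assumptions: the layout is `knowledge_base/<doc_type>/*.md` as described above, and names like `CHUNK_SIZE` and `OVERLAP_SENTENCES` are illustrative, not the project's actual identifiers.

```ruby
require "pragmatic_segmenter"

KNOWLEDGE_BASE_DIR = "knowledge_base" # assumed layout: knowledge_base/<doc_type>/*.md
CHUNK_SIZE         = 1000             # target chunk size in characters
OVERLAP_SENTENCES  = 2                # sentences carried over between neighboring chunks

# Crawl every markdown file and tag it with its folder name as the document type.
def load_documents
  Dir.glob(File.join(KNOWLEDGE_BASE_DIR, "**", "*.md")).map do |path|
    {
      text: File.read(path),
      metadata: { doc_type: File.basename(File.dirname(path)), source: path }
    }
  end
end

# Split text into sentence-safe chunks of roughly CHUNK_SIZE characters,
# repeating the last few sentences in the next chunk to preserve context.
def chunk_text(text)
  sentences = PragmaticSegmenter::Segmenter.new(text: text).segment
  chunks, current = [], []
  sentences.each do |sentence|
    if current.any? && current.join(" ").length + sentence.length > CHUNK_SIZE
      chunks << current.join(" ")
      current = current.last(OVERLAP_SENTENCES) # the overlap
    end
    current << sentence
  end
  chunks << current.join(" ") if current.any?
  chunks
end
```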
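A sketch of the embedding and storage steps, reusing the helpers above. It assumes Ollama's OpenAI-compatible endpoint at `http://localhost:11434/v1`, the `chroma-db` gem's resource API, and a made-up collection name `rag_docs`.

```ruby
require "ruby-next/language/runtime" # must load before chroma-db (see the gotchas below)
require "chroma-db"
require "openai"

Chroma.connect_host = "http://localhost:8000"

client = OpenAI::Client.new(
  uri_base: "http://localhost:11434/v1", # Ollama's OpenAI-compatible API
  access_token: "ollama"                 # Ollama ignores the token; the gem requires one
)

collection = Chroma::Resources::Collection.create("rag_docs")

load_documents.each_with_index do |doc, doc_idx|
  chunk_text(doc[:text]).each_with_index do |chunk, chunk_idx|
    # Embed the chunk with the local Llama 3.2 model.
    response = client.embeddings(parameters: { model: "llama3.2", input: chunk })
    vector = response.dig("data", 0, "embedding")

    collection.add([
      Chroma::Resources::Embedding.new(
        id: "#{doc_idx}-#{chunk_idx}",
        embedding: vector,
        document: chunk,
        metadata: doc[:metadata]
      )
    ])
  end
end
```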
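And a sketch of the retrieval, prompt assembly, and streaming loop, reusing `client` and `collection` from the previous sketch. The system-prompt wording, `results: 3`, and the chunk separator are illustrative choices, not the project's actual values.

```ruby
history = [] # manual conversation memory: an array of {role:, content:} hashes

loop do
  print "> "
  question = gets&.strip
  break if question.nil? || question.empty?

  # Retrieve the chunks most similar to the question.
  q_embedding = client.embeddings(parameters: { model: "llama3.2", input: question })
                      .dig("data", 0, "embedding")
  hits = collection.query(query_embeddings: [q_embedding], results: 3)
  context = hits.map(&:document).join("\n---\n")

  messages = [
    { role: "system", content: "Answer using this context:\n#{context}" },
    *history,
    { role: "user", content: question }
  ]

  answer = +""
  client.chat(
    parameters: {
      model: "llama3.2",
      messages: messages,
      # ruby-openai calls this proc once per delta chunk as the response streams in.
      stream: proc do |chunk, _bytesize|
        delta = chunk.dig("choices", 0, "delta", "content")
        if delta
          print delta
          answer << delta
        end
      end
    }
  )
  puts

  history << { role: "user", content: question }
  history << { role: "assistant", content: answer }
end
```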
Tricky Findings & Gotchas
- Ruby Next transpilation required for the `chroma-db` gem: The gem uses modern Ruby features and requires `gem "ruby-next"` with `require "ruby-next/language/runtime"` loaded early to avoid `LoadError`s (see the first sketch after this list).
- Chroma API version compatibility: Different Chroma server versions expose different API versions (`v1` vs `v2`), and the `chroma-db` Ruby gem expected `v2` endpoints. Using matched versions of the Chroma server and the gem, or a forked gem branch with `v2` support, was crucial (see the Gemfile sketch below).
- Bundler context for scripts: Scripts must be run with `bundle exec` or must `require "bundler/setup"` to load local gem dependencies correctly (especially forked gems).
- Manual management of conversational memory: Unlike Python's LangChain, no high-level Ruby library exists for conversation memory or RAG chains, so memory had to be implemented as an array of message hashes, and prompt assembly was manual.
- Text chunking with `pragmatic_segmenter`: Sentence segmentation improved context retention significantly over naïve character splitting, but required careful assembly of chunks and overlaps.
- Streaming outputs handled via a custom block in the Ruby OpenAI gem: Streaming integration required capturing delta chunks from the streaming API and printing them in real time instead of waiting for the full response.
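A sketch of the load order that sidesteps the first and third gotchas, at the top of `seed.rb` and `main.rb`. The requires themselves come from the gotchas above; treating this exact ordering as required is my assumption.

```ruby
# Load Bundler's context first so locally pinned/forked gems resolve,
# then the Ruby Next runtime before anything from chroma-db is required.
require "bundler/setup"
require "ruby-next/language/runtime"
require "chroma-db"
```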
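And a hypothetical Gemfile pin for the forked-gem situation in the second gotcha; the GitHub path and branch name are placeholders, not the real fork.

```ruby
# Gemfile — pinning chroma-db to a fork with v2 API support.
# The github: path and branch are placeholders, not the actual fork.
source "https://rubygems.org"

gem "ruby-next"
gem "ruby-openai"
gem "pragmatic_segmenter"
gem "chroma-db", github: "someuser/chroma", branch: "v2-api"
```

With a `github:` source in play, running scripts through `bundle exec` (or requiring `bundler/setup` first) is what ensures the fork, rather than a system-installed gem, gets loaded.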
Setup Instructions
Requirements
- Ruby 3.2.x
- Bundler
- Docker & Docker Compose
- Ollama installed and running locally with the `llama3.2` model pulled
- A basic terminal shell (macOS or Linux recommended)
Steps
- Clone or fork the repository.
- Run `bundle install` to install the Ruby dependencies.
- Run `docker compose up -d` to boot up the Chroma DB server.
- Run `ollama run llama3.2` to boot up the open-source LLM.
- Run `bundle exec ruby seed.rb` to seed Chroma DB with chunks of data from the `knowledge_base` folder.
- Run `bundle exec ruby main.rb` to start the conversation.
Questions to try
- What is the company name?
- When was the company established?
- Which technologies does the company use?
- What are the employees' names and titles?
- Who knows how to work with iOS?
- Who was the client for the web project?
- Is the company looking for an iOS developer?
Then you can compare the answers with the actual knowledge base.
