# Local RAG Chatbot with Rerank

An internal Retrieval-Augmented Generation (RAG) assistant that answers your questions using a local embedding model and LLM pulled via [Ollama](https://ollama.com), plus a rerank model downloaded from [Hugging Face](https://huggingface.co/) and run on [vLLM](https://docs.vllm.ai/en/latest/).

This version is built on community-sourced cooking tips, but you can easily customize it with your own content.

## Features

- Fully local RAG setup (no cloud APIs needed)
- Uses Ollama-compatible models for both embedding and generation
- Incorporates a Hugging Face rerank model, run on vLLM
- Built with Python, using the LangChain and FAISS libraries
- Combines embedding and rerank models to improve knowledge-retrieval accuracy
- Interactive terminal interface
- Falls back to general knowledge if the answer isn't found in the local data

## Requirements

- Python 3.11+
- [Ollama](https://ollama.com) installed and running
- An LLM and an embedding model pulled from Ollama to your machine (after pulling, update lines 21 and 25 of the code accordingly)
- [vLLM](https://docs.vllm.ai/en/latest/) installed and running (update line 31 after vLLM is set up)
- `huggingface_hub` installed
- The rerank model 'bge-reranker-v2-m3' downloaded from huggingface.co with the huggingface-cli command

Install the Python library dependencies:

```bash
pip install langchain langchain-community langchain-ollama faiss-cpu requests
```

Make sure the knowledge base file is in the same directory as `internal-rag-cookbot.py`. Mine is named `cooking-tips-comments.txt`; both the name and the contents of the file can be changed.

## How It Works

This project uses Retrieval-Augmented Generation (RAG), which combines an embedding vector database built from your content, a reranker that scores the relevance of retrieved content, and a language model, to provide more accurate and contextual answers.

### Embedding the Content

- The text file (cooking-tips-comments.txt) is our knowledge base.
- The contents of the file are converted into **dense embedding vectors** by a local embedding model (in the provided code we use 'snowflake-arctic-embed:335m' from Ollama).
- These vectors capture the semantic meaning of the text: two similar tips will have similar embeddings.

### Storing in a Vector Database

- The vectors are stored in FAISS, a fast, in-memory vector store that supports efficient similarity search.
- This lets the system quickly find the most relevant chunks of text when a new question is asked.

### Retrieving Context from Embedding

- When you ask a question, it is embedded into a vector on the spot so it can be compared against the knowledge base in the same vector space.
- FAISS compares this vector to the ones in the vector database and returns the top matching text chunks. The specific phrase we use here is **"top-k"**.
- **top-k** is the number of top matching entries we retrieve from the index using a fast but rough similarity search. If k is too small, we might miss relevant information; if it's too large, we are likely to pull in unrelated content. The provided code uses **k = 10**, which is a good fit for our data size. For a much larger knowledge base, a bigger k is preferred (some knowledge bases are so large that they use k = 100).
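As a rough illustration, the embed, store, and retrieve steps above could be wired together as in the sketch below, using the libraries listed under Requirements. The chunk sizes here are illustrative assumptions, not necessarily what `internal-rag-cookbot.py` actually uses:

```python
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Local embedding model pulled via Ollama (the one named in this README).
embeddings = OllamaEmbeddings(model="snowflake-arctic-embed:335m")

# Split the knowledge base into chunks; the sizes here are assumptions.
with open("cooking-tips-comments.txt", encoding="utf-8") as f:
    raw_text = f.read()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_text(raw_text)

# Embed each chunk and store the vectors in an in-memory FAISS index.
vectorstore = FAISS.from_texts(chunks, embeddings)

# Embed the question on the spot and retrieve the top-k (k = 10) chunks.
question = "i am trying to make chocolate chip cookies"
top_k_docs = vectorstore.similarity_search(question, k=10)
for doc in top_k_docs:
    print(doc.page_content[:80])
```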
### Rerank Top-K Content

- The retrieved chunks are sent to a rerank API using 'bge-reranker-v2-m3', which reorders them by relevance.
- Only the top result is used as context for generation.

### Generate The Answer

- The context is passed to the LLM (in the provided code we use 'gemma3:12b') along with our question.
- The LLM uses this context to generate a more accurate, grounded, and helpful response.

## Usage

Run the following command in the directory of the Python file:

```bash
python internal-rag-cookbot.py
```

Type in what you want to ask when the prompt "Your question: " shows up. Example session:

```bash
:~$ python chatbot-rerank.py
Internal RAG Q&A Bot
Ask questions about cooking hacks and kitchen tips.
This assistant is powered by a local language model and a custom knowledge base built from community-sourced cooking advice.
Type 'exit' to quit.

Your question:
```

Here we enter our question:

```bash
Your question: i am trying to make chocolate chip cookies
```

The response is:

```bash
Top relevant chunk:
Not mine, but my wife browns the butter before she adds it to chocolate chip cookie dough and they're the best freakin' cookies I've ever eaten!
...
Answer: That's great! My wife browns the butter before adding it to her chocolate chip cookie dough, and it makes a huge difference – they're amazing! You should try it!

Your question:
```

Now we can type "exit" to close the chatbot:

```bash
Your question: exit
Goodbye!
```
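For reference, the rerank and generate steps described under How It Works could look roughly like the sketch below. The vLLM URL, endpoint path, and response field names are assumptions based on vLLM's Jina-compatible rerank API, and the prompt is only illustrative; adjust all of them to match your own setup:

```python
import requests
from langchain_ollama import ChatOllama


def rerank(query: str, chunks: list[str]) -> str:
    """Score each chunk against the query with bge-reranker-v2-m3 served by
    vLLM and return the single most relevant chunk (endpoint/fields assumed)."""
    resp = requests.post(
        "http://localhost:8000/v1/rerank",  # assumed vLLM address and path
        json={
            "model": "BAAI/bge-reranker-v2-m3",
            "query": query,
            "documents": chunks,
        },
        timeout=60,
    )
    resp.raise_for_status()
    # Assumed response shape: [{"index": ..., "relevance_score": ...}, ...]
    results = resp.json()["results"]
    best = max(results, key=lambda r: r["relevance_score"])
    return chunks[best["index"]]


def answer(query: str, context: str) -> str:
    """Generate the final answer with the local LLM, grounded in the top chunk."""
    llm = ChatOllama(model="gemma3:12b")
    prompt = (
        "Use the context below to answer if it is relevant; otherwise answer "
        "from general knowledge.\n"
        f"Context: {context}\n\nQuestion: {query}"
    )
    return llm.invoke(prompt).content
```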