Retrieval Augmented Generation

I implemented a retrieval augmented generation (RAG) program for fun, with the goal of being able to search my personal library. My focus was on making it run locally using only open-source models. I achieved this with ollama and sentence-transformers, which download and run the models locally.

However, I later expanded the project to integrate with Cohere and their rerank and command-r+ models, since I was especially curious about command-r+'s performance. These models can be downloaded and run locally, but because command-r+ has 104B parameters, my computer took ages to generate any output. The obvious and impressive benefit of command-r+ is that it generates citations from the context in its answers.

Here is a presentation that gives a brief overview of what a RAG system is, and how it can be improved with reranking.
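
For readers who want the idea in code, here is a minimal sketch of the retrieve-then-rerank flow. It is not the project's actual implementation: to stay dependency-free it swaps in a toy bag-of-words similarity for the embedding model and a simple term-overlap score for the reranker, and every function name (`embed`, `retrieve`, `rerank`, `build_prompt`) is hypothetical. In a real setup, the embedding step would be handled by a sentence-transformers model and the reranking step by a dedicated rerank model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=3):
    """First stage: cheap similarity search that returns k candidates."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def rerank(query, candidates, top_n=2):
    """Second stage: rescore only the candidates with a different scorer.

    Assumption: a real reranker is a learned model; this stand-in just
    measures the fraction of distinct query terms found in each candidate.
    """
    q_terms = set(embed(query))

    def score(doc):
        if not q_terms:
            return 0.0
        return len(q_terms & set(embed(doc))) / len(q_terms)

    return sorted(candidates, key=score, reverse=True)[:top_n]


def build_prompt(query, passages):
    """Number the passages so the generator can cite them in its answer."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        "Answer using only the context below and cite passages by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

The point of the two stages is that the first one can be fast and approximate over the whole library, while the second one spends more effort on a handful of candidates before they are passed to the generating model as context.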