I implemented a retrieval-augmented generation (RAG) program for fun, with the goal of being able to search my personal library. My focus was to make it run locally with only open source models, which I achieved with ollama and sentence-transformers to download and run the models on my own machine.
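
As a rough sketch of what that pipeline looks like (the model names all-MiniLM-L6-v2 and llama3, and the brute-force cosine-similarity search, are placeholders here rather than my exact setup):

```python
import numpy as np
from sentence_transformers import SentenceTransformer
import ollama

documents = [
    "Passage one from my library...",
    "Passage two from my library...",
]

# Embed the library once and keep the vectors in memory.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def answer(question: str, top_k: int = 2) -> str:
    # Embed the question and rank passages by cosine similarity
    # (a plain dot product, since the vectors are normalized).
    q_vector = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q_vector
    context = "\n\n".join(documents[i] for i in np.argsort(scores)[::-1][:top_k])

    # Ask a locally served model to answer using only the retrieved context.
    prompt = (
        f"Answer the question using only this context:\n{context}\n\n"
        f"Question: {question}"
    )
    response = ollama.generate(model="llama3", prompt=prompt)
    return response["response"]

print(answer("What does passage one talk about?"))
```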
However, I later expanded the project to integrate with Cohere and their rerank and command-r+ models, since I was especially curious about command-r+'s performance. These models can be downloaded and run locally, but it took ages for my computer to generate any output, since command-r+ has 104B parameters. The obvious and impressive benefit of command-r+ is that it generates citations from the retrieved context in its answers.
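
The rerank-then-cite flow looks roughly like this with Cohere's Python client; the model names (rerank-english-v3.0, command-r-plus) and response fields are assumptions based on their v1 client, not code lifted straight from my project:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

question = "Where does the narrator first mention the lighthouse?"
passages = [
    "Passage one from my library...",
    "Passage two from my library...",
    "Passage three from my library...",
]

# Step 1: rerank the candidate passages against the question.
reranked = co.rerank(
    model="rerank-english-v3.0",
    query=question,
    documents=passages,
    top_n=2,
)
top_passages = [passages[r.index] for r in reranked.results]

# Step 2: pass the top passages as documents so command-r+ can cite them.
chat = co.chat(
    model="command-r-plus",
    message=question,
    documents=[{"snippet": p} for p in top_passages],
)

print(chat.text)
# Each citation points back into a span of the answer and the documents it used.
for citation in chat.citations or []:
    print(citation.start, citation.end, citation.text, citation.document_ids)
```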
Here is a presentation that gives a brief overview of what a RAG system is, and how it can be improved with reranking.