Over the last few years, we have all seen the rise of Large Language Models, commonly called LLMs. Every other tech organization seems to be training big neural networks on tons of data. These LLMs are remarkably good at absorbing huge amounts of information during training and compressing it into their parameters.
Once trained, these LLMs know a lot about their training data, even though they are trained only to predict the next word in a sentence. After pre-training, they are further fine-tuned for many downstream tasks like translation, conversation, text generation, etc. These fine-tuned models perform quite well on such tasks because so much information was compressed into them during the pre-training stage.
Now, the pre-training stage of these LLMs is both lengthy and costly (a bunch of expensive GPUs running in parallel). Hence, if you notice the trend, only big corporations have trained such models; individuals simply cannot invest in such resources. Even big corporations release only around one LLM per year, if you look at OpenAI's GPT releases.
Even after performing such a lengthy, costly, and environmentally impactful training process, developers of these LLMs face one big challenge: an LLM can only answer queries about data it was trained on. It cannot answer questions about events that happened after its training. To give a simple example, GPT-4 was trained on data scraped from the internet up to 2021, so it only knows about events up to then. If you ask ChatGPT about events that happened in 2023, it won't be able to answer accurately because it simply does not know about them. If it tries to answer anyway, it will hallucinate a response rather than give an accurate one.
To solve this big limitation, at least to some extent, the concept of RAG (Retrieval-Augmented Generation) was coined by Patrick Lewis and his co-authors at Meta AI (then Facebook AI Research) in their 2020 research paper, "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks".
The concept behind RAG is simple: just give the LLM context about the latest events along with the query so it can generate a proper response. It will use whatever knowledge it already has, together with this fresh context, and try to generate a correct answer. This is easier said than implemented, and a lot of research is currently going on in this field, because training and maintaining LLMs is costly. Training a new one every year is expensive too, so organizations are looking more towards reusability, and RAG can help with that.
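To make the idea concrete, here is a minimal sketch of what "giving context to an LLM" looks like in practice. The `call_llm()` function, the example context, and the prompt wording are illustrative placeholders, not any specific provider's API:

```python
# A minimal sketch of prompt augmentation: retrieved context is simply
# prepended to the user's question before the prompt is sent to the model.
# call_llm() is a hypothetical placeholder for whichever LLM API you use.

def build_rag_prompt(context: str, question: str) -> str:
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

context = "On 14 March 2023, OpenAI released GPT-4 ..."  # retrieved document text
question = "When was GPT-4 released?"
prompt = build_rag_prompt(context, question)
# response = call_llm(prompt)  # hypothetical LLM call
```

The instruction to answer "using only the context" also nudges the model away from hallucinating when the retrieved documents don't contain the answer.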
Now, the important question that comes to mind is how to maintain this context information and give it to LLMs. Organizations maintain information in various forms: databases, PDFs, blogs, forums, etc. How do we combine these different forms of up-to-date information and save them in a format that can be fed to LLMs to get accurate answers?
Well, one commonly used way is to create embeddings of these documents and store them in a vector database. When a user queries the LLM, the information related to the query is retrieved from this database using some kind of search (typically semantic search) and handed to the LLM as context for generating a proper response. We can also store the source of each document alongside its embedding, so the LLM can point users to that source to verify the information. This way, the LLM can accurately answer questions about the latest events even though that information was not available at training time.
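Below is a minimal sketch of this retrieve-then-generate flow. It assumes the sentence-transformers library for embeddings and uses plain NumPy cosine similarity in place of a real vector database; the documents, model name, and source labels are all illustrative:

```python
# A minimal retrieve-then-generate sketch. Assumes the sentence-transformers
# package (pip install sentence-transformers); a production system would use
# a real vector database (FAISS, Pinecone, Chroma, etc.) instead of NumPy.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy "knowledge base": each chunk keeps its source so the answer can cite it.
docs = [
    {"text": "GPT-4 was released by OpenAI on 14 March 2023.", "source": "blog/gpt4"},
    {"text": "RAG was introduced by Lewis et al. in 2020.", "source": "papers/rag"},
]
doc_vectors = model.encode([d["text"] for d in docs], normalize_embeddings=True)

def retrieve(query: str, k: int = 1):
    """Return the k chunks most similar to the query (semantic search)."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q  # cosine similarity, since vectors are normalized
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

hits = retrieve("When did GPT-4 come out?")
context = "\n".join(f"{h['text']} (source: {h['source']})" for h in hits)
# prompt = build_rag_prompt(context, "When did GPT-4 come out?")  # see sketch above
# response = call_llm(prompt)  # hypothetical LLM call
```

Keeping the source next to each chunk is what lets the final answer cite where the information came from, so users can verify it themselves.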
In short, RAG is a way for LLMs to connect to external resources and retrieve information that was not available to them at training time. They can then combine this retrieved information with what they already know to generate the best possible answers to user queries. Using the RAG concept, LLMs can be connected to documents (blogs, tutorials, etc.) on the internet and direct users to the places that best answer their queries.
All right, so that was a short introduction to the concept of RAG. Feel free to visit other blogs and tutorials on our website covering a variety of topics like AI, LLMs, Data Science, Visualization, Deep Learning, etc.
If you want to