Updated On : Jan-15,2024 Time Investment : ~10 mins

Introduction to LLMs (Large Language Models)

Large Language Models, commonly referred to as LLMs, are a type of neural network trained to understand and generate natural language. They are called “large” because they are trained on one or more huge text datasets and because the networks themselves are big: most recently trained LLMs have billions of network parameters.

Almost all LLMs are based on the transformer neural network architecture. The original architecture consists of an encoder and a decoder, and many LLMs use only one of the two: BERT is encoder-only, while the GPT models are decoder-only. Large models stack many of these encoder or decoder layers to capture the context of the data and its long-range dependencies. Each layer is built around self-attention, which, for every token in a sentence, works out how relevant each of the other tokens is. To generate sentences properly, the model assigns more importance to particular words based on context, and it learns how to assign this importance during training. I haven’t discussed the transformer architecture in detail here as it is a complicated topic and deserves an article of its own.
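To make the idea of self-attention a little more concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration only: the three tokens and their 2-dimensional embeddings are made up, and the embeddings are used directly as queries, keys, and values, whereas a real transformer first passes them through learned projection matrices.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a short sequence.

    Each argument is a list of equal-length vectors, one per token.
    Returns (outputs, weights), where weights[i][j] says how much
    token i attends to token j.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this token's query with every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(row, values)) for d in range(len(values[0]))]
        )
    return outputs, weights

# Hypothetical 2-dimensional embeddings for three tokens (assumed values).
tokens = ["the", "cat", "sat"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(embeddings, embeddings, embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row sums to 1 and shows how strongly that token attends to every token in the sequence, including itself.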

The transformer architecture was published by Google researchers in 2017, and from 2018 onwards various forms of it have been used to create large language models. It started with Google releasing BERT and OpenAI releasing GPT-1 in 2018. Competition then heated up, with everyone releasing models with billions of parameters.

Below, I have included some famous LLMs.

GPT-1 (2018) - 117 Mn Parameters (OpenAI)
BERT (2018) - 340 Mn Parameters for BERT-Large (Google)
GPT-2 (2019) - 1.5 Bn Parameters (OpenAI)
GPT-3 (2020) - 175 Bn Parameters (OpenAI)
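PaLM (2022) - 540 Bn Parameters (Google)
BLOOM (2022) - 176 Bn Parameters (BigScience, coordinated by Hugging Face)
LLaMA (2023) - 65 Bn Parameters for the largest version (Meta AI)
Claude (2023) - Parameter count not disclosed; the often-quoted 52 Bn figure comes from an earlier Anthropic research model (Anthropic)
GPT-4 (2023) - Parameter count not disclosed; ~1.76 Tn is an unofficial estimate (OpenAI)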
And many more.

From a technical perspective, LLMs are simply two files.

Architecture File (Written in Python, C, C++, etc) - Particular transformer architecture coded in a specific programming language.
Parameters File (Trained Network Parameters) - Float arrays stored in a file.

To run inference, we just need to load the architecture into memory together with the parameters from the parameters file. Of the two files, the parameters file is the more important one, as it is said to hold much of the information from the training corpus in compressed form. It only exists once training has finished. Companies often describe the architecture publicly but keep the parameters file secret to retain a competitive advantage (OpenAI being one example).
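The two-file idea can be sketched at toy scale. In the example below, the "architecture" is a hypothetical single linear layer written as a Python function, and the "parameters file" is a small JSON file of made-up floats. Real LLMs use far larger architectures and binary weight formats, so treat the names and the file format here as assumptions for illustration.

```python
import json
import os
import tempfile

def tiny_model(params, inputs):
    """The 'architecture': a single linear layer, y = W x + b."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + bias
        for row, bias in zip(params["weights"], params["bias"])
    ]

# The 'parameters file': in practice produced by training; made up here.
trained_params = {"weights": [[0.5, -1.0], [2.0, 0.25]], "bias": [0.1, -0.2]}
params_path = os.path.join(tempfile.mkdtemp(), "params.json")
with open(params_path, "w") as f:
    json.dump(trained_params, f)

# Inference: load the parameters, then run the architecture code on an input.
with open(params_path) as f:
    loaded_params = json.load(f)
prediction = tiny_model(loaded_params, [1.0, 2.0])
print(prediction)
```

The function is useless without the numbers in the file, and the numbers mean nothing without the function that knows how to apply them, which is the relationship described above.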

By default, generative LLMs are simply trained to predict the next word (token) that is most likely to follow the text so far. They are then fine-tuned for various tasks like text classification, sentiment classification, question answering, conversation, translation, etc. ChatGPT is a conversational bot fine-tuned from OpenAI’s GPT family of LLMs (GPT-3.5 at launch).
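Next-word prediction can be illustrated with something far simpler than a transformer. The sketch below swaps the neural network for a bigram counter: it records which word follows which in a tiny made-up corpus and then generates text greedily, one predicted word at a time. The function names are hypothetical, and an LLM learns a much richer predictor, but the generation loop has the same shape.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it has none."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, start, max_words=5):
    """Greedily extend `start` one predicted word at a time."""
    output = [start]
    for _ in range(max_words):
        next_word = predict_next(counts, output[-1])
        if next_word is None:
            break
        output.append(next_word)
    return " ".join(output)

# Tiny made-up training corpus.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))
print(generate(model, "on", max_words=3))
```

Here "the" is followed by "cat" twice and by "mat" once in the corpus, so the model predicts "cat" after "the".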

Training Process of LLMs

The training of LLMs is generally a three-stage process.

Pre-Training - In this stage, a neural network based on the transformer architecture is trained on a large dataset with just one simple task: predicting the next word in a sentence. Large datasets from sources like Wikipedia, GitHub, etc. are used for this. During this stage, the model learns the meaning of words, the relationships between words, the context, and so on.
Fine-Tuning - During this stage, the model is fine-tuned for specific downstream tasks like classification, conversation, translation, etc.
Prompt-Tuning - This is an extra stage that not everyone performs but that OpenAI made famous (they do it for ChatGPT). During this stage, the model is given a prompt and asked to generate multiple predictions as output. These predictions are then rated by human readers for quality, and the ratings are used to steer the model towards the outputs people prefer. It's a kind of quality improvement stage where the model learns to respond properly to a prompt. This process is generally known as RLHF (Reinforcement Learning from Human Feedback), as the model is improved based on human ratings of its predictions. Note that “prompt tuning” more commonly refers to a different technique, in which a small set of prompt embeddings is learned while the model itself stays frozen.
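One common way to turn human ratings from the last stage into a training signal is to train a separate reward model on pairs of responses, using a pairwise loss that is small when the response the rater preferred receives the higher score. The sketch below shows only that loss. The reward scores are made-up numbers, and this is one widely used formulation rather than a description of any particular company's exact pipeline.

```python
import math

def preference_loss(reward_preferred, reward_rejected):
    """Pairwise loss: small when the preferred response scores higher."""
    margin = reward_preferred - reward_rejected
    # Equivalent to -log(sigmoid(margin)).
    return math.log(1.0 + math.exp(-margin))

# Hypothetical reward-model scores for two responses to the same prompt.
agrees = preference_loss(2.0, 0.5)     # model ranks the pair like the human rater
disagrees = preference_loss(0.5, 2.0)  # model ranks the pair the other way round
print(agrees, disagrees)
```

Minimizing this loss over many rated pairs pushes the reward model to score responses the way the raters did; the LLM is then optimized with reinforcement learning to produce responses that score highly.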

Nowadays, many companies perform the pre-training stage on a large corpus and release the resulting models for others to fine-tune further. Meta AI’s LLaMA and LLaMA-2 are examples.

Limitations of LLMs

Though LLMs are widely regarded as a successful form of AI, they come with a few challenges and limitations.

Hallucination - Hallucination happens when an LLM confidently outputs something that is factually wrong or unsupported by the input prompt. It behaves as if it is right even though the generated answer can be wrong.
Bias - The information generated by LLMs can be biased for a particular prompt. This bias is generally inherited from the training data. For example, if the training data disproportionately reports crimes committed by Black people, the model is likely to associate Black people with criminality, an association that does not reflect reality.
Security - LLMs are trained on large datasets that are generally scraped from the internet. As a result, a model can sometimes leak someone’s private information that was part of its training data, without that individual's consent. Many other kinds of security breaches can arise from the training data. Adversarial attacks on LLMs are a fascinating research field focused on this area.
Deployment & Scaling - Scaling LLMs is generally hard. Pre-training is costly and typically takes weeks or longer on clusters of GPUs, which is partly why big companies like OpenAI, Google, Meta, etc. release major LLMs only about once a year. The very high cost makes it hard for an individual to pre-train a model on their own. Deploying LLMs also requires costly hardware so that inference in production is fast enough to avoid delaying responses to users.
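A rough back-of-the-envelope calculation shows why deployment hardware gets expensive. The sketch below estimates the memory needed just to hold a model's parameters, assuming 2 bytes per parameter (16-bit floats). That precision is an assumption, and the estimate ignores activations and other runtime memory, so real requirements are higher.

```python
def parameter_memory_gb(num_parameters, bytes_per_parameter=2):
    """Memory needed to hold the parameters alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

# Parameter counts taken from the list of models earlier in the article.
for name, count in [("GPT-2", 1.5e9), ("LLaMA", 65e9), ("GPT-3", 175e9)]:
    print(f"{name}: ~{parameter_memory_gb(count):.0f} GB at 16-bit precision")
```

Under these assumptions, a 175 Bn parameter model needs about 350 GB for its weights alone, which is more than a single GPU offers, so the model has to be split across several devices.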

All right, so that was a small introduction to LLMs. Feel free to explore other articles on our website to learn more about topics like LLMs, Deep Learning, Machine Learning, Data Viz, Data Science, Python, etc.