Have you heard of ChatGPT? If you have not, it's an artificial intelligence (AI) chatbot that has taken the world by storm. It can write poems, code, and even novels. But what's really powering generative AI models like OpenAI’s ChatGPT or Google’s Bard? It's all thanks to large language models (LLMs).
LLMs are a type of AI model specifically designed to understand natural language. They can process and generate text, which makes them useful for a variety of tasks such as language translation, summarisation, and question-answering.
These models are trained on massive datasets that vary depending on the specific purpose of the LLM. Smaller models typically have tens of millions of parameters, whereas the largest have hundreds of billions.
In LLMs, parameters represent the ‘knowledge’ gained by the model during its training phase. More parameters generally result in more accurate predictions because the model has access to more contextual information. Some of the most prominent LLMs have hundreds of billions of parameters.
In simple terms, an LLM can be thought of as a vast statistical summary of text data that can be drawn on to generate human-like responses to prompts.
Let's do a little exercise to understand this better. Here's a sentence, and all you have to do is fill in the blank:
"I am going to the ____ to buy milk."
If your answer was "I am going to the store to buy milk", you are right. However, you could have also said "market" or "shop". There are quite a few sensible answers for the sentence. If you were able to fill in the blank, you just demonstrated the core of how an LLM works: predicting the most likely next word.
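The exercise above can be sketched as a ranking over candidate words. This is a toy illustration, not a real model: the probabilities below are invented for the example, whereas a real LLM derives them from billions of learned parameters.

```python
# Invented probabilities for the blank in "I am going to the ____ to buy milk."
candidates = {
    "store": 0.62,
    "market": 0.21,
    "shop": 0.12,
    "moon": 0.001,
}

def fill_blank(candidates):
    # An LLM effectively picks the word it considers most probable.
    return max(candidates, key=candidates.get)

print(fill_blank(candidates))  # -> store
```

Note that "market" and "shop" also receive meaningful probability, which is why an LLM can produce different, equally sensible answers when it samples from this distribution rather than always taking the top word.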
How does it work?
Language models rely on neural networks, which are machine-learning algorithms inspired by the human brain. These networks comprise interconnected nodes organised in layers, with each layer processing information and transmitting it to the next layer.
For language models specifically, the neural network takes a word sequence as input and produces an output by predicting the most probable sequence of words.
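The layered processing described above can be sketched in miniature. This is a simplified stand-in, not a production network: the weights are random (in a real model they would be learned during training), the "vocabulary" has four words, and the input encoding is made up for illustration.

```python
import math
import random

random.seed(0)

VOCAB = ["store", "market", "shop", "moon"]

def layer(inputs, weights):
    # Each node sums its weighted inputs and applies a non-linearity,
    # then passes the result on to the next layer.
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

def softmax(scores):
    # Convert raw scores into a probability for each vocabulary word.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Random weights stand in for the billions of trained parameters.
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(len(VOCAB))]

embedding = [0.2, -0.5, 0.9]   # toy encoding of the input word sequence
hidden = layer(embedding, w1)  # first layer processes the input
scores = layer(hidden, w2)     # second layer scores each candidate word
probs = softmax(scores)        # probabilities over the vocabulary

prediction = max(zip(VOCAB, probs), key=lambda p: p[1])[0]
print(prediction)
```

Real LLMs use far deeper networks with attention mechanisms rather than this simple feed-forward pass, but the principle is the same: input flows through layers, and the output is a probability for every possible next word.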
How are LLMs trained?
The process of training an LLM can be divided into three stages: pre-training, fine-tuning, and inference.
During the pre-training stage, the model learns from vast amounts of text data including books, articles, and websites to understand how words work together and how sentences are built. It also figures out the meaning of words and learns the rules of grammar and how to put words in the right order. It does all of this by looking at patterns in the text, like how certain words often appear together.
After pre-training, the model can be trained for specific tasks like translating languages or answering questions. This is called fine-tuning, where the model learns more about the specific task it needs to do. It's like practising a particular skill to get better at it.
Once the model is trained, the next stage is inference. The trained model can now be asked questions or given prompts, and it will generate answers or responses based on what it has learned during the pre-training and fine-tuning stages.
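The three stages can be illustrated with a deliberately crude word-pair counter. This is not how an LLM is actually trained (real models adjust neural-network weights, not counts), but it shows the shape of the pipeline: learn patterns from general text, refine on task-specific text, then answer prompts.

```python
from collections import Counter, defaultdict

def train(model, text):
    # Count which word follows which -- a crude stand-in for the
    # statistical patterns an LLM picks up during training.
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1

model = defaultdict(Counter)

# Stage 1: pre-training on broad, general text.
train(model, "I am going to the store to buy milk . the store sells milk")

# Stage 2: fine-tuning on task-specific text refines the same model.
train(model, "the market sells fresh milk . I prefer the market")

# Stage 3: inference -- predict the most likely next word for a prompt.
def predict(model, word):
    followers = model[word.lower()]
    return followers.most_common(1)[0][0] if followers else None

print(predict(model, "buy"))
```

The key point is that fine-tuning does not start over: it updates the same model that pre-training produced, and inference then uses everything learned in both stages.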
Which are the popular LLMs?
GPT-3.5: Generative Pre-trained Transformer-3.5 or GPT-3.5 is one of the largest LLMs developed by OpenAI and serves as the backbone for the AI chatbot ChatGPT. Boasting an impressive 175 billion parameters, the model can carry out many tasks, such as text generation, translation, and summarisation.
LaMDA: Google's Language Model for Dialogue Applications (LaMDA) is the underlying technology behind the newly introduced Bard AI. This language model has undergone training on extensive conversational dialogue data, enabling it to grasp subtle linguistic nuances and engage in open-ended conversations. Google has also developed an advanced iteration called LaMDA 2, which is further refined and can offer recommendations in response to user queries. Separately, Google's Pathways Language Model (PaLM) features an impressive parameter count of 540 billion.
LLaMA: Developed by Meta AI, the LLaMA model comes in various parameter sizes, ranging from 7 billion to 65 billion. Meta aims to democratise access to the field by introducing LLaMA, as the training of large models has traditionally been limited by the computational power required.
WuDao 2.0: Developed by the Beijing Academy of Artificial Intelligence, WuDao 2.0 is the largest model in existence, with 1.75 trillion parameters. WuDao 2.0 can simulate human speech and generate content.
MT-NLG: Megatron-Turing Natural Language Generation (MT-NLG), jointly developed by Nvidia and Microsoft, serves as the successor to Microsoft's Megatron-LM and Nvidia's Turing NLG 17B. The model was trained on Nvidia's Selene ML supercomputer and has 530 billion parameters. With its 105-layer deep neural network, MT-NLG can perform a wide range of natural language tasks, including completion prediction, reading comprehension, intelligent reasoning, language inferences, and more.
Bloom: BigScience Large Open-science Open-access Multilingual Language Model is an open-source LLM built by a consortium of over 1,000 AI researchers, with 176 billion parameters. The model is capable of generating text in 46 languages and code in 13 programming languages.