Introduction: Large Language Models (LLMs)

How LLMs work

Large Language Models are a type of artificial intelligence (AI) system designed for language-related tasks. LLMs are a subset of generative AI, focused on generating and processing text-based content.

They are statistical applications that predict output text based on input text.
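The idea of "statistically predicting the next piece of text" can be illustrated with a toy model. The sketch below is not a real LLM; it is a tiny bigram model that predicts the next word as whichever word most often followed the previous one in a made-up training corpus:

```python
from collections import Counter, defaultdict

# Made-up miniature "training data"
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Return the statistically most likely next word
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in the corpus
```

Real LLMs do the same thing in principle, but with billions of learned parameters instead of simple word counts.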

Graphic showing the input and output of a large language model

 

These applications have become so powerful that their responses feel like a real conversation.

When you can no longer tell whether you are talking to a human being or to software, the application is considered intelligent (the Turing test).

How you create LLMs

AI applications like ChatGPT are called large language models because they are trained on a massive amount of data and contain millions or even billions of parameters.

Think of an LLM as software that can “talk” (create grammatically correct sentences that have a high probability of making sense).

Of course, it can only talk about what it “knows” (what it was trained on).

Graphic that shows how you train Large Language Models

 

LLMs are restricted by training data

An LLM can only talk about data it was trained on.

This can be overcome by passing the data you want to talk about to the LLM before the conversation (in-context learning).

This is called pre-prompting. Designing and optimizing the text prompts given to the LLM before the conversation ensures that its responses meet the desired criteria. This process, known as prompt engineering, is the most common way to maximize the utility of LLMs.

Typically, you pass instructions (how the LLM should respond) and data (what the LLM should talk about) as a pre-prompt.
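A pre-prompt like this is often expressed in the chat-message format used by many LLM APIs: instructions go into a "system" message, and the data is included alongside the user's question. The sketch below only builds the message structure; the opening-hours string is made-up example data, not a real API call:

```python
# Instructions: how the LLM should respond
instructions = "You are a museum assistant. Answer only from the data below."

# Data: what the LLM should talk about (made-up example)
data = "Opening hours: Tue-Sun 10:00-18:00. Closed on Mondays."

# The pre-prompt, in the common chat-message format
messages = [
    {"role": "system", "content": instructions},
    {"role": "user", "content": f"{data}\n\nQuestion: Is the museum open on Monday?"},
]

print(messages[0]["role"])  # system
```

This list of messages would then be sent to the model; the system message steers every response in the conversation.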

Graphic illustrating how a Large Language Model is pre-prompted with instructions and data

 

LLMs are restricted by complexity

LLMs can only receive and return a certain amount of data, measured in so-called tokens. Tokens are text units that represent words, parts of words, or other pieces of text. Token limits refer to the maximum number of tokens the model can process at once.

The amount of data sent to a model impacts its performance. Therefore, APIs for accessing LLMs usually charge based on the number of tokens sent to the API.

Examples of token limits of well-known LLMs are:

  • GPT-3.5 Turbo: 4096 tokens
  • GPT-4: 8192 tokens
  • GPT-4 32k: 32,768 tokens
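Before sending a prompt, it is useful to check it against the model's limit. Real models use subword tokenizers, so the sketch below relies on a common rule of thumb (one token is roughly four characters of English text) purely as an estimate:

```python
# Token limits from the list above
TOKEN_LIMITS = {
    "gpt-3.5-turbo": 4096,
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}

def estimate_tokens(text):
    # Rough rule of thumb: ~4 characters per token for English text.
    # Real tokenizers (e.g. BPE-based ones) will give different counts.
    return max(1, len(text) // 4)

def fits(text, model):
    # Check whether the estimated token count stays within the limit
    return estimate_tokens(text) <= TOKEN_LIMITS[model]

print(fits("Summarize this short report.", "gpt-3.5-turbo"))  # True
```

Note that the limit covers input and output together, so a prompt near the limit leaves little room for the response.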

For complex conversations it can be necessary to break the conversation down into multiple sub-conversations.

A good example is a tourist recommendation chat. In the first part of the conversation, the chat assistant finds out where and when a person wants to travel. Once this information is obtained, the matching events can be loaded and passed to the next conversation to recommend activities. This approach sends fewer tokens to the model and thus saves computation time and cost.

Graphic showing how multiple LLMs can be chained
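The chaining idea from the travel example can be sketched as two separate LLM calls. Here `call_llm` and `load_events` are hypothetical stand-ins (a simple echo and a hard-coded event list) marking where a real API call and a real events lookup would go:

```python
def call_llm(prompt):
    # Stand-in for a real LLM API call; here it just echoes the prompt
    return f"LLM response to: {prompt}"

def load_events(trip_info):
    # Stand-in for looking up events for the extracted destination/dates
    return ["harbor tour", "old town walk"]

def travel_assistant(user_message):
    # Conversation 1: extract only the destination and travel dates
    trip_info = call_llm("Extract destination and dates from: " + user_message)
    # Load just the matching events, so the next prompt stays small
    events = load_events(trip_info)
    # Conversation 2: recommend activities from the short event list
    return call_llm(f"Recommend activities from these events: {events}")

print(travel_assistant("I want to visit Hamburg in June"))
```

Because each call only carries the data it needs, the combined token count stays well below what a single long conversation would require.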