The Large Language Model Course: How to become an LLM Scientist or…

by Maxime Labonne · Jan 2025

Image by author

The Large Language Model (LLM) course is a collection of topics and educational resources for getting into LLMs. It features two main roadmaps:

  1. 🧑‍🔬 The LLM Scientist focuses on building the best possible LLMs using the latest techniques.
  2. 👷 The LLM Engineer focuses on creating LLM-based applications and deploying them.

For an interactive version of this course, I created an LLM assistant that will answer questions and test your knowledge in a personalized way on HuggingChat (recommended) or ChatGPT.

🧑‍🔬 The LLM Scientist

This section of the course focuses on learning how to build the best possible LLMs using the latest techniques.

Image by author

In-depth knowledge of the Transformer architecture is not required, but it’s important to understand the main steps of modern LLMs: converting text into numbers through tokenization, processing these tokens through layers that include attention mechanisms, and generating new text through various sampling strategies.

  • Architectural Overview: Understand the evolution from encoder-decoder Transformers to decoder-only architectures like GPT, which form the basis of modern LLMs. Focus on how these models process and generate text at a high level.
  • Tokenization: Learn the principles of tokenization: how text is converted into numerical representations that LLMs can process. Explore different tokenization strategies and their impact on model performance and output quality (a short tokenization sketch follows this list).
  • Attention mechanisms: Master the core concepts of attention mechanisms, particularly self-attention and its variants. Understand how these mechanisms enable LLMs to handle long-range dependencies and maintain context throughout sequences (see the self-attention sketch after this list).
  • Sampling techniques: Explore various text generation approaches and their tradeoffs. Compare deterministic methods like greedy search and beam search with probabilistic approaches like temperature sampling and nucleus sampling (a sampling sketch follows this list).
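
To make tokenization concrete, here is a minimal sketch using the Hugging Face transformers library and the GPT-2 byte-pair encoding (BPE) tokenizer. The model choice and sample sentence are illustrative, not prescribed by the course:

```python
from transformers import AutoTokenizer

# Load GPT-2's BPE tokenizer (any pretrained tokenizer works the same way)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Large language models process tokens, not characters."
token_ids = tokenizer.encode(text)
tokens = tokenizer.convert_ids_to_tokens(token_ids)

print(tokens)                        # sub-word pieces, e.g. ['Large', 'Ġlanguage', ...]
print(token_ids)                     # the integer IDs the model actually sees
print(tokenizer.decode(token_ids))   # decoding round-trips back to the text
```

Running the same sentence through different tokenizers (GPT-2 vs. Llama, for instance) yields different splits and sequence lengths, which is one way tokenization affects cost and quality.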
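As a companion to the attention bullet, here is a minimal NumPy sketch of single-head scaled dot-product self-attention with the causal mask used by decoder-only models. The shapes and random weights are purely illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention with a causal mask.
    X has shape (seq_len, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])           # pairwise token affinities
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)          # block attention to future tokens
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ V                                # context-aware token representations

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))               # 4 token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(causal_self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```

The causal mask is what makes this a decoder-only (GPT-style) block: each token can only attend to itself and earlier positions, which is exactly what autoregressive generation requires.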
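Finally, here is a toy Python sketch contrasting the sampling families from the last bullet, applied to a single made-up logit vector. Greedy search is deterministic, while temperature and nucleus sampling are stochastic; beam search is omitted since it needs a full decoding loop:

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def greedy(logits):
    return int(np.argmax(logits))      # deterministic: always pick the top token

def temperature_sample(logits, T=0.8):
    p = softmax(logits / T)            # T < 1 sharpens, T > 1 flattens the distribution
    return int(rng.choice(len(p), p=p))

def nucleus_sample(logits, top_p=0.9):
    p = softmax(logits)
    order = np.argsort(p)[::-1]                           # tokens by descending probability
    cutoff = np.searchsorted(np.cumsum(p[order]), top_p) + 1
    keep = order[:cutoff]                                 # smallest set with mass >= top_p
    return int(rng.choice(keep, p=p[keep] / p[keep].sum()))

logits = np.array([2.0, 1.5, 0.3, -1.0, -2.0])            # toy next-token scores
print(greedy(logits), temperature_sample(logits), nucleus_sample(logits))
```

In a real decoder, this choice is made once per generated token, so the tradeoff compounds over the whole sequence: greedy output is repetitive but stable, while higher temperature or top_p buys diversity at the cost of coherence.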

📚 References:

  • Visual intro to Transformers by 3Blue1Brown: Visual introduction to Transformers for complete beginners.
  • LLM Visualization by Brendan Bycroft: Interactive 3D visualization of LLM internals.
  • nanoGPT by Andrej Karpathy: A 2-hour YouTube video on reimplementing GPT from scratch (for programmers). He also made a video about tokenization.
  • Attention? Attention! by Lilian Weng: Historical overview to introduce the need for attention mechanisms.
  • Decoding Strategies in LLMs by Maxime Labonne: Provides code and a visual introduction to the different decoding strategies used to generate text.
