The Large Language Model Course: How to become an LLM Scientist or…

by Maxime Labonne · Jan 2025

Image by author

The Large Language Model (LLM) course is a collection of topics and educational resources for getting into LLMs. It features two main roadmaps:

  1. 🧑‍🔬 The LLM Scientist focuses on building the best possible LLMs using the latest techniques.
  2. 👷 The LLM Engineer focuses on creating LLM-based applications and deploying them.

For an interactive version of this course, I created an LLM assistant that will answer questions and test your knowledge in a personalized way on HuggingChat (recommended) or ChatGPT.

🧑‍🔬 The LLM Scientist

This section of the course focuses on learning how to build the best possible LLMs using the latest techniques.

Image by author

In-depth knowledge of the Transformer architecture is not required, but it’s important to understand the main steps of modern LLMs: converting text into numbers through tokenization, processing these tokens through layers that include attention mechanisms, and generating new text through various sampling strategies.

  • Architectural Overview: Understand the evolution from encoder-decoder Transformers to decoder-only architectures like GPT, which form the basis of modern LLMs. Focus on how these models process and generate text at a high level.
  • Tokenization: Learn the principles of tokenization: how text is converted into numerical representations that LLMs can process. Explore different tokenization strategies and their impact on model performance and output quality (a short tokenization sketch follows this list).
  • Attention mechanisms: Master the core concepts of attention mechanisms, particularly self-attention and its variants. Understand how these mechanisms enable LLMs to handle long-range dependencies and maintain context throughout sequences (see the self-attention sketch after this list).
  • Sampling techniques: Explore various text generation approaches and their tradeoffs. Compare deterministic methods like greedy search and beam search with probabilistic approaches like temperature sampling and nucleus sampling (a sampling sketch follows this list).
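
To make tokenization concrete, here is a minimal sketch using the Hugging Face transformers library and the GPT-2 byte-pair encoding (BPE) tokenizer. The model choice and sample sentence are illustrative, not prescribed by the course:

```python
from transformers import AutoTokenizer

# Load GPT-2's BPE tokenizer (any pretrained tokenizer works the same way)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Large language models process tokens, not characters."
token_ids = tokenizer.encode(text)
tokens = tokenizer.convert_ids_to_tokens(token_ids)

print(tokens)                        # sub-word pieces, e.g. ['Large', 'Ġlanguage', ...]
print(token_ids)                     # the integer IDs the model actually sees
print(tokenizer.decode(token_ids))   # decoding round-trips back to the text
```

Running the same sentence through different tokenizers (GPT-2 vs. Llama, for instance) yields different splits and sequence lengths, which is one way tokenization affects cost and quality.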
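As a companion to the attention bullet, here is a minimal NumPy sketch of single-head scaled dot-product self-attention with the causal mask used by decoder-only models. The shapes and random weights are purely illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention with a causal mask.
    X has shape (seq_len, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])           # pairwise token affinities
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)          # block attention to future tokens
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ V                                # context-aware token representations

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))               # 4 token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(causal_self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```

The causal mask is what makes this a decoder-only (GPT-style) block: each token can only attend to itself and earlier positions, which is exactly what autoregressive generation requires.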
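Finally, here is a toy Python sketch contrasting the sampling families from the last bullet, applied to a single made-up logit vector. Greedy search is deterministic, while temperature and nucleus sampling are stochastic; beam search is omitted since it needs a full decoding loop:

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def greedy(logits):
    return int(np.argmax(logits))      # deterministic: always pick the top token

def temperature_sample(logits, T=0.8):
    p = softmax(logits / T)            # T < 1 sharpens, T > 1 flattens the distribution
    return int(rng.choice(len(p), p=p))

def nucleus_sample(logits, top_p=0.9):
    p = softmax(logits)
    order = np.argsort(p)[::-1]                           # tokens by descending probability
    cutoff = np.searchsorted(np.cumsum(p[order]), top_p) + 1
    keep = order[:cutoff]                                 # smallest set with mass >= top_p
    return int(rng.choice(keep, p=p[keep] / p[keep].sum()))

logits = np.array([2.0, 1.5, 0.3, -1.0, -2.0])            # toy next-token scores
print(greedy(logits), temperature_sample(logits), nucleus_sample(logits))
```

In a real decoder, this choice is made once per generated token, so the tradeoff compounds over the whole sequence: greedy output is repetitive but stable, while higher temperature or top_p buys diversity at the cost of coherence.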

📚 References:

  • Visual intro to Transformers by 3Blue1Brown: Visual introduction to Transformers for complete beginners.
  • LLM Visualization by Brendan Bycroft: Interactive 3D visualization of LLM internals.
  • nanoGPT by Andrej Karpathy: A 2-hour YouTube video on reimplementing GPT from scratch (for programmers). He also made a video about tokenization.
  • Attention? Attention! by Lilian Weng: Historical overview to introduce the need for attention mechanisms.
  • Decoding Strategies in LLMs by Maxime Labonne: Provides code and a visual introduction to the different decoding strategies used to generate text.
