Andrej Karpathy — co-founder of OpenAI, former Director of AI at Tesla — just built a complete GPT with no libraries, no frameworks, and no dependencies. 200 lines of pure Python. Here is what those lines actually tell you about how language models work.
Every AI course in the world teaches through abstraction. You use PyTorch. You import transformers. You call functions you do not understand. You build things without knowing how they work. Karpathy's entire career has been a war against that approach.
He previously built micrograd (automatic differentiation from scratch), makemore (character-level language models from scratch), and nanoGPT (a full GPT-2 training run from scratch). Each was a step toward stripping AI down to its mathematical skeleton.
microgpt is the final answer. It trains and runs a GPT model completely from scratch, with no external dependencies, in 200 lines of Python. Karpathy wrote: “This script is the culmination of multiple projects and a decade-long obsession to simplify LLMs to their bare essentials. I cannot simplify this any further.”
This matters even if you will never read the code. Because once you understand what each of those components does — in plain language — you understand what a language model actually is. Not what it does. What it is. That understanding changes how you use it, how you evaluate it, and how you spot when it is going wrong.
microgpt packs seven components into those 200 lines. Here is what each one does, in plain language.
The dataset loader reads raw text (a book, a web page, a code repository) and converts it into numbers. Language models do not read words. They read numbers that represent words. The dataset loader builds the conversion table.
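To make that concrete, here is a minimal sketch of the idea, assuming a character-level setup; the sample text and variable names are illustrative, not microgpt's own:

```python
# Minimal sketch of a character-level dataset loader (illustrative names).
text = "hello world"  # in practice: open("input.txt").read()

# Build the conversion table: every distinct character gets an integer id.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # string -> integer
itos = {i: ch for ch, i in stoi.items()}      # integer -> string

# The model never sees characters, only this stream of ids.
data = [stoi[ch] for ch in text]
print(data)  # [3, 2, 4, 4, 5, 0, 7, 5, 6, 4, 1]
```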
The tokenizer decides how to split text into chunks ('tokens') and assigns each chunk a number. 'hello' might be one token. ' world' might be another. Punctuation gets its own tokens. The tokenizer determines the vocabulary the model can work with.
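A toy illustration of the idea, assuming a tiny hand-written vocabulary; real tokenizers (such as BPE) learn their chunks from data rather than having them written down:

```python
# Toy tokenizer with a fixed, hand-written vocabulary (illustrative).
vocab = {"hello": 0, " world": 1, "!": 2}

def tokenize(s):
    tokens = []
    while s:
        # Greedily take the longest chunk the vocabulary knows about.
        for chunk, tid in sorted(vocab.items(), key=lambda kv: -len(kv[0])):
            if s.startswith(chunk):
                tokens.append(tid)
                s = s[len(chunk):]
                break
        else:
            raise ValueError(f"no token for: {s!r}")
    return tokens

print(tokenize("hello world!"))  # [0, 1, 2]
```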
Autograd is the mechanism by which the model learns. After the model makes a prediction and its error is measured, autograd works backwards through the entire network to figure out which numbers to adjust and by how much. This is the heart of machine learning.
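In the spirit of Karpathy's earlier micrograd, here is a bare-bones sketch of the idea. The names are illustrative, and micrograd itself uses a topological sort rather than this naive recursion, but the principle is the same: every value remembers how it was made, so gradients can flow backwards.

```python
# Each Value records its parents and the local derivatives toward them.
class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self, grad=1.0):
        # Chain rule: pass this node's gradient back to its parents.
        self.grad += grad
        for parent, local in zip(self._parents, self._local_grads):
            parent.backward(grad * local)

x = Value(2.0)
y = Value(3.0)
z = x * y + x          # z = x*y + x = 8.0
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```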
The model architecture is the actual neural network structure: the layers, the attention heads, the matrix multiplications that transform an input into a probability distribution over the next token. This is the 'brain' of the model.
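The core of that structure is attention. Here is a minimal pure-Python sketch of a single attention step, with made-up vectors and names; a real GPT stacks many of these, wrapped in learned projections and other layers:

```python
import math

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    # Each score asks: how relevant is this past token to the current one?
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
              for key in keys]
    weights = softmax(scores)
    # The output is a weighted blend of the past tokens' value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

q = [1.0, 0.0]
ks = [[1.0, 0.0], [0.0, 1.0]]
vs = [[10.0, 0.0], [0.0, 10.0]]
print(attention(q, ks, vs))  # leans toward the first value vector
```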
The optimizer is the algorithm that uses the autograd calculations to actually update the model's parameters. The one used here, Adam, is sophisticated: it adapts the step size for each parameter based on the history of that parameter's gradients, so parameters with large or noisy gradients get smaller, steadier updates.
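Here is a sketch of one Adam update for a single parameter, following the standard formulation; the hyperparameter values are common defaults, not necessarily microgpt's:

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # running average of gradients
    v = b2 * v + (1 - b2) * grad * grad   # running average of squared grads
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    # Parameters with large or noisy gradients get smaller effective steps.
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

p, m, v = 0.5, 0.0, 0.0
for t in range(1, 4):
    p, m, v = adam_step(p, grad=2.0, m=m, v=v, t=t)
print(p)  # nudged downward, with step size adapted to gradient history
```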
The training loop is the cycle: show the model a batch of text, have it predict the next token, calculate how wrong it was, update the parameters. Repeat millions of times. This is how a blank model becomes a model that can write.
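Here is the same cycle in runnable miniature, fitting a single toy weight instead of millions of parameters; the shape of the loop is the point, not the toy problem:

```python
# Fit one weight w so that w*x predicts the target: forward, loss,
# backward, update. Real training repeats this over batches of tokens.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy (input, target) pairs
w, lr = 0.0, 0.01

for step in range(200):
    for x, target in data:
        pred = w * x                     # forward: make a prediction
        loss = (pred - target) ** 2      # how wrong was it?
        grad = 2 * (pred - target) * x   # backward: d(loss)/d(w)
        w -= lr * grad                   # update: nudge the parameter

print(w)  # converges near 2.0
```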
Inference is how you actually use the trained model. Give it a starting prompt. It predicts the next token. You append that token to the prompt and ask again. Repeat until you have the output you want.
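A sketch of that loop, assuming a stand-in model that returns one probability per vocabulary token; the function names here are illustrative:

```python
import random

def generate(model, prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model(tokens)  # one probability per vocabulary token
        next_token = random.choices(range(len(probs)), weights=probs)[0]
        tokens.append(next_token)  # feed the choice back in and repeat
    return tokens

# Toy "model": slightly prefers token 1 regardless of context.
toy_model = lambda tokens: [0.2, 0.5, 0.3]
print(generate(toy_model, [0], max_new_tokens=5))
```

Sampling from the probabilities (rather than always taking the single most likely token) is why the same prompt can produce different outputs on different runs.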
Understanding microgpt changes how you think about deploying AI in production — even if you will never write a single line of the training code yourself.
If this article sparked genuine curiosity about how language models work — not at the level of building one from scratch, but at the level of understanding what is actually happening when you prompt one — the AI Foundations track is the right next step.
If you want to go deeper into the code, Karpathy's microgpt is public and free: gist.github.com/karpathy/microgpt. You do not need to understand Python to benefit from reading it — the structure alone communicates how the pieces fit together.