Language → Action

A guided tour through the internals of a tiny transformer learning to map synonymous natural-language commands onto a small action vocabulary.

  1. The task
  2. Baseline: a 1-layer model, no synonyms
  3. Attention patterns
  4. Direct logit attribution
  5. Turning the data dial: synonyms
  6. Turning the model dial: depth
  7. Activation patching: the causal story
  8. Synthesis: across both axes