Language → Action

A guided tour through the internals of a tiny transformer learning to map synonymous natural-language commands onto a small action vocabulary.

The task
Baseline: a 1-layer model, no synonyms
Attention patterns
Direct logit attribution
Turning the data dial: synonyms
Turning the model dial: depth
Activation patching: the causal story
Synthesis: across both axes