5 · Turning the data dial: synonyms

Now we turn up synonym richness, the number of distinct phrasings per action, and retrain. The question that drives the whole project: do synonymous phrases (go north / move north / head upward) converge to the same representation inside the model?

Below is a 1-layer model trained on four phrasings per action. The view reads each phrase's residual stream at the final (prediction) position and shows it two ways: a pairwise cosine-similarity heatmap and a 2D PCA projection, both colored by action. Use the Depth selector to step from the raw embedding through the layer.
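For readers who want to reproduce the two views offline, here is a minimal NumPy sketch. It assumes you have already extracted each phrase's final-position residual vector into an array `resid` of shape `(n_phrases, d_model)`. The function names and toy vectors are hypothetical and are not the lesson's own code. PCA here is computed from the SVD of mean-centered data.

```python
import numpy as np

def cosine_similarity_matrix(resid: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of resid (n_phrases x d_model)."""
    norms = np.linalg.norm(resid, axis=1, keepdims=True)
    unit = resid / np.clip(norms, 1e-12, None)  # guard against zero-length vectors
    return unit @ unit.T  # entry [i, j] = cos(angle between phrase i and phrase j)

def pca_2d(resid: np.ndarray) -> np.ndarray:
    """Project rows onto the top-2 principal components (assumes n_phrases, d_model >= 2)."""
    centered = resid - resid.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    # Their signs are arbitrary, so the 2D plot may come out mirrored.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Toy usage with made-up 3-d vectors: two actions, two phrasings each.
toy = np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1],
                [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]])
print(np.round(cosine_similarity_matrix(toy), 2))
print(pca_2d(toy))
```

To match the lesson's view, you would render the similarity matrix as a heatmap and scatter the PCA coordinates, coloring each point by its action label.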


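At the embed checkpoint, the final position holds the same vector for every phrase: the layer's attention has not run yet, so nothing from the command has reached that position. Every pair is therefore maximally similar (cosine similarity 1), and all phrases pile up at a single point in the PCA view. After the layer's attention has pulled in the discriminative word, the phrases separate by action, and synonyms of the same action sit together. That clustering, not accuracy, is the evidence that the model has learned a shared action abstraction. A model could in principle reach high accuracy by mapping each phrasing to its answer separately, whereas clustering shows that the synonyms share one representation. In the next lesson we ask whether more depth sharpens it.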
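To turn "synonyms sit together" into a single number, one option is the gap between the mean within-action and the mean between-action cosine similarity. This is a hypothetical metric, not necessarily what the lesson's view computes, and `synonym_cluster_gap` is an illustrative name. When every vector is identical, as at the embed checkpoint, the gap is 0. A clearly positive gap after the layer indicates that phrases cluster by action.

```python
import numpy as np

def synonym_cluster_gap(resid: np.ndarray, actions: list[str]) -> float:
    """Mean within-action minus mean between-action cosine similarity.

    resid: (n_phrases, d_model) final-position residual vectors.
    actions: action label per row. Assumes at least two actions and at least
    two phrasings per action; otherwise one of the means is over an empty set.
    """
    unit = resid / np.clip(np.linalg.norm(resid, axis=1, keepdims=True), 1e-12, None)
    sim = unit @ unit.T
    labels = np.asarray(actions)
    same = labels[:, None] == labels[None, :]       # True where the two phrases share an action
    off_diag = ~np.eye(len(labels), dtype=bool)     # exclude trivial self-similarity
    within = sim[same & off_diag].mean()
    between = sim[~same].mean()
    return float(within - between)
```

Because it is a difference of similarities rather than an accuracy, this score can stay near 0 for a model that answers correctly without sharing representations across synonyms.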