Architecture Atlas - Is attention still all you need?

The Drift

A slow migration away from attention.

Share of notable models by architecture family, by year. Estimated from the models tracked below - directional, not a census.

Chart legend: Pure transformer; Hybrid (attention + SSM / linear); Non-transformer / alternative.

Read carefully: pure transformers still dominate the frontier. The story isn't replacement - it's the emergence of hybrids at the edges. Bars reflect tracked notable models, not all models in existence.
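As a rough sketch of how per-year shares like these could be computed from a tracked list: the function name, the family labels as strings, and the sample rows below are all hypothetical placeholders, not the site's data or code.

```python
from collections import Counter, defaultdict

# Family labels mirror the three chart categories; the string values are an assumption.
FAMILIES = ("transformer", "hybrid", "alternative")


def family_shares(models):
    """Return {year: {family: share}}; shares within each year sum to 1."""
    counts = defaultdict(Counter)
    for year, family in models:
        if family not in FAMILIES:
            raise ValueError(f"unknown family: {family!r}")
        counts[year][family] += 1

    shares = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        # Families with no tracked model in a year get a share of 0.0.
        shares[year] = {f: counts[year][f] / total for f in FAMILIES}
    return shares


# Made-up placeholder rows (year, family), for illustration only.
sample = [
    (2023, "transformer"), (2023, "transformer"),
    (2023, "transformer"), (2023, "hybrid"),
    (2024, "transformer"), (2024, "hybrid"),
]
shares = family_shares(sample)
```

Because the denominator is the number of tracked models in a year, adding or removing a single tracked model shifts every bar for that year, which is why the shares are directional rather than a census.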

How This Is Built

Rules we hold ourselves to.

What's tracked: notable large language models and the architectures competing to be their backbone - sorted into transformer, hybrid, or alternative. This is not a catalog of all neural networks: vision-focused families such as CNNs, GANs and image diffusion models sit outside the "is attention all you need for LLMs?" question. The one deliberate exception is JEPA - not a language model at all, but included as the contrarian bet on where frontier architecture goes next.

Confirmed means the architecture is stated in an official technical report or paper. Inferred means strong community consensus but not officially disclosed. Unknown means genuinely undisclosed - we don't guess.

Closed models stay honest. GPT, Gemini, Claude and Grok internals aren't fully public, so they're marked inferred rather than asserted as fact - with no fabricated source attached.

Every entry is verified against primary sources before publishing. No source, no publish. Where a lab calls a model "hybrid" to mean dual inference modes (Claude, Command A) rather than a hybrid architecture, it stays classified by its actual mechanism.
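One way to read the rules above as a data check is sketched below. The `Status`, `Entry` and `can_publish` names are hypothetical, and the exact gate (confirmed needs a source, inferred needs a stated family, unknown carries no guess) is one interpretation of the rules, not the site's actual pipeline.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    CONFIRMED = "confirmed"  # stated in an official technical report or paper
    INFERRED = "inferred"    # strong community consensus, not officially disclosed
    UNKNOWN = "unknown"      # genuinely undisclosed


@dataclass(frozen=True)
class Entry:
    model: str
    family: Optional[str]         # None when the architecture is undisclosed
    status: Status
    source: Optional[str] = None  # reference to an official report or paper


def can_publish(entry: Entry) -> bool:
    """Assumed publish gate, following one reading of the rules above."""
    if entry.status is Status.CONFIRMED:
        # "No source, no publish": a confirmed claim must cite its source.
        return bool(entry.source) and entry.family is not None
    if entry.status is Status.INFERRED:
        # Inferred entries state a family but are never promoted to fact.
        return entry.family is not None
    # Unknown means no guess is recorded at all.
    return entry.family is None
```

For example, an entry marked `Status.UNKNOWN` that still carries a family label would be rejected by this check, matching the "we don't guess" rule.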

Open weights are verified at the source. Open-weight rows carry a Live badge - their architecture is read straight from each model's official config.json on Hugging Face during a periodic pull, then published with the site. Gated and closed models stay human-tracked against their official reports.
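A minimal sketch of the config-reading step, assuming the family is derived from the `model_type` field that Hugging Face config.json files carry. The mapping table and the `classify_config` helper are illustrative assumptions, not the site's actual rules, and the sample config is a hand-written stand-in rather than a downloaded file.

```python
import json

# Illustrative mapping from config.json "model_type" values to chart families.
# These three rows are assumptions for the sketch, not the site's table.
FAMILY_BY_MODEL_TYPE = {
    "llama": "transformer",
    "jamba": "hybrid",
    "mamba": "alternative",
}


def classify_config(config_text: str) -> str:
    """Map the text of a config.json to a family, or "unknown" if unmapped."""
    config = json.loads(config_text)
    model_type = config.get("model_type")
    # Unmapped or missing model types fall back to "unknown" instead of a guess.
    return FAMILY_BY_MODEL_TYPE.get(model_type, "unknown")


# Hand-written stand-in for a downloaded config.json.
sample_config = '{"model_type": "jamba", "architectures": ["JambaForCausalLM"]}'
family = classify_config(sample_config)
```

In a real periodic pull the config text would come from the model's repository instead of a literal string. Falling back to "unknown" keeps a new or unrecognized `model_type` from being silently filed under an existing family.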

