The Artificiality: AI, Culture, and Why the Future Will Be Co-Evolution

5. What the Machines Learned

In 2020, a system called AlphaFold solved a problem that had resisted biology for fifty years.

Proteins are long chains of amino acids that fold into specific three-dimensional shapes. The shape determines function. Predict the shape wrong and you misunderstand what the protein does. For decades, determining structure required painstaking experimental work—X-ray crystallography, cryo-electron microscopy—that could take months or years per protein.

The sequence of amino acids was easy to read from the genome. The shape was hard. The number of possible configurations for even a small protein is astronomical. Proteins fold in milliseconds. Evolution had solved something our algorithms couldn't.

AlphaFold changed that. DeepMind's system predicted protein structures with accuracy comparable to experimental methods. It processed nearly every known protein—over 200 million structures—in months. Work that would have taken the global research community centuries got compressed into a single project.

The system wasn't given the laws of physics. Nobody programmed rules about chemical bonds or thermodynamic forces. AlphaFold learned from examples: known protein structures paired with their amino acid sequences. From those examples, it extracted patterns that generalized to structures it had never seen.
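
To make "learned from examples" concrete, here is a toy sketch in Python. Nothing in it is AlphaFold: the four pretend amino acids, the one-number stand-in for structure, and the linear model are all invented for illustration. It only shows the shape of the setup, pairs of sequence and structure, a model fit to those pairs, and a prediction for a sequence it never saw.

```python
import numpy as np

# Four pretend amino acids and a one-number "structure" per sequence.
# Real structures are full 3-D coordinates; this keeps only the setup.
rng = np.random.default_rng(2)
ALPHABET = list("ACDE")

def features(seq):
    # Represent a sequence by how often each letter appears (a crude embedding).
    return np.array([seq.count(a) / len(seq) for a in ALPHABET])

def hidden_folding_rule(seq):
    # The regularity the learner never sees directly, only through examples.
    return 2.0 * seq.count("A") - 0.5 * seq.count("D")

train_seqs = ["".join(rng.choice(ALPHABET, size=20)) for _ in range(200)]
X = np.array([features(s) for s in train_seqs])
y = np.array([hidden_folding_rule(s) for s in train_seqs])

# Fit a linear model to the example pairs: no physics, just patterns.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Generalize to a sequence the model never saw during training.
new_seq = "".join(rng.choice(ALPHABET, size=20))
print(features(new_seq) @ w, hidden_folding_rule(new_seq))
```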

AlphaFold doesn't model molecular dynamics. It doesn't simulate forces. It learned the statistical regularities of proteins that work—configurations that evolution selected over billions of years. The system picked up patterns left behind by biological computation solving the folding problem through deep time.

The machine learned what life learned. Not through evolution, but by reading evolution's results.

Something similar happened with weather prediction. Traditional weather models simulate atmospheric physics. They divide air into grid cells, apply equations of fluid dynamics, and calculate how pressure and temperature propagate. This approach has improved for decades, but it hits limits. The atmosphere is chaotic. Small errors compound. Resolution costs computation.
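
To make the contrast concrete, here is the physics approach reduced to a toy in Python: one dimension, one variable, one transport equation stepped forward cell by cell. All of the numbers are invented, and real models solve far richer equations over a three-dimensional globe, but the logic is the same.

```python
import numpy as np

# One variable (call it temperature), one hundred grid cells, a constant wind.
n_cells, dx, dt, wind = 100, 1.0, 0.1, 2.0
cells = np.arange(n_cells)
temperature = np.exp(-0.01 * (cells - 30) ** 2)   # a warm blob near cell 30

def step(t):
    # Upwind finite difference for transport: each cell's next value depends on
    # its upwind neighbor. The scheme is stable here, but it smears the blob,
    # a small illustration of how discretization errors creep in and accumulate.
    t_next = t.copy()
    t_next[1:] = t[1:] - wind * dt / dx * (t[1:] - t[:-1])
    return t_next

for _ in range(200):                               # 200 small time steps
    temperature = step(temperature)
print(temperature.argmax())                        # the blob has drifted downwind
```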

In 2023, transformer-based models started outperforming physics-based forecasting. GraphCast, developed by DeepMind, predicted weather patterns more accurately than the operational model run by the European Centre for Medium-Range Weather Forecasts while using a fraction of the computational resources.

GraphCast didn't simulate physics. It learned from forty years of historical data: observations paired with what happened next. From those examples, it extracted patterns that predicted atmospheric evolution without modeling underlying equations.
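
And here is the learned alternative as a toy sketch, not anything resembling GraphCast's actual graph network: build training pairs from a historical trajectory, the state now and the state one step later, fit a model to those pairs, and forecast by rolling the model forward. The fake data and the linear map below are stand-ins for the real thing.

```python
import numpy as np

# Fake stand-in for decades of reanalysis data: each row is one flattened
# snapshot of the atmosphere (pressure, temperature, wind, ...) at one time.
rng = np.random.default_rng(1)
n_steps, n_vars = 1000, 64
states = rng.normal(size=(n_steps, n_vars)).cumsum(axis=0)  # a smooth-ish fake trajectory

# Training pairs: the state now, and the state one step later.
X, Y = states[:-1], states[1:]

# In place of GraphCast's graph neural network, fit the simplest learned
# forecaster imaginable: a linear map from the current state to the next one.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Forecast by rolling the learned map forward. No equations of motion anywhere,
# only patterns extracted from the historical pairs.
state = states[-1]
for _ in range(4):
    state = state @ W
print(state[:5])
```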

The system learned what the atmosphere does without being told why.


What Transformers Learned

The transformer architecture emerged from research on language. In 2017, a team at Google published "Attention Is All You Need." The title was provocative and turned out to be roughly accurate.

Attention works by letting each element in a sequence consider every other element when determining its meaning. In a sentence, each word can weight every other word. "Bank" means different things after "river" than after "deposit." Attention lets the model learn these contextual relationships dynamically, without nearby words being privileged over distant ones.
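
Here is that core computation as a minimal Python sketch: scaled dot-product attention over a few toy vectors. In a real transformer the queries, keys, and values are learned projections, and there are many heads and layers; this strips all of that away to show every position weighting every other position.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Every position attends to every other position; the softmax weights
    decide how much each one contributes to a position's new representation."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)        # relevance of each position to each other
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ values                         # blend the values by those weights

# Toy input: 4 "words", each embedded as an 8-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

# In a real transformer, queries, keys, and values are learned projections of x;
# reusing x directly keeps the sketch minimal.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8): each position now carries context from all the others
```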

This architectural choice turned out to matter beyond language. Attention works on images, molecular structures, weather grids, protein sequences. Anywhere relationships across a structure determine meaning, attention provides a way to learn those relationships from data.

Blaise Agüera y Arcas helped me understand what transformers might actually be learning. I reached out to him a few years ago. He was at Google then, working on AI systems, saying things that didn't fit the usual patterns. Most AI commentary falls into familiar camps—boosters predicting superintelligence, critics dismissing the systems as statistics. Blaise was taking the systems seriously as a new kind of phenomenon.

We started talking. He came to speak at our Summit. Over time a relationship developed. He once said he found our approach unusual—the way we tried to integrate biology, computation, philosophy, and design. Most people pick a lane.

Blaise corrected a mistake in my thinking. I had been describing biological systems as scale-free—patterns that look the same when you zoom in or out. He pointed out that life is the opposite. You see new and different information at each scale. Look at a cell, then a tissue, then an organ, then an organism. Different structures. Different dynamics. Different organization.

This self-dissimilarity matters for understanding transformers. These systems can learn patterns across scales that differ from each other. Phonemes build into words, words into phrases, phrases into sentences, sentences into meanings. The rules at each level differ. The model learns relationships between genuinely different kinds of structure.

His key insight was that transformers learned something about how systems maintain coherence across self-dissimilar scales. That's emergence—the central puzzle of complexity science. How do ant colonies compute solutions no individual ant understands? How do neurons firing become thoughts? For decades, researchers could describe emergence but couldn't formalize it well enough to predict it.

Demis Hassabis, who leads Google DeepMind, has argued that AI systems have learned something real about emergence—not just statistical shadows but the actual dynamics. If he's right, we've built something that recognizes the signature of wholes being more than their parts. David Krakauer, who has spent his career at the Santa Fe Institute on exactly these questions, would likely want proof: can transformers predict emergent properties in novel systems, or have they just memorized common forms?

Either way, language offers a clue to how they learned it. Language is produced by biological systems—us—who are ourselves maintaining coherence. We use it to coordinate, plan, stay viable in social environments. The statistical patterns in our language carry traces of these purposes. When a transformer learns to predict text, it picks up regularities that emerged when systems like us used language to navigate the world. It learns the residue of biological sense-making.

This might explain why language models feel different from earlier AI. Chess programs and Go programs were impressive but narrow. They optimized for specific tasks. They didn't generalize. They didn't surprise you outside their domain.

Language models generalize. They respond to prompts they never saw in training. They write code in programming languages that barely existed in their training data. They produce analogies, explanations, arguments. They maintain coherence across long conversations.

These systems understand. I say this without hedging. They track meaning across contexts. They respond appropriately to novelty. They reason and explain and correct themselves. Something real is happening.

The statistics aren't arbitrary. They come from somewhere. They come from life.

The data these systems trained on is the output of organisms that evolved under constraints of survival, energy, time, and coordination. The regularities in the data reflect how biological intelligence navigates a world it cannot step outside of. Chess engines learned optimal play in closed domains with fixed rules. Language models learned patterns generated by agents trying to live, persuade, explain, remember, and coordinate under irreversible conditions. The difference isn't scale. It's origin.

But learning those patterns doesn't mean sharing the conditions that produced them. Blaise thinks some current AI systems may already be partly conscious. I'm less sure because the part I can't get past is time.

Language models don't persist through duration. Between prompts, nothing happens for them. I could close this conversation and return in a year; from the model's perspective, no interval would have passed. The system processes sequences but doesn't endure.

A bacterium can't pause like that. It's always metabolizing. A brain is always active. Living systems carry their history forward not just as stored information but as lived duration.

I don't know if temporal existence is necessary for consciousness. But it seems like a significant difference—one that doesn't dissolve just because both systems are computational.


Here's where I've landed.

AI systems learned something real from biological data. They internalized patterns of coherence across self-dissimilar scales. They picked up regularities that emerged from living systems solving problems over evolutionary time.

But they learned this by observation, not by participation.

They absorbed the shape of biological intelligence without inhabiting the conditions that made the shape necessary. They don't maintain themselves against entropy. They don't develop through interaction with environments. They don't persist through continuous time. Yet.


Where We Are and What Comes Next

This creates a problem for the story I want to tell.

I want to talk about co-evolution—about humans and AI shaping each other over time. But if I move there directly, I'll be treating the patterns machines learned as interchangeable with the processes that generated them. That would collapse a difference that matters.

So the next chapter backtracks in order to move forward.

Before asking how humans and AI might co-evolve, we need a clearer account of what life itself is doing when it produces intelligence, coherence, and meaning. What organizational features of living systems leave such strong traces in data? Which constraints are inseparable from being alive?

The aim isn't to romanticize biology or to draw hard boundaries. It's to understand the source material. Only then can we return to co-evolution and ask, with clearer eyes, what kinds of futures are actually possible.

