Preface

This book began with a question I could not answer.

A student in my class asked, simply: “Why does the transformer architecture work?” I gave the standard technical answer—self-attention, positional encodings, layer normalization—and it was correct. But it was not satisfying, not to the student and not to me. The question was not really about the mathematics. It was about the reasoning: why this architecture, why now, and why did it take so long to arrive at something that, in retrospect, seems almost inevitable?

To answer that question honestly, I realized, you have to go back. Not just to the recurrent networks that preceded transformers, or the convolutional networks that preceded those, but much further—to the moment when human beings first imagined that thought itself might be mechanized. You have to understand why Babbage’s gears mattered, why Turing’s abstract machine changed everything, why Shannon’s bits were revolutionary. You have to trace the rivalries and alliances, the dead ends and rediscoveries, the winters and the summers. Only then does the transformer make sense—not as a sudden breakthrough, but as the latest answer to questions that have been asked for nearly two centuries.

That is what this book attempts to do. It tells the story of artificial intelligence from its earliest mechanical precursors to the large language models that have reshaped our world, with both narrative depth and technical rigor. We follow the ideas chronologically, because the order in which they emerged matters. The problems that Minsky and Papert identified in 1969 shaped the solutions that Rumelhart, Hinton, and Williams published in 1986. The limitations that plagued recurrent networks in the 1990s motivated the attention mechanisms of the 2010s. History is not decoration; it is explanation.

I write for readers who want to understand these ideas, not just use them. Graduate students and advanced undergraduates in computer science, data science, and related fields will find here the conceptual foundations that textbooks often skip. Researchers looking to deepen their historical understanding will find primary sources and annotated references. And technically curious readers from other fields—physics, philosophy, neuroscience, the humanities—will find a narrative that assumes mathematical maturity but introduces every concept as it arises.

The story has no ending. The field of AI is moving faster today than at any point in its history, and no book can capture where it will be even a year from now. What this book can offer is something more durable: the intellectual context to understand whatever comes next.

I hope this book serves you well on that journey.