Chapter 7: Alternative Paths

Central Question: What other approaches to AI existed?


While the symbolic edifice rises in the main laboratories of AI, while expert systems capture corporate imagination and knowledge engineers codify human expertise in logical rules, something else stirs at the margins. Not everyone believes that intelligence reduces to symbol manipulation. Not everyone accepts that the path to machine minds runs through formal logic and explicit representation.

This chapter explores the roads not taken, or rather, the roads less traveled. These alternative approaches share a common intuition: perhaps the symbolic researchers have the problem fundamentally backwards. Perhaps intelligence does not flow from abstract reasoning downward to the physical world, but rather emerges from the physical world upward into something we call thought. Perhaps the key is not to encode knowledge but to let systems discover it. Perhaps certainty is the wrong goal entirely, and we should embrace uncertainty as the true texture of intelligent reasoning.

These “alternative” paths, as we call them, are not mere historical curiosities. Each contains insights that will prove essential when the machine learning revolution arrives. The situated roboticists teach us that intelligence is embodied and contextual. The evolutionary computation researchers demonstrate that optimization can occur without explicit design. The probabilistic reasoners show that uncertainty is not an obstacle to overcome but a framework for coherent thought.

The mainstream did not ignore these approaches, but neither did it embrace them as central. They flourished in their own communities, developed their own conferences and journals, and waited for their moment. That moment, for some of these ideas, is still arriving.


7.1 Situated and Embodied AI

The year is 1986, and Rodney Brooks is writing a paper that will provoke outrage among the symbolic AI establishment. Brooks, an Australian computer scientist who has recently joined the MIT faculty, has spent years building robots. Not the robots of science fiction, not humanoid reasoners with encyclopedic knowledge, but small, insect-like creatures that scuttle across laboratory floors. From this humble work, he has drawn radical conclusions.

The paper, eventually published in 1991 as “Intelligence Without Representation,” contains a claim that strikes at the heart of the GOFAI research program. Brooks argues that representation, the central concept of symbolic AI, is not only unnecessary for intelligence but may actually impede it. The physical symbol system hypothesis, that intelligence requires manipulating internal symbols that represent the world, has it backwards. Intelligence, Brooks contends, emerges from direct interaction with the environment, not from reasoning about abstract models.

To understand Brooks’ argument, we must first understand what he was reacting against. The symbolic AI of the 1970s and 1980s operated according to a specific paradigm. To build an intelligent system, one first constructs a detailed internal model of the world, what researchers called a world model or knowledge base. The system reasons about this internal model, planning actions, predicting consequences, updating beliefs. Only then does it act in the physical world, and its actions are guided by conclusions derived from the internal representation.

Consider a robot tasked with navigating a room. In the classical approach, the robot would first construct a map: walls here, furniture there, goal location over there. It would then plan a path through this abstract map, reasoning about obstacle avoidance and optimal routes. Finally, it would execute the plan, translating abstract path segments into motor commands. The heavy computational lifting happens in the representation and reasoning phases; execution is almost an afterthought.

Brooks observes that this approach faces severe practical problems. Building accurate world models is extraordinarily difficult. The world is complex, dynamic, and unpredictable. Sensors are noisy; actuators are imprecise. By the time your robot has finished constructing its elaborate internal model, the world has changed. The cat has moved. Someone has closed a door. The beautiful plan, derived through careful symbolic reasoning, collides with a reality that refuses to hold still.

But Brooks’ critique goes deeper than engineering practicality. He questions whether biological intelligence works this way at all. Insects navigate complex environments with remarkable skill despite having nervous systems far too small to maintain elaborate world models. A fly avoids swatters, finds food, and navigates intricate three-dimensional spaces with a brain containing roughly 100,000 neurons. It does this in real time, with millisecond response latencies. Whatever the fly is doing, it is not constructing symbolic representations and reasoning about them.

Brooks proposes an alternative architecture he calls subsumption. The idea is to build intelligence from the bottom up, as layers of simple behaviors that interact with the world directly. Each layer is a complete behavior system in itself, capable of operating autonomously. Higher layers can subsume lower layers, modifying or suppressing their behavior, but they do not replace them.

Consider a simple mobile robot built on subsumption principles. The lowest layer implements obstacle avoidance: sensors detect nearby objects, and the robot turns away. This layer is always active, always protecting the robot from collisions. A second layer implements wandering: the robot moves in a generally forward direction, exploring its environment. This layer operates unless the obstacle avoidance layer intervenes. A third layer might implement goal-seeking: if a target is detected, move toward it. This layer can override wandering but not obstacle avoidance.

The crucial point is that no layer contains a world model. No layer reasons about maps or plans. Each layer simply couples sensors to actuators through relatively simple mechanisms. The robot’s intelligent-seeming behavior emerges from the interaction of these layers with the environment, not from explicit reasoning about an internal representation.
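To make the layering concrete, here is a minimal sketch in Python. It collapses Brooks’ suppression and inhibition wiring into a simple fixed-priority dispatch, and the sensor names and thresholds are invented for illustration, not taken from any of Brooks’ robots:

```python
import random

def avoid_obstacles(sensors):
    """Lowest layer, always protective: veer away from anything too close."""
    if sensors["obstacle_distance"] < 0.3:          # threshold is illustrative
        return {"turn": 1.0, "speed": 0.0}
    return None                                     # no opinion; defer to other layers

def seek_goal(sensors):
    """Goal-seeking layer: steer toward a target whenever one is visible."""
    if sensors["goal_bearing"] is not None:
        return {"turn": 0.5 * sensors["goal_bearing"], "speed": 0.5}
    return None

def wander(sensors):
    """Default layer: drift forward with small random heading changes."""
    return {"turn": random.uniform(-0.2, 0.2), "speed": 0.5}

# Arbitration order follows the text: obstacle avoidance always wins,
# goal-seeking overrides wandering, wandering is the default. No layer
# consults a map or a plan; each simply couples sensors to actuators.
LAYERS = [avoid_obstacles, seek_goal, wander]

def control_step(sensors):
    for layer in LAYERS:
        command = layer(sensors)
        if command is not None:
            return command

print(control_step({"obstacle_distance": 0.2, "goal_bearing": None}))  # avoidance wins
print(control_step({"obstacle_distance": 2.0, "goal_bearing": -0.4}))  # goal-seeking wins
```

Any intelligent-looking behavior, reaching a target while skirting obstacles, would emerge from these couplings running against a real environment, not from a stored plan.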

Brooks and his students build a succession of robots embodying these principles. Allen, Toto, Herbert, Genghis: the names become famous in robotics circles. These are not glamorous machines. They are small, often cobbled together from spare parts, and their capabilities seem modest compared to the grandiose visions of classical AI. Herbert wanders around the lab collecting empty soda cans. Genghis walks on six legs, navigating terrain.

But observe what these robots achieve with almost no classical reasoning. They operate in real time, responding to the world as it actually is rather than as a model says it should be. They are robust to sensor noise and environmental change. They are computationally cheap. And they exhibit what Brooks calls emergent behavior: complex patterns arise from the interaction of simple components, patterns that no single layer explicitly encodes.

Brooks crystallizes his philosophy in a memorable slogan: “The world is its own best model.” Why build elaborate internal representations when the world itself is right there? Rather than predicting what lies around the corner, why not just look? Rather than maintaining a mental catalog of object locations, why not simply perceive them when needed?

This idea, sometimes called situated cognition, connects to broader movements in philosophy and cognitive science. The phenomenologist Maurice Merleau-Ponty had argued decades earlier that cognition is fundamentally embodied, inseparable from our physical existence in the world. The psychologist James Gibson proposed that perception is not a matter of constructing internal representations but of directly picking up affordances, opportunities for action that the environment offers.

Brooks brings these ideas into AI with engineering concreteness. His robots are existence proofs. They demonstrate that you can build systems that act intelligently in the world without anything resembling classical symbolic reasoning. The intelligence is in the coupling between agent and environment, not in an abstract model sitting in computer memory.

The implications for AI are significant. If Brooks is right, then the entire symbolic enterprise has been attacking the wrong problem. Instead of encoding common sense in logical rules, perhaps we should be building systems that acquire common sense through interaction with the physical world. Instead of giving robots explicit world knowledge, perhaps we should give them bodies and let them learn what they need.

The reaction from the symbolic AI community ranges from dismissive to hostile. Critics point out that Brooks’ robots, for all their real-time competence, cannot reason about the future, cannot communicate in language, cannot solve the problems that classical AI addresses. Picking up soda cans is not intelligence. True enough, Brooks might reply, but proving theorems and playing chess may not be either, at least not in the biologically relevant sense.

The debate exposes a fault line in AI that persists to this day. What counts as intelligence? Is chess-playing more intelligent than insect navigation? Is language more important than embodied skill? Different answers lead to different research programs.

Brooks’ most lasting legacy may be practical rather than philosophical. The subsumption architecture and its descendants lead to a new field of behavior-based robotics. This field emphasizes real-time performance, robustness, and physical embodiment over abstract reasoning. Its intellectual descendants include the Roomba vacuum cleaner, perhaps the most commercially successful robot in history. The Roomba does not construct an elaborate map of your home. It bounces around, detects obstacles, and gradually covers the floor through simple behavioral rules. Brooks himself co-founded iRobot, the company that builds it.

The broader insight endures. Maybe intelligence is not about having the right data structures in your head. Maybe it is about being embedded in a world, acting and sensing and acting again, developing competence through continuous interaction. This insight, that intelligence may be fundamentally grounded in embodied experience, resurfaces when we consider how large language models work. These systems have no bodies, no sensors, no actuators. Yet they exhibit remarkable capabilities. Does this refute Brooks, or does it pose a puzzle that embodiment might yet help solve?


7.2 Evolutionary Computation

While Brooks is building scuttling robots at MIT, a different alternative path develops in the world of optimization. This path takes its inspiration not from insects but from Darwin, not from embodiment but from evolution.

The central figure is John Holland, a computer scientist at the University of Michigan who has been thinking about adaptive systems since the 1960s. Holland’s 1975 book, Adaptation in Natural and Artificial Systems, lays the theoretical foundation for what becomes known as genetic algorithms. The core idea is simple and powerful: evolution solved hard problems in biology; perhaps we can simulate evolution to solve hard problems in computation.

Consider the challenge evolution faces. An organism must survive in its environment long enough to reproduce. Survival requires a vast array of capabilities: finding food, avoiding predators, navigating terrain, attracting mates. These capabilities are encoded in the organism’s genome, a string of genetic information. The genome is not designed by any engineer. It emerges through a process of variation and selection: random mutations produce variety, and differential survival filters that variety, preserving what works and discarding what does not.

Over millions of years, this blind process produces astonishing solutions. The eye evolves independently dozens of times. Echolocation emerges in bats and dolphins. Birds develop flight, and so do insects, and so do bats, each through different mechanisms. Evolution is, in a sense, a universal optimizer. It finds solutions to the problem of survival, whatever that problem turns out to require.

Holland’s insight is that we can abstract and simulate this process. We do not need actual organisms. We need only the essential ingredients: a population of candidate solutions, a way to evaluate how good each solution is, and mechanisms for creating new candidates from existing ones. The solutions are not DNA sequences but strings of symbols, typically bits. The evaluation is not survival but a fitness function that scores each candidate. The variation mechanisms are crossover, combining pieces of different solutions, and mutation, making random changes.

Technical Box: Basic Genetic Algorithm

A genetic algorithm operates in generations. We begin with a population of candidate solutions, each represented as a string of genes. For concreteness, imagine we are trying to optimize a function of many binary variables; each candidate is a bit string of some fixed length.

1. Initialize: Create a random population of N candidate solutions.
2. Evaluate: Compute the fitness of each candidate using the fitness function.
3. Select: Choose candidates to reproduce. Higher-fitness candidates are more likely to be selected.
4. Crossover: Pair selected candidates and combine them to produce offspring. A common method is single-point crossover: choose a random position and take the first part from one parent and the second part from the other.
5. Mutate: With some small probability, flip random bits in the offspring.
6. Replace: The offspring become the new population.
7. Repeat: Go to step 2 until some termination criterion is met (fixed number of generations, satisfactory solution found, etc.).

Consider a concrete example. Suppose we want to find a bit string of length 20 that maximizes the number of 1s. The fitness function simply counts the ones. We initialize a population of, say, 100 random strings. Most will have roughly 10 ones each. We select parents with probability proportional to their fitness, so strings with 12 or 13 ones are more likely to reproduce than strings with 7 or 8. Parents recombine through crossover, and occasional mutations flip bits.

Over generations, the average fitness of the population rises. Strings with more ones dominate, and their good genes spread through crossover. The occasional mutation can discover improvements that no current member possesses. After some number of generations, the population converges on all-ones or very nearly so.
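A minimal Python sketch of the run just described follows. The per-bit mutation rate, random seed, and generation count are illustrative choices rather than values prescribed by the algorithm:

```python
import random

LENGTH, POP_SIZE, GENERATIONS = 20, 100, 60
P_MUT = 0.01  # assumed per-bit mutation rate

def fitness(bits):
    return sum(bits)  # one-max: simply count the 1s

def select(population):
    # Roulette-wheel selection: probability proportional to fitness.
    weights = [fitness(b) for b in population]
    return random.choices(population, weights=weights, k=1)[0]

def crossover(a, b):
    point = random.randrange(1, LENGTH)  # single-point crossover
    return a[:point] + b[point:]

def mutate(bits):
    return [b ^ 1 if random.random() < P_MUT else b for b in bits]

random.seed(0)
population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(f"best fitness: {fitness(best)}/{LENGTH}")
```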

This toy example understates the power of the approach. Genetic algorithms have been applied to problems where traditional optimization struggles. They have designed aircraft wings, scheduled complex manufacturing operations, evolved neural network architectures, and found approximate solutions to famously difficult combinatorial problems.


What makes genetic algorithms interesting is not just that they work but how they work. There is no gradient to follow, no explicit search direction. The algorithm does not know what a good solution looks like. It knows only how to evaluate candidates and how to create new candidates from old ones. Yet it finds its way toward optima in vast search spaces.

Holland’s theoretical contribution is the schema theorem, which provides intuition for why genetic algorithms succeed. A schema is a pattern that matches multiple bit strings. For example, the schema 1**0* matches all strings that start with 1, have 0 in the fourth position, and can have anything elsewhere. The schema theorem says, roughly, that short, low-order schemata with above-average fitness will spread through the population at an exponential rate. Good building blocks, once discovered, tend to persist and recombine with other good building blocks.
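For readers who want the formal statement, a common form of the theorem (a sketch; notation varies across presentations) is

\[E\left[m(H, t+1)\right] \;\geq\; m(H, t)\,\frac{f(H)}{\bar{f}}\left[1 - p_c\,\frac{\delta(H)}{l-1} - o(H)\,p_m\right]\]

where \(m(H, t)\) is the number of instances of schema \(H\) in generation \(t\), \(f(H)\) is their average fitness, \(\bar{f}\) is the population average fitness, \(\delta(H)\) and \(o(H)\) are the schema’s defining length and order, \(l\) is the string length, and \(p_c\) and \(p_m\) are the crossover and mutation rates.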

The biological metaphor runs deep. Just as evolution builds complex organs from simpler adaptations, genetic algorithms build solutions by combining partial solutions. The population maintains diversity, exploration of the search space, while selection ensures that good ideas are exploited. The tension between exploration and exploitation, a theme that recurs throughout machine learning, is elegantly balanced by the genetic algorithm’s structure.

Throughout the 1980s and 1990s, evolutionary computation branches into several related fields. Genetic programming, pioneered by John Koza, evolves computer programs rather than bit strings. Instead of strings of bits, populations contain tree structures representing executable code. Crossover and mutation operate on these trees, and the fitness function evaluates the programs on test cases. Evolution literally writes code.

Evolution strategies, developed primarily in Germany, focus on optimizing real-valued parameters rather than discrete strings. Evolutionary programming emphasizes mutation over crossover. Neuroevolution applies evolutionary techniques to discover neural network architectures and weights. NEAT, a neuroevolution algorithm developed by Kenneth Stanley, will later demonstrate that evolution can discover surprisingly effective network topologies.

The limitations of evolutionary approaches become clear with experience. Genetic algorithms are slow compared to gradient-based optimization: each fitness evaluation may be computationally expensive, and many generations may be required. Fitness evaluation is embarrassingly parallel, since every candidate in a generation can be scored simultaneously, but even with parallelism, convergence can take time.

More fundamentally, genetic algorithms struggle with highly structured problems. If the solution must satisfy complex constraints, random variation is unlikely to produce valid candidates. If the fitness landscape is deceptive, with local optima that do not lead toward global optima, the algorithm may get stuck. For certain problem classes, specialized algorithms vastly outperform evolutionary approaches.

Yet the power of evolution-inspired methods cannot be denied. When we do not know how to design a solution, when the problem is too complex for analytical approaches, when we can evaluate solutions even if we cannot construct them, evolution offers a path forward. It is a method of last resort and sometimes of first resort. The idea that optimization can proceed without understanding, that good solutions can emerge from blind variation and selection, remains one of the deepest insights of the alternative AI traditions.


7.3 Probabilistic Reasoning

The third alternative path addresses a limitation that has plagued symbolic AI from the beginning: the problem of uncertainty.

Classical logic, the foundation of symbolic AI, deals in certainties. A proposition is true or false. An inference is valid or invalid. This brittleness serves mathematics well but maps poorly onto the real world. Real sensors are noisy. Real knowledge is incomplete. Real decisions must be made without full information. A medical diagnosis system that can reason only about certain facts will be useless in practice, where symptoms are ambiguous, tests are imperfect, and patients’ memories are unreliable.

The symbolic AI community recognizes this problem and attempts various solutions. Fuzzy logic allows degrees of truth. Non-monotonic reasoning allows conclusions to be retracted when new information arrives. But these approaches feel like patches on a fundamentally unsuitable foundation. What AI needs is a principled framework for reasoning under uncertainty from the ground up.

That framework comes from an unexpected direction: probability theory and, specifically, the work of Judea Pearl.

Pearl, trained as an electrical engineer and working on machine perception, becomes frustrated with the ad hoc uncertainty handling in existing AI systems. He sees that probability theory, a well-developed mathematical framework with centuries of intellectual history, provides exactly what is needed. His 1988 book, Probabilistic Reasoning in Intelligent Systems, introduces Bayesian networks, a representation that combines the expressiveness of graphs with the rigor of probability.

A Bayesian network is a directed acyclic graph. Each node represents a random variable, something whose value is uncertain. Edges represent direct probabilistic dependencies. If node A has an edge to node B, then A directly influences B’s probability distribution. The absence of an edge indicates conditional independence: given the values of its parents, a node is independent of its non-descendants.

Consider a simple medical example. A patient may or may not have a disease. The disease may or may not cause a positive test result; tests are imperfect. The disease may or may not cause symptoms; symptoms also have other causes. We can represent this situation as a Bayesian network: Disease is a parent of Test and Symptom. The network encodes the prior probability of disease and the conditional probabilities of test results and symptoms given disease status.

Now suppose we observe a positive test. What is the probability of disease? This is Bayes’ theorem in action:

\[P(\text{Disease} | \text{Positive Test}) = \frac{P(\text{Positive Test} | \text{Disease}) \cdot P(\text{Disease})}{P(\text{Positive Test})}\]

We update our prior belief in disease using the likelihood of the evidence. If the test is highly accurate and the prior probability of disease is not too low, a positive result substantially increases our belief. If the test often gives false positives, the update is smaller.
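A quick calculation shows the theorem at work. The numbers below are hypothetical, chosen only to illustrate the update:

```python
# Hypothetical numbers, for illustration only.
p_disease = 0.01                 # prior: 1% base rate
sensitivity = 0.90               # P(positive | disease)
false_positive_rate = 0.09       # P(positive | no disease)

# Denominator of Bayes' theorem: total probability of a positive test.
p_positive = sensitivity * p_disease + false_positive_rate * (1 - p_disease)

posterior = sensitivity * p_disease / p_positive
print(f"P(disease | positive) = {posterior:.3f}")  # about 0.092
```

Even a fairly accurate test, applied against a low base rate, leaves the posterior probability of disease under ten percent: a reminder of why priors matter.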

Bayesian networks extend this reasoning to complex, interconnected variables. The graphical structure makes the joint probability distribution tractable. Rather than specifying probabilities for every combination of variables, an exponentially large task, we specify only the conditional probabilities of each node given its parents. The graph structure tells us how to combine these local specifications into global inferences.
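In symbols, the graph licenses the factorization

\[P(X_1, \ldots, X_n) \;=\; \prod_{i=1}^{n} P\!\left(X_i \mid \mathrm{Pa}(X_i)\right)\]

where \(\mathrm{Pa}(X_i)\) denotes the parents of node \(X_i\). For the disease network above, the joint distribution over Disease, Test, and Symptom factors as \(P(D)\,P(T \mid D)\,P(S \mid D)\): five numbers suffice, rather than the seven needed to specify an arbitrary joint distribution over three binary variables.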

Pearl develops efficient inference algorithms for Bayesian networks. For tree-structured networks, belief propagation passes messages along edges, updating probabilities as evidence arrives. For more complex structures, approximations are necessary, but the framework provides principled guidance.

The impact on AI is significant. Expert systems, previously forced into awkward workarounds for uncertainty, can now represent uncertain knowledge naturally. Medical diagnosis, troubleshooting, sensor fusion: anywhere uncertainty matters, Bayesian networks offer a principled approach.

But Pearl’s most profound contribution comes later. In the 1990s and 2000s, he develops a mathematical theory of causation. Correlation, the statistician’s stock in trade, is not causation. Ice cream sales and drowning deaths are correlated; both increase in summer. But eating ice cream does not cause drowning. How can we reason about what would happen if we intervened, if we deliberately changed something, rather than merely observed it?

Pearl introduces the do-operator to distinguish intervention from observation. The probability of drowning given that we observe high ice cream sales is elevated. The probability of drowning given that we force people to eat ice cream, written do(eat ice cream), is unchanged. The intervention breaks the confounding influence of summer. Causal reasoning requires new machinery beyond standard probability theory.
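A small simulation makes the distinction concrete. The structural assumptions and probabilities below are invented for illustration: summer drives both ice cream consumption and drowning risk, while ice cream itself has no causal effect on drowning:

```python
import random

def trial(do_ice_cream=None):
    """One draw from a toy structural model in which summer is a confounder."""
    summer = random.random() < 0.5
    if do_ice_cream is None:
        eats = random.random() < (0.8 if summer else 0.2)   # observational regime
    else:
        eats = do_ice_cream   # do(ice cream): sever the summer -> ice cream edge
    drowns = random.random() < (0.02 if summer else 0.005)  # depends on summer only
    return eats, drowns

random.seed(0)
N = 200_000

observed = [trial() for _ in range(N)]
p_obs = sum(d for e, d in observed if e) / sum(e for e, _ in observed)

forced = [trial(do_ice_cream=True) for _ in range(N)]
p_do = sum(d for _, d in forced) / N

print(f"P(drowning | ice cream)     ~ {p_obs:.4f}")  # inflated by the summer confounder
print(f"P(drowning | do(ice cream)) ~ {p_do:.4f}")   # smaller: the true (null) effect
```

Conditioning on observed ice cream eating raises the drowning estimate because it is evidence of summer; forcing ice cream consumption does not, because the intervention cuts that evidential link.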

This work will prove prescient. Machine learning systems that merely capture correlations may perform poorly when the world changes, when they are deployed in conditions different from their training data. Causal models offer hope for more robust generalization. The question of whether neural networks can learn causal structure remains open and active.

Parallel to Bayesian networks, another probabilistic framework develops: Hidden Markov Models, or HMMs. An HMM is a state machine with two layers of uncertainty. The system has internal states that evolve over time, but we cannot observe these states directly. Instead, we observe outputs that depend probabilistically on the hidden state.

Consider speech recognition. The underlying words, the hidden states, generate acoustic signals, the observations. The same word sounds different when spoken by different people, in different contexts, at different speeds. An HMM models this: the hidden state sequence is the word sequence, and the observations are acoustic features. Given a sequence of observations, we infer the most likely hidden state sequence. Given examples of words and their acoustic realizations, we estimate the model parameters.

HMMs transform speech recognition. Through the 1980s and 1990s, HMM-based systems dominate the field. They are not perfect; the models make simplifying assumptions that real speech violates. But they are vastly better than previous approaches, and they scale. The techniques developed for HMMs, algorithms like the forward-backward procedure and the Viterbi algorithm, become standard tools in sequence modeling.
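To give a flavor of that machinery, here is a compact sketch of the Viterbi algorithm, which recovers the most likely hidden-state sequence given the observations. The two-state weather model at the end uses made-up probabilities purely as a usage example:

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most likely hidden-state sequence for a discrete HMM, in log space."""
    # best[s] = log-probability of the best path so far that ends in state s
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev_best, new_best, pointers = best, {}, {}
        for s in states:
            # Best predecessor state for a path ending in s at this step.
            p, prev = max((prev_best[q] + log_trans[q][s], q) for q in states)
            new_best[s] = p + log_emit[s][obs]
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace the winning path backwards from the most probable final state.
    state = max(best, key=best.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))

lp = math.log
states = ["rain", "sun"]
log_start = {"rain": lp(0.5), "sun": lp(0.5)}
log_trans = {"rain": {"rain": lp(0.7), "sun": lp(0.3)},
             "sun":  {"rain": lp(0.3), "sun": lp(0.7)}}
log_emit = {"rain": {"umbrella": lp(0.9), "walk": lp(0.1)},
            "sun":  {"umbrella": lp(0.2), "walk": lp(0.8)}}
print(viterbi(["umbrella", "umbrella", "walk"], states, log_start, log_trans, log_emit))
# -> ['rain', 'rain', 'sun']
```

Working in log space avoids numerical underflow on long sequences, which is why practical HMM implementations sum log-probabilities rather than multiply raw ones.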

The probabilistic turn has broader implications. It represents a shift from the certainty of logic to the uncertainty of statistics. This shift aligns with a growing recognition that learning from data requires statistical methods. You cannot prove theorems about noisy observations. You can estimate probabilities and update beliefs as evidence accumulates.

Pearl, reflecting on his career, emphasizes that Bayesian networks were not just a technical contribution but a philosophical one. They showed that probability and causality are not obstacles to AI but essential tools for AI. The world is uncertain, and intelligent systems must embrace that uncertainty, representing it explicitly and reasoning about it correctly.


Chapter Synthesis

We have explored three roads that diverged from the symbolic highway: embodied AI, evolutionary computation, and probabilistic reasoning. Each began as a minority view, skeptical of the dominant paradigm, and each contributed insights that would prove essential.

What unites these alternative paths? They share a common suspicion of hand-coded knowledge. Brooks’ robots do not have explicit world models programmed by engineers. Holland’s genetic algorithms do not have solution structures designed by human optimization experts. Pearl’s Bayesian networks, while they may start from expert-specified structures, emphasize learning parameters from data and, in later work, even learning structure from data.

All three paths point toward learning and adaptation rather than knowledge engineering. They suggest that intelligence is not a matter of having the right symbolic facts in a database but of developing appropriate responses through interaction, evolution, or inference.

This emphasis on learning sets the stage for Part III of our story. The machine learning revolution, which begins with the connectionist revival and continues through deep learning, will vindicate much of what the alternative approaches suspected. Intelligence, it turns out, can emerge from learning systems that discover their own representations, that adapt to data rather than being programmed with rules.

The embodied AI researchers are partially vindicated: modern robotics increasingly emphasizes learning over engineering. But their specific subsumption architecture has not become the dominant paradigm. The lesson about grounding intelligence in physical interaction, however, remains compelling and underexplored.

The evolutionary computation community thrives, particularly in optimization and in the intersection with neural networks. Neuroevolution discovers effective architectures. Evolution strategies optimize reinforcement learning policies. The core insight, that optimization can proceed without gradients, finds application wherever gradients are unavailable or unreliable.

The probabilistic reasoning revolution may be the most complete victory. Modern machine learning is fundamentally probabilistic. Bayesian methods suffuse the field. The graphical models Pearl pioneered evolve into variational autoencoders and other generative models. His work on causality inspires growing research programs on causal machine learning.

These alternative paths, then, are not dead ends but tributaries that eventually join the main river. They remind us that progress in science is rarely linear. Ideas that seem marginal in one era may become central in another. Researchers working in isolation or near-isolation may be developing frameworks that will prove essential when the conditions are right.

As we turn to Part III, we leave the era of classical AI behind. The symbolic systems, the expert systems, the knowledge engineering approach, these will continue to exist and to find applications. But the center of gravity shifts. Learning from data becomes the new paradigm. And the insights from Brooks, from Holland, from Pearl, insights about embodiment, evolution, and probability, will inform this new era in ways their originators might not have fully anticipated.

The winter that closes out Part II is ending. New growth stirs. Neural networks, once declared dead by Minsky and Papert, will return with new architectures and new algorithms. The alternative paths we have explored in this chapter will feed into this revival. The future of AI turns out to be pluralistic, drawing on multiple traditions, synthesizing what the twentieth century explored into what the twenty-first century will build.


Chapter Notes

Key Figures

  • Rodney Brooks (1954-)
  • John Holland (1929-2015)
  • Judea Pearl (1936-)
  • Kenneth Stanley (1976-)
  • John Koza (1943-)

Primary Sources to Reference

  • Brooks, R.A. (1991). “Intelligence Without Representation.” Artificial Intelligence, 47(1-3), 139-159.
  • Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.
  • Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
  • Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press.
  • Baum, L.E., Petrie, T., Soules, G., & Weiss, N. (1970). “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains.” The Annals of Mathematical Statistics, 41(1), 164-171.
