Part I — Intelligence in Nature

The Brain

Neurons, synapses, the architecture of the brain, and how biology implements learning.

The Neuron

Chapter 1 introduced the neuron as a cell specialized for fast, long-distance communication. Now it's worth understanding what's actually happening inside one, because the design of artificial neurons is a deliberate simplification of this biology — and knowing what was kept, what was dropped, and what was lost in translation matters for understanding both the power and the limits of neural networks.

A biological neuron has four functional parts:

  1. Dendrites (inputs): branches that receive signals from other neurons.
  2. Soma: the cell body, which integrates incoming signals.
  3. Axon (output cable): carries the signal away, often wrapped in a myelin sheath.
  4. Axon terminals: endpoints that form synapses onto the next neuron.

Signals travel in one direction: dendrites → soma → axon → terminals.

The electrical signal that travels along the axon is called an action potential. It works on an all-or-nothing principle: either the neuron fires at full strength, or it doesn't fire at all. There's no "half-fire." The neuron's resting voltage sits at about −70 millivolts. When incoming signals from dendrites push that voltage above a threshold (roughly −55 mV), voltage-gated ion channels open in sequence along the axon — sodium ions rush in, spiking the voltage to about +40 mV, then potassium ions rush out, resetting it. This cascade propagates down the axon like a wave.
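The all-or-nothing dynamic above can be sketched as a leaky integrate-and-fire model, the standard simplification in computational neuroscience. Only the −70 mV rest and −55 mV threshold come from the text; the leak rate and input currents are invented for readability:

```python
REST_MV = -70.0       # resting potential from the text
THRESHOLD_MV = -55.0  # firing threshold from the text

def simulate(input_current, steps=100, leak=0.1):
    """Integrate input each step; fire all-or-nothing when threshold is crossed."""
    v = REST_MV
    spikes = []
    for t in range(steps):
        # The leak pulls the voltage back toward rest; input pushes it up.
        v += input_current - leak * (v - REST_MV)
        if v >= THRESHOLD_MV:
            spikes.append(t)  # a full-strength spike: there is no "half-fire"
            v = REST_MV       # reset to rest after firing
    return spikes

weak = simulate(input_current=0.5)    # settles below threshold: no spikes
strong = simulate(input_current=2.0)  # crosses threshold and fires repeatedly
```

Notice that the weak input produces no output at all, not a smaller one: sub-threshold input simply never fires.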

Many axons are wrapped in a myelin sheath — an insulating layer of fatty material (produced by glial cells) with gaps between segments called nodes of Ranvier. The action potential jumps from node to node rather than propagating continuously, which dramatically increases speed. This is why myelinated neurons conduct signals at up to about 100 meters per second, while unmyelinated neurons are much slower. Diseases that damage myelin (such as multiple sclerosis) impair this conduction, which is why symptoms include loss of motor control and sensation.

The Synapse

The synapse is where neurons communicate with each other, and it's where most of the interesting computational properties of the brain emerge. When an action potential reaches the axon terminal, it doesn't jump directly to the next neuron. Instead, there's a physical gap — the synaptic cleft, about 20–40 nanometers wide.

The process:

  1. The action potential arrives at the axon terminal.
  2. Voltage-gated calcium channels open, and calcium ions flow in.
  3. The calcium triggers vesicles (small membrane-bound sacs) to fuse with the terminal membrane and release neurotransmitters into the synaptic cleft.
  4. Neurotransmitters cross the gap and bind to receptors on the postsynaptic neuron's dendrite.
  5. The receptor binding either excites the postsynaptic neuron (pushing it toward firing) or inhibits it (pushing it away from firing).

This is a critical distinction: synapses can be excitatory or inhibitory. A neuron doesn't just pass signals forward — it integrates thousands of incoming signals, some pushing it to fire, others pushing it not to. The neuron fires only if the sum of excitatory inputs exceeds the sum of inhibitory inputs by enough to cross the threshold. This integration is essentially a weighted sum followed by a threshold function — which is exactly what an artificial neuron computes.

Key idea: The biological neuron's core operation — sum weighted inputs, fire if above threshold — is the direct inspiration for the artificial neuron. The weights are synaptic strengths. The threshold is the activation function. The main simplifications in the artificial version: timing is removed (biological neurons care about when signals arrive, not just whether they arrive), and the all-or-nothing spike is replaced with a continuous output value.
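In code, that abstraction is a few lines. A minimal sketch (the weights, inputs, and threshold values here are arbitrary illustrations):

```python
def artificial_neuron(inputs, weights, threshold):
    """Weighted sum of inputs, all-or-nothing output.
    Positive weights play the role of excitatory synapses, negative weights
    inhibitory ones; the threshold stands in for the firing threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Two excitatory inputs and one inhibitory input:
fires = artificial_neuron([1, 1, 1], [0.6, 0.5, -0.4], threshold=0.5)  # 0.7 >= 0.5, fires
held = artificial_neuron([1, 1, 1], [0.6, 0.5, -0.9], threshold=0.5)   # 0.2 < 0.5, inhibited
```

The second call shows inhibition at work: the same excitatory inputs arrive, but a stronger negative weight keeps the sum below threshold.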

The major neurotransmitters, and why they matter:

Neurotransmitter | Role | AI relevance
Glutamate | Primary excitatory signal. Drives most of the brain's fast communication. | The "default" forward signal. Analogous to positive weights.
GABA | Primary inhibitory signal. Prevents runaway excitation. | Analogous to negative weights. Without inhibition, networks become unstable — true in biology and in AI.
Dopamine | Reward and motivation. Signals when outcomes are better (or worse) than expected. | Direct inspiration for reward signals in reinforcement learning. The "prediction error" concept maps closely to TD learning.
Serotonin | Mood regulation, impulse control, sleep. | Less directly mapped to AI, but relevant to regulation and homeostasis.
Acetylcholine | Muscle activation, attention, memory formation. | Attention mechanisms in transformers are a loose (very loose) analogy to cholinergic modulation of what gets processed.

Dopamine deserves special attention. Wolfram Schultz's experiments in the 1990s showed that dopamine neurons don't simply fire when a reward is received — they fire when a reward is unexpected. If a monkey learns that a light predicts juice, dopamine fires at the light (the predictor), not the juice (the reward). If the juice is omitted after the light, dopamine activity drops below baseline — a negative prediction error.1 This is strikingly similar to the temporal difference (TD) learning algorithm used in reinforcement learning, which updates value estimates based on the difference between expected and received reward. The resemblance is not a coincidence — TD learning was developed in parallel with this neuroscience, and the fields have informed each other.
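The core of the TD idea fits in a few lines. A minimal sketch of prediction-error learning for a single cue (the learning rate and trial count are arbitrary; this shows the concept, not Schultz's protocol):

```python
# V is the learned value of the cue (the light); alpha is a learning rate.
V, alpha = 0.0, 0.3
reward = 1.0  # the juice

for trial in range(20):
    delta = reward - V   # prediction error: better or worse than expected
    V += alpha * delta   # update the expectation toward the outcome

# After learning, the reward is fully predicted, so the error is near zero.
# Omitting the juice now produces a negative error: the "dip below baseline"
# in Schultz's dopamine recordings.
omission_error = 0.0 - V
```

Early trials produce large positive errors (the reward is surprising); late trials produce almost none (the cue predicts it); omission produces a negative error.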

How the Brain Learns

Chapter 1 introduced Hebbian learning — "neurons that fire together, wire together." This section covers the actual mechanisms that implement it, because these mechanisms map directly to how artificial neural networks adjust their weights.

Long-Term Potentiation (LTP)

Long-term potentiation is the strengthening of a synaptic connection after repeated stimulation. Terje Lømo and Tim Bliss discovered it in 1973 in the rabbit hippocampus.2 When a synapse is stimulated repeatedly at high frequency, the postsynaptic response to future signals at that synapse becomes stronger — and this change persists for hours, days, or longer.

The molecular mechanism centers on a receptor called the NMDA receptor, which acts as a coincidence detector. It only opens when two conditions are met simultaneously: the presynaptic neuron must release glutamate (meaning it fired), AND the postsynaptic neuron must already be depolarized (meaning it's receiving strong input from other sources too). When both conditions are met, calcium flows through the NMDA receptor and triggers a cascade that inserts more receptors into the postsynaptic membrane — making the synapse more sensitive to future signals.

This is Hebb's rule implemented in biology: the synapse strengthens only when both the sending and receiving neurons are active at the same time. The NMDA receptor is the mechanism that detects this coincidence.
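A toy version of that coincidence rule, with illustrative values:

```python
def hebbian_update(w, pre_active, post_active, lr=0.1):
    """Coincidence detection: strengthen only when presynaptic AND
    postsynaptic activity occur together (LTP); otherwise no change."""
    if pre_active and post_active:
        return w + lr
    return w

w = 0.5
w = hebbian_update(w, pre_active=True, post_active=True)   # both active: potentiated
w = hebbian_update(w, pre_active=True, post_active=False)  # no coincidence: unchanged
```

The `if` condition plays the role of the NMDA receptor: presynaptic glutamate release and postsynaptic depolarization must coincide before anything changes.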

Long-Term Depression (LTD)

The opposite of LTP is long-term depression — the weakening of a synaptic connection. LTD occurs when stimulation is weak or poorly timed. Low-frequency stimulation at a synapse causes a smaller calcium influx through NMDA receptors, which triggers a different cascade that removes receptors from the postsynaptic membrane.

LTD matters because learning isn't just about strengthening the right connections — it's about weakening the wrong ones. A network that can only strengthen connections would eventually saturate, with every synapse at maximum strength and no ability to discriminate. LTD provides the complementary mechanism: connections that aren't useful get pruned.

Key idea: The brain learns by adjusting synaptic strengths in both directions — strengthening connections that are useful (LTP) and weakening connections that aren't (LTD). Artificial neural networks do exactly the same thing: gradient descent adjusts weights up or down based on how much each connection contributed to the correct (or incorrect) output. The biological process is driven by calcium dynamics and receptor trafficking. The artificial process is driven by calculus. The principle is the same.
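The artificial counterpart in miniature: gradient descent on a single weight, moving it up or down depending on the error (a deliberately tiny sketch with made-up numbers):

```python
def step(w, x, target, lr=0.1):
    """One gradient-descent step on error = (w*x - target)^2.
    The update can be positive or negative: both LTP- and LTD-like moves."""
    prediction = w * x
    gradient = 2 * (prediction - target) * x
    return w - lr * gradient

w = 0.5
w_up = step(w, x=1.0, target=1.0)    # prediction too low: weight strengthened
w_down = step(w, x=1.0, target=0.0)  # prediction too high: weight weakened
```

Same mechanism, both directions: whether the weight grows or shrinks depends only on the sign of the error it contributed to.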

Spike-Timing-Dependent Plasticity (STDP)

Hebb's original rule — "neurons that fire together" — is actually a simplification. The brain cares about timing. Spike-timing-dependent plasticity refines Hebbian learning with a critical detail: if the presynaptic neuron fires just before the postsynaptic neuron (within about 20 milliseconds), the synapse is strengthened. If it fires just after, the synapse is weakened.

This makes biological sense: if neuron A consistently fires before neuron B, then A might be causing B to fire — that's a useful connection to strengthen. If A fires after B, then A isn't contributing to B's firing — that connection should weaken. The brain is learning causal relationships, not just correlations.
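The standard mathematical model of the STDP window is an exponential falloff on each side of coincidence. A sketch, with illustrative amplitudes and the roughly 20 ms time constant from the text:

```python
import math

def stdp_delta_w(dt_ms, a_plus=0.1, a_minus=0.12, tau_ms=20.0):
    """dt_ms = t_post - t_pre. Pre-before-post (dt > 0) strengthens the
    synapse; post-before-pre (dt <= 0) weakens it. Amplitudes are illustrative."""
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_ms)
    return -a_minus * math.exp(dt_ms / tau_ms)

causal = stdp_delta_w(+5)   # pre fired 5 ms before post: positive change
acausal = stdp_delta_w(-5)  # pre fired 5 ms after post: negative change
```

The asymmetry around zero is the point: the sign of the update encodes the order of firing, which is how the rule favors putative causes over mere correlates.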

Most artificial neural networks don't implement timing-dependent plasticity — they operate on static input-output pairs with no temporal dynamics. This is one of the significant simplifications made when translating biology into math, and it's an active area of research in neuromorphic computing (hardware designed to mimic biological neural dynamics more faithfully).

The Architecture of the Brain

The mechanisms above — action potentials, synaptic transmission, LTP/LTD — are the building blocks. The brain's power comes from how these building blocks are organized into structures. The key insight is that the brain is layered, both physically and evolutionarily, with older structures handling more basic functions and newer structures adding higher-order capabilities on top.

[Figure: layered brain architecture, evolutionarily oldest structures at the center, newest at the outside. Brainstem (~500M+ years old): survival basics such as breathing, heart rate, the sleep/wake cycle, and fight-or-flight reflexes. Limbic system (~300M years old): emotion, memory, motivation, reward (hippocampus: memory; amygdala: fear/emotion; hypothalamus: homeostasis). Neocortex (~500M years old in early form; expanded dramatically in mammals): reasoning, language, planning, sensory processing (prefrontal: planning; motor: movement; visual/auditory/language areas). Cerebellum: motor timing and coordination. Newer layers modulate older layers; they don't replace them.]

The Brainstem

The brainstem is the oldest part of the brain, shared in its basic form with reptiles, fish, and amphibians. It handles functions essential for survival that can't wait for conscious processing: breathing, heart rate, the sleep/wake cycle, and fight-or-flight reflexes.

The brainstem doesn't "think." It keeps you alive. But its role in the overall architecture is instructive: it handles the fast, non-negotiable responses that higher layers can modulate but can't override in time-critical situations. Your cortex can decide to hold your breath, but eventually the brainstem takes over and forces you to breathe. This is layered processing in its purest form.

The Cerebellum

The cerebellum sits behind the brainstem and, despite being only about 10% of the brain's volume, contains roughly 80% of the brain's neurons — around 69 billion of the brain's 86 billion total.3 It's primarily responsible for motor coordination, timing, and procedural learning (the kind of learning that lets you ride a bike without thinking about it).

The cerebellum is notable for its extremely uniform, repeating circuit architecture — a regular grid of the same basic circuit repeated across its entire surface. This regularity makes it one of the best-understood brain structures. Its computational role is essentially a real-time error correction system: it compares intended movement with actual movement and adjusts the motor signal accordingly. This is, in principle, a biological feedback controller — and it operates largely outside conscious awareness.
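That feedback-controller role can be sketched as a simple proportional control loop (the gain and the one-line "body" are invented for illustration; real cerebellar control is far richer):

```python
def control_loop(target, steps=50, gain=0.3):
    """Compare intended movement with actual movement each step and
    nudge the motor command to close the gap: a proportional controller."""
    position = 0.0
    for _ in range(steps):
        error = target - position  # intended minus actual
        position += gain * error   # corrective adjustment to the movement
    return position

final = control_loop(target=1.0)  # converges close to the intended position
```

Each iteration shrinks the remaining error by a fixed fraction, so the movement homes in on the target without any step needing to be exactly right — the essence of error-correcting feedback.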

The Limbic System

The limbic system sits above the brainstem and handles emotion, memory formation, motivation, and reward processing. It's evolutionarily newer than the brainstem but older than the neocortex — present in all mammals in recognizable form.

The three structures that matter most:

Hippocampus — critical for forming new memories and for spatial navigation. Damage to the hippocampus (as in the famous case of patient H.M.) results in an inability to form new long-term memories while leaving existing memories and skills intact. The hippocampus appears to function as a temporary buffer, consolidating new experiences into long-term storage in the cortex during sleep. John O'Keefe discovered place cells in the hippocampus — neurons that fire only when an animal is in a specific location — and May-Britt and Edvard Moser discovered grid cells in the adjacent entorhinal cortex — neurons that fire in a regular hexagonal grid pattern as an animal moves through space. Both discoveries earned Nobel Prizes in 2014.4

Amygdala — processes emotional significance, especially fear and threat detection. The amygdala can trigger a fear response before the cortex has finished processing what the stimulus is — you jump at a snake-shaped stick before your visual cortex confirms it's actually a stick. This is another example of the fast-path / slow-path layering: the amygdala trades accuracy for speed, the cortex refines the assessment afterward.

Hypothalamus — maintains homeostasis: body temperature, hunger, thirst, circadian rhythm, hormone regulation. It's the interface between the nervous system and the endocrine (hormonal) system, linking neural computation to the body's chemical regulatory systems.

Key idea: The limbic system is where motivation lives in the brain. The brainstem keeps you alive. The cortex reasons and plans. But it's the limbic system — dopamine reward circuits, emotional valence from the amygdala, memory consolidation in the hippocampus — that decides what matters. This is directly relevant to the question of what current AI lacks: LLMs have sophisticated cortex-like processing but nothing analogous to a limbic system. There's no internal signal that says "this matters more than that" unless it's been encoded in the training data or imposed externally.

The Neocortex

The neocortex is the outermost layer of the brain, the most recent evolutionary addition, and the structure most associated with what we think of as "higher" cognition: language, abstract reasoning, planning, conscious perception. In humans, it accounts for about 76% of the brain's volume.

It's organized into regions with specialized functions:

Region | Primary function
Prefrontal cortex | Executive function — planning, decision-making, working memory, impulse control. The "CEO" of the brain.
Motor cortex | Voluntary movement — organized as a map of the body (homunculus).
Somatosensory cortex | Touch, pressure, temperature — also organized as a body map.
Visual cortex (occipital lobe) | Visual processing — hierarchical, from edges and orientations to objects and scenes.
Auditory cortex (temporal lobe) | Sound processing — organized by frequency (tonotopic map).
Broca's area | Speech production.
Wernicke's area | Language comprehension.

Despite this regional specialization, the neocortex has a remarkably uniform internal structure across all regions. Vernon Mountcastle proposed in 1957 that the entire neocortex is built from repeating units called cortical columns — vertical groups of about 100 neurons spanning all six layers of the cortex.5 The same basic circuit appears in the visual cortex, the motor cortex, the language areas — everywhere. What differs between regions isn't the architecture of the columns, but what they're connected to.

This uniformity has a profound implication: the neocortex may be running a single, general-purpose learning algorithm applied to different types of input. Visual cortex processes vision not because its columns are designed for vision, but because they're connected to the eyes. Experiments have shown that if visual input is surgically rerouted to auditory cortex in young animals, the auditory cortex develops visual processing capabilities.6

This observation — that one algorithm, applied uniformly, can learn to process radically different types of input depending on what data it receives — is one of the most important insights for AI. It suggests that intelligence might not require specialized architectures for each task, but rather a powerful enough general learning algorithm applied to sufficient data. This idea directly influenced the development of deep learning, where the same basic architecture (layers of artificial neurons with learned weights) is applied to vision, language, audio, and more.
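A toy rendering of that idea: the same circuit class, specialized only by what it's wired to. This illustrates the principle of uniform architecture with differing connections, not real cortex:

```python
import random

class Column:
    """One generic 'column': the identical circuit regardless of input source."""
    def __init__(self, n_inputs):
        # Learned weights; initialized randomly, shaped by whatever data arrives.
        self.weights = [random.uniform(-1, 1) for _ in range(n_inputs)]

    def respond(self, signal):
        """Same computation everywhere: a weighted sum of the incoming signal."""
        return sum(w * x for w, x in zip(self.weights, signal))

# Identical architecture; only the wiring (input source and size) differs.
visual_column = Column(n_inputs=64)    # wired to retina-like input
auditory_column = Column(n_inputs=32)  # wired to cochlea-like input
```

Nothing in `Column` knows about vision or audition; "visual" and "auditory" are properties of the wiring, exactly as in the rerouting experiments.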

Hierarchical Processing

One of the most well-studied properties of the neocortex is its hierarchical processing — particularly in the visual system. David Hubel and Torsten Wiesel's Nobel Prize-winning work in the 1960s showed that visual cortex neurons are organized in layers of increasing abstraction: simple cells respond to oriented edges, complex cells combine those responses, and higher areas respond to progressively more complete patterns.7

Each layer takes the output of the layer below and combines it into more abstract representations. Edges become shapes. Shapes become objects. Objects become scenes. The raw sensory input is progressively transformed into something meaningful.

This hierarchical feature extraction is exactly what convolutional neural networks (CNNs) do — and it's not a coincidence. Kunihiko Fukushima's Neocognitron (1980) and later Yann LeCun's LeNet (1989) were explicitly modeled on Hubel and Wiesel's findings. The layers of a CNN mirror the layers of visual cortex: early layers detect edges, middle layers detect patterns, deep layers detect objects. This will be covered in detail in Chapter 8.
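Hierarchical feature extraction in miniature, using a 1-D signal instead of an image (toy layers, not LeNet):

```python
def convolve(signal, kernel):
    """Slide the kernel across the signal: the basic convolution operation."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [0, 0, 0, 1, 1, 1, 0, 0]   # a bright bar on a dark background
edges = convolve(signal, [-1, 1])   # layer 1: respond to intensity changes
magnitude = [abs(e) for e in edges] # rectify the responses
detected = max(magnitude)           # layer 2: pool into "an edge is present"
```

The first layer sees only local intensity changes; the second summarizes them into a more abstract fact about the whole signal — the same edges-to-features-to-objects progression, two layers deep.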

The Brain as a System

Individual structures are important, but the brain's power comes from how they work together. No structure operates in isolation — they form circuits and feedback loops that produce behavior more sophisticated than any single structure could.

A few examples of how the layers interact:

Fear response: You see something that looks like a snake. The visual signal reaches the amygdala via a fast, crude pathway (the "low road") — within about 12 milliseconds. The amygdala triggers a fear response: heart rate spikes, muscles tense, attention narrows. Simultaneously, the signal travels the slower, higher-resolution pathway through the visual cortex (the "high road"). The cortex processes the image more carefully and determines it's a stick. The prefrontal cortex then sends inhibitory signals to the amygdala, dampening the fear response. Total time: a few hundred milliseconds. You flinched, then relaxed.

Memory formation: You have a meaningful conversation. The hippocampus encodes the event — not just the words but the emotional context (tagged by the amygdala), the spatial setting (tagged by hippocampal place cells), and the sensory details. Over subsequent sleep cycles, the hippocampus "replays" the event and gradually transfers the memory to distributed storage across the neocortex. The hippocampus is the indexer, the cortex is the long-term store.

Motivated behavior: You're hungry. The hypothalamus detects low blood sugar and signals hunger. Dopamine circuits in the limbic system activate reward-seeking behavior. The prefrontal cortex weighs options (cook vs. order food vs. eat what's available) and selects a plan. The motor cortex executes it. At each step, feedback loops adjust the plan — if you open the fridge and see something appealing, the reward signal updates the plan in real time.

The common pattern across all three: fast, old structures react first; slow, new structures refine afterward; feedback loops between layers adjust the response continuously. This isn't just how the brain works — it's an architectural principle that shows up in engineered systems too. Caching strategies, interrupt handlers, control loops — the principle of tiered processing with feedback is universal.
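The same tiered pattern, sketched as code. The two classifier functions are invented placeholders standing in for a cheap heuristic and an expensive analysis:

```python
def fast_path(stimulus):
    """Cheap, crude heuristic that errs on the side of caution (the 'low road')."""
    return "threat" if "snake" in stimulus else "safe"

def slow_path(stimulus):
    """Slower, accurate analysis (the 'high road')."""
    return "threat" if stimulus == "actual snake" else "safe"

def respond(stimulus):
    reaction = fast_path(stimulus)    # flinch first...
    assessment = slow_path(stimulus)  # ...then look carefully
    if reaction == "threat" and assessment == "safe":
        reaction = "stand down"       # the slow path inhibits the false alarm
    return reaction
```

`respond("snake-shaped stick")` triggers the fast alarm and then retracts it, while `respond("actual snake")` keeps it: flinch, then either relax or run.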

What This Means for AI

The brain provides several architectural lessons that shaped the development of artificial intelligence:

  1. Learning is weight adjustment. LTP and LTD change synaptic strengths based on activity. Artificial neural networks change weight values based on gradients. The abstraction is direct.
  2. Architecture determines capability. The same neuron types in different structures produce fundamentally different behaviors. A nerve net can't plan. A cortical column can't keep you breathing. Structure constrains and enables function.
  3. Hierarchical processing works. The visual cortex builds complex representations from simple features through successive layers. CNNs adopted this directly.
  4. Uniform algorithms, diverse inputs. The neocortex uses one basic circuit for vision, language, motor control, and abstract thought. The algorithm is general; the data specializes it. Deep learning follows the same principle.
  5. Motivation is separate from computation. The limbic system provides the "why" — reward, emotion, significance. The cortex provides the "how" — reasoning, planning, execution. Current AI has the "how" without the "why."
  6. Timing matters (and AI mostly ignores it). STDP, dopamine prediction errors, the fast-path/slow-path architecture — temporal dynamics are central to biological intelligence. Most artificial networks process static snapshots with no temporal structure.

Chapter 3 will trace how researchers took these biological insights and translated them — selectively, approximately, and sometimes incorrectly — into mathematical models that became the foundation of modern AI.

Next: Chapter 3 — From Neuroscience to AI. The McCulloch-Pitts neuron (1943), Hebb's rule, the perceptron. What we borrowed from biology, what we didn't, where the analogy breaks. Connectionism vs. symbolicism.

1 Schultz, Dayan, and Montague (1997), "A Neural Substrate of Prediction and Reward." Science 275:1593-1599. This paper formally linked dopamine signaling to temporal difference learning, becoming one of the most cited papers in computational neuroscience.

2 Bliss and Lømo (1973), "Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path." Journal of Physiology 232:331-356.

3 Azevedo et al. (2009), "Equal numbers of neuronal and nonneuronal cells make the human brain an isometrically scaled-up primate brain." Journal of Comparative Neurology 513:532-541. This paper established the ~86 billion neuron count and the cerebellum's disproportionate share.

4 O'Keefe discovered place cells in 1971. The Mosers discovered grid cells in 2005. All three shared the Nobel Prize in Physiology or Medicine in 2014 for "discoveries of cells that constitute a positioning system in the brain."

5 Mountcastle (1957), "Modality and topographic properties of single neurons of cat's somatic sensory cortex." Journal of Neurophysiology 20:408-434. The cortical column hypothesis remains influential though debated in its specifics.

6 Sur, Garraghty, and Roe (1988), "Experimentally induced visual projections into auditory thalamus and cortex." Science 242:1437-1441. Demonstrated cross-modal plasticity — auditory cortex can process visual input when rewired.

7 Hubel and Wiesel received the Nobel Prize in Physiology or Medicine in 1981 for their discoveries concerning information processing in the visual system.