The previous three chapters explored alternative learning paradigms: reinforcement learning, continual learning, and intrinsic motivation. All of them operate within the classical computational framework — silicon chips pushing bits through logic gates. This chapter asks whether a fundamentally different computational substrate could change the game.
The short answer is: not yet, and probably not soon. But the longer answer involves understanding what quantum computing actually does, what makes it different, and why the intersection with AI is simultaneously exciting and overhyped. Since Alfonso has a background in quantum mechanics, this chapter will move through the physics quickly and focus on the computational implications.
Three quantum-mechanical phenomena are relevant to quantum computing:
Superposition. A classical bit is either 0 or 1. A quantum bit (qubit) can exist in a superposition of both states simultaneously, described as α|0⟩ + β|1⟩, where α and β are complex amplitudes satisfying |α|² + |β|² = 1. When measured, the qubit collapses to |0⟩ with probability |α|² or |1⟩ with probability |β|². Before measurement, the qubit is genuinely in both states — not "we don't know which," but "it's both."
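This amplitude-and-measurement picture is easy to sketch numerically. The following minimal NumPy simulation (the specific amplitudes are illustrative, not from the text) stores a qubit as a length-2 complex vector, checks normalization, and samples repeated measurements:

```python
import numpy as np

# A single-qubit state |psi> = alpha|0> + beta|1>, stored as a length-2
# complex vector. The amplitudes here are illustrative.
alpha, beta = 1 / np.sqrt(3), np.sqrt(2 / 3) * 1j
state = np.array([alpha, beta], dtype=complex)

# Normalization: |alpha|^2 + |beta|^2 must equal 1.
assert np.isclose(np.sum(np.abs(state) ** 2), 1.0)

# Measurement probabilities are the squared amplitude magnitudes.
p0, p1 = np.abs(state) ** 2
print(p0, p1)  # ~0.333 and ~0.667

# Each measurement collapses the qubit to a single classical bit;
# repeated runs reveal the underlying probabilities.
rng = np.random.default_rng(seed=0)
samples = rng.choice([0, 1], size=10_000, p=[p0, p1])
print(samples.mean())  # fraction of 1-outcomes, close to p1
```

Note that the amplitudes themselves never appear in any single measurement — only the statistics over many runs recover them, which is the point of the measurement/collapse discussion below.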
Entanglement. Two or more qubits can be correlated in ways that have no classical analog. If two qubits are entangled, measuring one instantly determines the state of the other, regardless of distance. Entanglement allows quantum circuits to create and manipulate correlations between qubits that classical computers cannot efficiently represent. An n-qubit system requires 2ⁿ complex amplitudes to fully describe its state — this exponential state space is the source of quantum computing's potential power.
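The exponential state space is easy to make concrete: a classical simulator storing all 2ⁿ amplitudes as 16-byte complex numbers runs out of memory quickly. Pure arithmetic, no assumptions:

```python
# Memory needed to store the full state vector of n qubits classically,
# at 16 bytes per amplitude (complex128).
for n in (10, 20, 30, 40, 50):
    amplitudes = 2 ** n
    gib = amplitudes * 16 / 2 ** 30
    print(f"{n:2d} qubits: {amplitudes:>16,d} amplitudes, {gib:,.1f} GiB")
```

At 30 qubits the state vector already needs 16 GiB; at 50 qubits it needs 16 PiB, which is why roughly 50 qubits is often quoted as the edge of brute-force classical simulation.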
Measurement / collapse. Accessing quantum information destroys the superposition. You can't "read out" a qubit's amplitudes; you can only get a single classical bit (0 or 1) from each measurement. The art of quantum algorithm design is arranging the amplitudes so that, when you measure, the answer you want has high probability. This is fundamentally different from classical computing, where you can inspect intermediate results freely.
Just as classical computing builds computation from logic gates (AND, OR, NOT), quantum computing builds computation from quantum gates that operate on qubits. The key difference: quantum gates are unitary operations — they're reversible transformations on the qubit state that preserve the total probability.
Common quantum gates:
| Gate | What it does | Classical analog |
|---|---|---|
| Hadamard (H) | Puts a qubit into equal superposition: \|0⟩ → (\|0⟩+\|1⟩)/√2 | No direct analog — creates superposition |
| Pauli-X | Flips \|0⟩ to \|1⟩ and vice versa | NOT gate |
| CNOT | Flips target qubit if control qubit is \|1⟩ | Controlled NOT — creates entanglement |
| Phase gates | Rotate the phase of a qubit's amplitude | No classical analog — phase is quantum-specific |
A quantum circuit is a sequence of gates applied to a register of qubits. The circuit starts with qubits in a known state (typically all |0⟩), applies a sequence of gates that create superpositions, entangle qubits, and manipulate amplitudes, then measures some or all qubits to extract a classical result. The challenge is designing gate sequences where quantum interference — constructive for correct answers, destructive for incorrect ones — amplifies the probability of the desired outcome.
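As a concrete example, the canonical two-qubit circuit — a Hadamard followed by a CNOT — can be simulated directly with small matrices. This is a sketch of statevector simulation, not real hardware:

```python
import numpy as np

# Statevector simulation of a 2-qubit circuit: H on qubit 0, then CNOT
# (control 0, target 1), producing the Bell state (|00> + |11>)/sqrt(2).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

state = np.zeros(4)
state[0] = 1.0                   # start in |00>
state = np.kron(H, I) @ state    # Hadamard on qubit 0: superposition
state = CNOT @ state             # CNOT: entangles the two qubits

probs = np.abs(state) ** 2       # measurement distribution
for bitstring, p in zip(["00", "01", "10", "11"], probs):
    print(bitstring, round(p, 3))
# Only 00 and 11 appear, each with probability 0.5: the measurement
# outcomes are perfectly correlated.
```

This also illustrates the interference point above: the amplitudes for 01 and 10 are exactly zero, so those outcomes never occur.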
Quantum computers aren't faster at everything. They provide provable speedups for specific classes of problems, and the nature of those speedups matters for understanding the relevance to AI.
Peter Shor's 1994 algorithm can factor large integers in polynomial time; the best known classical algorithms run in sub-exponential but still super-polynomial time.1 This is the most famous quantum algorithm because it breaks RSA encryption. But factoring has essentially no relevance to AI. It's mentioned here only because it's the clearest example of quantum advantage — a problem where quantum computers are provably, dramatically faster than classical ones.
Lov Grover's 1996 algorithm searches an unstructured database of N items in O(√N) time, compared to O(N) classically.2 This is a quadratic speedup — real, but not exponential. For a database of a million items, Grover's algorithm needs on the order of 1,000 steps instead of ~1,000,000. Useful, but not the kind of exponential gap that would transform a problem from impossible to tractable.
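The iteration counts behind those numbers come from the standard formula: the optimal number of Grover iterations is roughly (π/4)√N. A quick sketch:

```python
import math

def grover_iterations(n_items: int) -> int:
    """Optimal number of Grover iterations, ~ (pi/4) * sqrt(N)."""
    return math.floor(math.pi / 4 * math.sqrt(n_items))

# Quadratic speedup: sqrt(N) quantum iterations vs ~N/2 expected
# classical lookups for an unstructured search.
for n in (1_000, 1_000_000, 10 ** 12):
    print(f"N = {n:>15,d}: ~{grover_iterations(n):,d} iterations "
          f"(classical expected: ~{n // 2:,d} lookups)")
```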
Quantum annealing (used by D-Wave's quantum processors) attempts to find the global minimum of a cost function by exploiting quantum tunneling to escape local minima. In principle, this could accelerate combinatorial optimization problems — the kind with many variables and constraints, like scheduling, logistics, or protein folding. In practice, definitive quantum advantage over the best classical optimization algorithms has not been demonstrated on real-world problems at commercially relevant scales.
Sampling from complex probability distributions is another area where quantum devices may have an edge. Google's 2019 "quantum supremacy" experiment showed that their 53-qubit Sycamore processor could sample from a specific probability distribution in 200 seconds, a task they estimated would take a classical supercomputer ~10,000 years.3 This was a genuine milestone, but the sampled distribution was specifically designed to be hard for classical computers — it had no practical application. IBM subsequently disputed the classical estimate, suggesting it could be done in days with clever techniques. The point stands that quantum computers can do something faster, but the "something" was purpose-built to showcase quantum advantage.
The intersection of quantum computing and machine learning has generated enormous interest and a substantial body of theoretical work. The main approaches:
Variational quantum circuits — also called parameterized quantum circuits or quantum neural networks — are quantum circuits with tunable gate parameters that are optimized using classical gradient-based methods. The circuit acts as a model: input data is encoded into qubit states, the circuit processes it, and measurements produce the output. A classical optimizer adjusts the gate parameters to minimize a loss function, just like training a classical neural network.
This is a hybrid quantum-classical approach — the quantum circuit does the forward pass, and the classical computer handles the optimization. It's designed for near-term "noisy intermediate-scale quantum" (NISQ) devices that don't have enough qubits or low enough error rates for full fault-tolerant algorithms.
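A toy version of this hybrid loop can be simulated classically. The sketch below is illustrative, not any particular library's API: a single-qubit "circuit" (one RY rotation), a made-up training target, and the parameter-shift rule — a standard trick that gives exact gradients for rotation-gate circuits — feeding a plain gradient-descent optimizer:

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation gate."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def forward(theta):
    """'Quantum' forward pass: apply RY to |0>, measure P(|1>)."""
    state = ry(theta) @ np.array([1.0, 0.0])
    return np.abs(state[1]) ** 2

target = 0.9  # desired P(|1>) — an illustrative training target

def grad(theta):
    # Parameter-shift rule: for rotation gates, the exact derivative of
    # the measured expectation is (f(t + pi/2) - f(t - pi/2)) / 2.
    shift = np.pi / 2
    df = (forward(theta + shift) - forward(theta - shift)) / 2
    # Chain rule through the squared-error loss (f - target)^2.
    return 2 * (forward(theta) - target) * df

theta, lr = 0.1, 1.0  # classical outer loop: plain gradient descent
for _ in range(200):
    theta -= lr * grad(theta)

print(forward(theta))  # converges close to the 0.9 target
```

The division of labor mirrors the NISQ setup described above: the "quantum" part only ever returns measurement statistics, and everything else — loss, gradients, parameter updates — runs classically.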
Kernel methods (support vector machines and related algorithms) compute similarity between data points using a kernel function. The idea behind quantum kernels is that mapping data into a quantum state space creates a feature space that's exponentially larger than any classical feature space of the same input dimension. If the "right" features for a classification task happen to live in this quantum feature space, a quantum kernel could be more powerful than any classical one.
The "if" in that sentence is doing a lot of work. There's no general argument for why real-world data would be better classified in quantum feature spaces, and empirical demonstrations have been limited to small-scale problems.
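For intuition about what a quantum kernel computes mechanically, here is a toy version under an assumed single-qubit angle-encoding feature map (the encoding choice is illustrative): the kernel value is the squared overlap of the two encoded states, a quantity hardware can estimate with a swap test.

```python
import numpy as np

def feature_map(x: float) -> np.ndarray:
    """Encode a scalar into a single-qubit state via angle encoding:
    |phi(x)> = cos(x/2)|0> + sin(x/2)|1>. (Illustrative choice.)"""
    return np.array([np.cos(x / 2), np.sin(x / 2)])

def quantum_kernel(x: float, y: float) -> float:
    """k(x, y) = |<phi(x)|phi(y)>|^2 — the overlap of the encoded states."""
    return float(np.abs(feature_map(x) @ feature_map(y)) ** 2)

# Gram matrix over a few sample points, as an SVM would consume it.
xs = [0.0, 0.5, 1.0, 3.0]
K = np.array([[quantum_kernel(a, b) for b in xs] for a in xs])
print(np.round(K, 3))  # symmetric, with 1.0 on the diagonal
```

With one qubit this kernel is classically trivial (it reduces to cos²((x−y)/2)); the hoped-for advantage only appears, if at all, with entangling feature maps over many qubits, where the overlap is no longer cheap to compute classically.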
QAOA, proposed by Farhi, Goldstone, and Gutmann in 2014, uses a variational quantum circuit specifically designed for combinatorial optimization problems.4 It alternates between two types of quantum gates — one encoding the problem constraints and one encouraging exploration — with the number of alternation steps as a tunable depth parameter. As the depth increases, QAOA theoretically approaches the optimal solution. In practice, the achievable depth on current hardware is very limited.
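The alternating structure is easy to see in a classical simulation of depth-1 QAOA on the smallest interesting MaxCut instance, a 3-node triangle (maximum cut: 2 edges). The grid search standing in for the classical optimizer is illustrative:

```python
import numpy as np

edges = [(0, 1), (1, 2), (0, 2)]  # MaxCut on a 3-node triangle
n = 3
dim = 2 ** n

def bits(i):
    return [(i >> (n - 1 - q)) & 1 for q in range(n)]

# Diagonal cost: for each bitstring, the number of edges it cuts.
cost = np.array([sum(b[i] != b[j] for i, j in edges)
                 for b in map(bits, range(dim))], dtype=float)

X = np.array([[0, 1], [1, 0]], dtype=complex)
I = np.eye(2, dtype=complex)

def mixer(beta):
    """exp(-i*beta*X) applied to every qubit (the exploration layer)."""
    rx = np.cos(beta) * I - 1j * np.sin(beta) * X
    U = rx
    for _ in range(n - 1):
        U = np.kron(U, rx)
    return U

def qaoa_expectation(gamma, beta):
    state = np.full(dim, 1 / np.sqrt(dim), dtype=complex)  # uniform start
    state = np.exp(-1j * gamma * cost) * state  # cost layer (constraints)
    state = mixer(beta) @ state                 # mixing layer (exploration)
    return float(np.real(np.sum(np.abs(state) ** 2 * cost)))

# Classical outer loop: a crude grid search over the two depth-1 angles.
best = max(((g, b, qaoa_expectation(g, b))
            for g in np.linspace(0, np.pi, 25)
            for b in np.linspace(0, np.pi, 25)),
           key=lambda t: t[2])
print(best)  # expected cut near 1.96 — above the random-guess average 1.5
```

Even at depth 1 the tuned angles beat random assignment (expected cut 1.5) without reaching the optimum of 2, which is the pattern the depth parameter is meant to close: deeper alternation approaches the optimum, but hardware limits the achievable depth.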
Here is the uncomfortable truth for anyone hoping quantum computing will accelerate AI in the near term:
Training a transformer involves three computationally expensive operations:

- Matrix multiplications — the dense projections in the attention and feed-forward layers
- Attention score computation — pairwise comparisons across token positions
- Backpropagation — gradient computation through every layer, which is itself more matrix multiplication
The mismatch is fundamental. Quantum computing excels at problems with specific mathematical structure (periodicity for Shor's, unstructured search for Grover's, specific optimization landscapes for annealing). Deep learning's computational bottleneck is raw linear algebra throughput on known, structured operations — the exact regime where GPUs excel and quantum computers offer no clear advantage.
Understanding the gap between quantum computing's theoretical potential and its practical state requires looking at the hardware:
| Factor | Current state (2025–2026) | What's needed for AI relevance |
|---|---|---|
| Qubit count | ~1,000–1,200 physical qubits (IBM Condor, Google) | Millions of physical qubits for fault-tolerant computation |
| Error rates | ~0.1–1% per gate operation | <0.01% for practical error correction (or ~1,000 physical qubits per logical qubit) |
| Coherence time | Microseconds to milliseconds | Long enough to run deep circuits — currently a major constraint |
| Connectivity | Limited — qubits can only interact with neighbors | All-to-all connectivity (or efficient routing) for complex algorithms |
| Temperature | ~15 millikelvin (most approaches require dilution refrigerators) | Room temperature would unlock practical deployment |
The error correction overhead is the critical bottleneck. Current quantum error correction codes require roughly 1,000 physical qubits to create one reliable logical qubit. This means that a quantum computer with 1,000 physical qubits has, in error-corrected terms, approximately 1 logical qubit — not enough to do anything useful. Google's 2024 Willow chip demonstrated that their error correction was improving with scale (a crucial threshold), but the path from there to millions of reliable logical qubits is long.5
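The overhead arithmetic is worth spelling out. Assuming the rough 1,000-to-1 ratio cited above:

```python
# Back-of-the-envelope logical qubit counts, assuming ~1,000 physical
# qubits per logical qubit (the rough surface-code overhead cited above).
OVERHEAD = 1_000

for physical in (1_000, 100_000, 1_000_000, 10_000_000):
    logical = physical // OVERHEAD
    print(f"{physical:>12,d} physical qubits -> ~{logical:,d} logical qubits")
# Even a million physical qubits yields only ~1,000 logical qubits,
# which is why roadmaps talk about millions of physical qubits.
```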
Setting aside the hype, there are areas where quantum computing could genuinely intersect with AI, though most such applications are indirect rather than a matter of "making transformers faster."
Experts in the field generally estimate that practical quantum advantage for real-world AI tasks is at least a decade away, and possibly much more. The path requires:

- Driving per-gate error rates well below current levels, to the point where error correction pays off
- Scaling from roughly a thousand physical qubits to the millions needed for fault tolerance
- Extending coherence times and qubit connectivity enough to run deep circuits
- Discovering quantum algorithms with provable advantage on the workloads AI actually runs
Each of these is a major engineering and scientific challenge. The history of quantum computing has been one of persistent optimism about timelines that consistently prove too aggressive. This is not to say the technology is unimportant — it's potentially transformative — but the timeline for transformation is measured in decades, not years.
Quantum computing asks whether a different computational substrate could change what's computable in practice. The next chapter asks a complementary question: instead of changing the computer, what if we change the interface — connecting artificial computing directly to the biological neural substrate? Neural interfaces blur the boundary between artificial and biological intelligence in a different way entirely.
Next: Chapter 25 — Neural Interfaces. Neuralink, brain-computer interfaces, stimulating biological neurons, and the convergence of biological and artificial intelligence.

1 Shor, P.W. (1994). "Algorithms for quantum computation: discrete logarithms and factoring." Proceedings of the 35th Annual Symposium on Foundations of Computer Science, pp. 124–134.
2 Grover, L.K. (1996). "A fast quantum mechanical algorithm for database search." Proceedings of the 28th Annual ACM Symposium on Theory of Computing, pp. 212–219.
3 Arute, F. et al. (2019). "Quantum supremacy using a programmable superconducting processor." Nature 574:505–510.
4 Farhi, E., Goldstone, J., and Gutmann, S. (2014). "A Quantum Approximate Optimization Algorithm." arXiv:1411.4028.
5 Google Quantum AI (2024). "Quantum error correction below the surface code threshold." Nature. The Willow chip demonstrated that increasing the number of qubits in an error-correcting code reduced errors rather than increasing them — a critical threshold for scalable quantum computing.