Source: xkcd.com/1591

I've also been told in person that I understood the theorem wrong. So it seems about time for some studying.

This time, rather than pick up Bell's original article, I read this more popular account of the argument, which covers more or less the same ground. If I understand it correctly, it's actually simpler than what I first thought, although my hazy understanding of the physics stood in the way of extracting the purely statistical part of the argument.

### Background

Here's what I take to be the issue: We have a certain experiment in which two binary observables, $A$ and $B$, follow conditional distributions that depend on two control variables, $a$ and $b$:

\begin{eqnarray}
a &\longrightarrow& A \\
b &\longrightarrow& B
\end{eqnarray}

Although the experiment is designed to prevent statistical dependencies between $A$ and $B$, we still observe a marked correlation between them for many settings of $a$ and $b$. This has to be explained somehow, either by postulating

- an unobserved common cause: $\lambda\rightarrow A,B$;
- an observed common effect: $A,B \rightarrow \gamma$ (i.e., a sampling bias);
- or a direct causal link: $A \leftrightarrow B$.

### Measurable Consequences

The measure in question is the following:

\begin{eqnarray}
C(a,b) &=& +P(A=1,B=1\,|\,a,b) \\
& & +P(A=0,B=0\,|\,a,b) \\
& & -P(A=1,B=0\,|\,a,b) \\
& & -P(A=0,B=1\,|\,a,b).
\end{eqnarray}

This statistic is related to the correlation between $A$ and $B$, but it differs from it in that the marginal probabilities $P(A)$ and $P(B)$ do not appear. It evaluates to $+1$ if and only if the two are perfectly correlated, and $-1$ if and only if they are perfectly anti-correlated.
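To make the definition concrete, here is a minimal Python sketch of the statistic. The function name `correlation_statistic` and the table layout (`joint[i][j]` holding $P(A=i, B=j\,|\,a,b)$) are my own choices for illustration, not something taken from the account I read.

```python
def correlation_statistic(joint):
    """Return C = P(1,1) + P(0,0) - P(1,0) - P(0,1) for a 2x2 joint table.

    Assumption: joint[i][j] holds P(A=i, B=j | a, b) for one fixed
    setting of the control variables (a, b).
    """
    total = sum(sum(row) for row in joint)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    return joint[1][1] + joint[0][0] - joint[1][0] - joint[0][1]


# Three illustrative joint distributions.
perfectly_correlated = [[0.5, 0.0], [0.0, 0.5]]      # A always equals B
perfectly_anticorrelated = [[0.0, 0.5], [0.5, 0.0]]  # A never equals B
independent_fair = [[0.25, 0.25], [0.25, 0.25]]      # two fair, independent coins

print(correlation_statistic(perfectly_correlated))
print(correlation_statistic(perfectly_anticorrelated))
print(correlation_statistic(independent_fair))
```

Equivalently, $C$ is the expected value of $(2A-1)(2B-1)$, that is, the expected product of the two outcomes once they are recoded as $\pm 1$.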

Figure: Contours of $C(a,b)$ when $A$ and $B$ are independent, with $x=P(A)$ and $y=P(B)$.

In a certain type of experiment, where $a$ and $b$ are angles of two magnets used to reveal something about the spin of a particle, quantum mechanics predicts that

$$
C(a,b) \;=\; -\cos(a-b).
$$

When the control variables differ only slightly, $A$ and $B$ are thus strongly anti-correlated, but when the control variables are on opposite sides of the unit circle, $A$ and $B$ are closely correlated. This is a prediction based on physical considerations.

### Bounds on Joint Correlations

However, let's stick with the pure statistics a bit longer. Suppose again $A$ depends only on $a$, and $B$ depends only on $b$, possibly given some fixed, shared background information which is independent of the control variables.

Figure: The statistical situation when the background information is held constant.

Then $C(a,b)$ can be expanded to

\begin{eqnarray}
C(a,b) &=& +P(A=1\,|\,a) \, P(B=1\,|\,b) \\
& & +P(A=0\,|\,a) \, P(B=0\,|\,b) \\
& & - P(A=1\,|\,a) \, P(B=0\,|\,b) \\
& & - P(A=0\,|\,a) \, P(B=1\,|\,b) \\
&=& [P(A=1\,|\,a) - P(A=0\,|\,a)] \times [P(B=1\,|\,b) - P(B=0\,|\,b)],
\end{eqnarray}

that is, the product of two statistics, each of which measures how far the corresponding variable is from a fair coin flip given its control parameter setting: it is $\pm 1$ when the outcome is certain and $0$ when both outcomes are equally likely. Using the abbreviations $A_1 = P(A=1\,|\,a)$, $A_0 = P(A=0\,|\,a)$, and likewise for $B$,

$$
C(a,b) \; = \; (A_1 - A_0) (B_1 - B_0),
$$

and thus

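\begin{eqnarray}
C(a,b) + C(a,b^\prime) &=& (A_1 - A_0) (B_1 - B_0 + B_1^\prime - B_0^\prime) \\
& \leq & |B_1 - B_0 + B_1^\prime - B_0^\prime|; \\
C(a^\prime,b) - C(a^\prime,b^\prime) &=& (A_1^\prime - A_0^\prime) (B_1 - B_0 - B_1^\prime + B_0^\prime) \\
& \leq & |B_1 - B_0 - B_1^\prime + B_0^\prime|,
\end{eqnarray}

where primed quantities refer to the alternative settings $a^\prime$ and $b^\prime$, and the inequalities hold because $|A_1 - A_0| \leq 1$ and $|A_1^\prime - A_0^\prime| \leq 1$. Writing $x = B_1 - B_0$ and $y = B_1^\prime - B_0^\prime$, both of which lie in $[-1,1]$, we have $|x+y| + |x-y| = 2\max(|x|,|y|) \leq 2$. It follows that

$$
C(a,b) + C(a,b^\prime) + C(a^\prime,b) - C(a^\prime,b^\prime) \;\leq\; 2\max(|B_1 - B_0|,\,|B_1^\prime - B_0^\prime|) \;\leq\; 2.
$$

Since the same two bounds also hold for the absolute values of the products, the same derivation shows that

$$
| C(a,b) + C(a,b^\prime) + C(a^\prime,b) - C(a^\prime,b^\prime) | \;\leq\; 2\max(|B_1 - B_0|,\,|B_1^\prime - B_0^\prime|) \;\leq\; 2.
$$

In fact, every variant of this inequality in which an odd number of the four terms carries a minus sign can be derived using the same idea. (With an even number of minus signs the bound fails: if all outcomes are deterministic and equal, then $C(a,b) + C(a,b^\prime) + C(a^\prime,b) + C(a^\prime,b^\prime) = 4$.)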
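As a sanity check on the bound, here is a small Python sketch that evaluates the four-term combination for factorizing models. The function name `chsh` and the sampling scheme are my own assumptions for illustration; each argument stands for a difference such as $A_1 - A_0$ and therefore lies in $[-1,1]$.

```python
import itertools
import random


def chsh(A, A_prime, B, B_prime):
    """S = C(a,b) + C(a,b') + C(a',b) - C(a',b') for a factorizing model.

    Each argument is a difference of the form P(X=1 | setting) - P(X=0 | setting),
    so it is a number between -1 and 1.
    """
    return A * B + A * B_prime + A_prime * B - A_prime * B_prime


# Deterministic models: every difference is exactly -1 or +1 (16 cases).
extremes = [chsh(*signs) for signs in itertools.product([-1, 1], repeat=4)]

# Randomly drawn stochastic models, with a fixed seed for reproducibility.
rng = random.Random(0)
samples = [chsh(*(rng.uniform(-1, 1) for _ in range(4))) for _ in range(10_000)]

print(sorted(set(extremes)))
print(max(abs(s) for s in samples))
```

Because the combination is linear in each of its four arguments separately, its largest and smallest values are attained at the deterministic corners, which is why checking those 16 cases already settles the bound.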

### Violations of Those Bounds

But now look again at

$$
C(a,b) \;=\; -\cos(a-b).
$$

We then have, for $(a,b,a^\prime,b^\prime)=(0,\pi/4,\pi/2,-\pi/4)$,

$$
C\left(0, \frac{\pi}{4}\right) + C\left(0, -\frac{\pi}{4}\right) + C\left(\frac{\pi}{2}, \frac{\pi}{4}\right) - C\left(\frac{\pi}{2}, -\frac{\pi}{4}\right) \;=\; -2\sqrt{2},
$$

which is indeed outside the interval $[-2,2]$. $C$ thus cannot be of the predicted functional form and at the same time satisfy the bound on the correlation statistics. Something's gotta give.
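The arithmetic is easy to reproduce. The following Python sketch plugs the settings $(a,b,a^\prime,b^\prime)=(0,\pi/4,\pi/2,-\pi/4)$ into the predicted form $C(a,b) = -\cos(a-b)$; the function name `C_quantum` is just a label I chose.

```python
import math


def C_quantum(a, b):
    """Predicted correlation statistic C(a, b) = -cos(a - b)."""
    return -math.cos(a - b)


a, b, a_prime, b_prime = 0.0, math.pi / 4, math.pi / 2, -math.pi / 4

# Each of the first three terms is -sqrt(2)/2 and the last is +sqrt(2)/2,
# so subtracting the last term gives four contributions of -sqrt(2)/2.
S = (C_quantum(a, b) + C_quantum(a, b_prime)
     + C_quantum(a_prime, b) - C_quantum(a_prime, b_prime))

print(S, -2 * math.sqrt(2))
```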

### Introducing Hidden Variables

This entire derivation relied on $A$ and $B$ depending on nothing other than their own private control variables, $a$ and $b$. However, suppose that a clever physicist proposes to explain the dependence between $A$ and $B$ by postulating some hidden cause influencing them both. There is then some stochastic variable $\lambda$ which is independent of the control variables, yet causally influences both $A$ and $B$.

Figure: The statistical situation when the background information varies stochastically.

However, even if this is the case, we can go through the entire derivation above, adding "given $\lambda$" to every single step. As long as we condition on a fixed value of $\lambda$, each of the steps still holds. Since the inequality is thus valid for every single value of $\lambda$, it is also valid in expectation, and we can integrate $\lambda$ out. This is where the independence of $\lambda$ from the control variables matters: the same distribution over $\lambda$ applies to all four terms. The result is that even under such a "hidden variable theory," the inequality still holds.
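The averaging step can be illustrated with a toy model. In the Python sketch below, $\lambda$ takes one of 50 values with randomly chosen weights, and for each value the outcome probabilities are drawn at random; all of these numbers, and the names `chsh_given_lambda`, `weights`, and `models`, are made up purely for illustration.

```python
import random


def chsh_given_lambda(p):
    """Four-term combination for one fixed value of lambda.

    Assumption: `p` maps each setting name ("a", "a'", "b", "b'") to the
    probability that the corresponding outcome equals 1, given lambda.
    """
    # Convert P(X=1) into the difference P(X=1) - P(X=0) = 2 P(X=1) - 1.
    d = {setting: 2 * prob - 1 for setting, prob in p.items()}
    return (d["a"] * d["b"] + d["a"] * d["b'"]
            + d["a'"] * d["b"] - d["a'"] * d["b'"])


rng = random.Random(1)
n_values = 50

# Distribution over lambda; it does not depend on the control settings.
weights = [rng.random() for _ in range(n_values)]
total = sum(weights)
weights = [w / total for w in weights]

# One set of outcome probabilities per value of lambda.
models = [{s: rng.random() for s in ("a", "a'", "b", "b'")}
          for _ in range(n_values)]

per_lambda = [chsh_given_lambda(m) for m in models]
averaged = sum(w * s for w, s in zip(weights, per_lambda))

print(max(abs(s) for s in per_lambda), abs(averaged))
```

Since every conditional value lies in $[-2,2]$ and the weights are nonnegative and sum to one, the weighted average cannot leave that interval either.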

Hence, the statistical dependency cannot be explained by a shared cause alone, since no such model can reproduce the predicted functional form of $C(a,b)$. We will therefore need to postulate either a direct causal link between $A$ and $B$ or an observed downstream variable (sampling bias) instead.

Note that the only thing we really need to prove this result is the assumption that the probability $P(A,B \, | \, a,b,\lambda)$ factors into the product $P(A \, | \, a,\lambda)\, P(B \, | \, b,\lambda)$, with $\lambda$ distributed independently of the control variables. This corresponds to the assumption that there is no direct causal connection between $A$ and $B$, and that each outcome is influenced only by its own control variable.