Notation Guide
A common source of confusion is notation. To avoid clutter, symbols are often omitted, and their meaning depends on context. So we make the following clarifications:

The entire series considers the discrete case. Note that the continuous case is not a simple generalization of the discrete case. The exceptions are parts 4 and 5, where the examples are continuous but the demonstrated principles are the same.

An upper case non-bold letter indicates a single random variable. The same letter in lower case with a superscript indicates a specific value that random variable may take. For example, \(X=x^1\) is the event that the random variable \(X\) took on the value \(x^1\). We call this event an assignment. The set of unique values a random variable may take is \(\textrm{Val}(X)\). For instance, we may have \(\textrm{Val}(X)=\{x^0,x^1\}\), indicating \(X\) is a discrete-valued variable taking one of two values. This is a general form of a case like ‘the grade random variable may be either pass or fail’.

An upper case bold letter indicates a set of random variables (e.g. \(\mathbf{X}\)) and a bold lower case letter indicates a set of values they may take. For example, we may have \(\mathbf{X}=\{A,B\}\) and \(\mathbf{x}=\{a^3,b^1\}\). Then the event \(\mathbf{X}=\mathbf{x}\) is the event that \(A=a^3\) happens and \(B=b^1\) happens. When an assignment involves multiple variables, it’s called a joint assignment. \(\textrm{Val}(\mathbf{X})\) is the set of all possible joint assignments to the random variables of \(\mathbf{X}\).

\(\vert \textrm{Val}(\mathbf{X}) \vert\) is the count of elements in \(\textrm{Val}(\mathbf{X})\).
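
To make \(\textrm{Val}(\mathbf{X})\) and its count concrete, here is a minimal Python sketch. The variable names, the value labels and the `joint_assignments` helper are all hypothetical and chosen only for illustration; joint assignments are represented as dictionaries.

```python
from itertools import product

# Hypothetical values for two random variables A and B (illustration only).
val = {
    "A": ["a0", "a1", "a2", "a3"],
    "B": ["b0", "b1"],
}

def joint_assignments(variables, val):
    """Return Val(X): every joint assignment to the variables in X."""
    names = list(variables)
    value_lists = [val[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

# X = {A, B}
val_X = joint_assignments(["A", "B"], val)

# |Val(X)| is the number of joint assignments: 4 values of A times 2 of B.
count = len(val_X)
```

Here `{"A": "a3", "B": "b1"}` is one element of `val_X`, matching the joint assignment \(\mathbf{x}=\{a^3,b^1\}\) used above.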

\(\mathbf{x}\) (or \(\mathbf{y}\) or \(\mathbf{z}\) etc.) within a probability function (e.g. \(P(\mathbf{x} \vert \cdots)\) or \(P(\cdots \vert \mathbf{x})\)) always abbreviates the event ‘\(\mathbf{X}=\mathbf{x}\)’.

Perhaps confusingly, we also abbreviate the event ‘\(\mathbf{X}=\mathbf{x}\)’ as ‘\(\mathbf{X}\)’, though this isn’t a clean abbreviation. Omission of \(\mathbf{x}\) means one of two things: either we mean this for any given \(\mathbf{x}\) or for all possible \(\mathbf{x}\)’s. As an example for the latter case, ‘calculate \(P(\mathbf{X})\)’ means calculate the set of probabilities \(P(\mathbf{X}=\mathbf{x})\) for all \(\mathbf{x}\in \textrm{Val}(\mathbf{X})\).
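
The two readings of \(P(\mathbf{X})\) can be sketched in Python as follows. The probabilities are made up, and storing a distribution as a dictionary keyed by joint assignments is an assumption of this sketch, not a convention of the series.

```python
# Hypothetical joint distribution over X = {A, B}, keyed by (a, b).
joint = {
    ("a0", "b0"): 0.1,
    ("a0", "b1"): 0.2,
    ("a1", "b0"): 0.3,
    ("a1", "b1"): 0.4,
}

# Reading 1, 'for any given x': P(X) stands for one number, P(X = x),
# where x is some fixed but unspecified joint assignment.
x = ("a1", "b0")
single = joint[x]

# Reading 2, 'for all possible x': 'calculate P(X)' means produce
# P(X = x) for every x in Val(X), i.e. the whole table.
table = {x: joint[x] for x in joint}
```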

\(\sum_\mathbf{X}f(\mathbf{X})\) is shorthand for \(\sum_{\mathbf{x}\in \textrm{Val}(\mathbf{X})}f(\mathbf{X}=\mathbf{x})\). This is similarly true for \(\prod_\mathbf{X}(\cdot)\) and \(\textrm{argmin}_\mathbf{X}(\cdot)\).
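
As a rough illustration of this shorthand, the sketch below evaluates the sum, product and argmin over every joint assignment in \(\textrm{Val}(\mathbf{X})\). The function values in `f_table` are invented for the example, and \(\mathbf{X}=\{A,B\}\) is assumed.

```python
import math
from itertools import product

# Hypothetical values for the variables in X = {A, B}.
val_A = ["a0", "a1"]
val_B = ["b0", "b1", "b2"]

# Val(X): all joint assignments, here as (a, b) tuples.
val_X = list(product(val_A, val_B))

# A made-up function f defined on every joint assignment.
f_table = {
    ("a0", "b0"): 3, ("a0", "b1"): 1, ("a0", "b2"): 4,
    ("a1", "b0"): 2, ("a1", "b1"): 5, ("a1", "b2"): 9,
}

def f(x):
    return f_table[x]

# sum_X f(X): add f(X = x) over every x in Val(X).
total = sum(f(x) for x in val_X)

# prod_X f(X): multiply f(X = x) over every x in Val(X).
prod_all = math.prod(f(x) for x in val_X)

# argmin_X f(X): the joint assignment x that minimizes f(X = x).
argmin_x = min(val_X, key=f)
```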

Sometimes we write equations like \(f(A,B,C)=g(\mathbf{X})h(\mathbf{Y})\), which appears incorrect; the random variables on the left aren’t on the right. In these cases, we’ll have something like \(\mathbf{X} = \{A,B\}\) and \(\mathbf{Y} = \{B,C\}\). The intended interpretation is \(f(A,B,C)=g(A,B)h(B,C)\).
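
One way to read such an equation in code: each function picks out only the variables in its own set from the full assignment. The sketch below assumes \(\mathbf{X}=\{A,B\}\) and \(\mathbf{Y}=\{B,C\}\) as above; the tables `g` and `h` and the `restrict` helper are hypothetical.

```python
# Hypothetical tables for g(A, B) and h(B, C).
g = {("a0", "b0"): 2, ("a0", "b1"): 3, ("a1", "b0"): 5, ("a1", "b1"): 7}
h = {("b0", "c0"): 1, ("b0", "c1"): 10, ("b1", "c0"): 100, ("b1", "c1"): 1000}

# The variable sets each function depends on.
X = ("A", "B")
Y = ("B", "C")

def restrict(assignment, scope):
    """Keep only the values of the variables named in scope, in order."""
    return tuple(assignment[name] for name in scope)

def f(assignment):
    """f(A, B, C) = g(X) h(Y), read as g(A, B) h(B, C)."""
    return g[restrict(assignment, X)] * h[restrict(assignment, Y)]

# g(a1, b1) * h(b1, c0) = 7 * 100
value = f({"A": "a1", "B": "b1", "C": "c0"})
```

Note that the shared variable \(B\) takes the same value in both factors, since both are read from the same full assignment.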

Probability distributions are referenced with a \(P\), \(\textrm{Q}\), \(p\), \(q\) or \(\pi\). It’s worth emphasizing that distributions are a special type of function.
Something to Add?
If you see an error, egregious omission, something confusing or something worth adding, please email dj@truetheta.io with your suggestion. If it’s substantive, you’ll be credited. Thank you in advance!