Notation Guide

By: DJ Rich

A common source of confusion is notation. To avoid clutter, symbols are often omitted, and their meaning depends on context. So we make the following clarifications:

  • The entire series considers the discrete case, and the continuous case is not a simple generalization of it. The exceptions are parts 4 and 5, whose examples are continuous but demonstrate the same principles.

  • An upper case non-bold letter indicates a single random variable. The same letter in lower case with a superscript indicates a specific value that random variable may take. For example, \(X=x^1\) is the event the random variable \(X\) took on the value \(x^1\). We call this event an assignment. The set of unique values a random variable may take is \(\textrm{Val}(X)\). For instance, we may have \(\textrm{Val}(X)=\{x^0,x^1\}\), indicating \(X\) is a discrete-valued variable taking one of two values. This is a general form of a case like ‘the grade random variable may be either pass or fail’.

  • An upper case bold letter indicates a set of random variables (e.g. \(\mathbf{X}\)) and a bold lower case letter indicates a set of values assigned to them, one value per variable. For example, we may have \(\mathbf{X}=\{A,B\}\) and \(\mathbf{x}=\{a^3,b^1\}\). Then the event \(\mathbf{X}=\mathbf{x}\) is the event that \(A=a^3\) happens and \(B=b^1\) happens. When an assignment involves multiple variables, it’s called a joint assignment. \(\textrm{Val}(\mathbf{X})\) is the set of all possible joint assignments to the random variables of \(\mathbf{X}\).

  • \(\vert \textrm{Val}(\mathbf{X}) \vert\) is the count of elements in \(\textrm{Val}(\mathbf{X})\).

  • \(\mathbf{x}\) (or \(\mathbf{y}\) or \(\mathbf{z}\) etc.) within a probability function (e.g. \(P(\mathbf{x} \vert \cdots)\) or \(P(\cdots \vert \mathbf{x})\)) always abbreviates the event ‘\(\mathbf{X}=\mathbf{x}\)’.

  • Perhaps confusingly, we also abbreviate the event ‘\(\mathbf{X}=\mathbf{x}\)’ as ‘\(\mathbf{X}\)’, though this isn’t a clean abbreviation. Omission of \(\mathbf{x}\) means one of two things: either we mean this for any given \(\mathbf{x}\) or for all possible \(\mathbf{x}\)’s. As an example for the latter case, ‘calculate \(P(\mathbf{X})\)’ means calculate the set of probabilities \(P(\mathbf{X}=\mathbf{x})\) for all \(\mathbf{x}\in \textrm{Val}(\mathbf{X})\).

  • \(\sum_\mathbf{X}f(\mathbf{X})\) is shorthand for \(\sum_{\mathbf{x}\in \textrm{Val}(\mathbf{X})}f(\mathbf{X}=\mathbf{x})\). This is similarly true for \(\prod_\mathbf{X}(\cdot)\) and \(\textrm{argmin}_\mathbf{X}(\cdot)\).

  • Sometimes we write equations like \(f(A,B,C)=g(\mathbf{X})h(\mathbf{Y})\), which appears incorrect; the random variables on the left aren’t on the right. In these cases, we’ll have something like \(\mathbf{X} = \{A,B\}\) and \(\mathbf{Y} = \{B,C\}\). The intended interpretation is \(f(A,B,C)=g(A,B)h(B,C)\).

  • Probability distributions are referenced with a \(P\), \(\textrm{Q}\), \(p\), \(q\) or \(\pi\). It’s worth emphasizing that distributions are a special type of function.
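
The conventions above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the series: the variables \(A\), \(B\), \(C\), their value sets, the helper names `val` and `index`, and the arbitrary factors `g` and `h`. It enumerates \(\textrm{Val}(\mathbf{X})\), counts \(\vert \textrm{Val}(\mathbf{X}) \vert\), reads \(f(A,B,C)=g(\mathbf{X})h(\mathbf{Y})\) as \(g(A,B)h(B,C)\), and interprets ‘calculate \(P(\mathbf{X})\)’ as building a table with one probability per joint assignment.

```python
from fractions import Fraction
from itertools import product

# Hypothetical variables; Val(.) of each one is listed explicitly.
VAL = {"A": ["a0", "a1"], "B": ["b0", "b1", "b2"], "C": ["c0", "c1"]}


def val(variables):
    """Val(X): every joint assignment to the variables in X, as dicts."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*(VAL[n] for n in names))]


def index(name, assignment):
    """Superscript of the value assigned to `name`, e.g. 2 for B = b2."""
    return VAL[name].index(assignment[name])


# Bold X and Y are sets of variables (the scopes of g and h).
X = ("A", "B")
Y = ("B", "C")
ABC = ("A", "B", "C")


def g(x):
    # Arbitrary made-up factor over X = {A, B}.
    return 1 + index("A", x) + index("B", x)


def h(y):
    # Arbitrary made-up factor over Y = {B, C}.
    return 1 + index("B", y) + 2 * index("C", y)


def f(abc):
    # f(A, B, C) = g(X) h(Y): each factor reads only the variables in its scope.
    x = {n: abc[n] for n in X}
    y = {n: abc[n] for n in Y}
    return g(x) * h(y)


# |Val(X)| is the number of joint assignments to X.
size_X = len(val(X))

# sum over {A, B, C} of f: add f(A=a, B=b, C=c) for every joint assignment.
total = sum(f(abc) for abc in val(ABC))

# "Calculate P(X)": one probability per x in Val(X), here obtained by
# normalizing f and summing out C.
P_X = {}
for abc in val(ABC):
    key = tuple(abc[n] for n in X)
    P_X[key] = P_X.get(key, 0) + Fraction(f(abc), total)
```

Here `P_X` has one entry for each element of \(\textrm{Val}(\mathbf{X})\), which is the sense in which \(P(\mathbf{X})\) with the assignment omitted stands for a whole table rather than a single number.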

Something to Add?

If you see an error, egregious omission, something confusing or something worth adding, please email with your suggestion. If it’s substantive, you’ll be credited. Thank you in advance!