Notation

"Words offer the means to meaning, and for those who will listen, the enunciation of truth" ~ V for Vendetta

Notation is the vocabulary of mathematics, and one I confess I have never mastered. Given my ambition to get at least somewhat cracked at ML/AI, I need to get better at it first.

Below is the notation table I extracted from the book Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong.

I will use it as a reference while I get better at reading the notation used in ML/AI.

Table of Symbols

Symbol	Typical meaning
$a, b, c, \alpha, \beta, \gamma$	Scalars are lowercase
$\mathbf{x}, \mathbf{y}, \mathbf{z}$	Vectors are bold lowercase
$\mathbf{A}, \mathbf{B}, \mathbf{C}$	Matrices are bold uppercase
$\mathbf{x}^\top, \mathbf{A}^\top$	Transpose of a vector or matrix
$\mathbf{A}^{-1}$	Inverse of a matrix
$\langle \mathbf{x}, \mathbf{y} \rangle$	Inner product of $\mathbf{x}$ and $\mathbf{y}$
$\mathbf{x}^\top \mathbf{y}$	Dot product of $\mathbf{x}$ and $\mathbf{y}$
$B = (\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3)$	(Ordered) tuple of vectors
$\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3]$	Matrix of column vectors stacked horizontally
$\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$	Set of vectors (unordered)
$\mathbb{Z}, \mathbb{N}$	Integers and natural numbers, respectively
$\mathbb{R}, \mathbb{C}$	Real and complex numbers, respectively
$\mathbb{R}^n$	$n$-dimensional vector space of real numbers
$\forall x$	Universal quantifier: for all $x$
$\exists x$	Existential quantifier: there exists $x$
$a \coloneqq b$	$a$ is defined as $b$
$a \eqqcolon b$	$b$ is defined as $a$
$a \propto b$	$a$ is proportional to $b$, i.e., $a = \text{constant} \cdot b$
$g \circ f$	Function composition: "$g$ after $f$"
$\Leftrightarrow$	If and only if
$\Rightarrow$	Implies
$A, C$	Sets
$a \in A$	$a$ is an element of set $A$
$\varnothing$	Empty set
$A \setminus B$	$A$ without $B$: the set of elements in $A$ but not in $B$
$D$	Number of dimensions; indexed by $d = 1, \ldots, D$
$N$	Number of data points; indexed by $n = 1, \ldots, N$
$\mathbf{I}_m$	Identity matrix of size $m \times m$
$\mathbf{0}_{m,n}$	Matrix of zeros of size $m \times n$
$\mathbf{1}_{m,n}$	Matrix of ones of size $m \times n$
$\mathbf{e}_i$	Standard/canonical vector (where $i$ is the component that is $1$)
$\dim$	Dimension of a vector space
$\operatorname{rk}(\mathbf{A})$	Rank of matrix $\mathbf{A}$
$\operatorname{Im}(\Phi)$	Image of a linear mapping $\Phi$
$\ker(\Phi)$	Kernel (null space) of a linear mapping $\Phi$
$\operatorname{span}[\mathbf{b}_1]$	Span (generating set) of $\mathbf{b}_1$
$\operatorname{tr}(\mathbf{A})$	Trace of $\mathbf{A}$
$\det(\mathbf{A})$	Determinant of $\mathbf{A}$
$|\cdot|$	Absolute value or determinant (depending on context)
$\|\cdot\|$	Norm; Euclidean unless specified
$\lambda$	Eigenvalue or Lagrange multiplier
$E_\lambda$	Eigenspace corresponding to eigenvalue $\lambda$
$\mathbf{x} \perp \mathbf{y}$	Vectors $\mathbf{x}$ and $\mathbf{y}$ are orthogonal
$V$	Vector space
$V^\perp$	Orthogonal complement of vector space $V$
$\sum_{n=1}^N x_n$	Sum of the $x_n$: $x_1 + \ldots + x_N$
$\prod_{n=1}^N x_n$	Product of the $x_n$: $x_1 \cdot \ldots \cdot x_N$
$\boldsymbol{\theta}$	Parameter vector
$\frac{\partial f}{\partial x}$	Partial derivative of $f$ with respect to $x$
$\frac{d f}{d x}$	Total derivative of $f$ with respect to $x$
$\nabla$	Gradient
$f^\ast = \min_x f(x)$	The smallest function value of $f$
$x^\ast \in \operatorname{arg\,min}_x f(x)$	The value $x^\ast$ that minimizes $f$ (note: $\operatorname{arg\,min}$ returns a set of values)
$\mathcal{L}$	Lagrangian
$\mathcal{L}$	Negative log-likelihood
$\binom{n}{k}$	Binomial coefficient, $n$ choose $k$
$\operatorname{Var}_X[x]$	Variance of $x$ with respect to the random variable $X$
$\operatorname{E}_X[x]$	Expectation of $x$ with respect to the random variable $X$
$\operatorname{Cov}_{X,Y}[x, y]$	Covariance between $x$ and $y$
$X \perp\!\!\!\perp Y \mid Z$	$X$ is conditionally independent of $Y$ given $Z$
$X \sim p$	Random variable $X$ is distributed according to $p$
$\mathcal{N}(\mu, \Sigma)$	Gaussian distribution with mean $\mu$ and covariance $\Sigma$
$\operatorname{Ber}(\mu)$	Bernoulli distribution with parameter $\mu$
$\operatorname{Bin}(N, \mu)$	Binomial distribution with parameters $N, \mu$
$\operatorname{Beta}(\alpha, \beta)$	Beta distribution with parameters $\alpha, \beta$
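To connect the linear-algebra rows of the table to something I can actually run, here is a minimal sketch that maps each symbol onto a NumPy call. It assumes NumPy is installed, and the vectors and matrix are arbitrary example values I picked, not anything from the book. I chose a symmetric matrix so that `np.linalg.eigh` (NumPy's eigensolver for symmetric matrices) applies.

```python
import numpy as np

# Example values chosen for illustration (not from the book).
x = np.array([1.0, 2.0, 3.0])        # vector x (bold lowercase)
y = np.array([4.0, -5.0, 2.0])       # vector y
A = np.array([[2.0, 1.0, 0.0],       # matrix A (bold uppercase), symmetric
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# x^T y: dot product, i.e. the standard inner product <x, y> on R^n
dot = x @ y                          # 1*4 + 2*(-5) + 3*2 = 0.0

# x ⊥ y: orthogonal vectors have inner product zero
orthogonal = bool(np.isclose(dot, 0.0))

# A^T (transpose) and A^{-1} (inverse)
A_T = A.T
A_inv = np.linalg.inv(A)

# I_m, 0_{m,n}, 1_{m,n} and the canonical vector e_i
I3 = np.eye(3)
zeros = np.zeros((2, 3))
ones = np.ones((2, 3))
e2 = I3[:, 1]                        # e_2: math is 1-indexed, NumPy is 0-indexed

# rk(A), tr(A), det(A)
rank = np.linalg.matrix_rank(A)      # 3 (full rank)
trace = np.trace(A)                  # 2 + 3 + 4 = 9.0
det = np.linalg.det(A)               # approximately 18.0 (floating point)

# |.| on a scalar vs ||.|| (Euclidean norm) on a vector
abs_val = abs(-3.0)                  # 3.0
norm = np.linalg.norm(x)             # sqrt(1 + 4 + 9) = sqrt(14)

# Eigenvalues lambda and eigenvectors v (columns of eigvecs) with A v = lambda v
eigvals, eigvecs = np.linalg.eigh(A)

print("x^T y  =", dot, "| orthogonal:", orthogonal)
print("A symmetric (A == A^T):", np.allclose(A, A_T))
print("rk(A)  =", rank, "| tr(A) =", trace, "| det(A) =", round(det, 6))
print("||x||  =", norm)
print("lambda =", eigvals)
print("A A^{-1} == I:", np.allclose(A @ A_inv, I3))
print("sum(lambda) == tr(A):", np.isclose(eigvals.sum(), trace))
print("prod(lambda) == det(A):", np.isclose(eigvals.prod(), det))
```

Two relationships fall out of the last checks and are worth memorizing: $\operatorname{tr}(\mathbf{A})$ is the sum of the eigenvalues and $\det(\mathbf{A})$ is their product. Also note that `x @ y` computes $\mathbf{x}^\top \mathbf{y}$ for 1-D arrays without an explicit transpose, because NumPy does not distinguish row and column vectors.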

Table of Abbreviations and Acronyms

Acronym	Meaning
e.g.	Exempli gratia (Latin: for example)
GMM	Gaussian mixture model
i.e.	Id est (Latin: this means)
i.i.d.	Independent, identically distributed
MAP	Maximum a posteriori
MLE	Maximum likelihood estimation/estimator
ONB	Orthonormal basis
PCA	Principal component analysis
PPCA	Probabilistic principal component analysis
REF	Row-echelon form
SPD	Symmetric, positive definite
SVM	Support vector machine
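The probability rows of the symbol table and the i.i.d. and MLE acronyms fit together in one small experiment. Below is a minimal pure-Python sketch under my own assumptions: a made-up true parameter $\mu = 0.3$ and $N = 1000$ simulated draws. It samples $x_n \sim \operatorname{Ber}(\mu)$ i.i.d., computes the sample mean and variance, and finds the MLE as $\mu^\ast \in \operatorname{arg\,min}_\mu \mathcal{L}(\mu)$, where $\mathcal{L}(\mu) = -\sum_{n=1}^N \log p(x_n \mid \mu)$ is the negative log-likelihood and $p(x \mid \mu) = \mu^x (1 - \mu)^{1-x}$. The grid search is a deliberately naive stand-in for a real optimizer.

```python
import math
import random
import statistics

# Assumed setup for illustration: true parameter and sample size are made up.
random.seed(0)
mu_true = 0.3
N = 1000

# x_n ~ Ber(mu_true), drawn i.i.d. for n = 1, ..., N
xs = [1 if random.random() < mu_true else 0 for _ in range(N)]

# Sample estimates of E[x] and Var[x]; for Ber(mu) these are mu and mu(1 - mu)
mean = sum(xs) / N                     # (1/N) * sum_{n=1}^N x_n
var = statistics.pvariance(xs)         # (1/N) * sum_{n=1}^N (x_n - mean)^2


def neg_log_likelihood(mu, data):
    """L(mu) = -sum_n log p(x_n | mu), with p(x | mu) = mu^x (1 - mu)^(1 - x).

    Because the x_n are i.i.d., the likelihood is a product over n, so its
    log is a sum over n. Requires 0 < mu < 1.
    """
    return -sum(math.log(mu) if x == 1 else math.log(1.0 - mu) for x in data)


# MLE: mu* in argmin_mu L(mu), found here by naive grid search over (0, 1)
grid = [k / 1000 for k in range(1, 1000)]
mu_mle = min(grid, key=lambda m: neg_log_likelihood(m, xs))

# At the minimizer, dL/dmu should be (numerically) zero: central difference
h = 1e-6
grad = (neg_log_likelihood(mu_mle + h, xs)
        - neg_log_likelihood(mu_mle - h, xs)) / (2 * h)

print("sample mean       =", mean)
print("sample variance   =", var, "vs mean*(1-mean) =", mean * (1 - mean))
print("MLE (grid search) =", mu_mle)
print("dL/dmu at MLE     =", grad)
```

For this Bernoulli model the grid search lands exactly on the sample mean. With $k = \sum_{n=1}^N x_n$, the derivative is $\frac{d\mathcal{L}}{d\mu} = -\frac{k}{\mu} + \frac{N-k}{1-\mu}$, and setting it to zero gives $\mu^\ast = k/N$.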