Notation

"Words offer the means to meaning, and for those who will listen, the enunciation of truth" ~ V for Vendetta

Notation is the vocabulary of mathematics, and it is one I confess I have never mastered. Given my ambition to get at least somewhat cracked at ML/AI, I need to get better at it first.

Below is the notation table I extracted from the book Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong.

I will be using this as a reference to get better at understanding the notation used in ML/AI.

Table of Symbols

| Symbol | Typical meaning |
| --- | --- |
| $a, b, c, \alpha, \beta, \gamma$ | Scalars are lowercase |
| $\mathbf{x}, \mathbf{y}, \mathbf{z}$ | Vectors are bold lowercase |
| $\mathbf{A}, \mathbf{B}, \mathbf{C}$ | Matrices are bold uppercase |
| $\mathbf{x}^\top, \mathbf{A}^\top$ | Transpose of a vector or matrix |
| $\mathbf{A}^{-1}$ | Inverse of a matrix |
| $\langle \mathbf{x}, \mathbf{y} \rangle$ | Inner product of $\mathbf{x}$ and $\mathbf{y}$ |
| $\mathbf{x}^\top \mathbf{y}$ | Dot product of $\mathbf{x}$ and $\mathbf{y}$ |
| $\bm{B} = (b_1, b_2, b_3)$ | (Ordered) tuple |
| $\mathbf{B} = [b_1, b_2, b_3]$ | Matrix of column vectors stacked horizontally |
| $\mathcal{B} = \{b_1, b_2, b_3\}$ | Set of vectors (unordered) |
| $\mathbb{Z}, \mathbb{N}$ | Integers and natural numbers, respectively |
| $\mathbb{R}, \mathbb{C}$ | Real and complex numbers, respectively |
| $\mathbb{R}^n$ | $n$-dimensional vector space of real numbers |
| $\forall x$ | Universal quantifier: for all $x$ |
| $\exists x$ | Existential quantifier: there exists $x$ |
| $a \coloneqq b$ | $a$ is defined as $b$ |
| $a \eqqcolon b$ | $b$ is defined as $a$ |
| $a \propto b$ | $a$ is proportional to $b$, i.e., $a = \text{constant} \cdot b$ |
| $g \circ f$ | Function composition: "$g$ after $f$" |
| $\Leftrightarrow$ | If and only if |
| $\Rightarrow$ | Implies |
| $A, C$ | Sets |
| $a \in A$ | $a$ is an element of set $A$ |
| $\varnothing$ | Empty set |
| $A \setminus B$ | $A$ without $B$: the set of elements in $A$ but not in $B$ |
| $D$ | Number of dimensions; indexed by $d = 1, \ldots, D$ |
| $N$ | Number of data points; indexed by $n = 1, \ldots, N$ |
| $I_m$ | Identity matrix of size $m \times m$ |
| $0_{m,n}$ | Matrix of zeros of size $m \times n$ |
| $1_{m,n}$ | Matrix of ones of size $m \times n$ |
| $\mathbf{e}_i$ | Standard/canonical vector (where $i$ is the component that is $1$) |
| $\dim$ | Dimensionality of vector space |
| $\operatorname{rk}(\mathbf{A})$ | Rank of matrix $\mathbf{A}$ |
| $\operatorname{Im}(\Phi)$ | Image of linear mapping $\Phi$ |
| $\ker(\Phi)$ | Kernel (null space) of a linear mapping $\Phi$ |
| $\operatorname{span}[b_1]$ | Span (generating set) of $b_1$ |
| $\operatorname{tr}(\mathbf{A})$ | Trace of $\mathbf{A}$ |
| $\det(\mathbf{A})$ | Determinant of $\mathbf{A}$ |
| $\lvert \cdot \rvert$ | Absolute value or determinant (depending on context) |
| $\lVert \cdot \rVert$ | Norm; Euclidean, unless specified |
| $\lambda$ | Eigenvalue or Lagrange multiplier |
| $E_\lambda$ | Eigenspace corresponding to eigenvalue $\lambda$ |
| $\mathbf{x} \perp \mathbf{y}$ | Vectors $\mathbf{x}$ and $\mathbf{y}$ are orthogonal |
| $V$ | Vector space |
| $V^\perp$ | Orthogonal complement of vector space $V$ |
| $\sum_{n=1}^N x_n$ | Sum of the $x_n$: $x_1 + \ldots + x_N$ |
| $\prod_{n=1}^N x_n$ | Product of the $x_n$: $x_1 \cdot \ldots \cdot x_N$ |
| $\theta$ | Parameter vector |
| $\frac{\partial f}{\partial x}$ | Partial derivative of $f$ with respect to $x$ |
| $\frac{d f}{d x}$ | Total derivative of $f$ with respect to $x$ |
| $\nabla$ | Gradient |
| $f^\ast = \min_x f(x)$ | The smallest function value of $f$ |
| $x^\ast \in \operatorname{arg\,min}_x f(x)$ | The value $x^\ast$ that minimizes $f$ (note: $\operatorname{arg\,min}$ returns a set of values) |
| $\mathcal{L}$ | Lagrangian |
| $\mathcal{L}$ | Negative log-likelihood |
| $\binom{n}{k}$ | Binomial coefficient, $n$ choose $k$ |
| $\operatorname{Var}_X[x]$ | Variance of $x$ with respect to the random variable $X$ |
| $\operatorname{E}_X[x]$ | Expectation of $x$ with respect to the random variable $X$ |
| $\operatorname{Cov}_{X,Y}[x, y]$ | Covariance between $x$ and $y$ |
| $X \perp\!\!\!\perp Y \mid Z$ | $X$ is conditionally independent of $Y$ given $Z$ |
| $X \sim p$ | Random variable $X$ is distributed according to $p$ |
| $\mathcal{N}(\mu, \Sigma)$ | Gaussian distribution with mean $\mu$ and covariance $\Sigma$ |
| $\operatorname{Ber}(\mu)$ | Bernoulli distribution with parameter $\mu$ |
| $\operatorname{Bin}(N, \mu)$ | Binomial distribution with parameters $N, \mu$ |
| $\operatorname{Beta}(\alpha, \beta)$ | Beta distribution with parameters $\alpha, \beta$ |
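
To make a few of these symbols more concrete, here is a minimal Python sketch that spells some of them out using only the standard library. The variable names (`x`, `y`, `A`, `M`, `S`, `T`, `f`, `candidates`) and all the numbers are my own illustrative assumptions, not something taken from the book, and in practice a library such as NumPy would likely be used instead.

```python
import math

# Illustrative values only; none of these numbers come from the book.
x = [1.0, 2.0, 3.0]
y = [4.0, -5.0, 6.0]

# Dot product x^T y = sum over n of x_n * y_n
dot_xy = sum(xn * yn for xn, yn in zip(x, y))

# Euclidean norm ||x|| = sqrt(x^T x)
norm_x = math.sqrt(sum(xn * xn for xn in x))

# Orthogonality: u and v are orthogonal exactly when their dot product is zero
u, v = [1.0, 0.0], [0.0, 1.0]
u_perp_v = sum(un * vn for un, vn in zip(u, v)) == 0.0

# Transpose A^T of a matrix stored as a list of rows
A = [[1, 2, 3],
     [4, 5, 6]]
A_T = [list(column) for column in zip(*A)]

# Trace tr(M) and determinant det(M) of a 2x2 matrix
M = [[2, 0],
     [1, 3]]
trace_M = sum(M[i][i] for i in range(len(M)))
det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]

# Sum and product over n = 1, ..., N
xs = [1, 2, 3, 4]
total = sum(xs)
product = math.prod(xs)

# Set difference S \ T: elements in S but not in T
S, T = {1, 2, 3, 4}, {3, 4, 5}
S_without_T = S - T

# f* = min_x f(x) and arg min_x f(x), restricted to a finite candidate grid
def f(t):
    return (t * t - 1) ** 2

candidates = [-2, -1, 0, 1, 2]
f_star = min(f(t) for t in candidates)
arg_min = {t for t in candidates if f(t) == f_star}

# Binomial coefficient "n choose k"
n_choose_k = math.comb(5, 2)
```

In this sketch `arg_min` holds two values, -1 and 1, which lines up with the table's note that arg min returns a set of values rather than a single number.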

Table of Abbreviations and Acronyms

| Acronym | Meaning |
| --- | --- |
| e.g. | Exempli gratia (Latin: for example) |
| GMM | Gaussian mixture model |
| i.e. | Id est (Latin: this means) |
| i.i.d. | Independent, identically distributed |
| MAP | Maximum a posteriori |
| MLE | Maximum likelihood estimation/estimator |
| ONB | Orthonormal basis |
| PCA | Principal component analysis |
| PPCA | Probabilistic principal component analysis |
| REF | Row-echelon form |
| SPD | Symmetric, positive definite |
| SVM | Support vector machine |
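
As a small illustration of how two of these acronyms (i.i.d. and MLE) tie back to the symbols above, here is a hedged Python sketch for a Bernoulli distribution $\operatorname{Ber}(\mu)$. The data, the names `data`, `mu_mle`, `negative_log_likelihood`, `grid`, and `mu_grid`, and the grid resolution are all assumptions of mine for illustration; this is a toy check, not how the book presents it.

```python
import math

# Hypothetical i.i.d. coin flips (1 = heads); values made up for illustration.
data = [1, 0, 1, 1, 0, 1, 1, 0]
N = len(data)

# For Ber(mu), the maximum likelihood estimate is the sample mean.
mu_mle = sum(data) / N

def negative_log_likelihood(mu, xs):
    """Negative log-likelihood of i.i.d. Bernoulli data, valid for 0 < mu < 1."""
    return -sum(x * math.log(mu) + (1 - x) * math.log(1 - mu) for x in xs)

# Sanity check: a coarse grid search over mu should land on the same value.
grid = [i / 1000 for i in range(1, 1000)]
mu_grid = min(grid, key=lambda mu: negative_log_likelihood(mu, data))
```

The grid search is only there to show that minimizing the negative log-likelihood and taking the sample mean agree on this toy data set.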