Mathematics — Miko-pedia

Overview. This complete note provides a unified undergraduate review of core mathematical topics, from logic and discrete structures through calculus, linear algebra, probability, and modern analysis. Each section is designed to stand alone for exam review: definitions, key theorems, representative examples, and concise summaries are included throughout. The sections follow a natural progression from foundational reasoning to advanced topics such as topology and numerical methods. Use this overview to locate weak areas and target your study efficiently.

The Language of Mathematics: Propositions, Sets, Functions, and Proofs

Core ideas

Mathematics rests on a rigorous foundation of logic and set theory. A proposition is a declarative statement that is unambiguously either true or false. Propositions are combined using logical connectives: $\neg$ (negation), $\land$ (conjunction), $\lor$ (disjunction), $\implies$ (implication), and $\iff$ (biconditional). An implication $P \implies Q$ is vacuously true if $P$ is false. A predicate $P(x)$ is a statement whose truth depends on a variable $x$ , quantified by $\forall$ (“for all”) or $\exists$ (“there exists”). De Morgan’s laws for quantifiers state $\neg(\forall x\,P(x)) \iff \exists x\,\neg P(x)$ and $\neg(\exists x\,P(x)) \iff \forall x\,\neg P(x)$ .

A set is a collection of distinct objects. Basic operations include union $A \cup B$ , intersection $A \cap B$ , set difference $A \setminus B$ , and the power set $\mathcal{P}(A)$ (the set of all subsets of $A$ ). The Cartesian product $A \times B = \{(a,b) \mid a \in A,\; b \in B\}$ produces ordered pairs. A relation on $A$ and $B$ is a subset $R \subseteq A \times B$ . An equivalence relation $\sim$ on $A$ is a relation that is reflexive ( $a \sim a$ ), symmetric ( $a \sim b \implies b \sim a$ ), and transitive ( $a \sim b \land b \sim c \implies a \sim c$ ). It partitions $A$ into disjoint equivalence classes $[a] = \{x \in A \mid x \sim a\}$ .

A function $f : A \to B$ is a relation where each $a \in A$ is paired with exactly one $b \in B$ . A function is injective (one-to-one) if $f(a_1) = f(a_2) \implies a_1 = a_2$ ; surjective (onto) if $\forall b \in B, \exists a \in A$ with $f(a)=b$ ; and bijective if both hold. Bijections establish that two sets have the same cardinality $|A| = |B|$ . A set is countable if it is in bijection with a subset of $\mathbb{N}$ . Cantor’s Theorem states that $|A| < |\mathcal{P}(A)|$ for every set $A$ ; in particular $\mathcal{P}(\mathbb{N})$ is uncountable, and so is $\mathbb{R}$ , which has the same cardinality as $\mathcal{P}(\mathbb{N})$ .

Proof techniques are the tools of mathematical reasoning:

Direct proof: assume the hypotheses and deduce the conclusion via logical steps.
Proof by contrapositive: prove $P \implies Q$ by showing $\neg Q \implies \neg P$ .
Proof by contradiction: assume the negation of the desired statement and derive a logical impossibility.
Mathematical induction: prove $P(1)$ (base case) and $P(k) \implies P(k+1)$ for every $k \in \mathbb{N}$ (inductive step).

Logic: connectives, quantifiers, De Morgan’s laws.
Set Theory: operations, relations, equivalence classes, power sets, cardinality.
Functions: injectivity, surjectivity, bijections, and inverse functions.
Proof methods: direct, contrapositive, contradiction, induction.

Mathematical spine

\text{Truth table for Implication: } \begin{array}{c|c|c} P & Q & P\implies Q \\ \hline T & T & T \\ T & F & F \\ F & T & T \\ F & F & T \end{array}

\text{De Morgan: } \neg(P\land Q) \iff \neg P \lor \neg Q,\qquad \neg(P\lor Q) \iff \neg P \land \neg Q

\text{Equivalence relation: } a \sim a,\quad a \sim b \implies b \sim a,\quad a \sim b \land b \sim c \implies a \sim c
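The truth table and De Morgan’s laws above can be checked mechanically by enumerating all truth assignments. The following is a minimal sketch; the helper name `implies` is an illustrative choice, not notation from these notes.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

# Rows in the order (T,T), (T,F), (F,T), (F,F), matching the table above.
truth_table = [(p, q, implies(p, q)) for p, q in product([True, False], repeat=2)]

# De Morgan's laws checked over every assignment of truth values.
de_morgan_holds = all(
    (not (p and q)) == ((not p) or (not q))
    and (not (p or q)) == ((not p) and (not q))
    for p, q in product([True, False], repeat=2)
)

for p, q, result in truth_table:
    print(p, q, result)
print("De Morgan holds:", de_morgan_holds)
```

Exhaustive enumeration works here because a propositional formula in two variables has only four cases; it does not extend to quantified statements over infinite domains.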

Example

Consider the function $f: \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 2x + 3$ .

Injectivity: Suppose $f(a) = f(b)$ . Then $2a + 3 = 2b + 3$ , which simplifies to $a = b$ . Thus $f$ is injective.
Surjectivity: For any $y \in \mathbb{R}$ , set $x = (y-3)/2$ . Then $f(x) = 2((y-3)/2) + 3 = y$ . Thus $f$ is surjective.
Inverse: Since $f$ is bijective, it has an inverse $f^{-1}(y) = (y-3)/2$ .

Next, prove by induction that $\sum_{k=1}^n (2k-1) = n^2$ for all $n \in \mathbb{N}$ .

Base case ( $n=1$ ): LHS $= 2(1)-1 = 1$ , RHS $= 1^2 = 1$ .
Inductive step: Assume $\sum_{k=1}^m (2k-1) = m^2$ . Then $\sum_{k=1}^{m+1} (2k-1) = m^2 + (2(m+1)-1) = m^2 + 2m + 1 = (m+1)^2.$ Thus the formula holds for all $n \in \mathbb{N}$ .
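A quick numerical check can catch algebra slips in an induction argument, although it is no substitute for the proof. The sketch below compares the direct sum with the closed form for a finite range; the function name and the cutoff of 200 are arbitrary illustrative choices.

```python
def sum_of_first_odds(n: int) -> int:
    """Return 1 + 3 + ... + (2n - 1) by direct summation."""
    return sum(2 * k - 1 for k in range(1, n + 1))

# Collect every n in 1..200 where the direct sum disagrees with n**2.
mismatches = [n for n in range(1, 201) if sum_of_first_odds(n) != n ** 2]
print("mismatches:", mismatches)
```

An empty list only confirms the formula for the tested values; the inductive step is what extends it to all $n \in \mathbb{N}$ .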

Problems with Solutions

Problem. Prove by contrapositive: If $n^2$ is even, then $n$ is even.

Solution. We prove $\neg Q \implies \neg P$ : if $n$ is odd, then $n^2$ is odd. If $n$ is odd, $n=2k+1$ for some $k \in \mathbb{Z}$ . Then $n^2 = 4k^2+4k+1 = 2(2k^2+2k)+1$ , which is odd. Hence the original statement is true.
Problem. Let $A = \{1,2,3\}$ and $B = \{2,3,4\}$ . Compute $A \cup B$ , $A \cap B$ , $A \setminus B$ , and $|A \times B|$ .

Solution. $A \cup B = \{1,2,3,4\}$ , $A \cap B = \{2,3\}$ , $A \setminus B = \{1\}$ . The Cartesian product has $|A|\cdot|B| = 3 \cdot 4 = 12$ elements.
Problem. Use induction to prove that $\sum_{k=1}^n k = \frac{n(n+1)}{2}$ for all $n \ge 1$ .

Solution. Base case $n=1$ : $1 = 1(2)/2$ . Assume true for $n=m$ . Then
$\sum_{k=1}^{m+1} k = \frac{m(m+1)}{2} + (m+1) = \frac{m(m+1)+2(m+1)}{2} = \frac{(m+1)(m+2)}{2}.$
Thus the statement holds for all $n \ge 1$ .

Section summary. The language of mathematics is built upon formal logic and set theory. Propositions, predicates, and quantifiers provide the syntax for rigorous statements. Sets, relations, and functions offer the foundational structures for modeling mathematical objects. Mastery of proof techniques (including induction, contrapositive, and contradiction) is required to rigorously verify properties of these structures.

Discrete Structures: Induction, Combinatorics, Graphs, and Algorithms

Core ideas

Discrete mathematics studies countable, distinct structures. Mathematical induction (weak and strong) is the standard proof technique for statements concerning natural numbers. Recursion defines sequences or sets in terms of previous elements, often solvable via generating functions (e.g., representing sequence $\{a_n\}$ as a formal power series $\sum a_n x^n$ ) or characteristic equations.

Combinatorics counts finite arrangements. The multiplication principle: if task $A$ has $m$ ways and task $B$ has $n$ ways, the pair $(A,B)$ has $m \cdot n$ ways. A permutation is an ordered arrangement: $P(n,k) = n!/(n-k)!$ . A combination is an unordered selection: $\binom{n}{k} = n!/(k!(n-k)!)$ . The Binomial Theorem states $(x+y)^n = \sum_{k=0}^n \binom{n}{k} x^{n-k} y^k$ . The Pigeonhole Principle guarantees that placing $n > m$ items into $m$ boxes results in at least one box with $\ge 2$ items. The Principle of Inclusion-Exclusion (PIE) computes the union of overlapping sets: $|A \cup B| = |A| + |B| - |A \cap B|$ , expanding to alternating sums for more sets.

A graph $G = (V,E)$ consists of vertices $V$ and edges $E$ . The Handshaking Lemma states $\sum_{v \in V} \deg(v) = 2|E|$ . A tree is a connected acyclic graph; a tree on $n$ vertices has exactly $n-1$ edges. A graph is bipartite if its vertices can be partitioned into two sets such that every edge joins a vertex of one set to a vertex of the other; equivalently, it contains no odd cycles. An Eulerian circuit is a closed walk that traverses every edge exactly once (it exists iff all vertices have even degree and $G$ is connected). A Hamiltonian cycle visits every vertex exactly once. For connected planar graphs (drawable without edge crossings), Euler’s Formula states $V - E + F = 2$ , where $V$ , $E$ , and $F$ here denote the numbers of vertices, edges, and faces. Graph coloring assigns colors to vertices so adjacent vertices differ; the chromatic number $\chi(G)$ is the minimum number of colors needed. By the Four Color Theorem, $\chi(G) \le 4$ for planar graphs.

Algorithms provide step-by-step procedures to solve discrete problems. Big-O notation, $f(n) = O(g(n))$ , describes an asymptotic upper bound: there exist constants $c > 0$ and $n_0$ such that $|f(n)| \le c|g(n)|$ for all $n \ge n_0$ ; it is most often applied to worst-case running times. The Euclidean algorithm computes $\gcd(a,b)$ in $O(\log(\min(a,b)))$ steps. Graph algorithms include Breadth-first search (BFS) for shortest paths in unweighted graphs, Depth-first search (DFS) for topological sorting and cycle detection, and Dijkstra’s algorithm for shortest paths with non-negative weights.
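As an illustration of the Euclidean algorithm, the sketch below computes $\gcd(a,b)$ by repeated remainders and also counts the division steps, so the logarithmic growth can be observed. The function name `gcd_with_steps` and the sample inputs are illustrative assumptions.

```python
import math

def gcd_with_steps(a: int, b: int) -> tuple[int, int]:
    """Return (gcd(a, b), number of division steps) for non-negative integers."""
    steps = 0
    while b != 0:
        a, b = b, a % b   # gcd(a, b) = gcd(b, a mod b)
        steps += 1
    return a, steps

g, steps = gcd_with_steps(1071, 462)
print(g, steps)

# Cross-check against the standard library on a small grid of inputs.
agrees = all(
    gcd_with_steps(a, b)[0] == math.gcd(a, b)
    for a in range(0, 60)
    for b in range(0, 60)
)
print("agrees with math.gcd:", agrees)
```

For everyday use `math.gcd` is the better choice; the hand-written loop is shown only to make the step count visible.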

Counting: permutations, combinations, Pigeonhole Principle, PIE, generating functions.
Graphs: trees, bipartite graphs, Eulerian/Hamiltonian paths, planar graphs (Euler’s formula).
Algorithms: BFS, DFS, Dijkstra, Euclidean algorithm, Big-O complexity.
Induction: strong induction, structural induction.

Mathematical spine

\binom{n}{k} = \frac{n!}{k!(n-k)!},\qquad (x+y)^n = \sum_{k=0}^n \binom{n}{k} x^{n-k} y^k

|A \cup B \cup C| = |A|+|B|+|C| - |A\cap B| - |A\cap C| - |B\cap C| + |A\cap B\cap C|

\sum_{v\in V}\deg(v) = 2|E|,\qquad V - E + F = 2 \text{ (planar graphs)}

Example

A computer password must be exactly $6$ characters long and is formed using uppercase letters ( $26$ choices) and digits ( $10$ choices). A valid password must contain at least one digit. How many valid passwords exist?

Solution. The total number of $6$ -character strings is $36^6$ . The number of strings with no digits (only letters) is $26^6$ . By the subtraction principle, the number of valid passwords is

36^6 - 26^6 = 2\,176\,782\,336 - 308\,915\,776 = 1\,867\,866\,560.
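The subtraction principle can be sanity-checked by brute force on a small alphabet and then applied to the full problem. The sketch below assumes a hypothetical helper `count_with_digit` and a toy alphabet of two letters and two digits.

```python
from itertools import product

def count_with_digit(length: int, n_letters: int, n_digits: int) -> int:
    """Count strings of the given length that contain at least one digit."""
    return (n_letters + n_digits) ** length - n_letters ** length

# Brute-force enumeration on a toy alphabet (2 letters, 2 digits, length 3).
letters, digits = "AB", "01"
brute_force = sum(
    1
    for s in product(letters + digits, repeat=3)
    if any(ch in digits for ch in s)
)
formula_small = count_with_digit(3, len(letters), len(digits))

# The example above: 26 letters, 10 digits, length 6.
example_count = count_with_digit(6, 26, 10)
print(brute_force, formula_small, example_count)
```

Enumerating all $36^6$ strings directly would be slow, which is why the check is run on the toy alphabet and only the formula is evaluated for the real one.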

Now consider a simple graph $G$ with $V = \{A,B,C,D\}$ and $E = \{AB, BC, CD, DA, AC\}$ . Does $G$ contain an Eulerian circuit?

Solution. The degrees are $\deg(A)=3$ , $\deg(B)=2$ , $\deg(C)=3$ , $\deg(D)=2$ . Since vertices $A$ and $C$ have odd degree, the graph does not have an Eulerian circuit. (An Eulerian circuit requires all vertices to have even degree.)
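The degree count can be reproduced from the edge list. This sketch assumes each edge is written as a two-letter string, as in the example, and it only tests the even-degree condition; connectivity is taken for granted here because the example graph is clearly connected.

```python
from collections import Counter

edges = ["AB", "BC", "CD", "DA", "AC"]

# Each edge contributes 1 to the degree of both of its endpoints.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

odd_vertices = sorted(v for v, d in degree.items() if d % 2 == 1)
handshake_ok = sum(degree.values()) == 2 * len(edges)

print(dict(degree), odd_vertices, handshake_ok)
```

A non-empty `odd_vertices` list rules out an Eulerian circuit, and the last value confirms the Handshaking Lemma for this graph.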

Problems with Solutions

Problem. A password is $4$ characters long, chosen from $26$ letters and $10$ digits. How many such passwords contain at least one digit?

Solution. Total strings: $36^4$ . Strings with no digits: $26^4$ . Valid passwords: $36^4 - 26^4 = 1\,679\,616 - 456\,976 = 1\,222\,640$ .
Problem. Compute $\binom{7}{3}$ .

Solution.
$\binom{7}{3} = \frac{7!}{3!\,4!} = \frac{7\cdot 6\cdot 5}{3\cdot 2\cdot 1} = 35.$
Problem. A graph has $5$ vertices with degrees $2, 3, 3, 3, 3$ . How many edges does it have?

Solution. By the Handshaking Lemma, $\sum \deg(v) = 2|E|$ . The sum of degrees is $2+3+3+3+3 = 14$ . Hence $2|E|=14$ , so $|E|=7$ .

Section summary. Discrete structures encompass combinatorial counting (permutations, combinations, inclusion-exclusion), graph theory (trees, planarity, colorings), and algorithmic analysis (Big-O notation, graph traversal). Mastery of these principles is critical for analyzing finite structures, solving recurrence relations, and establishing computational complexity.

Single Variable Calculus: Change, Approximation, and Integration

Core ideas

Calculus studies continuous change and accumulation. The limit $\lim_{x\to a} f(x) = L$ means $f(x)$ approaches $L$ as $x$ approaches $a$ . A function is continuous at $a$ if $\lim_{x\to a} f(x) = f(a)$ . For sequences, $\lim_{n \to \infty} a_n = L$ means terms become arbitrarily close to $L$ as $n$ grows. The Intermediate Value Theorem states: if $f$ is continuous on $[a,b]$ , it takes on every value between $f(a)$ and $f(b)$ .

The derivative measures the instantaneous rate of change:

f'(a) = \lim_{h\to 0} \frac{f(a+h) - f(a)}{h}.

Differentiability implies continuity. Key differentiation rules: product rule $(fg)' = f'g + fg'$ ; quotient rule $(f/g)' = (f'g - fg')/g^2$ ; chain rule $(f\circ g)'(x) = f'(g(x))\,g'(x)$ . The Mean Value Theorem (MVT): if $f$ is continuous on $[a,b]$ and differentiable on $(a,b)$ , then $f'(c) = (f(b)-f(a))/(b-a)$ for some $c\in(a,b)$ . Applications include optimization (critical points where $f'(x)=0$ ) and L’Hôpital’s rule for limits of indeterminate forms like $0/0$ . Functions can be locally approximated by polynomials. Taylor’s Theorem states $f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x-a)^k + R_n(x)$ , where the remainder $R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1}$ for some $c$ between $a$ and $x$ (Lagrange form).

Integration calculates accumulation. The definite integral $\int_a^b f(x)\,dx$ is rigorously defined as the limit of Riemann sums $\lim_{n \to \infty} \sum_{i=1}^n f(x_i^*)\Delta x$ as the partition width $\Delta x \to 0$ . It represents the signed area under the curve. The Fundamental Theorem of Calculus (FTC) bridges derivatives and integrals. Part 1: if $f$ is continuous, $\frac{d}{dx}\int_a^x f(t)\,dt = f(x)$ . Part 2: $\int_a^b f(x)\,dx = F(b) - F(a)$ where $F'(x) = f(x)$ . Integration techniques include substitution ( $u = g(x)$ ), integration by parts ( $\int u\,dv = uv - \int v\,du$ ), and partial fractions.

Limits & Continuity: one-sided limits, sequence limits, IVT.
Derivatives: rules, MVT, Taylor series with remainder, optimization.
Integrals: Riemann sums, definite/indefinite integrals, FTC.
Techniques: substitution, integration by parts, L’Hôpital’s rule.

Mathematical spine

\boxed{f'(a) = \lim_{h\to 0}\frac{f(a+h)-f(a)}{h}},\qquad \boxed{\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^n f(x_i^*)\Delta x}

\int_a^b f(x)\,dx = F(b)-F(a),\qquad \frac{d}{dx}\int_a^x f(t)\,dt = f(x)

f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2}(x-a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x-a)^n + R_n(x)

Example

Approximate $\sqrt{1.1}$ using the second-order Taylor polynomial for $f(x)=\sqrt{x}$ centered at $a=1$ , and estimate the error.

Solution. We have $f(x)=x^{1/2}$ , $f'(x)=\frac{1}{2}x^{-1/2}$ , $f''(x)=-\frac{1}{4}x^{-3/2}$ , and $f'''(x)=\frac{3}{8}x^{-5/2}$ . At $a=1$ : $f(1)=1$ , $f'(1)=\frac12$ , $f''(1)=-\frac14$ . The second-order Taylor polynomial is

P_2(x) = 1 + \frac12(x-1) - \frac18(x-1)^2.

Evaluating at $x=1.1$ :

P_2(1.1) = 1 + \frac12(0.1) - \frac18(0.01) = 1 + 0.05 - 0.00125 = 1.04875.

The exact value is $\sqrt{1.1}\approx 1.0488088$ , so the error is about $5.9\times10^{-5}$ . Using the Lagrange remainder:

R_2(1.1) = \frac{f'''(c)}{3!}(0.1)^3 = \frac{3/8}{6} c^{-5/2}(0.001) \approx 6.25\times10^{-5}\,c^{-5/2}

for some $c\in(1,1.1)$ . Since $c\ge 1$ , $|R_2|\le 6.25\times10^{-5}$ , consistent with the actual error.
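The numbers in this example can be reproduced in a few lines. The sketch below evaluates the second-order polynomial, the actual error, and the Lagrange bound with $c = 1$ ; the function name `p2_sqrt` is an illustrative choice.

```python
import math

def p2_sqrt(x: float) -> float:
    """Second-order Taylor polynomial of sqrt(x) centered at a = 1."""
    h = x - 1.0
    return 1.0 + h / 2.0 - h * h / 8.0

x = 1.1
approx = p2_sqrt(x)
actual_error = abs(math.sqrt(x) - approx)

# Lagrange bound: |f'''(c)| / 3! * |x - 1|^3, using |f'''(c)| <= 3/8 for c >= 1.
lagrange_bound = (3.0 / 8.0) / 6.0 * abs(x - 1.0) ** 3

print(approx, actual_error, lagrange_bound)
```

The actual error falls just below the bound, which is expected because $c^{-5/2}$ is only slightly less than $1$ on $(1, 1.1)$ .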

Problems with Solutions

Problem. Evaluate $\displaystyle\lim_{x\to 0} \frac{\sin x - x}{x^3}$ using L’Hôpital’s rule.

Solution. The limit is $0/0$ . Applying L’Hôpital three times:
$\lim_{x\to 0}\frac{\sin x - x}{x^3} = \lim_{x\to 0}\frac{\cos x - 1}{3x^2} = \lim_{x\to 0}\frac{-\sin x}{6x} = \lim_{x\to 0}\frac{-\cos x}{6} = -\frac16.$
Problem. Find the derivative of $f(x) = e^{x^2}\ln x$ .

Solution. Using the product and chain rules:
$f'(x) = (2x e^{x^2})\ln x + e^{x^2}\cdot\frac1x = e^{x^2}\left(2x\ln x + \frac1x\right).$
Problem. Compute $\displaystyle\int_0^{\pi/2} x\sin x\,dx$ using integration by parts.

Solution. Let $u=x$ , $dv=\sin x\,dx$ . Then $du=dx$ , $v=-\cos x$ .
$\int_0^{\pi/2} x\sin x\,dx = \Bigl[-x\cos x\Bigr]_0^{\pi/2} + \int_0^{\pi/2}\cos x\,dx = (0-0) + \Bigl[\sin x\Bigr]_0^{\pi/2} = 1.$
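The value obtained by integration by parts can be compared with a Riemann sum, which ties the answer back to the definition of the definite integral. This is a minimal sketch using midpoint sample points; the function name and the choice of 10 000 subintervals are arbitrary.

```python
import math

def midpoint_riemann_sum(f, a: float, b: float, n: int) -> float:
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

estimate = midpoint_riemann_sum(lambda x: x * math.sin(x), 0.0, math.pi / 2, 10_000)
print(estimate)
```

The estimate should agree with the exact value $1$ to several decimal places; increasing `n` shrinks the gap further.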

Section summary. Single-variable calculus formalizes continuous change and area. Limits define both the derivative (instantaneous rate of change) and the integral (accumulation via Riemann sums). Taylor series extend linear approximation to polynomial approximation. The Fundamental Theorem of Calculus profoundly unites these two branches, enabling analytical evaluation of integrals and solving differential equations.

Linear Algebra: Spaces, Maps, Eigenvalues, and Projections

Core ideas

Linear algebra studies vector spaces and linear transformations. A vector space $V$ over a field $\mathbb{F}$ is a set equipped with vector addition and scalar multiplication that satisfy the usual axioms (associativity and commutativity of addition, a zero vector, additive inverses, and distributivity). A subspace $U \subseteq V$ is a nonempty subset closed under these operations. The span of vectors $\{v_1,\dots,v_k\}$ is the set of all linear combinations. A set is linearly independent if $\sum c_i v_i = \mathbf{0}$ implies all $c_i = 0$ . A basis is a linearly independent spanning set; its size is the dimension $\dim V$ . The dual space $V^*$ is the vector space of all linear maps (functionals) from $V$ to $\mathbb{F}$ .

A linear transformation $T: V \to W$ preserves operations. The kernel (null space) $\ker T$ and image (range) $\operatorname{im} T$ are subspaces. The rank-nullity theorem states $\dim\ker T + \dim\operatorname{im}T = \dim V$ . Every linear map between finite-dimensional spaces can be represented by a matrix. The determinant $\det A$ of a square matrix measures the volume scaling factor; $A$ is invertible iff $\det A \neq 0$ .

An eigenvector $v \neq \mathbf{0}$ of $A$ satisfies $Av = \lambda v$ , where $\lambda$ is the eigenvalue. Eigenvalues are roots of the characteristic polynomial $p_A(\lambda) = \det(A - \lambda I)$ . The Cayley-Hamilton Theorem states that every square matrix satisfies its own characteristic equation: $p_A(A) = 0$ . A matrix is diagonalizable if it has a basis of eigenvectors. The Spectral Theorem guarantees every real symmetric matrix is orthogonally diagonalizable with real eigenvalues. A matrix that is not diagonalizable can still be put into Jordan canonical form over $\mathbb{C}$ . The Singular Value Decomposition (SVD) factorizes any $A \in \mathbb{R}^{m\times n}$ as $A = U\Sigma V^T$ , with $U, V$ orthogonal and $\Sigma$ an $m \times n$ diagonal matrix with non-negative entries (the singular values).

An inner product space is a vector space with an inner product $\langle x,y \rangle$ (generalizing the dot product). It induces a norm $\|x\| = \sqrt{\langle x,x \rangle}$ and a notion of orthogonality ( $\langle x,y \rangle = 0$ ). Orthogonal projection onto a subspace $U$ minimizes the distance to $U$ . The Gram-Schmidt process converts any basis into an orthonormal basis. In systems $Ax = b$ with no exact solution, the least squares solution minimizes $\|Ax-b\|^2$ via the normal equations $A^T A x = A^T b$ .

Vector space: basis, dimension, dual space, linear independence.
Linear maps: rank-nullity theorem, matrix representation, determinants.
Eigen-theory: characteristic polynomial, Cayley-Hamilton, Spectral Theorem, SVD.
Inner products: Gram-Schmidt, orthogonal projections, least squares.

Mathematical spine

\det(A-\lambda I)=0,\qquad p_A(A) = 0 \text{ (Cayley-Hamilton)}

A = U\Sigma V^T,\qquad A^TA\hat{x} = A^T b \text{ (Least Squares)}

\operatorname{proj}_U(v) = \sum_{i=1}^k \langle v,u_i\rangle u_i \text{ (for orthonormal } \{u_i\})

Example

Find the eigenvalues and eigenvectors of $A = \begin{pmatrix} 4 & 2 \\ 1 & 3 \end{pmatrix}$ . Determine whether $A$ is diagonalizable.

Solution. The characteristic polynomial is

\det(A-\lambda I) = (4-\lambda)(3-\lambda) - 2 = \lambda^2 - 7\lambda + 10 = (\lambda-5)(\lambda-2).

Thus the eigenvalues are $\lambda_1 = 5$ and $\lambda_2 = 2$ .

For $\lambda_1 = 5$ : solve $(A-5I)\mathbf{v} = \mathbf{0}$ :

\begin{pmatrix} -1 & 2 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \mathbf{0} \implies v_1 = 2v_2.

An eigenvector is $\mathbf{v}_1 = (2,1)^T$ .

For $\lambda_2 = 2$ : solve $(A-2I)\mathbf{v} = \mathbf{0}$ :

\begin{pmatrix} 2 & 2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \mathbf{0} \implies v_1 = -v_2.

An eigenvector is $\mathbf{v}_2 = (1,-1)^T$ .

Since we found two linearly independent eigenvectors, $A$ is diagonalizable: $A = PDP^{-1}$ with $P = \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix}$ and $D = \begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix}$ .
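The eigenpairs and the factorization $A = PDP^{-1}$ can be verified with exact rational arithmetic. The sketch below uses small hand-written $2\times 2$ helpers (`matvec`, `matmul`, `inverse`, all illustrative names) rather than a linear algebra library.

```python
from fractions import Fraction

def matvec(M, v):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def matmul(M, N):
    """Multiply two 2x2 matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(M):
    """Invert a 2x2 matrix with the adjugate formula, in exact arithmetic."""
    det = Fraction(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

A = [[4, 2], [1, 3]]
P = [[2, 1], [1, -1]]   # columns are the eigenvectors (2, 1) and (1, -1)
D = [[5, 0], [0, 2]]

# A v = lambda v for both eigenpairs: 5*(2, 1) and 2*(1, -1).
eigen_ok = matvec(A, [2, 1]) == [10, 5] and matvec(A, [1, -1]) == [2, -2]
reconstructed = matmul(matmul(P, D), inverse(P))
print(eigen_ok, reconstructed == A)
```

Using `Fraction` avoids floating-point round-off, so the comparison with $A$ is exact.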

Problems with Solutions

Problem. Determine whether the vectors $\mathbf{v}_1 = (1,2,3)$ , $\mathbf{v}_2 = (0,1,2)$ , $\mathbf{v}_3 = (1,3,5)$ are linearly independent.

Solution. We check if $c_1\mathbf{v}_1+c_2\mathbf{v}_2+c_3\mathbf{v}_3=\mathbf{0}$ has only the trivial solution. Note that $\mathbf{v}_3 = \mathbf{v}_1+\mathbf{v}_2$ . Hence $1\cdot\mathbf{v}_1 + 1\cdot\mathbf{v}_2 - 1\cdot\mathbf{v}_3 = \mathbf{0}$ , a nontrivial relation. The vectors are linearly dependent.
Problem. Compute the determinant of $B = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{pmatrix}$ .

Solution. For an upper triangular matrix, the determinant is the product of the diagonal entries:
$\det(B) = 1\cdot 4\cdot 6 = 24.$
Problem. Find the least-squares solution to $A\mathbf{x} = \mathbf{b}$ where $A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}$ and $\mathbf{b} = \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}$ .

Solution. The normal equations are $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$ .
$A^TA = \begin{pmatrix} 3 & 6 \\ 6 & 14 \end{pmatrix}, \qquad A^T\mathbf{b} = \begin{pmatrix} 10 \\ 23 \end{pmatrix}.$
Solving $\begin{pmatrix} 3 & 6 \\ 6 & 14 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 10 \\ 23 \end{pmatrix}$ : from the first equation $3x_1+6x_2=10 \implies x_1 = \frac{10-6x_2}{3}$ . Substituting into the second: $6(\frac{10-6x_2}{3}) + 14x_2 = 23 \implies 20 - 12x_2 + 14x_2 = 23 \implies 2x_2 = 3 \implies x_2 = 1.5$ . Then $x_1 = \frac{10-9}{3} = \frac13$ . The least-squares solution is $\hat{\mathbf{x}} = \begin{pmatrix} 1/3 \\ 3/2 \end{pmatrix}$ .
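The normal equations for this problem can be assembled and solved exactly in a few lines. This is a sketch for the $2\times 2$ case only, using Cramer’s rule; for larger systems a dedicated solver would be the usual choice.

```python
from fractions import Fraction

A = [[1, 1], [1, 2], [1, 3]]
b = [2, 3, 5]

# Normal equations: (A^T A) x = A^T b.
AtA = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(2)]

# Solve the 2x2 system exactly with Cramer's rule.
det = Fraction(AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0])
x1 = (Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det
x2 = (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det

print(AtA, Atb, x1, x2)
```

Geometrically, the result is the intercept $1/3$ and slope $3/2$ of the best-fit line through the points $(1,2)$ , $(2,3)$ , $(3,5)$ .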

Section summary. Linear algebra provides the framework for analyzing flat spaces and linear mappings. Core topics include dimension and basis (coordinates), the rank-nullity theorem, and determinants. Eigenvalues, diagonalization, and the Cayley-Hamilton theorem analyze matrix powers and structure, while inner products formalize geometry (orthogonality and projection), forming the bedrock for fields ranging from quantum mechanics to machine learning.

Multivariable Analysis: Gradients, Multiple Integrals, and Vector Fields

Core ideas

Multivariable calculus extends the concepts of limits, derivatives, and integrals to functions of several variables. For a scalar function $f: \mathbb{R}^n \to \mathbb{R}$ , the partial derivative $\partial f/\partial x_i$ represents the rate of change of $f$ in the direction of the $x_i$ -axis, holding all other variables constant. Clairaut’s Theorem ensures that if the mixed second-order partial derivatives are continuous, they are equal: $\partial^2 f/\partial x_i \partial x_j = \partial^2 f/\partial x_j \partial x_i$ .

The gradient vector, denoted $\nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$ , is central to optimization because it points in the direction of steepest ascent. Furthermore, the gradient is everywhere orthogonal to the level sets of $f$ . The directional derivative $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$ computes the rate of change of $f$ in the direction of an arbitrary unit vector $\mathbf{u}$ .

For a vector-valued function $f: \mathbb{R}^n \to \mathbb{R}^m$ , the Jacobian matrix $J_f$ linearly approximates $f$ . Its entries are $(J_f)_{ij} = \partial f_i/\partial x_j$ . The multivariable chain rule is concisely expressed as matrix multiplication: $J_{f\circ g}(x) = J_f(g(x)) J_g(x)$ . The Inverse Function Theorem states that if $f$ is continuously differentiable with $n = m$ and $J_f(a)$ is invertible (i.e., $\det J_f(a) \neq 0$ ), then $f$ is locally a bijection near $a$ with a differentiable inverse. Similarly, the Implicit Function Theorem guarantees that an equation $F(x,y)=0$ with $F$ continuously differentiable locally defines $y$ as a differentiable function of $x$ near a solution point, provided the partial derivative $\partial F/\partial y$ is non-zero there.

For optimization, critical points occur where $\nabla f = \mathbf{0}$ . We classify these points using the symmetric Hessian matrix $H_f$ of second partial derivatives. If $H_f$ is positive definite, the point is a local minimum; if negative definite, a local maximum; if indefinite, a saddle point. Lagrange multipliers provide a method for finding extrema of $f(x)$ subject to constraints $g(x)=0$ by solving the system $\nabla f = \lambda \nabla g$ and $g(x)=0$ .

Multiple integrals generalize the definite integral to compute areas, volumes, and hypervolumes. When changing variables from $(x,y)$ to $(u,v)$ , the area element is scaled by the absolute value of the determinant of the Jacobian $J = \partial(x,y)/\partial(u,v)$ : $\iint_R f(x,y)\,dx\,dy = \iint_S f(x(u,v),y(u,v))\,|\det J|\,du\,dv$ . Canonical coordinate systems simplify integration over symmetric domains:

Polar coordinates (2D): $x = r\cos\theta, y = r\sin\theta \implies dA = r\,dr\,d\theta$ .
Cylindrical coordinates (3D): $x = r\cos\theta, y = r\sin\theta, z = z \implies dV = r\,dr\,d\theta\,dz$ .
Spherical coordinates (3D): $x = \rho\sin\phi\cos\theta, y = \rho\sin\phi\sin\theta, z = \rho\cos\phi \implies dV = \rho^2\sin\phi\,d\rho\,d\phi\,d\theta$ .

A vector field $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n$ assigns a vector to every point in space. Two fundamental differential operators act on 3D vector fields:

Divergence ( $\nabla\cdot\mathbf{F}$ ): A scalar measuring the local outward flux or “source/sink” nature of the field.
Curl ( $\nabla\times\mathbf{F}$ ): A vector measuring the local rotation or “swirl” of the field.

A line integral $\int_C \mathbf{F}\cdot d\mathbf{r}$ accumulates the tangential component of $\mathbf{F}$ along a curve $C$ , computing the total work done by a force field. A vector field is conservative if it is the gradient of a scalar potential ( $\mathbf{F} = \nabla\phi$ ). Every conservative field with continuously differentiable components satisfies $\nabla \times \mathbf{F} = \mathbf{0}$ ; conversely, on a simply connected domain a curl-free field is conservative. For conservative fields the line integral is path-independent: $\int_C \mathbf{F}\cdot d\mathbf{r} = \phi(B) - \phi(A)$ , where $C$ runs from $A$ to $B$ .

The core of vector calculus is a trio of theorems that are special cases of the generalized Stokes’ Theorem on manifolds (which links the integral of a differential form over a boundary to the integral of its exterior derivative over the interior):

Green’s Theorem (2D): Relates a line integral around a simple closed curve to a double integral over the enclosed plane region: $\oint_C \mathbf{F}\cdot d\mathbf{r} = \iint_D (\nabla\times\mathbf{F})\cdot\mathbf{k}\,dA$ .
Stokes’ Theorem (3D): Relates the surface integral of the curl of a field to the line integral around the surface’s boundary: $\oint_{\partial S} \mathbf{F}\cdot d\mathbf{r} = \iint_S (\nabla\times\mathbf{F})\cdot d\mathbf{S}$ .
Divergence Theorem (Gauss’s Theorem, 3D): Relates the flux of a field through a closed surface to the integral of its divergence over the enclosed volume: $\iint_{\partial V} \mathbf{F}\cdot d\mathbf{S} = \iiint_V (\nabla\cdot\mathbf{F})\,dV$ .

Derivatives: Gradient, Jacobian, Hessian, Inverse/Implicit Function Theorems, Clairaut’s Theorem.
Optimization: Critical points, second derivative test (Hessian), Lagrange multipliers.
Integration: Multiple integrals, change of variables (Jacobian determinant), coordinate systems.
Vector Calculus: Divergence, curl, conservative fields, Green’s, Stokes’, and Divergence theorems.

Example (Green’s Theorem for Area). We can use Green’s theorem to compute the area of a region $D$ . Let $C = \partial D$ be traversed counterclockwise. If we choose a field $\mathbf{F}=(-y, x)/2$ , its 2D curl is $\frac{1}{2}(\partial_x(x) - \partial_y(-y)) = 1$ . By Green’s theorem, $\text{Area}(D) = \iint_D 1 \, dA = \frac{1}{2} \oint_C (x\,dy - y\,dx)$ .

Mathematical spine

J_{f\circ g}(x) = J_f(g(x)) J_g(x),\qquad dx\,dy = \left| \det \frac{\partial(x,y)}{\partial(u,v)} \right| du\,dv

\nabla f = \lambda \nabla g \text{ (Lagrange)}

\oint_{\partial S} \mathbf{F}\cdot d\mathbf{r} = \iint_S (\nabla\times\mathbf{F})\cdot d\mathbf{S},\qquad \iint_{\partial V} \mathbf{F}\cdot d\mathbf{S} = \iiint_V (\nabla\cdot\mathbf{F})\,dV

Example

Let $f(x,y) = x^2y + y^3$ .

(a) Compute the gradient and the directional derivative at $(1,2)$ in the direction of $\mathbf{u} = \left(\frac35, \frac45\right)$ .

(b) Evaluate $\iint_R (x^2+y^2)\,dA$ where $R$ is the disk $x^2+y^2 \le 4$ .

Solution.

(a) The partial derivatives are $\partial f/\partial x = 2xy$ and $\partial f/\partial y = x^2 + 3y^2$ . At $(1,2)$ : $\nabla f(1,2) = (4, 13)$ . The directional derivative is

D_{\mathbf{u}}f(1,2) = \nabla f(1,2)\cdot \mathbf{u} = 4\cdot\frac35 + 13\cdot\frac45 = \frac{12+52}{5} = \frac{64}{5} = 12.8.

(b) In polar coordinates, $x^2+y^2 = r^2$ and $dA = r\,dr\,d\theta$ . The region $R$ is $0\le r\le 2$ , $0\le\theta\le 2\pi$ .

\iint_R (x^2+y^2)\,dA = \int_0^{2\pi}\int_0^2 r^2\cdot r\,dr\,d\theta = 2\pi \int_0^2 r^3\,dr = 2\pi\left[\frac{r^4}{4}\right]_0^2 = 2\pi\cdot 4 = 8\pi.
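Both parts of this example can be checked numerically. The sketch below estimates the directional derivative in (a) with a central difference and the polar integral in (b) with a midpoint sum in $r$ ; the step size `h`, the number of subintervals `n`, and the helper name are illustrative assumptions.

```python
import math

def f(x: float, y: float) -> float:
    return x * x * y + y ** 3

def directional_derivative(func, point, u, h: float = 1e-5) -> float:
    """Central-difference estimate of the derivative of func at point along unit vector u."""
    (x, y), (ux, uy) = point, u
    return (func(x + h * ux, y + h * uy) - func(x - h * ux, y - h * uy)) / (2 * h)

# Part (a): derivative of f at (1, 2) along u = (3/5, 4/5).
estimate = directional_derivative(f, (1.0, 2.0), (0.6, 0.8))

# Part (b): the theta-integral contributes 2*pi; approximate the r-integral of r^3 on [0, 2].
n = 1000
dr = 2.0 / n
polar_integral = 2 * math.pi * sum(((i + 0.5) * dr) ** 3 for i in range(n)) * dr

print(estimate, polar_integral, 8 * math.pi)
```

The first value should be close to $12.8$ and the second close to $8\pi$ ; neither is exact because both are finite approximations.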

Problems with Solutions

Problem. Find and classify the critical points of $f(x,y) = x^3 + y^3 - 3xy$ .

Solution. Set $\nabla f = (3x^2-3y, 3y^2-3x) = (0,0)$ . This gives $y=x^2$ and $x=y^2$ . Substituting yields $x=x^4$ , so $x=0$ or $x=1$ . The critical points are $(0,0)$ and $(1,1)$ . The Hessian is $H = \begin{pmatrix} 6x & -3 \\ -3 & 6y \end{pmatrix}$ . At $(0,0)$ : $H = \begin{pmatrix} 0 & -3 \\ -3 & 0 \end{pmatrix}$ , $\det H = -9 < 0$ , so $(0,0)$ is a saddle point. At $(1,1)$ : $H = \begin{pmatrix} 6 & -3 \\ -3 & 6 \end{pmatrix}$ , $\det H = 27 > 0$ and $\operatorname{tr} H = 12 > 0$ , so $(1,1)$ is a local minimum.
Problem. Use Green’s Theorem to evaluate $\oint_C (x^2-y)\,dx + (y^2+x)\,dy$ where $C$ is the circle $x^2+y^2=4$ oriented counterclockwise.

Solution. Green’s Theorem gives
$\oint_C P\,dx+Q\,dy = \iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA = \iint_D (1 - (-1))\,dA = 2\iint_D dA.$
The disk $D$ has area $\pi(2)^2 = 4\pi$ . Hence the integral equals $2\cdot 4\pi = 8\pi$ .
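Green’s Theorem can be cross-checked by evaluating the line integral directly from a parametrization of the circle. The sketch below assumes the parametrization $x = 2\cos t$ , $y = 2\sin t$ and a midpoint sum in $t$ ; the function name and sample count are illustrative.

```python
import math

def line_integral_on_circle(P, Q, radius: float, n: int = 20_000) -> float:
    """Approximate the counterclockwise integral of P dx + Q dy around a circle centered at the origin."""
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        x, y = radius * math.cos(t), radius * math.sin(t)
        # dx = x'(t) dt and dy = y'(t) dt along the parametrized curve.
        dx, dy = -radius * math.sin(t) * dt, radius * math.cos(t) * dt
        total += P(x, y) * dx + Q(x, y) * dy
    return total

circulation = line_integral_on_circle(
    lambda x, y: x * x - y, lambda x, y: y * y + x, radius=2.0
)
print(circulation, 8 * math.pi)
```

Agreement between the two printed values illustrates that the boundary integral and the double integral of $\partial Q/\partial x - \partial P/\partial y$ measure the same quantity.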
Problem. Use spherical coordinates to evaluate $\iiint_E z\,dV$ where $E$ is the part of the ball $x^2+y^2+z^2 \le 9$ lying in the first octant.

Solution. In spherical coordinates, $z = \rho\cos\phi$ , $dV = \rho^2\sin\phi\,d\rho\,d\phi\,d\theta$ , and the first octant means $0\le\theta\le\pi/2$ , $0\le\phi\le\pi/2$ , $0\le\rho\le 3$ .
$\iiint_E z\,dV = \int_0^{\pi/2}\int_0^{\pi/2}\int_0^3 (\rho\cos\phi)\,\rho^2\sin\phi\,d\rho\,d\phi\,d\theta = \frac{\pi}{2} \int_0^{\pi/2}\cos\phi\sin\phi\,d\phi \int_0^3 \rho^3\,d\rho.$
The $\phi$ -integral is $\frac12$ , and the $\rho$ -integral is $\frac{81}{4}$ . Thus the value is $\frac{\pi}{2}\cdot\frac12\cdot\frac{81}{4} = \frac{81\pi}{16}$ .

Section summary. Multivariable analysis translates single-variable calculus to higher dimensions. The Jacobian and Hessian matrices capture linear and quadratic approximations, enabling optimization and the Implicit/Inverse Function Theorems. Multiple integration relies on the Jacobian determinant for coordinate transformations. The crowning achievements are the integral theorems of Green, Stokes, and Gauss, which unify geometry and calculus by proving that the aggregate behavior of a field within a domain is determined entirely by its behavior on the domain’s boundary.

Probability: Conditional Probability, Random Variables, and Distributions

Core ideas

Probability mathematically quantifies uncertainty. A sample space $\Omega$ is the set of all outcomes. An event $A \subseteq \Omega$ is a subset. A probability measure $\Pr$ satisfies Kolmogorov’s axioms: $\Pr(\Omega)=1$ , $0 \le \Pr(A) \le 1$ , and $\Pr(\bigcup_i A_i) = \sum_i \Pr(A_i)$ for any countable collection of pairwise disjoint (mutually exclusive) events.

Conditional probability $\Pr(A \mid B) = \Pr(A\cap B)/\Pr(B)$ updates belief given $B$ . Events $A$ and $B$ are independent if $\Pr(A\cap B) = \Pr(A)\Pr(B)$ . Bayes’ theorem, $\Pr(A\mid B) = \Pr(B\mid A)\Pr(A)/\Pr(B)$ , reverses the conditioning. The law of total probability states $\Pr(B) = \sum_i \Pr(B\mid A_i)\Pr(A_i)$ for a partition $\{A_i\}$ .

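Example (Bayes’ Theorem): Consider a disease affecting $1\%$ of a population and a test that is $99\%$ accurate, meaning both its sensitivity $\Pr(+\mid D)$ and its specificity $\Pr(-\mid \neg D)$ equal $0.99$ . If a person tests positive, the probability they actually have the disease is $\Pr(D\mid +) = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = 0.5$ , where the second term of the denominator is the false-positive rate $0.01$ times the prior $\Pr(\neg D) = 0.99$ . Expecting an answer near $99\%$ is the base rate fallacy; the result shows how strongly the prior probability matters.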
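The calculation generalizes to any prior and test quality. In the sketch below, the function `posterior` and its parameter names `sensitivity` (probability of a positive test given disease) and `specificity` (probability of a negative test given no disease) are illustrative; reading “99% accurate” as both being $0.99$ is an assumption.

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test) via Bayes' theorem and the law of total probability."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

# The example above: 1% prevalence.
print(posterior(prior=0.01, sensitivity=0.99, specificity=0.99))
# Same test with 10% prevalence, to show the effect of the prior.
print(posterior(prior=0.10, sensitivity=0.99, specificity=0.99))
```

Raising the prevalence from $1\%$ to $10\%$ lifts the posterior from one half to $11/12$ , even though the test itself is unchanged.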

A random variable $X: \Omega \to \mathbb{R}$ assigns numbers to outcomes. Its cumulative distribution function (CDF) is $F_X(x) = \Pr(X \le x)$ . Discrete $X$ has a probability mass function (PMF) $p_X(x)$ ; continuous $X$ has a probability density function (PDF) $f_X(x)$ such that $\int f_X(x)\,dx = 1$ . The expected value is $E[X] = \sum x\,p(x)$ in the discrete case and $\int x f(x)\,dx$ in the continuous case; write $\mu = E[X]$ . The variance is $\operatorname{Var}(X) = E[(X-\mu)^2] = E[X^2] - \mu^2$ , with standard deviation $\sigma_X = \sqrt{\operatorname{Var}(X)}$ .

For multiple random variables, the joint distribution captures their simultaneous behavior. Marginal distributions are obtained by summing or integrating out other variables (e.g., $f_X(x) = \int f_{X,Y}(x,y)\,dy$ ). Covariance measures joint linear variation: $\operatorname{Cov}(X,Y) = E[XY]-E[X]E[Y]$ .

Key distributions: Binomial $(n,p)$ models successes in $n$ independent trials. Poisson $(\lambda)$ models rare events in time/space. Normal $(\mu,\sigma^2)$ is the symmetric bell curve $f(x) \propto \exp(-(x-\mu)^2/(2\sigma^2))$ . Exponential $(\lambda)$ models memoryless waiting times.

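Tail bounds and limit theorems describe large-scale behavior. Markov’s inequality gives $\Pr(X \ge a) \le E[X]/a$ for nonnegative $X$ and $a>0$ ; applying it to $(X-\mu)^2$ yields Chebyshev’s inequality $\Pr(|X-\mu| \ge k\sigma) \le 1/k^2$ . The Law of Large Numbers guarantees that the sample mean $\bar{X}_n \to \mu$ as $n \to \infty$ (in probability for the Weak version, almost surely for the Strong version). The Central Limit Theorem (CLT) states that for i.i.d.\ variables with mean $\mu$ and finite variance $\sigma^2$ , the standardized sum $(S_n - n\mu)/(\sigma\sqrt{n})$ , where $S_n = X_1+\dots+X_n$ , converges in distribution to Normal $(0,1)$ ; informally, $S_n$ is approximately Normal $(n\mu, n\sigma^2)$ for large $n$ .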

Proof Sketch (Chebyshev’s Inequality): By definition, $\operatorname{Var}(X) = \int (x-\mu)^2 f(x) dx \ge \int_{|x-\mu| \ge k\sigma} (x-\mu)^2 f(x) dx \ge (k\sigma)^2 \int_{|x-\mu| \ge k\sigma} f(x) dx = k^2\sigma^2 \Pr(|X-\mu| \ge k\sigma)$ . Dividing by $k^2\sigma^2$ yields the result.

Axioms & Bayes: Conditional probability, independence, Bayes’ rule.
Random variables: PMF/PDF, joint/marginal distributions, expectation, variance.
Distributions: Binomial, Poisson, Normal, Exponential.
Limit theorems: Chebyshev’s inequality, Law of Large Numbers, CLT.

Mathematical spine

\Pr(A\mid B) = \frac{\Pr(B\mid A)\Pr(A)}{\Pr(B)},\qquad \operatorname{Var}(X) = E[X^2] - (E[X])^2

\Pr(|X-\mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2} \text{ (Chebyshev)}

\sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sigma}\right) \xrightarrow{d} N(0,1) \text{ (Central Limit Theorem)}

Example

A fair six-sided die is rolled once. Let $X$ be the outcome.

(a) Compute the expected value $E[X]$ .

(b) Compute the variance $\operatorname{Var}(X)$ .

Solution.

(a) Each outcome $k\in\{1,\dots,6\}$ has probability $1/6$ . Thus

E[X] = \sum_{k=1}^6 k\cdot\frac16 = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5.

(b) First compute $E[X^2] = \sum_{k=1}^6 k^2\cdot\frac16 = \frac{1+4+9+16+25+36}{6} = \frac{91}{6}$ . Then

\operatorname{Var}(X) = E[X^2] - (E[X])^2 = \frac{91}{6} - \left(\frac{21}{6}\right)^2 = \frac{91}{6} - \frac{441}{36} = \frac{546-441}{36} = \frac{105}{36} = \frac{35}{12} \approx 2.917.
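As a quick cross-check of the computation above, the same moments can be evaluated with exact rational arithmetic. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction

outcomes = range(1, 7)          # faces of a fair die
p = Fraction(1, 6)              # probability of each face

mean = sum(k * p for k in outcomes)               # E[X]
second_moment = sum(k * k * p for k in outcomes)  # E[X^2]
variance = second_moment - mean ** 2              # Var(X) = E[X^2] - (E[X])^2

print(mean, second_moment, variance)  # 7/2 91/6 35/12
```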

Problems with Solutions

Problem. A bag contains $3$ red and $7$ blue marbles. Two marbles are drawn without replacement. Find the probability that both are red.

Solution.
$P(\text{both red}) = \frac{3}{10}\cdot\frac{2}{9} = \frac{6}{90} = \frac{1}{15}.$

Problem. Let $X \sim \operatorname{Binomial}(n=5, p=0.3)$ . Compute $P(X=2)$ .

Solution.
$P(X=2) = \binom{5}{2}(0.3)^2(0.7)^3 = 10\cdot 0.09\cdot 0.343 = 0.3087.$

Problem. The lifetime of a device is exponentially distributed with mean $5$ years. Find the probability it lasts more than $8$ years.

Solution. For an exponential distribution, $\lambda = 1/5$ . The survival function is $P(X>x) = e^{-\lambda x}$ . Hence
$P(X>8) = e^{-8/5} = e^{-1.6} \approx 0.2019.$
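The three answers above can be reproduced numerically. The helper names `binomial_pmf` and `exponential_survival` below are hypothetical, introduced only for this sketch.

```python
from fractions import Fraction
from math import comb, exp

# Two red marbles without replacement: (3/10) * (2/9).
both_red = Fraction(3, 10) * Fraction(2, 9)

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def exponential_survival(x, mean):
    """P(X > x) for an exponential lifetime with the given mean (rate = 1/mean)."""
    return exp(-x / mean)

print(both_red)                                # 1/15
print(round(binomial_pmf(2, 5, 0.3), 4))       # 0.3087
print(round(exponential_survival(8, 5), 4))    # 0.2019
```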

Section summary. Probability grounds uncertainty in rigorous measure theory. Core concepts include conditional updates (Bayes’ rule) and distributions of random variables (joint and marginal). The field culminates in the Law of Large Numbers, which justifies expected values as long-run averages, and the Central Limit Theorem, which explains the universality of the Normal distribution.

Statistics: Estimation, Testing, Regression, and Inference

Core ideas

Statistics uses data to make inferences about populations. A parameter is a numerical characteristic of a population; a statistic is a numerical characteristic of a sample. Point estimation produces a single best guess; interval estimation gives a plausible range. Maximum likelihood estimation (MLE) chooses $\theta$ maximizing the likelihood $L(\theta) = \prod_{i=1}^n f(x_i \mid \theta)$ . For example, the MLE of the mean $\mu$ for Normal data is $\hat\mu = \bar{x} = \frac1n\sum x_i$ .

Example (MLE for Poisson): Let $X_1, \dots, X_n \sim \operatorname{Poisson}(\lambda)$ . The likelihood is $L(\lambda) = \prod \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}$ . The log-likelihood is $\ell(\lambda) = -n\lambda + (\sum x_i)\ln\lambda - \sum \ln(x_i!)$ . Setting $\frac{d\ell}{d\lambda} = -n + \frac{\sum x_i}{\lambda} = 0$ yields the MLE $\hat\lambda = \bar{x}$ .
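The closed-form result $\hat\lambda = \bar{x}$ can be sanity-checked by maximizing the log-likelihood over a grid. The data below are made-up counts used purely for illustration, and the grid search is a crude stand-in for calculus, not a recommended estimation method.

```python
from math import log, lgamma

def poisson_loglik(lam, xs):
    """Log-likelihood -n*lam + (sum x_i) ln(lam) - sum ln(x_i!)."""
    n = len(xs)
    return -n * lam + sum(xs) * log(lam) - sum(lgamma(x + 1) for x in xs)

xs = [2, 0, 3, 1, 4]                 # hypothetical observed counts
lam_hat = sum(xs) / len(xs)          # closed-form MLE: the sample mean

# Evaluate the log-likelihood on a grid of candidate rates from 0.5 to 4.49.
grid = [0.5 + 0.01 * i for i in range(400)]
best = max(grid, key=lambda lam: poisson_loglik(lam, xs))

print(lam_hat, best)  # the grid maximizer should sit next to the sample mean
```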

The sampling distribution of a statistic is its probability distribution over repeated samples. The standard error is the standard deviation of the sampling distribution. For i.i.d.\ data with variance $\sigma^2$ , $\operatorname{SE}(\bar{X}) = \sigma/\sqrt{n}$ . A confidence interval for the mean when $\sigma$ is known: $\bar{x} \pm z_{\alpha/2}\cdot\sigma/\sqrt{n}$ , where $z_{\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal. When $\sigma$ is unknown, use the $t$ -distribution: $\bar{x} \pm t_{\alpha/2,\,n-1}\cdot s/\sqrt{n}$ where $s$ is the sample standard deviation.

Hypothesis testing assesses evidence against a null hypothesis $H_0$ . The $p$ -value is $\Pr(\text{observed or more extreme data} \mid H_0)$ . If $p < \alpha$ (significance level), reject $H_0$ . A Type I error rejects a true $H_0$ ; Type II error fails to reject a false $H_0$ . The $z$ -test for a mean: $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$ . The $t$ -test uses $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ .

Linear regression models $Y = \beta_0 + \beta_1 X + \varepsilon$ , with $\varepsilon \sim N(0,\sigma^2)$ i.i.d. The least squares estimates minimize $\sum (y_i - \beta_0 - \beta_1 x_i)^2$ : $\hat\beta_1 = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}$ , $\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x}$ . The $R^2$ statistic measures the proportion of variance explained. Multiple regression extends to $p$ predictors: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ , with $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ .

Proof Sketch (OLS Estimator): The objective function is $S(\boldsymbol{\beta}) = (\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta}) = \mathbf{Y}^T\mathbf{Y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{Y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$ . Taking the gradient with respect to $\boldsymbol{\beta}$ and setting it to zero gives $-2\mathbf{X}^T\mathbf{Y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = 0$ , leading to the normal equations $\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{Y}$ . Assuming $\mathbf{X}^T\mathbf{X}$ is invertible, we obtain $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ .

Bayesian inference treats parameters as random variables with prior $\pi(\theta)$ and posterior $\pi(\theta \mid \mathbf{x}) \propto f(\mathbf{x} \mid \theta)\,\pi(\theta)$ . The posterior mean serves as a Bayesian estimator, and credible intervals are the Bayesian analog of confidence intervals.

Estimation: MLE, confidence intervals for mean ( $z$ and $t$ ).
Testing: null/alternative, $p$ -value, $z$ -test, $t$ -test, Type I/II errors.
Regression: least squares; $\hat{\beta} = (X^TX)^{-1}X^Ty$ ; $R^2$ .
Bayesian: posterior $\propto$ likelihood $\times$ prior.

For review, be able to: compute MLEs for common distributions; construct and interpret confidence intervals; perform $z$ - and $t$ -tests; interpret $p$ -values; fit and interpret linear regression; derive posterior distributions for simple conjugate priors. Identify the estimator, the test statistic null distribution, the regression assumptions, and the prior-posterior relationship.

Mathematical spine

\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\qquad t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}},\qquad \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}

Example

A random sample of $n=64$ lightbulbs yields a sample mean lifetime $\bar{x}=1200$ hours and a sample standard deviation $s=80$ hours.

(a) Construct a $95\%$ confidence interval for the true mean lifetime $\mu$ .

(b) Test $H_0: \mu = 1150$ versus $H_1: \mu \neq 1150$ at significance level $\alpha = 0.05$ .

Solution.

(a) Because $n$ is large, we use the $z$ -interval:

\bar{x} \pm z_{0.025}\frac{s}{\sqrt{n}} = 1200 \pm 1.96\frac{80}{8} = 1200 \pm 19.6.

The $95\%$ confidence interval is $(1180.4,\,1219.6)$ hours.

(b) The test statistic is

z = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{1200-1150}{10} = 5.0.

Since $|z| = 5.0 > 1.96 = z_{0.025}$ , we reject $H_0$ . There is strong evidence that the true mean differs from $1150$ hours.
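The interval and test above can be recomputed with Python's standard library, which provides the standard normal distribution as `statistics.NormalDist` (Python 3.8 or later). This is a sketch of the large-sample $z$ procedure used in the solution; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 64, 1200.0, 80.0      # sample size, mean, standard deviation
mu0, alpha = 1150.0, 0.05          # null value and significance level

se = s / sqrt(n)                                   # standard error: 80 / 8 = 10
z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96
ci = (xbar - z_crit * se, xbar + z_crit * se)      # 95% confidence interval

z = (xbar - mu0) / se                              # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided p-value
reject = p_value < alpha

print(ci, z, reject)
```

Comparing `p_value` with `alpha` is equivalent to comparing $|z|$ with the critical value, so both routes give the same decision.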

Problems with Solutions

Problem. Given the data set $2, 4, 6, 8, 10$ , compute the sample mean and the sample variance.

Solution. The sample mean is $\bar{x} = \frac{2+4+6+8+10}{5} = 6$ . The sample variance is
$s^2 = \frac{1}{4}\sum_{i=1}^5 (x_i-6)^2 = \frac{16+4+0+4+16}{4} = \frac{40}{4} = 10.$

Problem. In a $z$ -test of $H_0: \mu=50$ with known $\sigma=10$ , a sample of size $n=25$ yields $\bar{x}=53$ . Compute the test statistic and the two-tailed $p$ -value.

Solution. The test statistic is
$z = \frac{53-50}{10/\sqrt{25}} = \frac{3}{2} = 1.5.$
The two-tailed $p$ -value is $2P(Z > 1.5) \approx 2(0.0668) = 0.1336$ .

Problem. Fit a simple linear regression $y = \beta_0 + \beta_1 x$ to the points $(1,2), (2,3), (3,5)$ .

Solution. We have $n=3$ , $\bar{x}=2$ , $\bar{y}=10/3$ .
$\sum (x_i-\bar{x})(y_i-\bar{y}) = (-1)(-4/3) + (0)(-1/3) + (1)(5/3) = 3.$ $\sum (x_i-\bar{x})^2 = (-1)^2 + 0^2 + 1^2 = 2.$
Hence $\hat\beta_1 = 3/2 = 1.5$ and $\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} = \frac{10}{3} - 3 = \frac13$ . The fitted line is $\hat y = \frac13 + \frac32 x$ .
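The least squares formulas used in the regression problem translate directly into code. The function `fit_line` is a hypothetical helper for this sketch; it uses exact fractions so the output matches the hand computation.

```python
from fractions import Fraction

def fit_line(points):
    """Simple least squares fit; returns (intercept, slope) as exact fractions."""
    n = len(points)
    xbar = Fraction(sum(x for x, _ in points), n)
    ybar = Fraction(sum(y for _, y in points), n)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)   # sum of cross-deviations
    sxx = sum((x - xbar) ** 2 for x, _ in points)           # sum of squared x-deviations
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

b0, b1 = fit_line([(1, 2), (2, 3), (3, 5)])
print(b0, b1)  # 1/3 3/2
```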

Section summary. Statistics bridges data and decisions through estimation, testing, and modeling. Point estimates (MLE), interval estimates (confidence intervals), and hypothesis tests ( $z$ , $t$ ) form classical inference. Linear regression models relationships. Bayesian methods incorporate prior knowledge via Bayes’ theorem. These tools are the foundation of data analysis and scientific discovery.

Real Analysis: Limits, Continuity, Convergence, and Rigorous Integration

Core ideas

Real analysis provides rigorous foundations for calculus. The real numbers $\mathbb{R}$ form a complete ordered field: they are ordered (totally), form a field under $+$ and $\times$ , and satisfy the completeness axiom (every nonempty set bounded above has a supremum). A sequence $\{a_n\}$ converges to $L$ if $\forall\varepsilon>0,\;\exists N\in\mathbb{N}$ such that $n\ge N \implies |a_n - L| < \varepsilon$ . A sequence is Cauchy if $\forall\varepsilon>0,\;\exists N$ such that $m,n\ge N \implies |a_m - a_n| < \varepsilon$ . In $\mathbb{R}$ , a sequence converges iff it is Cauchy. The Bolzano-Weierstrass theorem: every bounded sequence in $\mathbb{R}$ has a convergent subsequence.

For a function $f: \mathbb{R} \to \mathbb{R}$ , $\lim_{x\to a} f(x) = L$ means $\forall\varepsilon>0,\;\exists\delta>0$ such that $0<|x-a|<\delta \implies |f(x)-L|<\varepsilon$ . $f$ is continuous at $a$ if $\lim_{x\to a}f(x) = f(a)$ . Equivalent: preimages of open sets are open. The Intermediate Value Theorem: a continuous function on $[a,b]$ attains every value between $f(a)$ and $f(b)$ . The Extreme Value Theorem: a continuous function on a closed bounded interval attains its maximum and minimum.

Uniform continuity strengthens continuity: $\forall\varepsilon>0$ , $\exists\delta>0$ that works for all $x$ simultaneously (independent of the point). On closed bounded intervals, continuity implies uniform continuity (Heine-Cantor theorem). A sequence of functions $\{f_n\}$ converges pointwise to $f$ if $f_n(x) \to f(x)$ for each $x$ , and uniformly if $\sup_x |f_n(x)-f(x)| \to 0$ . Uniform convergence preserves continuity and commutes with integration.

Riemann integration: the integral $\int_a^b f(x)\,dx$ is the limit of Riemann sums $\sum f(x_i^*)\Delta x_i$ as the partition mesh goes to zero. A bounded function on $[a,b]$ is Riemann integrable if the upper and lower sums converge to a common value; by Lebesgue’s criterion this happens exactly when the set of discontinuities has measure zero. The Fundamental Theorem of Calculus (rigorous version): if $f$ is integrable on $[a,b]$ and $F$ is continuous on $[a,b]$ with $F'(x)=f(x)$ for every $x\in(a,b)$ , then $\int_a^b f = F(b)-F(a)$ . The Lebesgue integral generalizes Riemann integration by partitioning the range rather than the domain, allowing integration of more functions and better convergence theorems (dominated convergence, monotone convergence).

Series $\sum_{n=1}^\infty a_n$ converge if the partial sums converge. Tests: ratio test, root test, integral test, comparison test. Power series $\sum c_n (x-a)^n$ have a radius of convergence $R = 1/\limsup \sqrt[n]{|c_n|}$ . Within the radius, they can be differentiated and integrated termwise.

Sequences: $\varepsilon$ - $N$ definition; Cauchy criterion; Bolzano-Weierstrass.
Functions: $\varepsilon$ - $\delta$ continuity; uniform continuity; IVT, EVT.
Integration: Riemann sums; FTC; Lebesgue generalization.
Series: convergence tests; power series; radius of convergence.

Example ( $\varepsilon$ - $\delta$ proof). Prove $\lim_{x\to 2}x^2=4$ . Let $\varepsilon>0$ be given. We need $\delta>0$ such that $0<|x-2|<\delta\implies|x^2-4|<\varepsilon$ . Factor: $|x^2-4|=|x-2|\,|x+2|$ . If we first require $\delta\le 1$ , then $|x-2|<1$ implies $1<x<3$ , so $|x+2|<5$ . Choose $\delta=\min\{1,\varepsilon/5\}$ . Then $|x-2|<\delta$ gives $|x^2-4|<5\cdot(\varepsilon/5)=\varepsilon$ . This is the standard pattern: bound $|x-a|$ to control $|f(x)-L|$ .

For review, be able to: write $\varepsilon$ - $N$ and $\varepsilon$ - $\delta$ proofs; prove convergence/divergence of sequences and series; determine uniform versus pointwise convergence; apply IVT and EVT; define and work with Riemann integrals; prove properties of continuous functions on compact sets. Identify the $\varepsilon$ bound, the uniform modulus of continuity, the dominating function for convergence theorems, and the partition refinement for integration.

Mathematical spine

\lim_{n\to\infty} a_n = L \iff \forall\varepsilon>0\;\exists N\; \forall n\ge N:\; |a_n-L|<\varepsilon

\lim_{x\to a}f(x)=L \iff \forall\varepsilon>0\;\exists\delta>0\; \forall x:\; 0<|x-a|<\delta \implies |f(x)-L|<\varepsilon

\int_a^b f(x)\,dx = \lim_{\|\Delta\|\to 0}\sum_{i=1}^n f(x_i^*)\,\Delta x_i

Example

(a) Prove using the $\varepsilon$ - $N$ definition that $\displaystyle\lim_{n\to\infty} \frac{3n+1}{2n-5} = \frac32$ .

(b) Determine whether the series $\displaystyle\sum_{n=1}^\infty \frac{1}{n^2}$ converges.

Solution.

(a) Let $\varepsilon>0$ be given. We need $N$ such that $n\ge N$ implies $\left|\frac{3n+1}{2n-5} - \frac32\right| < \varepsilon$ . Simplifying:

\left|\frac{3n+1}{2n-5} - \frac32\right| = \left|\frac{2(3n+1)-3(2n-5)}{2(2n-5)}\right| = \left|\frac{17}{2(2n-5)}\right| = \frac{17}{2(2n-5)}

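for $n\ge 3$ . We want $\frac{17}{2(2n-5)} < \varepsilon$ , which holds exactly when $n > \frac{17}{4\varepsilon} + \frac52$ . Choose $N = \max\left\{3,\left\lfloor \frac{17}{4\varepsilon} + \frac52 \right\rfloor + 1\right\}$ , so that every $n\ge N$ is strictly greater than $\frac{17}{4\varepsilon} + \frac52$ (a ceiling could land exactly on this bound and give equality rather than the required strict inequality). Then for all $n\ge N$ , the inequality holds, proving the limit.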
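A numerical spot check can make the choice of $N$ concrete. The sketch below is not a proof; it uses exact fractions and takes $N$ to be the first integer strictly above $\frac{17}{4\varepsilon}+\frac52$ (and at least $3$), which is one safe way to guarantee the strict inequality. The names `n_threshold` and `gap` are illustrative.

```python
from fractions import Fraction
from math import floor

def n_threshold(eps):
    """Smallest integer N >= 3 that is strictly greater than 17/(4 eps) + 5/2."""
    return max(3, floor(Fraction(17, 4) / eps + Fraction(5, 2)) + 1)

def gap(n):
    """Exact distance |(3n+1)/(2n-5) - 3/2|."""
    return abs(Fraction(3 * n + 1, 2 * n - 5) - Fraction(3, 2))

eps = Fraction(1, 10)
N = n_threshold(eps)
print(N)                      # 46
print(gap(45) < eps)          # False: at n = 45 the gap equals 1/10 exactly
print(all(gap(n) < eps for n in range(N, N + 500)))  # True
```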

(b) The function $f(x)=1/x^2$ is positive, continuous, and decreasing on $[1,\infty)$ . By the integral test:

\int_1^\infty \frac{1}{x^2}\,dx = \lim_{b\to\infty}\left[-\frac1x\right]_1^b = 1 < \infty.

Since the integral converges, the series $\sum_{n=1}^\infty \frac{1}{n^2}$ also converges.

Problems with Solutions

Problem. Prove using the definition that $\displaystyle\lim_{n\to\infty} \frac{1}{\sqrt{n}} = 0$ .

Solution. Let $\varepsilon>0$ . We need $N$ such that $n\ge N \implies |1/\sqrt{n}|<\varepsilon$ . Choose $N = \lfloor 1/\varepsilon^2 \rfloor + 1$ , so that $N > 1/\varepsilon^2$ . Then for $n\ge N$ , we have $\sqrt{n} > 1/\varepsilon$ , so $1/\sqrt{n} < \varepsilon$ . Hence the limit is $0$ .

Problem. Show that $f(x)=x^2$ is uniformly continuous on $[0,1]$ .

Solution. Let $\varepsilon>0$ . For any $x,y\in[0,1]$ , $|x^2-y^2| = |x+y||x-y| \le 2|x-y|$ . Choose $\delta = \varepsilon/2$ . Then $|x-y|<\delta \implies |x^2-y^2|<\varepsilon$ . Since $\delta$ does not depend on $x$ or $y$ , $f$ is uniformly continuous on $[0,1]$ .

Problem. Find the radius of convergence of the power series $\displaystyle\sum_{n=0}^\infty \frac{x^n}{n!}$ .

Solution. Using the ratio test:
$L = \lim_{n\to\infty} \left|\frac{x^{n+1}/(n+1)!}{x^n/n!}\right| = \lim_{n\to\infty} \frac{|x|}{n+1} = 0$
for all $x$ . Since $L<1$ for every $x\in\mathbb{R}$ , the radius of convergence is $R=\infty$ .
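An infinite radius of convergence means the partial sums settle down for every real $x$ , including large ones. The sketch below illustrates this by comparing partial sums of $\sum x^n/n!$ with `math.exp`; the function name `exp_partial` and the number of terms are illustrative choices.

```python
from math import exp

def exp_partial(x, terms):
    """Sum of the first `terms` terms of the series sum_{n>=0} x^n / n!."""
    total, term = 0.0, 1.0          # term holds x^n / n!, starting at n = 0
    for n in range(terms):
        total += term
        term *= x / (n + 1)         # ratio of consecutive terms is x / (n + 1)
    return total

for x in (1.0, 10.0):
    approx = exp_partial(x, 60)
    print(x, abs(approx - exp(x)) / exp(x))   # relative error near machine precision
```

The update `term *= x / (n + 1)` is exactly the ratio examined in the ratio test, which tends to $0$ for any fixed $x$ .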

Section summary. Real analysis places calculus on rigorous foundations using $\varepsilon$ - $\delta$ arguments. Key concepts: convergence of sequences and series (Cauchy criterion, Bolzano-Weierstrass), continuity (IVT, EVT, uniform continuity), Riemann integration (FTC, integrability conditions), and function space convergence (pointwise vs.\ uniform). This material develops the logical infrastructure underpinning all of calculus.

Algebra: Groups, Rings, Fields, and Symmetry

Core ideas

Abstract algebra studies algebraic structures. A group $(G,\ast)$ is a set with a binary operation satisfying: closure, associativity $(a\ast b)\ast c = a\ast(b\ast c)$ , identity $e\ast a = a\ast e = a$ , and inverses $a\ast a^{-1} = a^{-1}\ast a = e$ . A group is abelian if it is commutative ( $a\ast b = b\ast a$ ). Examples: $(\mathbb{Z},+)$ , $(\mathbb{R}\setminus\{0\},\times)$ , the symmetric group $S_n$ (permutations of $n$ elements), $GL_n(\mathbb{R})$ (invertible $n\times n$ matrices). A subgroup $H \le G$ is a nonempty subset closed under the operation and inverses. The order of a group is its cardinality; the order of element $a$ is the smallest positive integer $n$ with $a^n = e$ (or infinite if no such $n$ exists).

Example (Symmetric Group): $S_3$ is the group of permutations of $\{1,2,3\}$ . It has order $3! = 6$ . It is the smallest non-abelian group.

A homomorphism $\phi: G \to H$ preserves the operation: $\phi(a\ast b) = \phi(a)\ast\phi(b)$ . The kernel $\ker\phi = \{g \mid \phi(g) = e_H\}$ is a normal subgroup; the image $\operatorname{im}\phi$ is a subgroup of $H$ . An isomorphism is a bijective homomorphism; isomorphic groups are structurally identical. Cosets of subgroup $H$ are $gH = \{gh \mid h\in H\}$ ; they partition $G$ . Lagrange’s theorem: $|H|$ divides $|G|$ . A normal subgroup $N \trianglelefteq G$ satisfies $gNg^{-1} = N$ for all $g$ , allowing the quotient group $G/N$ to be formed. The First Isomorphism Theorem: $G/\ker\phi \cong \operatorname{im}\phi$ .

A ring $(R,+,\cdot)$ has two operations: $(R,+)$ is an abelian group, multiplication is associative and distributes over addition. Examples: $(\mathbb{Z},+,\cdot)$ , polynomial ring $\mathbb{R}[x]$ , matrix ring $M_n(\mathbb{R})$ . A field is a commutative ring where nonzero elements have multiplicative inverses. Examples: $\mathbb{Q},\mathbb{R},\mathbb{C},\mathbb{Z}_p$ for prime $p$ . Polynomials over a field form a Euclidean domain with division algorithm and unique factorization. The Fundamental Theorem of Algebra: every nonconstant polynomial in $\mathbb{C}[x]$ has a root in $\mathbb{C}$ .

Proof Sketch (Lagrange’s Theorem): Let $H$ be a subgroup of a finite group $G$ . The left cosets $gH$ partition $G$ into disjoint equivalence classes. For any $g$ , the map $h \mapsto gh$ is a bijection from $H$ to $gH$ , so all cosets have size $|H|$ . Since $G$ is partitioned into $[G:H]$ cosets of size $|H|$ , we have $|G| = [G:H]|H|$ , proving $|H|$ divides $|G|$ .

Group actions formalize symmetry. A group action $G \times X \to X$ satisfies $e\cdot x = x$ and $(gh)\cdot x = g\cdot(h\cdot x)$ . The orbit of $x$ is $Gx = \{gx \mid g\in G\}$ and its stabilizer is $G_x = \{g \mid gx = x\}$ ; the orbit-stabilizer theorem states that for finite $G$ , $|G| = |Gx|\cdot|G_x|$ . Applications include counting (Burnside’s lemma) and classifying symmetries of geometric objects.

Group: closure, associativity, identity, inverses; subgroup, cosets, Lagrange.
Homomorphism: kernel, image, isomorphism theorems; quotient groups.
Rings and fields: polynomial rings, Euclidean domain, $\mathbb{Z}_p$ .
Group actions: orbit, stabilizer, Burnside’s lemma.

For review, be able to: verify group axioms; find subgroups and cosets; compute in $S_n$ ; construct quotient groups; apply isomorphism theorems; factor polynomials over fields; use group actions to count configurations. Identify the group operation and identity, the kernel of a homomorphism, the field characteristic, and the orbit decomposition.

Mathematical spine

\phi(ab) = \phi(a)\phi(b),\qquad |G| = |Gx|\cdot|G_x|,\qquad G/\ker\phi \cong \operatorname{im}\phi

Example

Consider the additive group $\mathbb{Z}_{12} = \{0,1,2,\dots,11\}$ under addition modulo $12$ .

(a) Find all subgroups of $\mathbb{Z}_{12}$ .

(b) Let $H = \{0,4,8\}$ . Verify that $H$ is a subgroup and find its left cosets. Confirm Lagrange’s theorem.

Solution.

(a) The subgroups of a cyclic group $\mathbb{Z}_n$ correspond to the divisors of $n$ . The divisors of $12$ are $1,2,3,4,6,12$ . Hence the subgroups are:

$\langle 0 \rangle = \{0\}$ (order $1$ )
$\langle 6 \rangle = \{0,6\}$ (order $2$ )
$\langle 4 \rangle = \{0,4,8\}$ (order $3$ )
$\langle 3 \rangle = \{0,3,6,9\}$ (order $4$ )
$\langle 2 \rangle = \{0,2,4,6,8,10\}$ (order $6$ )
$\langle 1 \rangle = \mathbb{Z}_{12}$ (order $12$ )

(b) $H = \{0,4,8\}$ is closed under addition mod $12$ , contains $0$ , and contains inverses ( $-0=0$ , $-4=8$ , $-8=4$ ), so it is a subgroup. The left cosets are:

$0+H = \{0,4,8\}$
$1+H = \{1,5,9\}$
$2+H = \{2,6,10\}$
$3+H = \{3,7,11\}$

There are $[G:H]=4$ cosets, each of size $|H|=3$ . Indeed $|G| = 12 = 4\cdot 3 = [G:H]|H|$ , verifying Lagrange’s theorem.
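The coset computation above is small enough to enumerate by machine. The following sketch builds every coset $g+H$ in $\mathbb{Z}_{12}$ and collects the distinct ones; the variable names are illustrative.

```python
n = 12
H = frozenset({0, 4, 8})

# Form g + H for every g; a set of frozensets keeps only the distinct cosets.
cosets = {frozenset((g + h) % n for h in H) for g in range(n)}

for coset in sorted(cosets, key=min):
    print(sorted(coset))
# [0, 4, 8]
# [1, 5, 9]
# [2, 6, 10]
# [3, 7, 11]

# Lagrange: |G| = [G:H] * |H|
print(len(cosets) * len(H) == n)  # True
```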

Problems with Solutions

Problem. Show that the set $3\mathbb{Z} = \{3k \mid k \in \mathbb{Z}\}$ is a subgroup of $(\mathbb{Z}, +)$ .

Solution. $3\mathbb{Z}$ is nonempty ( $0\in 3\mathbb{Z}$ ). If $3a,3b\in 3\mathbb{Z}$ , then $3a+3b=3(a+b)\in 3\mathbb{Z}$ . The inverse of $3a$ is $-3a=3(-a)\in 3\mathbb{Z}$ . Thus $3\mathbb{Z}$ is a subgroup.

Problem. In the symmetric group $S_3$ , compute the product $(1\;2)(2\;3)$ and find its order.

Solution. Applying right to left: $1\mapsto 1\mapsto 2$ , $2\mapsto 3\mapsto 3$ , $3\mapsto 2\mapsto 1$ . Thus $(1\;2)(2\;3) = (1\;2\;3)$ , a 3-cycle. Its order is $3$ .

Problem. Factor the polynomial $x^4 - 1$ completely over $\mathbb{C}$ .

Solution. $x^4-1 = (x^2-1)(x^2+1) = (x-1)(x+1)(x-i)(x+i)$ .
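The permutation product from the $S_3$ problem above can also be verified mechanically. In this sketch permutations are stored as dictionaries and composed right to left, matching the convention used in that solution; `compose` and `order` are hypothetical helper names.

```python
def compose(p, q):
    """Return the permutation 'p after q': apply q first, then p."""
    return {x: p[q[x]] for x in q}

def order(p):
    """Smallest positive k such that p composed with itself k times is the identity."""
    identity = {x: x for x in p}
    k, power = 1, dict(p)
    while power != identity:
        power = compose(p, power)
        k += 1
    return k

t12 = {1: 2, 2: 1, 3: 3}      # the transposition (1 2)
t23 = {1: 1, 2: 3, 3: 2}      # the transposition (2 3)

product = compose(t12, t23)   # (1 2)(2 3), rightmost factor applied first
print(product)                # {1: 2, 2: 3, 3: 1}, i.e. the 3-cycle (1 2 3)
print(order(product))         # 3
```

Swapping the arguments gives `compose(t23, t12)`, a different 3-cycle, which illustrates that $S_3$ is non-abelian.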

Section summary. Algebra abstracts arithmetic to structures: groups (symmetry via invertible operations), rings (arithmetic-like structures with addition and multiplication), and fields (where division works). Key results include Lagrange’s theorem (divisibility of group orders), isomorphism theorems (structural relationships), and unique factorization in polynomial rings. Group actions connect algebra to geometry and combinatorics.

Topology: Connectedness, Compactness, and Invariants of Shape

Core ideas

Topology studies properties preserved under continuous deformations (stretching, bending, but not tearing or gluing). A topological space $(X,\mathcal{T})$ consists of a set $X$ and a collection $\mathcal{T}$ of open subsets satisfying: (i) $\varnothing, X \in \mathcal{T}$ ; (ii) arbitrary unions of open sets are open; (iii) finite intersections of open sets are open. The complement of an open set is closed. The metric topology is induced by a metric $d$ : open balls $B_r(x) = \{y \mid d(x,y) < r\}$ form a basis. $\mathbb{R}^n$ with the Euclidean metric is the motivating example.

A function $f: X \to Y$ between topological spaces is continuous if preimages of open sets are open (the topological definition). Equivalently for metric spaces, $x_n \to x \implies f(x_n) \to f(x)$ . A homeomorphism is a bijective continuous function with continuous inverse; homeomorphic spaces are topologically equivalent (they have the same “shape”). For example, a coffee mug and a donut (torus) are homeomorphic.

Connectedness: $X$ is connected if it cannot be written as a disjoint union of two nonempty open subsets. Equivalently, the only clopen (both open and closed) subsets are $\varnothing$ and $X$ . A subset of $\mathbb{R}$ is connected iff it is an interval. The continuous image of a connected set is connected (generalizing IVT). Path-connectedness (every pair of points joined by a continuous path) implies connectedness; the converse is false in general. A space is totally disconnected if its only connected subsets are singletons (e.g., $\mathbb{Q}$ ).

Compactness: $X$ is compact if every open cover has a finite subcover. In $\mathbb{R}^n$ , the Heine-Borel theorem says $K$ is compact iff it is closed and bounded. The continuous image of a compact set is compact. A continuous real-valued function on a nonempty compact set attains its maximum and minimum (Extreme Value Theorem). A continuous function from a compact metric space into a metric space is uniformly continuous. Sequential compactness (every sequence has a convergent subsequence) is equivalent to compactness in metric spaces.

Proof Sketch (Continuous image of compact set is compact): Let $f: X \to Y$ be continuous and $X$ compact. Consider an open cover $\{V_\alpha\}$ of $f(X)$ in $Y$ . Since $f$ is continuous, the preimages $\{f^{-1}(V_\alpha)\}$ form an open cover of $X$ . Because $X$ is compact, this has a finite subcover $f^{-1}(V_{\alpha_1}), \dots, f^{-1}(V_{\alpha_n})$ . The corresponding $\{V_{\alpha_1}, \dots, V_{\alpha_n}\}$ covers $f(X)$ . Thus, $f(X)$ is compact.

Homotopy and the fundamental group $\pi_1(X, x_0)$ provide algebraic invariants that help distinguish spaces up to homotopy equivalence. A homotopy between maps $f,g: X\to Y$ is a continuous $H: X\times[0,1] \to Y$ with $H(x,0)=f(x)$ , $H(x,1)=g(x)$ . Two spaces are homotopy equivalent if there exist maps $f: X\to Y$ , $g: Y\to X$ with $g\circ f \simeq \operatorname{id}_X$ and $f\circ g \simeq \operatorname{id}_Y$ . The fundamental group consists of homotopy classes of loops based at $x_0$ . For the circle $S^1$ , $\pi_1(S^1) \cong \mathbb{Z}$ ; for $S^n$ ( $n\ge 2$ ), $\pi_1(S^n)$ is trivial. Simply connected spaces are path-connected spaces with trivial fundamental group. The Euler characteristic $\chi = V - E + F$ for polyhedra is a topological invariant; for a sphere $\chi=2$ , for a torus $\chi=0$ .

Topological space: open sets, continuity, homeomorphism.
Connectedness: connected $\iff$ no nontrivial clopen sets; path-connectedness.
Compactness: Heine-Borel in $\mathbb{R}^n$ ; extreme value theorem; uniform continuity.
Algebraic topology: fundamental group $\pi_1$ , homotopy, Euler characteristic.

For review, be able to: determine if a set is open in a given topology; prove continuity via inverse images of open sets; check connectedness and path-connectedness; apply Heine-Borel for compactness arguments; compute fundamental groups of simple spaces; understand homotopy equivalence. Identify the open cover, the connected component, the compact subset, and the loop homotopy class.

Mathematical spine

f \text{ continuous } \iff f^{-1}(U) \text{ open for all open } U

\text{Compact: } \forall\text{ open cover } \{U_\alpha\},\; \exists\text{ finite subcover}

\pi_1(S^1) \cong \mathbb{Z},\qquad \chi = V - E + F

Example

Consider $\mathbb{R}$ with the standard metric $d(x,y)=|x-y|$ .

(a) Show that the interval $(0,1)$ is an open set.

(b) Show that $[0,1]$ is compact.

Solution.

(a) Let $x\in(0,1)$ . Choose $r = \min\{x, 1-x\} > 0$ . Then the open ball $B_r(x) = (x-r, x+r)$ is contained in $(0,1)$ . Since every point has an open neighborhood inside $(0,1)$ , the set is open.

(b) By the Heine-Borel theorem, a subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded. The set $[0,1]$ is bounded (contained in $B_2(0)$ ) and closed (its complement $(-\infty,0)\cup(1,\infty)$ is open). Therefore $[0,1]$ is compact.

Problems with Solutions

Problem. In $\mathbb{R}^2$ with the Euclidean metric, show that the open ball $B_1(0,0)$ is an open set.

Solution. Let $\mathbf{p}\in B_1(0,0)$ , so $\|\mathbf{p}\|<1$ . Let $r = 1-\|\mathbf{p}\|>0$ . For any $\mathbf{q}\in B_r(\mathbf{p})$ , the triangle inequality gives $\|\mathbf{q}\| \le \|\mathbf{q}-\mathbf{p}\| + \|\mathbf{p}\| < r + \|\mathbf{p}\| = 1$ . Hence $\mathbf{q}\in B_1(0,0)$ , proving the ball is open.

Problem. Prove that the continuous image of a connected set is connected. Use this to show there is no continuous surjection $f: [0,1] \to \{0,1\}$ (where $\{0,1\}$ has the discrete topology).

Solution. Let $f: X\to Y$ be continuous and $X$ connected. Suppose $f(X)=A\cup B$ with $A,B$ disjoint nonempty open sets in $f(X)$ . Then $f^{-1}(A)$ and $f^{-1}(B)$ are disjoint nonempty open sets in $X$ whose union is $X$ , contradicting connectedness. Thus $f(X)$ is connected. The space $[0,1]$ is connected, but $\{0,1\}$ is disconnected. If a continuous surjection existed, $\{0,1\}$ would be the continuous image of a connected set and hence connected, a contradiction.

Problem. Show that $\mathbb{Q}$ is not compact in $\mathbb{R}$ .

Solution. The set $\mathbb{Q}$ is not closed in $\mathbb{R}$ (its closure is $\mathbb{R}$ ). By the Heine-Borel theorem, a subset of $\mathbb{R}$ is compact iff it is closed and bounded. Since $\mathbb{Q}$ is not closed, it is not compact. (Alternatively, the open cover $\{(-\infty,\sqrt{2}-\frac1n)\cup(\sqrt{2}+\frac1n,\infty)\}_{n\in\mathbb{N}}$ has no finite subcover.)

Section summary. Topology abstracts notions of proximity and shape. A topological space defines continuity via open sets. Connectedness captures “being all in one piece”; compactness generalizes finiteness and guarantees extreme values and uniform continuity. Algebraic topology uses the fundamental group and Euler characteristic to classify spaces up to deformation, providing invariants that distinguish spheres, tori, and other manifolds.

Numerical Mathematics: Stability, Matrix Decompositions, and Optimization

Core ideas

Numerical mathematics develops algorithms for continuous mathematical problems with guaranteed accuracy. Conditioning measures how much a problem’s output changes relative to input perturbations. The relative condition number is $\kappa = \sup_{\delta x} (\|\delta f\|/\|f\|) / (\|\delta x\|/\|x\|)$ , taken over small perturbations $\delta x$ ; problems with large $\kappa$ are ill-conditioned. An algorithm is backward stable if its computed output is the exact output for a nearly correct input; the weaker notion of (mixed) stability asks only for nearly the exact output for a nearly correct input. Writing $y = f(x)$ for the exact output and $\tilde{y}$ for the computed one, the forward error is $\|\tilde{y} - y\|$ and the backward error is the smallest $\|\Delta x\|$ such that $f(x+\Delta x) = \tilde{y}$ .

Example (Catastrophic Cancellation): Consider $f(x) = \sqrt{x+1} - \sqrt{x}$ for large $x$ . If $x=10^8$ , $\sqrt{x+1} \approx 10000.00005$ and $\sqrt{x} = 10000$ . Subtracting them loses significant digits. Rewriting as $f(x) = \frac{1}{\sqrt{x+1} + \sqrt{x}}$ completely avoids cancellation, demonstrating how algebraic manipulation improves numerical stability.
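The effect is easy to observe in double precision. The sketch below evaluates both forms at $x = 10^8$ and measures their relative errors against a 50-digit reference computed with Python's `decimal` module; the function names and the choice of reference precision are assumptions made for this illustration.

```python
from decimal import Decimal, getcontext
from math import sqrt

getcontext().prec = 50  # digits used for the high-precision reference value

def naive(x):
    """Direct formula: subtracts two nearly equal square roots."""
    return sqrt(x + 1) - sqrt(x)

def stable(x):
    """Algebraically equivalent form with no subtraction."""
    return 1.0 / (sqrt(x + 1) + sqrt(x))

def relative_error(approx, exact):
    """Relative error of a float against a Decimal reference."""
    return float(abs(Decimal(approx) - exact) / exact)

x = 10**8
exact = 1 / (Decimal(x + 1).sqrt() + Decimal(x).sqrt())

naive_err = relative_error(naive(x), exact)
stable_err = relative_error(stable(x), exact)
print(naive_err, stable_err)  # the naive error is many orders of magnitude larger
```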

Floating-point arithmetic approximates real numbers with finite precision: machine epsilon is $\epsilon_{\text{mach}} = 2^{-52} \approx 2.22\times10^{-16}$ in double precision. Rounding to nearest produces a relative error of at most $\epsilon_{\text{mach}}/2$ . Catastrophic cancellation occurs when subtracting nearly equal numbers. Algorithms should avoid such cancellation and use numerically stable formulas.

Matrix decompositions are the workhorses of numerical linear algebra. LU decomposition $PA = LU$ (permutation, lower triangular, upper triangular) solves linear systems $Ax = b$ : the factorization costs $O(n^3)$ operations, after which each right-hand side is handled by forward/backward substitution in $O(n^2)$ . Cholesky decomposition $A = LL^T$ applies to symmetric positive definite matrices and needs roughly half the operations of LU. QR decomposition $A = QR$ (orthogonal $Q$ , upper triangular $R$ ) provides stable least-squares solutions. Singular value decomposition $A = U\Sigma V^T$ gives the optimal low-rank approximation (Eckart-Young theorem): truncating to rank $k$ minimizes $\|A - A_k\|_F$ over rank- $k$ matrices. Eigenvalue computation uses the QR algorithm (iterative $QR$ on shifted $A$ ).

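Iterative methods solve large sparse systems. Writing $A = D + L + U$ (diagonal, strictly lower, and strictly upper parts, not to be confused with the LU factors), Jacobi iteration is $x^{(k+1)} = D^{-1}(b - (L+U)x^{(k)})$ ; Gauss-Seidel uses the latest components as soon as they are computed; the conjugate gradient method for SPD matrices minimizes the error in the energy norm over a growing Krylov subspace. The condition number of $A$ affects the convergence rate.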
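The Jacobi update can be sketched componentwise: each unknown is recomputed from the previous iterate using only the diagonal entry of its own row. The function name `jacobi`, the tolerance, and the iteration cap below are illustrative assumptions; the test system is the $2\times2$ system $2x + y = 5$ , $x + 3y = 8$ , which is strictly diagonally dominant, so the iteration converges.

```python
def jacobi(a, b, x0, tol=1e-12, max_iter=500):
    """Jacobi iteration for a x = b; returns (solution, iterations used)."""
    n = len(b)
    x = list(x0)
    for k in range(1, max_iter + 1):
        # Every component uses only values from the previous iterate x.
        x_new = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")

A = [[2.0, 1.0], [1.0, 3.0]]
b = [5.0, 8.0]
x, iterations = jacobi(A, b, [0.0, 0.0])
print(x, iterations)
```

Replacing `x[j]` by the already updated component whenever `j < i` would turn this sketch into Gauss-Seidel.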

Numerical optimization finds minima of functions. Gradient descent $x_{k+1} = x_k - \alpha \nabla f(x_k)$ converges linearly for smooth, strongly convex problems, at a rate that degrades as the condition number of the Hessian grows. Newton’s method $x_{k+1} = x_k - H_f^{-1}(x_k)\nabla f(x_k)$ converges quadratically near the solution but requires solving a linear system with the Hessian at every step. Quasi-Newton methods (BFGS) build an approximation to the Hessian from successive gradients. For constrained optimization, Lagrange multipliers and KKT conditions characterize optimality. Linear programming minimizes $c^Tx$ subject to $Ax \le b$ , $x \ge 0$ via the simplex method or interior-point methods.
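The difference between linear and quadratic convergence shows up clearly in one dimension. The sketch below uses the hypothetical test function $f(x) = e^x - 2x$ , whose minimizer is $x^* = \ln 2$ ; the step size, tolerance, and starting point are likewise assumptions chosen for illustration.

```python
import math

def grad(x):
    # f'(x) for the assumed test function f(x) = exp(x) - 2x
    return math.exp(x) - 2.0

def hess(x):
    # f''(x), which is positive everywhere, so f is convex
    return math.exp(x)

def gradient_descent(x, alpha=0.25, tol=1e-10, max_iter=10_000):
    for k in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, k
        x -= alpha * g
    raise RuntimeError("gradient descent did not converge")

def newton(x, tol=1e-10, max_iter=100):
    for k in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, k
        x -= g / hess(x)  # in 1D the Hessian solve is a single division
    raise RuntimeError("Newton's method did not converge")

x_gd, steps_gd = gradient_descent(0.0)
x_nt, steps_nt = newton(0.0)
print(f"gradient descent: x={x_gd:.12f} after {steps_gd} steps")
print(f"Newton:           x={x_nt:.12f} after {steps_nt} steps")
```

Both runs reach $\ln 2$ to the requested tolerance, but Newton's method needs far fewer steps because its error is roughly squared at each iteration once it is close to the minimizer.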

Numerical integration (quadrature) approximates $\int_a^b f(x)\,dx$ . For sufficiently smooth $f$ and subinterval width $h$ , the composite trapezoidal rule has error $O(h^2)$ , composite Simpson’s rule $O(h^4)$ , and composite $n$ -point Gaussian quadrature $O(h^{2n})$ . Adaptive quadrature refines subintervals where the function varies rapidly. Differential equation solvers include Euler’s method (first-order) and Runge-Kutta methods (classical RK4 is fourth-order). Stiff equations require implicit methods (backward Euler, BDF) for stability.
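The need for implicit methods on stiff problems can be illustrated with the scalar test equation $y' = \lambda y$ . The sketch below assumes $\lambda = -50$ , step size $h = 0.1$ , and ten steps; these values are hypothetical and chosen so that the explicit method is outside its stability region.

```python
LAMBDA = -50.0  # assumed stiff decay rate for the test equation y' = LAMBDA * y

def forward_euler(y, h, steps):
    # Explicit update: y_new = y + h * LAMBDA * y
    for _ in range(steps):
        y = y + h * LAMBDA * y
    return y

def backward_euler(y, h, steps):
    # Implicit update y_new = y + h * LAMBDA * y_new, solved in closed form.
    for _ in range(steps):
        y = y / (1.0 - h * LAMBDA)
    return y

y_explicit = forward_euler(1.0, 0.1, 10)
y_implicit = backward_euler(1.0, 0.1, 10)
print(y_explicit, y_implicit)
```

With these values each forward Euler step multiplies the solution by $1 + h\lambda = -4$ , so the iterates grow in magnitude although the true solution $e^{\lambda t}$ decays. Each backward Euler step multiplies by $1/(1 - h\lambda) = 1/6$ , so the numerical solution decays for any positive step size.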

Stability: condition number, forward/backward error, machine epsilon.
Decompositions: LU, Cholesky, QR, SVD; QR algorithm for eigenvalues.
Iterative methods: Jacobi, Gauss-Seidel, conjugate gradient.
Optimization: gradient descent, Newton, BFGS, KKT, linear programming.
Quadrature & ODEs: trapezoidal, Simpson, RK4, adaptive methods.

For review, be able to: compute condition number and interpret it; perform and apply LU, QR, and SVD factorizations; derive convergence criteria for iterative solvers; set up gradient descent and Newton iterations; understand error bounds for quadrature rules; identify stiffness in ODEs. Identify the source of ill-conditioning, the best matrix factorization for the problem, the convergence rate of iterative methods, and the dominant error term in approximation.

Mathematical spine

$A = LU,\qquad A = QR,\qquad A = U\Sigma V^T$

$x_{k+1} = x_k - \alpha \nabla f(x_k),\qquad x_{k+1} = x_k - H_f^{-1}(x_k)\nabla f(x_k)$

Example

Solve the linear system

$\begin{cases} 2x + y = 5, \\ x + 3y = 8 \end{cases}$

using LU decomposition (without pivoting).

Solution. Write $A = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$ . We seek $L = \begin{pmatrix} 1 & 0 \\ l_{21} & 1 \end{pmatrix}$ and $U = \begin{pmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{pmatrix}$ such that $A=LU$ . From the first row: $u_{11}=2$ , $u_{12}=1$ . From the second row: $l_{21}u_{11}=1 \implies l_{21}=1/2$ , and $l_{21}u_{12}+u_{22}=3 \implies u_{22}=3-1/2=5/2$ . Thus

$L = \begin{pmatrix} 1 & 0 \\ 1/2 & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} 2 & 1 \\ 0 & 5/2 \end{pmatrix}.$

Solve $L\mathbf{c}=\mathbf{b}$ with $\mathbf{b}=(5,8)^T$ :

$c_1 = 5$
$\frac12 c_1 + c_2 = 8 \implies c_2 = 8 - \frac52 = \frac{11}{2}$ .

Solve $U\mathbf{x}=\mathbf{c}$ :

$\frac52 y = \frac{11}{2} \implies y = \frac{11}{5}$
$2x + y = 5 \implies x = \frac{5-y}{2} = \frac{5-\frac{11}{5}}{2} = \frac{\frac{14}{5}}{2} = \frac{7}{5}$ .

The solution is $x = 7/5$ , $y = 11/5$ .
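The same computation can be sketched in code for a general $n \times n$ matrix. This is a minimal Doolittle factorization without pivoting, matching the hand calculation above; the function names `lu_decompose` and `lu_solve` are illustrative, and a production solver would use partial pivoting.

```python
def lu_decompose(a):
    """Doolittle LU without pivoting: returns (L, U) with unit-diagonal L."""
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in a]  # work on a copy so the input is not modified
    for k in range(n):
        if U[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; pivoting would be required")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]  # elimination multiplier
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def lu_solve(L, U, b):
    n = len(b)
    # Forward substitution: L c = b
    c = [0.0] * n
    for i in range(n):
        c[i] = b[i] - sum(L[i][j] * c[j] for j in range(i))
    # Backward substitution: U x = c
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

L, U = lu_decompose([[2.0, 1.0], [1.0, 3.0]])
x = lu_solve(L, U, [5.0, 8.0])
print(L, U, x)
```

Once `L` and `U` are stored, further right-hand sides can be solved by calling `lu_solve` again without repeating the factorization.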

Problems with Solutions

Problem. Estimate the condition number of $f(x) = \sqrt{x}$ at $x=100$ using the relative condition number formula $\kappa = |xf'(x)/f(x)|$ .

Solution. We have $f'(x) = \frac{1}{2\sqrt{x}}$ . Then
$\kappa = \left|\frac{x\cdot \frac{1}{2\sqrt{x}}}{\sqrt{x}}\right| = \frac{x}{2x} = \frac12.$
The problem is well-conditioned.

Problem. Perform one step of gradient descent on $f(x) = x^2 + 2x + 3$ starting from $x_0=1$ with step size $\alpha=0.25$ .

Solution. The gradient is $f'(x)=2x+2$ . At $x_0=1$ , $f'(1)=4$ .
$x_1 = x_0 - \alpha f'(x_0) = 1 - 0.25(4) = 0.$
The new iterate is $x_1 = 0$ .

Problem. Approximate $\displaystyle\int_0^1 e^{-x^2}\,dx$ using the Trapezoidal rule with $n=2$ subintervals.

Solution. The nodes are $x_0=0$ , $x_1=0.5$ , $x_2=1$ with $h=0.5$ . The Trapezoidal rule gives
$T_2 = \frac{h}{2}\left[f(0) + 2f(0.5) + f(1)\right] = \frac{0.5}{2}\left[1 + 2e^{-0.25} + e^{-1}\right] \approx 0.25\left[1 + 2(0.7788) + 0.3679\right] \approx 0.25(2.9255) \approx 0.7314.$
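The hand computation can be checked with a short composite trapezoidal routine. The sketch below is illustrative: the names `trapezoid` and `g` are assumptions, and the reference value uses the identity $\int_0^1 e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}\,\mathrm{erf}(1)$ .

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

def g(x):
    return math.exp(-x * x)

exact = 0.5 * math.sqrt(math.pi) * math.erf(1.0)
t2 = trapezoid(g, 0.0, 1.0, 2)
t4 = trapezoid(g, 0.0, 1.0, 4)

print(f"T_2 = {t2:.4f}, error = {abs(t2 - exact):.5f}")
print(f"T_4 = {t4:.4f}, error = {abs(t4 - exact):.5f}")
print(f"error ratio = {abs(t2 - exact) / abs(t4 - exact):.2f}")
```

Doubling the number of subintervals reduces the error by a factor close to 4, consistent with the $O(h^2)$ error of the trapezoidal rule.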

Section summary. Numerical mathematics bridges continuous mathematics and computational implementation. Key themes: conditioning and stability ensure reliable computation; matrix decompositions (LU, QR, SVD) provide the algorithmic backbone for linear algebra; iterative methods and optimization algorithms solve large-scale problems; quadrature and ODE solvers approximate continuous processes. The interplay of mathematical insight and algorithmic design produces efficient, accurate, and stable numerical methods.