Skip to content
Miko-pedia Miko-pedia
← マップに戻る
数学 · AIのノート · 学部 · #math

Mathematics

A concise guide to mathematics.

· 33 分

このノートの改善点や、新しく欲しいノートを提案できます。

PDFを見る

Overview. This complete note provides a unified undergraduate review of core mathematical topics, from logic and discrete structures through calculus, linear algebra, probability, and modern analysis. Each section is designed to stand alone for exam review: definitions, key theorems, representative examples, and concise summaries are included throughout. The sections follow a natural progression from foundational reasoning to advanced topics such as topology and numerical methods. Use this overview to locate weak areas and target your study efficiently.

The Language of Mathematics: Propositions, Sets, Functions, and Proofs

Core ideas

Mathematics rests on a rigorous foundation of logic and set theory. A proposition is a declarative statement that is unambiguously either true or false. Propositions are combined using logical connectives: ¬\neg (negation), \land (conjunction), \lor (disjunction),     \implies (implication), and     \iff (biconditional). An implication P    QP \implies Q is vacuously true if PP is false. A predicate P(x)P(x) is a statement whose truth depends on a variable xx, quantified by \forall (for all'') or $\exists$ (there exists”). De Morgan’s laws for quantifiers state ¬(xP(x))    x¬P(x)\neg(\forall x\,P(x)) \iff \exists x\,\neg P(x) and ¬(xP(x))    x¬P(x)\neg(\exists x\,P(x)) \iff \forall x\,\neg P(x).

A set is a collection of distinct objects. Basic operations include union ABA \cup B, intersection ABA \cap B, set difference ABA \setminus B, and the power set P(A)\mathcal{P}(A) (the set of all subsets of AA). The Cartesian product A×B={(a,b)aA,  bB}A \times B = \{(a,b) \mid a \in A,\; b \in B\} produces ordered pairs. A relation on AA and BB is a subset RA×BR \subseteq A \times B. An equivalence relation \sim on AA is a relation that is reflexive (aaa \sim a), symmetric (ab    baa \sim b \implies b \sim a), and transitive (abbc    aca \sim b \land b \sim c \implies a \sim c). It partitions AA into disjoint equivalence classes [a]={xAxa}[a] = \{x \in A \mid x \sim a\}.

A function f:ABf : A \to B is a relation where each aAa \in A is paired with exactly one bBb \in B. A function is injective (one-to-one) if f(a1)=f(a2)    a1=a2f(a_1) = f(a_2) \implies a_1 = a_2; surjective (onto) if bB,aA\forall b \in B, \exists a \in A with f(a)=bf(a)=b; and bijective if both hold. Bijections establish that two sets have the same cardinality A=B|A| = |B|. A set is countable if it is in bijection with a subset of N\mathbb{N}. Cantor’s Theorem states that A<P(A)|A| < |\mathcal{P}(A)|, proving the existence of uncountably infinite sets (e.g., R\mathbb{R}).

Proof techniques are the tools of mathematical reasoning: Direct proof: assume hypotheses, deduce conclusion via logical steps. Proof by contrapositive: prove P    QP \implies Q by showing ¬Q    ¬P\neg Q \implies \neg P. Proof by contradiction: assume the negation of the desired statement and derive a logical impossibility. Mathematical induction: prove P(1)P(1) (base case) and P(k)    P(k+1)P(k) \implies P(k+1) (inductive step) for all kNk \in \mathbb{N}.

  • Logic: connectives, quantifiers, De Morgan’s laws.
  • Set Theory: operations, relations, equivalence classes, power sets, cardinality.
  • Functions: injectivity, surjectivity, bijections, and inverse functions.
  • Proof methods: direct, contrapositive, contradiction, induction.

Mathematical spine

Truth table for Implication: PQP    QTTTTFFFTTFFT\text{Truth table for Implication: } \begin{array}{c|c|c} P & Q & P\implies Q \\ \hline T & T & T \\ T & F & F \\ F & T & T \\ F & F & T \end{array} De Morgan: ¬(PQ)    ¬P¬Q,¬(PQ)    ¬P¬Q\text{De Morgan: } \neg(P\land Q) \iff \neg P \lor \neg Q,\qquad \neg(P\lor Q) \iff \neg P \land \neg Q Equivalence relation: aa,ab    ba,abbc    ac\text{Equivalence relation: } a \sim a,\quad a \sim b \implies b \sim a,\quad a \sim b \land b \sim c \implies a \sim c

Example

Consider the function f:RRf: \mathbb{R} \to \mathbb{R} defined by f(x)=2x+3f(x) = 2x + 3.

  1. Injectivity: Suppose f(a)=f(b)f(a) = f(b). Then 2a+3=2b+32a + 3 = 2b + 3, which simplifies to a=ba = b. Thus ff is injective.
  2. Surjectivity: For any yRy \in \mathbb{R}, set x=(y3)/2x = (y-3)/2. Then f(x)=2((y3)/2)+3=yf(x) = 2((y-3)/2) + 3 = y. Thus ff is surjective.
  3. Inverse: Since ff is bijective, it has an inverse f1(y)=(y3)/2f^{-1}(y) = (y-3)/2.

Next, prove by induction that k=1n(2k1)=n2\sum_{k=1}^n (2k-1) = n^2 for all nNn \in \mathbb{N}.

  • Base case (n=1n=1): LHS =2(1)1=1= 2(1)-1 = 1, RHS =12=1= 1^2 = 1.
  • Inductive step: Assume k=1m(2k1)=m2\sum_{k=1}^m (2k-1) = m^2. Then k=1m+1(2k1)=m2+(2(m+1)1)=m2+2m+1=(m+1)2.\sum_{k=1}^{m+1} (2k-1) = m^2 + (2(m+1)-1) = m^2 + 2m + 1 = (m+1)^2. Thus the formula holds for all nNn \in \mathbb{N}.

Problems with Solutions

  1. Problem. Prove by contrapositive: If n2n^2 is even, then nn is even.

    Solution. We prove ¬Q    ¬P\neg Q \implies \neg P: if nn is odd, then n2n^2 is odd. If nn is odd, n=2k+1n=2k+1 for some kZk \in \mathbb{Z}. Then n2=4k2+4k+1=2(2k2+2k)+1n^2 = 4k^2+4k+1 = 2(2k^2+2k)+1, which is odd. Hence the original statement is true.

  2. Problem. Let A={1,2,3}A = \{1,2,3\} and B={2,3,4}B = \{2,3,4\}. Compute ABA \cup B, ABA \cap B, ABA \setminus B, and A×B|A \times B|.

    Solution. AB={1,2,3,4}A \cup B = \{1,2,3,4\}, AB={2,3}A \cap B = \{2,3\}, AB={1}A \setminus B = \{1\}. The Cartesian product has AB=34=12|A|\cdot|B| = 3 \cdot 4 = 12 elements.

  3. Problem. Use induction to prove that k=1nk=n(n+1)2\sum_{k=1}^n k = \frac{n(n+1)}{2} for all n1n \ge 1.

    Solution. Base case n=1n=1: 1=1(2)/21 = 1(2)/2. Assume true for n=mn=m. Then

    k=1m+1k=m(m+1)2+(m+1)=m(m+1)+2(m+1)2=(m+1)(m+2)2.\sum_{k=1}^{m+1} k = \frac{m(m+1)}{2} + (m+1) = \frac{m(m+1)+2(m+1)}{2} = \frac{(m+1)(m+2)}{2}.

    Thus the statement holds for all n1n \ge 1.

Section summary. The language of mathematics is built upon formal logic and set theory. Propositions, predicates, and quantifiers provide the syntax for rigorous statements. Sets, relations, and functions offer the foundational structures for modeling mathematical objects. Mastery of proof techniques---including induction, contrapositive, and contradiction---is required to rigorously verify properties of these structures.

Discrete Structures: Induction, Combinatorics, Graphs, and Algorithms

Core ideas

Discrete mathematics studies countable, distinct structures. Mathematical induction (weak and strong) is the standard proof technique for statements concerning natural numbers. Recursion defines sequences or sets in terms of previous elements, often solvable via generating functions (e.g., representing sequence {an}\{a_n\} as a formal power series anxn\sum a_n x^n) or characteristic equations.

Combinatorics counts finite arrangements. The multiplication principle: if task AA has mm ways and task BB has nn ways, the pair (A,B)(A,B) has mnm \cdot n ways. A permutation is an ordered arrangement: P(n,k)=n!/(nk)!P(n,k) = n!/(n-k)!. A combination is an unordered selection: (nk)=n!/(k!(nk)!)\binom{n}{k} = n!/(k!(n-k)!). The Binomial Theorem states (x+y)n=k=0n(nk)xnkyk(x+y)^n = \sum_{k=0}^n \binom{n}{k} x^{n-k} y^k. The Pigeonhole Principle guarantees that placing n>mn > m items into mm boxes results in at least one box with 2\ge 2 items. The Principle of Inclusion-Exclusion (PIE) computes the union of overlapping sets: AB=A+BAB|A \cup B| = |A| + |B| - |A \cap B|, expanding to alternating sums for more sets.

A graph G=(V,E)G = (V,E) consists of vertices VV and edges EE. The Handshaking Lemma states vVdeg(v)=2E\sum_{v \in V} \deg(v) = 2|E|. A tree is a connected acyclic graph on nn vertices, having exactly n1n-1 edges. A graph is bipartite if its vertices can be partitioned into two sets where all edges connect the two sets; equivalently, it contains no odd cycles. An Eulerian circuit traverses every edge exactly once (exists iff all vertices have even degrees and GG is connected). A Hamiltonian cycle visits every vertex exactly once. For connected planar graphs (drawable without edge crossings), Euler’s Formula states VE+F=2V - E + F = 2, where FF is the number of faces. Graph coloring assigns colors to vertices so adjacent vertices differ; the chromatic number χ(G)\chi(G) is the minimum colors needed. By the Four Color Theorem, χ(G)4\chi(G) \le 4 for planar graphs.

Algorithms provide step-by-step procedures to solve discrete problems. Big-O notation, f(n)=O(g(n))f(n) = O(g(n)), describes worst-case asymptotic upper bounds: f(n)cg(n)|f(n)| \le c|g(n)| for sufficiently large nn. The Euclidean algorithm computes gcd(a,b)\gcd(a,b) in O(log(min(a,b)))O(\log(\min(a,b))) steps. Graph algorithms include Breadth-first search (BFS) for shortest paths in unweighted graphs, Depth-first search (DFS) for topological sorting and cycle detection, and Dijkstra’s algorithm for shortest paths with non-negative weights.

  • Counting: permutations, combinations, Pigeonhole Principle, PIE, generating functions.
  • Graphs: trees, bipartite graphs, Eulerian/Hamiltonian paths, planar graphs (Euler’s formula).
  • Algorithms: BFS, DFS, Dijkstra, Euclidean algorithm, Big-O complexity.
  • Induction: strong induction, structural induction.

Mathematical spine

(nk)=n!k!(nk)!,(x+y)n=k=0n(nk)xnkyk\binom{n}{k} = \frac{n!}{k!(n-k)!},\qquad (x+y)^n = \sum_{k=0}^n \binom{n}{k} x^{n-k} y^k ABC=A+B+CABACBC+ABC|A \cup B \cup C| = |A|+|B|+|C| - |A\cap B| - |A\cap C| - |B\cap C| + |A\cap B\cap C| vVdeg(v)=2E,VE+F=2 (planar graphs)\sum_{v\in V}\deg(v) = 2|E|,\qquad V - E + F = 2 \text{ (planar graphs)}

Example

A computer password must be exactly 66 characters long and is formed using uppercase letters (2626 choices) and digits (1010 choices). A valid password must contain at least one digit. How many valid passwords exist?

Solution. The total number of 66-character strings is 36636^6. The number of strings with no digits (only letters) is 26626^6. By the subtraction principle, the number of valid passwords is

366266=2176782336308915776=1867866560.36^6 - 26^6 = 2\,176\,782\,336 - 308\,915\,776 = 1\,867\,866\,560.

Now consider a simple graph GG with V={A,B,C,D}V = \{A,B,C,D\} and E={AB,BC,CD,DA,AC}E = \{AB, BC, CD, DA, AC\}. Does GG contain an Eulerian circuit?

Solution. The degrees are deg(A)=3\deg(A)=3, deg(B)=2\deg(B)=2, deg(C)=3\deg(C)=3, deg(D)=2\deg(D)=2. Since vertices AA and CC have odd degree, the graph does not have an Eulerian circuit. (An Eulerian circuit requires all vertices to have even degree.)

Problems with Solutions

  1. Problem. A password is 44 characters long, chosen from 2626 letters and 1010 digits. How many such passwords contain at least one digit?

    Solution. Total strings: 36436^4. Strings with no digits: 26426^4. Valid passwords: 364264=1679616456976=122264036^4 - 26^4 = 1\,679\,616 - 456\,976 = 1\,222\,640.

  2. Problem. Compute (73)\binom{7}{3}.

    Solution.

    (73)=7!3!4!=765321=35.\binom{7}{3} = \frac{7!}{3!\,4!} = \frac{7\cdot 6\cdot 5}{3\cdot 2\cdot 1} = 35.
  3. Problem. A graph has 55 vertices with degrees 2,3,3,3,32, 3, 3, 3, 3. How many edges does it have?

    Solution. By the Handshaking Lemma, deg(v)=2E\sum \deg(v) = 2|E|. The sum of degrees is 2+3+3+3+3=142+3+3+3+3 = 14. Hence 2E=142|E|=14, so E=7|E|=7.

Section summary. Discrete structures encompass combinatorial counting (permutations, combinations, inclusion-exclusion), graph theory (trees, planarity, colorings), and algorithmic analysis (Big-O notation, graph traversal). Mastery of these principles is critical for analyzing finite structures, solving recurrence relations, and establishing computational complexity.

Single Variable Calculus: Change, Approximation, and Integration

Core ideas

Calculus studies continuous change and accumulation. The limit limxaf(x)=L\lim_{x\to a} f(x) = L means f(x)f(x) approaches LL as xx approaches aa. A function is continuous at aa if limxaf(x)=f(a)\lim_{x\to a} f(x) = f(a). For sequences, limnan=L\lim_{n \to \infty} a_n = L means terms become arbitrarily close to LL as nn grows. The Intermediate Value Theorem states: if ff is continuous on [a,b][a,b], it takes on every value between f(a)f(a) and f(b)f(b).

The derivative measures the instantaneous rate of change:

f(a)=limh0f(a+h)f(a)h.f'(a) = \lim_{h\to 0} \frac{f(a+h) - f(a)}{h}.

Differentiability implies continuity. Key differentiation rules: product rule (fg)=fg+fg(fg)' = f'g + fg'; quotient rule (f/g)=(fgfg)/g2(f/g)' = (f'g - fg')/g^2; chain rule (fg)(x)=f(g(x))g(x)(f\circ g)'(x) = f'(g(x))\,g'(x). The Mean Value Theorem (MVT): if ff is continuous on [a,b][a,b] and differentiable on (a,b)(a,b), then f(c)=(f(b)f(a))/(ba)f'(c) = (f(b)-f(a))/(b-a) for some c(a,b)c\in(a,b). Applications include optimization (critical points where f(x)=0f'(x)=0), and L’H^opital’s rule for limits of indeterminate forms like 0/00/0. Functions can be locally approximated by polynomials. Taylor’s Theorem states f(x)=k=0nf(k)(a)k!(xa)k+Rn(x)f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x-a)^k + R_n(x), where the remainder Rn(x)=f(n+1)(c)(n+1)!(xa)n+1R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1} for some cc between aa and xx (Lagrange form).

Integration calculates accumulation. The definite integral abf(x)dx\int_a^b f(x)\,dx is rigorously defined as the limit of Riemann sums limni=1nf(xi)Δx\lim_{n \to \infty} \sum_{i=1}^n f(x_i^*)\Delta x as the partition width Δx0\Delta x \to 0. It represents the signed area under the curve. The Fundamental Theorem of Calculus (FTC) bridges derivatives and integrals. Part 1: if ff is continuous, ddxaxf(t)dt=f(x)\frac{d}{dx}\int_a^x f(t)\,dt = f(x). Part 2: abf(x)dx=F(b)F(a)\int_a^b f(x)\,dx = F(b) - F(a) where F(x)=f(x)F'(x) = f(x). Integration techniques include substitution (u=g(x)u = g(x)), integration by parts (udv=uvvdu\int u\,dv = uv - \int v\,du), and partial fractions.

  • Limits & Continuity: one-sided limits, sequence limits, IVT.
  • Derivatives: rules, MVT, Taylor series with remainder, optimization.
  • Integrals: Riemann sums, definite/indefinite integrals, FTC.
  • Techniques: substitution, integration by parts, L’H^opital’s rule.

Mathematical spine

f(a)=limh0f(a+h)f(a)h,abf(x)dx=limni=1nf(xi)Δx\boxed{f'(a) = \lim_{h\to 0}\frac{f(a+h)-f(a)}{h}},\qquad \boxed{\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^n f(x_i^*)\Delta x} abf(x)dx=F(b)F(a),ddxaxf(t)dt=f(x)\int_a^b f(x)\,dx = F(b)-F(a),\qquad \frac{d}{dx}\int_a^x f(t)\,dt = f(x) f(x)=f(a)+f(a)(xa)+f(a)2(xa)2++f(n)(a)n!(xa)n+Rn(x)f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2}(x-a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x-a)^n + R_n(x)

Example

Approximate 1.1\sqrt{1.1} using the second-order Taylor polynomial for f(x)=xf(x)=\sqrt{x} centered at a=1a=1, and estimate the error.

Solution. We have f(x)=x1/2f(x)=x^{1/2}, f(x)=12x1/2f'(x)=\frac{1}{2}x^{-1/2}, f(x)=14x3/2f''(x)=-\frac{1}{4}x^{-3/2}, and f(x)=38x5/2f'''(x)=\frac{3}{8}x^{-5/2}. At a=1a=1: f(1)=1f(1)=1, f(1)=12f'(1)=\frac12, f(1)=14f''(1)=-\frac14. The second-order Taylor polynomial is

P2(x)=1+12(x1)18(x1)2.P_2(x) = 1 + \frac12(x-1) - \frac18(x-1)^2.

Evaluating at x=1.1x=1.1:

P2(1.1)=1+12(0.1)18(0.01)=1+0.050.00125=1.04875.P_2(1.1) = 1 + \frac12(0.1) - \frac18(0.01) = 1 + 0.05 - 0.00125 = 1.04875.

The exact value is 1.11.0488088\sqrt{1.1}\approx 1.0488088, so the error is about 5.9×1055.9\times10^{-5}. Using the Lagrange remainder:

R2(1.1)=f(c)3!(0.1)3=3/86c5/2(0.001)6.25×105c5/2R_2(1.1) = \frac{f'''(c)}{3!}(0.1)^3 = \frac{3/8}{6} c^{-5/2}(0.001) \approx 6.25\times10^{-5}\,c^{-5/2}

for some c(1,1.1)c\in(1,1.1). Since c1c\ge 1, R26.25×105|R_2|\le 6.25\times10^{-5}, consistent with the actual error.

Problems with Solutions

  1. Problem. Evaluate limx0sinxxx3\displaystyle\lim_{x\to 0} \frac{\sin x - x}{x^3} using L’Hôpital’s rule.

    Solution. The limit is 0/00/0. Applying L’Hôpital three times:

    limx0sinxxx3=limx0cosx13x2=limx0sinx6x=limx0cosx6=16.\lim_{x\to 0}\frac{\sin x - x}{x^3} = \lim_{x\to 0}\frac{\cos x - 1}{3x^2} = \lim_{x\to 0}\frac{-\sin x}{6x} = \lim_{x\to 0}\frac{-\cos x}{6} = -\frac16.
  2. Problem. Find the derivative of f(x)=ex2lnxf(x) = e^{x^2}\ln x.

    Solution. Using the product and chain rules:

    f(x)=(2xex2)lnx+ex21x=ex2(2xlnx+1x).f'(x) = (2x e^{x^2})\ln x + e^{x^2}\cdot\frac1x = e^{x^2}\left(2x\ln x + \frac1x\right).
  3. Problem. Compute 0π/2xsinxdx\displaystyle\int_0^{\pi/2} x\sin x\,dx using integration by parts.

    Solution. Let u=xu=x, dv=sinxdxdv=\sin x\,dx. Then du=dxdu=dx, v=cosxv=-\cos x.

    0π/2xsinxdx=[xcosx]0π/2+0π/2cosxdx=(00)+[sinx]0π/2=1.\int_0^{\pi/2} x\sin x\,dx = \Bigl[-x\cos x\Bigr]_0^{\pi/2} + \int_0^{\pi/2}\cos x\,dx = (0-0) + \Bigl[\sin x\Bigr]_0^{\pi/2} = 1.

Section summary. Single-variable calculus formalizes continuous change and area. Limits define both the derivative (instantaneous rate of change) and the integral (accumulation via Riemann sums). Taylor series extend linear approximation to polynomial approximation. The Fundamental Theorem of Calculus profoundly unites these two branches, enabling analytical evaluation of integrals and solving differential equations.

Linear Algebra: Spaces, Maps, Eigenvalues, and Projections

Core ideas

Linear algebra studies vector spaces and linear transformations. A vector space VV over a field F\mathbb{F} is a set closed under vector addition and scalar multiplication. A subspace UVU \subseteq V is a nonempty subset closed under these operations. The span of vectors {v1,,vk}\{v_1,\dots,v_k\} is the set of all linear combinations. A set is linearly independent if civi=0\sum c_i v_i = \mathbf{0} implies all ci=0c_i = 0. A basis is a linearly independent spanning set; its size is the dimension dimV\dim V. The dual space VV^* is the vector space of all linear maps (functionals) from VV to F\mathbb{F}.

A linear transformation T:VWT: V \to W preserves operations. The kernel (null space) kerT\ker T and image (range) imT\operatorname{im} T are subspaces. The rank-nullity theorem states dimkerT+dimimT=dimV\dim\ker T + \dim\operatorname{im}T = \dim V. Every linear map between finite-dimensional spaces can be represented by a matrix. The determinant detA\det A of a square matrix measures the volume scaling factor; AA is invertible iff detA0\det A \neq 0.

An eigenvector v0v \neq \mathbf{0} of AA satisfies Av=λvAv = \lambda v, where λ\lambda is the eigenvalue. Eigenvalues are roots of the characteristic polynomial pA(λ)=det(AλI)p_A(\lambda) = \det(A - \lambda I). The Cayley-Hamilton Theorem states that every square matrix satisfies its own characteristic equation: pA(A)=0p_A(A) = 0. A matrix is diagonalizable if it has a basis of eigenvectors. The Spectral Theorem guarantees every real symmetric matrix is orthogonally diagonalizable with real eigenvalues. If a matrix is not diagonalizable, it can be put into Jordan canonical form. The Singular Value Decomposition (SVD) factorizes any ARm×nA \in \mathbb{R}^{m\times n} as A=UΣVTA = U\Sigma V^T, with U,VU, V orthogonal and Σ\Sigma diagonal.

An inner product space is a vector space with an inner product x,y\langle x,y \rangle (generalizing the dot product). It induces a norm x=x,x\|x\| = \sqrt{\langle x,x \rangle} and a notion of orthogonality (x,y=0\langle x,y \rangle = 0). Orthogonal projection onto a subspace UU minimizes the distance to UU. The Gram-Schmidt process converts any basis into an orthonormal basis. In systems Ax=bAx = b with no exact solution, the least squares solution minimizes Axb2\|Ax-b\|^2 via the normal equations ATAx=ATbA^T A x = A^T b.

  • Vector space: basis, dimension, dual space, linear independence.
  • Linear maps: rank-nullity theorem, matrix representation, determinants.
  • Eigen-theory: characteristic polynomial, Cayley-Hamilton, Spectral Theorem, SVD.
  • Inner products: Gram-Schmidt, orthogonal projections, least squares.

Mathematical spine

det(AλI)=0,pA(A)=0 (Cayley-Hamilton)\det(A-\lambda I)=0,\qquad p_A(A) = 0 \text{ (Cayley-Hamilton)} A=UΣVT,ATAx^=ATb (Least Squares)A = U\Sigma V^T,\qquad A^TA\hat{x} = A^T b \text{ (Least Squares)} projU(v)=i=1kv,uiui (for orthonormal {ui})\operatorname{proj}_U(v) = \sum_{i=1}^k \langle v,u_i\rangle u_i \text{ (for orthonormal } \{u_i\})

Example

Find the eigenvalues and eigenvectors of A=(4213)A = \begin{pmatrix} 4 & 2 \\ 1 & 3 \end{pmatrix}. Determine whether AA is diagonalizable.

Solution. The characteristic polynomial is

det(AλI)=(4λ)(3λ)2=λ27λ+10=(λ5)(λ2).\det(A-\lambda I) = (4-\lambda)(3-\lambda) - 2 = \lambda^2 - 7\lambda + 10 = (\lambda-5)(\lambda-2).

Thus the eigenvalues are λ1=5\lambda_1 = 5 and λ2=2\lambda_2 = 2.

For λ1=5\lambda_1 = 5: solve (A5I)v=0(A-5I)\mathbf{v} = \mathbf{0}:

(1212)(v1v2)=0    v1=2v2.\begin{pmatrix} -1 & 2 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \mathbf{0} \implies v_1 = 2v_2.

An eigenvector is v1=(2,1)T\mathbf{v}_1 = (2,1)^T.

For λ2=2\lambda_2 = 2: solve (A2I)v=0(A-2I)\mathbf{v} = \mathbf{0}:

(2211)(v1v2)=0    v1=v2.\begin{pmatrix} 2 & 2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \mathbf{0} \implies v_1 = -v_2.

An eigenvector is v2=(1,1)T\mathbf{v}_2 = (1,-1)^T.

Since we found two linearly independent eigenvectors, AA is diagonalizable: A=PDP1A = PDP^{-1} with P=(2111)P = \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix} and D=(5002)D = \begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix}.

Problems with Solutions

  1. Problem. Determine whether the vectors v1=(1,2,3)\mathbf{v}_1 = (1,2,3), v2=(0,1,2)\mathbf{v}_2 = (0,1,2), v3=(1,3,5)\mathbf{v}_3 = (1,3,5) are linearly independent.

    Solution. We check if c1v1+c2v2+c3v3=0c_1\mathbf{v}_1+c_2\mathbf{v}_2+c_3\mathbf{v}_3=\mathbf{0} has only the trivial solution. Note that v3=v1+v2\mathbf{v}_3 = \mathbf{v}_1+\mathbf{v}_2. Hence 1v1+1v21v3=01\cdot\mathbf{v}_1 + 1\cdot\mathbf{v}_2 - 1\cdot\mathbf{v}_3 = \mathbf{0}, a nontrivial relation. The vectors are linearly dependent.

  2. Problem. Compute the determinant of B=(123045006)B = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{pmatrix}.

    Solution. For an upper triangular matrix, the determinant is the product of the diagonal entries:

    det(B)=146=24.\det(B) = 1\cdot 4\cdot 6 = 24.
  3. Problem. Find the least-squares solution to Ax=bA\mathbf{x} = \mathbf{b} where A=(111213)A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} and b=(235)\mathbf{b} = \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}.

    Solution. The normal equations are ATAx^=ATbA^TA\hat{\mathbf{x}} = A^T\mathbf{b}.

    ATA=(36614),ATb=(1023).A^TA = \begin{pmatrix} 3 & 6 \\ 6 & 14 \end{pmatrix}, \qquad A^T\mathbf{b} = \begin{pmatrix} 10 \\ 23 \end{pmatrix}.

    Solving (36614)(x1x2)=(1023)\begin{pmatrix} 3 & 6 \\ 6 & 14 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 10 \\ 23 \end{pmatrix}: from the first equation 3x1+6x2=10    x1=106x233x_1+6x_2=10 \implies x_1 = \frac{10-6x_2}{3}. Substituting into the second: 6(106x23)+14x2=23    2012x2+14x2=23    2x2=3    x2=1.56(\frac{10-6x_2}{3}) + 14x_2 = 23 \implies 20 - 12x_2 + 14x_2 = 23 \implies 2x_2 = 3 \implies x_2 = 1.5. Then x1=1093=13x_1 = \frac{10-9}{3} = \frac13. The least-squares solution is x^=(1/33/2)\hat{\mathbf{x}} = \begin{pmatrix} 1/3 \\ 3/2 \end{pmatrix}.

Section summary. Linear algebra provides the framework for analyzing flat spaces and linear mappings. Core topics include dimension and basis (coordinates), the rank-nullity theorem, and determinants. Eigenvalues, diagonalization, and the Cayley-Hamilton theorem analyze matrix powers and structure, while inner products formalize geometry (orthogonality and projection), forming the bedrock for fields ranging from quantum mechanics to machine learning.

Multivariable Analysis: Gradients, Multiple Integrals, and Vector Fields

Core ideas

Multivariable calculus extends the concepts of limits, derivatives, and integrals to functions of several variables. For a scalar function f:RnRf: \mathbb{R}^n \to \mathbb{R}, the partial derivative f/xi\partial f/\partial x_i represents the rate of change of ff in the direction of the xix_i-axis, holding all other variables constant. Clairaut’s Theorem ensures that if the mixed second-order partial derivatives are continuous, they are equal: 2f/xixj=2f/xjxi\partial^2 f/\partial x_i \partial x_j = \partial^2 f/\partial x_j \partial x_i.

The gradient vector, denoted f=(fx1,,fxn)\nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right), is central to optimization because it points in the direction of steepest ascent. Furthermore, the gradient is everywhere orthogonal to the level sets of ff. The directional derivative Duf=fuD_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} computes the rate of change of ff in the direction of an arbitrary unit vector u\mathbf{u}.

For a vector-valued function f:RnRmf: \mathbb{R}^n \to \mathbb{R}^m, the Jacobian matrix JfJ_f linearly approximates ff. Its entries are (Jf)ij=fi/xj(J_f)_{ij} = \partial f_i/\partial x_j. The multivariable chain rule is concisely expressed as matrix multiplication: Jfg(x)=Jf(g(x))Jg(x)J_{f\circ g}(x) = J_f(g(x)) J_g(x). The Inverse Function Theorem states that if Jf(a)J_f(a) is a square matrix and invertible (i.e., detJf(a)0\det J_f(a) \neq 0), then ff is locally a bijection near aa. Similarly, the Implicit Function Theorem guarantees that an equation F(x,y)=0F(x,y)=0 locally defines yy as a differentiable function of xx provided the partial derivative with respect to yy is non-zero.

For optimization, critical points occur where f=0\nabla f = \mathbf{0}. We classify these points using the symmetric Hessian matrix HfH_f of second partial derivatives. If HfH_f is positive definite, the point is a local minimum; if negative definite, a local maximum; if indefinite, a saddle point. Lagrange multipliers provide a method for finding extrema of f(x)f(x) subject to constraints g(x)=0g(x)=0 by solving the system f=λg\nabla f = \lambda \nabla g and g(x)=0g(x)=0.

Multiple integrals generalize the definite integral to compute areas, volumes, and hypervolumes. When changing variables from (x,y)(x,y) to (u,v)(u,v), the area element is scaled by the absolute value of the determinant of the Jacobian: Rf(x,y)dxdy=Sf(u,v)detJdudv\iint_R f(x,y)\,dx\,dy = \iint_S f(u,v)\,|\det J|\,du\,dv. Canonical coordinate systems simplify integration over symmetric domains:

  • Polar coordinates (2D): x=rcosθ,y=rsinθ    dA=rdrdθx = r\cos\theta, y = r\sin\theta \implies dA = r\,dr\,d\theta.
  • Cylindrical coordinates (3D): x=rcosθ,y=rsinθ,z=z    dV=rdrdθdzx = r\cos\theta, y = r\sin\theta, z = z \implies dV = r\,dr\,d\theta\,dz.
  • Spherical coordinates (3D): x=ρsinϕcosθ,y=ρsinϕsinθ,z=ρcosϕ    dV=ρ2sinϕdρdϕdθx = \rho\sin\phi\cos\theta, y = \rho\sin\phi\sin\theta, z = \rho\cos\phi \implies dV = \rho^2\sin\phi\,d\rho\,d\phi\,d\theta.

A vector field F:RnRn\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n assigns a vector to every point in space. Two fundamental differential operators act on 3D vector fields:

  • Divergence (F\nabla\cdot\mathbf{F}): A scalar measuring the local outward flux or “source/sink” nature of the field.
  • Curl (×F\nabla\times\mathbf{F}): A vector measuring the local rotation or “swirl” of the field.

A line integral CFdr\int_C \mathbf{F}\cdot d\mathbf{r} accumulates the tangential component of F\mathbf{F} along a curve CC, computing the total work done by a force field. A vector field is conservative if it is the gradient of a scalar potential (F=ϕ\mathbf{F} = \nabla\phi). For conservative fields, ×F=0\nabla \times \mathbf{F} = \mathbf{0} (on simply connected domains), and the line integral is path-independent: CFdr=ϕ(B)ϕ(A)\int_C \mathbf{F}\cdot d\mathbf{r} = \phi(B) - \phi(A).

The core of vector calculus is a trio of theorems that are special cases of the generalized Stokes’ Theorem on manifolds (which links the integral of a differential form over a boundary to the integral of its exterior derivative over the interior):

  • Green’s Theorem (2D): Relates a line integral around a simple closed curve to a double integral over the enclosed plane region: CFdr=D(×F)kdA\oint_C \mathbf{F}\cdot d\mathbf{r} = \iint_D (\nabla\times\mathbf{F})\cdot\mathbf{k}\,dA.
  • Stokes’ Theorem (3D): Relates the surface integral of the curl of a field to the line integral around the surface’s boundary: SFdr=S(×F)dS\oint_{\partial S} \mathbf{F}\cdot d\mathbf{r} = \iint_S (\nabla\times\mathbf{F})\cdot d\mathbf{S}.
  • Divergence Theorem (Gauss’s Theorem, 3D): Relates the flux of a field through a closed surface to the integral of its divergence over the enclosed volume: VFdS=V(F)dV\iint_{\partial V} \mathbf{F}\cdot d\mathbf{S} = \iiint_V (\nabla\cdot\mathbf{F})\,dV.
  • Derivatives: Gradient, Jacobian, Hessian, Inverse/Implicit Function Theorems, Clairaut’s Theorem.
  • Optimization: Critical points, second derivative test (Hessian), Lagrange multipliers.
  • Integration: Multiple integrals, change of variables (Jacobian determinant), coordinate systems.
  • Vector Calculus: Divergence, curl, conservative fields, Green’s, Stokes’, and Divergence theorems.

Example (Green’s Theorem for Area). We can use Green’s theorem to compute the area of a region DD. Let C=DC = \partial D be traversed counterclockwise. If we choose a field F=(y,x)/2\mathbf{F}=(-y, x)/2, its 2D curl is 12(x(x)y(y))=1\frac{1}{2}(\partial_x(x) - \partial_y(-y)) = 1. By Green’s theorem, Area(D)=D1dA=12C(xdyydx)\text{Area}(D) = \iint_D 1 \, dA = \frac{1}{2} \oint_C (x\,dy - y\,dx).

Mathematical spine

Jfg(x)=Jf(g(x))Jg(x),dxdy=det(x,y)(u,v)dudvJ_{f\circ g}(x) = J_f(g(x)) J_g(x),\qquad dx\,dy = \left| \det \frac{\partial(x,y)}{\partial(u,v)} \right| du\,dv f=λg (Lagrange)\nabla f = \lambda \nabla g \text{ (Lagrange)} SFdr=S(×F)dS,VFdS=V(F)dV\oint_{\partial S} \mathbf{F}\cdot d\mathbf{r} = \iint_S (\nabla\times\mathbf{F})\cdot d\mathbf{S},\qquad \iint_{\partial V} \mathbf{F}\cdot d\mathbf{S} = \iiint_V (\nabla\cdot\mathbf{F})\,dV

Example

Let f(x,y)=x2y+y3f(x,y) = x^2y + y^3.

(a) Compute the gradient and the directional derivative at (1,2)(1,2) in the direction of u=(35,45)\mathbf{u} = \left(\frac35, \frac45\right).

(b) Evaluate R(x2+y2)dA\iint_R (x^2+y^2)\,dA where RR is the disk x2+y24x^2+y^2 \le 4.

Solution.

(a) The partial derivatives are f/x=2xy\partial f/\partial x = 2xy and f/y=x2+3y2\partial f/\partial y = x^2 + 3y^2. At (1,2)(1,2): f(1,2)=(4,13)\nabla f(1,2) = (4, 13). The directional derivative is

Duf(1,2)=f(1,2)u=435+1345=12+525=645=12.8.D_{\mathbf{u}}f(1,2) = \nabla f(1,2)\cdot \mathbf{u} = 4\cdot\frac35 + 13\cdot\frac45 = \frac{12+52}{5} = \frac{64}{5} = 12.8.

(b) In polar coordinates, x2+y2=r2x^2+y^2 = r^2 and dA=rdrdθdA = r\,dr\,d\theta. The region RR is 0r20\le r\le 2, 0θ2π0\le\theta\le 2\pi.

R(x2+y2)dA=02π02r2rdrdθ=2π02r3dr=2π[r44]02=2π4=8π.\iint_R (x^2+y^2)\,dA = \int_0^{2\pi}\int_0^2 r^2\cdot r\,dr\,d\theta = 2\pi \int_0^2 r^3\,dr = 2\pi\left[\frac{r^4}{4}\right]_0^2 = 2\pi\cdot 4 = 8\pi.

Problems with Solutions

  1. Problem. Find and classify the critical points of f(x,y)=x3+y33xyf(x,y) = x^3 + y^3 - 3xy.

    Solution. Set f=(3x23y,3y23x)=(0,0)\nabla f = (3x^2-3y, 3y^2-3x) = (0,0). This gives y=x2y=x^2 and x=y2x=y^2. Substituting yields x=x4x=x^4, so x=0x=0 or x=1x=1. The critical points are (0,0)(0,0) and (1,1)(1,1). The Hessian is H=(6x336y)H = \begin{pmatrix} 6x & -3 \\ -3 & 6y \end{pmatrix}. At (0,0)(0,0): H=(0330)H = \begin{pmatrix} 0 & -3 \\ -3 & 0 \end{pmatrix}, detH=9<0\det H = -9 < 0, so (0,0)(0,0) is a saddle point. At (1,1)(1,1): H=(6336)H = \begin{pmatrix} 6 & -3 \\ -3 & 6 \end{pmatrix}, detH=27>0\det H = 27 > 0 and trH=12>0\operatorname{tr} H = 12 > 0, so (1,1)(1,1) is a local minimum.

  2. Problem. Use Green’s Theorem to evaluate C(x2y)dx+(y2+x)dy\oint_C (x^2-y)\,dx + (y^2+x)\,dy where CC is the circle x2+y2=4x^2+y^2=4 oriented counterclockwise.

    Solution. Green’s Theorem gives

    CPdx+Qdy=D(QxPy)dA=D(1(1))dA=2DdA.\oint_C P\,dx+Q\,dy = \iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA = \iint_D (1 - (-1))\,dA = 2\iint_D dA.

    The disk DD has area π(2)2=4π\pi(2)^2 = 4\pi. Hence the integral equals 24π=8π2\cdot 4\pi = 8\pi.

  3. Problem. Evaluate EzdV\iiint_E z\,dV where EE is the solid bounded by x2+y2+z29x^2+y^2+z^2 \le 9 in the first octant using spherical coordinates.

    Solution. In spherical coordinates, z=ρcosϕz = \rho\cos\phi, dV=ρ2sinϕdρdϕdθdV = \rho^2\sin\phi\,d\rho\,d\phi\,d\theta, and the first octant means 0θπ/20\le\theta\le\pi/2, 0ϕπ/20\le\phi\le\pi/2, 0ρ30\le\rho\le 3.

    EzdV=0π/20π/203(ρcosϕ)ρ2sinϕdρdϕdθ=π20π/2cosϕsinϕdϕ03ρ3dρ.\iiint_E z\,dV = \int_0^{\pi/2}\int_0^{\pi/2}\int_0^3 (\rho\cos\phi)\,\rho^2\sin\phi\,d\rho\,d\phi\,d\theta = \frac{\pi}{2} \int_0^{\pi/2}\cos\phi\sin\phi\,d\phi \int_0^3 \rho^3\,d\rho.

    The ϕ\phi-integral is 12\frac12, and the ρ\rho-integral is 814\frac{81}{4}. Thus the value is π212814=81π16\frac{\pi}{2}\cdot\frac12\cdot\frac{81}{4} = \frac{81\pi}{16}.

Section summary. Multivariable analysis translates single-variable calculus to higher dimensions. The Jacobian and Hessian matrices capture linear and quadratic approximations, enabling optimization and the Implicit/Inverse Function Theorems. Multiple integration relies on the Jacobian determinant for coordinate transformations. The crowning achievements are the integral theorems of Green, Stokes, and Gauss, which unify geometry and calculus by proving that the aggregate behavior of a field within a domain is determined entirely by its behavior on the domain’s boundary.

Probability: Conditional Probability, Random Variables, and Distributions

Core ideas

Probability mathematically quantifies uncertainty. A sample space Ω\Omega is the set of all outcomes. An event AΩA \subseteq \Omega is a subset. A probability measure Pr\Pr satisfies Kolmogorov’s axioms: Pr(Ω)=1\Pr(\Omega)=1, 0Pr(A)10 \le \Pr(A) \le 1, and Pr(iAi)=iPr(Ai)\Pr(\bigcup_i A_i) = \sum_i \Pr(A_i) for mutually exclusive events.

Conditional probability Pr(AB)=Pr(AB)/Pr(B)\Pr(A \mid B) = \Pr(A\cap B)/\Pr(B) updates belief given BB. Events AA and BB are independent if Pr(AB)=Pr(A)Pr(B)\Pr(A\cap B) = \Pr(A)\Pr(B). Bayes’ theorem, Pr(AB)=Pr(BA)Pr(A)/Pr(B)\Pr(A\mid B) = \Pr(B\mid A)\Pr(A)/\Pr(B), reverses the conditioning. The law of total probability states Pr(B)=iPr(BAi)Pr(Ai)\Pr(B) = \sum_i \Pr(B\mid A_i)\Pr(A_i) for a partition {Ai}\{A_i\}.

Example (Bayes’ Theorem): Consider a disease affecting 1%1\% of a population. A test is 99%99\% accurate. If a person tests positive, the probability they actually have the disease is 0.99×0.010.99×0.01+0.01×0.99=0.5\frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = 0.5. This demonstrates the base rate fallacy, highlighting the importance of the prior probability.

A random variable X:ΩRX: \Omega \to \mathbb{R} assigns numbers to outcomes. Its cumulative distribution function (CDF) is FX(x)=Pr(Xx)F_X(x) = \Pr(X \le x). Discrete XX has a probability mass function (PMF) pX(x)p_X(x); continuous XX has a probability density function (PDF) fX(x)f_X(x) such that fX(x)dx=1\int f_X(x)dx = 1. Expected value E[X]=xp(x)E[X] = \sum x p(x) or xf(x)dx\int x f(x)\,dx. The variance Var(X)=E[(Xμ)2]=E[X2]μ2\operatorname{Var}(X) = E[(X-\mu)^2] = E[X^2] - \mu^2, with standard deviation σX=Var(X)\sigma_X = \sqrt{\operatorname{Var}(X)}.

For multiple random variables, the joint distribution captures their simultaneous behavior. Marginal distributions are obtained by summing or integrating out other variables (e.g., fX(x)=fX,Y(x,y)dyf_X(x) = \int f_{X,Y}(x,y)\,dy). Covariance measures joint linear variation: Cov(X,Y)=E[XY]E[X]E[Y]\operatorname{Cov}(X,Y) = E[XY]-E[X]E[Y].

Key distributions: Binomial(n,p)(n,p) models successes in nn independent trials. Poisson(λ)(\lambda) models rare events in time/space. Normal(μ,σ2)(\mu,\sigma^2) is the symmetric bell curve f(x)exp((xμ)2/(2σ2))f(x) \propto \exp(-(x-\mu)^2/(2\sigma^2)). Exponential(λ)(\lambda) models memoryless waiting times.

Limit theorems describe large-scale behavior. Markov’s and Chebyshev’s inequalities bound tail probabilities: Pr(Xμkσ)1/k2\Pr(|X-\mu| \ge k\sigma) \le 1/k^2. The Law of Large Numbers (Weak and Strong versions) guarantees that the sample mean Xˉnμ\bar{X}_n \to \mu as nn \to \infty. The Central Limit Theorem (CLT) states that the sum of nn i.i.d.\ variables with mean μ\mu and variance σ2\sigma^2 converges in distribution to Normal(nμ,nσ2)(n\mu, n\sigma^2).

Proof Sketch (Chebyshev’s Inequality): By definition, Var(X)=(xμ)2f(x)dxxμkσ(xμ)2f(x)dx(kσ)2xμkσf(x)dx=k2σ2Pr(Xμkσ)\operatorname{Var}(X) = \int (x-\mu)^2 f(x) dx \ge \int_{|x-\mu| \ge k\sigma} (x-\mu)^2 f(x) dx \ge (k\sigma)^2 \int_{|x-\mu| \ge k\sigma} f(x) dx = k^2\sigma^2 \Pr(|X-\mu| \ge k\sigma). Dividing by k2σ2k^2\sigma^2 yields the result.

  • Axioms & Bayes: Conditional probability, independence, Bayes’ rule.
  • Random variables: PMF/PDF, joint/marginal distributions, expectation, variance.
  • Distributions: Binomial, Poisson, Normal, Exponential.
  • Limit theorems: Chebyshev’s inequality, Law of Large Numbers, CLT.

Mathematical spine

Pr(AB)=Pr(BA)Pr(A)Pr(B),Var(X)=E[X2](E[X])2\Pr(A\mid B) = \frac{\Pr(B\mid A)\Pr(A)}{\Pr(B)},\qquad \operatorname{Var}(X) = E[X^2] - (E[X])^2 Pr(Xμε)σ2ε2 (Chebyshev)\Pr(|X-\mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2} \text{ (Chebyshev)} n(Xˉnμσ)dN(0,1) (Central Limit Theorem)\sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sigma}\right) \xrightarrow{d} N(0,1) \text{ (Central Limit Theorem)}

Example

A fair six-sided die is rolled once. Let XX be the outcome.

(a) Compute the expected value E[X]E[X].

(b) Compute the variance Var(X)\operatorname{Var}(X).

Solution.

(a) Each outcome k{1,,6}k\in\{1,\dots,6\} has probability 1/61/6. Thus

E[X]=k=16k16=1+2+3+4+5+66=216=3.5.E[X] = \sum_{k=1}^6 k\cdot\frac16 = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5.

(b) First compute E[X2]=k=16k216=1+4+9+16+25+366=916E[X^2] = \sum_{k=1}^6 k^2\cdot\frac16 = \frac{1+4+9+16+25+36}{6} = \frac{91}{6}. Then

Var(X)=E[X2](E[X])2=916(216)2=91644136=54644136=10536=35122.917.\operatorname{Var}(X) = E[X^2] - (E[X])^2 = \frac{91}{6} - \left(\frac{21}{6}\right)^2 = \frac{91}{6} - \frac{441}{36} = \frac{546-441}{36} = \frac{105}{36} = \frac{35}{12} \approx 2.917.

Problems with Solutions

  1. Problem. A bag contains 33 red and 77 blue marbles. Two marbles are drawn without replacement. Find the probability that both are red.

    Solution.

    P(both red)=31029=690=115.P(\text{both red}) = \frac{3}{10}\cdot\frac{2}{9} = \frac{6}{90} = \frac{1}{15}.
  2. Problem. Let XBinomial(n=5,p=0.3)X \sim \operatorname{Binomial}(n=5, p=0.3). Compute P(X=2)P(X=2).

    Solution.

    P(X=2)=(52)(0.3)2(0.7)3=100.090.343=0.3087.P(X=2) = \binom{5}{2}(0.3)^2(0.7)^3 = 10\cdot 0.09\cdot 0.343 = 0.3087.
  3. Problem. The lifetime of a device is exponentially distributed with mean 55 years. Find the probability it lasts more than 88 years.

    Solution. For an exponential distribution, λ=1/5\lambda = 1/5. The survival function is P(X>x)=eλxP(X>x) = e^{-\lambda x}. Hence

    P(X>8)=e8/5=e1.60.2019.P(X>8) = e^{-8/5} = e^{-1.6} \approx 0.2019.

Section summary. Probability grounds uncertainty in rigorous measure theory. Core concepts include conditional updates (Bayes’ rule) and distributions of random variables (joint and marginal). The field culminates in the Law of Large Numbers, which justifies expected values as long-run averages, and the Central Limit Theorem, which explains the universality of the Normal distribution.

Statistics: Estimation, Testing, Regression, and Inference

Core ideas

Statistics uses data to make inferences about populations. A parameter is a numerical characteristic of a population; a statistic is a numerical characteristic of a sample. Point estimation produces a single best guess; interval estimation gives a plausible range. Maximum likelihood estimation (MLE) chooses θ\theta maximizing the likelihood L(θ)=i=1nf(xiθ)L(\theta) = \prod_{i=1}^n f(x_i \mid \theta). For example, the MLE of the mean μ\mu for Normal data is μ^=xˉ=1nxi\hat\mu = \bar{x} = \frac1n\sum x_i.

Example (MLE for Poisson): Let X1,,XnPoisson(λ)X_1, \dots, X_n \sim \operatorname{Poisson}(\lambda). The likelihood is L(λ)=eλλxixi!L(\lambda) = \prod \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}. The log-likelihood is (λ)=nλ+(xi)lnλln(xi!)\ell(\lambda) = -n\lambda + (\sum x_i)\ln\lambda - \sum \ln(x_i!). Setting ddλ=n+xiλ=0\frac{d\ell}{d\lambda} = -n + \frac{\sum x_i}{\lambda} = 0 yields the MLE λ^=xˉ\hat\lambda = \bar{x}.

The sampling distribution of a statistic is its probability distribution over repeated samples. The standard error is the standard deviation of the sampling distribution. For i.i.d.\ data with variance σ2\sigma^2, SE(Xˉ)=σ/n\operatorname{SE}(\bar{X}) = \sigma/\sqrt{n}. A confidence interval for the mean when σ\sigma is known: xˉ±zα/2σ/n\bar{x} \pm z_{\alpha/2}\cdot\sigma/\sqrt{n}, where zα/2z_{\alpha/2} is the 1α/21-\alpha/2 quantile of the standard normal. When σ\sigma is unknown, use the tt-distribution: xˉ±tα/2,n1s/n\bar{x} \pm t_{\alpha/2,\,n-1}\cdot s/\sqrt{n} where ss is the sample standard deviation.

Hypothesis testing assesses evidence against a null hypothesis H0H_0. The pp-value is Pr(observed or more extreme dataH0)\Pr(\text{observed or more extreme data} \mid H_0). If p<αp < \alpha (significance level), reject H0H_0. A Type I error rejects a true H0H_0; Type II error fails to reject a false H0H_0. The zz-test for a mean: z=(xˉμ0)/(σ/n)z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n}). The tt-test uses t=(xˉμ0)/(s/n)t = (\bar{x} - \mu_0)/(s/\sqrt{n}).

Linear regression models Y=β0+β1X+εY = \beta_0 + \beta_1 X + \varepsilon, with εN(0,σ2)\varepsilon \sim N(0,\sigma^2) i.i.d. The least squares estimates minimize (yiβ0β1xi)2\sum (y_i - \beta_0 - \beta_1 x_i)^2: β^1=(xixˉ)(yiyˉ)(xixˉ)2\hat\beta_1 = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}, β^0=yˉβ^1xˉ\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x}. The R2R^2 statistic measures the proportion of variance explained. Multiple regression extends to pp predictors: Y=Xβ+ε\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, with β^=(XTX)1XTY\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}.

Proof Sketch (OLS Estimator): The objective function is S(β)=(YXβ)T(YXβ)=YTY2βTXTY+βTXTXβS(\boldsymbol{\beta}) = (\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta}) = \mathbf{Y}^T\mathbf{Y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{Y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}. Taking the gradient with respect to β\boldsymbol{\beta} and setting it to zero gives 2XTY+2XTXβ=0-2\mathbf{X}^T\mathbf{Y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = 0, leading to the normal equations XTXβ=XTY\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{Y}. Assuming XTX\mathbf{X}^T\mathbf{X} is invertible, we obtain β^=(XTX)1XTY\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}.

Bayesian inference treats parameters as random variables with prior π(θ)\pi(\theta) and posterior π(θx)f(xθ)π(θ)\pi(\theta \mid \mathbf{x}) \propto f(\mathbf{x} \mid \theta)\,\pi(\theta). The posterior mean serves as a Bayesian estimator, and credible intervals are the Bayesian analog of confidence intervals.

  • Estimation: MLE, confidence intervals for mean (zz and tt).
  • Testing: null/alternative, pp-value, zz-test, tt-test, Type I/II errors.
  • Regression: least squares; β^=(XTX)1XTy\hat{\beta} = (X^TX)^{-1}X^Ty; R2R^2.
  • Bayesian: posterior \propto likelihood ×\times prior.

For review, be able to: compute MLEs for common distributions; construct and interpret confidence intervals; perform zz- and tt-tests; interpret pp-values; fit and interpret linear regression; derive posterior distributions for simple conjugate priors. Identify the estimator, the test statistic null distribution, the regression assumptions, and the prior-posterior relationship.

Mathematical spine

xˉ±zα/2σn,t=xˉμ0s/n,β^=(XTX)1XTy\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\qquad t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}},\qquad \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}

Example

A random sample of n=64n=64 lightbulbs yields a sample mean lifetime xˉ=1200\bar{x}=1200 hours and a sample standard deviation s=80s=80 hours.

(a) Construct a 95%95\% confidence interval for the true mean lifetime μ\mu.

(b) Test H0:μ=1150H_0: \mu = 1150 versus H1:μ1150H_1: \mu \neq 1150 at significance level α=0.05\alpha = 0.05.

Solution.

(a) Because nn is large, we use the zz-interval:

xˉ±z0.025sn=1200±1.96808=1200±19.6.\bar{x} \pm z_{0.025}\frac{s}{\sqrt{n}} = 1200 \pm 1.96\frac{80}{8} = 1200 \pm 19.6.

The 95%95\% confidence interval is (1180.4,1219.6)(1180.4,\,1219.6) hours.

(b) The test statistic is

z=xˉμ0s/n=1200115010=5.0.z = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{1200-1150}{10} = 5.0.

Since z=5.0>1.96=z0.025|z| = 5.0 > 1.96 = z_{0.025}, we reject H0H_0. There is strong evidence that the true mean differs from 11501150 hours.

Problems with Solutions

  1. Problem. Given the data set 2,4,6,8,102, 4, 6, 8, 10, compute the sample mean and the sample variance.

    Solution. The sample mean is xˉ=2+4+6+8+105=6\bar{x} = \frac{2+4+6+8+10}{5} = 6. The sample variance is

    s2=14i=15(xi6)2=16+4+0+4+164=404=10.s^2 = \frac{1}{4}\sum_{i=1}^5 (x_i-6)^2 = \frac{16+4+0+4+16}{4} = \frac{40}{4} = 10.
  2. Problem. In a zz-test of H0:μ=50H_0: \mu=50 with known σ=10\sigma=10, a sample of size n=25n=25 yields xˉ=53\bar{x}=53. Compute the test statistic and the two-tailed pp-value.

    Solution. The test statistic is

    z=535010/25=32=1.5.z = \frac{53-50}{10/\sqrt{25}} = \frac{3}{2} = 1.5.

    The two-tailed pp-value is 2P(Z>1.5)2(0.0668)=0.13362P(Z > 1.5) \approx 2(0.0668) = 0.1336.

  3. Problem. Fit a simple linear regression y=β0+β1xy = \beta_0 + \beta_1 x to the points (1,2),(2,3),(3,5)(1,2), (2,3), (3,5).

    Solution. We have n=3n=3, xˉ=2\bar{x}=2, yˉ=10/3\bar{y}=10/3.

    (xixˉ)(yiyˉ)=(1)(4/3)+(0)(1/3)+(1)(5/3)=3.\sum (x_i-\bar{x})(y_i-\bar{y}) = (-1)(-4/3) + (0)(-1/3) + (1)(5/3) = 3. (xixˉ)2=(1)2+02+12=2.\sum (x_i-\bar{x})^2 = (-1)^2 + 0^2 + 1^2 = 2.

    Hence β^1=3/2=1.5\hat\beta_1 = 3/2 = 1.5 and β^0=yˉβ^1xˉ=1033=13\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} = \frac{10}{3} - 3 = \frac13. The fitted line is y^=13+32x\hat y = \frac13 + \frac32 x.

Section summary. Statistics bridges data and decisions through estimation, testing, and modeling. Point estimates (MLE), interval estimates (confidence intervals), and hypothesis tests (zz, tt) form classical inference. Linear regression models relationships. Bayesian methods incorporate prior knowledge via Bayes’ theorem. These tools are the foundation of data analysis and scientific discovery.

Real Analysis: Limits, Continuity, Convergence, and Rigorous Integration

Core ideas

Real analysis provides rigorous foundations for calculus. The real numbers R\mathbb{R} form a complete ordered field: they are ordered (totally), form a field under ++ and ×\times, and satisfy the completeness axiom (every nonempty set bounded above has a supremum). A sequence {an}\{a_n\} converges to LL if ε>0,  NN\forall\varepsilon>0,\;\exists N\in\mathbb{N} such that nN    anL<εn\ge N \implies |a_n - L| < \varepsilon. A sequence is Cauchy if ε>0,  N\forall\varepsilon>0,\;\exists N such that m,nN    aman<εm,n\ge N \implies |a_m - a_n| < \varepsilon. In R\mathbb{R}, a sequence converges iff it is Cauchy. The Bolzano-Weierstrass theorem: every bounded sequence in R\mathbb{R} has a convergent subsequence.

For a function f:RRf: \mathbb{R} \to \mathbb{R}, limxaf(x)=L\lim_{x\to a} f(x) = L means ε>0,  δ>0\forall\varepsilon>0,\;\exists\delta>0 such that 0<xa<δ    f(x)L<ε0<|x-a|<\delta \implies |f(x)-L|<\varepsilon. ff is continuous at aa if limxaf(x)=f(a)\lim_{x\to a}f(x) = f(a). Equivalent: preimages of open sets are open. The Intermediate Value Theorem: a continuous function on [a,b][a,b] attains every value between f(a)f(a) and f(b)f(b). The Extreme Value Theorem: a continuous function on a closed bounded interval attains its maximum and minimum.

Uniform continuity strengthens continuity: ε>0\forall\varepsilon>0, δ>0\exists\delta>0 that works for all xx simultaneously (independent of the point). On closed bounded intervals, continuity implies uniform continuity (Heine-Cantor theorem). A sequence of functions {fn}\{f_n\} converges pointwise to ff if fn(x)f(x)f_n(x) \to f(x) for each xx, and uniformly if supxfn(x)f(x)0\sup_x |f_n(x)-f(x)| \to 0. Uniform convergence preserves continuity and commutes with integration.

Proof Sketch (Uniform Limit of Continuous Functions is Continuous): Let fnff_n \to f uniformly, with fnf_n continuous. To show ff is continuous at cc, we use an ε/3\varepsilon/3 argument. f(x)f(c)f(x)fn(x)+fn(x)fn(c)+fn(c)f(c)|f(x)-f(c)| \le |f(x)-f_n(x)| + |f_n(x)-f_n(c)| + |f_n(c)-f(c)|. Choose NN such that fN(y)f(y)<ε/3|f_N(y)-f(y)| < \varepsilon/3 for all yy (by uniform convergence). Then, choose δ\delta such that xc<δ    fN(x)fN(c)<ε/3|x-c|<\delta \implies |f_N(x)-f_N(c)| < \varepsilon/3 (by continuity of fNf_N). Thus, xc<δ    f(x)f(c)<ε/3+ε/3+ε/3=ε|x-c|<\delta \implies |f(x)-f(c)| < \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon.

Riemann integration: the integral abf(x)dx\int_a^b f(x)\,dx is the limit of Riemann sums f(xi)Δxi\sum f(x_i^*)\Delta x_i as the partition mesh goes to zero. A function is Riemann integrable if the upper and lower sums converge (which happens if the set of discontinuities has measure zero). The Fundamental Theorem of Calculus (rigorous version): if ff is integrable on [a,b][a,b] and F(x)=f(x)F'(x)=f(x) uniformly, then abf=F(b)F(a)\int_a^b f = F(b)-F(a). The Lebesgue integral generalizes Riemann integration by partitioning the range rather than the domain, allowing integration of more functions and better convergence theorems (dominated convergence, monotone convergence).

Series n=1an\sum_{n=1}^\infty a_n converge if the partial sums converge. Tests: ratio test, root test, integral test, comparison test. Power series cn(xa)n\sum c_n (x-a)^n have a radius of convergence R=1/lim supcnnR = 1/\limsup \sqrt[n]{|c_n|}. Within the radius, they can be differentiated and integrated termwise.

  • Sequences: ε\varepsilon-NN definition; Cauchy criterion; Bolzano-Weierstrass.
  • Functions: ε\varepsilon-δ\delta continuity; uniform continuity; IVT, EVT.
  • Integration: Riemann sums; FTC; Lebesgue generalization.
  • Series: convergence tests; power series; radius of convergence.

Example (ε\varepsilon-δ\delta proof). Prove limx2x2=4\lim_{x\to 2}x^2=4. Let ε>0\varepsilon>0 be given. We need δ>0\delta>0 such that 0<x2<δ    x24<ε0<|x-2|<\delta\implies|x^2-4|<\varepsilon. Factor: x24=x2x+2|x^2-4|=|x-2|\,|x+2|. If we first require δ1\delta\le 1, then x2<1|x-2|<1 implies 1<x<31<x<3, so x+2<5|x+2|<5. Choose δ=min{1,ε/5}\delta=\min\{1,\varepsilon/5\}. Then x2<δ|x-2|<\delta gives x24<5(ε/5)=ε|x^2-4|<5\cdot(\varepsilon/5)=\varepsilon. This is the standard pattern: bound xa|x-a| to control f(x)L|f(x)-L|.

For review, be able to: write ε\varepsilon-NN and ε\varepsilon-δ\delta proofs; prove convergence/divergence of sequences and series; determine uniform versus pointwise convergence; apply IVT and EVT; define and work with Riemann integrals; prove properties of continuous functions on compact sets. Identify the ε\varepsilon bound, the uniform modulus of continuity, the dominating function for convergence theorems, and the partition refinement for integration.

Mathematical spine

limnan=L    ε>0  N  nN:  anL<ε\lim_{n\to\infty} a_n = L \iff \forall\varepsilon>0\;\exists N\; \forall n\ge N:\; |a_n-L|<\varepsilon limxaf(x)=L    ε>0  δ>0  x:  0<xa<δ    f(x)L<ε\lim_{x\to a}f(x)=L \iff \forall\varepsilon>0\;\exists\delta>0\; \forall x:\; 0<|x-a|<\delta \implies |f(x)-L|<\varepsilon abf(x)dx=limΔ0i=1nf(xi)Δxi\int_a^b f(x)\,dx = \lim_{\|\Delta\|\to 0}\sum_{i=1}^n f(x_i^*)\,\Delta x_i

Example

(a) Prove using the ε\varepsilon-NN definition that limn3n+12n5=32\displaystyle\lim_{n\to\infty} \frac{3n+1}{2n-5} = \frac32.

(b) Determine whether the series n=11n2\displaystyle\sum_{n=1}^\infty \frac{1}{n^2} converges.

Solution.

(a) Let ε>0\varepsilon>0 be given. We need NN such that nNn\ge N implies 3n+12n532<ε\left|\frac{3n+1}{2n-5} - \frac32\right| < \varepsilon. Simplifying:

3n+12n532=2(3n+1)3(2n5)2(2n5)=172(2n5)=172(2n5)\left|\frac{3n+1}{2n-5} - \frac32\right| = \left|\frac{2(3n+1)-3(2n-5)}{2(2n-5)}\right| = \left|\frac{17}{2(2n-5)}\right| = \frac{17}{2(2n-5)}

for n3n\ge 3. We want 172(2n5)<ε\frac{17}{2(2n-5)} < \varepsilon, which gives n>174ε+52n > \frac{17}{4\varepsilon} + \frac52. Choose N=max{3,174ε+52}N = \max\left\{3,\left\lceil \frac{17}{4\varepsilon} + \frac52 \right\rceil\right\}. Then for all nNn\ge N, the inequality holds, proving the limit.

(b) The function f(x)=1/x2f(x)=1/x^2 is positive, continuous, and decreasing on [1,)[1,\infty). By the integral test:

11x2dx=limb[1x]1b=1<.\int_1^\infty \frac{1}{x^2}\,dx = \lim_{b\to\infty}\left[-\frac1x\right]_1^b = 1 < \infty.

Since the integral converges, the series n=11n2\sum_{n=1}^\infty \frac{1}{n^2} also converges.

Problems with Solutions

  1. Problem. Prove using the definition that limn1n=0\displaystyle\lim_{n\to\infty} \frac{1}{\sqrt{n}} = 0.

    Solution. Let ε>0\varepsilon>0. We need NN such that nN    1/n<εn\ge N \implies |1/\sqrt{n}|<\varepsilon. Choose N=1/ε2N = \lceil 1/\varepsilon^2 \rceil. Then for nNn\ge N, we have n1/ε\sqrt{n}\ge 1/\varepsilon, so 1/nε1/\sqrt{n}\le\varepsilon. Hence the limit is 00.

  2. Problem. Show that f(x)=x2f(x)=x^2 is uniformly continuous on [0,1][0,1].

    Solution. Let ε>0\varepsilon>0. For any x,y[0,1]x,y\in[0,1], x2y2=x+yxy2xy|x^2-y^2| = |x+y||x-y| \le 2|x-y|. Choose δ=ε/2\delta = \varepsilon/2. Then xy<δ    x2y2<ε|x-y|<\delta \implies |x^2-y^2|<\varepsilon. Since δ\delta does not depend on xx or yy, ff is uniformly continuous on [0,1][0,1].

  3. Problem. Find the radius of convergence of the power series n=0xnn!\displaystyle\sum_{n=0}^\infty \frac{x^n}{n!}.

    Solution. Using the ratio test:

    L=limnxn+1/(n+1)!xn/n!=limnxn+1=0L = \lim_{n\to\infty} \left|\frac{x^{n+1}/(n+1)!}{x^n/n!}\right| = \lim_{n\to\infty} \frac{|x|}{n+1} = 0

    for all xx. Since L<1L<1 for every xRx\in\mathbb{R}, the radius of convergence is R=R=\infty.

Section summary. Real analysis places calculus on rigorous foundations using ε\varepsilon-δ\delta arguments. Key concepts: convergence of sequences and series (Cauchy criterion, Bolzano-Weierstrass), continuity (IVT, EVT, uniform continuity), Riemann integration (FTC, integrability conditions), and function space convergence (pointwise vs.\ uniform). This material develops the logical infrastructure underpinning all of calculus.

Algebra: Groups, Rings, Fields, and Symmetry

Core ideas

Abstract algebra studies algebraic structures. A group (G,)(G,\ast) is a set with a binary operation satisfying: closure, associativity (ab)c=a(bc)(a\ast b)\ast c = a\ast(b\ast c), identity ea=ae=ae\ast a = a\ast e = a, and inverses aa1=a1a=ea\ast a^{-1} = a^{-1}\ast a = e. A group is abelian if it is commutative (ab=baa\ast b = b\ast a). Examples: (Z,+)(\mathbb{Z},+), (R{0},×)(\mathbb{R}\setminus\{0\},\times), the symmetric group SnS_n (permutations of nn elements), GLn(R)GL_n(\mathbb{R}) (invertible n×nn\times n matrices). A subgroup HGH \le G is a subset closed under the operation and inverses. The order of a group is its cardinality; the order of element aa is the smallest nn with an=ea^n = e.

Example (Symmetric Group): S3S_3 is the group of permutations of {1,2,3}\{1,2,3\}. It has order 3!=63! = 6. It is the smallest non-abelian group.

A homomorphism ϕ:GH\phi: G \to H preserves the operation: ϕ(ab)=ϕ(a)ϕ(b)\phi(a\ast b) = \phi(a)\ast\phi(b). The kernel kerϕ={gϕ(g)=eH}\ker\phi = \{g \mid \phi(g) = e_H\} is a normal subgroup; the image imϕ\operatorname{im}\phi is a subgroup of HH. An isomorphism is a bijective homomorphism; isomorphic groups are structurally identical. Cosets of subgroup HH are gH={ghhH}gH = \{gh \mid h\in H\}; they partition GG. Lagrange’s theorem: H|H| divides G|G|. A normal subgroup NGN \trianglelefteq G satisfies gNg1=NgNg^{-1} = N for all gg, allowing the quotient group G/NG/N to be formed. The First Isomorphism Theorem: G/kerϕimϕG/\ker\phi \cong \operatorname{im}\phi.

A ring (R,+,)(R,+,\cdot) has two operations: (R,+)(R,+) is an abelian group, multiplication is associative and distributes over addition. Examples: (Z,+,)(\mathbb{Z},+,\cdot), polynomial ring R[x]\mathbb{R}[x], matrix ring Mn(R)M_n(\mathbb{R}). A field is a commutative ring where nonzero elements have multiplicative inverses. Examples: Q,R,C,Zp\mathbb{Q},\mathbb{R},\mathbb{C},\mathbb{Z}_p for prime pp. Polynomials over a field form a Euclidean domain with division algorithm and unique factorization. The Fundamental Theorem of Algebra: every nonconstant polynomial in C[x]\mathbb{C}[x] has a root in C\mathbb{C}.

Proof Sketch (Lagrange’s Theorem): Let HH be a subgroup of a finite group GG. The left cosets gHgH partition GG into disjoint equivalence classes. For any gg, the map hghh \mapsto gh is a bijection from HH to gHgH, so all cosets have size H|H|. Since GG is partitioned into [G:H][G:H] cosets of size H|H|, we have G=[G:H]H|G| = [G:H]|H|, proving H|H| divides G|G|.

Group actions formalize symmetry. A group action G×XXG \times X \to X satisfies ex=xe\cdot x = x and (gh)x=g(hx)(gh)\cdot x = g\cdot(h\cdot x). The orbit Gx={gxgG}Gx = \{gx \mid g\in G\} and stabilizer Gx={ggx=x}G_x = \{g \mid gx = x\}; the orbit-stabilizer theorem: G=GxGx|G| = |Gx|\cdot|G_x|. Applications include counting (Burnside’s lemma) and classifying symmetries of geometric objects.

  • Group: closure, associativity, identity, inverses; subgroup, cosets, Lagrange.
  • Homomorphism: kernel, image, isomorphism theorems; quotient groups.
  • Rings and fields: polynomial rings, Euclidean domain, Zp\mathbb{Z}_p.
  • Group actions: orbit, stabilizer, Burnside’s lemma.

For review, be able to: verify group axioms; find subgroups and cosets; compute in SnS_n; construct quotient groups; apply isomorphism theorems; factor polynomials over fields; use group actions to count configurations. Identify the group operation and identity, the kernel of a homomorphism, the field characteristic, and the orbit decomposition.

Mathematical spine

ϕ(ab)=ϕ(a)ϕ(b),G=GxGx,G/kerϕimϕ\phi(ab) = \phi(a)\phi(b),\qquad |G| = |Gx|\cdot|G_x|,\qquad G/\ker\phi \cong \operatorname{im}\phi

Example

Consider the additive group Z12={0,1,2,,11}\mathbb{Z}_{12} = \{0,1,2,\dots,11\} under addition modulo 1212.

(a) Find all subgroups of Z12\mathbb{Z}_{12}.

(b) Let H={0,4,8}H = \{0,4,8\}. Verify that HH is a subgroup and find its left cosets. Confirm Lagrange’s theorem.

Solution.

(a) The subgroups of a cyclic group Zn\mathbb{Z}_n correspond to the divisors of nn. The divisors of 1212 are 1,2,3,4,6,121,2,3,4,6,12. Hence the subgroups are:

  • 0={0}\langle 0 \rangle = \{0\} (order 11)
  • 6={0,6}\langle 6 \rangle = \{0,6\} (order 22)
  • 4={0,4,8}\langle 4 \rangle = \{0,4,8\} (order 33)
  • 3={0,3,6,9}\langle 3 \rangle = \{0,3,6,9\} (order 44)
  • 2={0,2,4,6,8,10}\langle 2 \rangle = \{0,2,4,6,8,10\} (order 66)
  • 1=Z12\langle 1 \rangle = \mathbb{Z}_{12} (order 1212)

(b) H={0,4,8}H = \{0,4,8\} is closed under addition mod 1212, contains 00, and contains inverses (0=0-0=0, 4=8-4=8, 8=4-8=4), so it is a subgroup. The left cosets are:

  • 0+H={0,4,8}0+H = \{0,4,8\}
  • 1+H={1,5,9}1+H = \{1,5,9\}
  • 2+H={2,6,10}2+H = \{2,6,10\}
  • 3+H={3,7,11}3+H = \{3,7,11\}

There are [G:H]=4[G:H]=4 cosets, each of size H=3|H|=3. Indeed G=12=43=[G:H]H|G| = 12 = 4\cdot 3 = [G:H]|H|, verifying Lagrange’s theorem.

Problems with Solutions

  1. Problem. Show that the set 3Z={3kkZ}3\mathbb{Z} = \{3k \mid k \in \mathbb{Z}\} is a subgroup of (Z,+)(\mathbb{Z}, +).

    Solution. 3Z3\mathbb{Z} is nonempty (03Z0\in 3\mathbb{Z}). If 3a,3b3Z3a,3b\in 3\mathbb{Z}, then 3a+3b=3(a+b)3Z3a+3b=3(a+b)\in 3\mathbb{Z}. The inverse of 3a3a is 3a=3(a)3Z-3a=3(-a)\in 3\mathbb{Z}. Thus 3Z3\mathbb{Z} is a subgroup.

  2. Problem. In the symmetric group S3S_3, compute the product (1  2)(2  3)(1\;2)(2\;3) and find its order.

    Solution. Applying right to left: 1121\mapsto 1\mapsto 2, 2332\mapsto 3\mapsto 3, 3213\mapsto 2\mapsto 1. Thus (1  2)(2  3)=(1  2  3)(1\;2)(2\;3) = (1\;2\;3), a 3-cycle. Its order is 33.

  3. Problem. Factor the polynomial x41x^4 - 1 completely over C\mathbb{C}.

    Solution. x41=(x21)(x2+1)=(x1)(x+1)(xi)(x+i)x^4-1 = (x^2-1)(x^2+1) = (x-1)(x+1)(x-i)(x+i).

Section summary. Algebra abstracts arithmetic to structures: groups (symmetry via invertible operations), rings (arithmetic-like structures with addition and multiplication), and fields (where division works). Key results include Lagrange’s theorem (divisibility of group orders), isomorphism theorems (structural relationships), and unique factorization in polynomial rings. Group actions connect algebra to geometry and combinatorics.

Topology: Connectedness, Compactness, and Invariants of Shape

Core ideas

Topology studies properties preserved under continuous deformations (stretching, bending, but not tearing or gluing). A topological space (X,T)(X,\mathcal{T}) consists of a set XX and a collection T\mathcal{T} of open subsets satisfying: (i) ,XT\varnothing, X \in \mathcal{T}; (ii) arbitrary unions of open sets are open; (iii) finite intersections of open sets are open. The complement of an open set is closed. The metric topology is induced by a metric dd: open balls Br(x)={yd(x,y)<r}B_r(x) = \{y \mid d(x,y) < r\} form a basis. Rn\mathbb{R}^n with the Euclidean metric is the motivating example.

A function f:XYf: X \to Y between topological spaces is continuous if preimages of open sets are open (the topological definition). Equivalently for metric spaces, xnx    f(xn)f(x)x_n \to x \implies f(x_n) \to f(x). A homeomorphism is a bijective continuous function with continuous inverse; homeomorphic spaces are topologically equivalent (they have the same “shape”). For example, a coffee mug and a donut (torus) are homeomorphic.

Connectedness: XX is connected if it cannot be written as a disjoint union of two nonempty open subsets. Equivalently, the only clopen (both open and closed) subsets are \varnothing and XX. A subset of R\mathbb{R} is connected iff it is an interval. The continuous image of a connected set is connected (generalizing IVT). Path-connectedness (every pair of points joined by a continuous path) implies connectedness; the converse is false in general. A space is totally disconnected if its only connected subsets are singletons (e.g., Q\mathbb{Q}).

Compactness: XX is compact if every open cover has a finite subcover. In Rn\mathbb{R}^n, the Heine-Borel theorem says KK is compact iff it is closed and bounded. The continuous image of a compact set is compact. A continuous function on a compact set attains its maximum and minimum (Extreme Value Theorem). Compactness also implies uniform continuity on metric spaces. Sequential compactness (every sequence has a convergent subsequence) is equivalent to compactness in metric spaces.

Proof Sketch (Continuous image of compact set is compact): Let f:XYf: X \to Y be continuous and XX compact. Consider an open cover {Vα}\{V_\alpha\} of f(X)f(X) in YY. Since ff is continuous, the preimages {f1(Vα)}\{f^{-1}(V_\alpha)\} form an open cover of XX. Because XX is compact, this has a finite subcover f1(Vα1),,f1(Vαn)f^{-1}(V_{\alpha_1}), \dots, f^{-1}(V_{\alpha_n}). The corresponding {Vα1,,Vαn}\{V_{\alpha_1}, \dots, V_{\alpha_n}\} covers f(X)f(X). Thus, f(X)f(X) is compact.

Homotopy and the fundamental group π1(X,x0)\pi_1(X, x_0) classify spaces up to homotopy equivalence. A homotopy between maps f,g:XYf,g: X\to Y is a continuous H:X×[0,1]YH: X\times[0,1] \to Y with H(x,0)=f(x)H(x,0)=f(x), H(x,1)=g(x)H(x,1)=g(x). Two spaces are homotopy equivalent if there exist maps f:XYf: X\to Y, g:YXg: Y\to X with gfidXg\circ f \simeq \operatorname{id}_X and fgidYf\circ g \simeq \operatorname{id}_Y. The fundamental group consists of homotopy classes of loops based at x0x_0. For the circle S1S^1, π1(S1)Z\pi_1(S^1) \cong \mathbb{Z}; for SnS^n (n2n\ge 2), π1(Sn)\pi_1(S^n) is trivial. Simply connected spaces have trivial fundamental group. The Euler characteristic χ=VE+F\chi = V - E + F for polyhedra is a topological invariant; for a sphere χ=2\chi=2, for a torus χ=0\chi=0.

  • Topological space: open sets, continuity, homeomorphism.
  • Connectedness: connected     \iff no nontrivial clopen sets; path-connectedness.
  • Compactness: Heine-Borel in Rn\mathbb{R}^n; extreme value theorem; uniform continuity.
  • Algebraic topology: fundamental group π1\pi_1, homotopy, Euler characteristic.

For review, be able to: determine if a set is open in a given topology; prove continuity via inverse images of open sets; check connectedness and path-connectedness; apply Heine-Borel for compactness arguments; compute fundamental groups of simple spaces; understand homotopy equivalence. Identify the open cover, the connected component, the compact subset, and the loop homotopy class.

Mathematical spine

f continuous     f1(U) open for all open Uf \text{ continuous } \iff f^{-1}(U) \text{ open for all open } U Compact:  open cover {Uα},   finite subcover\text{Compact: } \forall\text{ open cover } \{U_\alpha\},\; \exists\text{ finite subcover} π1(S1)Z,χ=VE+F\pi_1(S^1) \cong \mathbb{Z},\qquad \chi = V - E + F

Example

Consider R\mathbb{R} with the standard metric d(x,y)=xyd(x,y)=|x-y|.

(a) Show that the interval (0,1)(0,1) is an open set.

(b) Show that [0,1][0,1] is compact.

Solution.

(a) Let x(0,1)x\in(0,1). Choose r=min{x,1x}>0r = \min\{x, 1-x\} > 0. Then the open ball Br(x)=(xr,x+r)B_r(x) = (x-r, x+r) is contained in (0,1)(0,1). Since every point has an open neighborhood inside (0,1)(0,1), the set is open.

(b) By the Heine-Borel theorem, a subset of Rn\mathbb{R}^n is compact if and only if it is closed and bounded. The set [0,1][0,1] is bounded (contained in B2(0)B_2(0)) and closed (its complement (,0)(1,)(-\infty,0)\cup(1,\infty) is open). Therefore [0,1][0,1] is compact.

Problems with Solutions

  1. Problem. In R2\mathbb{R}^2 with the Euclidean metric, show that the open ball B1(0,0)B_1(0,0) is an open set.

    Solution. Let pB1(0,0)\mathbf{p}\in B_1(0,0), so p<1\|\mathbf{p}\|<1. Let r=1p>0r = 1-\|\mathbf{p}\|>0. For any qBr(p)\mathbf{q}\in B_r(\mathbf{p}), the triangle inequality gives qqp+p<r+p=1\|\mathbf{q}\| \le \|\mathbf{q}-\mathbf{p}\| + \|\mathbf{p}\| < r + \|\mathbf{p}\| = 1. Hence qB1(0,0)\mathbf{q}\in B_1(0,0), proving the ball is open.

  2. Problem. Prove that the continuous image of a connected set is connected. Use this to show there is no continuous surjection f:[0,1]{0,1}f: [0,1] \to \{0,1\} (where {0,1}\{0,1\} has the discrete topology).

    Solution. Let f:XYf: X\to Y be continuous and XX connected. Suppose f(X)=ABf(X)=A\cup B with A,BA,B disjoint nonempty open sets in f(X)f(X). Then f1(A)f^{-1}(A) and f1(B)f^{-1}(B) are disjoint nonempty open sets in XX whose union is XX, contradicting connectedness. Thus f(X)f(X) is connected. The space [0,1][0,1] is connected, but {0,1}\{0,1\} is disconnected. If a continuous surjection existed, {0,1}\{0,1\} would be the continuous image of a connected set and hence connected, a contradiction.

  3. Problem. Show that Q\mathbb{Q} is not compact in R\mathbb{R}.

    Solution. The set Q\mathbb{Q} is not closed in R\mathbb{R} (its closure is R\mathbb{R}). By the Heine-Borel theorem, a subset of R\mathbb{R} is compact iff it is closed and bounded. Since Q\mathbb{Q} is not closed, it is not compact. (Alternatively, the open cover {(,21n)(2+1n,)}nN\{(-\infty,\sqrt{2}-\frac1n)\cup(\sqrt{2}+\frac1n,\infty)\}_{n\in\mathbb{N}} has no finite subcover.)

Section summary. Topology abstracts notions of proximity and shape. A topological space defines continuity via open sets. Connectedness captures “being all in one piece”; compactness generalizes finiteness and guarantees extreme values and uniform continuity. Algebraic topology uses the fundamental group and Euler characteristic to classify spaces up to deformation, providing invariants that distinguish spheres, tori, and other manifolds.

Numerical Mathematics: Stability, Matrix Decompositions, and Optimization

Core ideas

Numerical mathematics develops algorithms for continuous mathematical problems with guaranteed accuracy. Conditioning measures how much a problem’s output changes relative to input perturbations. The condition number κ=supδx(δf/f)/(δx/x)\kappa = \sup_{\delta x} (\|\delta f\|/\|f\|) / (\|\delta x\|/\|x\|); problems with large κ\kappa are ill-conditioned. Numerical stability means the algorithm produces nearly the exact output for nearly correct input (i.e., backward stable). Forward error = x~x\|\tilde{x} - x\|; backward error = smallest Δx\Delta x such that f(x+Δx)=x~f(x+\Delta x) = \tilde{x}.

Example (Catastrophic Cancellation): Consider f(x)=x+1xf(x) = \sqrt{x+1} - \sqrt{x} for large xx. If x=108x=10^8, x+110000.00005\sqrt{x+1} \approx 10000.00005 and x=10000\sqrt{x} = 10000. Subtracting them loses significant digits. Rewriting as f(x)=1x+1+xf(x) = \frac{1}{\sqrt{x+1} + \sqrt{x}} completely avoids cancellation, demonstrating how algebraic manipulation improves numerical stability.

Floating-point arithmetic approximates real numbers with finite precision machine epsilon ϵmach2522.22×1016\epsilon_{\text{mach}} \approx 2^{-52} \approx 2.22\times10^{-16} in double precision. Rounding produces relative error ϵmach\approx \epsilon_{\text{mach}}. Catastrophic cancellation occurs when subtracting nearly equal numbers. Algorithms should avoid such cancellation and use numerically stable formulas.

Matrix decompositions are the workhorses of numerical linear algebra. LU decomposition PA=LUPA = LU (permutation, lower triangular, upper triangular) solves linear systems Ax=bAx = b in O(n3)O(n^3) time via forward/backward substitution. Cholesky decomposition A=LLTA = LL^T applies to symmetric positive definite matrices (twice as fast). QR decomposition A=QRA = QR (orthogonal QQ, upper triangular RR) provides stable least-squares solutions. Singular value decomposition A=UΣVTA = U\Sigma V^T gives the optimal low-rank approximation (Eckart-Young theorem): truncating to rank kk minimizes AAkF\|A - A_k\|_F over rank-kk matrices. Eigenvalue computation uses the QR algorithm (iterative QRQR on shifted AA).

Iterative methods solve large sparse systems: Jacobi iteration x(k+1)=D1(b(L+U)x(k))x^{(k+1)} = D^{-1}(b - (L+U)x^{(k)}); Gauss-Seidel uses latest components; conjugate gradient for SPD matrices minimizes error in the energy norm. The condition number of AA affects convergence rate.

Numerical optimization finds minima of functions. Gradient descent xk+1=xkαf(xk)x_{k+1} = x_k - \alpha \nabla f(x_k) converges linearly for well-conditioned problems. Newton’s method xk+1=xkHf1(xk)f(xk)x_{k+1} = x_k - H_f^{-1}(x_k)\nabla f(x_k) converges quadratically near the solution but requires Hessian inversion. Quasi-Newton methods (BFGS) approximate the Hessian. For constrained optimization, Lagrange multipliers and KKT conditions characterize optimality. Linear programming minimizes cTxc^Tx subject to AxbAx \le b, x0x \ge 0 via the simplex method or interior-point methods.

Numerical integration (quadrature) approximates abf(x)dx\int_a^b f(x)\,dx: trapezoidal rule has error O(h2)O(h^2), Simpson’s rule O(h4)O(h^4), Gaussian quadrature O(h2n)O(h^{2n}). Adaptive quadrature refines subintervals where the function varies rapidly. Differential equation solvers: Euler’s method (first-order), Runge-Kutta methods (classical RK4 is fourth-order). Stiff equations require implicit methods (backward Euler, BDF) for stability.

  • Stability: condition number, forward/backward error, machine epsilon.
  • Decompositions: LU, Cholesky, QR, SVD; QR algorithm for eigenvalues.
  • Iterative methods: Jacobi, Gauss-Seidel, conjugate gradient.
  • Optimization: gradient descent, Newton, BFGS, KKT, linear programming.
  • Quadrature & ODEs: trapezoidal, Simpson, RK4, adaptive methods.

For review, be able to: compute condition number and interpret it; perform and apply LU, QR, and SVD factorizations; derive convergence criteria for iterative solvers; set up gradient descent and Newton iterations; understand error bounds for quadrature rules; identify stiffness in ODEs. Identify the source of ill-conditioning, the best matrix factorization for the problem, the convergence rate of iterative methods, and the dominant error term in approximation.

Mathematical spine

A=LU,A=QR,A=UΣVTA = LU,\qquad A = QR,\qquad A = U\Sigma V^T xk+1=xkαf(xk),xk+1=xkHf1(xk)f(xk)x_{k+1} = x_k - \alpha \nabla f(x_k),\qquad x_{k+1} = x_k - H_f^{-1}(x_k)\nabla f(x_k)

Example

Solve the linear system

{2x+y=5,x+3y=8\begin{cases} 2x + y = 5, \\ x + 3y = 8 \end{cases}

using LU decomposition (without pivoting).

Solution. Write A=(2113)A = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}. We seek L=(10l211)L = \begin{pmatrix} 1 & 0 \\ l_{21} & 1 \end{pmatrix} and U=(u11u120u22)U = \begin{pmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{pmatrix} such that A=LUA=LU. From the first row: u11=2u_{11}=2, u12=1u_{12}=1. From the second row: l21u11=1    l21=1/2l_{21}u_{11}=1 \implies l_{21}=1/2, and l21u12+u22=3    u22=31/2=5/2l_{21}u_{12}+u_{22}=3 \implies u_{22}=3-1/2=5/2. Thus

L=(101/21),U=(2105/2).L = \begin{pmatrix} 1 & 0 \\ 1/2 & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} 2 & 1 \\ 0 & 5/2 \end{pmatrix}.

Solve Lc=bL\mathbf{c}=\mathbf{b} with b=(5,8)T\mathbf{b}=(5,8)^T:

  • c1=5c_1 = 5
  • 12c1+c2=8    c2=852=112\frac12 c_1 + c_2 = 8 \implies c_2 = 8 - \frac52 = \frac{11}{2}.

Solve Ux=cU\mathbf{x}=\mathbf{c}:

  • 52y=112    y=115\frac52 y = \frac{11}{2} \implies y = \frac{11}{5}
  • 2x+y=5    x=5y2=51152=1452=752x + y = 5 \implies x = \frac{5-y}{2} = \frac{5-\frac{11}{5}}{2} = \frac{\frac{14}{5}}{2} = \frac{7}{5}.

The solution is x=7/5x = 7/5, y=11/5y = 11/5.

Problems with Solutions

  1. Problem. Estimate the condition number of f(x)=xf(x) = \sqrt{x} at x=100x=100 using the relative condition number formula κ=xf(x)/f(x)\kappa = |xf'(x)/f(x)|.

    Solution. We have f(x)=12xf'(x) = \frac{1}{2\sqrt{x}}. Then

    κ=x12xx=x2x=12.\kappa = \left|\frac{x\cdot \frac{1}{2\sqrt{x}}}{\sqrt{x}}\right| = \frac{x}{2x} = \frac12.

    The problem is well-conditioned.

  2. Problem. Perform one step of gradient descent on f(x)=x2+2x+3f(x) = x^2 + 2x + 3 starting from x0=1x_0=1 with step size α=0.25\alpha=0.25.

    Solution. The gradient is f(x)=2x+2f'(x)=2x+2. At x0=1x_0=1, f(1)=4f'(1)=4.

    x1=x0αf(x0)=10.25(4)=0.x_1 = x_0 - \alpha f'(x_0) = 1 - 0.25(4) = 0.

    The new iterate is x1=0x_1 = 0.

  3. Problem. Approximate 01ex2dx\displaystyle\int_0^1 e^{-x^2}\,dx using the Trapezoidal rule with n=2n=2 subintervals.

    Solution. The nodes are x0=0x_0=0, x1=0.5x_1=0.5, x2=1x_2=1 with h=0.5h=0.5. The Trapezoidal rule gives

    T2=h2[f(0)+2f(0.5)+f(1)]=0.52[1+2e0.25+e1]0.25[1+2(0.7788)+0.3679]0.25(2.9255)0.7314.T_2 = \frac{h}{2}\left[f(0) + 2f(0.5) + f(1)\right] = \frac{0.5}{2}\left[1 + 2e^{-0.25} + e^{-1}\right] \approx 0.25\left[1 + 2(0.7788) + 0.3679\right] \approx 0.25(2.9255) \approx 0.7314.

Section summary. Numerical mathematics bridges continuous mathematics and computational implementation. Key themes: conditioning and stability ensure reliable computation; matrix decompositions (LU, QR, SVD) provide the algorithmic backbone for linear algebra; iterative methods and optimization algorithms solve large-scale problems; quadrature and ODE solvers approximate continuous processes. The interplay of mathematical insight and algorithmic design produces efficient, accurate, and stable numerical methods.