2023-10-18 Krylov and preconditioning#
Last time#
- Classical iterative methods
- Concept of preconditioning
Today#
- Krylov methods (focus on GMRES)
- PETSc experiments
- Simple preconditioners
```julia
using Plots
default(linewidth=3)
using LinearAlgebra
using SparseArrays

# Plot the sparsity pattern of A with a symmetric color scale.
function my_spy(A)
    cmax = norm(vec(A), Inf)
    s = max(1, ceil(120 / size(A, 1)))   # marker size shrinks as the matrix grows
    spy(A, marker=(:square, s), c=:diverging_rainbow_bgymr_45_85_c67_n256, clims=(-cmax, cmax))
end
```
```julia
# Assemble the 5-point advection-diffusion operator on an n-by-n interior grid
# with homogeneous Dirichlet boundary conditions.
function advdiff_matrix(n; kappa=1, wind=[0, 0])
    h = 2 / (n + 1)
    rows = Vector{Int64}()
    cols = Vector{Int64}()
    vals = Vector{Float64}()
    idx((i, j),) = (i-1)*n + j                       # lexicographic ordering
    in_domain((i, j),) = 1 <= i <= n && 1 <= j <= n
    # Stencil weights for neighbors ordered (i-1,j), (i,j-1), (i,j), (i+1,j), (i,j+1)
    stencil_advect = [-wind[1], -wind[2], 0, wind[1], wind[2]] / h
    stencil_diffuse = [-1, -1, 4, -1, -1] * kappa / h^2
    stencil = stencil_advect + stencil_diffuse
    for i in 1:n
        for j in 1:n
            neighbors = [(i-1, j), (i, j-1), (i, j), (i+1, j), (i, j+1)]
            mask = in_domain.(neighbors)             # drop neighbors outside the domain (Dirichlet)
            append!(rows, idx.(repeat([(i,j)], 5))[mask])
            append!(cols, idx.(neighbors)[mask])
            append!(vals, stencil[mask])
        end
    end
    sparse(rows, cols, vals)
end
```
advdiff_matrix (generic function with 1 method)
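A quick look at the assembled operator (a small example using the functions defined above):

```julia
# Assemble a small advection-diffusion matrix and inspect its sparsity pattern.
A = advdiff_matrix(10, kappa=1, wind=[2, 1])
my_spy(A)
```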
Krylov subspaces#
All matrix iterations work with approximations in a Krylov subspace, which has the form
\[ K_n = \begin{bmatrix} b & Ab & A^2 b & \dotsb & A^{n-1} b \end{bmatrix} . \]
This matrix is horribly ill-conditioned and cannot stably be computed as written. Instead, we seek an orthogonal basis \(Q_n\) that spans the same space as \(K_n\). We could write this as a factorization
\[ K_n = Q_n R_n \]
where the first column is \(q_0 = b / \lVert b \rVert\). The \(R_n\) is unnecessary and hopelessly ill-conditioned, so a slightly different procedure is used.
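To see the ill-conditioning concretely, here is a minimal sketch (using `advdiff_matrix` from above) that builds \(K_n\) column by column: each column is one more application of \(A\), so the columns all converge toward the dominant eigenvector and the conditioning degrades rapidly.

```julia
# Build K_n = [b  Ab  A^2b ...] explicitly and measure its conditioning.
A = advdiff_matrix(10)              # 100x100 diffusion operator
m = size(A, 1)
n = 10
K = zeros(m, n)
K[:, 1] = randn(m)                  # the vector b
for k in 2:n
    K[:, k] = A * K[:, k-1]
end
sv = svdvals(K)
@show sv[1] / sv[end]               # condition number of K_n; huge already for n = 10
Q = Matrix(qr(K).Q)                 # a well-conditioned basis for the same subspace
```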
Arnoldi iteration#
The Arnoldi iteration applies orthogonal similarity transformations to reduce \(A\) to Hessenberg form, starting from a vector \(q_0 = b\),
\[ A = Q H Q^* . \]
Let's multiply on the right by \(Q\) and examine the first \(n\) columns,
\[ A Q_n = Q_{n+1} H_n \]
where \(H_n\) is an \((n+1) \times n\) Hessenberg matrix.
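Below is a minimal Arnoldi sketch using modified Gram-Schmidt (an illustration of the idea, not the variant production libraries use, and with no breakdown handling; the name `arnoldi` is ours):

```julia
# Arnoldi iteration: returns Q (m x (n+1)) with orthonormal columns and
# H ((n+1) x n) upper Hessenberg such that A * Q[:, 1:n] = Q * H.
function arnoldi(A, b, n)
    m = size(A, 1)
    Q = zeros(m, n + 1)
    H = zeros(n + 1, n)
    Q[:, 1] = b / norm(b)
    for j in 1:n
        v = A * Q[:, j]
        for i in 1:j                 # modified Gram-Schmidt against previous columns
            H[i, j] = Q[:, i]' * v
            v -= H[i, j] * Q[:, i]
        end
        H[j+1, j] = norm(v)          # no breakdown handling in this sketch
        Q[:, j+1] = v / H[j+1, j]
    end
    Q, H
end

A = advdiff_matrix(10)
b = randn(size(A, 1))
Q, H = arnoldi(A, b, 20)
@show norm(A * Q[:, 1:20] - Q * H)   # ~ machine roundoff
```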
Aside: Conditioning of Arnoldi process#
The Arnoldi process is well-conditioned because \(Q\) is orthogonal and
\[ \lVert A x \rVert \le \lVert A \rVert \]
for all \(x\) of norm 1. It must also be true for any \(x = Q_n y\) where \(\lVert y \rVert = 1\), thus
\[ \lVert H_n \rVert = \lVert Q_{n+1}^* A Q_n \rVert \le \lVert A \rVert . \]
GMRES#
GMRES (Generalized Minimum Residual) minimizes the residual over the Krylov subspace,
\[ \min_{x \in \operatorname{span} Q_n} \lVert A x - b \rVert = \min_{y \in \mathbb{R}^n} \big\lVert Q_{n+1} H_n y - b \big\rVert = \min_{y} \big\lVert H_n y - \lVert b \rVert e_0 \big\rVert \]
using \(A Q_n = Q_{n+1} H_n\), \(b = \lVert b \rVert q_0\), and the orthogonality of \(Q_{n+1}\). The result is a small \((n+1) \times n\) least-squares problem for the coefficients \(y\).
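On top of the `arnoldi` sketch above, an unrestarted GMRES is just this least-squares solve (illustrative only; `gmres_naive` is our name, and \(e_0\) becomes index 1 in Julia):

```julia
# Naive unrestarted GMRES: minimize ||H_n y - ||b|| e_0|| and form x_n = Q_n y.
function gmres_naive(A, b, n)
    Q, H = arnoldi(A, b, n)
    rhs = zeros(n + 1)
    rhs[1] = norm(b)                 # ||b|| e_0 (Julia indexes from 1)
    y = H \ rhs                      # small (n+1) x n least-squares problem
    Q[:, 1:n] * y
end

A = advdiff_matrix(10)
b = randn(size(A, 1))
x = gmres_naive(A, b, 40)
@show norm(A * x - b) / norm(b)      # relative residual after 40 iterations
```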
Notes#
- The solution \(x_n\) constructed by GMRES at iteration \(n\) is not explicitly available. If a solution is needed, it must be constructed by solving the \((n+1)\times n\) least-squares problem and forming the solution as a linear combination of the \(n\) columns of \(Q_n\). The leading cost is \(2mn\) where \(m \gg n\).
- The residual vector \(r_n = A x_n - b\) is not explicitly available in GMRES either. To compute it, first build the solution \(x_n\) as above.
- GMRES minimizes the 2-norm of the residual, \(\lVert r_n \rVert\), which is equivalent to minimizing the \(A^* A\) norm of the error, \(\lVert x_n - x_* \rVert_{A^* A}\) (see the expansion below).
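To see the norm equivalence in the last note, write \(b = A x_*\) and expand:
\[ \lVert r_n \rVert^2 = \lVert A (x_n - x_*) \rVert^2 = (x_n - x_*)^* A^* A (x_n - x_*) = \lVert x_n - x_* \rVert_{A^* A}^2 . \]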
More notes on GMRES#
- GMRES needs to store the full \(Q_n\), which is unaffordable for large \(n\) (many iterations). The standard solution is to choose a "restart" \(k\), discard \(Q_n\), and start over with initial guess \(x_k\) after each \(k\) iterations. This algorithm is called GMRES(k). PETSc's default solver is GMRES(30) and the restart can be controlled using the run-time option `-ksp_gmres_restart`.
- Most implementations of GMRES use classical Gram-Schmidt because it is much faster in parallel (one reduction per iteration instead of \(n\) reductions per iteration). The PETSc option `-ksp_gmres_modifiedgramschmidt` can be used when you suspect that classical Gram-Schmidt may be causing instability.
- There is a very similar (and older) algorithm called GCR that maintains \(x_n\) and \(r_n\). This is useful, for example, if a convergence tolerance needs to inspect individual entries. GCR requires \(2n\) vectors instead of \(n\) vectors, and can tolerate a nonlinear preconditioner. FGMRES is a newer algorithm with similar properties to GCR.
Experiments in PETSc#
- PETSc = Portable Extensible Toolkit for Scientific computing
  - `./configure`, `make`
  - Depends on BLAS and LAPACK (default system or package manager)
    - or `./configure --download-f2cblaslapack --download-blis`
  - Depends on MPI for parallelism (package manager)
    - or `./configure --download-mpich` (or `--download-openmpi`)
- `docker run -it --rm jedbrown/petsc`
- Lots of examples (mostly C and Fortran, some Python)
- Experimental bindings in Rust, Julia
- We'll use `src/snes/tutorials/ex15.c`
  - "p-Bratu": combines p-Laplacian with Bratu nonlinearity
    \[ -\nabla\cdot\big(\kappa(\nabla u) \nabla u\big) - \lambda e^u = f \]
  - Newton solver with Krylov on each iteration
Simple preconditioners#
Jacobi `-pc_type jacobi`#
\[ P^{-1} = D^{-1} \]
where \(D\) is the diagonal of \(A\).
Gauss-Seidel `-pc_type sor`#
\[ P^{-1} = (L+D)^{-1} \]
where \(L\) is the (strictly) lower triangular part of \(A\). The upper triangular part may be used instead, or a symmetric form,
\[ P^{-1} = (D+U)^{-1} D (D+L)^{-1} \]
where \(U\) is the (strictly) upper triangular part of \(A\).
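As a concrete (if crude) illustration, here is a preconditioned Richardson iteration \(x \leftarrow x + P^{-1}(b - A x)\) with Jacobi and Gauss-Seidel preconditioners, a sketch using `advdiff_matrix` from above (`richardson` is our name):

```julia
# Preconditioned Richardson iteration: x <- x + P^{-1} (b - A x).
# apply_Pinv applies the preconditioner inverse to a residual vector.
function richardson(A, b, apply_Pinv; iters=100)
    x = zero(b)
    for _ in 1:iters
        x += apply_Pinv(b - A * x)
    end
    norm(b - A * x) / norm(b)        # relative residual after iters steps
end

A = advdiff_matrix(20)
b = randn(size(A, 1))
D = Diagonal(A)                      # Jacobi: P = D
LD = LowerTriangular(A)              # Gauss-Seidel: P = L + D (triangular solve)
@show richardson(A, b, r -> D \ r)
@show richardson(A, b, r -> LD \ r)
```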
Over-relaxation#
`-pc_sor_omega 1.` is the default (Gauss-Seidel).
Run p-Laplacian example#
```console
$ cd src/snes/tutorials
$ make ex15
$ ./ex15 -da_refine 2 -dm_view
$ ./ex15 -ksp_monitor -pc_type jacobi
$ ./ex15 -snes_view
```
Experiments#
- How does iteration count vary under grid refinement?
- How sensitive is it to parameters?
  - p-Laplacian: `-p` \(> 1\) and `-epsilon` \(> 0\)
  - Bratu: `-lambda` \(< 6.8\)
- How sensitive is it to `-ksp_gmres_restart`?
- `-ksp_monitor_true_residual`
- `-ksp_view_eigenvalues`
Symmetric problems#
Lanczos iteration: like GMRES for symmetric problems#
If \(A\) is symmetric, then \(H = Q^T A Q\) is also symmetric. Since \(H\) is Hessenberg, this means \(H\) is tridiagonal. Instead of storing \(Q_n\), it is sufficient to store only the last two columns since the iteration satisfies a 3-term recurrence. The analog of GMRES for the symmetric case is called MINRES and is also useful for solving symmetric indefinite problems.
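A quick numerical check of the tridiagonal claim, reusing the `arnoldi` sketch from earlier (the wind defaults to zero, so the operator is symmetric):

```julia
A = advdiff_matrix(10)                   # pure diffusion: A is symmetric
Q, H = arnoldi(A, randn(size(A, 1)), 8)
@show norm(triu(H, 2))                   # everything above the first superdiagonal is ~ 0
```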
Conjugate Gradients: changing the norm#
Instead of minimizing the \(A^T A\) norm of the error, the Conjugate Gradient method minimizes the \(A\) norm of the error. For \(A\) to induce a norm, it must be symmetric positive definite. Jonathan Shewchuk's guide to CG, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, is an excellent resource.
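For completeness, a textbook unpreconditioned CG sketch (following the standard algorithm; the name `cg` is ours):

```julia
# Conjugate gradient for symmetric positive definite A (textbook sketch).
function cg(A, b; tol=1e-8, maxiter=500)
    x = zero(b)
    r = copy(b)                      # residual b - A x for x = 0
    p = copy(r)                      # initial search direction
    rho = r' * r
    for _ in 1:maxiter
        Ap = A * p
        alpha = rho / (p' * Ap)      # step length minimizing the A-norm error along p
        x += alpha * p
        r -= alpha * Ap
        rho_new = r' * r
        sqrt(rho_new) <= tol * norm(b) && break
        p = r + (rho_new / rho) * p  # new direction, A-conjugate to the previous ones
        rho = rho_new
    end
    x
end

A = advdiff_matrix(20)               # symmetric positive definite (pure diffusion)
b = randn(size(A, 1))
x = cg(A, b)
@show norm(A * x - b) / norm(b)
```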