2023-10-13 Linear Solvers#

Last time#

  • Wave equation and Hamiltonians

  • Symplectic integrators

Today#

  • Request RC accounts

  • Sparse direct solvers

    • impact of ordering on formulation

    • cost scaling

using Plots
default(linewidth=3)
using LinearAlgebra
using SparseArrays

function my_spy(A)
    cmax = norm(vec(A), Inf)
    s = max(1, ceil(120 / size(A, 1)))
    spy(A, marker=(:square, s), c=:diverging_rainbow_bgymr_45_85_c67_n256, clims=(-cmax, cmax))
end

function advdiff_matrix(n; kappa=1, wind=[0, 0])
    h = 2 / (n + 1)
    rows = Vector{Int64}()
    cols = Vector{Int64}()
    vals = Vector{Float64}()
    idx((i, j),) = (i-1)*n + j
    in_domain((i, j),) = 1 <= i <= n && 1 <= j <= n
    stencil_advect = [-wind[1], -wind[2], 0, wind[1], wind[2]] / h
    stencil_diffuse = [-1, -1, 4, -1, -1] * kappa / h^2
    stencil = stencil_advect + stencil_diffuse
    for i in 1:n
        for j in 1:n
            neighbors = [(i-1, j), (i, j-1), (i, j), (i+1, j), (i, j+1)]
            mask = in_domain.(neighbors)
            append!(rows, idx.(repeat([(i,j)], 5))[mask])
            append!(cols, idx.(neighbors)[mask])
            append!(vals, stencil[mask])
        end
    end
    sparse(rows, cols, vals)
end
advdiff_matrix (generic function with 1 method)

Request an RC account#

https://www.colorado.edu/rc#

  • This gives ssh login access. We’ll use Alpine, which is a modern CPU and GPU cluster.

  • This will be good for benchmarking and larger runs. We’ll also use GPUs later in class.

PETSc#

  • We’ll start using PETSc next week.

  • You can build PETSc on your laptop. You’ll need C development tools.

    • Linux: use package manager (apt install, dnf install, …)

    • OSX: install xcode, many people like homebrew as a package manager

    • Windows: WSL then follow Linux instructions (or MSYS2 or Cygwin)

    • Any: install Docker, then use our image

Gaussian elimination and Cholesky#

\(LU = A\)#

Given a \(2\times 2\) block matrix, the algorithm proceeds as \begin{split}

(32)#\[\begin{bmatrix} A & B \\ C & D \end{bmatrix}\]
(33)#\[\begin{bmatrix} L_A & \\ C U_A^{-1} & 1 \end{bmatrix}\]
(34)#\[\begin{bmatrix} U_A & L_A^{-1} B \\ & S \end{bmatrix}\]

\end{split} where \(L_A U_A = A\) and

\[S = D - C \underbrace{U_A^{-1} L_A^{-1}}_{A^{-1}} B .\]

Cholesky \(L L^T = A\)#

\begin{split}

(35)#\[\begin{bmatrix} A & B^T \\ B & D \end{bmatrix}\]
(36)#\[\begin{bmatrix} L_A & \\ B L_A^{-T} & 1 \end{bmatrix}\]
(37)#\[\begin{bmatrix} L_A^T & L_A^{-1} B^T \\ & S \end{bmatrix}\]

\end{split} where \(L_A L_A^T = A\) and

\[ S = D - B \underbrace{L_A^{-T} L_A^{-1}}_{A^{-1}} B^T .\]

A = advdiff_matrix(10)
N = size(A, 1)
ch = cholesky(A, perm=1:N)
my_spy(A)
my_spy(sparse(ch.L))

Cost of a banded solve#

Consider an \(N\times N\) matrix with bandwidth \(b\), \(1 \le b \le N\).

  • Work one row at a time

  • Each row/column of panel has \(b\) nonzeros

  • Schur update affects \(b\times b\) sub-matrix

  • Total compute cost \(N b^2\)

  • Storage cost \(N b\)

Question#

  • What bandwidth \(b\) is needed for an \(N = n\times n \times n\) cube in 3 dimensions?

  • What is the memory cost?

  • What is the compute cost?

my_spy(sparse(ch.L))

Different orderings#

n = 40
A = advdiff_matrix(n)
heatmap(reshape(1:n^2, n, n))
import Metis
perm, iperm = Metis.permutation(A)
heatmap(reshape(iperm, n, n))
cholesky(A, perm=1:n^2)
my_spy(A[perm, perm])
ch = cholesky(A, perm=Vector{Int64}(perm))
my_spy(sparse(ch.L))