2021-10-15 Sparse Linear Algebra¶
Last time¶
Note on p-Laplacian activity
Weak forms and jump conditions
2D wave equations via gas dynamics
Appropriate time integrators
Hamiltonian structure and energy drift
Today¶
Recap and questions
Sparse direct solvers
matrix orderings
impact on formulation
cost scaling
Why iterative solvers
Discussion with Ioana Fleming
using Plots
using LinearAlgebra
using SparseArrays
default(linewidth=4)
function plot_stability(Rz, title; xlims=(-2, 2), ylims=(-2, 2))
x = LinRange(xlims[1], xlims[2], 100)
y = LinRange(ylims[1], ylims[2], 100)
heatmap(x, y, (x, y) -> abs(Rz(x + 1im*y)), c=:bwr, clims=(0, 2), aspect_ratio=:equal, title=title)
end
struct RKTable
A::Matrix
b::Vector
c::Vector
function RKTable(A, b)
s = length(b)
A = reshape(A, s, s)
c = vec(sum(A, dims=2))
new(A, b, c)
end
end
function rk_stability(z, rk)
s = length(rk.b)
1 + z * rk.b' * ((I - z*rk.A) \ ones(s))
end
rk4 = RKTable([0 0 0 0; .5 0 0 0; 0 .5 0 0; 0 0 1 0], [1, 2, 2, 1] / 6)
function ode_rk_explicit(f, u0; tfinal=1., h=0.1, table=rk4)
u = copy(u0)
t = 0.
n, s = length(u), length(table.c)
fY = zeros(n, s)
thist = [t]
uhist = [u0]
while t < tfinal
tnext = min(t+h, tfinal)
h = tnext - t
for i in 1:s
ti = t + h * table.c[i]
Yi = u + h * sum(fY[:,1:i-1] * table.A[i,1:i-1], dims=2)
fY[:,i] = f(ti, Yi)
end
u += h * fY * table.b
t = tnext
push!(thist, t)
push!(uhist, u)
end
thist, hcat(uhist...)
end
function laplacian_matrix(n)
"Laplacian in n×n periodic domain"
h = 2 / n
rows = Vector{Int64}()
cols = Vector{Int64}()
vals = Vector{Float64}()
wrap(i) = (i + n - 1) % n + 1
idx(i, j) = (wrap(i)-1)*n + wrap(j)
stencil_diffuse = [-1, -1, 4, -1, -1] / h^2
for i in 1:n
for j in 1:n
append!(rows, repeat([idx(i,j)], 5))
append!(cols, [idx(i-1,j), idx(i,j-1), idx(i,j), idx(i+1,j), idx(i,j+1)])
append!(vals, stencil_diffuse)
end
end
sparse(rows, cols, vals)
end
laplacian_matrix (generic function with 1 method)
Two forms of acoustic wave equation¶
Dividing the momentum equation through by the background density and dropping the tildes yields the standard form
\[ \frac{\partial \rho}{\partial t} + \nabla\cdot \mathbf u = 0, \qquad \frac{\partial \mathbf u}{\partial t} + a^2 \nabla \rho = 0, \]
where \(a\) is the wave speed.
Examine the second equation: it says \(\partial_t \mathbf u = -a^2 \nabla \rho\).
Let’s differentiate the first equation in time and substitute,
\[ \frac{\partial^2 \rho}{\partial t^2} = -\nabla\cdot \frac{\partial \mathbf u}{\partial t} = a^2 \nabla\cdot\nabla \rho = a^2 \Delta \rho. \]
Note: we had to assume these derivatives exist!
We can reduce this to a first order system as
\[ \frac{\partial}{\partial t} \begin{bmatrix} \rho \\ \dot\rho \end{bmatrix} = \begin{bmatrix} 0 & I \\ a^2 \Delta & 0 \end{bmatrix} \begin{bmatrix} \rho \\ \dot\rho \end{bmatrix}. \]
Question¶
How is the problem size different?
What might we be concerned about in choosing the second formulation?
Wave operator¶
function wave_matrix(n; a=1)
Z = spzeros(n^2, n^2)
L = laplacian_matrix(n)
[Z I; -a^2*L Z]
end
wave_matrix(2)
8×8 SparseMatrixCSC{Float64, Int64} with 16 stored entries:
⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0
-4.0 2.0 2.0 ⋅ ⋅ ⋅ ⋅ ⋅
2.0 -4.0 ⋅ 2.0 ⋅ ⋅ ⋅ ⋅
2.0 ⋅ -4.0 2.0 ⋅ ⋅ ⋅ ⋅
⋅ 2.0 2.0 -4.0 ⋅ ⋅ ⋅ ⋅
A = wave_matrix(8; a=2) * .1
ev = eigvals(Matrix(A))
plot_stability(z -> rk_stability(z, rk4), "RK4", xlims=(-4, 4), ylims=(-4, 4))
scatter!(real(ev), imag(ev), color=:black)
Question: would forward Euler work?¶
Example 2D wave solver with RK4¶
n = 20
A = wave_matrix(n)
x = LinRange(-1, 1, n+1)[1:end-1]
y = x
rho0 = vec(exp.(-9*((x .+ 1e-4).^2 .+ y'.^2)))
sol0 = vcat(rho0, zero(rho0))
thist, solhist = ode_rk_explicit((t, sol) -> A * sol, sol0, h=.02)
size(solhist)
(800, 51)
@gif for tstep in 1:length(thist)
rho = solhist[1:n^2, tstep]
contour(x, y, reshape(rho, n, n), title="\$ t = $(thist[tstep])\$")
end
Accuracy, conservation of mass with RK4¶
thist, solhist = ode_rk_explicit((t, sol) -> A * sol, sol0, h=.05,
tfinal=1)
tfinal = thist[end]
M = exp(Matrix(A*tfinal))
sol_exact = M * sol0
sol_final = solhist[:, end]
norm(sol_final - sol_exact)
0.02015111748435016
mass = vec(sum(solhist[1:n^2, :], dims=1))
plot(thist[2:end], mass[2:end] .- mass[1])
Conservation of energy with RK4¶
Hamiltonian structure¶
We can express the total energy for our system as a sum of kinetic and potential energy,
\[ H(\rho, \dot\rho) = \underbrace{\frac 1 2 \int_\Omega \dot\rho^2}_{\text{kinetic}} + \underbrace{\frac 1 2 \int_\Omega \lvert \nabla \rho \rvert^2}_{\text{potential}}, \]
where we identify \(\rho\) as a generalized position and \(\dot\rho\) as generalized momentum. Hamilton’s equations state that the equations of motion are
\[ \frac{\partial}{\partial t} \begin{bmatrix} \rho \\ \dot\rho \end{bmatrix} = \begin{bmatrix} \partial H/\partial \dot\rho \\ -\partial H/\partial \rho \end{bmatrix} = \begin{bmatrix} \dot\rho \\ -L \rho \end{bmatrix}, \]
where we have used the weak form to associate \(\int \nabla v \cdot \nabla u = v^T L u\).
function energy(sol, n)
L = laplacian_matrix(n)
rho = sol[1:end÷2]
rhodot = sol[end÷2+1:end]
kinetic = .5 * norm(rhodot)^2
potential = .5 * rho' * L * rho
kinetic + potential
end
ehist = [energy(solhist[:,i], n) for i in 1:length(thist)]
plot(thist, ehist)
Velocity Verlet integrator¶
function wave_verlet(n, u0; tfinal=1., h=0.1)
L = laplacian_matrix(n)
u = copy(u0)
t = 0.
thist = [t]
uhist = [u0]
irho = 1:n^2
irhodot = n^2+1:2*n^2
accel = -L * u[irho]
while t < tfinal
tnext = min(t+h, tfinal)
h = tnext - t
u[irho] += h * u[irhodot] + h^2/2 * accel
accel_next = -L * u[irho]
u[irhodot] += h/2 * (accel + accel_next)
accel = accel_next
t = tnext
push!(thist, t)
push!(uhist, copy(u))
end
thist, hcat(uhist...)
end
wave_verlet (generic function with 1 method)
thist, solhist = wave_verlet(n, sol0, h=.04)
@gif for tstep in 1:length(thist)
rho = solhist[1:n^2, tstep]
contour(x, y, reshape(rho, n, n), title="\$ t = $(thist[tstep])\$")
end
Accuracy and conservation for Verlet¶
thist, solhist = wave_verlet(n, sol0, h=.05, tfinal=50)
tfinal = thist[end]
M = exp(Matrix(A*tfinal))
sol_exact = M * sol0
sol_final = solhist[:, end]
@show norm(sol_final - sol_exact)
mass = vec(sum(solhist[1:n^2, :], dims=1))
plot(thist[2:end], mass[2:end] .- mass[1])
norm(sol_final - sol_exact) = 6.86250099252766
ehist = [energy(solhist[:,i], n) for i in 1:length(thist)]
plot(thist, ehist)
Notes on time integrators¶
We need stability on the imaginary axis for our discretization (and the physical system); see the sketch at the end of this section.
If the model is dissipative (e.g., we didn’t make the zero-viscosity assumption), then we need stability in the left half plane.
The split \(\rho, \rho\mathbf u\) form is usually used with (nonlinear) upwinding, and thus will have dissipation.
Runge-Kutta methods¶
Easy to use, stability region designed for spatial discretization
Energy drift generally present
Verlet/leapfrog/Newmark and symplectic integrators¶
These preserve the “geometry of the Hamiltonian”
energy is not exactly conserved, but it doesn’t drift over time
such methods are called “symplectic integrators”
May not have stability away from the imaginary axis (for dissipation)
Most require a generalized position/momentum split, “canonical variables”
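To revisit the earlier question about forward Euler, here is a small added sketch (not part of the original notes) using the plot_stability helper and the wave-operator eigenvalues ev computed above. Forward Euler has \(R(z) = 1 + z\), so its stability region is the unit disk centered at \(-1\), which touches the imaginary axis only at the origin; the purely imaginary wave eigenvalues therefore fall outside it for any step size.
# Forward Euler stability region |1 + z| <= 1 with the (imaginary) wave eigenvalues overlaid
plot_stability(z -> 1 + z, "Forward Euler", xlims=(-4, 4), ylims=(-4, 4))
scatter!(real(ev), imag(ev), color=:black)  # ev from the wave_matrix example above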
Start with advection-diffusion operator in 2D¶
Eliminate Dirichlet boundary conditions around all sides
function advdiff_matrix(n; kappa=1, wind=[0, 0])
h = 2 / (n + 1)
rows = Vector{Int64}()
cols = Vector{Int64}()
vals = Vector{Float64}()
idx((i, j),) = (i-1)*n + j
in_domain((i, j),) = 1 <= i <= n && 1 <= j <= n
stencil_advect = [-wind[1], -wind[2], 0, wind[1], wind[2]] / h
stencil_diffuse = [-1, -1, 4, -1, -1] * kappa / h^2
stencil = stencil_advect + stencil_diffuse
for i in 1:n
for j in 1:n
neighbors = [(i-1, j), (i, j-1), (i, j), (i+1, j), (i, j+1)]
mask = in_domain.(neighbors)
append!(rows, idx.(repeat([(i,j)], 5))[mask])
append!(cols, idx.(neighbors)[mask])
append!(vals, stencil[mask])
end
end
sparse(rows, cols, vals)
end
advdiff_matrix (generic function with 1 method)
function my_spy(A)
cmax = norm(vec(A), Inf)
s = max(1, ceil(120 / size(A, 1)))
spy(A, marker=(:square, s), c=:diverging_rainbow_bgymr_45_85_c67_n256, clims=(-cmax, cmax))
end
A = advdiff_matrix(5, wind=[1, 1])
@show norm(A - A')
my_spy(A)
norm(A - A') = 53.665631459994955
Gaussian elimination and Cholesky¶
\(LU = A\)¶
Given a \(2\times 2\) block matrix, the algorithm proceeds as
\[ \begin{split} \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} L_A & 0 \\ C U_A^{-1} & I \end{bmatrix} \begin{bmatrix} U_A & L_A^{-1} B \\ 0 & S \end{bmatrix} \end{split} \]
where \(L_A U_A = A\) and \(S = D - C A^{-1} B\) is the Schur complement, which is factored recursively.
Cholesky \(L L^T = A\)¶
\[ \begin{split} \begin{bmatrix} A & B^T \\ B & D \end{bmatrix} = \begin{bmatrix} L_A & 0 \\ B L_A^{-T} & L_S \end{bmatrix} \begin{bmatrix} L_A^T & L_A^{-1} B^T \\ 0 & L_S^T \end{bmatrix} \end{split} \]
where \(L_A L_A^T = A\) and \(L_S L_S^T = S = D - B A^{-1} B^T\) is the Cholesky factorization of the Schur complement.
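As a quick numerical check, here is a small added sketch (not from the original notes) verifying the block Cholesky identity above on a random SPD matrix; the names A, B, D, S, LA, LS follow the block notation used here.
using LinearAlgebra
M = randn(6, 6); M = M' * M + 6I      # small SPD test matrix, partitioned as [A B^T; B D]
A, Bt = M[1:3, 1:3], M[1:3, 4:6]
B, D = M[4:6, 1:3], M[4:6, 4:6]
LA = cholesky(Symmetric(A)).L         # L_A L_A^T = A
S = D - B * (A \ Bt)                  # Schur complement S = D - B A^{-1} B^T
LS = cholesky(Symmetric(S)).L         # L_S L_S^T = S
L21 = B / LA'                         # B L_A^{-T}
L = [LA zeros(3, 3); L21 LS]          # block factor [L_A 0; B L_A^{-T} L_S]
norm(L * L' - M)                      # should be near machine precision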
A = advdiff_matrix(5)
N = size(A, 1)
ch = cholesky(A, perm=1:N)
my_spy(sparse(ch.L))
Cost of a banded solve¶
Consider an \(N\times N\) matrix with bandwidth \(b\), \(1 \le b \le N\).
Work one row at a time
Each row/column of panel has \(b\) nonzeros
Schur update affects \(b\times b\) sub-matrix
Total compute cost \(N b^2\)
Storage cost \(N b\)
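Here is a small added sketch (not in the original notes) comparing the measured fill of the natural-ordering Cholesky factor against the \(N b\) storage estimate, using the advdiff_matrix defined above; for the \(n\times n\) grid the bandwidth is \(b = n\), so we expect roughly \(n^3\) stored entries.
using SparseArrays, LinearAlgebra
for n in [10, 20, 40]
    A = advdiff_matrix(n)              # n^2 x n^2 Dirichlet operator from above
    N = size(A, 1)
    ch = cholesky(A, perm=1:N)         # natural ordering keeps bandwidth b = n
    @show n, nnz(sparse(ch.L)), N * n  # measured fill vs the N*b estimate
end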
Question¶
What bandwidth \(b\) is needed for an \(N = n\times n \times n\) cube in 3 dimensions?
What is the memory cost?
What is the compute cost?
my_spy(sparse(ch.L))
Different orderings¶
n = 20
A = advdiff_matrix(n)
heatmap(reshape(1:n^2, n, n))
import Metis
perm, iperm = Metis.permutation(A)
heatmap(reshape(iperm, n, n))
cholesky(A, perm=1:n^2)
SuiteSparse.CHOLMOD.Factor{Float64}
type: LLt
method: simplicial
maxnnz: 8019
nnz: 8019
success: true
cholesky(A, perm=Vector{Int64}(perm))
SuiteSparse.CHOLMOD.Factor{Float64}
type: LLt
method: simplicial
maxnnz: 4031
nnz: 4031
success: true
Cholesky factors in nested dissection¶
n = 10
A = advdiff_matrix(n)
perm, iperm = Metis.permutation(A)
my_spy(A[perm, perm])
ch = cholesky(A, perm=Vector{Int64}(perm))
my_spy(sparse(ch.L))
The dense blocks in factor \(L\) are “supernodes”
They correspond to “vertex separators” in the ordering
Cost in nested dissection¶
Cost is dominated by dense factorization of the largest supernode
Its size comes from the vertex separator size
2D square¶
\(N = n^2\) dofs
Vertex separator of size \(v = n\)
Compute cost \(v^3 = n^3 = N^{3/2}\)
Storage cost \(N \log N\)
3D Cube¶
\(N = n^3\) dofs
Vertex separator of size \(v = n^2\)
Compute cost \(v^3 = n^6 = N^2\)
Storage cost \(v^2 = n^4 = N^{4/3}\)
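As a rough added check (not in the original notes) of the 2D storage estimate, we can compare the measured fill of the nested-dissection Cholesky factor with \(N \log N\); constants are ignored, so only the growth rate is meaningful.
import Metis
using SparseArrays, LinearAlgebra
for n in [20, 40, 80]
    A = advdiff_matrix(n)                    # 2D Dirichlet operator from above
    perm, iperm = Metis.permutation(A)       # nested dissection ordering
    ch = cholesky(A, perm=Vector{Int64}(perm))
    N = n^2
    @show n, nnz(sparse(ch.L)), round(Int, N * log2(N))
end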
Questions¶
How much does the cost change if we switch from Dirichlet to periodic boundary conditions in 2D?
How much does the cost change if we move from 5-point stencil (\(O(h^2)\) accuracy) to 9-point “star” stencil (\(O(h^4)\) accuracy)?
Would you rather solve a 3D problem on a \(10\times 10\times 10000\) grid or \(100\times 100 \times 100\)?
Test our intuition¶
n = 50
A_dirichlet = advdiff_matrix(n)
perm, iperm = Metis.permutation(A_dirichlet)
cholesky(A_dirichlet, perm=Vector{Int64}(perm))
#cholesky(A_dirichlet)
SuiteSparse.CHOLMOD.Factor{Float64}
type: LLt
method: simplicial
maxnnz: 40203
nnz: 40203
success: true
A_periodic = laplacian_matrix(n) + 1e-10*I
perm, iperm = Metis.permutation(A_periodic)
cholesky(A_periodic, perm=Vector{Int64}(perm))
#cholesky(A_periodic)
SuiteSparse.CHOLMOD.Factor{Float64}
type: LLt
method: supernodal
maxnnz: 0
nnz: 78361
success: true
How expensive, how fast?¶
Suppose we have a second order accurate method in 3D.
n = 2. .^ (2:13)
N = n.^3
error = (50 ./ n) .^ 2
seconds = 1e-10 * N.^2
hours = seconds / 3600
cloud_dollars = 3 * hours
kW_hours = 0.2 * hours
barrel_of_oil = kW_hours / 1700
kg_CO2 = kW_hours * 0.709
;
cost = hours
plot(cost, error, xlabel="cost", ylabel="error", xscale=:log10, yscale=:log10)
Outlook on sparse direct solvers¶
Sparse direct works well for 2D and almost-2D problems up to medium-large sizes
High order FD methods make sparse direct cry
High order finite elements are okay, but not high-continuity splines
Almost-2D includes a lot of industrial solid mechanics applications
The body of a car, the frame of an airplane
Sparse direct is rarely usable in “fully 3D” problems
“thick” structures
soil mechanics, hydrology, building foundations, bones, tires
fluid mechanics
aerodynamics, heating/cooling systems, atmosphere/ocean
Setup cost (factorization) is much more expensive than solve
Amortize cost in time-dependent problems
Rosenbrock methods: factorization reused across stages
“lag” Jacobian in Newton (results in “modified Newton”)
“lag” preconditioner with matrix-free iterative methods (Sundials, PETSc)
Factorization pays off if you have many right hand sides
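A minimal added sketch of this amortization (using the advdiff_matrix operator defined above as a stand-in for a time-independent system): factor once, then reuse the factorization so that only cheap triangular solves are repeated.
using SparseArrays, LinearAlgebra
A = advdiff_matrix(50)          # time-independent operator (stand-in example)
F = cholesky(A)                 # expensive setup: factor once
for step in 1:10                # e.g., implicit time steps or many load cases
    b = randn(size(A, 1))       # a new right-hand side each step
    u = F \ b                   # cheap: only forward/back substitution is repeated
end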