GSOC 2017 summary

As GSOC 2017 is coming to an end, I'd like to use this blog post to give a summary of the things I have been working on.

August 18, 2017

The central part of my GSOC project concerns implementing the Jacobi-Davidson method natively in Julia, available in JacobiDavidson.jl. This method computes a few approximate solutions of the eigenvalue problem $Ax = \lambda Bx$ for large matrices $A$ and $B$. As it uses iterative solvers internally, much time has gone into improving IterativeSolvers.jl in general. Lastly, as iterative solvers are typically used with preconditioners, I have implemented the incomplete LU factorization for sparse matrices as well in ILU.jl.

JacobiDavidson.jl

The Jacobi-Davidson implementation is ready for use and can be applied to solving the (generalized) eigenvalue problem for non-Hermitian matrices. It’s similar to the eigs method already available in Julia: you can get a few eigenvalues near a specified target in the complex plane.

The jdqr function computes a partial Schur decomposition $AQ = QR$ of a general matrix $A$. The columns of $Q$ are orthonormal Schur vectors, and the diagonal of $R$ contains the eigenvalues of $A$. Working with Schur vectors is numerically more stable than working with eigenvectors, but eigenvectors can be computed from the Schur vectors when necessary.
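To illustrate that last point, here is a minimal sketch (plain linear algebra, not part of the JacobiDavidson.jl API) of how eigenvectors follow from the Schur vectors, assuming the factors Q and R of the partial Schur decomposition are at hand:

# Q has orthonormal Schur vectors as columns and R is upper triangular,
# such that A * Q ≈ Q * R.

# The approximate eigenvalues of A are on the diagonal of R.
λs = diag(R)

# If R * y = λ * y, then A * (Q * y) = Q * R * y = λ * (Q * y), so the
# eigenvectors of the small matrix R lift to eigenvectors of A.
vals, Y = eig(full(R))
X = Q * Y  # columns of X are approximate eigenvectors of A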

For the generalized problem we have the jdqz function, which computes a partial generalized Schur decomposition $AZ = QS$ and $BZ = QT$ for a matrix pencil $(A, B)$. Here $Q$ and $Z$ have orthonormal columns and $S$ and $T$ are upper triangular. The eigenvalues $\lambda_k$ of $Ax = \lambda_k Bx$ can be found by computing $\lambda_k = S_{kk} / T_{kk}$.

At this point no official release has been tagged yet, as there is still some work to be done: hopefully the jdqr and jdqz methods can largely be merged as they are very similar; there are some optimizations for Hermitian problems that aren’t yet implemented; and lastly the methods do not yet support generic vectors and numbers.

For an example of jdqz, see the examples near the end of this post.

IterativeSolvers.jl

My contributions to IterativeSolvers.jl include

ILU.jl

Computing the full LU decomposition of a large, sparse matrix requires too much memory and computational work. By simply dropping small terms during the LU factorization these problems can be remedied, at the expense of being inexact or incomplete (ILU). This ILU factorization can still be an excellent preconditioner to speed up iterative methods.
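To give a rough idea of what dropping during the factorization means, here is a naive dense sketch (the function toy_ilu and the absolute drop tolerance τ are purely illustrative; this is not the Crout algorithm that ILU.jl implements):

# Toy "incomplete" Gaussian elimination: do the usual LU factorization,
# but throw away off-diagonal fill-in smaller than a drop tolerance τ.
function toy_ilu(A::AbstractMatrix, τ = 0.1)
    F = float.(A)
    n = size(F, 1)
    for k = 1 : n - 1
        for i = k + 1 : n
            F[i, k] /= F[k, k]                # multiplier, stored as L[i, k]
            for j = k + 1 : n
                F[i, j] -= F[i, k] * F[k, j]  # standard elimination update
            end
        end
        # Drop small entries created in this step so they cause no further fill-in.
        for j = k + 1 : n, i = k + 1 : n
            if i != j && abs(F[i, j]) < τ
                F[i, j] = 0
            end
        end
    end
    F  # strict lower triangle holds L (unit diagonal), upper triangle holds U
end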

As ILU for the SparseMatrixCSC type was not yet available in Julia, I’ve implemented it based on the article “Crout versions of ILU for general sparse matrices” by Na Li, Yousef Saad and Edmond Chow.

The package is completely ready for use and is well tested. See below for an example.

In a later version the package can be improved by adding different dropping strategies.

Examples

Below you can find a few examples of how to use the packages I’ve been working on.

Jacobi-Davidson

Let’s take a look at a toy example of the generalized eigenvalue problem $Ax = \lambda Bx$ where $A$ and $B$ are diagonal matrices of size $n \times n$ with $A_{kk} = \sqrt{k}$ and $B_{kk} = 1 / \sqrt{k}$. The eigenvalues are just the integers $1, \ldots, n$.

We implement the action of the matrices $A$ and $B$ matrix-free, using LinearMaps.jl; the maps themselves are constructed once the problem size $n$ is fixed below:

using LinearMaps

# Action of A: diagonal matrix with entries sqrt(i).
function myA!(y, x)
  for i = 1 : length(x)
    @inbounds y[i] = sqrt(i) * x[i]
  end
end

# Action of B: diagonal matrix with entries 1 / sqrt(i).
function myB!(y, x)
  for i = 1 : length(x)
    @inbounds y[i] = x[i] / sqrt(i)
  end
end

Next we fix $n = 100$, wrap the functions in mutating LinearMaps, set the number of requested eigenpairs to 5 and target the eigenvalues with largest real part, somewhere near $110.0$. We use a couple of steps of BiCGStab(2) as an internal iterative solver, and let the dimension of the search space vary between 5 and 10 vectors:

using JacobiDavidson

n = 100
A = LinearMap{Complex128}(myA!, n; ismutating = true)
B = LinearMap{Complex128}(myB!, n; ismutating = true)

target = LR(110.0 + 0.0im) # Largest Real part

schur, resnorms = jdqz(A, B, 
    bicgstabl_solver(n, max_mv_products = 10, l = 2),
    target = target,
    pairs = 5,
    ɛ = 1e-9,
    min_dimension = 5,
    max_dimension = 10,
    max_iter = 100,
    verbose = true
)

using Plots  # assuming Plots.jl for the plot call below

println.(schur.alphas ./ schur.betas)
plot(resnorms, yscale = :log10, marker = :+, label = "Residual norm", xlabel = "Iteration")

It indeed finds the eigenvalues from 96 up to 100:

100.00000000000041 + 5.329070518200757e-14im
98.99999999999996 - 1.7674527390216706e-14im
98.00000000000013 - 3.542337114695263e-14im
97.00000000000024 + 1.749508591650104e-14im
96.00000000000013 + 1.3053503572900918e-14im

The plot shows the convergence history, which is the residual norm $\|Ax - \lambda Bx\|$ in each iteration:

(Figure: residual norm per iteration, summary/residualnorm.svg)

Preconditioning

The problem gets harder when we increase $n$ a few orders of magnitude. Especially if we want to target eigenvalues in the interior of the spectrum (near $n / 2$), GMRES and BiCGStab(l) might have trouble solving a very indefinite system.

In that case a preconditioner for $(A - \tau B)$ can be used, where $\tau$ is the target. We will just use the exact inverse: since $(A - \tau B)$ is diagonal with entries $(k - \tau) / \sqrt{k}$, its inverse is the diagonal matrix $P$ with entries $P_{kk} = \sqrt{k} / (k - \tau)$.

It can be implemented matrix-free and in-place:

import Base.LinAlg.A_ldiv_B!

# Exact inverse of (A - τB) for our diagonal matrices.
struct SuperPreconditioner{numT <: Number}
    target::numT
end

# Apply the preconditioner in-place: x ← (A - τB)⁻¹ x.
function A_ldiv_B!(p::SuperPreconditioner, x)
    for i = 1 : length(x)
        @inbounds x[i] *= sqrt(i) / (i - p.target)
    end
end

Now we call Jacobi-Davidson with the Near target and pass the preconditioner. This time we use GMRES, but BiCGStab works as well.

n = 100_000
τ = 50_000.1 + 0im
target = Near(τ)
A = LinearMap{Complex128}(myA!, n; ismutating = true)
B = LinearMap{Complex128}(myB!, n; ismutating = true)
P = SuperPreconditioner(τ)

schur, residuals = jdqz(A, B, 
    gmres_solver(n, iterations = 10),
    preconditioner = P,
    target = target,
    pairs = 5,
    ɛ = 1e-9,
    min_dimension = 5,
    max_dimension = 10,
    max_iter = 200,
    verbose = true
)

It converges to the eigenvalues 49999, 50000, 50001, 50002 and 50004:

50004.00000000014 + 3.5749921718300463e-12im
49999.999999986496 - 7.348301591250897e-12im
50001.00000000359 - 1.9761169705101647e-11im
49998.99999999998 - 1.0866253642291695e-10im
50002.00000000171 - 2.3559720511618024e-11im

It does not yet detect 50003, but that might happen when pairs is increased a bit. What is more interesting is that the total number of iterations is small as a result of the preconditioner:

(Figure: residual norm per iteration with preconditioning, summary/residualnorm_2.svg)

It’s not easy to construct a preconditioner this good for any given problem, but for specific classes of problems people usually know what works well. If no problem-specific preconditioner is available, you can always try a general one such as ILU, as sketched below; the next section illustrates ILU in more detail.
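For instance, if $A$ and $B$ are available as explicit sparse matrices rather than as matrix-free operators, an ILU factorization of the shifted matrix $A - \tau B$ could serve as a general preconditioner. A minimal sketch under that assumption (it also assumes crout_ilu accepts complex sparse matrices and that its factorization can be passed as the preconditioner, since both rely on the same A_ldiv_B! interface; the drop tolerance 0.01 is arbitrary):

using ILU, JacobiDavidson

# Explicit sparse versions of the diagonal matrices from the toy problem.
n = 100_000
A_sp = spdiagm(Complex128[sqrt(k) for k = 1 : n])
B_sp = spdiagm(Complex128[1 / sqrt(k) for k = 1 : n])

τ = 50_000.1 + 0im

# Incomplete LU factorization of the shifted matrix A - τB, used as a
# general-purpose preconditioner.
P = crout_ilu(A_sp - τ * B_sp, τ = 0.01)

schur, residuals = jdqz(A_sp, B_sp,
    gmres_solver(n, iterations = 10),
    preconditioner = P,
    target = Near(τ),
    pairs = 5
)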

ILU example

As an example of how ILU can be used, we generate a non-symmetric, banded matrix with a structure that typically arises in finite difference schemes for three-dimensional problems:

n = 64
N = n^3

# One-dimensional operator with non-symmetric off-diagonals
A = spdiagm((fill(-1.0, n - 1), fill(3.0, n), fill(-2.0, n - 1)), (-1, 0, 1))
Id = speye(n)

# Extend to two and then three dimensions via Kronecker products
A = kron(A, Id) + kron(Id, A)
A = kron(A, Id) + kron(Id, A)

# Right-hand side for the known solution x = ones(N)
x = ones(N)
b = A * x

The matrix $A$ has size $64^3 \times 64^3$. We want to solve the problem $Ax = b$ using for instance BiCGStab(2), but it turns out that convergence can get slow when the size of the problem grows. A quick benchmark shows it takes about 2.0 seconds to solve the problem to a reasonable tolerance:

> using BenchmarkTools, IterativeSolvers
> my_x = @btime bicgstabl($A, $b, 2, max_mv_products = 2000);
2.051 s
> norm(b - A * my_x) / norm(b)
1.6967043606691152e-9

Now let’s construct the incomplete LU decomposition:

> using ILU
> LU = crout_ilu(A, τ = 0.1)
> nnz(LU) / nnz(A)
2.1180353639352374

Using the above drop tolerance $\tau$, we get an approximate LU decomposition using only about twice as many entries as the original matrix, which is reasonable. Let’s see what happens when we benchmark the solver again, now with ILU as a preconditioner:

> my_x = @btime bicgstabl($A, $b, 2, Pl = $LU, max_mv_products = 2000);
692.187 ms
> norm(b - A * my_x) / norm(b)
2.133397068536056e-9

It solves the problem to the same tolerance in about 66% less time. There is of course a caveat, as constructing the preconditioner itself takes time as well:

> LU = @btime crout_ilu($A, τ = 0.1);
611.019 ms

So all in all the problem is solved in about 36% less time. However, if we have multiple right-hand sides for the same matrix, we can construct the preconditioner only once and use it multiple times. Even when the matrix changes slightly you could reuse the ILU factorization. The latter is exactly what happens in Jacobi-Davidson.
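For instance, reusing the factorization for several right-hand sides could look like this (a sketch reusing the calls from above; b1 and b2 are just example right-hand sides):

using IterativeSolvers, ILU

# Pay for the factorization once ...
LU = crout_ilu(A, τ = 0.1)

# ... and reuse it for every right-hand side with the same matrix.
b1 = A * ones(N)
b2 = A * rand(N)

x1 = bicgstabl(A, b1, 2, Pl = LU, max_mv_products = 2000)
x2 = bicgstabl(A, b2, 2, Pl = LU, max_mv_products = 2000)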