GSOC 2017 summary

As GSOC 2017 is coming to an end, I'd like to use this blog post to give a summary of the things I have been working on.

August 18, 2017

The central part of my GSOC project concerns implementing the Jacobi-Davidson method natively in Julia, available in JacobiDavidson.jl. This method computes a few approximate solutions of the eigenvalue problem $Ax = \lambda Bx$ for large matrices $A$ and $B$. As it uses iterative solvers internally, much time has gone into improving IterativeSolvers.jl in general. Lastly, as iterative solvers are typically used with preconditioners, I have implemented the incomplete LU factorization for sparse matrices as well in ILU.jl.

JacobiDavidson.jl

The Jacobi-Davidson implementation is ready for use and can be applied to solving the (generalized) eigenvalue problem for non-Hermitian matrices. It’s similar to the eigs method already available in Julia: you can get a few eigenvalues near a specified target in the complex plane.

The jdqr function computes a partial Schur decomposition $AQ = QR$ of a general matrix $A$. The columns of $Q$ are orthonormal Schur vectors, and the diagonal of $R$ contains the eigenvalues of $A$. Working with Schur vectors is numerically more stable than working with eigenvectors, but eigenvectors can be computed from the Schur vectors when necessary.
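To illustrate that last point, here is a minimal sketch (plain linear algebra, not part of the JacobiDavidson.jl API) of how eigenvectors follow from the Schur vectors, assuming the factors Q and R of the partial Schur decomposition are at hand:

# Q has orthonormal Schur vectors as columns and R is upper triangular,
# such that A * Q ≈ Q * R.

# The approximate eigenvalues of A are on the diagonal of R.
λs = diag(R)

# If R * y = λ * y, then A * (Q * y) = Q * R * y = λ * (Q * y), so the
# eigenvectors of the small matrix R lift to eigenvectors of A.
vals, Y = eig(full(R))
X = Q * Y  # columns of X are approximate eigenvectors of A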

For the generalized problem we have the jdqz function, which computes a partial generalized Schur decomposition $AZ = QS$ and $BZ = QT$ for a matrix pencil $(A, B)$. Here $Q$ and $Z$ have orthonormal columns and $S$ and $T$ are upper triangular. The eigenvalues $\lambda_k$ of $Ax = \lambda_k Bx$ can be found by computing $\lambda_k = S_{kk} / T_{kk}$.

At this point no official release has been tagged yet, as there is still some work to be done: hopefully the jdqr and jdqz methods can largely be merged as they are very similar; there are some optimizations for Hermitian problems that aren’t yet implemented; and lastly the methods do not yet support generic vectors and numbers.

For an example of jdqz, see the examples near the end of this post.

IterativeSolvers.jl

My contributions to IterativeSolvers.jl include

ILU.jl

Computing the full LU decomposition of a large, sparse matrix requires too much memory and computational work. By simply dropping small terms during the LU factorization these problems can be remedied, at the expense of being inexact or incomplete (ILU). This ILU factorization can still be an excellent preconditioner to speed up iterative methods.
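To give a rough idea of what dropping during the factorization means, here is a naive dense sketch (the function toy_ilu and the absolute drop tolerance τ are purely illustrative; this is not the Crout algorithm that ILU.jl implements):

# Toy "incomplete" Gaussian elimination: do the usual LU factorization,
# but throw away off-diagonal fill-in smaller than a drop tolerance τ.
function toy_ilu(A::AbstractMatrix, τ = 0.1)
    F = float.(A)
    n = size(F, 1)
    for k = 1 : n - 1
        for i = k + 1 : n
            F[i, k] /= F[k, k]                # multiplier, stored as L[i, k]
            for j = k + 1 : n
                F[i, j] -= F[i, k] * F[k, j]  # standard elimination update
            end
        end
        # Drop small entries created in this step so they cause no further fill-in.
        for j = k + 1 : n, i = k + 1 : n
            if i != j && abs(F[i, j]) < τ
                F[i, j] = 0
            end
        end
    end
    F  # strict lower triangle holds L (unit diagonal), upper triangle holds U
end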

As ILU for the SparseMatrixCSC type was not yet available in Julia, I’ve implemented it based on the article “Crout versions of ILU for general sparse matrices” by Na Li, Yousef Saad and Edmond Chow.

The package is completely ready for use and is well tested. See below for an example.

In a later version the package can be improved by adding different dropping strategies.

Examples

Below you can find a few examples of how to use the packages I’ve been working on.

Jacobi-Davidson

Let’s take a look at a toy example of the generalized eigenvalue problem $Ax = \lambda Bx$ where $A$ and $B$ are diagonal matrices of size $n \times n$ with $A_{kk} = \sqrt{k}$ and $B_{kk} = 1 / \sqrt{k}$. The eigenvalues are just the integers $1, \ldots, n$.

We implement the action of the matrices $A$ and $B$ matrix-free, using LinearMaps.jl; the maps themselves are constructed once the problem size $n$ is fixed below:

using LinearMaps

# Action of A: diagonal matrix with entries sqrt(i).
function myA!(y, x)
  for i = 1 : length(x)
    @inbounds y[i] = sqrt(i) * x[i]
  end
end

# Action of B: diagonal matrix with entries 1 / sqrt(i).
function myB!(y, x)
  for i = 1 : length(x)
    @inbounds y[i] = x[i] / sqrt(i)
  end
end

Next we fix $n = 100$, wrap the functions in mutating LinearMaps, set the number of requested eigenpairs to 5 and target the eigenvalues with largest real part, somewhere near $110.0$. We use a couple of steps of BiCGStab(2) as an internal iterative solver, and let the dimension of the search space vary between 5 and 10 vectors:

using JacobiDavidson

n = 100
A = LinearMap{Complex128}(myA!, n; ismutating = true)
B = LinearMap{Complex128}(myB!, n; ismutating = true)

target = LR(110.0 + 0.0im) # Largest Real part

schur, resnorms = jdqz(A, B, 
    bicgstabl_solver(n, max_mv_products = 10, l = 2),
    target = target,
    pairs = 5,
    ɛ = 1e-9,
    min_dimension = 5,
    max_dimension = 10,
    max_iter = 100,
    verbose = true
)

using Plots  # assuming Plots.jl for the plot call below

println.(schur.alphas ./ schur.betas)
plot(resnorms, yscale = :log10, marker = :+, label = "Residual norm", xlabel = "Iteration")

It indeed finds the eigenvalues from 96 up to 100:

100.00000000000041 + 5.329070518200757e-14im
98.99999999999996 - 1.7674527390216706e-14im
98.00000000000013 - 3.542337114695263e-14im
97.00000000000024 + 1.749508591650104e-14im
96.00000000000013 + 1.3053503572900918e-14im

The plot shows the convergence history, which is the residual norm $\|Ax - \lambda Bx\|$ in each iteration:

(Figure: residual norm per iteration, summary/residualnorm.svg)

Preconditioning

The problem gets harder when we increase $n$ a few orders of magnitude. Especially if we want to target eigenvalues in the interior of the spectrum (near $n / 2$), GMRES and BiCGStab(l) might have trouble solving a very indefinite system.

In that case a preconditioner for $(A - \tau B)$ can be used, where $\tau$ is the target. We will just use the exact inverse: since $(A - \tau B)$ is diagonal with entries $(k - \tau) / \sqrt{k}$, its inverse is the diagonal matrix $P$ with entries $P_{kk} = \sqrt{k} / (k - \tau)$.

It can be implemented matrix-free and in-place:

import Base.LinAlg.A_ldiv_B!

# Exact inverse of (A - τB) for our diagonal matrices.
struct SuperPreconditioner{numT <: Number}
    target::numT
end

# Apply the preconditioner in-place: x ← (A - τB)⁻¹ x.
function A_ldiv_B!(p::SuperPreconditioner, x)
    for i = 1 : length(x)
        @inbounds x[i] *= sqrt(i) / (i - p.target)
    end
end

Now we call Jacobi-Davidson with the Near target and pass the preconditioner. This time we use GMRES, but BiCGStab works as well.

n = 100_000
τ = 50_000.1 + 0im
target = Near(τ)
A = LinearMap{Complex128}(myA!, n; ismutating = true)
B = LinearMap{Complex128}(myB!, n; ismutating = true)
P = SuperPreconditioner(τ)

schur, residuals = jdqz(A, B, 
    gmres_solver(n, iterations = 10),
    preconditioner = P,
    target = target,
    pairs = 5,
    ɛ = 1e-9,
    min_dimension = 5,
    max_dimension = 10,
    max_iter = 200,
    verbose = true
)

It converges to the eigenvalues 49999, 50000, 50001, 50002 and 50004:

50004.00000000014 + 3.5749921718300463e-12im
49999.999999986496 - 7.348301591250897e-12im
50001.00000000359 - 1.9761169705101647e-11im
49998.99999999998 - 1.0866253642291695e-10im
50002.00000000171 - 2.3559720511618024e-11im

It does not yet detect 50003, but that might happen when pairs is increased a bit. What is more interesting is that the total number of iterations is small as a result of the preconditioner:

(Figure: residual norm per iteration with preconditioning, summary/residualnorm_2.svg)

It’s not easy to construct a preconditioner this good for any given problem, but for specific classes of problems people usually know what works well. If no problem-specific preconditioner is available, you can always try a general one such as ILU, as sketched below; the next section illustrates ILU in more detail.
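For instance, if $A$ and $B$ are available as explicit sparse matrices rather than as matrix-free operators, an ILU factorization of the shifted matrix $A - \tau B$ could serve as a general preconditioner. A minimal sketch under that assumption (it also assumes crout_ilu accepts complex sparse matrices and that its factorization can be passed as the preconditioner, since both rely on the same A_ldiv_B! interface; the drop tolerance 0.01 is arbitrary):

using ILU, JacobiDavidson

# Explicit sparse versions of the diagonal matrices from the toy problem.
n = 100_000
A_sp = spdiagm(Complex128[sqrt(k) for k = 1 : n])
B_sp = spdiagm(Complex128[1 / sqrt(k) for k = 1 : n])

τ = 50_000.1 + 0im

# Incomplete LU factorization of the shifted matrix A - τB, used as a
# general-purpose preconditioner.
P = crout_ilu(A_sp - τ * B_sp, τ = 0.01)

schur, residuals = jdqz(A_sp, B_sp,
    gmres_solver(n, iterations = 10),
    preconditioner = P,
    target = Near(τ),
    pairs = 5
)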

ILU example

As an example of how ILU can be used, we generate a non-symmetric, banded matrix with a structure that typically arises in finite difference schemes for three-dimensional problems:

n = 64
N = n^3

# One-dimensional operator with non-symmetric off-diagonals
A = spdiagm((fill(-1.0, n - 1), fill(3.0, n), fill(-2.0, n - 1)), (-1, 0, 1))
Id = speye(n)

# Extend to two and then three dimensions via Kronecker products
A = kron(A, Id) + kron(Id, A)
A = kron(A, Id) + kron(Id, A)

# Right-hand side for the known solution x = ones(N)
x = ones(N)
b = A * x

The matrix $A$ has size $64^3 \times 64^3$. We want to solve the problem $Ax = b$ using for instance BiCGStab(2), but it turns out that convergence can get slow when the size of the problem grows. A quick benchmark shows it takes about 2.0 seconds to solve the problem to a reasonable tolerance:

> using BenchmarkTools, IterativeSolvers
> my_x = @btime bicgstabl($A, $b, 2, max_mv_products = 2000);
2.051 s
> norm(b - A * my_x) / norm(b)
1.6967043606691152e-9

Now let’s construct the incomplete LU decomposition:

> using ILU
> LU = crout_ilu(A, τ = 0.1)
> nnz(LU) / nnz(A)
2.1180353639352374

Using the above drop tolerance $\tau$, we get an approximate LU decomposition using only about twice as many entries as the original matrix, which is reasonable. Let’s see what happens when we benchmark the solver again, now with ILU as a preconditioner:

> my_x = @btime bicgstabl($A, $b, 2, Pl = $LU, max_mv_products = 2000);
692.187 ms
> norm(b - A * my_x) / norm(b)
2.133397068536056e-9

It solves the problem to the same tolerance in about 66% less time. There is of course a caveat, as constructing the preconditioner itself takes time as well:

> LU = @btime crout_ilu($A, τ = 0.1);
611.019 ms

So all in all the problem is solved in about 36% less time. However, if we have multiple right-hand sides for the same matrix, we can construct the preconditioner only once and use it multiple times. Even when the matrix changes slightly you could reuse the ILU factorization. The latter is exactly what happens in Jacobi-Davidson.
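For instance, reusing the factorization for several right-hand sides could look like this (a sketch reusing the calls from above; b1 and b2 are just example right-hand sides):

using IterativeSolvers, ILU

# Pay for the factorization once ...
LU = crout_ilu(A, τ = 0.1)

# ... and reuse it for every right-hand side with the same matrix.
b1 = A * ones(N)
b2 = A * rand(N)

x1 = bicgstabl(A, b1, 2, Pl = LU, max_mv_products = 2000)
x2 = bicgstabl(A, b2, 2, Pl = LU, max_mv_products = 2000)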