Interior eigenvalues in Jacobi-Davidson

From the Arnoldi method we know that Ritz values tend to converge to exterior eigenvalues; similar behaviour is observed in Jacobi-Davidson. This is an issue when restarting the method: a Ritz value close to a specified interior target is not necessarily a good approximation of an eigenvalue in that neighbourhood; the Ritz value might as well be on its way to an exterior eigenvalue, and in that case we would not want the Ritz vector to be retained at restart. An alternative is to use harmonic Ritz pairs.

June 6, 2017 - 14 minute read -
gsoc julia

Ritz pairs turn out to be good approximate eigenpairs when one is interested in exterior eigenpairs. If we are interested in interior eigenpairs (somewhere inside the convex hull of the spectrum), we must ensure that we do not restart Jacobi-Davidson with a set of Ritz vectors that correspond to Ritz values converging to exterior eigenvalues. The simple criterion of Ritz values being close to our target might give us a low-quality search space at restart. Maybe we could make a better judgement about which Ritz vectors to keep if we inspected their residuals, but that strategy would be far too expensive.

Shift-and-invert is a well-known trick to solve the above problem: rather than solving the eigenvalue problem $Ax = \lambda x$ for $\lambda$ close to $\tau$, we reformulate the problem to solving $(A - \tau I)^{-1}x = \theta^{-1} x$ with $\theta = \lambda - \tau$. The eigenvectors remain unchanged, but the eigenvalues close to our target are transformed to exterior eigenvalues. The message is that we do not change our methods; rather, we change the problem itself.

However, changing the problem like this can be very costly: iterating with a matrix $(A - \tau I)^{-1}$ requires solving a linear system at each iteration. In fact, the Arnoldi method requires accurate solves, because otherwise the underlying Krylov subspace would be inexact.
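As a quick illustration of the spectral transformation (with a hypothetical diagonal matrix, so the spectrum is known in advance), eigenvalues of $A$ near the target become the largest-magnitude eigenvalues of $(A - \tau I)^{-1}$:

```julia
using LinearAlgebra

# Hypothetical example: a diagonal matrix, so the spectrum is known.
A = Diagonal([1.0, 2.0, 3.0, 10.0])
τ = 2.1                                # target near the interior eigenvalue 2.0

# The eigenvalues of (A - τI)⁻¹ are 1/(λ - τ): eigenvalues λ near the
# target τ are mapped to exterior (largest magnitude) eigenvalues.
θinv = eigvals(inv(Matrix(A - τ * I)))

# Transforming the dominant eigenvalue back recovers λ = 2.0.
λ = τ + 1 / θinv[argmax(abs.(θinv))]
```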

Harmonic Ritz values

The main question is whether we can get higher quality approximations to eigenvalues (without doing much work), so that we can restart with a search space that does not contain partly converged exterior eigenvectors.

One way to go is to look at the Ritz values of $(A - \tau I)^{-1}$ with respect to a specially chosen subspace $(A - \tau I)V$, so that the $(A - \tau I)^{-1}$ term drops out:

$$(A - \tau I)^{-1}x - \theta^{-1} x \perp (A - \tau I)V \quad \text{for } x = (A - \tau I)Vy$$

is equivalent to the requirement

$$(A - \tau I)z - \theta z \perp (A - \tau I)V \quad \text{for } z = Vy.$$

The latter is a Petrov-Galerkin projection: the search space is $V$ and the test space is $(A - \tau I)V$. We refer to a pair $(\theta, Vy)$ as a harmonic Ritz pair.

The important thing is that $\theta$ is identical in both expressions. The second shows a practical means to compute it, while the first shows its interpretation as a Ritz value of $(A - \tau I)^{-1}$.

Now the line of thought is as follows: if the Ritz values $\theta^{-1}$ converge from within the convex hull of the spectrum of $(A - \tau I)^{-1}$ towards its exterior eigenvalues, then conversely a reciprocal $\theta$ becomes small only when it approximates an eigenvalue of $(A - \tau I)$ well. This in turn means that $\theta + \tau$ can only be close to an eigenvalue of $A$ near $\tau$ if it is nearly converged. And that is what we wanted to ensure at restart.

All in all we work with the practical Petrov-Galerkin projection. Let’s write $\tilde{W} := (A - \tau I)V$ so that the last equation can be rewritten as the generalized eigenvalue problem

$$\tilde{W}^*\tilde{W}y = \theta \tilde{W}^*Vy.$$
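A small sketch of this extraction. The matrix, target and search space below are random stand-ins (a symmetric matrix is used only so that the harmonic Ritz values come out real); in the method itself $V$ is of course the Jacobi-Davidson search space:

```julia
using LinearAlgebra

# Stand-in data: symmetric A so the harmonic Ritz values are real.
n, m = 100, 10
B = rand(n, n); A = B + B'
τ = 0.5
V = Matrix(qr(rand(n, m)).Q)    # orthonormal search space, assumed given
Wt = A * V - τ * V              # Wt plays the role of W̃ = (A - τI)V

# Harmonic Ritz values θ and primitive vectors y from the projected
# generalized eigenproblem  W̃*W̃ y = θ W̃*V y.
F = eigen(Wt' * Wt, Wt' * V)
θ, Y = F.values, F.vectors

# Approximate eigenpairs of A: values τ .+ θ, vectors V * Y.
```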

For stability reasons we like to work with orthonormal matrices. The Gram-Schmidt procedure allows us to perform the QR decomposition of $\tilde{W}$ into $\tilde{W} = WM_A$ with $W$ orthonormal and $M_A$ upper triangular. If we also define $M := W^*V$ similarly to what we had before, then the generalized eigenvalue problem becomes

$$M_Ay = \theta My$$

assuming that $M_A$ is invertible (which is the case as long as the columns of $\tilde{W}$ are independent).

So we can just obtain harmonic Ritz pairs by computing eigenpairs $(\theta, y)$ from the small generalized eigenvalue problem, and transform them back to approximate eigenpairs by setting $\lambda = \tau + \theta$ and $z = Vy$.

In the implementation we will explicitly build these two small matrices $M$ and $M_A$.
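For instance (again with random stand-in data, and with the QR decomposition done in one shot, whereas the implementation builds $V$, $W$, $M$ and $M_A$ up column by column as the search space grows):

```julia
using LinearAlgebra

# Stand-in data; symmetric A so the harmonic Ritz values are real.
n, m = 100, 10
B = rand(n, n); A = B + B'
τ = 0.5
V = Matrix(qr(rand(n, m)).Q)

F = qr(A * V - τ * V)           # thin QR: W̃ = W * M_A
W = Matrix(F.Q)                 # orthonormal test space basis
M_A = F.R                       # upper triangular
M = W' * V

# Harmonic Ritz values from the small pencil M_A y = θ M y.
θ = eigen(M_A, M).values
```

These values agree with the ones obtained from the larger projected problem $\tilde{W}^*\tilde{W}y = \theta \tilde{W}^*Vy$ whenever $M_A$ is invertible.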

Schur vectors vs eigenvectors

We have used a Schur decomposition in favour of an eigendecomposition of the low-dimensional problem to make it easier to shrink the search subspace. Let’s see how this works with harmonic Ritz values.

We compute the generalized Schur (QZ) decomposition $M_AS^R = S^LT^A$ and $MS^R = S^LT$ for unitary matrices $S^R, S^L$ and upper triangular matrices $T^A, T$. The eigenvalues $\theta$ are now the ratios $T_{ii}^A / T_{ii}$ and can be reordered; in Julia this is done with the familiar `ordschur!` function. We must order them from small to large in absolute magnitude. It is not hard to verify that the first column $s_1^R$ of $S^R$ is an eigenvector with $T_{11}^A / T_{11}$ as its eigenvalue.

Now if the $m_{\min}$ harmonic Ritz values we want to keep are moved to the front, we can easily restart by updating both our search space as $V \leftarrow VS^R[:, 1 : m_{\min}]$ and our test space as $W \leftarrow WS^L[:, 1 : m_{\min}]$.
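A sketch of such a restart, using `schur` and `ordschur!` on the small pencil $(M_A, M)$; the data and the value of $m_{\min}$ are stand-ins. Note that in Julia's convention $M_A = QSZ^*$ and $M = QTZ^*$, so $Z$ plays the role of $S^R$ and $Q$ of $S^L$:

```julia
using LinearAlgebra

# Stand-in data and a stand-in m_min; V, W, M and M_A as before.
n, m, m_min = 100, 10, 4
B = rand(n, n); A = B + B'
τ = 0.5
V = Matrix(qr(rand(n, m)).Q)
F = qr(A * V - τ * V)
W, M_A = Matrix(F.Q), F.R
M = W' * V

G = schur(M_A, M)               # QZ: M_A = Q*S*Z', M = Q*T*Z'
θ = G.alpha ./ G.beta           # harmonic Ritz values
keep = abs.(θ) .<= sort(abs.(θ))[m_min]   # m_min smallest in magnitude
ordschur!(G, keep)              # reorder them to the front

V = V * G.Z[:, 1:m_min]         # shrink the search space
W = W * G.Q[:, 1:m_min]         # shrink the test space
```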

Rayleigh quotient vs harmonic Ritz value

Harmonic Ritz values are an excellent choice to determine which Ritz vectors to use at restart. They are, however, suboptimal for expansion. A few posts ago we wanted the residual $r$ to be perpendicular to the approximate eigenvector $u$, so that when solving the correction equation $(I - uu^*)(A - \phi I)t = -r$ with a Krylov method, $t$ would automatically be perpendicular to $u$.

With harmonic Ritz values this might not be the case. Therefore we can choose to use the harmonic Ritz values $\theta$ when it comes to shrinking the search space, yet use the Rayleigh quotients $\phi$ when it comes to expanding it.

We obtain it by setting $(u, r) = 0$; in our case $u = Vs_1^R$ and $r = (A - \tau I)u - \phi u$. After some manipulation this gives $\phi = \bar{T}_{11} T_{11}^A$, where the bar denotes complex conjugation.
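We can check this formula numerically (stand-in data once more; a symmetric matrix keeps the real QZ form triangular, so $T_{11}^A$ and $T_{11}$ are plain scalars, with $G.S$ playing the role of $T^A$ and $G.T$ of $T$):

```julia
using LinearAlgebra

# Stand-in data; symmetric A so all harmonic Ritz values are real
# and the real QZ form has no 2x2 blocks.
n, m = 100, 10
B = rand(n, n); A = B + B'
τ = 0.5
V = Matrix(qr(rand(n, m)).Q)
F = qr(A * V - τ * V)
W, M_A = Matrix(F.Q), F.R
M = W' * V

G = schur(M_A, M)               # QZ: M_A = Q*S*Z', M = Q*T*Z'
u = V * G.Z[:, 1]               # u = V s₁ᴿ, a unit vector

# Rayleigh quotient read off from the QZ factors: ϕ = conj(T₁₁) * Tᴬ₁₁.
ϕ = conj(G.T[1, 1]) * G.S[1, 1]
```

This matches the Rayleigh quotient $u^*(A - \tau I)u$ computed directly, since $u$ has unit norm.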