Talk:Generalized minimal residual method
This article is rated C-class on Wikipedia's content assessment scale.
Why Krylov subspace?
It isn't clear to me why the Krylov subspace makes a good basis for the solution. —Ben FrantzDale 20:43, 19 January 2007 (UTC)
- Are you asking why the Krylov subspace is a good subspace, or why a particular basis for the Krylov subspace is good? -- Jitse Niesen (talk) 06:30, 22 January 2007 (UTC)
- I'm asking why the Krylov subspace is good. When I first heard about GMRES, I immediately thought of the power iteration algorithm for finding eigenvalues. That makes me think the Krylov subspaces will contain the directions that “matter most” for approximating the range of the operator, and since it will approximate eigenvectors it would also contain the directions that “matter most” for the domain of the operator. Is that intuition reasonable? On the other hand, the fact that power iterations converge on the principal eigenvector makes it seem like the Krylov subspaces will basically ignore the low-eigenvalue directions. (Perhaps that's a good thing.) —Ben FrantzDale 13:17, 22 January 2007 (UTC)
- Actually, I don't know that much about iterative methods, but I'll give your interesting question a try. Firstly, I think the Krylov subspace is a fairly natural space to consider; it's easy to construct and uses the problem data. The second argument is empirical: the method works. I think this is not a satisfactory answer, so if you ever find something better please let me know.
- You're right that there is a connection with power iteration. Krylov methods indeed concentrate on the extremal eigenvalues (not just the largest eigenvalue, as in power iteration, but also the smallest ones). Compare with the formula for the speed of convergence when the matrix is positive definite: it depends on the smallest and largest eigenvalues. -- Jitse Niesen (talk) 01:41, 23 January 2007 (UTC)
- Thanks for the comment. I think I found an answer: If I wanted to construct an approximate solution, x, to $Ax = b$, I would want to do it with the eigenvectors of A, starting with the eigenvector with the largest eigenvalue and going on in descending order. It appears that GMRES approximates this in that the Krylov subspace will quickly head off towards those principal eigenvectors and will eventually span the entire space. According to Arnoldi iteration:
- The resulting vectors $q_1, \dots, q_n$ are a basis of the Krylov subspace $\mathcal{K}_n$. We may expect the vectors of this basis to give good approximations of the eigenvectors corresponding to the $n$ largest eigenvalues, for the same reason that $A^{n-1}b$ approximates the dominant eigenvector.
- So that basically answers the question.
- Two things stood out for me as points of confusion. First, the idea of constructing an input to A out of the output of A seemed unnatural. (I'm thinking in terms of mechanics, so the output might be force when the input is position, making it extra counterintuitive.) Second, it seems like the solution will depend on the initial guess. If it is orthogonal to some of the eigenvectors, won't those eigenvectors never be seen in the result? (That is highly unlikely, I guess, but it could happen in theory, right? I suppose it's more likely when the matrix is ill-conditioned.) —Ben FrantzDale 03:45, 23 January 2007 (UTC)
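A small numerical illustration of the power-iteration intuition above (a sketch of my own, not from the article): the normalized Krylov vector $A^{k-1}b$ turns toward the dominant eigenvector, just as in power iteration, so the Krylov subspace picks up the extremal eigendirections first.
A = diag([10 3 1]);          % eigenvalues 10, 3, 1; dominant eigenvector is e1
v = [1; 1; 1] / sqrt(3);     % plays the role of b
for k = 1:5
    v = A * v;  v = v / norm(v);   % next normalized Krylov direction, A^k b
    fprintf('k = %d: angle to dominant eigenvector = %.2e rad\n', k, acos(abs(v(1))));
end
After a handful of steps the angle is tiny, which matches the quoted expectation about the Arnoldi basis.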
- Suppose A is non-singular. Take the characteristic polynomial c(x). By Cayley–Hamilton, c(A) = 0, and c(0) = ±det(A) ≠ 0. So it is
- $0 = c(A) = A^n + c_{n-1}A^{n-1} + \dots + c_1 A + c_0 I$, with $c_0 = \pm\det(A)$.
- From this a formula for the inverse matrix, and thus a formula for the solution, results:
- $x = A^{-1}b = -\tfrac{1}{c_0}\left(A^{n-1}b + c_{n-1}A^{n-2}b + \dots + c_1 b\right)$.
- That is, if there is a solution x to Ax = b, then it is a linear combination of b, Ab, AAb, … This proves the theoretical correctness of the algorithm and also shows why this particular Krylov space is used.
- For the input/output question, one has to assume that some rescaling by an invertible matrix took place, so that all entries have the same unit or, after cancelling it out, are unitless. --LutzL 12:49, 21 September 2007 (UTC)
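The Cayley–Hamilton argument above can be checked directly in MATLAB (a sketch; poly returns the characteristic polynomial coefficients in descending powers):
n = 4;
A = randn(n) + n * eye(n);    % shifted to be safely non-singular
b = randn(n, 1);
c = poly(A);                  % c(1)*x^n + c(2)*x^(n-1) + ... + c(n+1)
x = zeros(n, 1);
for k = 1:n                   % x = -(c(1) A^(n-1) b + ... + c(n) b) / c(n+1)
    x = x + c(k) * (A^(n - k) * b);
end
x = -x / c(n + 1);
disp(norm(A * x - b))         % ~ machine precision
So the exact solution really is a linear combination of b, Ab, …, A^(n-1)b, as claimed.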
- The main reason for using the Krylov subspace is that you already have the residuals for all the Krylov subspace vectors available from the previous iterations. So forming the minimum-residual linear combination is very cheap (especially if you're using a limited history). Basically, it just re-uses the information already calculated previously to the maximum degree, thereby increasing the efficiency of the preconditioner (you can easily see where it went wrong and actually increased the residual norm!). That's all. 141.70.179.56 (talk) —Preceding undated comment added 21:25, 28 March 2011 (UTC).
Pseudocode
I'm confused by the pseudocode. First, c and s are never initialized. Second, there seems to be no convergence criterion. So, is this to find the exact x after all n steps? If so, what good is it? JefeDeJefes 20:49, 17 January 2007 (UTC)
- I don't understand your first point. All entries in c and s are initialized before they are used (in the first iteration of the j-loop, the i-loop is skipped because the sequence 1, …, j − 1 is empty). Your second point is spot on: the algorithm finds the exact x after n steps, and no, this is not very useful. The algorithm needs to be amended and cleaned up. -- Jitse Niesen (talk) 14:50, 30 March 2007 (UTC)
Pseudocode bug ?
In the implementation description, the Givens rotation contains the submatrix ((c, s), (−s, c)); in the pseudocode, the Givens rotation contains the submatrix ((c, s), (s, −c)). Which version is correct? 157.161.41.179 21:36, 23 June 2007 (UTC)
- ((c,s)(s,-c)) is not a rotation, so the version in the pseudocode is probably wrong. Together with the point raised in the previous section, this is enough for me to have my doubts about it (I copied it from the German article at de:GMRES-Verfahren without really checking it). So I removed it from the article and copied it below, until somebody can have a good look at it. -- Jitse Niesen (talk) 10:11, 24 June 2007 (UTC)
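For what it's worth, the determinant makes the point in two lines (my sketch): a Givens rotation must have determinant +1, and ((c, s), (s, −c)) has determinant −1, i.e. it is a reflection.
t = 0.7;  c = cos(t);  s = sin(t);
det([c s; -s c])    % +1: a proper rotation
det([c s; s -c])    % -1: a reflection, not a rotation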
- Moved from the article:
=== Pseudocode ===
Inputs: A, b, x0.
$r_0 \leftarrow b - Ax_0$, $\gamma_1 \leftarrow \|r_0\|$
if $\gamma_1 = 0$ then
- return $x_0$
end if
$q_1 \leftarrow r_0 / \gamma_1$
for j ← 1, …, n do
- $w \leftarrow A q_j$
- for i ← 1, …, j do
- $h_{ij} \leftarrow q_i^T w$, $w \leftarrow w - h_{ij} q_i$
- end for
- $h_{j+1,j} \leftarrow \|w\|$
- for i ← 1, …, j − 1 do (apply the previous Givens rotations to the new column)
- $\begin{pmatrix} h_{ij} \\ h_{i+1,j} \end{pmatrix} \leftarrow \begin{pmatrix} c_i & s_i \\ -s_i & c_i \end{pmatrix} \begin{pmatrix} h_{ij} \\ h_{i+1,j} \end{pmatrix}$
- end for
- $\beta \leftarrow \sqrt{h_{jj}^2 + h_{j+1,j}^2}$, $c_j \leftarrow h_{jj}/\beta$, $s_j \leftarrow h_{j+1,j}/\beta$, $h_{jj} \leftarrow \beta$
- $\gamma_{j+1} \leftarrow -s_j \gamma_j$, $\gamma_j \leftarrow c_j \gamma_j$
- if $h_{j+1,j} \neq 0$ then
- $q_{j+1} \leftarrow w / h_{j+1,j}$
- else
- for i ← j, …, 1 do (back substitution)
- $y_i \leftarrow \bigl(\gamma_i - \textstyle\sum_{k=i+1}^{j} h_{ik} y_k\bigr) / h_{ii}$
- end for
- $x \leftarrow x_0 + \textstyle\sum_{i=1}^{j} y_i q_i$
- return x
- end if
end for
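To make the pseudocode above testable, here is one possible MATLAB transcription (my own sketch, written with the rotation ((c, s), (−s, c)); it is not the original German code, so treat it as one plausible reading rather than the reference version):
function x = gmres_givens(A, b, x0)
% GMRES with Arnoldi orthogonalization and Givens rotations, per the pseudocode.
n = length(b);
r0 = b - A * x0;
gamma = zeros(n + 1, 1);  gamma(1) = norm(r0);
if gamma(1) == 0
    x = x0;  return
end
Q = zeros(n, n + 1);  Q(:, 1) = r0 / gamma(1);
H = zeros(n + 1, n);
c = zeros(n, 1);  s = zeros(n, 1);
for j = 1:n
    w = A * Q(:, j);
    for i = 1:j                              % Arnoldi orthogonalization
        H(i, j) = Q(:, i)' * w;
        w = w - H(i, j) * Q(:, i);
    end
    H(j + 1, j) = norm(w);
    for i = 1:j-1                            % apply the stored rotations to column j
        h = H(i, j);
        H(i, j)     =  c(i) * h + s(i) * H(i + 1, j);
        H(i + 1, j) = -s(i) * h + c(i) * H(i + 1, j);
    end
    beta = hypot(H(j, j), H(j + 1, j));      % new rotation that zeros H(j+1, j)
    c(j) = H(j, j) / beta;  s(j) = H(j + 1, j) / beta;
    H(j, j) = beta;
    gamma(j + 1) = -s(j) * gamma(j);
    gamma(j) = c(j) * gamma(j);
    if H(j + 1, j) <= 1e-14 * norm(A, 1)     % (near-)breakdown: residual is zero
        break
    end
    Q(:, j + 1) = w / H(j + 1, j);
end
y = zeros(j, 1);                             % back substitution in the j-by-j triangle
for i = j:-1:1
    y(i) = (gamma(i) - H(i, i+1:j) * y(i+1:j)) / H(i, i);
end
x = x0 + Q(:, 1:j) * y;
end
On the example from the "Posted code incorrect?" thread below, gmres_givens(tril(ones(5)), ones(5,1), zeros(5,1)) should return [1; 0; 0; 0; 0] up to rounding.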
Wrong definition of Krylov subspaces?
The text states that the n-th Krylov subspace is defined as
$K_n = \operatorname{span}\{b, Ab, A^2 b, \ldots, A^{n-1}b\},$
where $b$ is the right-hand side of the system we want to solve. I think this should be substituted by
$K_n = \operatorname{span}\{r_0, Ar_0, A^2 r_0, \ldots, A^{n-1}r_0\},$
where $r_0 = b - Ax_0$ is the initial residual. Otherwise you could choose $b$ as one eigenvector and the initial iterate $x_0$ as a different one, and your Krylov subspace certainly never contains the solution vector.
Or did I misunderstand something? —Preceding unsigned comment added by 195.176.179.209 (talk) 07:41, 23 December 2010 (UTC)
Response: I think that your objection is correct, so I changed the definition of the Krylov space accordingly. Note that the new definition reduces to the old one when the initial iterate x_0 is taken to be zero. Source: Iterative Methods for Linear and Nonlinear Equations, by C.T. Kelley. 64.134.243.230 (talk) 23:28, 30 July 2011 (UTC) particle25
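A tiny numerical illustration of the objection (my own sketch): take A diagonal, b an eigenvector, and x0 a different eigenvector; the subspace built from b stalls at dimension one, while the one built from r0 fills the space.
A  = diag([2 5]);
b  = [1; 0];          % eigenvector for eigenvalue 2; exact solution is [0.5; 0]
x0 = [0; 1];          % a different eigenvector
r0 = b - A * x0;      % = [1; -5], no longer an eigenvector
rank([b, A*b])        % 1: x0 + span{b, A*b, ...} never reaches the solution
rank([r0, A*r0])      % 2: x0 + K_2(A, r0) is all of R^2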
intentional obfuscation -- somebody trying to show off with his knowledge of obscure math terms?!
I was able to understand the entire page on the Conjugate Gradient Method just fine, but I'm not getting anything out of this page. I had heard of Krylov subspaces before, but I still don't know what one is or why it needs to be mentioned here. Of course a set of (not necessarily linearly independent) vectors spans a space -- but why does this need to be mentioned here?! And this is just one term on this page which is superfluous and potentially of interest only to pure mathematicians (as opposed to applied mathematicians) -- and they won't look at this page anyway. Everywhere on this page new expressions are introduced without explaining what they stand for! I think the first sentence on this page should mention what is being minimized by this method -- I was thinking |Ax-b|^2. — Preceding unsigned comment added by 192.73.228.23 (talk) 22:19, 12 December 2012 (UTC)
- I absolutely agree. I'm just trying to get a little background info on this method so I understand how it's used to fit actual data, but I couldn't decipher all of the jargon in this page. Wikipedia is supposed to be for general public use, not for measuring manhoods and competing to see who can write the page so that the fewest possible people can understand it. --Zoharalson (talk) 20:39, 27 June 2013 (UTC)
- Krylov subspace is the correct technical term for a subspace generated by the repeated application of a matrix to a vector. It is quite usual to express the analysis of gradient and quasi-Newton methods in terms of Krylov subspaces. You will find this term quite often if you are going to explore the inner workings of those methods.--LutzL (talk) 18:04, 28 June 2013 (UTC)
Error in convergence?
The English version states one bound for symmetric positive definite matrices, whereas both the German and Japanese versions state
$\|r_n\| \le \left(\frac{\kappa_2(A)^2 - 1}{\kappa_2(A)^2}\right)^{n/2} \|r_0\|.$
Which one is correct? — Preceding unsigned comment added by 193.54.112.22 (talk) 15:28, 7 June 2013 (UTC)
The German / Japanese versions are correct. To obtain this, simply start from the preceding bound and note that for the symmetric matrix $A$, we have $\lambda_{\min}\bigl(\tfrac{1}{2}(A^T + A)\bigr) = \lambda_{\min}(A)$ and $\lambda_{\max}(A^T A) = \lambda_{\max}(A)^2$. Finally, substituting $\kappa_2(A) = \lambda_{\max}(A) / \lambda_{\min}(A)$ yields the bound in the German / Japanese version. I've made the edit. 18.138.1.194 (talk) 19:41, 3 October 2015 (UTC)
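The corrected bound can also be sanity-checked numerically (a sketch using MATLAB's built-in gmres; the matrix and sizes here are arbitrary):
n = 50;
M = randn(n);  A = M' * M + n * eye(n);            % symmetric positive definite
b = randn(n, 1);
[~, ~, ~, ~, resvec] = gmres(A, b, [], 1e-12, n);  % unrestarted GMRES, x0 = 0
kappa = cond(A);
k = (0:numel(resvec) - 1)';
bound = ((kappa^2 - 1) / kappa^2) .^ (k / 2);
all(resvec / resvec(1) <= bound)                   % prints 1: the bound holds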
History
I am familiar with the history of GMRES, but I have never heard DIIS mentioned. Is there a published literature review that mentions this relation? A quick reading of Pulay's 1980 paper shows he misses the Arnoldi process and the underlying theory for the GMRES polynomial (although he gets tantalizingly close). As it stands, calling GMRES a special case of DIIS is like calling the Wright Flyer a special case of Langley's Aerodrome. Jeffreyhokanson (talk) 16:13, 27 November 2013 (UTC)
iterative process -- is n incremented somewhere?
192.73.228.23 (talk) 00:09, 6 February 2014 (UTC)
Why is this a Krylov subspace method when it is not using a polynomial of A?
192.73.228.23 (talk) 00:15, 6 February 2014 (UTC)
assuming that n indicates the nth step, is the dimension of the subproblem increasing by one at every step?
192.73.228.23 (talk) 00:25, 6 February 2014 (UTC)
Hn is a time machine...
$A Q_n = Q_{n+1} \tilde{H}_n$
but $Q_{n+1}$ depends on $x_{n+1}$, which we are trying to calculate right now... — Preceding unsigned comment added by 192.73.228.23 (talk) 01:05, 6 February 2014 (UTC)
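For what it's worth, a minimal Arnoldi sketch (mine, not from the article) showing that $Q_{n+1}$ is built from $A$ and the starting vector alone, with no iterate $x_{n+1}$ involved, so the relation is not circular:
N = 6;  n = 3;
A = randn(N);
Q = zeros(N, n + 1);  H = zeros(n + 1, n);
q = randn(N, 1);  Q(:, 1) = q / norm(q);
for j = 1:n                          % plain Arnoldi: only A and Q(:, 1) are used
    w = A * Q(:, j);
    for i = 1:j
        H(i, j) = Q(:, i)' * w;
        w = w - H(i, j) * Q(:, i);
    end
    H(j + 1, j) = norm(w);
    Q(:, j + 1) = w / H(j + 1, j);
end
disp(norm(A * Q(:, 1:n) - Q * H))    % ~ 1e-15, confirming A*Q_n = Q_{n+1}*H~_n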
Posted code incorrect?
I have downloaded the given code and renamed the function/file to mygmres.m to avoid confusion with MATLAB's built-in GMRES. If I then run the script
A = tril(ones(5));
b = ones(5,1);
[x, e] = mygmres(A, b, 0*b, 5, 1e-15);
norm(A*x - b)
log10(abs(e))
I get convergence in the residual, but the returned x does not solve the linear system - the value of norm(A*x - b) is 5.4772, not zero(ish). Can anybody confirm this and maybe identify the problem in the code? — Preceding unsigned comment added by 129.11.240.89 (talk) 14:05, 29 June 2018 (UTC)
- I do get the right answer; is the problem you posted the one you also solved? Obviously tril(ones(5)) is invertible and x should be [1, 0, 0, 0, 0], but if you make x and b to test and happen to use a matrix which is not invertible, the x which is found will not be the same as the one used. There is another implementation I have found online: http://www.netlib.org/templates/matlab/gmres.m (also note that rotmat in this implementation is given here: http://www.netlib.org/templates/matlab/rotmat.m, not the built-in one). As far as I can see they give the same results.
The "Method" paragraph introduces the expression $y_n$ without explaining what it stands for
I hope my explanation in the Generalized_minimal_residual_method#The_method section (paragraph starting with "Therefore,...") clears up the role of $y_n$.
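For readers landing here, the relation in question, paraphrased from that section of the article:
$x_n = x_0 + Q_n y_n, \qquad y_n = \operatorname{arg\,min}_y \bigl\| \beta e_1 - \tilde{H}_n y \bigr\|, \qquad \beta = \|r_0\|,$
so $y_n$ is simply the coordinate vector of the GMRES update in the Arnoldi basis $Q_n$.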