What are some techniques for validating the result obtained from a large-scale parallel computing network? Or can we only check theoretically that the code is correct?
This is a good question. To validate a result obtained on a large parallel system, for a problem that cannot be run on a single-processor system,
we often construct an instance of the problem for which we already know the solution.
This will test some aspects of the parallel code, but not everything.
As a simple example, take a matrix-vector product: if the matrix is a matrix of ones and the vector is [1, 2, …, n]^T,
then we know sum_k (k) = n(n+1)/2, and every component of matrix*vector should equal this value.
You can also go the other way: for example, pick a matrix whose rows are all [1, 2, …, n] and a vector of ones.
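The two closed-form checks above can be sketched as follows. This is only a minimal illustration: `matvec` is a hypothetical serial stand-in for whatever parallel routine is actually under test, and `n` is kept small so the sketch runs quickly.

```python
def matvec(A, x):
    """Hypothetical serial stand-in for the parallel matrix-vector routine under test."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

n = 200
expected = n * (n + 1) // 2  # sum_k (k) = n(n+1)/2

# Case 1: matrix of ones times [1, 2, ..., n]^T.
ramp = list(range(1, n + 1))
ones_matrix = [[1] * n for _ in range(n)]
y1 = matvec(ones_matrix, ramp)

# Case 2: every row is [1, 2, ..., n], times a vector of ones.
ramp_matrix = [ramp[:] for _ in range(n)]
y2 = matvec(ramp_matrix, [1] * n)

# Every component of both results should equal the closed-form sum.
ok = all(v == expected for v in y1) and all(v == expected for v in y2)
print("closed-form check passed:", ok)
```

With integer data the comparison can be exact; with floating-point data a tolerance would be needed, since a parallel reduction may add terms in a different order.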
Of course, you want smarter scenarios, with more variability in the matrix and/or vector.
You can employ formulae such as sum_k (k^2) = n(n+1)(2n+1)/6 to get a bit more variability, or apply different scaling factors to rows or columns.
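One way to combine the sum-of-squares formula with row scaling is sketched below. The choice of scaling row i by the factor i is an assumption made for illustration, and `matvec` is again a hypothetical serial stand-in for the parallel routine. Because every output component now has a different known value, this variant can also catch results that land in the wrong position.

```python
def matvec(A, x):
    """Hypothetical serial stand-in for the parallel matrix-vector routine under test."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

n = 100
ramp = list(range(1, n + 1))
sum_sq = n * (n + 1) * (2 * n + 1) // 6  # sum_k (k^2)

# Row i is [1, 2, ..., n] scaled by i (illustrative choice of scaling factor).
A = [[i * k for k in ramp] for i in ramp]
y = matvec(A, ramp)

# Component i should be i * sum_k (k^2).
expected = [i * sum_sq for i in ramp]
mismatches = [idx for idx, (got, want) in enumerate(zip(y, expected)) if got != want]
print("mismatching components:", mismatches)
```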
Other techniques include matrices with known eigenvectors and eigenvalues: if x is an eigenvector of A with eigenvalue λ, then Ax = λx, so the computed product can be compared against λx.
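As a sketch of the eigenvector idea, one matrix with eigenpairs known in closed form is the tridiagonal matrix with 2 on the diagonal and -1 on the off-diagonals: its eigenvalues are 2 - 2cos(jπ/(n+1)) and its eigenvectors have components sin(jkπ/(n+1)), for j, k = 1, …, n. The choice of this matrix, the tolerance, and the `matvec` stand-in are all assumptions made for illustration, not part of the original answer.

```python
import math

def matvec(A, x):
    """Hypothetical serial stand-in for the parallel matrix-vector routine under test."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def entry(i, j):
    """Tridiagonal test matrix: 2 on the diagonal, -1 next to it, 0 elsewhere."""
    if i == j:
        return 2.0
    if abs(i - j) == 1:
        return -1.0
    return 0.0

n = 50
A = [[entry(i, j) for j in range(n)] for i in range(n)]

def eigenpair(j):
    """Closed-form eigenvalue and eigenvector number j (1-based) of A."""
    theta = j * math.pi / (n + 1)
    lam = 2.0 - 2.0 * math.cos(theta)
    x = [math.sin(k * theta) for k in range(1, n + 1)]
    return lam, x

# Check that A x matches lambda x for every known eigenpair.
max_residual = 0.0
for j in range(1, n + 1):
    lam, x = eigenpair(j)
    y = matvec(A, x)
    residual = max(abs(yi - lam * xi) for yi, xi in zip(y, x))
    max_residual = max(max_residual, residual)

print("largest |Ax - lambda x| component:", max_residual)
print("eigenpair check passed:", max_residual < 1e-12)
```

Since the data are floating-point, the comparison uses a tolerance rather than exact equality.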
While there isn’t some general technique that can cover all cases for all types of problems tested, we try to check some “known” cases for the particular problem in question. Once “enough” such tests have passed for this problem, we apply the code to other cases for which the result is not known.