Just to clarify, for question 3, the requirement says “the memory requirements per processor should be limited to n/p matrix rows per processor”. But that does not seem possible: don't we also need to store a vector for the multiplication and another vector for the final solution?
Yes, of course you will store the vectors.
The restriction applies to the matrix storage only.
Essentially, the point is that the matrix is distributed among the processors.
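To make the storage point concrete, here is a rough sketch and not the assignment's required solution. It assumes the operation is a plain matrix-vector product y = Ax with a block-row layout, and it simulates the p processors with a loop in ordinary Python instead of using MPI. The names `row_range` and `local_matvec` are made up for illustration. The full matrix `A` is built here only to drive the simulation; on a real machine each processor would hold just its own `local_rows`, plus the vectors.

```python
def row_range(rank, n, p):
    """Half-open range [lo, hi) of matrix rows owned by `rank` (block-row layout)."""
    base, extra = divmod(n, p)
    # The first `extra` ranks take one additional row when p does not divide n.
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def local_matvec(local_rows, x):
    """Multiply one processor's rows by the full input vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in local_rows]


n, p = 6, 3
# Full matrix exists only in this simulation, never on a single real processor.
A = [[i * n + j for j in range(n)] for i in range(n)]
x = [1] * n  # input vector, stored in addition to the matrix rows

y = []
for rank in range(p):
    lo, hi = row_range(rank, n, p)
    local_rows = A[lo:hi]  # the only matrix storage this processor needs
    y.extend(local_matvec(local_rows, x))  # its slice of the result vector
```

Each simulated processor touches at most ceil(n/p) matrix rows, while the vectors are kept separately, which matches the clarification above.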