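HPL is a software package that solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory computers. It can be regarded as a portable, freely available implementation of the High Performance Computing Linpack Benchmark.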
The HPL algorithm is characterized by a two-dimensional block-cyclic data distribution, a right-looking variant of the LU factorization with row partial pivoting, and a recursive panel factorization that combines pivot search and column broadcast. It also uses various virtual panel broadcast topologies and a bandwidth-reducing swap-broadcast algorithm. Backward substitution is implemented with a look-ahead of depth 1.
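Two of the ideas named above can be illustrated with a minimal serial sketch in Python. It is not HPL's implementation: HPL works on distributed blocked panels with recursion, broadcasts, and look-ahead, none of which appear here. The function names `block_cyclic_owner` and `lu_right_looking`, the zero-based indexing, and the parameter names (`nb` for block size, `p` and `q` for the process grid dimensions) are assumptions made for illustration.

```python
def block_cyclic_owner(i, j, nb, p, q):
    """Return the (row, column) coordinates in a p x q process grid that
    own global matrix entry (i, j) when blocks of size nb x nb are dealt
    out cyclically in both dimensions (zero-based indices, hypothetical
    helper)."""
    return (i // nb) % p, (j // nb) % q


def lu_right_looking(a):
    """Factor the square matrix `a` (a list of row lists) in place.

    Unblocked, serial illustration of a right-looking LU factorization
    with row partial pivoting. Afterwards the strict lower triangle
    holds the multipliers (L) and the upper triangle holds U. Returns
    the pivot row index chosen at each step.
    """
    n = len(a)
    pivots = []
    for k in range(n):
        # Row partial pivoting: choose the largest magnitude in column k.
        r = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[r][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        a[k], a[r] = a[r], a[k]
        pivots.append(r)
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]          # multiplier, stored where L lives
            for j in range(k + 1, n):   # right-looking: update trailing part now
                a[i][j] -= a[i][k] * a[k][j]
    return pivots
```

"Right-looking" here means that each elimination step immediately updates the whole trailing submatrix to its right, which is the part of the computation that a distributed implementation spreads over the process grid.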
HPL provides a testing and timing program that quantifies the accuracy of the computed solution and the time taken to compute it. The best performance achievable by the software on a given system depends on a large variety of factors. Nonetheless, under some restrictive assumptions about the interconnection network, the algorithm and its attached implementation are scalable in the sense that their parallel efficiency stays constant with respect to per-processor memory usage.
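The kind of accuracy and timing figures such a program reports can be sketched as follows. This is a small serial illustration, not HPL's code. It assumes the scaled residual takes the form ||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N) and that the operation count is 2/3 N^3 + 3/2 N^2; the function names are hypothetical, and `eps` is taken from Python's `sys.float_info.epsilon`, which may differ from the machine precision value HPL itself uses.

```python
import sys


def inf_norm_vec(v):
    """Infinity norm of a vector: largest absolute entry."""
    return max(abs(x) for x in v)


def inf_norm_mat(a):
    """Infinity norm of a matrix: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def scaled_residual(a, x, b):
    """Scaled residual of a computed solution x to a x = b.

    Assumed form (see lead-in):
    ||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N)
    """
    n = len(b)
    r = [sum(a[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    eps = sys.float_info.epsilon  # assumption: Python's double epsilon
    scale = eps * (inf_norm_mat(a) * inf_norm_vec(x) + inf_norm_vec(b)) * n
    return inf_norm_vec(r) / scale


def gflops(n, seconds):
    """Rate in Gflop/s, assuming 2/3 n^3 + 3/2 n^2 floating-point operations."""
    return (2.0 / 3.0 * n ** 3 + 1.5 * n ** 2) / seconds / 1e9
```

A small scaled residual indicates that the computed solution is accurate relative to the size of the problem data; the pass/fail threshold is a configuration choice and is deliberately left out of this sketch.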
To use the HPL software package, an implementation of the Message Passing Interface (MPI), version 1.1 compliant, is required, along with an implementation of either the Basic Linear Algebra Subprograms (BLAS) or the Vector Signal Image Processing Library (VSIPL). Both machine-specific and generic implementations of MPI, the BLAS, and VSIPL are available for a wide variety of systems.
This software package was supported in part by a grant from the Department of Energy's Lawrence Livermore National Laboratory and Los Alamos National Laboratory as part of the ASCI Projects, contract numbers B503962 and 12187-001-00 4R.
Version 2.0: N/A