================
== LAPACK 3.1 ==
================

Release date: Su 11/12/2006.

  * LAPACK 3.1: What's new
  * Contributor list
  * Thanks
  * Principal investigators
  * Developer list
  * LAPACK subroutine interface policy
  * More details

============================
== LAPACK 3.1: What's new ==
============================

1) Hessenberg QR algorithm with the small bulge multi-shift QR algorithm
   together with aggressive early deflation. This is an implementation of
   the 2003 SIAM SIAG LA Prize winning algorithm of Braman, Byers and
   Mathias, which significantly speeds up the nonsymmetric eigenproblem.

2) Improvements of the Hessenberg reduction subroutines. These accelerate
   the first phase of the nonsymmetric eigenvalue problem.

3) New MRRR eigenvalue algorithms that also support subset computations.
   These implementations of the 2006 SIAM SIAG LA Prize winning algorithm
   of Dhillon and Parlett are also significantly more accurate than the
   version in LAPACK 3.0.

4) Mixed precision iterative refinement subroutines for exploiting fast
   single precision hardware. On platforms like the Cell processor, which
   do single precision much faster than double, linear systems can be
   solved many times faster. Even on commodity processors there is a
   factor of 2 in speed between single and double precision. These are
   prototype routines in the sense that their interfaces might change
   based on user feedback.

5) New partial column norm updating strategy for QR factorization with
   column pivoting. This fixes a subtle numerical bug dating back to
   LINPACK that can give completely wrong results.

6) Thread safety: removed all the SAVE and DATA statements (or provided
   alternate routines without those statements), increasing reliability
   on SMPs.

7) Additional support for matrices with NaN/subnormal elements, and
   optimization of the balancing subroutines, improving reliability.

8) Several bug fixes.

9) The timing routines (TIMING) have been removed from the LAPACK package.
   They need to be updated with respect to the new algorithms in LAPACK.
   We are also renovating the way FLOPS are counted.

==================
== Contributors ==
==================

1) Hessenberg QR algorithm with the small bulge multi-shift QR algorithm
   together with aggressive early deflation.
   Karen Braman and Ralph Byers, Dept. of Mathematics, University of
   Kansas, USA

2) Improvements of the Hessenberg reduction subroutines.
   Daniel Kressner, Dept. of Mathematics, University of Zagreb, Croatia

3) New MRRR eigenvalue algorithms that also support subset computations.
   Christof Voemel, Lawrence Berkeley National Laboratory, USA

4) Mixed precision iterative refinement subroutines for exploiting fast
   single precision hardware.
   Julie Langou, UTK, Julien Langou, CU Denver, Jack Dongarra, UTK

5) New partial column norm updating strategy for QR factorization with
   pivoting.
   Zlatko Drmac and Zvonimir Bujanovic, Dept. of Mathematics, University
   of Zagreb, Croatia

6) Thread safety: removed all the SAVE and DATA statements (or provided
   alternate routines without those statements).
   Sven Hammarling, NAG Ltd., UK

7) Additional support for matrices with NaN/subnormal elements, and
   optimization of the balancing subroutines.
   Bobby Cheng, MathWorks, USA

======================================
== Thanks for bug-report/patches to ==
======================================

  * Eduardo Anglada (Universidad Autonoma de Madrid, Spain)
  * David Barnes (University of Kent, England)
  * Alberto Garcia (Universidad del Pais Vasco, Spain)
  * Tim Hopkins (University of Kent, England)
  * Javier Junquera (CITIMAC, Universidad de Cantabria, Spain)
  * MathWorks: Penny Anderson, Bobby Cheng, Pat Quillen, Cleve Moler,
    Duncan Po, Bin Shi, Greg Wolodkin (MathWorks, USA)
  * George McBane (Grand Valley State University, USA)
  * Matyas Sustik (University of Texas at Austin, USA)
  * Michael Wimmer (Universität Regensburg, Germany)
  * Simon Wood (University of Bath, UK) and, more generally, all the R
    developers

=============================
== Principal Investigators ==
=============================

Jim Demmel (University of California at Berkeley, USA)
Jack Dongarra (University of Tennessee and ORNL, USA)

================================================
== LAPACK developers involved in this release ==
================================================

Ralph Byers (University of Kansas, USA)
Zlatko Drmac (University of Zagreb, Croatia)
Remi Delmas (University of Tennessee, USA)
Sven Hammarling (NAG Ltd., UK)
Yozo Hida (University of California at Berkeley, USA)
Daniel Kressner (University of Zagreb, Croatia)
Julie Langou (University of Tennessee, USA)
Julien Langou (CU Denver, USA)
Ren-Cang Li (University of Texas at Arlington, USA)
Xiaoye Li (Lawrence Berkeley Laboratory, USA)
Osni Marques (Lawrence Berkeley Laboratory, USA)
E. Jason Riedy (University of California at Berkeley, USA)
Edward Smyth (NAG Ltd., UK)
Christof Voemel (Lawrence Berkeley Laboratory, USA)

========================================
== LAPACK subroutine interface policy ==
========================================

The interfaces to primary computational routines are fixed and will not
be changed by minor LAPACK versions (e.g. 3.x). Primary routines are those
prefixed by a precision and a matrix type, like SGERFS, CUNMQR, ZHEGV,
etc., and these interfaces will remain the same for all LAPACK 3.x
versions.

Most routines labelled as auxiliary are implementation details for
specific algorithms and may have their interfaces changed by minor
versions. Auxiliary routines are those prefixed by a precision and LA,
like SLAQR3. Some auxiliary routines are of general use and are subject
to the same fixed-interface policy as the primary computational routines.
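The naming rule above can be illustrated with a small Python sketch. The functions `classify_routine` and `matrix_type` are hypothetical helpers, not part of LAPACK. They implement only the prefix heuristic described here, so they cannot tell which auxiliary routines have fixed interfaces. `matrix_type` also misreads names that break the pattern, such as the mixed precision driver DSGESV.

```python
# Hypothetical helpers, not part of LAPACK: a rough classification of a
# routine name following the naming rule of the interface policy.
PRECISION_LETTERS = "SDCZ"


def classify_routine(name: str) -> str:
    """Return 'primary', 'auxiliary' or 'other' for a LAPACK routine name.

    Heuristic only: a precision letter followed by 'LA' marks an auxiliary
    routine (e.g. SLAQR3); a precision letter followed by anything else is
    taken to be precision + matrix type + operation (e.g. SGERFS).
    Names without a precision prefix (e.g. ILAENV) are reported as 'other'.
    """
    name = name.strip().upper()
    if len(name) < 3 or name[0] not in PRECISION_LETTERS:
        return "other"
    if name[1:3] == "LA":
        return "auxiliary"
    return "primary"


def matrix_type(name: str) -> str:
    """Two-letter matrix type of a primary routine, e.g. 'GE' in SGERFS."""
    if classify_routine(name) != "primary":
        raise ValueError(f"{name} is not a primary routine name")
    return name.strip().upper()[1:3]


for routine in ("SGERFS", "CUNMQR", "ZHEGV", "SLAQR3", "ILAENV"):
    print(routine, classify_routine(routine))
```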
The current list of fixed-interface auxiliary routines is as follows,
with x standing for a precision and yy standing for a matrix type:

  * ILAENV         : Environmental parameters
  * SLAMCH, DLAMCH : Machine parameters
  * xLACON         : Norm estimation, not thread-safe
  * xLACN2         : Norm estimation, thread-safe
  * xLANyy         : Simple norm calculations (note: xLANEG is not
                     related to these and may change)
  * xLASWP         : Row interchanges
  * xLARF, xLARZ   : Applying a single elementary reflector

This list may grow by user request.

The interface changes in this LAPACK 3.1 version are:

  SLAR1V SLARRB SLARRE SLARRF SLARRV
  DLAR1V DLARRB DLARRE DLARRF DLARRV

==================
== More details ==
==================

----------------------------------------------------------------------------
1) Hessenberg QR algorithm with the small bulge multi-shift QR algorithm
   together with aggressive early deflation.
----------------------------------------------------------------------------

Contributors:
=============
Karen Braman and Ralph Byers, Dept. of Mathematics, University of Kansas,
USA, July 2006.

Comments:
=========
This is an implementation of the 2003 SIAM SIAG LA Prize winning algorithm
of Braman, Byers and Mathias, which significantly speeds up the
nonsymmetric eigenproblem.

Changes:
========
M SRC/{c,d,s,z}gees.f
M SRC/{c,d,s,z}geev.f
M SRC/{c,d,s,z}geesx.f
M SRC/{c,d,s,z}geevx.f
A SRC/{c,d,s,z}laqr0.f
A SRC/{c,d,s,z}laqr1.f
A SRC/{c,d,s,z}laqr2.f
A SRC/{c,d,s,z}laqr3.f
A SRC/{c,d,s,z}laqr4.f
A SRC/{c,d,s,z}laqr5.f
M SRC/{c,d,s,z}hseqr.f
M SRC/{c,d,s,z}lahqr.f
M SRC/ilaenv.f
A SRC/iparmq.f

References:
===========
[1] K. Braman, R. Byers and R. Mathias, The Multi-Shift QR Algorithm
    Part I: Maintaining Well Focused Shifts, and Level 3 Performance,
    SIAM J. Matrix Anal. Appl., 23:929-947, 2002.
[2] K. Braman, R. Byers and R. Mathias, The Multi-Shift QR Algorithm
    Part II: Aggressive Early Deflation, SIAM J. Matrix Anal. Appl.,
    23:948-973, 2002.

----------------------------------------------------------------------------
2) Improvements of the Hessenberg reduction subroutines.
----------------------------------------------------------------------------

Contributor:
============
Daniel Kressner, Dept. of Mathematics, University of Zagreb, Croatia,
June 2006.

Comments:
=========
These accelerate the first phase of the nonsymmetric eigenvalue problem.

Changes:
========
M SRC/{c,d,s,z}gehrd.f
A SRC/{c,d,s,z}lahr2.f
D SRC/{c,d,s,z}lahrd.f

Reference:
==========
[1] Gregorio Quintana-Orti and Robert van de Geijn, Improving the
    Performance of Reduction to Hessenberg Form, ACM Transactions on
    Mathematical Software, 32(2):180-194, June 2006.

----------------------------------------------------------------------------
3) New MRRR eigenvalue algorithms that also support subset computations.
----------------------------------------------------------------------------

Contributors:
=============
Inderjit Dhillon, University of Texas at Austin, USA
Beresford Parlett, University of California at Berkeley, USA
Christof Voemel, Lawrence Berkeley Laboratory, USA
July 2006.

Changes:
========
New MRRR eigenvalue algorithms that also support subset computations.
These implementations of the 2006 SIAM SIAG LA Prize winning algorithm of
Dhillon and Parlett are also significantly more accurate than the version
in LAPACK 3.0.

References:
===========
[1] Inderjit S. Dhillon and Beresford N. Parlett, Orthogonal Eigenvectors
    and Relative Gaps, SIAM J. Matrix Anal. Appl., 25(3):858-899, 2004.
[2] Inderjit S. Dhillon, Beresford N. Parlett, and Christof Voemel, The
    Design and Implementation of the MRRR Algorithm, LAPACK Working Note
    162, December 2004.
[3] Osni A. Marques, Beresford N. Parlett, and Christof Voemel, Subset
    Computations with the MRRR Algorithm, LAPACK Working Note 167,
    August 2005.
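Selecting eigenvalues by index, as in a subset computation, rests on a classical tool: the Sturm count. This is the number of negative pivots of T - x*I, which equals the number of eigenvalues of T below x. The Python sketch below uses that count with plain bisection to compute eigenvalues il..iu of a symmetric tridiagonal matrix. It is not the MRRR algorithm, which relies on relatively robust representations and relative gaps for its accuracy. The function names and the `pivmin` guard value are illustrative assumptions.

```python
import math


def sturm_count(d, e, x, pivmin=1e-300):
    """Number of eigenvalues of the symmetric tridiagonal T (diagonal d,
    off-diagonal e) that are strictly less than x.

    By Sylvester's law of inertia this equals the number of negative
    pivots in the LDL^T factorization of T - x*I.
    """
    count = 0
    q = 1.0
    for i in range(len(d)):
        q = d[i] - x - (e[i - 1] ** 2 / q if i > 0 else 0.0)
        if abs(q) < pivmin:      # avoid dividing by an (almost) zero pivot
            q = -pivmin
        if q < 0.0:
            count += 1
    return count


def eigenvalue_by_index(d, e, k, rtol=1e-14):
    """k-th smallest eigenvalue (1-based) of T, by bisection on sturm_count."""
    n = len(d)
    # Gershgorin bounds enclose every eigenvalue.
    radius = [(abs(e[i - 1]) if i > 0 else 0.0) + (abs(e[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(d[i] - radius[i] for i in range(n))
    hi = max(d[i] + radius[i] for i in range(n))
    while hi - lo > rtol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if not lo < mid < hi:    # the interval cannot shrink any further
            break
        if sturm_count(d, e, mid) >= k:
            hi = mid             # at least k eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def eigenvalue_subset(d, e, il, iu):
    """Eigenvalues il..iu (1-based, inclusive), each computed independently."""
    return [eigenvalue_by_index(d, e, k) for k in range(il, iu + 1)]


# T = tridiag(-1, 2, -1) of order 6 has eigenvalues 2 - 2*cos(j*pi/7).
d, e = [2.0] * 6, [-1.0] * 5
print(eigenvalue_subset(d, e, 2, 4))
print([2 - 2 * math.cos(j * math.pi / 7) for j in range(2, 5)])
```

Each requested index is handled independently of the others, which is what makes computing only a subset cheap.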
----------------------------------------------------------------------------
4) Mixed precision iterative refinement subroutines for exploiting fast
   single precision hardware.
----------------------------------------------------------------------------

Contributors:
=============
Julie Langou, UTK, Julien Langou, CU Denver, Jack Dongarra, UTK,
June 2006.

Comments:
=========
These are prototype routines in the sense that their interfaces might
change based on user feedback.

Mixed precision iterative refinement subroutines for exploiting fast
single precision hardware. On platforms like the Cell processor, which do
single precision much faster than double, linear systems can be solved
many times faster. Even on commodity processors there is a factor of 2 in
speed between single and double precision.

Changes:
========
A SRC/dsgesv.f
A SRC/dlag2s.f
A SRC/slag2d.f
A SRC/zcgesv.f
A SRC/zlag2c.f
A SRC/clag2z.f

References:
===========
[1] Julie Langou, Julien Langou, Piotr Luszczek, Jakub Kurzak, Alfredo
    Buttari, and Jack Dongarra, Exploiting the Performance of 32 bit
    Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting
    Iterative Refinement for Linear Systems), LAPACK Working Note 175,
    June 2006.
[2] Jakub Kurzak and Jack Dongarra, Implementation of the Mixed-Precision
    High Performance LINPACK Benchmark on the CELL Processor, LAPACK
    Working Note 177, September 2006.

----------------------------------------------------------------------------
5) New partial column norm updating strategy for QR factorization with
   pivoting.
----------------------------------------------------------------------------

Contributors:
=============
Z. Drmac and Z. Bujanovic, Dept. of Mathematics, University of Zagreb,
Croatia, June 2006.

Comments:
=========
This fixes a subtle numerical bug dating back to LINPACK that can give
completely wrong results.

Changes:
========
M SRC/{c,d,s,z}geqpf.f
M SRC/{c,d,s,z}laqp2.f
M SRC/{c,d,s,z}laqps.f

Reference:
==========
[1] Z. Drmac and Z. Bujanovic, On the Failure of Rank Revealing QR
    Factorization Software - a Case Study, LAPACK Working Note 176,
    June 2006.

----------------------------------------------------------------------------
6) Thread safe version of the LAPACK routines.
----------------------------------------------------------------------------

Contributor:
============
Sven Hammarling, NAG Ltd., UK, July 2005.

Comments:
=========
All the LAPACK routines are now thread safe except {s,d}LAMCH and the
three groups {c,d,s,z}lacon.f, {d,s}lasq3.f and {d,s}lasq4.f. By thread
safe, we mean that your LAPACK library will be thread safe provided that
your compiler is.

Regarding the 10 routines still containing DATA or SAVE statements:
{s,d}LAMCH can be manually replaced by thread-safe routines (e.g. using
F90). {c,d,s,z}lacon.f, {d,s}lasq3.f and {d,s}lasq4.f are left in LAPACK
for backward compatibility; they are flagged as deprecated and are dead
code in this release (v3.1).

Changes:
========
D SRC/{c,d,s,z}lacon.f   (D means deprecated)
D SRC/{d,s}lasq3.f       (D means deprecated)
D SRC/{d,s}lasq4.f       (D means deprecated)
M SRC/{c,z}largv.f
M SRC/{c,d,s,z}lartg.f
M SRC/{d,s}laed6.f
A SRC/{c,d,s,z}lacn2.f
A SRC/{d,s}lazq3.f
A SRC/{d,s}lazq4.f

----------------------------------------------------------------------------
7) Additional support for matrices with NaN/subnormal elements, and
   optimization of the balancing subroutines.
----------------------------------------------------------------------------

Contributor:
============
Bobby Cheng, MathWorks, USA, July 2006.

Changes:
========
M SRC/{c,d,s,z}gebal.f
M SRC/{c,d,s,z}getf2.f
M SRC/{d,s}lapy3.f
M SRC/{d,s}sytf2.f
M SRC/{c,z}hetf2.f

References:
===========
[1] See svn log: r146, r296.

----------------------------------------------------------------------------
8) Several bug fixes and details.
----------------------------------------------------------------------------

Changes:
========
Too long to list.

Comments:
=========
Added a subroutine to get the version number of a user's LAPACK.
Added a licence and a copyright to LAPACK.

Contributors:
=============
Remi Delmas (UTK, USA)
Sven Hammarling (NAG, UK)
Daniel Kressner (University of Zagreb, Croatia)
Julie Langou (UTK, USA)
Julien Langou (CU Denver, USA)
E. Jason Riedy (UCB, USA)
Ren-Cang Li (Department of Mathematics, University of Texas at Arlington, USA)
Osni Marques (Lawrence Berkeley Laboratory, USA)

References:
===========
[1] Too long to list; see the svn logs for references.

----------------------------------------------------------------------------
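The mixed precision iterative refinement of item 4 (DSGESV, ZCGESV) can be sketched in Python as follows. This is an illustration under stated assumptions, not the LAPACK code:

  * Single precision is only simulated, by rounding every stored value to IEEE binary32 with `struct`.
  * The LU factorization is a plain unblocked one.
  * The stopping test and the iteration limit only approximate those of DSGESV.

The expensive O(n^3) factorization is done once in the low precision. Residuals are then computed in double precision. As DSGESV does, the solver falls back to a double precision factorization if the data do not fit in single precision or the refinement does not converge.

```python
import struct


def to_single(x):
    """Round a double to the nearest IEEE binary32 value (simulated single).

    Raises OverflowError if x is outside the single precision range.
    """
    return struct.unpack("f", struct.pack("f", x))[0]


def identity(x):
    return x


def lu_factor(a, rnd=identity):
    """LU with partial pivoting of a copy of the square, nonsingular matrix a.

    Every stored entry is passed through rnd to emulate the working precision.
    Returns packed factors (unit L below the diagonal, U on and above it)
    and piv, such that row i of P*A is row piv[i] of A.
    """
    n = len(a)
    lu = [[rnd(v) for v in row] for row in a]
    piv = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] = rnd(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = rnd(lu[i][j] - lu[i][k] * lu[k][j])
    return lu, piv


def lu_solve(lu, piv, b, rnd=identity):
    """Solve A x = b from the factors returned by lu_factor, rounding with rnd."""
    n = len(lu)
    y = [rnd(b[piv[i]]) for i in range(n)]
    for i in range(n):                     # forward substitution with unit L
        for j in range(i):
            y[i] = rnd(y[i] - lu[i][j] * y[j])
    for i in reversed(range(n)):           # back substitution with U
        for j in range(i + 1, n):
            y[i] = rnd(y[i] - lu[i][j] * y[j])
        y[i] = rnd(y[i] / lu[i][i])
    return y


def mixed_precision_solve(a, b, max_iter=30):
    """Return (x, iterations); iterations == -1 means the double fallback ran."""
    n = len(a)
    eps = 2.0 ** -53                                   # double unit roundoff
    anrm = max(sum(abs(v) for v in row) for row in a)  # infinity norm of A
    try:
        lu, piv = lu_factor(a, to_single)              # O(n^3) work in "single"
        x = lu_solve(lu, piv, b, to_single)
        for it in range(max_iter):                     # illustrative limit
            # Residual computed in double precision.
            r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
            if max(map(abs, r)) <= max(map(abs, x)) * anrm * eps * n ** 0.5:
                return x, it
            # O(n^2) correction reusing the single precision factors.
            dx = lu_solve(lu, piv, r, to_single)
            x = [x[i] + dx[i] for i in range(n)]       # update kept in double
    except OverflowError:
        pass                                           # data do not fit in single
    lu, piv = lu_factor(a)                             # double precision fallback
    return lu_solve(lu, piv, b), -1
```

For example, `mixed_precision_solve([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]], [6.0, 10.0, 8.0])` should return a solution close to [1, 2, 3]. Only the O(n^2) residual and correction steps run per iteration, which is why the approach pays off when single precision is much faster.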