This release adds beta versions of simple threaded GEMV & GER. It adds threaded Level 2 (L2) testing to the tester. It fixes a bug in axpby where it called SCAL with alpha=0; fixing this also corrects a GEMM error in the BETA=0 case. It fixes several simple buffer overruns in the full tester. It adds a dynamically scheduled tgemm, which is used whenever all dimensions are large. It adds support for complex types in both dynamic cases (rank-K and large). It fixes several errors in GEMM that occurred when the K dimension was cut.
This release adds AVX support (this mainly affects newer Intel and AMD CPUs). It merges Windows threads, POSIX threads, and OpenMP into the same codebase. It adds a dynamically scheduled rank-K update to the threaded GEMM, improving load balancing between CPU cores. All threaded routines have been completely rewritten to use goparallel, and thus dynamic spawn. An ATL_thread_yield function has been added. If affinity is not set, dynamic functions now yield thread execution while waiting for their peers to signal completion of a stage. Assorted bugs have also been fixed.
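A dynamically scheduled rank-K update with a yielding wait can be sketched in C as follows. This is a minimal illustration under stated assumptions, not ATLAS's goparallel or ATL_thread_yield code: the names `stage_t`, `worker`, `do_block`, and `run_stage` are hypothetical, `do_block` only records which K-block it was given (standing in for the real per-block GEMM kernel), and POSIX `sched_yield()` is used where ATLAS would call its own yield function. The sketch always yields, whereas the notes say ATLAS yields only when affinity is not set.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_BLOCKS  256
#define MAX_WORKERS 16

/* Hypothetical shared state for one stage of a rank-K update. */
typedef struct {
    atomic_int next_block;        /* next unclaimed K-block index */
    atomic_int finished;          /* workers done with this stage */
    atomic_int nworkers;          /* workers that must finish */
    int nblocks;                  /* K-blocks in this stage */
    atomic_int hits[MAX_BLOCKS];  /* times each block was processed */
} stage_t;

/* Stand-in for C += A(:,kblock) * B(kblock,:); only counts the call. */
static void do_block(stage_t *s, int b)
{
    atomic_fetch_add(&s->hits[b], 1);
}

static void *worker(void *arg)
{
    stage_t *s = arg;
    int b;

    /* Dynamic scheduling: each fetch_add hands out a unique block, so a
     * faster core simply claims more blocks than a slower one. */
    while ((b = atomic_fetch_add(&s->next_block, 1)) < s->nblocks)
        do_block(s, b);

    /* Signal completion, then wait for peers, yielding the CPU. */
    atomic_fetch_add(&s->finished, 1);
    while (atomic_load(&s->finished) < atomic_load(&s->nworkers))
        sched_yield();
    return NULL;
}

/* Runs one stage with nworkers threads; returns 0 on success, -1 on error. */
int run_stage(stage_t *s, int nblocks, int nworkers)
{
    pthread_t tid[MAX_WORKERS];
    int i, created = 0, rc = 0;

    if (nblocks < 0 || nblocks > MAX_BLOCKS ||
        nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;
    atomic_init(&s->next_block, 0);
    atomic_init(&s->finished, 0);
    atomic_init(&s->nworkers, nworkers);
    s->nblocks = nblocks;
    for (i = 0; i < MAX_BLOCKS; i++)
        atomic_init(&s->hits[i], 0);

    for (; created < nworkers; created++) {
        if (pthread_create(&tid[created], NULL, worker, s) != 0) {
            /* Let already-started workers stop waiting for missing peers. */
            atomic_store(&s->nworkers, created);
            rc = -1;
            break;
        }
    }
    for (i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    return rc;
}
```

Handing out blocks through a single atomic counter means no thread receives a fixed share up front, so load imbalance between cores is absorbed automatically; `sched_yield()` lets a waiting thread give its time slice to other runnable threads instead of burning it in a tight spin.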
Extensive bugfixes were made. Preliminary support was added for threaded/parallel LAPACK. The architecture definitions were updated for P4ESSE3, PPCG564AltiVec, Core264SSE, and AMD64K10h64SSE3. PCA codes were added for LU and QR.