AOCL 2.2 changes - Majorly include LAPACK 3.9.0 support by rsanagap · Pull Request #36 · flame/libflame

rsanagap · 2020-07-05T17:32:25Z

Field,

Please review and merge them.

Thanks

- Overflow/underflow tests for sygvd/hegvd
- Memory leak fixes for sygvd test cases
- Enabling lapacke interfaces for sygvd/hegvd

Change-Id: I79ec9c009e6ba52df17bc6247bb726e60193d5ed
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6037

… cases

Fixed the copy matrix sizes in validate_gesdd/gesvd to avoid out-of-bounds memory access while testing corner cases:
- for gesdd: jobz = O, m >= n, ldu = 1; m < n, ldvt = 1
- for gesvd: jobu/vt = O, ldu/ldvt = 1

Fixed gtsv test2 under validate_gtsv(): scaled the residual down by a factor of 10 so that it falls within the expected threshold range, since the input matrix Xact is randomly generated.

AMD-Internal: CPUPL-5926
Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Change-Id: Ia99bb6d81b76de394265ffded0069fb440de979f

Components added:
1. Test run/validation
2. Negative test cases
3. Extreme test cases
4. Overflow/Underflow tests
5. Lapacke test

Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com>
AMD-Internal: CPUPL-5903
Change-Id: I38fa28ac0216740e0669e41509ca7870fd3adab8

1. Move block size computation to a separate function for each of the 4 types.
2. Optimal block sizes for various input sizes vary as OMP_NUM_THREADS is varied. Set optimal block sizes based on input size ranges only when OMP_NUM_THREADS=1.
3. For small sizes, take the un-optimized path, because the optimized path regresses due to the overhead of OpenMP calls.

Gains obtained for single-threaded runs: up to 15% on Genoa and 28% on Turin.

Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com>
AMD-Internal: CPUPL-5876
Change-Id: I8fdeccdf0debdacec3913f8192711d86e9d62314

Following features are implemented in this commit:
1. Library path and include path for aocl-utils and blis can be automatically inferred while building libflame if pkg-config files for these libraries are available. Only works on Linux for now.
2. Various cmake configure/build/install/test/workflow presets.
3. Cmake presets for Windows (msvc and ninja). As of now test presets do not work!
4. Minimum cmake version upgraded to 3.26.0

Preset names follow the convention:
<os>-<make/ninja>-<compiler>-<st/mt>-<lp/ilp>-<static/shared>-<isa-mode>-<other optional commands>

Usage:
$ cmake --build --list-presets
-- Without aocl-utils pkgconfig file
$ cmake --preset {chosen-preset} -DLIBAOCLUTILS_INCLUDE_PATH={aoclutils header path} -DLIBAOCLUTILS_LIBRARY_PATH={aoclutils library path}
-- With aocl-utils pkgconfig file
$ cmake --preset {chosen-preset}
$ cmake --build --preset {chosen-preset}

Build and test workflow
-- If aocl-utils and blis pkg-config files are available
$ cmake --workflow --preset {chosen-preset}

More info in BUILD.md

Change-Id: I8bc54b3eabed9a18c305e911df9aa76d8ff746d0
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-5862

Port Netlib Lapack-3.12 newly added double precision fortran files to c Files added : dgedmd.c, dgedmdq.c, dgeqp3rk.c, dlaqp2rk.c, dlaqp3rk.c. Netlib test for lapack-3.12 included. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-5708] Change-Id: I60b5c47505162882a19f2086e4842c858e0586e8

a. Added separate invoke functions in CPP for each API
b. Added support for cmake and make
c. Added support for --interface in cmd line
d. Resolved warnings in existing interface header file
e. Resolved errors/warnings in Windows
f. Added ENABLE_CPP_TEST flag in cmake, make files to enable/disable CPP test interface.
g. Updated Readme as per latest changes.
h. Added CPP changes for 25+ test APIs.

Change-Id: I6b77d24e204833134401c69813a3a1672de02c18

Port Netlib Lapack 3.12 FORTRAN code to C files for double precision complex APIs Note: Retained lapack-3.11 zlaqr5.c, to overcome netlib test failures. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6150] Change-Id: I40df2b270d82159cd0bc16a0158951139054b90a

…ex New files Port Netlib Lapack-3.12 newly added double precision complex fortran files to c Files added : zgedmd.c, zgedmdq.c, zgeqp3rk.c, zlaqp2rk.c, zlaqp3rk.c, zrscl.c. Updating DTL logging in dgedmd.c, dgedmdq.c, dgeqp3rk.c, dlaqp2rk.c, dlaqp3rk.c Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6150] Change-Id: Ia043bdeace2efc61fb25c64e47c2a45bdd8bda9c

Updated test code to display output status as INVALID_PARAM when
1) illegal inputs are passed
2) LAPACK API returns illegal input warning

NOTE: Existing behaviour is to display status = FAIL for these cases.

Modified FLA_TEST_CHECK_EINFO macro.

Signed-off-by: dnikku <Deepika.Nikku@amd.com>
AMD-Internal: CPUPL-6250
Change-Id: I913fbef39f1f2e58142c21765033b2688de4585a

Port Netlib Lapack 3.12 FORTRAN code to C files for single precision APIs Note: Retained lapack-3.11 slaqr5.c, to overcome netlib test failures. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6149] Change-Id: Id4b96fd45e78246640ca681b5b9236bde09cab52

…iles Port Netlib Lapack-3.12 newly added single precision fortran files to c Files added : sgedmd.c, sgedmdq.c, sgeqp3rk.c, slaqp2rk.c, slaqp3rk.c. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6149] Change-Id: Idea38525527da5893fa61760f58e62953274708e

Port Netlib Lapack 3.12 FORTRAN code to C files for single precision complex APIs Note: Retained lapack-3.11 claqr5.c, to overcome netlib test failures. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6151] Change-Id: I2d21d45f7a4e7840ac54620c61d21395bc4ac2cb

1. Removed netlib-test build from presets for windows. 2. flame pkgconfig file now has aocl-utils pc as requirement instead of specifying link flags. Change-Id: I3261b6d7526c866096cc8e2444f8c61ddf98db92 Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-5624

…ex New Files Port Netlib Lapack-3.12 newly added single precision fortran files to c Files added : cgedmd.c, cgedmdq.c, cgeqp3rk.c, claqp2rk.c, claqp3rk.c, crscl.c. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6151] Change-Id: I3bacb5a82b65d049b6250f565c301baa17b8ff77

1) Existing implementation is using SGESVD code path for sizes > 750. Removed the size condition checks in FLA_gesdd to use lapack code for all the sizes.
2) Removed redundant check function calls (sgesdd_check, dgesdd_check) in FLA_gesdd.c LAPACK_gesdd_real S, D precisions.

Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Change-Id: Ia7ac39881d93792c08332f2050c36b99b20e3d24
AMD-Internal: CPUPL-6332

The commit has the following changes:
1. Added test cases for lange API covering all norm types.
2. Overflow/underflow test cases for lange API are also added.
3. For each test, there will be multiple lines of output representing the result for each norm type tested. The output name would be "LANGEX", where X will be one of {M, 1, I, F}.
4. Updated gitignore to ignore build and editor files.

Change-Id: I3096653650e2a932a9fdf3303a31925bbba9cd99
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6260

* Resolving merge conflicts
* replaced GEMM switches with fla_invoke_gemm
* BDSQR: Added conformance tests
* BDSQR: Added Overflow/Underflow tests
* BDSQR: Clang-format
* U and VT are orthogonal, D and E are derived from GEBRD. Validation fixed
* Used build_bidiagonal_matrix function from test_common
* Used build_bidiagonal_matrix function from test_common
* Added BRT support
* Fixed BDSQR BRT Tests + Corrected GEJSV api config file test parameters
* Resolving conflicts
* resolving conflicts
* used fla_mem_alloc in test_gejsv and addressed negative test failing in BDSQR
* Removed trailing whitespaces.
* U and VT from GEBRD/ORGBR + Added C matrix validations.
* Used GETRI for C matrix validation + SVD reconstruction test
* Fixed conformance tests and validation
* Addressed segfault for lapacke_row
* Removed redundant variables.
* Correct matrix dimension allocation.
* Reconstruction U and VT tests
* Fixed zero residual issue
* Fixed compute_matrix_norm call
* Added compute_matrix_inverse to test_common
* Removed reset_matrix calls
* Fixed memory leaks

Also removed warnings in CPP interface for deprecated APIs.

Root cause: The residual value is NaN/INF when the norm value is 0, so the test case fails.
Solution: Added a guard that checks norm value > safe_min before the residual calculation. Added common eps (E, P) and safe min values to be used in the residual computation of all APIs.

AMD-Internal: CPUPL-7742
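
The guard described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the test-suite code: the helper name `guarded_residual`, its arguments, the fallback to the unscaled difference, and the use of `DBL_EPSILON` and `DBL_MIN` as stand-ins for the LAPACK `lamch` eps and safe min values are all hypothetical.

```c
#include <float.h>

/* Hypothetical helper: scaled residual diff_norm / (ref_norm * n * eps).
 * The division is performed only when ref_norm exceeds safe_min, so a zero
 * (or denormal) reference norm cannot turn the result into NaN or INF. */
double guarded_residual(double diff_norm, double ref_norm, int n)
{
    const double eps = DBL_EPSILON; /* assumption: stand-in for lamch eps */
    const double safe_min = DBL_MIN; /* assumption: stand-in for lamch safe min */

    if (n <= 0 || !(ref_norm > safe_min)) {
        /* Assumed fallback: report the unscaled difference instead of dividing. */
        return diff_norm;
    }
    return diff_norm / (ref_norm * (double)n * eps);
}
```

With a zero reference norm and a zero difference the helper returns 0.0 rather than the NaN that 0/0 would produce.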

…… (#220) AOCL-LAPACK: Fix for Reproducibility failures observed on POTRF, POTF2, POTRI, POTRS, GEBRD APIs -> aocl_fla_init() was not invoked before checking the minimum arch id, added this call for potrf and potri code. -> Fix for GEBRD reproducibility failures AMD-Internal: [CPUPL-7493] Co-authored-by: Reshi Krish <reshikrish.thangajawahar@amd.com>

Separate the multiply operations so that each multiply result is rounded once; the addition happens only after the two rounded intermediates exist. This prevents the compiler from doing:
- FMA contraction (fused multiply-add)
- reassociation, i.e. evaluating (bb*cs + dd*sn) in a different order
- evaluating in SIMD with different precision
- reordering that increases cancellation error

Change-Id: I2075f13de94e6b4a5775f74a5d92d9607cec2c82
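
One way to express this in C is sketched below. The commit message does not say which mechanism the actual change uses, so the `volatile` temporaries and the function name `rounded_dot2` are assumptions for illustration; `#pragma STDC FP_CONTRACT OFF` or the GCC/Clang flag `-ffp-contract=off` are other ways to suppress contraction.

```c
/* Hypothetical sketch: compute bb*cs + dd*sn with each product rounded
 * to double before the addition. The volatile temporaries force the
 * compiler to materialize both products, so it cannot fuse a multiply
 * with the add (FMA) or fold the expression into a different order. */
double rounded_dot2(double bb, double cs, double dd, double sn)
{
    volatile double p1 = bb * cs; /* first product, rounded once */
    volatile double p2 = dd * sn; /* second product, rounded once */
    return p1 + p2;               /* add only the rounded intermediates */
}
```

The result is then the same on builds with and without FMA support, which is the reproducibility property the commit is after.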

- Making the code design more consistent by removing changes done in the source to support netlib_tests after wrapper changes and adding required changes within netlib_tests itself.
- Fixed wrong parameter types in wrapper code due to order error in Netlib's documentation
- Fixed a bug in (c|z)geqp3rk, where an uninitialized max2nrm variable was used to determine control path.

AMD-Internal: CPUPL-7996

Root cause: When values are smaller in magnitude than safe min, computations on them produce NaNs.
Solution: Added safemin (lamch) and a check of the value against it before the computation in the avx2 and avx512 kernels of the SGETRF and DGETRF APIs.

AMD-Internal: CPUPL-8032
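
The kind of check described here resembles the pivot-scaling guard in the reference Netlib GETF2 routines, sketched below. This is an illustration only, not the AVX2/AVX512 kernel code: the function name `scale_by_pivot` is hypothetical, `DBL_MIN` stands in for the `lamch` safe min value, and the caller is assumed to have already rejected a zero pivot.

```c
#include <float.h>
#include <math.h>

/* Hypothetical sketch: scale a column by 1/pivot only when the pivot is at
 * least safe min in magnitude. For smaller pivots the reciprocal can
 * overflow to INF, so each element is divided by the pivot directly.
 * Assumption: pivot != 0 has been checked by the caller. */
void scale_by_pivot(double *col, int n, double pivot)
{
    const double sfmin = DBL_MIN; /* assumption: stand-in for lamch safe min */
    int i;

    if (fabs(pivot) >= sfmin) {
        const double inv = 1.0 / pivot; /* safe: reciprocal cannot overflow */
        for (i = 0; i < n; ++i) {
            col[i] *= inv;
        }
    } else {
        for (i = 0; i < n; ++i) {
            col[i] /= pivot; /* avoids forming an overflowing reciprocal */
        }
    }
}
```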

* AOCL LAPACK: Fix sgesdd performance gap

  Ported the slarf1f implementation from Netlib and integrated it into sgebrd and sgebd2 to resolve accuracy issues present in the existing slarf. Removed previous workaround changes in sgebrd that were added to mitigate this accuracy problem.

* Added copyright and addressed copilot comments
* Addressed review comments

Fix description: The entire FLAME.h header file was included within an 'extern "C"' block. This also pulled in external header files like 'omp.h'. To fix this, removed the overall extern "C" block and added extern "C" to the specific function prototype declarations that are supposed to be exposed by libFLAME.

AMD-Internal: CPUPL-8062
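
The pattern described in this fix can be sketched as below. All names (`EXAMPLE_API`, `example_vector_bytes`) are hypothetical and do not come from FLAME.h; the point is only that external headers are included outside any C-linkage block, and the linkage specifier is attached to each exported prototype instead.

```c
/* External header: included at file scope, outside any extern "C" block,
 * so a C++ compiler may process whatever C++-only declarations it contains. */
#include <stddef.h>

/* Hypothetical macro: expands to a C-linkage specifier only under C++. */
#ifdef __cplusplus
#define EXAMPLE_API extern "C"
#else
#define EXAMPLE_API
#endif

/* Only the library's own prototype is given C linkage. */
EXAMPLE_API size_t example_vector_bytes(size_t n);

/* Definition; it inherits the linkage of the earlier declaration. */
size_t example_vector_bytes(size_t n)
{
    return n * sizeof(double);
}
```

The same translation unit compiles as C and as C++, and the exported symbol keeps an unmangled name in both cases.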

Root cause: When output pointers (C, S, or R) pointed to the same memory as input pointers (F or G), the routines wrote to an output and then read the same location as F or G. That read-after-write gave wrong results (e.g. R or S incorrect when c__==f, c__==g, or s==f).

Fix: Copy inputs once at entry: f__t = *f, g__t = *g (and for complex, copy the full complex values). Use only f__t and g__t (and their addresses where a pointer is needed) for all logic and for writing outputs.

Add input tracing (AOCL DTL) code.

* Refactored Main CMakeLists.txt

  Created New Module: `cmake/CompilerFlags.cmake`
  A well-organized, modular file (~220 lines) containing:
  - ISA configuration functions (AVX, AVX2, AVX512, auto-detection)
  - Compiler flag setup functions (security, warnings, language standards, debug)
  - Platform-specific configurations (Windows/Unix)
  - Special build mode support (ASAN, GCOV)

  All compiler flags are now in one dedicated, well-documented module

* Resolve conflicts after rebase with target branch

* Fix copyright and redundant status message in ASAN check

* echo -n | git rebase --continue Update copyright year to 2026 in root CMakeLists.txt GIT_EDITOR=true git cherry-pick --continue

* Remove duplicate ISA validation code from CMakeLists.txt

  The ISA validation and auto-detection logic was duplicated between CMakeLists.txt (lines 49-84) and cmake/CompilerFlags.cmake. This caused the auto_config.py script to run twice when LF_ISA_CONFIG=auto.

  Removed the duplicate code from CMakeLists.txt as it's now properly handled by the configure_compiler_flags() macro in CompilerFlags.cmake.

  Addresses code review feedback about redundant ISA processing.

* Address Copilot code review feedback for CompilerFlags.cmake

  Implemented all 5 improvements suggested by GitHub Copilot:

  1. Added error checking for auto_detect_isa_config():
     - Capture stderr with ERROR_VARIABLE
     - Check CMD_RESULT and fail with detailed error message
     - Prevents silent failures when auto-detection script fails
  2. Updated ISA validation error message:
     - Added 'none' to the list of valid values
     - Changed to lowercase format matching actual accepted values
  3. Replaced add_definitions() with add_compile_options():
     - Changed /arch:AVX512 and /arch:AVX2 to use add_compile_options()
     - More appropriate for compiler flags vs preprocessor definitions
  4. Fixed GCOV flag handling:
     - Replaced add_definitions(--coverage) with add_compile_options(--coverage)
     - Added add_link_options(--coverage) for proper linker flag handling
  5. Fixed Clang compiler ID check:
     - Now checks CMAKE_C_COMPILER_ID for C flags
     - Added separate check for CMAKE_CXX_COMPILER_ID for CXX flags
     - Prevents skipping C flags when only CXX compiler is Clang

  All changes improve code robustness, error handling, and CMake best practices.

* Fix variable quoting in CMake if() conditions

  Address Copilot code review feedback:
  - Quote CMD_OUTPUT variable in auto_detect_isa_config() comparisons
  - Quote ISA_CONFIG variable in set_isa_compiler_flags() comparisons

  Unquoted variable expansions in if() conditions can break evaluation when variables are empty or contain unexpected characters. Quoting ensures robust CMake syntax parsing in all cases.

  This follows CMake best practices for variable expansion in conditionals.

- Added aocl_lapack_* files to CMakelists.txt to fix build error on windows. - Wrapper C files are compiled manually on Windows as adding C files to the same target was adding Fortran compiler flags to C compiler flags. AMD-Internal: CPUPL-8145

samahmad and others added 30 commits November 28, 2024 01:00

AOCL LAPACK Test suite: Optimizing Memory Usage in structures

5065142

details: Datatype alignment changes for the structures used in test suite Signed-off-by: ksaithar <katteboina.saitharun@amd.com> Change-Id: I08ce86f5d642189b6f9142c74af41a633415b1f3

AOCL-LAPACK: Upgrade to LAPACK-3.12 Part 1 - Double Precision

f8de6a3

Port Netlib Lapack 3.12 FORTRAN code to C files for double precision APIs Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-5708] Change-Id: If2ddc85a9ad0818c96155945340b9cea23b40c8e

Optimisation porting to sgetrf

0aee1c5

- Added avx2, avx512 and parallel version for sgetrf Change-Id: I724cc5c9bf98f42014bcaf680a2fa7373195f10d Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-6060

AOCL-LAPACK Test-Suite: Add sytrf_rook test case

7fe4e04

Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com> AMD-Internal: CPUPL-6155 Change-Id: Icbd84fdfa434875bf3bfd2072b09d8e77c326702

DGEQP3 perf gap optimization for 10x10 input size

b79916c

1. Optimized the DNRM2 blas api with avx2 and avx512 intrinsics. AMD Internal : [CPUPL-6122] Change-Id: I8d8822f5a300997bda3cee15b730489892d938f9

AOCL-LAPACK: Fix for build warning

ca29148

-> Removed unused variables. -> Initialized variables where they were not initialized. Signed-off-by: Venkatesha <vprasada@amd.com> Change-Id: If6e4b04a1d23b008812afa920efd226ac923e2c1

AOCL-LAPACK: Fix for GELS, GELSS, GELSD test failure

a0a5685

Updated residual calculation for least square APIs. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6065] Change-Id: I6db26fcd1738aa4cae1d92406823294bbc0e903d

AOCL-LAPACK: Fix to obfuscate absolute file paths in shared binaries

6a5c7eb

Change-Id: I67b992d876fe7de77689b53223f0188df5579799 Signed-off-by: tprnaidu <tprnaidu@amd.com> AMD-Internal: CPUPL-5910

AOCL-LAPACK: Fix for windows build warning

d50bb4c

Fix for windows build warnings noticed after lapack 3.12 porting Signed-off-by: Venkatesha <vprasada@amd.com> Change-Id: Ic00400b12ad29327755aad7f02e2fe5b7c5703e7

Fix Tau value computation in SYTRD

df1e487

Current SYTRD implementation was resulting in wrong TAU values in certain tests. Enabled netlib lapack path for SSYTRD, SORGTR and SORMTR apis for "UPLO==L" AMD Internal : [CPUPL-6337] Change-Id: I92e49b28ac020e6d28af752b27d56db82424db7a

CPP changes for 10 more APIs in main test suite.

cdf5f1d

Also added changes for ROT, GERQF in CPP interface headers. Note: By default, enabled CPP test flag (ENABLE_CPP_TEST) AMD-Internal: CPUPL-6268 Change-Id: Ic512f0f08a761f7f248a0b3960b64e538c9ded10

AOCL-LAPACK: Upgrade to LAPACK - 3.12 : Part 5 - Remaining files

7765761

Port Netlib Lapack 3.12 remaining FORTRAN code to C files. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6159] Change-Id: I7188b0e04cb756999c89eac7849d07ebaaa4cb7c

AOCL-LAPACK : Test Suite - Addition of test case for GECON

3f6ff80

Added test code and validation for GECON APIs Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6203] Change-Id: Idc9f13805380836351a232d9ad2f8d5feb2c36f8

Yuvraj, Kunwar and others added 30 commits January 6, 2026 11:20

Fix for DGELS Regression

69cb2b6

Address alignment of key x86 optimized functions called for DGELS. This improves the performance of small size problems for dgels.

AOCL-LAPACK: Build Error Fix for AVX2 (#213)

365a579

Fixed build error when libflame is built for AVX2 with ENABLE_AOCL_BLAS option. Fix: Used macro BLIS_KERNELS_ZEN4 to run direct BLIS kernels for zen4 and above. Change-Id: I19e4cea12becd9daac7dc85f4ce3c0ebb5d366c9 AMD-Internal: CPUPL-7475

AOCL-LAPACK: Fix for GELSS validation failure for Rank < N (#167)

cc113f7

* AOCL-LAPACK: Fix for GELSS validation failure for Rank < N -> Updated validation formula for rank < n case. -> Updated fla_invoke_gemm function to have alpha and beta parameters. AMD-Internal: [CPUPL-7571]

zen5 model update

7db13e8

Version String Update to 5.2

ffd88d9

Updated version string in the so_version file and wherever applicable.

Remove unused configure_tidsp file

badddac

We have removed support for auto-conf tools based build. Hence removing this file.

AOCL-LAPACK: Version string update as 5.2.2 (#234)

7315582

Co-authored-by: tprnaidu <tprnaidu@amd.com>

Updated LICENSE and NOTICES for 5.2.2 release

55e0dd7

Merging changes from internal repo

ed29a1c

Fix: GEQPF compilation bug with AOCC (#228)

ddd32b5

* Removed vector definition * formatting + copyright

AOCL LAPACK: DORGQR Optimization (#222)

b73f289

Optimization in DORG2R to fix regression in DORGQR compared to AOCL-5.2. Optimization detail: Using internal fla_dscal function in place of BLAS dscal_ in DORG2R. Change-Id: I3e1632cbbafa6581773e635a946b7e28c55eaea5 AMD-Internal: CPUPL-7712

AOCL LAPACK Testsuite: Added YAML files for all the available APIs (#…

1fd7726

…219) AOCL LAPACK Test-suite: Added YAML files for all the available APIs Added new yaml files for all the existing APIs in main test for generating API wise ctests

AOCL-LAPACK: Coverity Fixes (#238)

38add19

- Fix for possible resource leaks. - Fix: 32 bit integer pointers wrongly type casted to 64 bit integer pointers. AMD-Internal: CPUPL-8018

AOCL-LAPACK: Context initialization before min arch check. (#236)

3dcdc5c

* AOCL-LAPACK: Context initialization before min arch check. -> Included a call to aocl_fla_init() for initializing the context before any FLA_IS_MIN_ARCH_ID()

SSYEVD: EIG value error due to denormal number hit

4446097

Keeps the algorithm mathematically identical, using the BLAS GEMV instead of SDOT for consistency. FLA_hetrd.c: small-n fallback to unblocked reduction for accuracy with sytd2 Change-Id: Ib7d34067fff123507401a9112554737229b4f204

Single Precision LDL issue fix (#256)

ce3085c

NAN found in output of LDLT factorization API (SSYTRF_ROOK) for specific inputs resolved. Issue resolved by updating SLASYF_ROOK with latest version from Netlib. AMD Internal: CPUPL-7986

Version string update for 5.3

4531286

Updated version string to point to 5.3 release version. Corrected print format. Documentation related files also updated with latest version string.

Align codebase with AOCL-5.3-RC for 5.3 release

90f5120

Synchronize LICENSE, NOTICES, and source files to match the internal AOCL-5.3-RC branch exactly. This fixup accounts for differences accumulated from previous external-only commits. Co-authored-by: Cursor <cursoragent@cursor.com>

Removed old files and added new

954fc26

rsanagap wants to merge 1503 commits into flame:master from amd:master
