AOCL 2.2 changes: mainly LAPACK 3.9.0 support #36
Open
rsanagap wants to merge 1503 commits into
- Overflow/underflow tests for sygvd/hegvd
- Memory leak fixes for sygvd test cases
- Enabling lapacke interfaces for sygvd/hegvd
Change-Id: I79ec9c009e6ba52df17bc6247bb726e60193d5ed
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6037
… cases
Fixed the copy matrix sizes in validate_gesdd/gesvd to avoid out-of-bounds memory access while testing corner cases:
- for gesdd, jobz = O: m >= n with ldu = 1, and m < n with ldvt = 1
- for gesvd, jobu/vt = O: ldu/ldvt = 1
Fixed gtsv test2 under validate_gtsv(): the residual is scaled down by a factor of 10 so that it falls within the expected threshold range, because the input matrix Xact is randomly generated.
AMD-Internal: CPUPL-5926
Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Change-Id: Ia99bb6d81b76de394265ffded0069fb440de979f
details: Datatype alignment changes for the structures used in test suite Signed-off-by: ksaithar <katteboina.saitharun@amd.com> Change-Id: I08ce86f5d642189b6f9142c74af41a633415b1f3
Components added:
1. Test run/validation
2. Negative test cases
3. Extreme test cases
4. Overflow/Underflow tests
5. Lapacke test
Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com>
AMD-Internal: CPUPL-5903
Change-Id: I38fa28ac0216740e0669e41509ca7870fd3adab8
1. Move block size computation to a separate function for each of the 4 types.
2. Optimal block sizes for a given input size vary with OMP_NUM_THREADS. Set optimal block sizes based on input size ranges only when OMP_NUM_THREADS=1.
3. For small sizes, take the unoptimized path, because the optimized path shows regressions due to the overhead of OpenMP calls.
Gains obtained for single-threaded runs: up to 15% on genoa and 28% on turin.
Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com>
AMD-Internal: CPUPL-5876
Change-Id: I8fdeccdf0debdacec3913f8192711d86e9d62314
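The selection logic described above could look roughly like the sketch below. It is only an illustration: the function name, the size ranges, and the block sizes are hypothetical, since the commit message does not list the tuned values.

```c
#include <assert.h>

/* Hypothetical thresholds; the real, per-datatype tuned values are not
   given in the commit message. */
enum { SMALL_N_LIMIT = 32, GENERIC_BLOCK = 64 };

/* Returns 0 to request the unoptimized path, otherwise a block size. */
int choose_block_size(int n, int num_threads)
{
    assert(n >= 0 && num_threads >= 1);
    if (n < SMALL_N_LIMIT)
        return 0;               /* small inputs: skip the OpenMP-based path */
    if (num_threads != 1)
        return GENERIC_BLOCK;   /* multi-threaded: keep one generic size */
    /* Single-threaded: pick a block size by input size range. */
    if (n <= 256)
        return 32;
    if (n <= 1024)
        return 64;
    return 128;
}
```

A caller would treat a return value of 0 as "use the unblocked routine" and pass any other value on as the block size.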
Port Netlib Lapack 3.12 FORTRAN code to C files for double precision APIs Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-5708] Change-Id: If2ddc85a9ad0818c96155945340b9cea23b40c8e
- Added avx2, avx512 and parallel version for sgetrf Change-Id: I724cc5c9bf98f42014bcaf680a2fa7373195f10d Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-6060
Following features are implemented in this commit:
1. Library path and include path for aocl-utils and blis can be automatically inferred while building libflame if pkg-config files
for these libraries are available. Only works on Linux for now.
2. Various cmake configure/build/install/test/workflow presets.
3. Cmake presets for Windows (msvc and ninja). As of now test presets do not work!
4. Minimum cmake version upgraded to 3.26.0
Preset names follow the convention: <os>-<make/ninja>-<compiler>-<st/mt>-<lp/ilp>-<static/shared>-<isa-mode>-<other optional commands>
Usage:
$ cmake --build --list-presets
-- Without aocl-utils pkgconfig file
$ cmake --preset {chosen-preset} -DLIBAOCLUTILS_INCLUDE_PATH={aoclutils header path} -DLIBAOCLUTILS_LIBRARY_PATH={aoclutils library path}
-- With aocl-utils pkgconfig file
$ cmake --preset {chosen-preset}
$ cmake --build --preset {chosen-preset}
Build and test workflow
-- If aocl-utils and blis pkg-config files are available
$ cmake --workflow --preset {chosen-preset}
More info in BUILD.md
Change-Id: I8bc54b3eabed9a18c305e911df9aa76d8ff746d0
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-5862
Port Netlib Lapack-3.12 newly added double precision fortran files to c Files added : dgedmd.c, dgedmdq.c, dgeqp3rk.c, dlaqp2rk.c, dlaqp3rk.c. Netlib test for lapack-3.12 included. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-5708] Change-Id: I60b5c47505162882a19f2086e4842c858e0586e8
a. Added separate invoke functions in CPP for each API
b. Added support for cmake and make
c. Added support for --interface in cmd line
d. Resolved warnings in existing interface header file
e. Resolved errors/warnings in Windows
f. Added ENABLE_CPP_TEST flag in cmake, make files to enable/disable CPP test interface.
g. Updated Readme as per latest changes.
h. Added CPP changes for 25+ test APIs.
Change-Id: I6b77d24e204833134401c69813a3a1672de02c18
Port Netlib Lapack 3.12 FORTRAN code to C files for double precision complex APIs Note: Retained lapack-3.11 zlaqr5.c, to overcome netlib test failures. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6150] Change-Id: I40df2b270d82159cd0bc16a0158951139054b90a
Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com> AMD-Internal: CPUPL-6155 Change-Id: Icbd84fdfa434875bf3bfd2072b09d8e77c326702
…ex New files Port Netlib Lapack-3.12 newly added double precision complex fortran files to c Files added : zgedmd.c, zgedmdq.c, zgeqp3rk.c, zlaqp2rk.c, zlaqp3rk.c, zrscl.c. Updating DTL logging in dgedmd.c, dgedmdq.c, dgeqp3rk.c, dlaqp2rk.c, dlaqp3rk.c Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6150] Change-Id: Ia043bdeace2efc61fb25c64e47c2a45bdd8bda9c
Updated test code to display output status as INVALID_PARAM when
1) illegal inputs are passed
2) LAPACK API returns illegal input warning
NOTE: Existing behaviour is to display status = FAIL for these cases.
Modified FLA_TEST_CHECK_EINFO macro.
Signed-off-by: dnikku <Deepika.Nikku@amd.com>
AMD-Internal: CPUPL-6250
Change-Id: I913fbef39f1f2e58142c21765033b2688de4585a
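A minimal sketch of the status classification this commit describes, assuming the usual LAPACK convention that a negative info value reports an illegal argument. The enum, the function names, and the residual check are hypothetical and do not reproduce the actual FLA_TEST_CHECK_EINFO macro.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical status type for illustration only. */
typedef enum { TEST_PASS, TEST_FAIL, TEST_INVALID_PARAM } test_status;

/* inputs_valid: 0 when the test itself was given illegal inputs.
   info:         value returned by the LAPACK routine (info < 0 means
                 argument number -info had an illegal value). */
test_status classify_result(int inputs_valid, int info,
                            double residual, double threshold)
{
    if (!inputs_valid || info < 0)
        return TEST_INVALID_PARAM;   /* previously reported as FAIL */
    if (info > 0 || !(residual < threshold))
        return TEST_FAIL;            /* the negated test also catches NaN */
    return TEST_PASS;
}

const char *status_name(test_status s)
{
    switch (s) {
    case TEST_PASS:          return "PASS";
    case TEST_INVALID_PARAM: return "INVALID_PARAM";
    default:                 return "FAIL";
    }
}
```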
Port Netlib Lapack 3.12 FORTRAN code to C files for single precision APIs Note: Retained lapack-3.11 slaqr5.c, to overcome netlib test failures. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6149] Change-Id: Id4b96fd45e78246640ca681b5b9236bde09cab52
1. Optimized the DNRM2 blas api with avx2 and avx512 intrinsics. AMD Internal : [CPUPL-6122] Change-Id: I8d8822f5a300997bda3cee15b730489892d938f9
-> Removed unused variables.
-> Initialized variables that were previously uninitialized.
Signed-off-by: Venkatesha <vprasada@amd.com>
Change-Id: If6e4b04a1d23b008812afa920efd226ac923e2c1
…iles Port Netlib Lapack-3.12 newly added single precision fortran files to c Files added : sgedmd.c, sgedmdq.c, sgeqp3rk.c, slaqp2rk.c, slaqp3rk.c. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6149] Change-Id: Idea38525527da5893fa61760f58e62953274708e
Updated residual calculation for least square APIs. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6065] Change-Id: I6db26fcd1738aa4cae1d92406823294bbc0e903d
Port Netlib Lapack 3.12 FORTRAN code to C files for single precision complex APIs Note: Retained lapack-3.11 claqr5.c, to overcome netlib test failures. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6151] Change-Id: I2d21d45f7a4e7840ac54620c61d21395bc4ac2cb
1. Removed netlib-test build from presets for windows. 2. flame pkgconfig file now has aocl-utils pc as requirement instead of specifying link flags. Change-Id: I3261b6d7526c866096cc8e2444f8c61ddf98db92 Signed-off-by: samahmad <Sameer.Ahmad@amd.com> AMD-Internal: CPUPL-5624
…ex New Files Port Netlib Lapack-3.12 newly added single precision fortran files to c Files added : cgedmd.c, cgedmdq.c, cgeqp3rk.c, claqp2rk.c, claqp3rk.c, crscl.c. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6151] Change-Id: I3bacb5a82b65d049b6250f565c301baa17b8ff77
Change-Id: I67b992d876fe7de77689b53223f0188df5579799 Signed-off-by: tprnaidu <tprnaidu@amd.com> AMD-Internal: CPUPL-5910
Fix for windows build warnings noticed after lapack 3.12 porting Signed-off-by: Venkatesha <vprasada@amd.com> Change-Id: Ic00400b12ad29327755aad7f02e2fe5b7c5703e7
1) The existing implementation used the SGESVD code path for sizes > 750. Removed the size condition checks in FLA_gesdd so that the lapack code is used for all sizes.
2) Removed redundant check function calls (sgesdd_check, dgesdd_check) in FLA_gesdd.c LAPACK_gesdd_real for S, D precisions.
Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Change-Id: Ia7ac39881d93792c08332f2050c36b99b20e3d24
AMD-Internal: CPUPL-6332
Current SYTRD implementation was resulting in wrong TAU values in certain tests. Enabled netlib lapack path for SSYTRD, SORGTR and SORMTR apis for "UPLO==L" AMD Internal : [CPUPL-6337] Change-Id: I92e49b28ac020e6d28af752b27d56db82424db7a
The commit has following changes:
1. Added test cases for lange API covering all norm types.
2. Overflow/underflow test cases for lange API are also
added.
3. Each test prints one output line per norm type tested.
Each line is labeled "LANGEX", where X is one of {M, 1, I, F}.
4. Updated gitignore to ignore build and editor files.
Change-Id: I3096653650e2a932a9fdf3303a31925bbba9cd99
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6260
Also added changes for ROT, GERQF in CPP interface headers. Note: By default, enabled CPP test flag (ENABLE_CPP_TEST) AMD-Internal: CPUPL-6268 Change-Id: Ic512f0f08a761f7f248a0b3960b64e538c9ded10
Port Netlib Lapack 3.12 remaining FORTRAN code to C files. Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6159] Change-Id: I7188b0e04cb756999c89eac7849d07ebaaa4cb7c
Added test code and validation for GECON APIs Signed-off-by: Venkatesha <vprasada@amd.com> AMD-Internal: [CPUPL-6203] Change-Id: Idc9f13805380836351a232d9ad2f8d5feb2c36f8
* Resolving merge conflicts
* replaced GEMM switches with fla_invoke_gemm
* BDSQR: Added conformance tests
* BDSQR: Added Overflow/Underflow tests
* BDSQR: Clang-format
* U and VT are orthogonal, D and E are derived from GEBRD. Validation fixed
* Used build_bidiagonal_matrix function from test_common
* Used build_bidiagonal_matrix function from test_common
* Added BRT support
* Fixed BDSQR BRT Tests + Corrected GEJSV api config file test parameters
* Resolving conflicts
* resolving conflicts
* used fla_mem_alloc in test_gejsv and addressed negative test failing in BDSQR
* Removed trailing whitespaces.
* U and VT from GEBRD/ORGBR + Added C matrix validations.
* Used GETRI for C matrix validation + SVD reconstruction test
* Fixed conformance tests and validation
* Addressed segfault for lapacke_row
* Removed redundant variables.
* Correct matrix dimension allocation.
* Reconstruction U and VT tests
* Fixed zero residual issue
* Fixed compute_matrix_norm call
* Added compute_matrix_inverse to test_common
* Removed reset_matrix calls
* Fixed memory leaks
Address alignment of key x86 optimized functions called for DGELS. This improves the performance of small size problems for dgels.
Also removed warnings in CPP interface for deprecated APIs.
Root cause: The residual becomes NaN/Inf when the norm value is 0, so the test case fails.
Solution: Added a guard that checks norm > safe_min before the residual calculation. Added common eps (E, P) and safe min values to be used in the residual computation of all APIs.
AMD-Internal: CPUPL-7742
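A minimal sketch of such a guard, under stated assumptions: the function name is hypothetical, DBL_EPSILON and DBL_MIN stand in for the eps and safe min values the test suite obtains elsewhere, and the fallback of skipping the normalization when the norm is too small is an assumption, since the commit message does not say what happens in that branch.

```c
#include <assert.h>
#include <float.h>

/* diff_norm: norm of (computed - reference); ref_norm: norm of the reference;
   n: problem size used to scale the residual. */
double scaled_residual(double diff_norm, double ref_norm, int n)
{
    const double eps = DBL_EPSILON;   /* stand-in for the suite's eps value */
    const double safe_min = DBL_MIN;  /* stand-in for the suite's safe min */
    assert(n > 0);
    if (ref_norm > safe_min)
        return diff_norm / (ref_norm * n * eps);
    /* Assumed fallback: skip the normalization so that 0/0 or x/0 cannot
       produce NaN or Inf. */
    return diff_norm / (n * eps);
}
```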
Fixed build error when libflame is built for AVX2 with ENABLE_AOCL_BLAS option. Fix: Used macro BLIS_KERNELS_ZEN4 to run direct BLIS kernels for zen4 and above. Change-Id: I19e4cea12becd9daac7dc85f4ce3c0ebb5d366c9 AMD-Internal: CPUPL-7475
* AOCL-LAPACK: Fix for GELSS validation failure for Rank < N
-> Updated validation formula for rank < n case.
-> Updated fla_invoke_gemm function to have alpha and beta parameters.
AMD-Internal: [CPUPL-7571]
…… (#220)
AOCL-LAPACK: Fix for Reproducibility failures observed on POTRF, POTF2, POTRI, POTRS, GEBRD APIs
-> aocl_fla_init() was not invoked before checking the minimum arch id, added this call for potrf and potri code.
-> Fix for GEBRD reproducibility failures
AMD-Internal: [CPUPL-7493]
Co-authored-by: Reshi Krish <reshikrish.thangajawahar@amd.com>
Updated version string in the so_version file and wherever applicable.
We have removed support for auto-conf tools based build. Hence removing this file.
Co-authored-by: tprnaidu <tprnaidu@amd.com>
* Removed vector definition * formatting + copyright
Optimization in DORG2R to fix regression in DORGQR compared to AOCL-5.2. Optimization detail: Using internal fla_dscal function in place of BLAS dscal_ in DORG2R . Change-Id: I3e1632cbbafa6581773e635a946b7e28c55eaea5 AMD-Internal: CPUPL-7712
…219) AOCL LAPACK Test-suite: Added YAML files for all the available APIs Added new yaml files for all the existing APIs in main test for generating API wise ctests
Separate the multiply operations:
- Each multiply result is rounded once.
- Addition happens only after the two rounded intermediates exist.
This prevents the compiler from doing:
- FMA contraction (fused multiply-add)
- reassociation: evaluating (bb*cs + dd*sn) in a different order
- evaluating in SIMD with different precision
- reordering that increases cancellation error
Change-Id: I2075f13de94e6b4a5775f74a5d92d9607cec2c82
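The commit message does not show the code, so the following is only one possible way to force separately rounded products in C. It assumes volatile temporaries are acceptable; compiler options such as -ffp-contract=off or the standard pragma STDC FP_CONTRACT OFF are alternatives. The function name is hypothetical.

```c
#include <assert.h>

/* Computes bb*cs + dd*sn with each product rounded to double before the
   addition. The volatile stores make each rounded product an observable
   value, so the compiler cannot fuse a multiply with the add. */
double sum_of_products(double bb, double cs, double dd, double sn)
{
    volatile double p1 = bb * cs;   /* first product, rounded once */
    volatile double p2 = dd * sn;   /* second product, rounded once */
    return p1 + p2;                 /* single rounded addition */
}
```

The volatile accesses cost a store and a load per product, which is usually negligible in scalar code like this but would matter inside a hot vectorized loop.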
- Making the code design more consistent by removing changes done in the source to support netlib_tests after wrapper changes and adding required changes within netlib_tests itself.
- Fixed wrong parameter types in wrapper code due to order error in Netlib's documentation
- Fixed a bug in (c|z)geqp3rk, where an uninitialized max2nrm variable was used to determine control path.
AMD-Internal: CPUPL-7996
- Fix for possible resource leaks.
- Fix: 32-bit integer pointers were wrongly cast to 64-bit integer pointers.
AMD-Internal: CPUPL-8018
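To illustrate the class of bug behind the second fix, here is a small sketch with hypothetical function names. Casting a pointer to a 32-bit integer into a pointer to a 64-bit integer makes the callee read 8 bytes from a 4-byte object; widening the value into a temporary avoids that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callee that expects a 64-bit integer argument by pointer. */
int64_t twice(const int64_t *n)
{
    return 2 * *n;
}

int64_t call_with_32bit(const int32_t *n32)
{
    /* Wrong: twice((const int64_t *)n32) would read past the 4-byte object. */
    int64_t widened;
    assert(n32 != 0);
    widened = *n32;          /* value conversion, sign-extended */
    return twice(&widened);
}
```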
Root cause: Values smaller than safe min cause intermediate computations to produce NaNs.
Solution: Obtained safe min (via lamch) and added a check of the value against it before the computation in the avx2 and avx512 kernels of the SGETRF and DGETRF APIs.
AMD-Internal: CPUPL-8032
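The commit does not show the kernel code. A scalar sketch of the kind of check involved, modeled on the pivot scaling step in reference LAPACK's unblocked LU, is given below. The function name is hypothetical and DBL_MIN stands in for the safe min value returned by lamch.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Divides the n entries below a pivot by the pivot value. */
void scale_by_pivot(double *x, int n, double pivot)
{
    const double safe_min = DBL_MIN;   /* stand-in for lamch safe min */
    int i;
    assert(n >= 0);
    if (pivot == 0.0)
        return;                        /* singular column: caller reports it */
    if (fabs(pivot) >= safe_min) {
        const double r = 1.0 / pivot;  /* reciprocal is safe to form */
        for (i = 0; i < n; i++)
            x[i] *= r;
    } else {
        /* 1/pivot could overflow to Inf here, so divide element by element. */
        for (i = 0; i < n; i++)
            x[i] /= pivot;
    }
}
```

With a subnormal pivot such as DBL_MIN/4, the reciprocal overflows to Inf, while the element-wise division still returns 1.0 for an entry equal to the pivot.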
* AOCL LAPACK: Fix sgesdd performance gap
Ported the slarf1f implementation from Netlib and integrated it into sgebrd and sgebd2 to resolve accuracy issues present in the existing slarf. Removed previous workaround changes in sgebrd that were added to mitigate this accuracy problem.
* Added copyright and addressed copilot comments
* Addressed review comments
Fix description: The entire FLAME.h header file was included within an 'extern "C"' block. This also wrapped external header files like 'omp.h'. To fix this, removed the overall extern "C" block and added extern "C" to the specific function prototype declarations that are supposed to be exposed by libFLAME.
AMD-Internal: CPUPL-8062
* AOCL-LAPACK: Context initialization before min arch check. -> Included a call to aocl_fla_init() for initializing the context before any FLA_IS_MIN_ARCH_ID()
Root cause: When output pointers (C, S, or R) pointed to the same memory as input pointers (F or G), the routines wrote to an output and then read the same location as F or G. That read-after-write gave wrong results (e.g. R or S incorrect when c__==f, c__==g, or s==f).
Fix:
- Copy inputs once at entry: f__t = *f, g__t = *g (and for complex, copy the full complex values).
- Use only f__t and g__t (and their addresses where a pointer is needed) for all logic and for writing outputs.
- Add input tracing (AOCL DTL) code.
* Refactored Main CMakeLists.txt
  Created New Module: `cmake/CompilerFlags.cmake`
  A well-organized, modular file (~220 lines) containing:
  - ISA configuration functions (AVX, AVX2, AVX512, auto-detection)
  - Compiler flag setup functions (security, warnings, language standards, debug)
  - Platform-specific configurations (Windows/Unix)
  - Special build mode support (ASAN, GCOV)
  All compiler flags are now in one dedicated, well-documented module
* Resolve conflicts after rebase with target branch
* Fix copyright and redundant status message in ASAN check
* echo -n | git rebase --continue
  Update copyright year to 2026 in root CMakeLists.txt
  GIT_EDITOR=true git cherry-pick --continue
* Remove duplicate ISA validation code from CMakeLists.txt
  The ISA validation and auto-detection logic was duplicated between CMakeLists.txt (lines 49-84) and cmake/CompilerFlags.cmake. This caused the auto_config.py script to run twice when LF_ISA_CONFIG=auto. Removed the duplicate code from CMakeLists.txt as it's now properly handled by the configure_compiler_flags() macro in CompilerFlags.cmake. Addresses code review feedback about redundant ISA processing.
* Address Copilot code review feedback for CompilerFlags.cmake
  Implemented all 5 improvements suggested by GitHub Copilot:
  1. Added error checking for auto_detect_isa_config():
     - Capture stderr with ERROR_VARIABLE
     - Check CMD_RESULT and fail with detailed error message
     - Prevents silent failures when auto-detection script fails
  2. Updated ISA validation error message:
     - Added 'none' to the list of valid values
     - Changed to lowercase format matching actual accepted values
  3. Replaced add_definitions() with add_compile_options():
     - Changed /arch:AVX512 and /arch:AVX2 to use add_compile_options()
     - More appropriate for compiler flags vs preprocessor definitions
  4. Fixed GCOV flag handling:
     - Replaced add_definitions(--coverage) with add_compile_options(--coverage)
     - Added add_link_options(--coverage) for proper linker flag handling
  5. Fixed Clang compiler ID check:
     - Now checks CMAKE_C_COMPILER_ID for C flags
     - Added separate check for CMAKE_CXX_COMPILER_ID for CXX flags
     - Prevents skipping C flags when only CXX compiler is Clang
  All changes improve code robustness, error handling, and CMake best practices.
* Fix variable quoting in CMake if() conditions
  Address Copilot code review feedback:
  - Quote CMD_OUTPUT variable in auto_detect_isa_config() comparisons
  - Quote ISA_CONFIG variable in set_isa_compiler_flags() comparisons
  Unquoted variable expansions in if() conditions can break evaluation when variables are empty or contain unexpected characters. Quoting ensures robust CMake syntax parsing in all cases. This follows CMake best practices for variable expansion in conditionals.
Keeps the algorithm mathematically identical, using the BLAS GEMV instead of SDOT for consistency.
FLA_hetrd.c: small-n fallback to unblocked reduction for accuracy with sytd2
Change-Id: Ib7d34067fff123507401a9112554737229b4f204
Resolved NaN values found in the output of the LDLT factorization API (SSYTRF_ROOK) for specific inputs. The issue was resolved by updating SLASYF_ROOK with the latest version from Netlib.
AMD Internal: CPUPL-7986
- Added aocl_lapack_* files to CMakelists.txt to fix build error on windows. - Wrapper C files are compiled manually on Windows as adding C files to the same target was adding Fortran compiler flags to C compiler flags. AMD-Internal: CPUPL-8145
Updated version string to point to 5.3 release version. Corrected print format. Documentation related files also updated with latest version string.
Synchronize LICENSE, NOTICES, and source files to match the internal AOCL-5.3-RC branch exactly. This fixup accounts for differences accumulated from previous external-only commits. Co-authored-by: Cursor <cursoragent@cursor.com>
Field,
Please review and merge them.
Thanks