AOCL 2.2 changes - Majorly include LAPACK 3.9.0 support #36

Open
rsanagap wants to merge 1503 commits into flame:master from amd:master

rsanagap commented Jul 5, 2020

Field,

Please review and merge them.

Thanks

samahmad and others added 30 commits November 28, 2024 01:00
- Overflow/underflow tests for sygvd/hegvd
- Memory leak fixes for sygvd test cases
- Enabling lapacke interfaces for sygvd/hegvd

Change-Id: I79ec9c009e6ba52df17bc6247bb726e60193d5ed
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6037
… cases

Fixed the copy matrix sizes in validate_gesdd/gesvd to avoid out-of-bounds
memory access while testing corner cases:
- for gesdd, jobz = O with m >= n and ldu = 1, or with m < n and ldvt = 1
- for gesvd, jobu/jobvt = O with ldu/ldvt = 1

Fixed gtsv test2 under validate_gtsv(). The residual is scaled down by a
factor of 10 so that it falls within the expected threshold range, because
the input matrix Xact is randomly generated.

AMD-Internal: CPUPL-5926
Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Change-Id: Ia99bb6d81b76de394265ffded0069fb440de979f
details: Datatype alignment changes for the structures used in test suite
Signed-off-by: ksaithar <katteboina.saitharun@amd.com>
Change-Id: I08ce86f5d642189b6f9142c74af41a633415b1f3
Components added:
1. Test run/validation
2. Negative test cases
3. Extreme test cases
4. Overflow/Underflow tests
5. Lapacke test

Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com>
AMD-Internal: CPUPL-5903

Change-Id: I38fa28ac0216740e0669e41509ca7870fd3adab8
1. Move block size computation to a separate function for each of the
   4 types.
2. Optimal block sizes for various input sizes vary as OMP_NUM_THREADS
   is varied. Set optimal block sizes based on input size ranges only
   when OMP_NUM_THREADS=1
3. For small sizes, take the un-optimized path because with the
   optimized path there are regressions due to overhead of openmp
   calls.
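
The selection logic in the three points above might look roughly like the following C sketch. The function names, the size thresholds (64 and 512) and the block sizes are illustrative assumptions only; they are not the values or names used in libflame, and only one of the four per-type functions is shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical marker meaning "take the un-optimized path". */
#define BLOCK_SIZE_UNOPTIMIZED 0

/* Read OMP_NUM_THREADS from the environment.
   Returns 0 when the variable is unset or does not hold a sensible value. */
int read_omp_num_threads(void)
{
    const char *s = getenv("OMP_NUM_THREADS");
    if (s == NULL)
        return 0;
    long v = strtol(s, NULL, 10);
    return (v > 0 && v <= 4096) ? (int)v : 0;
}

/* Pick a block size for a problem of size n. All numbers are placeholders. */
int choose_block_size(int n, int num_threads)
{
    assert(n >= 0 && num_threads >= 0);
    if (n < 64)
        return BLOCK_SIZE_UNOPTIMIZED; /* small sizes: OpenMP call overhead dominates */
    if (num_threads == 1)
        return (n < 512) ? 32 : 64;    /* tuned by input-size range, single thread only */
    return 64;                         /* any other thread count: one default value */
}
```

A caller would typically combine the two, for example `choose_block_size(n, read_omp_num_threads())`, and fall back to the original code path when the result is `BLOCK_SIZE_UNOPTIMIZED`.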

Gains obtained for single-threaded runs: up to 15% on Genoa and 28%
on Turin

Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com>
AMD-Internal: CPUPL-5876

Change-Id: I8fdeccdf0debdacec3913f8192711d86e9d62314
Port Netlib Lapack 3.12 FORTRAN code to C files for double precision APIs

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-5708]
Change-Id: If2ddc85a9ad0818c96155945340b9cea23b40c8e
- Added avx2, avx512 and parallel version for sgetrf

Change-Id: I724cc5c9bf98f42014bcaf680a2fa7373195f10d
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6060
Following features are implemented in this commit:
1. Library path and include path for aocl-utils and blis can be automatically inferred while building libflame if pkg-config files
for these libraries are available. Only works on Linux for now.
2. Various cmake configure/build/install/test/workflow presets.
3. Cmake presets for Windows (msvc and ninja). As of now test presets do not work!
4. Minimum cmake version upgraded to 3.26.0

Preset names follow the convention: <os>-<make/ninja>-<compiler>-<st/mt>-<lp/ilp>-<static/shared>-<isa-mode>-<other optional commands>

Usage:
$ cmake --build --list-presets

-- Without aocl-utils pkgconfig file
$ cmake --preset {chosen-preset} -DLIBAOCLUTILS_INCLUDE_PATH={aoclutils header path} -DLIBAOCLUTILS_LIBRARY_PATH={aoclutils library path}

-- With aocl-utils pkgconfig file
$ cmake --preset {chosen-preset}

$ cmake --build --preset {chosen-preset}

Build and test workflow

-- If aocl-utils and blis pkg-config files are available
$ cmake --workflow --preset {chosen-preset}

More info in BUILD.md

Change-Id: I8bc54b3eabed9a18c305e911df9aa76d8ff746d0
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-5862
Port the double precision Fortran files newly added in Netlib LAPACK 3.12 to C.
Files added: dgedmd.c, dgedmdq.c, dgeqp3rk.c, dlaqp2rk.c, dlaqp3rk.c.
Netlib test for LAPACK 3.12 included.

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-5708]
Change-Id: I60b5c47505162882a19f2086e4842c858e0586e8
a. Added separate invoke functions in CPP for each API
b. Added support for cmake and make
c. Added support for --interface in cmd line
d. Resolved warnings in existing interface header file
e. Resolved errors/warnings in Windows
f. Added ENABLE_CPP_TEST flag in cmake, make files to enable/disable CPP test interface.
g. Updated Readme as per latest changes.
h. Added CPP changes for 25+ test APIs.

Change-Id: I6b77d24e204833134401c69813a3a1672de02c18
Port Netlib Lapack 3.12 FORTRAN code to C files for double precision complex APIs
Note: Retained lapack-3.11 zlaqr5.c, to overcome netlib test failures.

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6150]
Change-Id: I40df2b270d82159cd0bc16a0158951139054b90a
Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com>
AMD-Internal: CPUPL-6155

Change-Id: Icbd84fdfa434875bf3bfd2072b09d8e77c326702
…ex New files

Port the double precision complex Fortran files newly added in Netlib LAPACK 3.12 to C.
Files added: zgedmd.c, zgedmdq.c, zgeqp3rk.c, zlaqp2rk.c, zlaqp3rk.c, zrscl.c.
Updated DTL logging in dgedmd.c, dgedmdq.c, dgeqp3rk.c, dlaqp2rk.c, dlaqp3rk.c.

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6150]
Change-Id: Ia043bdeace2efc61fb25c64e47c2a45bdd8bda9c
Updated test code to display output status as INVALID_PARAM when
1) illegal inputs are passed
2) LAPACK API returns illegal input warning

NOTE: Existing behaviour is to display status = FAIL for these cases.
      Modified FLA_TEST_CHECK_EINFO macro.
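
The status mapping described above can be sketched in C as follows. This is not the FLA_TEST_CHECK_EINFO macro itself; the enum, the function name and the treatment of `info > 0` as a failure are assumptions made for illustration. The only convention relied on is the standard LAPACK one that a negative `info` reports an illegal argument.

```c
#include <assert.h>

/* Hypothetical status values for a test harness. */
typedef enum { TEST_PASS, TEST_FAIL, TEST_INVALID_PARAM } test_status;

/* Map a LAPACK return code and a residual to a test status.
   info < 0 means argument number -info had an illegal value. */
test_status classify_result(int info, double residual, double threshold)
{
    assert(threshold > 0.0);
    if (info < 0)
        return TEST_INVALID_PARAM; /* previously reported as FAIL */
    if (info > 0)
        return TEST_FAIL;          /* assumption: a computational failure counts as FAIL */
    /* A NaN residual compares false here and is therefore reported as FAIL. */
    return (residual < threshold) ? TEST_PASS : TEST_FAIL;
}
```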

Signed-off-by: dnikku <Deepika.Nikku@amd.com>
AMD-Internal: CPUPL-6250
Change-Id: I913fbef39f1f2e58142c21765033b2688de4585a
Port Netlib Lapack 3.12 FORTRAN code to C files for single precision APIs
Note: Retained lapack-3.11 slaqr5.c, to overcome netlib test failures.

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6149]
Change-Id: Id4b96fd45e78246640ca681b5b9236bde09cab52
   1. Optimized the DNRM2 blas api with avx2 and avx512 intrinsics.

AMD Internal : [CPUPL-6122]

Change-Id: I8d8822f5a300997bda3cee15b730489892d938f9
-> Removed unused variables.
-> Initialized variables that were previously uninitialized.

Signed-off-by: Venkatesha <vprasada@amd.com>
Change-Id: If6e4b04a1d23b008812afa920efd226ac923e2c1
…iles

Port the single precision Fortran files newly added in Netlib LAPACK 3.12 to C.
Files added: sgedmd.c, sgedmdq.c, sgeqp3rk.c, slaqp2rk.c, slaqp3rk.c.

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6149]
Change-Id: Idea38525527da5893fa61760f58e62953274708e
Updated residual calculation for least square APIs.

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6065]
Change-Id: I6db26fcd1738aa4cae1d92406823294bbc0e903d
Port Netlib Lapack 3.12 FORTRAN code to C files for single precision complex APIs
Note: Retained lapack-3.11 claqr5.c, to overcome netlib test failures.

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6151]
Change-Id: I2d21d45f7a4e7840ac54620c61d21395bc4ac2cb
1. Removed netlib-test build from presets for windows.
2. flame pkgconfig file now has aocl-utils pc as requirement
   instead of specifying link flags.

Change-Id: I3261b6d7526c866096cc8e2444f8c61ddf98db92
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-5624
…ex New Files

Port the single precision complex Fortran files newly added in Netlib LAPACK 3.12 to C.
Files added: cgedmd.c, cgedmdq.c, cgeqp3rk.c, claqp2rk.c, claqp3rk.c, crscl.c.

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6151]
Change-Id: I3bacb5a82b65d049b6250f565c301baa17b8ff77
Change-Id: I67b992d876fe7de77689b53223f0188df5579799
Signed-off-by: tprnaidu <tprnaidu@amd.com>
AMD-Internal: CPUPL-5910
Fix for windows build warnings noticed after lapack 3.12 porting

Signed-off-by: Venkatesha <vprasada@amd.com>
Change-Id: Ic00400b12ad29327755aad7f02e2fe5b7c5703e7
1) The existing implementation used the SGESVD code path for sizes > 750.
Removed the size checks in FLA_gesdd so that the LAPACK code is used
for all sizes.

2) Removed redundant check function calls (sgesdd_check, dgesdd_check)
in FLA_gesdd.c LAPACK_gesdd_real for S and D precisions.

Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Change-Id: Ia7ac39881d93792c08332f2050c36b99b20e3d24
AMD-Internal: CPUPL-6332
The current SYTRD implementation produced wrong TAU values in certain tests.
Enabled the Netlib LAPACK path for the SSYTRD, SORGTR and SORMTR APIs for "UPLO==L".

AMD Internal : [CPUPL-6337]

Change-Id: I92e49b28ac020e6d28af752b27d56db82424db7a
This commit has the following changes:

1. Added test cases for lange API covering all norm types.

2. Overflow/underflow test cases for lange API are also
   added.

3. For each test, there will be multiple lines of output
   representing result for each norm type tested.
   The output name would be as
   "LANGEX", where X will be one of {M, 1, I, F}.

4. Updated gitignore to ignore build and editor files.

Change-Id: I3096653650e2a932a9fdf3303a31925bbba9cd99
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6260
Also added changes for ROT, GERQF in CPP interface headers.
Note: By default, enabled CPP test flag (ENABLE_CPP_TEST)

AMD-Internal: CPUPL-6268

Change-Id: Ic512f0f08a761f7f248a0b3960b64e538c9ded10
Port Netlib Lapack 3.12 remaining FORTRAN code to C files.

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6159]
Change-Id: I7188b0e04cb756999c89eac7849d07ebaaa4cb7c
Added test code and validation for GECON APIs

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6203]
Change-Id: Idc9f13805380836351a232d9ad2f8d5feb2c36f8
Yuvraj, Kunwar and others added 30 commits January 6, 2026 11:20
* Resolving merge conflicts

* replaced GEMM switches with fla_invoke_gemm

* BDSQR: Added conformance tests

* BDSQR: Added Overflow/Underflow tests

* BDSQR: Clang-format

* U and VT are orthogonal, D and E are derived from GEBRD. Validation fixed

* Used build_bidiagonal_matrix function from test_common

* Added BRT support

* Fixed BDSQR BRT Tests + Corrected GEJSV api config file test parameters

* Resolving conflicts

* resolving conflicts

* used fla_mem_alloc in test_gejsv and addressed negative test failing in BDSQR

* Removed trailing whitespaces.

* U and VT from GEBRD/ORGBR + Added C matrix validations.

* Used GETRI for C matrix validation + SVD reconstruction test

* Fixed conformance tests and validation

* Addressed segfault for lapacke_row

* Removed redundant variables.

* Correct matrix dimension allocation.

* Reconstruction U and VT tests

* Fixed zero residual issue

* Fixed compute_matrix_norm call

* Added compute_matrix_inverse to test_common

* Removed reset_matrix calls

* Fixed memory leaks
Aligned the addresses of key x86-optimized functions called for DGELS.
This improves DGELS performance for small problem sizes.
Also removed warnings in the CPP interface for deprecated APIs.

Root cause: The residual is NaN/Inf when the norm is 0, so the test case fails.

Solution: Added a guard that checks norm > safe_min before the residual is calculated.
Added common eps (E, P) and safe min values to be used in the residual computation of all APIs.
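
A guarded residual of the kind described above could look like this minimal C sketch. The function name, the exact scaling formula and the fallback taken when the norm is too small are assumptions; `DBL_EPSILON` and `DBL_MIN` are used here only as stand-ins for the eps and safe-min values that the test suite obtains from lamch.

```c
#include <assert.h>
#include <float.h>

/* Scaled residual ||diff|| / (||ref|| * n * eps), guarded against a tiny reference norm. */
double scaled_residual(double diff_norm, double ref_norm, int n)
{
    assert(n > 0);
    const double eps = DBL_EPSILON;  /* stand-in for the lamch eps value */
    const double safe_min = DBL_MIN; /* stand-in for the lamch safe minimum */
    if (ref_norm > safe_min)
        return diff_norm / (ref_norm * n * eps);
    /* Reference norm is zero or subnormal: skip the division by it
       (assumed fallback) so that 0/0 cannot produce NaN. */
    return diff_norm / (n * eps);
}
```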

AMD-Internal: CPUPL-7742
Fixed build error when libflame is built for AVX2 with
ENABLE_AOCL_BLAS option.

Fix: Used macro BLIS_KERNELS_ZEN4 to run direct BLIS
kernels for zen4 and above.

Change-Id: I19e4cea12becd9daac7dc85f4ce3c0ebb5d366c9
AMD-Internal: CPUPL-7475
* AOCL-LAPACK: Fix for GELSS validation failure for Rank < N

-> Updated validation formula for rank < n case.
-> Updated fla_invoke_gemm function to have alpha and beta parameters.

AMD-Internal: [CPUPL-7571]
…… (#220)

AOCL-LAPACK: Fix for Reproducibility failures observed on POTRF, POTF2, POTRI, POTRS, GEBRD APIs

-> aocl_fla_init() was not invoked before checking the minimum arch id, added this call for potrf and potri code.
-> Fix for GEBRD reproducibility failures

AMD-Internal: [CPUPL-7493]

Co-authored-by: Reshi Krish <reshikrish.thangajawahar@amd.com>
Updated version string in the so_version file and wherever applicable.
We have removed support for auto-conf tools based build.
Hence removing this file.
Co-authored-by: tprnaidu <tprnaidu@amd.com>
* Removed vector definition

* formatting + copyright
Optimization in DORG2R to fix regression in DORGQR compared to
AOCL-5.2.

Optimization detail: Use the internal fla_dscal function in place of BLAS dscal_ in DORG2R.

Change-Id: I3e1632cbbafa6581773e635a946b7e28c55eaea5
AMD-Internal: CPUPL-7712
…219)

AOCL LAPACK Test-suite: Added YAML files for all the available APIs

Added new yaml files for all the existing APIs in main test
for generating API wise ctests
Separate the multiply operations so that each multiply result is rounded once.
The addition happens only after the two rounded intermediates exist.
This prevents the compiler from doing:
- FMA contraction (fused multiply-add)
- reassociation, that is, evaluating (bb*cs + dd*sn) in a different order
- evaluation in SIMD with different precision
- reordering that increases cancellation error
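
One way to express "round each product first, then add" in C is sketched below. Using `volatile` temporaries is an assumption made for this illustration; the commit may achieve the same effect differently, for example with a compiler option such as `-ffp-contract=off` in GCC and Clang. The function names are hypothetical.

```c
#include <assert.h>

/* Compute bb*cs + dd*sn with each product rounded on its own. */
double sum_of_products(double bb, double cs, double dd, double sn)
{
    /* volatile forces each product to be stored, and therefore rounded to double,
       before the addition, so a multiply cannot be fused into the add. */
    volatile double p1 = bb * cs;
    volatile double p2 = dd * sn;
    return p1 + p2;
}

/* Small exact inputs give an exact result, whichever way the sum is evaluated. */
void sum_of_products_demo(void)
{
    assert(sum_of_products(2.0, 3.0, 4.0, 5.0) == 26.0);
}
```

The benefit shows up for inputs where the fused and unfused results differ in the last bits, which is what makes run-to-run or build-to-build comparisons reproducible.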

Change-Id: I2075f13de94e6b4a5775f74a5d92d9607cec2c82
- Making the code design more consistent by removing changes done in the source to support netlib_tests after wrapper changes and adding required changes within netlib_tests itself.

- Fixed wrong parameter types in wrapper code due to order error in Netlib's documentation

- Fixed a bug in (c|z)geqp3rk, where an uninitialized max2nrm variable was used
to determine control path.

AMD-Internal: CPUPL-7996
- Fix for possible resource leaks.
- Fix: 32-bit integer pointers were wrongly cast to 64-bit integer pointers.
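
The pointer-cast problem mentioned in the second item can be illustrated with a short C sketch. The helper name is hypothetical and this is not the code changed in the commit; it only shows why the values have to be converted element by element rather than reinterpreting the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen n 32-bit integers into a caller-provided 64-bit buffer.
   Writing (int64_t *)src instead would read 8 bytes per element,
   mixing neighbouring values and running past the end of src. */
void widen_i32_to_i64(const int32_t *src, int64_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (int64_t)src[i]; /* value conversion, sign is preserved */
}
```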

AMD-Internal: CPUPL-8018
Root cause: For values smaller in magnitude than safe min, computations on those values produce NaNs.
Solution: Added a safe min (lamch) check on the value before computation in the avx2 and avx512 kernels of the SGETRF and DGETRF APIs.
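
A scalar C sketch of such a safe-min guard follows. It mirrors the pattern used by the reference unblocked LU routine in Netlib LAPACK (scale by the reciprocal only when the pivot is at least safe min in magnitude); whether the AVX2 and AVX512 kernels are structured this way is an assumption. The function name is hypothetical and `DBL_MIN` stands in for the value returned by lamch.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Scale the sub-column below a pivot, as in one step of an unblocked LU. */
void scale_by_pivot(double pivot, double *col, int n)
{
    assert(col != NULL && n >= 0);
    const double safe_min = DBL_MIN; /* stand-in for the lamch safe minimum */
    const double mag = (pivot < 0.0) ? -pivot : pivot;
    if (mag >= safe_min) {
        const double r = 1.0 / pivot; /* reciprocal cannot overflow here */
        for (int i = 0; i < n; i++)
            col[i] *= r;
    } else if (pivot != 0.0) {
        for (int i = 0; i < n; i++)
            col[i] /= pivot;          /* divide directly, no reciprocal is formed */
    }
    /* pivot == 0: leave the column untouched; the caller reports a singular factor. */
}
```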

AMD-Internal: CPUPL-8032
* AOCL LAPACK: Fix sgesdd performance gap

Ported the slarf1f implementation from Netlib and integrated it into sgebrd and
sgebd2 to resolve accuracy issues present in the existing slarf.

Removed previous workaround changes in sgebrd that were added to mitigate this accuracy problem.

* Added copyright and addressed copilot comments

* Addressed review comments
Fix description: The entire FLAME.h header file was included
within an 'extern "C"' block. This also included external header
files like 'omp.h'. To fix, removed the overall extern "C"
block and added extern "C" to the specific function prototype
declarations that are supposed to be exposed by libFLAME.
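
The header layout described in this fix can be sketched as follows. The macro and function names are hypothetical and are not taken from FLAME.h; the point is only that external headers stay outside any `extern "C"` region, since declarations such as C++ templates cannot be given C linkage, while the library's own prototypes are marked individually.

```c
#include <assert.h>
/* External headers (for example omp.h) would be included here,
   outside any extern "C" block. */
#include <stddef.h>

#ifdef __cplusplus
#define EXAMPLE_API extern "C"
#else
#define EXAMPLE_API
#endif

/* Only the library's own prototype is given C linkage. */
EXAMPLE_API double example_sum(const double *x, size_t n);

double example_sum(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}
```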

AMD-Internal: CPUPL-8062
* AOCL-LAPACK: Context initialization before min arch check.

-> Included a call to aocl_fla_init() for initializing the context before any FLA_IS_MIN_ARCH_ID()
Root cause:
When output pointers (C, S, or R) pointed to the same memory as input pointers (F or G), the routines wrote to an output and then read the same location as F or G. That read-after-write gave wrong results (e.g. R or S incorrect when c__==f, c__==g, or s==f).
Fix:
Copy inputs once at entry: f__t = *f, g__t = *g (and for complex, copy the full complex values).
Use only f__t and g__t (and their addresses where a pointer is needed) for all logic and for writing outputs.
Add input tracing (AOCL DTL) code.
* Refactored Main CMakeLists.txt

Created New Module: `cmake/CompilerFlags.cmake`
A well-organized, modular file (~220 lines) containing:
- ISA configuration functions (AVX, AVX2, AVX512, auto-detection)
- Compiler flag setup functions (security, warnings, language standards, debug)
- Platform-specific configurations (Windows/Unix)
- Special build mode support (ASAN, GCOV)
All compiler flags are now in one dedicated, well-documented module

* Resolve conflicts after rebase with target branch

* Fix copyright and redundant status message in ASAN check

* Update copyright year to 2026 in root CMakeLists.txt

* Remove duplicate ISA validation code from CMakeLists.txt

The ISA validation and auto-detection logic was duplicated between
CMakeLists.txt (lines 49-84) and cmake/CompilerFlags.cmake.
This caused the auto_config.py script to run twice when LF_ISA_CONFIG=auto.

Removed the duplicate code from CMakeLists.txt as it's now properly
handled by the configure_compiler_flags() macro in CompilerFlags.cmake.

Addresses code review feedback about redundant ISA processing.

* Address Copilot code review feedback for CompilerFlags.cmake

Implemented all 5 improvements suggested by GitHub Copilot:

1. Added error checking for auto_detect_isa_config():
   - Capture stderr with ERROR_VARIABLE
   - Check CMD_RESULT and fail with detailed error message
   - Prevents silent failures when auto-detection script fails

2. Updated ISA validation error message:
   - Added 'none' to the list of valid values
   - Changed to lowercase format matching actual accepted values

3. Replaced add_definitions() with add_compile_options():
   - Changed /arch:AVX512 and /arch:AVX2 to use add_compile_options()
   - More appropriate for compiler flags vs preprocessor definitions

4. Fixed GCOV flag handling:
   - Replaced add_definitions(--coverage) with add_compile_options(--coverage)
   - Added add_link_options(--coverage) for proper linker flag handling

5. Fixed Clang compiler ID check:
   - Now checks CMAKE_C_COMPILER_ID for C flags
   - Added separate check for CMAKE_CXX_COMPILER_ID for CXX flags
   - Prevents skipping C flags when only CXX compiler is Clang

All changes improve code robustness, error handling, and CMake best practices.

* Fix variable quoting in CMake if() conditions

Address Copilot code review feedback:

- Quote CMD_OUTPUT variable in auto_detect_isa_config() comparisons
- Quote ISA_CONFIG variable in set_isa_compiler_flags() comparisons

Unquoted variable expansions in if() conditions can break evaluation
when variables are empty or contain unexpected characters. Quoting
ensures robust CMake syntax parsing in all cases.

This follows CMake best practices for variable expansion in conditionals.
Keeps the algorithm mathematically identical, using the BLAS GEMV instead of SDOT for consistency.
FLA_hetrd.c: small-n fallback to unblocked reduction for accuracy with sytd2

Change-Id: Ib7d34067fff123507401a9112554737229b4f204
Resolved NaN values in the output of the LDLT factorization API (SSYTRF_ROOK) for specific inputs.
The issue was resolved by updating SLASYF_ROOK to the latest version from Netlib.

AMD Internal: CPUPL-7986
- Added aocl_lapack_* files to CMakeLists.txt to fix a build error on Windows.
- Wrapper C files are compiled manually on Windows because adding C files to the same
   target was adding Fortran compiler flags to the C compiler flags.

AMD-Internal: CPUPL-8145
Updated version string to point to 5.3 release version.
Corrected print format.
Documentation related files also updated with latest version string.
Synchronize LICENSE, NOTICES, and source files to match the
internal AOCL-5.3-RC branch exactly. This fixup accounts for
differences accumulated from previous external-only commits.

Co-authored-by: Cursor <cursoragent@cursor.com>
9 participants