Fix: unify DAV subspace diagonalization to use lapack_hegvx by Cstandardlib · Pull Request #7302 · deepmodeling/abacus-develop

Cstandardlib · 2026-04-30T13:58:38Z

Summary

Unify GPU and CPU serial subspace diagonalization to use container::kernels::lapack_hegvx, computing only the needed nband eigenpairs instead of all nbase eigenvalues
Remove ~30 lines of manual matrix save/restore code in CPU path (now handled internally by lapack_hegvx)
Add lapack_hegvx<T, DEVICE_GPU> GPU specialization for ROCm using CPU fallback, following the existing lapack_heevd pattern

Motivation

The DAV subspace diagonalization had inconsistent eigensolver calls:

GPU: lapack_hegvd (divide-and-conquer, ALL eigenvalues)
CPU: hegvx_op (expert driver, subset eigenvalues)

This caused wasted computation on GPU (computing all eigenvalues when only nband were needed) and inconsistent semantics. Now both paths explicitly request only the first nband eigenpairs.

Changes

File	Change
`rocm/lapack.hip.cu`	+33 lines: add `lapack_hegvx` GPU specialization with CPU fallback
`diago_dav_subspace.cpp`	+8/-45 lines: unify both paths to `lapack_hegvx`, remove save/restore boilerplate

Test Results

MODULE_HSOLVER_dav (double): 3/3 PASSED
MODULE_HSOLVER_dav_float (single): 3/3 PASSED
GPU integration scf_dav: 7/7 PASSED
GPU integration scf_dav_sub: 7/7 PASSED

Previously the GPU path used lapack_hegvd (all eigenvalues) while the CPU serial path used hegvx_op (subset eigenvalues), causing inconsistent behavior and wasted computation on GPU. This commit unifies both paths to use container::kernels::lapack_hegvx, which computes only the needed nband eigenpairs. The CPU serial path also drops ~30 lines of manual save/restore code since lapack_hegvx preserves input matrices internally. A GPU specialization of lapack_hegvx is added for ROCm using CPU fallback (D2H -> CPU solve -> H2D), following the existing pattern in lapack_heevd. Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

Copilot

Pull request overview

This PR unifies DAV subspace diagonalization across CPU and GPU to use container::kernels::lapack_hegvx, so only the required nband eigenpairs are computed (instead of all nbase), and adds an ROCm lapack_hegvx<T, DEVICE_GPU> specialization via a CPU fallback.

Changes:

Switch DAV subspace diagonalization (CPU + GPU) to lapack_hegvx with nband subset selection.
Remove CPU-side manual save/restore boilerplate around hegvx.
Add ROCm lapack_hegvx<T, DEVICE_GPU> specialization that falls back to CPU LAPACK by copying data host<->device.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`source/source_hsolver/diago_dav_subspace.cpp`	Uses `lapack_hegvx` for both CPU and GPU paths; removes manual matrix save/restore in CPU serial path.
`source/source_base/module_container/ATen/kernels/rocm/lapack.hip.cu`	Adds ROCm `lapack_hegvx` GPU specialization implemented via CPU fallback and host/device copies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-30T14:07:01Z

+        std::vector<Real> H_eigen_val(n);
+        std::vector<T> H_eigen_vec(n * lda);
+
+        hipErrcheck(hipMemcpy(H_A.data(), A, sizeof(T) * n * lda, hipMemcpyDeviceToHost));
+        hipErrcheck(hipMemcpy(H_B.data(), B, sizeof(T) * n * lda, hipMemcpyDeviceToHost));
+
+        lapack_hegvx<T, DEVICE_CPU>()(n, lda, H_A.data(), H_B.data(), m, H_eigen_val.data(), H_eigen_vec.data());
+
+        hipErrcheck(hipMemcpy(eigen_val, H_eigen_val.data(), sizeof(Real) * n, hipMemcpyHostToDevice));
+        hipErrcheck(hipMemcpy(eigen_vec, H_eigen_vec.data(), sizeof(T) * n * lda, hipMemcpyHostToDevice));


In this ROCm GPU specialization, the host buffers and memcpy sizes for outputs should match what lapack_hegvx actually produces: eigen_val is length m (see lapack_hegvx<T, DEVICE_CPU> usage in lapack_test.cpp), and eigen_vec is lda * m (Z is n-by-m). Copying n eigenvalues and n*lda eigenvector elements back can write past the caller’s allocated buffers when they size them to m (e.g., the hegvx unit test allocates E with neig entries). Please resize H_eigen_val to m and copy back only m values; likewise allocate/copy H_eigen_vec as lda * m (or at least only memcpy that many elements).

Suggested change

std::vector<Real> H_eigen_val(n);

std::vector<T> H_eigen_vec(n * lda);

hipErrcheck(hipMemcpy(H_A.data(), A, sizeof(T) * n * lda, hipMemcpyDeviceToHost));

hipErrcheck(hipMemcpy(H_B.data(), B, sizeof(T) * n * lda, hipMemcpyDeviceToHost));

lapack_hegvx<T, DEVICE_CPU>()(n, lda, H_A.data(), H_B.data(), m, H_eigen_val.data(), H_eigen_vec.data());

hipErrcheck(hipMemcpy(eigen_val, H_eigen_val.data(), sizeof(Real) * n, hipMemcpyHostToDevice));

hipErrcheck(hipMemcpy(eigen_vec, H_eigen_vec.data(), sizeof(T) * n * lda, hipMemcpyHostToDevice));

std::vector<Real> H_eigen_val(m);

std::vector<T> H_eigen_vec(lda * m);

hipErrcheck(hipMemcpy(H_A.data(), A, sizeof(T) * n * lda, hipMemcpyDeviceToHost));

hipErrcheck(hipMemcpy(H_B.data(), B, sizeof(T) * n * lda, hipMemcpyDeviceToHost));

lapack_hegvx<T, DEVICE_CPU>()(n, lda, H_A.data(), H_B.data(), m, H_eigen_val.data(), H_eigen_vec.data());

hipErrcheck(hipMemcpy(eigen_val, H_eigen_val.data(), sizeof(Real) * m, hipMemcpyHostToDevice));

hipErrcheck(hipMemcpy(eigen_vec, H_eigen_vec.data(), sizeof(T) * lda * m, hipMemcpyHostToDevice));

The CPU specialization of lapack_hegvx used 'L' (lower) uplo while the existing hegvx_op uses 'U' (upper), causing eigenvalue divergence in the pyabacus dav_subspace tests. Additionally, passing nullptr for IWORK and IFAIL during the workspace query can cause crashes with some LAPACK implementations. Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

The GPU lapack_hegvx had a bug: it passed A/hcc directly to cuSOLVER hegvdx which overwrites the upper triangle, corrupting the subspace matrix for subsequent DAV iterations. Fixed by copying A to eigen_vec and backing up B before the call. The GPU path in diago_dav_subspace.cpp is kept on lapack_hegvd because cuSOLVER hegvdx correctly detects a pre-existing numerical issue (B not positive definite at higher orders) that hegvd silently ignores. Switching to hegvx on GPU is left for future work once the underlying subspace conditioning is addressed. Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

Copilot AI review requested due to automatic review settings April 30, 2026 13:58

Copilot started reviewing on behalf of Cstandardlib April 30, 2026 13:59 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Cstandardlib and others added 2 commits April 30, 2026 22:14

Cstandardlib closed this Apr 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: unify DAV subspace diagonalization to use lapack_hegvx#7302

Fix: unify DAV subspace diagonalization to use lapack_hegvx#7302
Cstandardlib wants to merge 3 commits intodeepmodeling:developfrom
Cstandardlib:fix/ds

Cstandardlib commented Apr 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Cstandardlib commented Apr 30, 2026

Summary

Motivation

Changes

Test Results

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants