
Fix: unify DAV subspace diagonalization to use lapack_hegvx #7302

Closed
Cstandardlib wants to merge 3 commits into deepmodeling:develop from Cstandardlib:fix/ds

Conversation

@Cstandardlib
Collaborator

Summary

  • Unify GPU and CPU serial subspace diagonalization to use container::kernels::lapack_hegvx, computing only the needed nband eigenpairs instead of all nbase eigenvalues
  • Remove ~30 lines of manual matrix save/restore code in CPU path (now handled internally by lapack_hegvx)
  • Add lapack_hegvx<T, DEVICE_GPU> GPU specialization for ROCm using CPU fallback, following the existing lapack_heevd pattern

Motivation

The DAV subspace diagonalization had inconsistent eigensolver calls:

  • GPU: lapack_hegvd (divide-and-conquer, ALL eigenvalues)
  • CPU: hegvx_op (expert driver, subset eigenvalues)

This caused wasted computation on GPU (computing all eigenvalues when only nband were needed) and inconsistent semantics. Now both paths explicitly request only the first nband eigenpairs.
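To make the subset-vs-all distinction concrete, here is a minimal numpy sketch (an illustration only, not the ABACUS code) of solving the generalized eigenproblem H c = ε S c and keeping just the lowest nband eigenpairs, which is the hegvx-style semantics this PR adopts; hegvd-style drivers would compute all nbase pairs. The names nbase/nband mirror the PR, and the Cholesky reduction is one standard way the generalized problem is reduced to a standard one.

```python
# Illustration: subset generalized eigensolve (hegvx-style semantics).
import numpy as np

rng = np.random.default_rng(0)
nbase, nband = 8, 3  # subspace size vs. requested bands (assumed toy values)

# Hermitian H and Hermitian positive-definite S (overlap) matrices.
A = rng.standard_normal((nbase, nbase))
H = (A + A.T) / 2
B = rng.standard_normal((nbase, nbase))
S = B @ B.T + nbase * np.eye(nbase)

# Reduce H c = eps S c to a standard problem via S = L L^T.
L = np.linalg.cholesky(S)
Linv = np.linalg.inv(L)
H_std = Linv @ H @ Linv.T

eps, y = np.linalg.eigh(H_std)      # eigenvalues in ascending order
eps, y = eps[:nband], y[:, :nband]  # keep only the lowest nband pairs
c = Linv.T @ y                      # back-transform the eigenvectors

# Each kept pair satisfies H c_i = eps_i * S c_i.
for i in range(nband):
    assert np.allclose(H @ c[:, i], eps[i] * (S @ c[:, i]), atol=1e-8)
```

An expert driver like LAPACK's hegvx does this subset selection internally (via the IL/IU range arguments), avoiding the cost of the full spectrum when nband << nbase.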

Changes

| File | Change |
| --- | --- |
| rocm/lapack.hip.cu | +33 lines: add lapack_hegvx GPU specialization with CPU fallback |
| diago_dav_subspace.cpp | +8/−45 lines: unify both paths to lapack_hegvx, remove save/restore boilerplate |

Test Results

  • MODULE_HSOLVER_dav (double): 3/3 PASSED
  • MODULE_HSOLVER_dav_float (single): 3/3 PASSED
  • GPU integration scf_dav: 7/7 PASSED
  • GPU integration scf_dav_sub: 7/7 PASSED

Previously the GPU path used lapack_hegvd (all eigenvalues) while the
CPU serial path used hegvx_op (subset eigenvalues), causing
inconsistent behavior and wasted computation on GPU.

This commit unifies both paths to use container::kernels::lapack_hegvx,
which computes only the needed nband eigenpairs. The CPU serial path
also drops ~30 lines of manual save/restore code since lapack_hegvx
preserves input matrices internally.

A GPU specialization of lapack_hegvx is added for ROCm using CPU
fallback (D2H -> CPU solve -> H2D), following the existing pattern
in lapack_heevd.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
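The "lapack_hegvx preserves input matrices internally" contract is what makes the ~30 lines of caller-side save/restore removable. A hedged numpy sketch of that pattern (the real lapack_hegvx is C++ over LAPACK; hegvx_like here is a hypothetical stand-in): the wrapper copies H and S before handing them to a potentially destructive solve, so the caller's matrices come back untouched.

```python
# Sketch: a solver wrapper that absorbs overwrites via internal copies,
# so callers need no save/restore code around it.
import numpy as np

def hegvx_like(H, S, nband):
    """Return the lowest `nband` eigenpairs of H c = eps S c,
    leaving H and S untouched (internal working copies absorb overwrites)."""
    Hw, Sw = H.copy(), S.copy()    # internal copies; caller data is safe
    L = np.linalg.cholesky(Sw)     # would raise if S is not positive definite
    Linv = np.linalg.inv(L)
    eps, y = np.linalg.eigh(Linv @ Hw @ Linv.T)
    return eps[:nband], Linv.T @ y[:, :nband]

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n)); H = (A + A.T) / 2
B = rng.standard_normal((n, n)); S = B @ B.T + n * np.eye(n)
H0, S0 = H.copy(), S.copy()

eps, c = hegvx_like(H, S, nband=2)

# Caller-side matrices are bit-identical afterwards: no save/restore needed.
assert np.array_equal(H, H0) and np.array_equal(S, S0)
```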
Copilot AI review requested due to automatic review settings April 30, 2026 13:58

Copilot AI left a comment


Pull request overview

This PR unifies DAV subspace diagonalization across CPU and GPU to use container::kernels::lapack_hegvx, so only the required nband eigenpairs are computed (instead of all nbase), and adds a ROCm lapack_hegvx<T, DEVICE_GPU> specialization via a CPU fallback.

Changes:

  • Switch DAV subspace diagonalization (CPU + GPU) to lapack_hegvx with nband subset selection.
  • Remove CPU-side manual save/restore boilerplate around hegvx.
  • Add ROCm lapack_hegvx<T, DEVICE_GPU> specialization that falls back to CPU LAPACK by copying data host<->device.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| source/source_hsolver/diago_dav_subspace.cpp | Uses lapack_hegvx for both CPU and GPU paths; removes manual matrix save/restore in the CPU serial path. |
| source/source_base/module_container/ATen/kernels/rocm/lapack.hip.cu | Adds ROCm lapack_hegvx GPU specialization implemented via CPU fallback and host/device copies. |


Comment on lines +151 to +160
```cpp
std::vector<Real> H_eigen_val(n);
std::vector<T> H_eigen_vec(n * lda);

hipErrcheck(hipMemcpy(H_A.data(), A, sizeof(T) * n * lda, hipMemcpyDeviceToHost));
hipErrcheck(hipMemcpy(H_B.data(), B, sizeof(T) * n * lda, hipMemcpyDeviceToHost));

lapack_hegvx<T, DEVICE_CPU>()(n, lda, H_A.data(), H_B.data(), m, H_eigen_val.data(), H_eigen_vec.data());

hipErrcheck(hipMemcpy(eigen_val, H_eigen_val.data(), sizeof(Real) * n, hipMemcpyHostToDevice));
hipErrcheck(hipMemcpy(eigen_vec, H_eigen_vec.data(), sizeof(T) * n * lda, hipMemcpyHostToDevice));
```

Copilot AI Apr 30, 2026


In this ROCm GPU specialization, the host buffers and memcpy sizes for outputs should match what lapack_hegvx actually produces: eigen_val is length m (see lapack_hegvx<T, DEVICE_CPU> usage in lapack_test.cpp), and eigen_vec is lda * m (Z is n-by-m). Copying n eigenvalues and n*lda eigenvector elements back can write past the caller’s allocated buffers when they size them to m (e.g., the hegvx unit test allocates E with neig entries). Please resize H_eigen_val to m and copy back only m values; likewise allocate/copy H_eigen_vec as lda * m (or at least only memcpy that many elements).

Suggested change:

```diff
-std::vector<Real> H_eigen_val(n);
-std::vector<T> H_eigen_vec(n * lda);
+std::vector<Real> H_eigen_val(m);
+std::vector<T> H_eigen_vec(lda * m);
 hipErrcheck(hipMemcpy(H_A.data(), A, sizeof(T) * n * lda, hipMemcpyDeviceToHost));
 hipErrcheck(hipMemcpy(H_B.data(), B, sizeof(T) * n * lda, hipMemcpyDeviceToHost));
 lapack_hegvx<T, DEVICE_CPU>()(n, lda, H_A.data(), H_B.data(), m, H_eigen_val.data(), H_eigen_vec.data());
-hipErrcheck(hipMemcpy(eigen_val, H_eigen_val.data(), sizeof(Real) * n, hipMemcpyHostToDevice));
-hipErrcheck(hipMemcpy(eigen_vec, H_eigen_vec.data(), sizeof(T) * n * lda, hipMemcpyHostToDevice));
+hipErrcheck(hipMemcpy(eigen_val, H_eigen_val.data(), sizeof(Real) * m, hipMemcpyHostToDevice));
+hipErrcheck(hipMemcpy(eigen_vec, H_eigen_vec.data(), sizeof(T) * lda * m, hipMemcpyHostToDevice));
```

Cstandardlib and others added 2 commits April 30, 2026 22:14
The CPU specialization of lapack_hegvx used 'L' (lower) uplo while
the existing hegvx_op uses 'U' (upper), causing eigenvalue divergence
in the pyabacus dav_subspace tests. Additionally, passing nullptr
for IWORK and IFAIL during the workspace query can cause crashes with
some LAPACK implementations.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The GPU lapack_hegvx had a bug: it passed A/hcc directly to
cuSOLVER hegvdx which overwrites the upper triangle, corrupting
the subspace matrix for subsequent DAV iterations. Fixed by
copying A to eigen_vec and backing up B before the call.

The GPU path in diago_dav_subspace.cpp is kept on lapack_hegvd
because cuSOLVER hegvdx correctly detects a pre-existing numerical
issue (B not positive definite at higher orders) that hegvd
silently ignores. Switching to hegvx on GPU is left for future
work once the underlying subspace conditioning is addressed.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
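The positive-definiteness issue the commit mentions can be checked outside cuSOLVER as well. A hedged numpy sketch (toy matrices, not ABACUS data): attempting a Cholesky factorization fails exactly when the overlap matrix is not positive definite, which is the condition hegvdx flags via its error code while hegvd may silently ignore it.

```python
# Detect a non-positive-definite overlap matrix via Cholesky factorization.
import numpy as np

def is_positive_definite(S):
    """True iff Hermitian S admits a Cholesky factorization (S = L L^H)."""
    try:
        np.linalg.cholesky(S)
        return True
    except np.linalg.LinAlgError:
        return False

good = np.array([[2.0, 0.5], [0.5, 1.0]])  # SPD overlap matrix
bad = np.array([[1.0, 2.0], [2.0, 1.0]])   # indefinite (eigenvalues 3, -1)

assert is_positive_definite(good)
assert not is_positive_definite(bad)
```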
