Describe the bug
I ran some verification tests with the provided test cases and reached the following conclusions:
1. For the IrO test case, with nx, ny, nz set to 64, 64, 64 (the default is 54, 54, 40), the run with 64 processes behaves normally.
2. For the PW group's 001_4GaAs test case, with nx, ny, nz set to 54, 54, 40, the results deviate significantly from the reference values: TOTAL-PRESSURE 66.242796 kbar (-9.861349) and #TOTAL ENERGY# -7837.8815489 eV (-979.589643548583). Performance is also abnormal.
3. I verified this on both the x86 and FT platforms, and the conclusions are consistent.
The x86 platform used version 26 + fftw 3.3.10, and the FT platform used version 21 + fftw 3.3.7. I currently suspect the problem is related to MPI process partitioning. Could it be that with nz=40, partitioning the FFT across 64 processes introduces errors in the final result? I noticed that the ABACUS code distributes the FFT across processes by partitioning along nz. How does it handle the case where some processes are not assigned any data?
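To make the suspicion concrete, here is a minimal sketch of what happens if the z-planes are split across ranks with a plain block distribution. This is an assumption for illustration only, not the actual ABACUS partitioning code, and `split_planes` is a hypothetical helper.

```python
def split_planes(nz, nproc):
    """Hypothetical block distribution: spread nz z-planes over nproc ranks."""
    base, extra = divmod(nz, nproc)
    # The first `extra` ranks each take one additional plane.
    return [base + (1 if rank < extra else 0) for rank in range(nproc)]


for nz in (40, 64):
    counts = split_planes(nz, 64)
    idle = counts.count(0)
    print(f"nz={nz}: {idle} of 64 ranks hold no z-planes")
```

Under this assumed scheme, nz=40 leaves 24 of the 64 ranks with no planes at all, while nz=64 gives every rank exactly one plane, which matches the pattern of which runs behave normally.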
Expected behavior
No response
To Reproduce
ABACUS-天河复现和Intel对比.zip
1. Unzip the example above and enter its directory.
2. Run the ABACUS calculation with 64 processes.
3. With grid dimensions 54, 54, 40, the performance anomaly occurs.
4. Performance returns to normal when the grid is changed to 64, 64, 64 or the number of processes is reduced to fewer than 40.
Environment
No response
Additional Context
No response
Task list for Issue attackers (only for developers)