Poor performance of mGGA functional in ABACUS for spin-polarized SCF

### Details

I have tested SCF (and relax) performance of mGGA functionals on two simple systems:
1. diamond C
2. fcc Fe

mGGA functionals used:
1. SCAN
2. r2SCAN, r4SCAN, and rppSCAN from libxc, i.e. `dft_functional MGGA_X_R2SCAN+MGGA_C_R2SCAN`, `MGGA_X_R4SCAN+MGGA_C_R2SCAN`, or `MGGA_X_RPPSCAN+MGGA_C_RPPSCAN`
3. revTPSS from libxc, i.e. `dft_functional MGGA_X_RTPSS+MGGA_C_REVTPSS`
4. M06L and MN15L from libxc, i.e. `dft_functional MGGA_X_REVM06_L+MGGA_C_REVM06_L` and `MGGA_X_MN15_L+MGGA_C_MN15_L`

Example INPUT file:
```
INPUT_PARAMETERS RUNNING ABACUS-DFT

#Parameters (1.General)
suffix                  C  # suffix of OUTPUT DIR
nspin                   2   #  1/2/4 4 for SOC
symmetry                0   #  0/1  1 for open, default
esolver_type            ksdft  # ksdft, ofdft, sdft, tddft, lj, dp
dft_functional          MGGA_X_RPPSCAN+MGGA_C_RPPSCAN
ks_solver             genelpa  # default for ksdft-lcao
vdw_method              none  # none, d3, d3_bj
pseudo_dir              /lustre/home/2201110432/example/abacus/PP
orbital_dir             /lustre/home/2201110432/example/abacus/ORB

#Parameters (2.Iteration)
calculation             relax # scf relax cell-relax md
ecutwfc                 100
scf_thr                 1e-7
scf_nmax                300
relax_nmax              300
relax_method            bfgs  # cg, bfgs, cg_bfgs, sd, "fire"
force_thr_ev            0.01  # ev

#Parameters (3.Basis)
basis_type              lcao  # lcao or pw

#Parameters (4.Smearing)
smearing_method         gau    # mp/gaussian/fixed
smearing_sigma          0.001  # Rydberg

#Parameters (5.Mixing)
mixing_type             broyden  # pulay/broyden

#Parameters (6.Calculation)
cal_force          1
cal_stress         0
out_stru           1  # print STRU in OUT
out_chg            0  # print CHG or not
out_bandgap        1
out_mul            1  # print Mulliken charge and mag of atom in mulliken.txt
```
The KPT mesh is `9 9 9`.
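To repeat the same run for every libxc combination listed above, the `dft_functional` line can be swapped programmatically. The sketch below is only an illustration: the helper name `with_functional` and the trimmed template are assumptions made here, not part of ABACUS. Only the `dft_functional` keyword and the functional strings come from this report.

```python
import re

# libxc combinations quoted in this report. Plain SCAN is left out because
# its exact dft_functional string is not given above.
FUNCTIONALS = [
    "MGGA_X_R2SCAN+MGGA_C_R2SCAN",
    "MGGA_X_R4SCAN+MGGA_C_R2SCAN",
    "MGGA_X_RPPSCAN+MGGA_C_RPPSCAN",
    "MGGA_X_RTPSS+MGGA_C_REVTPSS",
    "MGGA_X_REVM06_L+MGGA_C_REVM06_L",
    "MGGA_X_MN15_L+MGGA_C_MN15_L",
]

# Trimmed stand-in for the full INPUT file shown above (assumption: the real
# file would be read from disk instead).
TEMPLATE = """INPUT_PARAMETERS
suffix                  C
nspin                   2
dft_functional          MGGA_X_RPPSCAN+MGGA_C_RPPSCAN
basis_type              lcao
"""


def with_functional(input_text: str, functional: str) -> str:
    """Return input_text with the value on the dft_functional line replaced."""
    new_text, count = re.subn(
        r"^(dft_functional[ \t]+)\S+",
        lambda m: m.group(1) + functional,
        input_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("expected exactly one dft_functional line")
    return new_text


# One INPUT text per functional, keyed by the functional string.
variants = {f: with_functional(TEMPLATE, f) for f in FUNCTIONALS}
```

Each entry of `variants` can then be written to its own run directory as `INPUT`, next to the unchanged STRU and KPT files.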

Input file packages:
[C_diamond.tar.gz](https://github.com/deepmodeling/abacus-develop/files/13455122/C_diamond.tar.gz)
[Fe_bcc.tar.gz](https://github.com/deepmodeling/abacus-develop/files/13455125/Fe_bcc.tar.gz)


Results for C:
1. All SCAN-family and TPSS functionals give an AMAG of 0.4 for diamond C, while TMAG does go to 0.
2. More than 100-200 SCF steps are required to converge to `scf_thr 1e-7`:
     1. SCAN takes 256 SCF steps, but EDIFF is only of order 5e-4
     2. r2SCAN takes 237 SCF steps, but EDIFF is only of order 1e-3
     3. r4SCAN takes 300 SCF steps and only converges to `scf_thr 1e-6`, while EDIFF is of order 5e-4
     4. rppSCAN takes 252 SCF steps, but EDIFF is only of order 1e-3
     5. TPSS takes 137 SCF steps, but EDIFF is only of order 5e-4
     6. M06L and MN15L do not converge within 300 steps
3. Even in the converged cases, EDIFF only reaches the order of 1e-4.
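For reading the AMAG and TMAG values above: the total magnetization is conventionally the integral of the spin density m(r) = rho_up(r) - rho_down(r), while the absolute magnetization integrates |m(r)|. A nonzero AMAG with TMAG near zero therefore means that regions of positive and negative spin density cancel. A toy sketch, with made-up grid values (not ABACUS output) and hypothetical function names:

```python
def total_magnetization(rho_up, rho_down, dv):
    """Integrate m = rho_up - rho_down over a uniform grid with cell volume dv."""
    return sum(u - d for u, d in zip(rho_up, rho_down)) * dv


def absolute_magnetization(rho_up, rho_down, dv):
    """Integrate |rho_up - rho_down| over the same grid."""
    return sum(abs(u - d) for u, d in zip(rho_up, rho_down)) * dv


# Made-up spin densities on four grid points: the local moment alternates in
# sign, so it cancels in the total but not in the absolute magnetization.
rho_up = [0.6, 0.4, 0.6, 0.4]
rho_down = [0.4, 0.6, 0.4, 0.6]
dv = 1.0

tmag = total_magnetization(rho_up, rho_down, dv)
amag = absolute_magnetization(rho_up, rho_down, dv)
print(f"TMAG = {tmag:.3f}, AMAG = {amag:.3f}")
```

Because the integrand of AMAG is non-negative, AMAG can never be smaller than |TMAG|, which gives a quick consistency check on reported pairs.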

Results for Fe:
1. SCAN takes 417 SCF steps to reach `scf_thr 1e-6`, but EDIFF is only of order 5e-4. The TMAG and AMAG are 7.04 and 7.10, respectively.
2. r2SCAN takes 147 SCF steps to reach `scf_thr 1e-6`, but EDIFF is only of order 5e-3. The TMAG and AMAG are 7.06 and 7.18, respectively.
3. r4SCAN takes 83 SCF steps to reach `scf_thr 1e-6`, but EDIFF is only of order 2e-2. The TMAG and AMAG are 7.06 and 7.18, respectively.
4. rppSCAN takes 246 SCF steps to reach `scf_thr 1e-6`, but EDIFF is only of order 4e-4. The TMAG and AMAG are 7.06 and 7.16, respectively.
5. TPSS takes 176 SCF steps to reach `scf_thr 1e-6` and 242 SCF steps to reach `scf_thr 1e-7`, but EDIFF is of order 4e-4 for both values of `scf_thr`. The AMAG and TMAG are 4.30 and 4.52, respectively. For comparison, the TMAG and AMAG from PBE are 4.33 and 4.43, respectively.
6. M06L and MN15L do not converge even to a `drho` of 1e-4, and the magnetic moments do not converge to reasonable values.
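The pattern reported above, where `drho` falls below `scf_thr` while EDIFF stays large, can be flagged by testing both quantities on the final SCF steps. A minimal sketch, assuming the per-step `drho` and total-energy values have already been collected into lists. The function name, the `ediff_thr` argument, and the histories are all hypothetical and not taken from ABACUS or its logs.

```python
def convergence_report(drho_history, energy_history, scf_thr, ediff_thr):
    """Check the last SCF step against a density and an energy criterion.

    ediff_thr is an illustrative threshold for this sketch, not an ABACUS keyword.
    """
    if not drho_history or len(energy_history) < 2:
        raise ValueError("need at least one drho and two energies")
    last_drho = drho_history[-1]
    # Energy change between the last two SCF steps.
    last_ediff = abs(energy_history[-1] - energy_history[-2])
    return {
        "drho_ok": last_drho < scf_thr,
        "ediff_ok": last_ediff < ediff_thr,
        "last_drho": last_drho,
        "last_ediff": last_ediff,
    }


# Made-up histories that mimic the reported behaviour: the density criterion
# is met while the energy still changes by about 5e-4 per step.
drho = [1e-3, 1e-5, 8e-8]
energies = [-100.0020, -100.0012, -100.0007]

report = convergence_report(drho, energies, scf_thr=1e-7, ediff_thr=1e-6)
print(report)
```

A run that passes `drho_ok` but fails `ediff_ok` matches what is described for the SCAN-family and TPSS calculations here.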

It is clear that the SCAN functionals perform better than the other functionals, but is it normal to need such a large number of SCF steps and still end up with such poorly converged EDIFF? Also, is the TMAG for diamond C a proper result?

System information:
- ABACUS: 3.4.3 commit f6db91c3d
- LibXC 6.2.2
- Intel-OneAPI MKL:2023.0
- ELPA 2023.05.001


### Task list for Issue attackers (only for developers)

- [ ] Reproduce the performance issue on a similar system or environment.
- [ ] Identify the specific section of the code causing the performance issue.
- [ ] Investigate the issue and determine the root cause.
- [ ] Research best practices and potential solutions for the identified performance issue.
- [ ] Implement the chosen solution to address the performance issue.
- [ ] Test the implemented solution to ensure it improves performance without introducing new issues.
- [ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
- [ ] Review and incorporate any relevant feedback from users or developers.
- [ ] Merge the improved solution into the main codebase and notify the issue reporter.
