[SofaSimpleFem] Simplify bloc-based optimization #2281

alxbilger · 2021-07-30T15:35:28Z

PR based on #2280.
Adds a template specialization of void CompressedRowSparseMatrix<type::Mat<3,3,double> >::add(Index row, Index col, const type::Mat3x3d & _M) in order to accelerate insertion.
This avoids branching in force fields on the type of the system matrix (dynamic_cast). I removed that branching in HexahedronFEMForceField, but it could be removed in other places too. It also automatically optimizes bloc insertion in force fields that did not have the branches.
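To illustrate why the caller no longer needs a dynamic_cast, here is a minimal sketch. It is not SOFA code: ToyBaseMatrix, ToyScalarMatrix, ToyBlocMatrix and assembleToy are hypothetical stand-ins, and the sketch uses an ordinary virtual override where the PR uses a member specialization of CompressedRowSparseMatrix. The point is only that the fast path is chosen inside add, so the force field calls add the same way for every matrix type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-ins for illustration; none of these are SOFA types.
using Mat3 = std::array<std::array<double, 3>, 3>;
using Key = std::pair<std::size_t, std::size_t>;

struct ToyBaseMatrix
{
    virtual ~ToyBaseMatrix() = default;
    virtual void add(std::size_t row, std::size_t col, double v) = 0;
    virtual double get(std::size_t row, std::size_t col) const = 0;

    // Generic fallback: a 3x3 bloc is inserted as 9 scalar additions.
    virtual void add(std::size_t row, std::size_t col, const Mat3& m)
    {
        for (std::size_t i = 0; i < 3; ++i)
            for (std::size_t j = 0; j < 3; ++j)
                add(row + i, col + j, m[i][j]);
    }
};

// Matrix of scalars: inherits the generic bloc insertion unchanged.
struct ToyScalarMatrix : ToyBaseMatrix
{
    using ToyBaseMatrix::add; // keep the bloc overload visible
    std::map<Key, double> entries;

    void add(std::size_t row, std::size_t col, double v) override { entries[{row, col}] += v; }
    double get(std::size_t row, std::size_t col) const override
    {
        auto it = entries.find({row, col});
        return it == entries.end() ? 0.0 : it->second;
    }
};

// Matrix of 3x3 blocs: overrides bloc insertion with a single bloc access.
struct ToyBlocMatrix : ToyBaseMatrix
{
    std::map<Key, Mat3> blocs; // keyed by bloc coordinates

    void add(std::size_t row, std::size_t col, double v) override
    {
        blocs[{row / 3, col / 3}][row % 3][col % 3] += v;
    }
    void add(std::size_t row, std::size_t col, const Mat3& m) override
    {
        if (row % 3 != 0 || col % 3 != 0) // unaligned: use the generic path
        {
            ToyBaseMatrix::add(row, col, m);
            return;
        }
        Mat3& b = blocs[{row / 3, col / 3}]; // one lookup for the whole bloc
        for (std::size_t i = 0; i < 3; ++i)
            for (std::size_t j = 0; j < 3; ++j)
                b[i][j] += m[i][j];
    }
    double get(std::size_t row, std::size_t col) const override
    {
        auto it = blocs.find({row / 3, col / 3});
        return it == blocs.end() ? 0.0 : it->second[row % 3][col % 3];
    }
};

// Force-field-like caller: no dynamic_cast and no branch on the matrix type.
inline void assembleToy(ToyBaseMatrix& K, const Mat3& m)
{
    K.add(0, 0, m);
    K.add(3, 3, m);
}
```

Calling assembleToy with either toy matrix produces the same entries; only the insertion path differs.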

Benchmarks

List of benchmarks

BM_CRS_Fixture<double>/Add3x3Bloc_CRSdouble: insertion of 1000 3x3 blocs into a CRS made of doubles
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3Bloc_CRS3x3d: insertion of 1000 3x3 blocs into a CRS made of 3x3 blocs
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocShortcut_CRS3x3d: insertion of 1000 3x3 blocs into a CRS made of 3x3 blocs, where the insertion uses the fast function specialized for 3x3 CRS matrices. This is the fastest possible bloc insertion. The specialized function introduced by this PR uses it, among other checks, so its speed is the target for the specialized function.
BM_CRS_Fixture<double>/Add3x3BlocScalar_double: insertion of 1000 3x3 blocs into a CRS made of doubles, using 9 individual scalar insertions per bloc
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocScalar_CRS3x3d: insertion of 1000 3x3 blocs into a CRS made of 3x3 blocs, using 9 individual scalar insertions per bloc. This is equivalent to what happens in BaseMatrix's bloc insertion, so it corresponds to the behavior of bloc insertion before this PR.
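The difference between the "Scalar" and the direct bloc variants listed above can be sketched on a toy bloc matrix that counts bloc lookups. This is not the benchmark code (which is not part of this page): CountingBlocMatrix and insertBlocs are hypothetical names, the blocs are assumed to be placed on the diagonal, and the lookup count is only a rough proxy for the cost measured by the real benchmarks.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical toy types for illustration; not SOFA's CRS implementation.
using Bloc3 = std::array<std::array<double, 3>, 3>;

struct CountingBlocMatrix
{
    std::map<std::pair<std::size_t, std::size_t>, Bloc3> blocs;
    std::size_t lookups = 0; // number of bloc searches performed

    // Find (or create) the bloc at bloc coordinates (bi, bj).
    Bloc3& wbloc(std::size_t bi, std::size_t bj)
    {
        ++lookups;
        return blocs[{bi, bj}];
    }

    void addScalar(std::size_t row, std::size_t col, double v)
    {
        wbloc(row / 3, col / 3)[row % 3][col % 3] += v;
    }

    // "Scalar" variant: 9 scalar insertions, hence 9 bloc lookups.
    void addBlocScalar(std::size_t row, std::size_t col, const Bloc3& m)
    {
        for (std::size_t i = 0; i < 3; ++i)
            for (std::size_t j = 0; j < 3; ++j)
                addScalar(row + i, col + j, m[i][j]);
    }

    // Direct variant: one lookup, then the 9 values are accumulated in place.
    void addBlocDirect(std::size_t row, std::size_t col, const Bloc3& m)
    {
        assert(row % 3 == 0 && col % 3 == 0); // only aligned blocs are supported here
        Bloc3& b = wbloc(row / 3, col / 3);
        for (std::size_t i = 0; i < 3; ++i)
            for (std::size_t j = 0; j < 3; ++j)
                b[i][j] += m[i][j];
    }
};

// Insert `count` identical blocs on the diagonal and return the lookup count.
inline std::size_t insertBlocs(CountingBlocMatrix& A, std::size_t count, bool direct)
{
    Bloc3 m{};
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            m[i][j] = 1.0 + 3.0 * static_cast<double>(i) + static_cast<double>(j);

    for (std::size_t k = 0; k < count; ++k)
    {
        const std::size_t r = 3 * k;
        if (direct)
            A.addBlocDirect(r, r, m);
        else
            A.addBlocScalar(r, r, m);
    }
    return A.lookups;
}
```

Both variants fill the matrix with the same values; the scalar variant simply searches for the target bloc 9 times per insertion instead of once.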

Before

-----------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------
BM_CRS_Fixture<double>/Add3x3Bloc_CRSdouble                                 75568 ns        75550 ns         9185
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3Bloc_CRS3x3d              55533 ns        54699 ns        12798
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocShortcut_CRS3x3d      12930 ns        12785 ns        49662
BM_CRS_Fixture<double>/Add3x3BlocScalar_double                              67780 ns        66811 ns        10488
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocScalar_CRS3x3d        51334 ns        50603 ns        13884

After

-----------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------
BM_CRS_Fixture<double>/Add3x3Bloc_CRSdouble                                 76223 ns        76266 ns         9132
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3Bloc_CRS3x3d              13781 ns        13808 ns        51026
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocShortcut_CRS3x3d      12434 ns        12458 ns        56014
BM_CRS_Fixture<double>/Add3x3BlocScalar_double                              66579 ns        66637 ns        10530
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocScalar_CRS3x3d        49684 ns        49713 ns        14274

Conclusion

The benchmarks show that inserting 3x3 blocs into 3x3 bloc-based CRS matrices is faster than before: Add3x3Bloc_CRS3x3d drops from 55533 ns to 13781 ns, roughly 4 times faster. It is now almost as fast as the bloc insertion specialized for 3x3 CRS matrices (benchmark BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocShortcut_CRS3x3d, 12434 ns).
The speed remains the same for CRS matrices made of doubles, which is expected.
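The ratios behind this conclusion can be recomputed from the Time column of the two tables. The snippet below is only a small arithmetic sketch; the function and constant names are made up for illustration, and the values are the wall-clock times reported above.

```cpp
#include <cassert>

// Ratio of two timings; a value above 1 means `afterNs` is the faster one.
inline double speedup(double beforeNs, double afterNs)
{
    assert(afterNs > 0.0);
    return beforeNs / afterNs;
}

// Wall-clock times in ns, copied from the Before and After tables.
constexpr double kBlocBeforeNs    = 55533.0; // Add3x3Bloc_CRS3x3d, before
constexpr double kBlocAfterNs     = 13781.0; // Add3x3Bloc_CRS3x3d, after
constexpr double kShortcutAfterNs = 12434.0; // Add3x3BlocShortcut_CRS3x3d, after

// Speedup of bloc insertion brought by the PR (about 4.03).
inline double blocSpeedup() { return speedup(kBlocBeforeNs, kBlocAfterNs); }

// Remaining overhead relative to the shortcut function (about 1.11).
inline double overheadVersusShortcut() { return speedup(kBlocAfterNs, kShortcutAfterNs); }
```

In other words, the specialized add is about 4 times faster than before and stays within roughly 11% of the shortcut function on this run.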

TODO: explain the benchmarks and push them

By submitting this pull request, I acknowledge that
I have read, understand, and agree to the SOFA Developer Certificate of Origin (DCO).

Reviewers will merge this pull-request only if

it builds with SUCCESS for all platforms on the CI.
it does not generate new warnings.
it does not generate new unit test failures.
it does not generate new scene test failures.
it does not break API compatibility.
it is more than 1 week old (or has fast-merge label).

fredroy · 2021-08-04T09:07:35Z

It needs a rebase on master once #2280 is merged

fredroy · 2021-08-05T08:14:46Z

[ci-build]

fredroy · 2021-08-05T15:59:01Z

[ci-build][with-all-tests]

alxbilger added the pr: enhancement and pr: status to review labels Jul 30, 2021

fredroy added the pr: status wip label and removed the pr: status to review label Aug 4, 2021

alxbilger added 2 commits August 5, 2021 10:02

[SofaBaseLinearSolver] Overload add for bloc-based CRS matrices

4b71349

[SofaSimpleFem] Simplify bloc-based optimization

f48a128

fredroy force-pushed the crs_add branch from c41a1b9 to f48a128 August 5, 2021 08:04

fix namespace

ebc8333

fredroy added the pr: status to review label and removed the pr: status wip label Aug 5, 2021

hugtalbot mentioned this pull request Aug 18, 2021

Clean matrix-add to avoid template specialization in ForceField #2295

Closed

Update FEM_SparseLDLSolver.scn

7180fa5

hugtalbot added the pr: status ready label and removed the pr: status to review label Aug 18, 2021

hugtalbot merged commit 9863d96 into sofa-framework:master Aug 19, 2021

alxbilger mentioned this pull request Aug 25, 2021

[SofaBaseLinearSolver] CRS explicit instantiation #2306

Merged

guparan added this to the v21.12 milestone Oct 1, 2021

alxbilger deleted the crs_add branch June 28, 2022 10:56
