Skip to content

Conversation

@alxbilger
Copy link
Contributor

@alxbilger alxbilger commented Jul 30, 2021

PR based on #2280.
Template specialization for void CompressedRowSparseMatrix<type::Mat<3,3,double> >::add(Index row, Index col, const type::Mat3x3d & _M) in order to accelerate insertion.
This allows to avoid branching in force fields, based on the type of the system matrix (dynamic_cast). I removed it in HexahedronFEMForceField, but it could be removed in other places. It allows also to automatically optimize bloc insertion in force fields that did not have the branches.

Benchmarks

List of benchmarks

  • BM_CRS_Fixture<double>/Add3x3Bloc_CRSdouble: insertion of 1000 3x3 blocs into a CRS made of double
  • BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3Bloc_CRS3x3d: insertion of 1000 3x3 blocs into a CRS made of 3x3 blocs
  • BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocShortcut_CRS3x3d: insertion of 1000 3x3 blocs into a CRS made of 3x3 blocs, but insertion uses the fast function specialized for 3x3 CRS matrices. This is the fastest possible bloc insertion. It is actually used in the specialized function introduced by this PR, among other checks. Therefore, this speed is the goal to achieve for the specialized function.
  • BM_CRS_Fixture<double>/Add3x3BlocScalar_double: insertion of 1000 3x3 blocs into a CRS made of double using 9 individual scalar insertion
  • BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocScalar_CRS3x3d : insertion of 1000 3x3 blocs into a CRS made of 3x3 blocs, using 9 individual scalar insertion. This is equivalent to what happens in BaseMatrix' bloc insertion, therefore it corresponds to the previous behavior of bloc insertion (before this PR).

Before

-----------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------
BM_CRS_Fixture<double>/Add3x3Bloc_CRSdouble                                 75568 ns        75550 ns         9185
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3Bloc_CRS3x3d              55533 ns        54699 ns        12798
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocShortcut_CRS3x3d      12930 ns        12785 ns        49662
BM_CRS_Fixture<double>/Add3x3BlocScalar_double                              67780 ns        66811 ns        10488
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocScalar_CRS3x3d        51334 ns        50603 ns        13884

After

-----------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------
BM_CRS_Fixture<double>/Add3x3Bloc_CRSdouble                                 76223 ns        76266 ns         9132
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3Bloc_CRS3x3d              13781 ns        13808 ns        51026
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocShortcut_CRS3x3d      12434 ns        12458 ns        56014
BM_CRS_Fixture<double>/Add3x3BlocScalar_double                              66579 ns        66637 ns        10530
BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocScalar_CRS3x3d        49684 ns        49713 ns        14274

Conclusion

The benchmarks show that insertion of 3x3 blocs is faster in 3x3 bloc-based CRS matrices than before (the test Add3x3Bloc_CRS3x3d). It goes almost at the same speed than the bloc insertion specialized for 3x3 CRS matrices (benchmark BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocShortcut_CRS3x3d).
The speed remains the same for CRS made doubles, which is expected.

TODO: explain the benchmarks and push them


By submitting this pull request, I acknowledge that
I have read, understand, and agree SOFA Developer Certificate of Origin (DCO).


Reviewers will merge this pull-request only if

  • it builds with SUCCESS for all platforms on the CI.
  • it does not generate new warnings.
  • it does not generate new unit test failures.
  • it does not generate new scene test failures.
  • it does not break API compatibility.
  • it is more than 1 week old (or has fast-merge label).

@alxbilger alxbilger added pr: enhancement About a possible enhancement pr: status to review To notify reviewers to review this pull-request labels Jul 30, 2021
@fredroy
Copy link
Contributor

fredroy commented Aug 4, 2021

It needs a rebase on master once #2280 is merged

@fredroy fredroy added pr: status wip Development in the pull-request is still in progress and removed pr: status to review To notify reviewers to review this pull-request labels Aug 4, 2021
@fredroy
Copy link
Contributor

fredroy commented Aug 5, 2021

[ci-build]

@fredroy fredroy added pr: status to review To notify reviewers to review this pull-request and removed pr: status wip Development in the pull-request is still in progress labels Aug 5, 2021
@fredroy
Copy link
Contributor

fredroy commented Aug 5, 2021

[ci-build][with-all-tests]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr: enhancement About a possible enhancement pr: status ready Approved a pull-request, ready to be squashed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants