[SofaSimpleFem] Simplify bloc-based optimization #2281
Merged
PR based on #2280.
Template specialization for
void CompressedRowSparseMatrix<type::Mat<3,3,double> >::add(Index row, Index col, const type::Mat3x3d & _M)
in order to accelerate insertion.
This avoids branching in force fields based on the type of the system matrix (dynamic_cast). I removed the branch in HexahedronFEMForceField, but it could be removed in other places. It also automatically optimizes bloc insertion in force fields that did not have the branches.
Benchmarks
List of benchmarks
- BM_CRS_Fixture<double>/Add3x3Bloc_CRSdouble: insertion of 1000 3x3 blocs into a CRS made of doubles.
- BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3Bloc_CRS3x3d: insertion of 1000 3x3 blocs into a CRS made of 3x3 blocs.
- BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocShortcut_CRS3x3d: insertion of 1000 3x3 blocs into a CRS made of 3x3 blocs, where insertion uses the fast function specialized for 3x3 CRS matrices. This is the fastest possible bloc insertion. The specialized function introduced by this PR calls it, among other checks, so this speed is the target for the specialized function.
- BM_CRS_Fixture<double>/Add3x3BlocScalar_double: insertion of 1000 3x3 blocs into a CRS made of doubles, using 9 individual scalar insertions.
- BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocScalar_CRS3x3d: insertion of 1000 3x3 blocs into a CRS made of 3x3 blocs, using 9 individual scalar insertions. This is equivalent to what happens in BaseMatrix's bloc insertion, so it corresponds to the behavior of bloc insertion before this PR.
Before
After
Conclusion
The benchmarks show that insertion of 3x3 blocs into 3x3 bloc-based CRS matrices is faster than before (benchmark Add3x3Bloc_CRS3x3d). It runs at almost the same speed as the bloc insertion specialized for 3x3 CRS matrices (benchmark BM_CRS_Fixture<sofa::type::Mat<3,3,double>>/Add3x3BlocShortcut_CRS3x3d).
The speed remains the same for CRS matrices made of doubles, which is expected.
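To illustrate why a member-function specialization can replace a dynamic_cast branch in the force fields, here is a minimal, self-contained sketch. It is not SOFA code: ToySparseMatrix, Mat3, addScalar, lookups and compareLookups are hypothetical names, the storage is a std::map rather than a real compressed row layout, and the alignment test only stands in for the "other checks" mentioned above. The caller always writes add(row, col, m); the compiler picks the 9-scalar path or the single-bloc path from the matrix type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for a 3x3 matrix of doubles (not sofa::type::Mat).
using Mat3 = std::array<std::array<double, 3>, 3>;

// Toy sparse matrix, NOT SOFA's CompressedRowSparseMatrix.
// 'lookups' counts how many times an entry had to be located.
template <class Block>
class ToySparseMatrix
{
public:
    using Index = std::size_t;

    // Generic path: a 3x3 bloc is inserted as 9 scalar insertions.
    void add(Index row, Index col, const Mat3& m)
    {
        for (Index i = 0; i < 3; ++i)
            for (Index j = 0; j < 3; ++j)
                addScalar(row + i, col + j, m[i][j]);
    }

    // Defined only through the explicit specializations below.
    void addScalar(Index row, Index col, double v);

    std::size_t lookups = 0;
    std::map<std::pair<Index, Index>, Block> entries;
};

// Scalar storage: one entry per (row, col).
template <>
inline void ToySparseMatrix<double>::addScalar(Index row, Index col, double v)
{
    ++lookups;
    entries[{row, col}] += v;
}

// Bloc storage: locate the bloc, then update one coefficient inside it.
template <>
inline void ToySparseMatrix<Mat3>::addScalar(Index row, Index col, double v)
{
    ++lookups;
    entries[{row / 3, col / 3}][row % 3][col % 3] += v;
}

// Specialized path: when the bloc is aligned with the storage (assumed check),
// locate it once and add all 9 coefficients in place.
template <>
inline void ToySparseMatrix<Mat3>::add(Index row, Index col, const Mat3& m)
{
    if (row % 3 == 0 && col % 3 == 0)
    {
        ++lookups;
        Mat3& bloc = entries[{row / 3, col / 3}];
        for (Index i = 0; i < 3; ++i)
            for (Index j = 0; j < 3; ++j)
                bloc[i][j] += m[i][j];
        return;
    }
    // Unaligned insertion: fall back to 9 scalar insertions.
    for (Index i = 0; i < 3; ++i)
        for (Index j = 0; j < 3; ++j)
            addScalar(row + i, col + j, m[i][j]);
}

// Inserts the same aligned bloc into both toy matrices with the same call
// and returns {lookups in scalar matrix, lookups in bloc matrix}.
inline std::pair<std::size_t, std::size_t> compareLookups()
{
    Mat3 m{};  // zero-initialized
    for (std::size_t i = 0; i < 3; ++i)
        m[i][i] = 1.0;

    ToySparseMatrix<double> scalarMatrix;
    ToySparseMatrix<Mat3> blocMatrix;
    scalarMatrix.add(0, 3, m);  // generic path
    blocMatrix.add(0, 3, m);    // specialized path, no dynamic_cast needed

    assert(scalarMatrix.entries.size() == 9);
    assert(blocMatrix.entries.size() == 1);
    return {scalarMatrix.lookups, blocMatrix.lookups};
}
```

In this toy setting, compareLookups() reports 9 lookups for the scalar matrix and 1 for the bloc matrix, which mirrors the gap between the Add3x3BlocScalar_* and Add3x3Bloc_CRS3x3d benchmarks in spirit only; it says nothing about the actual timings.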
TODO: explain the benchmarks and push them
By submitting this pull request, I acknowledge that
I have read, understand, and agree to the SOFA Developer Certificate of Origin (DCO).
Reviewers will merge this pull-request only if