
LU rewrite, second draft #142

Merged
AtheMathmo merged 25 commits into AtheMathmo:master from Andlon:lu_rewrite2
Feb 17, 2017

Conversation

@Andlon
Collaborator

@Andlon Andlon commented Jan 30, 2017

Currently doesn't contain any code pertaining to LU whatsoever, but rather some benchmarks for multiplication of a vector by a permutation matrix. The benchmarks are intended to compare performance between using the new PermutationMatrix struct and a full matrix representation of the permutation, and further motivate the use of PermutationMatrix in the LU decomposition. I will continue to develop the LU decomposition changes in this PR. Please see #94 for the first (unfinished) draft.

PermutationMatrix is implemented such that it can be applied to a vector in linear time, whereas full matrix-vector multiplication is quadratic in the size of the vector. The benchmarks support the expectation that the PermutationMatrix approach vastly outperforms the full matrix approach, presumably for all n (the size of the Vector).

Because the cost of applying a PermutationMatrix depends on the specific permutation, there are two kinds of benchmarks: one which applies the identity permutation (presumably the cheapest to apply), and one which applies a "perfect shuffle", which one would expect to be not especially cache-friendly (though perhaps not the worst case either).
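As a concrete illustration, the linear-time application can be sketched with a plain index-array representation (hypothetical names and storage; not the actual PermutationMatrix API):

```rust
// Hypothetical sketch: a permutation stored as an index array and
// applied to a vector in O(n). Names are illustrative, not rulinalg's.
fn apply_permutation(perm: &[usize], v: &[f64]) -> Vec<f64> {
    // Output element i takes input element perm[i]: a single linear pass.
    perm.iter().map(|&j| v[j]).collect()
}

// A "perfect shuffle" interleaves the two halves of the index range.
fn perfect_shuffle(n: usize) -> Vec<usize> {
    let half = n / 2;
    (0..half).flat_map(|i| [i, half + i]).collect()
}
```

Applying the permutation this way is a single O(n) pass, compared to the O(n²) inner products of a full matrix-vector product.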

Benchmark results:

test linalg::permutation::identity_permutation_as_matrix_mul_vector_100         ... bench:       4,468 ns/iter (+/- 616)
test linalg::permutation::identity_permutation_as_matrix_mul_vector_1000        ... bench:     535,558 ns/iter (+/- 4,675)
test linalg::permutation::identity_permutation_mul_vector_100                   ... bench:         116 ns/iter (+/- 0)
test linalg::permutation::identity_permutation_mul_vector_1000                  ... bench:         866 ns/iter (+/- 2)
test linalg::permutation::perfect_shuffle_permutation_as_matrix_mul_vector_1000 ... bench:     537,161 ns/iter (+/- 15,565)
test linalg::permutation::perfect_shuffle_permutation_mul_vector_1000           ... bench:         847 ns/iter (+/- 4)

@Andlon Andlon mentioned this pull request Jan 30, 2017
@AtheMathmo
Owner

AtheMathmo commented Jan 30, 2017

These initial numbers look very promising!

Edit: emphasis.

This commit adds a (for now private) testsupport
module. Eventually this module may be made available
to the public under a feature flag (e.g. `testsupport`).
For now it only wraps the existing lup_decomp
function, without making any invasive changes.
Moved reproducible_random_matrix into this new module
(from svd) so that it can be used in other benchmarks.
@Andlon
Collaborator Author

Andlon commented Feb 5, 2017

Recent developments: I first implemented the functionality from draft 1 ( #94 ) in this PR, using Matrix to represent the permutation matrix. That is, I have provided a new LU API based on the structs PartialPivLu and LUP, which together work roughly as described in #94, and which provide e.g. the solve, det and inverse methods. Initially this just used the existing lup_decomp to perform the decomposition with no changes, wrapping the result in a LUP struct.

Here are the benchmark results before replacing Matrix with PermutationMatrix.


running 8 tests
test linalg::lu::partial_piv_lu_decompose_100x100                               ... bench:     160,725 ns/iter (+/- 1,205)
test linalg::lu::partial_piv_lu_decompose_10x10                                 ... bench:         575 ns/iter (+/- 229)
test linalg::lu::partial_piv_lu_det_100x100                                     ... bench:       4,618 ns/iter (+/- 1,352)
test linalg::lu::partial_piv_lu_det_10x10                                       ... bench:         179 ns/iter (+/- 102)
test linalg::lu::partial_piv_lu_inverse_100x100                                 ... bench:   1,635,504 ns/iter (+/- 451,632)
test linalg::lu::partial_piv_lu_inverse_10x10                                   ... bench:       3,767 ns/iter (+/- 514)
test linalg::lu::partial_piv_lu_solve_100x100                                   ... bench:      16,171 ns/iter (+/- 1,966)
test linalg::lu::partial_piv_lu_solve_10x10                                     ... bench:         382 ns/iter (+/- 45)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured

I then rewrote these methods and structs to use PermutationMatrix instead. Here are the resulting benchmark numbers:

running 8 tests
test linalg::lu::partial_piv_lu_decompose_100x100                               ... bench:     171,500 ns/iter (+/- 6,305)
test linalg::lu::partial_piv_lu_decompose_10x10                                 ... bench:         506 ns/iter (+/- 37)
test linalg::lu::partial_piv_lu_det_100x100                                     ... bench:         428 ns/iter (+/- 119)
test linalg::lu::partial_piv_lu_det_10x10                                       ... bench:          44 ns/iter (+/- 5)
test linalg::lu::partial_piv_lu_inverse_100x100                                 ... bench:   1,046,557 ns/iter (+/- 112,893)
test linalg::lu::partial_piv_lu_inverse_10x10                                   ... bench:       3,466 ns/iter (+/- 392)
test linalg::lu::partial_piv_lu_solve_100x100                                   ... bench:      10,316 ns/iter (+/- 273)
test linalg::lu::partial_piv_lu_solve_10x10                                     ... bench:         313 ns/iter (+/- 33)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured

As we can see, the numbers are better across the board except for decompose. The reason decompose is slower is likely that I am currently computing an inverse of the permutation matrix at the end of the decomposition. This can actually be avoided, which I will implement in a later commit; I assume this will restore the performance of decompose. However, remember that the primary benefit of using the PermutationMatrix struct is actually the memory savings: for large matrices, these changes cut storage costs by roughly 1/3. Furthermore, I intend to avoid allocating U, instead keeping L and U stored in the original matrix, which would slash total storage costs by a further 1/3, for a total of 2/3. At that point we are also re-using the original matrix storage, so the only additional memory needed is for the PermutationMatrix struct, which is merely O(n).
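The packed-storage idea can be illustrated with a minimal in-place LU with partial pivoting (a sketch under simplifying assumptions, not rulinalg's implementation): after the loop, the input matrix holds U on and above the diagonal and L's multipliers below it, while the permutation costs only an O(n) index vector.

```rust
// Sketch of in-place LU with partial pivoting on a row-major matrix.
// On return, `a` packs U (on/above the diagonal) and L's multipliers
// (below it; L's unit diagonal is implicit). `perm` records row swaps.
fn lu_in_place(a: &mut Vec<Vec<f64>>) -> Vec<usize> {
    let n = a.len();
    let mut perm: Vec<usize> = (0..n).collect();
    for k in 0..n {
        // Partial pivoting: pick the largest |a[i][k]| for i >= k.
        let pivot = (k..n)
            .max_by(|&i, &j| a[i][k].abs().partial_cmp(&a[j][k].abs()).unwrap())
            .unwrap();
        a.swap(k, pivot);
        perm.swap(k, pivot);
        for i in (k + 1)..n {
            let m = a[i][k] / a[k][k];
            a[i][k] = m; // store the L multiplier where the zero would be
            for j in (k + 1)..n {
                a[i][j] -= m * a[k][j];
            }
        }
    }
    perm
}
```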

@Andlon
Collaborator Author

Andlon commented Feb 5, 2017

I just realized that currently the PermutationMatrix::swap_rows() implementation actually swaps columns in the corresponding matrix-representation of the permutation matrix. Just leaving it here so that I remember to deal with this.
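The mix-up likely comes down to which convention the underlying index array encodes; a small hypothetical demo (not rulinalg's code) shows that the same array yields transposed matrix forms depending on the convention, which is exactly where rows and columns trade places:

```rust
// Two conventions for encoding a permutation p as a 0/1 matrix.
// Under "output[i] = input[p[i]]", row i has its one in column p[i];
// under "output[p[i]] = input[i]", column i has its one in row p[i].
// Swapping entries of p swaps rows i and i' in the first form,
// but columns i and i' in the second -- hence the pitfall.
fn matrix_by_rows(p: &[usize]) -> Vec<Vec<u8>> {
    p.iter()
        .map(|&j| {
            let mut row = vec![0; p.len()];
            row[j] = 1;
            row
        })
        .collect()
}

fn matrix_by_cols(p: &[usize]) -> Vec<Vec<u8>> {
    let n = p.len();
    let mut m = vec![vec![0; n]; n];
    for (i, &j) in p.iter().enumerate() {
        m[j][i] = 1;
    }
    m
}
```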

@Andlon
Collaborator Author

Andlon commented Feb 5, 2017

Actually, upon re-running the benchmarks a few times, there seems to be somewhat negligible difference in the performance of decompose(), so the difference illustrated in my previous post is not necessarily representative. In light of this, I think I'll just leave the inversion as is for now and focus on other things instead.

@Andlon
Collaborator Author

Andlon commented Feb 5, 2017

Another update:

I've rewritten the LU code such that it stores both the L and U factors in the space of the input matrix. This saves another n-by-n matrix's worth of memory. Benchmarks also show essentially the same performance:

running 8 tests
test linalg::lu::partial_piv_lu_decompose_100x100                               ... bench:     157,970 ns/iter (+/- 8,686)
test linalg::lu::partial_piv_lu_decompose_10x10                                 ... bench:         469 ns/iter (+/- 18)
test linalg::lu::partial_piv_lu_det_100x100                                     ... bench:         274 ns/iter (+/- 16)
test linalg::lu::partial_piv_lu_det_10x10                                       ... bench:          39 ns/iter (+/- 0)
test linalg::lu::partial_piv_lu_inverse_100x100                                 ... bench:   1,049,159 ns/iter (+/- 8,299)
test linalg::lu::partial_piv_lu_inverse_10x10                                   ... bench:       2,567 ns/iter (+/- 26)
test linalg::lu::partial_piv_lu_solve_100x100                                   ... bench:      10,422 ns/iter (+/- 50)
test linalg::lu::partial_piv_lu_solve_10x10                                     ... bench:         278 ns/iter (+/- 2)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured

I think what remains now is first and foremost replacing the LU-related functionality in Matrix with PartialPivLu. Otherwise it's basically polish, although I might take a quick look at profiling decompose, since I already spent some time figuring out why I had a regression in solve (hint: I needed row.raw_slice() for auto-vectorization), so it should be quick.

@Andlon
Collaborator Author

Andlon commented Feb 5, 2017

@AtheMathmo: I just had an idea. Replacing lup_decomp is obviously a breaking change, so how about we just deprecate it, let it co-exist with the new LU implementation, and then eventually remove it in 0.5? That way we can merge this earlier.
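The deprecation route could look roughly like this (a sketch with stand-in types and an illustrative note text, not the real rulinalg signatures):

```rust
// Hypothetical sketch of the proposed deprecation path. The stand-in
// type and the note text are illustrative only.
#[derive(Debug, PartialEq)]
pub struct PartialPivLu; // stand-in for the new decomposition type

impl PartialPivLu {
    pub fn decompose() -> PartialPivLu {
        PartialPivLu
    }
}

#[deprecated(note = "use PartialPivLu::decompose instead; \
                     lup_decomp will be removed in 0.5")]
pub fn lup_decomp() -> PartialPivLu {
    // The old entry point co-exists with, and forwards to, the new one.
    PartialPivLu::decompose()
}
```

Callers keep compiling (with a warning) until the removal release.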

I'm otherwise pretty much done now, I think. There can still be more tests, but I think I'll tackle that in a later PR. The decomposition itself should at least work as well as the existing one.

@AtheMathmo
Owner

I think deprecating and removing later is a good idea!

I should have some time to review the changes here tomorrow or Tuesday. The improvements here are huge, thanks!

Collaborator Author

@Andlon Andlon left a comment


Reminder to fix typo

Comment thread src/matrix/decomposition/mod.rs Outdated
//! Since the right-hand side `b` has no bearing on the LU decomposition,
//! it follows that one can efficiently solve any such system for any `b`.
//!
//! It turns out that the matrices `L` and `U` can be stored compact
Collaborator Author

typo: compact -> compactly

@Andlon
Collaborator Author

Andlon commented Feb 9, 2017

I deprecated lup_decomp in one of the most recent commits. After having used PartialPivLu a bit, I actually want to propose completely removing lup_decomp in the next backwards-breaking release, without replacing it with anything. I find simply using

let lu = PartialPivLu::decompose(x);

clean, explicit and clear in its intent. It also works well once we have e.g. rook-pivoting or full-pivoting implementations in place later on, because there is never any doubt as to what e.g. matrix.lu() would actually compute. I don't see any point in keeping a method in the (already somewhat bloated) Matrix API anymore. Any thoughts?

@AtheMathmo
Owner

I think this does make sense. One of the only reasons I had for preferring a function on Matrix was discoverability - but with how big this API is currently I think it is actually easier to find in the form you suggested.

Owner

@AtheMathmo AtheMathmo left a comment

Only some minor comments and questions.

For the most part this looks really, really good!

Comment thread benches/linalg/lu.rs
}
}

// TODO: assert is_lower_triangular
Owner

Are these TODOs still pending as part of this PR?

Collaborator Author

I was a bit unsure how to handle it. Currently I put the is_lower_triangular and is_upper_triangular functions in the internal testsupport module, so they're not currently accessible from benches. My plan for testsupport was eventually to make it a public module under a "testsupport" feature flag, but this maybe needs some discussion. I was planning to let it house e.g. Arbitrary implementations of matrix types in the future.

However, perhaps they're

Owner

I think this comment got cut off.

Collaborator Author

Yeah. Hmm, I have no idea what I was planning to say.

In any case, it's not ... essential, but it would be a good idea to be able to double-check that the matrices generated for the benchmarks are actually triangular. It really depends on what we decide to do with the is_*_triangular functions (see my main response outside of individual commit comments).

Comment thread src/matrix/impl_mat.rs

let b = try!(forward_substitution(&l, p * y));
back_substitution(&u, b)
PartialPivLu::decompose(self)?.solve(y)
Owner

I just wanted to point out that I haven't used the ? operator anywhere else in rulinalg.

I actually quite like it but am wary of getting into a situation in which we have the two syntaxes heavily mixed throughout. Do you have any thoughts? I'm happy to leave the ? operator here and consider ways to resolve this in the future (probably switching all try! macro invocations to ?).

Collaborator Author

I actually kinda like them both, for different purposes. I think ? is superior for chaining (like in the above example), but try! is more explicit when you want to store the result of the computation; in that case the question mark is easy to miss. E.g.:

let lu = try!(PartialPivLu::decompose(self));
// Do something with lu

// vs

let lu = PartialPivLu::decompose(self)?;
// Do something with lu

In any case, I don't see anything particularly troublesome with having both operators in the code base. I do have the impression that the community tends to prefer ? though, could that be right...?
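For a self-contained comparison using plain std types (nothing rulinalg-specific): `try!(expr)` is essentially sugar for the explicit match below, while `?` chains neatly but sits quietly at the end of an expression.

```rust
use std::num::ParseIntError;

// Propagation with `?`: concise, but easy to overlook at a glance.
fn doubled_question(s: &str) -> Result<i32, ParseIntError> {
    Ok(2 * s.parse::<i32>()?)
}

// Roughly what try!(expr) expands to: the fallible call stands out.
fn doubled_explicit(s: &str) -> Result<i32, ParseIntError> {
    let n = match s.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(e),
    };
    Ok(2 * n)
}
```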

Owner

I also get the impression that most people prefer ?.

I actually agree with you that they work well in different situations. I would say that for now we avoid having any strong opinions on this :)

Comment thread src/lib.rs
pub mod ulp;
pub mod norm;

mod testsupport;
Owner

Perhaps it is the name that is throwing me off - wouldn't we prefer this to be under the test feature flag?

Otherwise I'm wondering why we want this to be so distinct from the BaseMatrix trait? Should we not add these as functions of this trait - or perhaps a new trait which contains these algebraic condition tests? (is_positive_definite, is_upper_triangular, is_singular, etc.).

Collaborator Author

Well, my idea was actually to make this a pub module, put it under a feature flag (e.g. "testsupport") and populate it with things that are useful not just for ourselves, but also for users of rulinalg. For instance, someone writing applications involving linear algebra might want to use the Arbitrary implementations that we were talking about in the other thread to test their own code.

Currently, there's not much sense to it as it is though. I suppose it could be put under cfg(test) for now...? See also the above issue with the TODOs in benches.

Collaborator Author

We might want to add a new trait, yeah. This would technically be a breaking change, I think, although a rather minor one (it only breaks if someone has implemented a trait for Matrix, MatrixSlice or MatrixSliceMut which has the same method names as the new trait).

I'd much rather make it a new trait than add it to the base matrix trait, though. Perhaps mainly for checking structural constraints, because checking for e.g. positive definiteness or singularity is very expensive, being an O(n³) operation. However, O(n²) checks like upper/lower triangularity or diagonality could definitely be useful. (There is already is_diag(), I think; we could deprecate it and create is_diagonal() in this new trait.)

Comment thread src/testsupport/constraints.rs Outdated
/// Returns true if the matrix is lower triangular, otherwise false.
/// This generalizes to rectangular matrices, in which case
/// it returns true if the matrix is lower trapezoidal.
#[allow(dead_code)]
Owner

Would you mind explaining why we need this attribute? I'm not against it being here but would like to follow the motivation.

Collaborator Author

Only because currently the functions here are only used in test code, so when you compile without test, it warns about unused functions. Depending on the outcome of the two other issues related to testsupport, this can probably be removed!

Comment thread src/matrix/decomposition/mod.rs Outdated
//! the system `Ax = b` can be computed in O(n<sup>2</sup>) floating
//! point operations if the LU decomposition has already been obtained.
//! Since the right-hand side `b` has no bearing on the LU decomposition,
//! it follows that one can efficiently solve any such system for any `b`.
Owner

Minor language point. I think this reads better as: it follows that one can efficiently solve this system for any b. I think?

Collaborator Author

Definitely agree :)

}
}

// TODO: Remove Any bound (cannot for the time being, since
Owner

I think this Any bound is propagated from our implementation of matrix multiplication. We need to do some dynamic type comparison to call the correct matrixmultiply function.

We might be able to trim the bound in some places, for example back_substitution seems unlikely to need it.

Collaborator Author

Ah, yes. Because of some changes to TypeId in Rust (1.8 or 1.9, maybe?), it only needs to be 'static now; you don't need Any anymore, so we could probably make this a crate-wide change. See here: https://doc.rust-lang.org/std/any/struct.TypeId.html . The example there is actually out of date, but if you look at the signature, the trait bound is only 'static and not Any.
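To illustrate the point with a minimal standard-library-only sketch:

```rust
use std::any::TypeId;

// TypeId::of::<T>() only requires T: 'static, so the dynamic type
// comparison used to dispatch to the right multiplication routine
// works with the looser bound -- no Any needed.
fn is_f64<T: 'static>() -> bool {
    TypeId::of::<T>() == TypeId::of::<f64>()
}
```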

Owner

Oh nice spot! Yes, I'll write up an issue to remove the Any trait bounds.

assert!(b.size() == self.lu.rows(),
"Right-hand side vector must have compatible size.");
// Note that applying p here implicitly incurs a clone.
// TODO: Is it possible to avoid the clone somehow?
Owner

I don't see where the clone is here. I'm guessing you mean as part of &self.p * b?

If so, this seems fairly minor to me but it is worth keeping the note here.

Collaborator Author

You're right, that's it.

I agree it's minor, and we shouldn't worry about it now. However, solving linear systems is the typical kind of thing which happens in some kind of loop (say, for every time step), and so cloning something on every iteration is unnecessary. For a desktop computer, this isn't really a problem, but for e.g. resource constrained environments or real-time applications you'd definitely want to avoid this.

Anyway, this is low priority, but we should think about it at some point in the future. I was actually considering adding a solve_into method which would take a buffer to store the solution in, which would avoid the implicit clone.
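A rough sketch of the solve_into idea (hypothetical names and a trivial placeholder body; not rulinalg's API): the caller owns the output buffer, so repeated solves in a loop allocate nothing.

```rust
// Stand-in type: the LU factors are elided for brevity.
pub struct Lu;

impl Lu {
    // Writes the solution into a caller-provided buffer.
    pub fn solve_into(&self, b: &[f64], x: &mut Vec<f64>) {
        x.clear();
        x.extend_from_slice(b);
        // A real implementation would run forward and backward
        // substitution in place on `x` here.
    }

    // The convenience method is a single call to solve_into.
    pub fn solve(&self, b: &[f64]) -> Vec<f64> {
        let mut x = Vec::with_capacity(b.len());
        self.solve_into(b, &mut x);
        x
    }
}
```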

Owner

I see your point. A solve_into function sounds like a pretty reasonable solution (if I understand correctly).

Collaborator Author

I'll mock it up and see how it looks. The existing solve can be implemented by a single call to solve_into, which is convenient.

Collaborator Author

Because backward substitution takes ownership, it cannot be easily implemented at the moment if one wishes to take a mutable reference to the buffer. I think I'll just leave it as it is for now. We can revisit in the future. It's just a minor thing at this point anyway :)

Comment thread src/matrix/decomposition/lu.rs Outdated
impl<T: 'static + Float> PartialPivLu<T> {
/// Performs the decomposition.
///
/// ### Panics
Owner

Elsewhere in rulinalg we've used h1 headers here (# Panics instead of ### Panics). It's fairly trivial but we should probably be consistent.

Collaborator Author

Yeah, I agree. I'll change it to make it consistent.

I think I was toying around with header sizes to see what looked better. I think unfortunately there are some visual design issues with rustdoc. When using H1 headers, the headers are very distracting because you get visual spacing in the wrong places. This is however a design problem that needs to be solved at the rustdoc level, and not at the local crate level, so I will definitely change this back to H1.

Owner

I agree that it does look a little weird in some places, but as you point out, consistency at the local crate level is more important.

// Note that this is not optimal in terms of performance,
// and there is likely significant potential for improvement.
//
// A more performant technique is usually to compute the
Owner

Thank you for pointing this out!

//!
//! 3. [Computation of the SVD]
//! (http://www.cs.utexas.edu/users/inderjit/public_papers/HLA_SVD.pdf)
//! Decompositions in `rulinalg` are in general modeled after
Owner

Great write up, thank you!

@Andlon
Collaborator Author

Andlon commented Feb 10, 2017

Thanks for the comments!

I think the main issue to sort out is what to do with the is_*_triangular methods. Move them into a new trait for constraints checking? That would be something like:

pub trait MatrixInvariants : BaseMatrix {
    fn is_lower_triangular(&self) { ... }
    fn is_upper_triangular(&self) { ... }
    fn is_diagonal(&self) { ... }
    fn is_symmetric(&self) { ... }
}
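A minimal sketch of such O(n²) checks, written here as free functions over a row-major Vec<Vec<f64>> rather than trait methods (the trait placement above is the open question):

```rust
// Entries strictly below the diagonal must be zero. The min() guards
// handle rectangular (trapezoidal) inputs as well.
fn is_upper_triangular(m: &[Vec<f64>]) -> bool {
    m.iter()
        .enumerate()
        .all(|(i, row)| row[..i.min(row.len())].iter().all(|&x| x == 0.0))
}

// Entries strictly above the diagonal must be zero.
fn is_lower_triangular(m: &[Vec<f64>]) -> bool {
    m.iter()
        .enumerate()
        .all(|(i, row)| row[(i + 1).min(row.len())..].iter().all(|&x| x == 0.0))
}
```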

Like I said in the other comment, this is technically a breaking change, but a very mild one, in that it only breaks if someone implements a trait for Matrix which also has methods with these names. At least, I think that's the only way for it to break. I don't know how strict the stability guarantees you want to give for rulinalg are in its current state.

Alternatively we could just put #[cfg(test)] on testsupport for now, and then revisit the issue later.

What do you think?

EDIT: drop #[cfg(test)] -> put #[cfg(test)] (unfortunate choice of word)

@AtheMathmo
Owner

I'm a little unsure how to tackle the is_*_triangular functions right now. I think the trait as you suggested is fine in a minor release - the breaking change is very unlikely to affect users (I think...). The only real issue I see here is deprecating the is_diag function but that is pretty minor.

Right now my decision would be to tackle the new trait in a new PR. For this PR I think it is best to flag testsupport with #[cfg(test)].

Oh and I'm not sure about the name MatrixInvariants, I had to check but I vaguely remembered these referring to something else. It turns out they are the coefficients of the characteristic polynomial.

@Andlon
Collaborator Author

Andlon commented Feb 10, 2017

All right, sounds good! I'll get it done right away.

Huh, I've never come across that term. TIL :)

@Andlon
Collaborator Author

Andlon commented Feb 10, 2017

OK, so I think I've taken care of the things you pointed out. Let me know if I missed anything.

The only thing is that I've left the TODOs in the benchmark utility functions for now. They should ideally assert that the generated matrices are triangular, but that would mean exposing is_*_triangular outside of the crate, which we are not ready for yet (in this PR, anyway). I can also remove the TODOs entirely if you want. Up to you :)

@AtheMathmo
Owner

Looks like I just gave this a conflict - I think it is from the column iterator reformatting.

@Andlon
Collaborator Author

Andlon commented Feb 14, 2017

No worries, I'll take care of it next chance I get!

@AtheMathmo AtheMathmo merged commit 8e8e8fb into AtheMathmo:master Feb 17, 2017
@Andlon Andlon deleted the lu_rewrite2 branch February 19, 2017 15:54