
LU rewrite, second draft #142

Merged
AtheMathmo merged 25 commits into AtheMathmo:master from Andlon:lu_rewrite2
Feb 17, 2017

Conversation

@Andlon
Collaborator

@Andlon Andlon commented Jan 30, 2017

Currently doesn't contain any code pertaining to LU whatsoever, but rather some benchmarks for multiplication of a vector by a permutation matrix. The benchmarks are intended to compare performance between using the new PermutationMatrix struct and a full matrix representation of the permutation, and further motivate the use of PermutationMatrix in the LU decomposition. I will continue to develop the LU decomposition changes in this PR. Please see #94 for the first (unfinished) draft.

PermutationMatrix is implemented such that it can be applied to a vector in linear time, whereas full matrix-vector multiplication is quadratic in the size of the vector. The benchmarks support the expectation that the PermutationMatrix approach vastly outperforms the full matrix approach, presumably for all n (the size of the Vector).

Because the cost of applying a PermutationMatrix depends on the specific permutation, there are two kinds of benchmarks: one which applies the identity permutation (presumably the cheapest to apply), and one which applies a "perfect shuffle", which one would expect to be not especially cache-friendly (though perhaps not the worst case either).
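As a concrete illustration, the linear-time application can be sketched with a plain index-array representation (hypothetical names and storage; not the actual PermutationMatrix API):

```rust
// Hypothetical sketch: a permutation stored as an index array and
// applied to a vector in O(n). Names are illustrative, not rulinalg's.
fn apply_permutation(perm: &[usize], v: &[f64]) -> Vec<f64> {
    // Output element i takes input element perm[i]: a single linear pass.
    perm.iter().map(|&j| v[j]).collect()
}

// A "perfect shuffle" interleaves the two halves of the index range.
fn perfect_shuffle(n: usize) -> Vec<usize> {
    let half = n / 2;
    (0..half).flat_map(|i| [i, half + i]).collect()
}
```

Applying the permutation this way is a single O(n) pass, compared to the O(n²) inner products of a full matrix-vector product.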

Benchmark results:

test linalg::permutation::identity_permutation_as_matrix_mul_vector_100         ... bench:       4,468 ns/iter (+/- 616)
test linalg::permutation::identity_permutation_as_matrix_mul_vector_1000        ... bench:     535,558 ns/iter (+/- 4,675)
test linalg::permutation::identity_permutation_mul_vector_100                   ... bench:         116 ns/iter (+/- 0)
test linalg::permutation::identity_permutation_mul_vector_1000                  ... bench:         866 ns/iter (+/- 2)
test linalg::permutation::perfect_shuffle_permutation_as_matrix_mul_vector_1000 ... bench:     537,161 ns/iter (+/- 15,565)
test linalg::permutation::perfect_shuffle_permutation_mul_vector_1000           ... bench:         847 ns/iter (+/- 4)

@Andlon Andlon mentioned this pull request Jan 30, 2017
@AtheMathmo
Owner

AtheMathmo commented Jan 30, 2017

These initial numbers look very promising!

Edit: emphasis.

This commit adds a (for now private) testsupport
module. Eventually this module may be made available
to the public under a feature flag (e.g. `testsupport`).
For now it only wraps the existing lup_decomp
function, without making any invasive changes.
Moved reproducible_random_matrix into this new module
(from svd) so that it can be used in other benchmarks.
@Andlon
Collaborator Author

Andlon commented Feb 5, 2017

Recent developments: I first implemented the functionality from draft 1 ( #94 ) in this PR, using Matrix to represent the permutation matrix. That is, I have provided a new LU API based on the structs PartialPivLu and LUP, which together work roughly as described in #94, and which provide e.g. the solve, det and inverse methods. Initially this just used the existing lup_decomp to perform the decomposition with no changes, wrapping the result in a LUP struct.

Here are the benchmark results before replacing Matrix with PermutationMatrix.


running 8 tests
test linalg::lu::partial_piv_lu_decompose_100x100                               ... bench:     160,725 ns/iter (+/- 1,205)
test linalg::lu::partial_piv_lu_decompose_10x10                                 ... bench:         575 ns/iter (+/- 229)
test linalg::lu::partial_piv_lu_det_100x100                                     ... bench:       4,618 ns/iter (+/- 1,352)
test linalg::lu::partial_piv_lu_det_10x10                                       ... bench:         179 ns/iter (+/- 102)
test linalg::lu::partial_piv_lu_inverse_100x100                                 ... bench:   1,635,504 ns/iter (+/- 451,632)
test linalg::lu::partial_piv_lu_inverse_10x10                                   ... bench:       3,767 ns/iter (+/- 514)
test linalg::lu::partial_piv_lu_solve_100x100                                   ... bench:      16,171 ns/iter (+/- 1,966)
test linalg::lu::partial_piv_lu_solve_10x10                                     ... bench:         382 ns/iter (+/- 45)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured

I then rewrote these methods and structs to use PermutationMatrix instead. Here are the resulting benchmark numbers:

running 8 tests
test linalg::lu::partial_piv_lu_decompose_100x100                               ... bench:     171,500 ns/iter (+/- 6,305)
test linalg::lu::partial_piv_lu_decompose_10x10                                 ... bench:         506 ns/iter (+/- 37)
test linalg::lu::partial_piv_lu_det_100x100                                     ... bench:         428 ns/iter (+/- 119)
test linalg::lu::partial_piv_lu_det_10x10                                       ... bench:          44 ns/iter (+/- 5)
test linalg::lu::partial_piv_lu_inverse_100x100                                 ... bench:   1,046,557 ns/iter (+/- 112,893)
test linalg::lu::partial_piv_lu_inverse_10x10                                   ... bench:       3,466 ns/iter (+/- 392)
test linalg::lu::partial_piv_lu_solve_100x100                                   ... bench:      10,316 ns/iter (+/- 273)
test linalg::lu::partial_piv_lu_solve_10x10                                     ... bench:         313 ns/iter (+/- 33)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured

As we can see, the numbers are better across the board except for decompose. The reason decompose is slower is likely that I am currently computing an inverse of the permutation matrix at the end of the decomposition. This can actually be avoided, which I will implement in a later commit; I assume this will restore the performance of decompose. However, remember that the primary benefit of using the PermutationMatrix struct is actually the memory savings: for large matrices, these changes cut storage costs by roughly 1/3. Furthermore, I intend to avoid allocating U, instead keeping L and U stored in the original matrix, which would slash total storage costs by a further 1/3, for a total of 2/3. At that point we are also re-using the original matrix storage, so the only additional memory needed is for the PermutationMatrix struct, which is merely O(n).
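The packed-storage idea can be illustrated with a minimal in-place LU with partial pivoting (a sketch under simplifying assumptions, not rulinalg's implementation): after the loop, the input matrix holds U on and above the diagonal and L's multipliers below it, while the permutation costs only an O(n) index vector.

```rust
// Sketch of in-place LU with partial pivoting on a row-major matrix.
// On return, `a` packs U (on/above the diagonal) and L's multipliers
// (below it; L's unit diagonal is implicit). `perm` records row swaps.
fn lu_in_place(a: &mut Vec<Vec<f64>>) -> Vec<usize> {
    let n = a.len();
    let mut perm: Vec<usize> = (0..n).collect();
    for k in 0..n {
        // Partial pivoting: pick the largest |a[i][k]| for i >= k.
        let pivot = (k..n)
            .max_by(|&i, &j| a[i][k].abs().partial_cmp(&a[j][k].abs()).unwrap())
            .unwrap();
        a.swap(k, pivot);
        perm.swap(k, pivot);
        for i in (k + 1)..n {
            let m = a[i][k] / a[k][k];
            a[i][k] = m; // store the L multiplier where the zero would be
            for j in (k + 1)..n {
                a[i][j] -= m * a[k][j];
            }
        }
    }
    perm
}
```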

@Andlon
Collaborator Author

Andlon commented Feb 5, 2017

I just realized that currently the PermutationMatrix::swap_rows() implementation actually swaps columns in the corresponding matrix-representation of the permutation matrix. Just leaving it here so that I remember to deal with this.
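The mix-up likely comes down to which convention the underlying index array encodes; a small hypothetical demo (not rulinalg's code) shows that the same array yields transposed matrix forms depending on the convention, which is exactly where rows and columns trade places:

```rust
// Two conventions for encoding a permutation p as a 0/1 matrix.
// Under "output[i] = input[p[i]]", row i has its one in column p[i];
// under "output[p[i]] = input[i]", column i has its one in row p[i].
// Swapping entries of p swaps rows i and i' in the first form,
// but columns i and i' in the second -- hence the pitfall.
fn matrix_by_rows(p: &[usize]) -> Vec<Vec<u8>> {
    p.iter()
        .map(|&j| {
            let mut row = vec![0; p.len()];
            row[j] = 1;
            row
        })
        .collect()
}

fn matrix_by_cols(p: &[usize]) -> Vec<Vec<u8>> {
    let n = p.len();
    let mut m = vec![vec![0; n]; n];
    for (i, &j) in p.iter().enumerate() {
        m[j][i] = 1;
    }
    m
}
```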

@Andlon
Collaborator Author

Andlon commented Feb 5, 2017

Actually, upon re-running the benchmarks a few times, there seems to be somewhat negligible difference in the performance of decompose(), so the difference illustrated in my previous post is not necessarily representative. In light of this, I think I'll just leave the inversion as is for now and focus on other things instead.

@Andlon
Collaborator Author

Andlon commented Feb 5, 2017

Another update:

I've rewritten the LU code such that it stores both the L and U factors in the space of the input matrix. This saves another n-by-n matrix's worth of memory. Benchmarks also show essentially the same performance:

running 8 tests
test linalg::lu::partial_piv_lu_decompose_100x100                               ... bench:     157,970 ns/iter (+/- 8,686)
test linalg::lu::partial_piv_lu_decompose_10x10                                 ... bench:         469 ns/iter (+/- 18)
test linalg::lu::partial_piv_lu_det_100x100                                     ... bench:         274 ns/iter (+/- 16)
test linalg::lu::partial_piv_lu_det_10x10                                       ... bench:          39 ns/iter (+/- 0)
test linalg::lu::partial_piv_lu_inverse_100x100                                 ... bench:   1,049,159 ns/iter (+/- 8,299)
test linalg::lu::partial_piv_lu_inverse_10x10                                   ... bench:       2,567 ns/iter (+/- 26)
test linalg::lu::partial_piv_lu_solve_100x100                                   ... bench:      10,422 ns/iter (+/- 50)
test linalg::lu::partial_piv_lu_solve_10x10                                     ... bench:         278 ns/iter (+/- 2)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured

I think what remains now is first and foremost replacing the LU-related functionality in Matrix with PartialPivLu. Otherwise it's basically polish, although I might take a quick look at profiling decompose, since I already spent some time figuring out why I had a regression in solve (hint: I needed row.raw_slice() for auto-vectorization), so it should be quick.

@Andlon
Collaborator Author

Andlon commented Feb 5, 2017

@AtheMathmo: I just had an idea. Replacing lup_decomp is obviously a breaking change, so how about we just deprecate it, let it co-exist with the new LU implementation, and then eventually remove it in 0.5? That way we can merge this earlier.
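The deprecation route could look roughly like this (a sketch with stand-in types and an illustrative note text, not the real rulinalg signatures):

```rust
// Hypothetical sketch of the proposed deprecation path. The stand-in
// type and the note text are illustrative only.
#[derive(Debug, PartialEq)]
pub struct PartialPivLu; // stand-in for the new decomposition type

impl PartialPivLu {
    pub fn decompose() -> PartialPivLu {
        PartialPivLu
    }
}

#[deprecated(note = "use PartialPivLu::decompose instead; \
                     lup_decomp will be removed in 0.5")]
pub fn lup_decomp() -> PartialPivLu {
    // The old entry point co-exists with, and forwards to, the new one.
    PartialPivLu::decompose()
}
```

Callers keep compiling (with a warning) until the removal release.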

I'm otherwise pretty much done now, I think. There can still be more tests, but I think I'll tackle that in a later PR. The decomposition itself should at least work as well as the existing one.

@AtheMathmo
Owner

I think deprecating and removing later is a good idea!

I should have some time to review the changes here tomorrow or Tuesday. The improvements here are huge, thanks!

Collaborator Author

@Andlon Andlon left a comment


Reminder to fix typo

Comment thread src/matrix/decomposition/mod.rs Outdated
//! Since the right-hand side `b` has no bearing on the LU decomposition,
//! it follows that one can efficiently solve any such system for any `b`.
//!
//! It turns out that the matrices `L` and `U` can be stored compact
Collaborator Author

typo: compact -> compactly

@Andlon
Collaborator Author

Andlon commented Feb 9, 2017

I deprecated lup_decomp in one of the most recent commits. After having used PartialPivLu a bit, I actually want to propose completely removing lup_decomp in the next backwards-breaking release, without replacing it with anything. I find simply using

let lu = PartialPivLu::decompose(x);

clean, explicit and clear in its intent. It also works well once we have e.g. rook-pivoting or full-pivoting implementations in place later on, because there is never any doubt as to what e.g. matrix.lu() would actually compute. I don't see any point in keeping a method in the (already somewhat bloated) Matrix API anymore. Any thoughts?

@AtheMathmo
Owner

I think this does make sense. One of the only reasons I had for preferring a function on Matrix was discoverability - but with how big this API is currently I think it is actually easier to find in the form you suggested.

Owner

@AtheMathmo AtheMathmo left a comment

Only some minor comments and questions.

For the most part this looks really, really good!

Comment thread benches/linalg/lu.rs
}
}

// TODO: assert is_lower_triangular
Owner

Are these TODOs still pending as part of this PR?

Collaborator Author

I was a bit unsure how to handle it. Currently I put the is_lower_triangular and is_upper_triangular functions in the internal testsupport module, so they're not currently accessible from benches. My plan for testsupport was eventually to make it a public module under a "testsupport" feature flag, but this maybe needs some discussion. I was planning to let it house e.g. Arbitrary implementations of matrix types in the future.

However, perhaps they're

Owner

I think this comment got cut off.

Collaborator Author

Yeah. Hmm, I have no idea what I was planning to say.

In any case, it's not ... essential, but it would be a good idea to be able to double-check that the matrices generated for the benchmarks are actually triangular. It really depends on what we decide to do with the is_*_triangular functions (see my main response outside of individual commit comments).

Comment thread src/matrix/impl_mat.rs

let b = try!(forward_substitution(&l, p * y));
back_substitution(&u, b)
PartialPivLu::decompose(self)?.solve(y)
Owner

I just wanted to point out that I haven't used the ? operator anywhere else in rulinalg.

I actually quite like it but am wary of getting into a situation in which we have the two syntaxes heavily mixed throughout. Do you have any thoughts? I'm happy to leave the ? operator here and consider ways to resolve this in the future (probably switching all try! macro invocations to ?).

Collaborator Author

I actually kinda like them both, for different purposes. I think ? is superior for chaining (like in the above example), but try! is more explicit when you want to store the result of the computation; in that case the question mark is easy to miss. E.g.:

let lu = try!(PartialPivLu::decompose(self));
// Do something with lu

// vs

let lu = PartialPivLu::decompose(self)?;
// Do something with lu

In any case, I don't see anything particularly troublesome with having both operators in the code base. I do have the impression that the community tends to prefer ? though, could that be right...?
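For a self-contained comparison using plain std types (nothing rulinalg-specific): `try!(expr)` is essentially sugar for the explicit match below, while `?` chains neatly but sits quietly at the end of an expression.

```rust
use std::num::ParseIntError;

// Propagation with `?`: concise, but easy to overlook at a glance.
fn doubled_question(s: &str) -> Result<i32, ParseIntError> {
    Ok(2 * s.parse::<i32>()?)
}

// Roughly what try!(expr) expands to: the fallible call stands out.
fn doubled_explicit(s: &str) -> Result<i32, ParseIntError> {
    let n = match s.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(e),
    };
    Ok(2 * n)
}
```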

Owner

I also get the impression that most people prefer ?.

I actually agree with you that they work well in different situations. I would say that for now we avoid having any strong opinions on this :)

Comment thread src/lib.rs
pub mod ulp;
pub mod norm;

mod testsupport;
Owner

Perhaps it is the name that is throwing me off - wouldn't we prefer this to be under the test feature flag?

Otherwise I'm wondering why we want this to be so distinct from the BaseMatrix trait? Should we not add these as functions of this trait - or perhaps a new trait which contains these algebraic condition tests? (is_positive_definite, is_upper_triangular, is_singular, etc.).

Collaborator Author

Well, my idea was actually to make this a pub module, put it under a feature flag (e.g. "testsupport") and populate it with things that are useful not just for ourselves, but also for users of rulinalg. For instance, someone writing applications involving linear algebra might want to use the Arbitrary implementations that we were talking about in the other thread to test their own code.

Currently, there's not much sense to it as it is though. I suppose it could be put under cfg(test) for now...? See also the above issue with the TODOs in benches.

Collaborator Author

We might want to add a new trait, yeah. This would technically be a breaking change, I think, although a rather minor one (it only breaks if someone has implemented a trait for Matrix, MatrixSlice or MatrixSliceMut which has the same method names as the new trait).

I'd much rather make it a new trait than add it to the base matrix trait, though. Perhaps mainly for checking structural constraints, because checking for e.g. positive definiteness or singularity is very expensive, being an O(n³) operation. However, O(n²) checks like upper/lower triangularity or diagonality could definitely be useful. (There is already is_diag(), I think; we could deprecate it and create is_diagonal() in this new trait.)

Comment thread src/testsupport/constraints.rs Outdated
/// Returns true if the matrix is lower triangular, otherwise false.
/// This generalizes to rectangular matrices, in which case
/// it returns true if the matrix is lower trapezoidal.
#[allow(dead_code)]
Owner

Would you mind explaining why we need this attribute? I'm not against it being here but would like to follow the motivation.

Collaborator Author

Only because currently the functions here are only used in test code, so when you compile without test, it warns about unused functions. Depending on the outcome of the two other issues related to testsupport, this can probably be removed!

Comment thread src/matrix/decomposition/mod.rs Outdated
//! the system `Ax = b` can be computed in O(n<sup>2</sup>) floating
//! point operations if the LU decomposition has already been obtained.
//! Since the right-hand side `b` has no bearing on the LU decomposition,
//! it follows that one can efficiently solve any such system for any `b`.
Owner

Minor language point. I think this reads better as: it follows that one can efficiently solve this system for any b. I think?

Collaborator Author

Definitely agree :)

}
}

// TODO: Remove Any bound (cannot for the time being, since
Owner

I think this Any bound is propagated from our implementation of matrix multiplication. We need to do some dynamic type comparison to call the correct matrixmultiply function.

We might be able to trim the bound in some places, for example back_substitution seems unlikely to need it.

Collaborator Author

Ah, yes. Because of some changes to TypeId in Rust (1.8 or 1.9, maybe?), it only needs to be 'static now; you don't need Any anymore, so we could probably make this a crate-wide change. See here: https://doc.rust-lang.org/std/any/struct.TypeId.html . The example there is actually out of date, but if you look at the signature, the trait bound is only 'static and not Any.
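To illustrate the point with a minimal standard-library-only sketch:

```rust
use std::any::TypeId;

// TypeId::of::<T>() only requires T: 'static, so the dynamic type
// comparison used to dispatch to the right multiplication routine
// works with the looser bound -- no Any needed.
fn is_f64<T: 'static>() -> bool {
    TypeId::of::<T>() == TypeId::of::<f64>()
}
```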

Owner

Oh nice spot! Yes, I'll write up an issue to remove the Any trait bounds.

assert!(b.size() == self.lu.rows(),
"Right-hand side vector must have compatible size.");
// Note that applying p here implicitly incurs a clone.
// TODO: Is it possible to avoid the clone somehow?
Owner

I don't see where the clone is here. I'm guessing you mean as part of &self.p * b?

If so, this seems fairly minor to me but it is worth keeping the note here.

Collaborator Author

You're right, that's it.

I agree it's minor, and we shouldn't worry about it now. However, solving linear systems is the typical kind of thing which happens in some kind of loop (say, for every time step), and so cloning something on every iteration is unnecessary. For a desktop computer, this isn't really a problem, but for e.g. resource constrained environments or real-time applications you'd definitely want to avoid this.

Anyway, this is low priority, but we should think about it at some point in the future. I was actually considering adding a solve_into method which would take a buffer to store the solution in, which would avoid the implicit clone.
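A rough sketch of the solve_into idea (hypothetical names and a trivial placeholder body; not rulinalg's API): the caller owns the output buffer, so repeated solves in a loop allocate nothing.

```rust
// Stand-in type: the LU factors are elided for brevity.
pub struct Lu;

impl Lu {
    // Writes the solution into a caller-provided buffer.
    pub fn solve_into(&self, b: &[f64], x: &mut Vec<f64>) {
        x.clear();
        x.extend_from_slice(b);
        // A real implementation would run forward and backward
        // substitution in place on `x` here.
    }

    // The convenience method is a single call to solve_into.
    pub fn solve(&self, b: &[f64]) -> Vec<f64> {
        let mut x = Vec::with_capacity(b.len());
        self.solve_into(b, &mut x);
        x
    }
}
```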

Owner

I see your point. A solve_into function sounds like a pretty reasonable solution (if I understand correctly).

Collaborator Author

I'll mock it up and see how it looks. The existing solve can be implemented by a single call to solve_into, which is convenient.

Collaborator Author

Because backward substitution takes ownership, it cannot be easily implemented at the moment if one wishes to take a mutable reference to the buffer. I think I'll just leave it as it is for now. We can revisit in the future. It's just a minor thing at this point anyway :)

Comment thread src/matrix/decomposition/lu.rs Outdated
impl<T: 'static + Float> PartialPivLu<T> {
/// Performs the decomposition.
///
/// ### Panics
Owner

Elsewhere in rulinalg we've used h1 headers here (# Panics instead of ### Panics). It's fairly trivial but we should probably be consistent.

Collaborator Author

Yeah, I agree. I'll change it to make it consistent.

I think I was toying around with header sizes to see what looked better. I think unfortunately there are some visual design issues with rustdoc. When using H1 headers, the headers are very distracting because you get visual spacing in the wrong places. This is however a design problem that needs to be solved at the rustdoc level, and not at the local crate level, so I will definitely change this back to H1.

Owner

I agree that it does look a little weird in some places, but as you point out, consistency at the local crate level is more important.

// Note that this is not optimal in terms of performance,
// and there is likely significant potential for improvement.
//
// A more performant technique is usually to compute the
Owner

Thank you for pointing this out!

//!
//! 3. [Computation of the SVD]
//! (http://www.cs.utexas.edu/users/inderjit/public_papers/HLA_SVD.pdf)
//! Decompositions in `rulinalg` are in general modeled after
Owner

Great write up, thank you!

@Andlon
Collaborator Author

Andlon commented Feb 10, 2017

Thanks for the comments!

I think the main issue to sort out is what to do with the is_*_triangular methods. Move them into a new trait for constraints checking? That would be something like:

pub trait MatrixInvariants : BaseMatrix {
    fn is_lower_triangular(&self) { ... }
    fn is_upper_triangular(&self) { ... }
    fn is_diagonal(&self) { ... }
    fn is_symmetric(&self) { ... }
}
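A minimal sketch of such O(n²) checks, written here as free functions over a row-major Vec<Vec<f64>> rather than trait methods (the trait placement above is the open question):

```rust
// Entries strictly below the diagonal must be zero. The min() guards
// handle rectangular (trapezoidal) inputs as well.
fn is_upper_triangular(m: &[Vec<f64>]) -> bool {
    m.iter()
        .enumerate()
        .all(|(i, row)| row[..i.min(row.len())].iter().all(|&x| x == 0.0))
}

// Entries strictly above the diagonal must be zero.
fn is_lower_triangular(m: &[Vec<f64>]) -> bool {
    m.iter()
        .enumerate()
        .all(|(i, row)| row[(i + 1).min(row.len())..].iter().all(|&x| x == 0.0))
}
```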

Like I said in the other comment, this is technically a breaking change, but a very mild one, in that it only breaks if someone implements a trait for Matrix which also has methods with these names. At least, I think that's the only way for it to break. I don't know how strict the stability guarantees you want to give for rulinalg are in its current state.

Alternatively we could just put #[cfg(test)] on testsupport for now, and then revisit the issue later.

What do you think?

EDIT: drop #[cfg(test)] -> put #[cfg(test)] (unfortunate choice of word)

@AtheMathmo
Owner

I'm a little unsure how to tackle the is_*_triangular functions right now. I think the trait as you suggested is fine in a minor release - the breaking change is very unlikely to affect users (I think...). The only real issue I see here is deprecating the is_diag function but that is pretty minor.

Right now my decision would be to tackle the new trait in a new PR. For this PR I think it is best to flag testsupport with #[cfg(test)].

Oh and I'm not sure about the name MatrixInvariants, I had to check but I vaguely remembered these referring to something else. It turns out they are the coefficients of the characteristic polynomial.

@Andlon
Collaborator Author

Andlon commented Feb 10, 2017

All right, sounds good! I'll get it done right away.

Huh, I've never come across that term. TIL :)

@Andlon
Collaborator Author

Andlon commented Feb 10, 2017

OK, so I think I've taken care of the things you pointed out. Let me know if I missed anything.

The only thing is that I've left the TODOs in the benchmark utility functions for now. They should ideally assert that the generated matrices are triangular, but that would mean exposing is_*_triangular outside of the crate, which we are not ready for yet (in this PR, anyway). I can also remove the TODOs entirely if you want. Up to you :)

@AtheMathmo
Owner

Looks like I just gave this a conflict - I think it is from the column iterator reformatting.

@Andlon
Collaborator Author

Andlon commented Feb 14, 2017

No worries, I'll take care of it next chance I get!

@AtheMathmo AtheMathmo merged commit 8e8e8fb into AtheMathmo:master Feb 17, 2017
@Andlon Andlon deleted the lu_rewrite2 branch February 19, 2017 15:54