[WIP] Replace Matrix with NMatrix for high performance #36
lokeshh wants to merge 5 commits into SciRuby:master from lokeshh:nmatrix
Conversation
I just replaced all instances of `Matrix` with `NMatrix`.
lib/statsample-glm/glm/mle/base.rb
```diff
  h = second_derivative(x,y,parameters)
- if h.singular?
+ if h.det() == 0
```
Not sure if this check is necessary. Computing the determinant is very expensive. I think we can continue without it, then methods like invert or solve will complain later on anyway, if h is (nearly) singular.
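That rescue-instead-of-check idea can be sketched with Ruby's stdlib Matrix (used here purely for illustration; NMatrix's error classes differ, and `safe_invert` is a hypothetical helper name):

```ruby
require 'matrix'

# Skip the up-front determinant test and let the inversion itself
# complain: stdlib Matrix raises ErrNotRegular for a singular matrix.
def safe_invert(h)
  h.inverse
rescue ExceptionForMatrix::ErrNotRegular
  nil # caller can then fall back, e.g. keep the previous parameters
end
```

This avoids paying for a determinant on every iteration and only handles the (rare) singular case when it actually occurs.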
Interesting... I'll take a look sometime this week, and let you know if I can figure something out...
@lokeshh I just had another look at your benchmark code, and I noticed that you generate data from a perfectly linear equation. When you generate data according to a statistical linear model, there is always a random noise term added to the linear equation. That is, add another vector of (small) random numbers in line 10 (preferably sampled from the normal distribution). I think that may be why the NMatrix version is slower in your example: the data is perfectly linear (there isn't any noise), so the algorithm barely has to work at all, which means most of the time is spent on internal data preparation rather than number crunching. Anyway, it would be great if you could try it out. It's just one little change in line 10 of your code.
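The suggested change amounts to adding a Gaussian noise vector to the response. A minimal, dependency-free sketch (using a Box-Muller draw in place of `Statsample::Shorthand.rnorm`; the names and coefficients here are illustrative):

```ruby
# One standard-normal draw via the Box-Muller transform.
def rnorm
  Math.sqrt(-2 * Math.log(1 - rand)) * Math.cos(2 * Math::PI * rand)
end

n = 100
a = Array.new(n) { rand }
b = Array.new(n) { rand }
# y follows a *statistical* linear model: signal plus random noise,
# so the fitting algorithm actually has work to do.
y = a.zip(b).map { |ai, bi| 3 * ai + 5 * bi + rnorm }
```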
@agarie I changed the last line to `df[:y] = df.a * 3 + df.b * 5 + df.c * -2 + df.d * 2 + Statsample::Shorthand.rnorm(n)` and the result was almost the same: for Matrix it took 1.2 seconds, while for NMatrix (LAPACK) it took 4 seconds.
@lokeshh Thanks for trying it out! I guess the problem lies somewhere else.
Okay. So, I have written a simple gradient descent algorithm from scratch to fit GLMs with NMatrix. It is still slower, which I think is mostly due to the gradient descent method, which requires many more iterations to converge than the algorithm in statsample-glm.

NB: @lokeshh I was wrong in my comment above about adding random noise: for a logistic model, the binary responses should instead be drawn from the probabilities given by the linear predictor, like this:

```ruby
p = df.a * 3 + df.b * 5 + df.c * -2 + df.d * 2
p.map! { |i| 1 / (1 + Math::exp(-i)) }
y = p.map do |prob|
  unif_draw = rand
  unif_draw < prob ? 1 : 0
end
df[:y] = Daru::Vector.new(y)
```
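For context, the gradient-descent approach mentioned above can be sketched in plain Ruby arrays (an illustrative reimplementation, not the author's NMatrix code; `fit_logistic` and its parameters are made-up names):

```ruby
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# Batch gradient descent for logistic regression.
# x: rows of features (first column 1.0 for the intercept), y: 0/1 labels.
def fit_logistic(x, y, rate: 0.5, iters: 10_000)
  beta = Array.new(x.first.size, 0.0)
  iters.times do
    grad = Array.new(beta.size, 0.0)
    x.zip(y).each do |xi, yi|
      # Prediction error for this observation.
      err = sigmoid(xi.zip(beta).sum { |a, b| a * b }) - yi
      xi.each_with_index { |v, j| grad[j] += err * v }
    end
    # Step against the averaged gradient.
    beta.each_index { |j| beta[j] -= rate * grad[j] / x.size }
  end
  beta
end
```

Each pass over the data moves `beta` a small step, which is why this method typically needs many more iterations than the Newton-style iteratively reweighted least squares used in statsample-glm.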